Database Management System (DBMS) Notes

Database Management System notes for B.E. students. These notes cover two units of the database management system syllabus.

Uploaded by VivekKhandelwal
Copyright © All Rights Reserved

UNIT-1

Database
A database is a collection of information that is organized so that
it can be easily accessed, managed and updated.

Data is organized into rows, columns and tables, and it is indexed
to make it easier to find relevant information. Data gets updated,
expanded and deleted as new information is added. Databases
process workloads to create and update themselves, querying the
data they contain and running applications against it. Computer
databases typically contain aggregations of data records or files,
such as sales transactions, product catalogs and inventories, and
customer profiles.

Database Management System


A database management system (DBMS) is system software for
creating and managing databases. The DBMS provides users and
programmers with a systematic way to create, retrieve, update
and manage data. A DBMS makes it possible for end users to
create, read, update and delete data in a database. The DBMS
essentially serves as an interface between the database and end
users or application programs, ensuring that data is consistently
organized and remains easily accessible.
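
The four basic operations named above (create, read, update, delete) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer table, its columns and the sample rows are assumptions made for the example, and SQLite stands in for whichever DBMS is actually used.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: add two records.
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (1, "Asha", "Pune"))
conn.execute("INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", (2, "Ravi", "Indore"))

# Read: retrieve the stored records.
rows_after_insert = conn.execute(
    "SELECT id, name, city FROM customer ORDER BY id"
).fetchall()

# Update: change one field of one record.
conn.execute("UPDATE customer SET city = ? WHERE id = ?", ("Bhopal", 2))

# Delete: remove one record.
conn.execute("DELETE FROM customer WHERE id = ?", (1,))

rows_at_end = conn.execute(
    "SELECT id, name, city FROM customer ORDER BY id"
).fetchall()
conn.close()
```

The program never touches the storage format directly; every request goes through the DBMS interface.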

Components of Database System


The database system can be divided into four components.

• Users: Users may be of various types, such as database administrators, system
developers and end users.
• Database application: A database application may be personal, departmental,
enterprise or internal.
• DBMS: Software that allows users to define, create and manage a database and
control access to it, e.g. MySQL, Oracle.
• Database: A collection of logically related data.

Characteristics
Traditionally, data was organized in file formats. The DBMS was a new
concept then, and research focused on overcoming the deficiencies of
the traditional style of data management. A modern DBMS
has the following characteristics −


• Real-world entity − A modern DBMS models real-world entities in its
design, together with their attributes and behavior.
For example, a school database may use students as an entity and their age
as an attribute.
• Relation-based tables − A DBMS allows entities and the relations among them
to form tables. A user can often understand the architecture of a database just by
looking at the table names.
• Isolation of data and application − A database system is entirely different
from its data. The database is an active entity, whereas data is passive:
it is what the database works on and organizes. A DBMS also stores
metadata, which is data about data, to ease its own processing.
• Less redundancy − A DBMS follows the rules of normalization, which split a
relation when any of its attributes holds redundant values.
Normalization is a mathematically rich and scientific process that reduces
data redundancy.
• Consistency − Consistency is a state in which every relation in a database
remains consistent. There are methods and techniques that can detect
an attempt to leave the database in an inconsistent state. A DBMS can provide
greater consistency than earlier forms of data storage
applications such as file-processing systems.
• Query Language − A DBMS is equipped with a query language, which makes
retrieving and manipulating data more efficient. A user can apply as many
different filtering options as required to retrieve a set of data.
This was traditionally not possible with file-processing systems.
• ACID Properties − A DBMS follows the concepts
of Atomicity, Consistency, Isolation, and Durability (normally shortened to
ACID). These concepts are applied to transactions, which manipulate data
in a database. ACID properties help the database stay healthy in
multi-transactional environments and in case of failure.
• Multiuser and Concurrent Access − A DBMS supports a multi-user
environment and allows users to access and manipulate data in parallel.
There are restrictions on transactions when users attempt to handle
the same data item, but users are generally unaware of them.
• Multiple views − A DBMS offers multiple views for different users. A user
in the Sales department will have a different view of the database than a
person working in the Production department. This feature gives
users a focused view of the database according to their
requirements.
• Security − Features like multiple views offer security to some extent, since
users are unable to access the data of other users and departments. A DBMS
offers methods to impose constraints while entering data into the database
and while retrieving it at a later stage. A DBMS offers many different levels
of security features, which enable multiple users to have different views
with different features. For example, a user in the Sales department cannot
see the data that belongs to the Purchase department. Additionally, it is
possible to control how much of the Sales department's data is
displayed to that user. Because data is reached only through the DBMS rather
than as plain files, it is harder for unauthorized users to read it directly.
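
As a rough illustration of atomicity, the sketch below uses Python's built-in sqlite3 module. The account table, the transfer function and the balances are assumptions invented for the example. A transfer consists of two updates; if the second one violates a constraint, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance may never become negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 30)       # succeeds: A=70, B=80
failed = transfer(conn, "A", "B", 500)  # debit would make A negative, so it is undone
final = balances(conn)
```

After the failed transfer, the credit already applied to B inside the transaction is undone too, so the database is left in its previous consistent state.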

Users
A typical DBMS has users with different rights and permissions who use
it for different purposes. Some users retrieve data and some back it up.
The users of a DBMS can be broadly categorized as follows −

• Administrators − Administrators maintain the DBMS and are responsible
for administering the database. They oversee how it is used
and by whom. They create access profiles for users and
apply limitations to maintain isolation and enforce security. Administrators
also look after DBMS resources such as the system license, required tools, and
other software and hardware related maintenance.
• Designers − Designers are the group of people who actually work on the
design of the database. They keep a close watch on what data
should be kept and in what format. They identify and design the whole set of
entities, relations, constraints, and views.
• End Users − End users are those who actually reap the benefits of having a
DBMS. End users range from simple viewers who pay attention to the
logs or market rates to sophisticated users such as business analysts.

Advantages of DBMS
A database management system has a number of advantages over the
traditional file-based processing approach. The DBA must keep
these benefits and capabilities in mind when designing databases and monitoring the
DBMS. The main advantages of a DBMS are described below.

• Controlling Data Redundancy
In non-database systems, each application program has its own private files, so
duplicate copies of the same data are created in many places. In a
DBMS, all the data of an organization is integrated into a single database. Ideally,
each data item is recorded in only one place in the database and is not duplicated.

• Sharing of Data
In a DBMS, data can be shared by the authorized users of the organization. The
database administrator manages the data and gives users the right to access
it. Many users can be authorized to access the same piece of
information simultaneously. Remote users can also share the same data.
Similarly, the data of the same database can be shared between different
application programs.

• Data Consistency
Controlling data redundancy yields data consistency. If a data
item appears only once, any update to its value has to be performed only once
and the updated value is immediately available to all users. If the DBMS has
controlled redundancy, the database system enforces consistency.

• Integration of Data
In a database management system, data is stored in tables. A single
database contains multiple tables, and relationships can be created between
tables (or associated data entities). This makes it easy to retrieve and update
data.

• Integrity Constraints
Integrity constraints or consistency rules can be applied to a database so that
only correct data can be entered. The constraints may be applied to
data items within a single record, or they may be applied to relationships between
records.

• Creating Forms
A form is an important object in a DBMS. Forms can be created easily and
quickly. Once a form is created, it can be used many times and
modified easily. Created forms are saved along with the database
and behave like a software component. A form provides an easy, user-friendly
way to enter data into the database, edit data and display data from the database.
Non-technical users can also perform various operations on the database
through forms without going into the technical details of the database.

• Report Writers
Most DBMSs provide report writer tools for creating reports.
Users can create reports easily and quickly. Once a report is created, it can be
used many times and modified easily. Created reports are also
saved along with the database and behave like a software component.

• Control Over Concurrency
In a file-based system, if two users are allowed to access data
simultaneously, it is possible that they will interfere with each other. For
example, if both users attempt to update the same record,
one may overwrite the values recorded by the other. Most database
management systems have sub-systems that control concurrency so that
transactions are recorded accurately.

• Backup and Recovery Procedures
In a file-based system, the user must regularly create backups of the data
to protect it from damage caused by failures of the computer
system or application program. This is very time consuming if the amount of
data is large. Most DBMSs provide backup and recovery sub-systems
that can automatically create backups and restore data if required.

• Data Independence
The separation of the data structure of the database from the application programs that
use the data is called data independence. In a DBMS, you can often change the
structure of the database without modifying the application programs.
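
Data independence can be sketched as follows, using Python's sqlite3 module. The employee table and the employee_names function are hypothetical and stand in for a real schema and application program. The structure of the table changes, yet the application code that reads it is left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Meera')")


def employee_names(conn):
    """'Application' code: it names the columns it needs and nothing else."""
    return [name for (name,) in conn.execute("SELECT name FROM employee ORDER BY id")]


before = employee_names(conn)

# Structural change: add a column. The application function is not edited.
conn.execute("ALTER TABLE employee ADD COLUMN designation TEXT DEFAULT 'Clerk'")
conn.execute("INSERT INTO employee (id, name, designation) VALUES (2, 'Kiran', 'Manager')")

after = employee_names(conn)
```

A program that parsed a fixed-width record file directly would typically need editing after such a change; here the query keeps working because it depends only on column names, not on the storage layout.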

Disadvantages of DBMS
The disadvantages of the database approach are summarized as follows:

1. Cost

A DBMS requires a high initial investment in hardware, software and
trained staff. A significant investment, depending on the size and functionality
of the organization, is required. The organization also has to pay recurring
annual maintenance costs.

2. Complexity

A DBMS fulfils many requirements and solves many problems related to
databases, but all this functionality makes a DBMS extremely
complex software. Developers, designers, DBAs and end users of the database
must have adequate skills to use it properly. If they do not
understand this complex system, data loss or
database failure may result.

3. Technical staff requirement

An organization has many employees, and they may
perform tasks outside their own domain, but working on a DBMS is not
easy for them. A team of technical staff who understand the DBMS is
required, and the company has to pay them good salaries.

4. Database Failure

In a DBMS, all the files are stored in a single database, so
the impact of a database failure is greater. An accidental failure of any
component may cause loss of valuable data. This is a serious concern
for large firms.

5. Extra Cost of Hardware

A DBMS requires disk storage for the data, and sometimes extra space
must be purchased to store it. A dedicated machine may also be
needed for better database performance. These machines
and storage space add to the hardware costs.

6. Size

Because of its functionality, a DBMS is large software, so it requires
a lot of disk space and memory to run efficiently. The database also
grows as data is fed into it.

7. Cost of Data Conversion

Data conversion may be required at any time, and the organization has to
undertake it. The cost of data conversion can exceed the
combined cost of the DBMS software and hardware. Trained staff are needed
to convert data to the new system. This is a key reason why many
organizations keep working on their old DBMS.

8. Currency Maintenance

As new threats emerge continually, a DBMS must be kept current
with regular updates.

9. Performance

A traditional file system can serve small organizations well, because it
has little overhead. A DBMS may perform comparatively poorly for small
firms, as its generality adds overhead.

File Organization
Related data and information is stored collectively in file formats. A file
is a sequence of records stored in binary format. A disk drive is
formatted into several blocks that can store records. File records are
mapped onto those disk blocks.

File Organization defines how file records are mapped onto disk blocks.
There are four types of File Organization for organizing file records −

Heap File Organization
When a file is created using Heap File Organization, the Operating
System allocates memory area to that file without any further
accounting details. File records can be placed anywhere in that memory
area. It is the responsibility of the software to manage the records. Heap
File does not support any ordering, sequencing, or indexing on its own.

Sequential File Organization


Every file record contains a data field (attribute) to uniquely identify
that record. In sequential file organization, records are placed in the file
in some sequential order based on the unique key field or search key.
Practically, it is not possible to store all the records sequentially in
physical form.
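
A toy model of sequential organization is sketched below. It keeps records in an in-memory Python list ordered by the search key and locates them with binary search from the standard bisect module. The SequentialFile class and the roll-number data are assumptions for illustration; a real DBMS keeps the order across disk blocks rather than in a list.

```python
import bisect


class SequentialFile:
    """Toy model: records kept in ascending order of a unique search key."""

    def __init__(self):
        self.keys = []     # sorted search keys
        self.records = []  # records, in the same order as self.keys

    def insert(self, key, record):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            raise KeyError(f"duplicate key {key}")
        # Inserting in the middle shifts later records, which is the
        # main cost of keeping a file physically ordered.
        self.keys.insert(pos, key)
        self.records.insert(pos, record)

    def find(self, key):
        pos = bisect.bisect_left(self.keys, key)  # binary search on the ordered keys
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.records[pos]
        return None


f = SequentialFile()
for roll_no, name in [(30, "Sunil"), (10, "Anita"), (20, "Farid")]:
    f.insert(roll_no, name)
```

Whatever the arrival order, the records end up stored in key order, which is what makes ordered scans and binary search possible.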

Hash File Organization


Hash File Organization uses Hash function computation on some fields
of the records. The output of the hash function determines the location
of disk block where the records are to be placed.
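
The idea can be sketched in a few lines of Python. The number of blocks, the modulo hash function and the sample records are all assumptions chosen for the example; a real DBMS uses its own hash function and handles overflow when a block fills up.

```python
NUM_BLOCKS = 4  # assumed number of disk blocks (buckets)


def block_for(key: int) -> int:
    """Hash function: maps a search-key value to a block number."""
    return key % NUM_BLOCKS


blocks = [[] for _ in range(NUM_BLOCKS)]


def insert(record):
    blocks[block_for(record["id"])].append(record)


def lookup(key):
    # Only one block is examined, regardless of how many records exist.
    for record in blocks[block_for(key)]:
        if record["id"] == key:
            return record
    return None


for rid, name in [(101, "Anita"), (102, "Farid"), (105, "Sunil")]:
    insert({"id": rid, "name": name})
```

Keys 101 and 105 hash to the same block here, which shows that several records may share a block while a lookup still needs to read only that one block.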

Clustered File Organization


In clustered file organization, related records from one or more relations
are kept in the same disk block; that is, the ordering of records is not
based on a primary key or search key. This organization is not considered
good for large databases.

Traditional File System Versus Database Systems

Conventionally, data was stored and processed using traditional file processing
systems. In these traditional file systems, each file is independent of the others, and data
in different files can be integrated only by writing an individual program for each
application. The data and the application programs that use the data are so arranged
that any change to the data requires modifying all the programs that use the data. This
is because the specific details of each file, such as data types and data sizes, are hard-coded into the programs.
Sometimes it is not even possible to identify all the programs using that data, and they are
identified on a trial-and-error basis.

A file processing system of an organization is shown in the figure below. Each functional area
in the organization creates, processes and disseminates its own files. Applications such as
inventory and payroll generate separate files and do not communicate with each other.

Such an organization was simple to operate and had better local control, but
the data of the organization is dispersed throughout the functional sub-systems. These
days, databases are preferred because of the many disadvantages of traditional file systems.

Disadvantages of Traditional File System


A traditional file system has the following disadvantages.

1) Data Redundancy: Since each application has its own data file, the same data may
have to be recorded and stored in many files. For example, the personnel file and the payroll file
both contain data on employee name, designation etc. The result is unnecessary
duplicate or redundant data items. This redundancy requires additional
storage space, costs extra time and money, and requires additional effort to keep all
files up to date.

2) Data Inconsistency: Data redundancy leads to data inconsistency, especially when
data is to be updated. Data inconsistency occurs because the same data items that appear
in more than one file do not get updated simultaneously in each and every file. For
example, when an employee is promoted from Clerk to Superintendent, the change may be
updated immediately in the payroll file but not necessarily in the provident
fund file. This results in two different designations for the same employee at the same time.
Over time, such discrepancies degrade the quality of the information contained in
the data files, which affects the accuracy of reports.

3) Lack of Data Integration: Since independent data files exist, users face difficulty in
getting information for any ad hoc query that requires accessing data stored in many
files. In such a case, complicated programs have to be developed to retrieve data from
every file, or the users have to collect the required information manually.

4) Program Dependence: The reports produced by the file processing system are
program dependent, which means that if any change is made to the format or structure of the data and
records in a file, the programs have to be modified correspondingly. Also, a
new program has to be developed to produce a new report.

5) Data Dependence: The applications/programs in a file processing system are data
dependent, i.e., the file organization, its physical location and retrieval from the storage
media are dictated by the requirements of the particular application. For example, in a
payroll application, the file may be organised as employee records sorted on last
name, which implies that any employee's record has to be accessed through the last
name only.

6) Limited Data Sharing: There are limited data sharing possibilities with the
traditional file system. Each application has its own private files, and users have little
opportunity to share data outside their own applications. Complex programs have to
be written to obtain data from several incompatible files.

7) Poor Data Control: There is no centralised control at the data element level,
so a traditional file system is decentralised in nature. The same
data field may be given different names by different departments of an
organization, depending on the file it is in. This leads to a data field having different
meanings in different contexts, or to different fields having the same meaning. This
causes poor data control.

8) Problem of Security: It is very difficult to enforce security checks and access rights
in a traditional file system, since application programs are added in an ad hoc manner.

9) Data Manipulation Capability is Inadequate: The data manipulation capability is
very limited in traditional file systems, since they do not provide strong relationships
between data in different files.

10) Needs Excessive Programming: An excessive programming effort is needed to
develop a new application program, due to the very high interdependence between program
and data in a file system. Each new application requires the developers to start from
scratch by designing new file formats and descriptions and then writing the file access
logic for each new file.
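
The link between redundancy and inconsistency described in points 1 and 2 can be shown with a small Python sketch. The two dictionaries stand in for the payroll file and the provident fund file; the employee identifier and name are invented for the example.

```python
# Two independent "files", each holding its own copy of the designation.
payroll_file = {"E1": {"name": "R. Sharma", "designation": "Clerk"}}
provident_fund_file = {"E1": {"name": "R. Sharma", "designation": "Clerk"}}


def promote_in_payroll(emp_id, new_designation):
    """The payroll program knows nothing about the provident fund file."""
    payroll_file[emp_id]["designation"] = new_designation


def inconsistent_employees():
    """Report employees whose designation differs between the two files."""
    return [
        emp_id
        for emp_id in payroll_file
        if emp_id in provident_fund_file
        and payroll_file[emp_id]["designation"]
        != provident_fund_file[emp_id]["designation"]
    ]


before = inconsistent_employees()          # both copies agree
promote_in_payroll("E1", "Superintendent")
after = inconsistent_employees()           # the second copy is now stale
```

Because the designation is stored twice and only one program is aware of the promotion, the two files disagree until someone updates the second copy by hand.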

Database Systems or Database System Environment


The DBMS software together with the database is called a database system. In other
words, it can be defined as an organization of components that define and regulate the
collection, storage, management and use of data in a database. Furthermore, it is a
system whose overall purpose is to record and maintain information. A database system
consists of four major components:

Data: All the data in the system is stored in a single database. The data in the
database is both shared and integrated. Sharing of data means that individual pieces of
data in the database are shared among different users, and every user can access the
same piece of data, possibly for different purposes. Integration of data means that the
database can be thought of as a unification of several distinct files, with redundancy
among the files controlled.

Hardware: The hardware consists of the secondary storage devices, such as disks and drums,
on which the database resides, together with other devices. There are two types of
hardware. The first is the processor and main memory, which support running the
DBMS. The second is the secondary storage devices, such as hard disks and magnetic disks,
which hold the stored data.

Software: A layer or interface of software exists between the physical database and the
users. This layer is called the DBMS. All requests from users to access the database
are handled by the DBMS. Thus, the DBMS shields database users from hardware
details. Furthermore, the DBMS provides other facilities, such as accessing and
updating the data in the files and adding and deleting the files themselves.

Users: The users are the people who interact with the database system in any way. There
are four types of users interacting with database systems: application
programmers, online users, end users (or naive users) and the Database
Administrator (DBA).

Advantages of Database Systems (DBMS's)

The Database Systems provide the following advantages over the traditional file system.

1) Controlled redundancy: In a traditional file system, each application program has
its own data, which causes duplication of common data items in more than one file. This
duplication (redundancy) requires multiple updates for a single transaction and wastes
a lot of storage space. Not all redundancy can be eliminated, for technical reasons, but
in a database this duplication can be carefully controlled: the database
system is aware of the redundancy and assumes responsibility for propagating
updates.

2) Data consistency: The problem of updating multiple files in a traditional file system
leads to inaccurate data, as different files may contain different information about the same
data item at a given point in time. This gives incorrect or contradictory information to
its users. In database systems, this problem of inconsistent data is largely solved
by controlling the redundancy.

3) Program data independence: Traditional file systems are generally data
dependent, which implies that the data organization and access strategies are dictated
by the needs of the specific application, and the application programs are developed
accordingly. Database systems, however, provide independence between the stored data
and the application programs, which allows changes at one level of the data without
affecting the others. This property allows the data organization to be changed without
changing the application programs that process the data.

4) Sharing of data: In database systems, the data is centrally controlled and can be
shared by all authorized users. Sharing of data means not only that existing
application programs can share the data in the database, but also that new application
programs can be developed to operate on the existing data. Furthermore, the
requirements of new application programs may be satisfied without creating any
new file.

5) Enforcement of standards: In database systems, since data is stored in one central
place, standards can easily be enforced by the DBA. This ensures standardised data
formats, which facilitate data transfer between systems. Applicable standards might include
any or all of the following: departmental, installation, organizational, industry,
corporate, national or international.

6) Improved data integrity: Data integrity means that the data contained in the
database is both accurate and consistent. Centralized control allows
adequate checks to be incorporated to provide data integrity. One integrity check that
should be incorporated in the database is to ensure that if there is a reference to a certain
object, that object must exist.

7) Improved security: Database security means protecting the data contained in the
database from unauthorised users. The DBA ensures that proper access procedures are
followed, including proper authentication schemes for access to the DBMS and additional
checks before permitting access to sensitive data. The level of security can be
different for various types of data and operations.

8) Data access is efficient: The database system uses a variety of sophisticated
techniques to access the stored data efficiently.

9) Conflicting requirements can be balanced: Knowing the overall requirements of
the organization, the DBA resolves the conflicting
requirements of various users and applications. The DBA can structure the system to
provide an overall service that is best for the organization.

10) Improved backup and recovery facility: Through its backup and recovery
subsystem, the database system provides facilities for recovering from hardware or
software failures. In case of a system crash, the recovery subsystem ensures that the
database is restored to the state it was in before the program started executing.

11) Minimal program maintenance: In a traditional file system, the application
programs, with their descriptions of the data and the logic for accessing it, are built
individually. Thus, changes to the data formats or access methods result in the need to
modify the application programs, so a high maintenance effort is required.
This effort is greatly reduced in database systems due to the independence of data and
application programs.

12) Data quality is high: The quality of data in database systems is generally higher
than in traditional file systems. This is possible due to the tools and
processes, such as integrity checks, that the database system provides.

13) Good data accessibility and responsiveness: Database systems provide
query languages or report writers that allow users to ask ad hoc queries and obtain
the needed information immediately, without having to write application
programs (as in a file system) to access the information in the database. This
is possible due to the integration of data in database systems.

14) Concurrency control: Database systems are designed to manage simultaneous
(concurrent) access to the database by many users. They also prevent any loss of
information or loss of integrity due to these concurrent accesses.

15) Economical to scale: In database systems, the operational data of an organization
is stored in a central database. The application programs that work on this data can be
built at much lower cost than in a traditional file system. This reduces the overall
costs of operating and managing the database, which makes scaling economical.

16) Increased programmer productivity: The database system provides many
standard functions that the programmer would otherwise have to write in a file system.
The availability of these functions allows programmers to concentrate on the specific
functionality required by the users without worrying about implementation details.
This increases the overall productivity of the programmer and also reduces
development time and cost.
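
The integrity check mentioned in point 6 (a referenced object must exist) corresponds to a foreign key constraint in SQL. The sketch below uses Python's sqlite3 module; the department and employee tables are hypothetical. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`, whereas many other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Meera', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Kiran', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The check lives in the database definition, so every application that inserts employees is subject to it without repeating the validation in its own code.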

Disadvantages of Database Systems


In contrast to many advantages of the database systems, there are some disadvantages
as well. The disadvantages of a database system are as follows:

1) Complexity increases: The data structure may become more complex because
the centralised database supports many applications in an organization. This may lead
to difficulties in its management and may require professionals to manage it.

2) Requirement of more disk space: The wide functionality and greater complexity
increase the size of the DBMS. Thus, it requires much more space to store and run than a
traditional file system.

3) Additional cost of hardware: The cost of installing a database system is much
higher. It depends on the environment and functionality, the size of the hardware and
the maintenance costs of the hardware.

4) Cost of conversion: The cost of conversion from an old file system to a new database
system is very high. In some cases the cost of conversion is so high that the cost of the
DBMS and extra hardware becomes insignificant by comparison. It also includes the cost of training
staff and hiring the specialized staff needed to convert and run the system.

5) Need of additional and specialized manpower: Any organization that has database
systems needs to hire and train staff on a regular basis to design and
implement databases and to provide database administration services.

6) Need for backup and recovery: For a database system to be accurate and available
at all times, a procedure has to be developed and used for making backup copies
and restoring them when damage occurs.

7) Organizational conflict: A centralised and shared database system requires
consensus on data definitions and ownership, as well as on responsibilities for accurate
data maintenance.

8) More installation and management cost: Large, full-featured database
systems are more costly. They require trained staff to operate the system and have
additional annual maintenance and support costs.

Database Schema
A database schema is the skeleton structure that represents the logical view of
the entire database. It defines how the data is organized and how the relations
among them are associated. It formulates all the constraints that are to be
applied on the data.
A database schema defines its entities and the relationship among them. It
contains a descriptive detail of the database, which can be depicted by means of
schema diagrams. It’s the database designers who design the schema to help
programmers understand the database and make it useful.

The description of a database is called the database schema, which is specified
during database design and is not expected to change frequently. A displayed
schema is called a schema diagram. We call each object in the schema, such as
STUDENT or COURSE, a schema construct. A schema diagram displays only some
aspects of a schema, such as the names of record types and data items, and some
types of constraints.
A database schema can be divided broadly into two categories:

Physical Database Schema: This schema pertains to the actual storage
of data and its form of storage, such as files and indices. It defines how the
data will be stored in secondary storage.
Logical Database Schema: This schema defines all the logical
constraints that need to be applied to the stored data. It defines tables,
views, and integrity constraints.
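
A logical schema of this kind can be written down as SQL definitions. The sketch below uses Python's sqlite3 module; the student and course tables loosely follow the STUDENT and COURSE constructs mentioned above, but their columns, the constraint and the view are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age > 0)
    );
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL
    );
    CREATE VIEW student_names AS SELECT roll_no, name FROM student;
""")

# The catalog (metadata) describes the schema; no user data has been stored yet.
schema_objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
student_rows = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

The schema exists and can be inspected even though the tables are empty, which matches the point that a schema describes structure and constraints rather than data.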

Database Instance
It is important to distinguish between a database schema and a database
instance. The database schema is the skeleton of the database. It is designed
before the database exists, and once the database is operational it is very
difficult to change. A database schema does not contain any data or information.
A database instance is the state of an operational database, with its data, at
any given time. It contains a snapshot of the database. Database instances tend
to change with time. A DBMS ensures that every instance (state) is valid by
diligently enforcing all the validations, constraints, and conditions that the
database designers have imposed.
The data in the database at a particular moment in time is called a database
state or snapshot. It is also called the current set of occurrences or
instances in the database.
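The difference between a schema and an instance can be sketched with Python's
built-in sqlite3 module and an in-memory database. The student table and its
columns are illustrative assumptions, not part of these notes.

```python
import sqlite3

# In-memory database; the student table below is a hypothetical example.
conn = sqlite3.connect(":memory:")

# Schema: the description of the data, fixed at design time.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def snapshot(connection):
    """Return the current database state (instance) of the student table."""
    return connection.execute(
        "SELECT roll_no, name FROM student ORDER BY roll_no"
    ).fetchall()

state_0 = snapshot(conn)   # empty state: the schema exists, but there is no data yet
conn.execute("INSERT INTO student VALUES (2, 'Komal')")
state_1 = snapshot(conn)   # a new instance after one insertion
conn.execute("INSERT INTO student VALUES (3, 'Ram')")
state_2 = snapshot(conn)   # the instance changes again

# The stored schema text is the same for all three states.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
).fetchone()[0]
```

Each call to snapshot returns a different database state, while schema_sql
stays the same throughout.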
Three Level Architecture of DBMS
Following are the three levels of database architecture,

1. Physical Level
2. Conceptual Level
3. External Level

In the above diagram,

 It shows the architecture of DBMS.


 Mapping is the process of transforming requests and responses between the
various levels of the database architecture.
 Mapping adds processing time, so it is less suitable for a small database.
 In External / Conceptual mapping, the DBMS transforms a request on an
external schema into a request against the conceptual schema.
 In Conceptual / Internal mapping, it is necessary to transform the request
from the conceptual to the internal level.
1. Physical Level
 Physical level describes the physical storage structure of data in database.
 It is also known as Internal Level.
 This level is very close to physical storage of data.
 At lowest level, it is stored in the form of bits with the physical addresses
on the secondary storage device.
 At highest level, it can be viewed in the form of files.
 The internal schema defines the various stored data types. It uses a
physical data model.
2. Conceptual Level
 Conceptual level describes the structure of the whole database for a group
of users.
 It is also called the data model.
 The conceptual schema is a representation of the entire content of the
database.
 This schema contains all the information needed to build the relevant external
records.
 It hides the internal details of physical storage.
3. External Level
 External level is related to the data which is viewed by individual end users.
 This level includes a number of user views or external schemas.
 This level is closest to the user.
 External view describes the segment of the database that is required for a
particular user group and hides the rest of the database from that user
group.
Data Independence

A database system normally contains a lot of data in addition to users’ data. For
example, it stores data about data, known as metadata, to locate and retrieve
data easily. It is rather difficult to modify or update a set of metadata once it is
stored in the database. But as a DBMS expands, it needs to change over time to
satisfy the requirements of the users. If all the data were interdependent, every
such change would become a tedious and highly complex job. Metadata itself
follows a layered architecture, so that when we change data at one layer, it does
not affect the data at another level. The data at each layer is independent of,
but mapped to, the data at the other layers.
The concept of data independence can be defined as the capacity to
change the schema at one level of a database system without having to
change the schema at the next higher level.
We can define two types of data independence:

1. Logical data independence is the capacity to change the conceptual schema
without having to change external schemas or application programs. We
may change the conceptual schema to expand the database (by adding a
record type or data item), to change constraints, or to reduce the database
(by removing a record type or data item).
2. Physical data independence is the capacity to change the internal schema
without having to change the conceptual schema. Hence, the external
schemas need not be changed as well. Changes to the internal schema may be
needed because some physical files were reorganized—for example, by creating
additional access structures—to improve the performance of retrieval or
update. If the same data as before remains in the database, we should not
have to change the conceptual schema.
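
Both kinds of data independence can be illustrated with a small sketch using
Python's built-in sqlite3 module. The employee table, the employee_directory
view and the index name are assumptions made for illustration. The view plays
the role of an external schema, the table the conceptual schema, and the index
a change to the internal schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema (hypothetical): one base table.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000.0)")

# External schema: a view that exposes only what one user group needs.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")

def directory(connection):
    """A query written against the external schema only."""
    return connection.execute("SELECT emp_id, name FROM employee_directory").fetchall()

before = directory(conn)

# Logical data independence: expand the conceptual schema with a new data item.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after_logical_change = directory(conn)

# Physical data independence: add an access structure (an index).
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after_physical_change = directory(conn)
```

The query in directory is never rewritten, yet it returns the same rows before
and after both changes.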

Database Administrator
One of the main reasons for using DBMSs is to have central control of both the
data and the programs that access those data. A person who has such central
control over the system is called a database administrator (DBA). The functions
of a DBA include:
• Schema definition. The DBA creates the original database schema by executing
a set of data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes
to the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the
database various users can access. The authorization information is kept in a
special system structure that the database system consults whenever someone
attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine
maintenance activities are:
◦ Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding.
◦ Ensuring that enough free disk space is available for normal operations,
and upgrading disk space as required.
◦ Monitoring jobs running on the database and ensuring that performance
is not degraded by very expensive tasks submitted by some users.

Entity
The basic object that the ER model represents is an entity, which is a thing in
the real world with an independent existence. An entity may be an object with a
physical existence (for example, a particular person, car, house, or employee) or
it may be an object with a conceptual existence (for instance, a company, a job,
or a university course).

Attributes
Entities are represented by means of their properties called
attributes. All attributes have values. For example, a student entity
may have name, class, and age as attributes. There exists a domain
or range of values that can be assigned to attributes. For example,
a student's name cannot be a numeric value. It has to be
alphabetic. A student's age cannot be negative, etc.
Types of Attributes
1. Simple attribute: Simple attributes are atomic values, which
cannot be divided further. For example, a student's phone
number is an atomic value of 10 digits.
2. Composite attribute: Composite attributes are made of more
than one simple attribute. For example, a student's complete
name may have first_name and last_name.
3. Derived attribute: Derived attributes are the attributes that
do not exist in the physical database, but their values are
derived from other attributes present in the database. For
example, average_salary in a department should not be saved
directly in the database, instead it can be derived. For another
example, age can be derived from date_of_birth.
4. Single-value attribute: Single-value attributes contain a
single value. For example: Social_Security_Number.
5. Multi-value attribute: Multi-value attributes may contain
more than one value. For example, a person can have more
than one phone number, email_address, etc.
These attribute types can come together in a way like:
 simple single-valued attributes
 simple multi-valued attributes
 composite single-valued attributes
 composite multi-valued attributes
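
As a rough sketch, the attribute types above can be modelled with Python
dataclasses. The class and field names (FullName, Student, phone_numbers) and
the sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class FullName:
    """Composite attribute: made of the simple attributes first_name and last_name."""
    first_name: str
    last_name: str

@dataclass
class Student:
    roll_no: int                 # simple, single-valued attribute
    name: FullName               # composite attribute
    date_of_birth: date          # simple attribute that is actually stored
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth instead of being stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

student = Student(
    1, FullName("Komal", "Sharma"), date(2000, 5, 20), ["9000000001", "9000000002"]
)
age_day_before_birthday = student.age(date(2024, 5, 19))
age_on_birthday = student.age(date(2024, 5, 20))
```

Because age is computed on demand, it can never disagree with the stored
date_of_birth.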

Entity Types and Entity Sets:-

A database usually contains groups of entities that are similar. For example, a
company employing hundreds of employees may want to store similar information
concerning each of the employees. These employee entities share the same
attributes, but each entity has its own value(s) for each attribute. An entity
type defines a collection (or set) of entities that have the same attributes.
Each entity type in the database is described by its name and attributes.

The collection of all entities of a particular entity type in the database at
any point in time is called an entity set; the entity set is usually referred to
using the same name as the entity type. For example, EMPLOYEE refers to both a
type of entity as well as the current set of all employee entities in the
database. An entity type describes the schema or intension for a set of entities
that share the same structure. The collection of entities of a particular entity
type is grouped into an entity set, which is also called the extension of the
entity type.

Key Attributes of an Entity Type. An important constraint on the entities of an
entity type is the key or uniqueness constraint on attributes. An entity type
usually has one or more attributes whose values are distinct for each individual
entity in the entity set. Such an attribute is called a key attribute, and its
values can be used to identify each entity uniquely. For example, the Name
attribute is a key of the COMPANY entity type because no two companies are
allowed to have the same name. For the PERSON entity type, a typical key
attribute is Ssn (Social Security number). An entity type may also have no key,
in which case it is called a weak entity type.

Value Sets (Domains) of Attributes. Each simple attribute of an entity type is
associated with a value set (or domain of values), which specifies the set of
values that may be assigned to that attribute for each individual entity.

Key Attribute: represents the primary key (the main characteristic of an entity).
It is an attribute that has a distinct value for each entity in an entity set.
For example, Roll number in a Student entity type.

Relationship
The association among entities is called a relationship. For example, an
employee works_at a department, a student enrolls in a course. Here,
Works_at and Enrolls are called relationships.

Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a
relationship too can have attributes. These attributes are called descriptive
attributes.
A relationship type R among n entity types E1, E2, ..., En defines a set of
associations (a relationship set) among entities from these entity types. As for
the case of entity types and entity sets, a relationship type and its
corresponding relationship set are customarily referred to by the same name, R.

Degree of Relationship
The number of participating entity types in a relationship type defines the
degree of the relationship. For example, the WORKS_FOR relationship between
EMPLOYEE and DEPARTMENT is of degree two.
A relationship of degree two is called binary, and one of degree three is called
ternary. An example of a ternary relationship is SUPPLY.

Role Names and Recursive Relationships. Each entity type that participates
in a relationship type plays a particular role in the relationship. The role name
signifies the role that a participating entity from the entity type plays in each
relationship instance, and helps to explain what the relationship means. For
example, in the WORKS_FOR relationship type, EMPLOYEE plays the role of employee
or worker and DEPARTMENT plays the role of department or employer.
Role names are not technically necessary in relationship types where all the
participating entity types are distinct, since each participating entity type
name can be used as the role name. However, in some cases the same entity type
participates more than once in a relationship type in different roles. In such
cases the role name becomes essential for distinguishing the meaning of the role
that each participating entity plays. Such relationship types are called
recursive relationships.

Attributes of Relationship Types


Relationship types can also have attributes, similar to those of entity types. For
example, to record the number of hours per week that an employee works on a
particular project, we can include an attribute Hours for the WORKS_ON
relationship type. Another example is to include the date on which a manager
started managing a department via an attribute Start_date for the MANAGES
relationship type.

Mapping Cardinalities

Cardinality defines the number of entities in one entity set that can be
associated with entities of the other entity set via a relationship set.

One-to-one: One entity from entity set A can be associated with at most
one entity of entity set B and vice versa.

One-to-many: One entity from entity set A can be associated with more
than one entity of entity set B; however, an entity from entity set B can
be associated with at most one entity of entity set A.

Many-to-one: More than one entity from entity set A can be associated
with the same entity of entity set B; however, an entity from entity set
A can be associated with at most one entity of entity set B.

Many-to-many: One entity from A can be associated with more than one
entity from B and vice versa.
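
The four cases can be told apart mechanically. The sketch below classifies a
relationship set given as a list of (a, b) pairs; the function name and the
sample identifiers are assumptions. Note that it only reports the tightest
cardinality consistent with the pairs it is given, whereas a mapping
cardinality is a design-time constraint on every possible state.

```python
from collections import defaultdict

def mapping_cardinality(pairs):
    """Classify a relationship set, given as (a, b) pairs, by its mapping cardinality."""
    b_per_a = defaultdict(set)   # entities of B linked to each entity of A
    a_per_b = defaultdict(set)   # entities of A linked to each entity of B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"
    if b_has_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical relationship sets.
manages = [("e1", "d1"), ("e2", "d2")]
employs = [("d1", "e1"), ("d1", "e2")]
works_for = [("e1", "d1"), ("e2", "d1")]
enrolls = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]
```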

ER DIAGRAM REPRESENTATION
An E-R diagram consists of the following major components:
• Rectangles divided into two parts represent entity sets. The first part
contains the name of the entity set. The second part contains the names of all
the attributes of the entity set.
• Diamonds represent relationship sets.
• Undivided rectangles represent the attributes of a relationship set. Attributes
that are part of the primary key are underlined.
• Lines link entity sets to relationship sets.
• Dashed lines link attributes of a relationship set to the relationship set.
• Double lines indicate total participation of an entity in a relationship set.
• Double diamonds represent identifying relationship sets linked to weak
entity sets.
Relationship
Relationships are represented by a diamond-shaped box. The name of the
relationship is written inside the diamond. All the entities (rectangles)
participating in a relationship are connected to it by a line.

Binary Relationship and Cardinality

A relationship in which two entities participate is called a binary
relationship. Cardinality is the number of instances of an entity that can be
associated with the relationship.

One-to-one: When only one instance of an entity is associated with the
relationship, it is marked as '1:1'. The following image reflects that only
one instance of each entity should be associated with the relationship. It
depicts a one-to-one relationship.

One-to-many: When more than one instance of an entity is associated
with a relationship, it is marked as '1:N'. The following image reflects that
only one instance of the entity on the left and more than one instance of the
entity on the right can be associated with the relationship. It depicts a
one-to-many relationship.

Many-to-one: When more than one instance of an entity is associated with
the relationship, it is marked as 'N:1'. The following image reflects that
more than one instance of the entity on the left and only one instance of
the entity on the right can be associated with the relationship. It depicts
a many-to-one relationship.

Many-to-many: The following image reflects that more than one instance
of the entity on the left and more than one instance of the entity on the
right can be associated with the relationship. It depicts a many-to-many
relationship.

Participation Constraints
Total Participation: Each entity is involved in the relationship. Total
participation is represented by double lines.
Partial participation: Not all entities are involved in the relationship.
Partial participation is represented by single lines.

Steps to form E-R diagram:-
1. Identify the entities.
2. Identify the relationships among the entities.
3. Find the key attributes.
4. Find the remaining attributes.
5. Draw the complete E-R diagram.
6. Review your result.

DATA MODELS
Data models define how the logical structure of a database is modeled. Data
models are fundamental to introducing abstraction in a DBMS. They define how
data items are connected to each other and how they are processed and stored
inside the system.
The very first data models were flat data models, in which all the data was kept
in the same plane. Earlier data models were not so scientific, hence they were
prone to introducing lots of duplication and update anomalies.

Historically, in database design, three models are commonly used. They are,

 Hierarchical Model

 Network Model

 Relational Model

Relational model
The most common model, the relational model sorts data
into tables, also known as relations, each of which consists
of columns and rows. Each column lists an attribute of the
entity in question, such as price, zip code, or birth date.
The set of values that an attribute is permitted to take is
called its domain. A particular attribute or combination of
attributes is chosen as a primary key; when another table
refers to it, the referring attribute is called a foreign key.

Each row, also called a tuple, includes data about a specific
instance of the entity in question, such as a particular
employee.

The model also accounts for the types of relationships
between those tables, including one-to-one, one-to-many,
and many-to-many relationships.

Within the database, tables can be normalized, or brought to
comply with normalization rules that make the database
flexible, adaptable, and scalable. When normalized, each
piece of data is atomic, or broken into the smallest useful
pieces.

Relational databases are typically defined and queried using
Structured Query Language (SQL). The model was introduced
by E.F. Codd in 1970.

Hierarchical model
The hierarchical model organizes data into a tree-like
structure, where each record has a single parent or root.
Sibling records are sorted in a particular order. That order
is used as the physical order for storing the database. This
model is good for describing many real-world relationships.
This model was primarily used by IBM’s Information
Management System (IMS) in the 60s and 70s, but it is rarely
seen today due to certain operational inefficiencies.

Network model
The network model builds on the hierarchical model by
allowing many-to-many relationships between linked
records, implying multiple parent records. Based on
mathematical set theory, the model is constructed with sets
of related records. Each set consists of one owner or parent
record and one or more member or child records. A record
can be a member or child in multiple sets, allowing this
model to convey complex relationships.

It was most popular in the 70s after it was formally defined
by the Conference on Data Systems Languages (CODASYL).
Comparison between hierarchical model, network model and relational model
When we compare the hierarchical, network and relational data models, we can
identify a number of differences in terms of data structures, data manipulation
and data integrity.

Characteristic | Hierarchical model | Network model | Relational model
Data structure | One to many or one to one relationships | Allowed the network model to support many to many relationships | One to one, one to many, many to many relationships
Data structure | Based on parent child relationship | A record can have many parents as well as many children. | Based on relational data structures
Data manipulation | Does not provide an independent stand alone query interface | CODASYL (Conference on Data Systems Languages) | Relational databases are what brings many sources into a common query (such as SQL)
Data manipulation | Retrieve algorithms are complex and asymmetric | Retrieve algorithms are complex and symmetric | Retrieve algorithms are simple and symmetric
Data integrity | Cannot insert the information of a child who does not have any parent. | Does not suffer from any insertion anomaly. | Does not suffer from any insert anomaly.
Data integrity | Multiple occurrences of child records which lead to problems of inconsistency during the update operation | Free from update anomalies. | Free from update anomalies
Data integrity | Deletion of parent results in deletion of child records | Free from delete anomalies | Free from delete anomalies
UNIT-2
RELATIONAL DATA MODEL
Relational data model is the primary data model, which is used widely around
the world for data storage and processing. This model is simple and it has all
the properties and capabilities required to process data with storage efficiency.

The relational model is very simple and elegant; a database is a collection of one
or more relations, where each relation is a table with rows and columns. This
simple tabular representation enables even novice users to understand the
contents of a database and it permits the use of simple, high-level languages to
query the data. The major advantages of the relational model over the older
data models are its simple data representation and the ease with which even
complex queries can be expressed.

The relational model represents the database as a collection of relations (or
tables). Informally, each relation resembles a table of values or, to some extent,
a “flat” file of records. One of the main advantages of the relational model is
that it is conceptually simple and, more importantly, it frees users from the
details of storage and access methods.

Attributes

In relational model terminology all the column headers are called
attributes. Consider a table STUDENT. In this table there are three
column headers, which means this table has three attributes: RollNo, Name,
Address.
RollNo  Name   Address
2       Komal  Delhi

Each attribute Ai is the name of a role played by some domain D in the relation
schema R.
Domain

“The set of permitted values for each attribute is called domain”
or “A domain is referred to in a relation schema by the attribute
name and has a set of associated values”. A domain D is a set of
atomic values. By atomic we mean that each value in the domain is
indivisible as far as the relational model is concerned. “The data type
describing the types of values that can appear in each column is
represented by a domain of possible values.” For example,
Set_phone_number can be declared as a set of character strings. The data
type for Employee_ages is an integer number between 15 and 80. For
academic_department_names, the data type is the set of all character
strings that represent valid department names. A domain is thus given a
name, data type, and format.
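
A domain can be thought of as a membership test on values. The sketch below
expresses two of the domains mentioned above as Python predicates. The
function names, the attribute names in DOMAINS and the 10-digit phone format
are assumptions made for illustration.

```python
def in_employee_ages(value):
    """Domain Employee_ages: integers between 15 and 80, as described in the text."""
    return isinstance(value, int) and not isinstance(value, bool) and 15 <= value <= 80

def in_phone_numbers(value):
    """Phone numbers as character strings; the 10-digit format is an assumption."""
    return isinstance(value, str) and len(value) == 10 and value.isdigit()

# Hypothetical mapping from attribute name to its domain check.
DOMAINS = {"Age": in_employee_ages, "HomePhone": in_phone_numbers}

def violations(row):
    """Return the attribute names whose values fall outside their domain."""
    return [attr for attr, check in DOMAINS.items() if not check(row[attr])]

valid_row = {"Age": 23, "HomePhone": "2134234432"}
invalid_row = {"Age": 12, "HomePhone": "21342"}
```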

Tuples / Records
A single row of a table, which contains a single record for that relation, is
called a tuple.
In relational model terminology all the rows are called tuples or records
in the relation. Consider a table STUDENT. If this table has six
rows, it means there are six tuples or records in the table.

Relation instance: A finite set of tuples in the relational database system
represents a relation instance. Relation instances do not have duplicate tuples.

Relation schema
“The relation schema describes the column headers for the table
or relation”. A relation schema R denoted by R (A1, A2, A3…An), is
made up of a relation name R and a list of attributes A1, A2, A3… An.
Each attribute Aj, is the name of role played by some domain D in the
relation schema R. D is called domain of Aj and is denoted by dom (Aj).
A relation schema is used to describe a relation R, and R is called the
name of this relation.

Relation
“A relation is defined as a set of tuples”.

The main construct for representing data in the relational model is a
relation. “A relation consists of a relation schema and a relation
instance. The relation instance is a table, and the relation schema
describes the column heads for the table”. A relation (or relation
state) r of the relation schema R (A1, A2, A3…An) is a set of n-tuples
r = {t1, t2, t3, ..., tm}, which is denoted by r (R). Each tuple t is an
ordered list of n values t = <v1, v2, ..., vn>, where each value vi
(1 ≤ i ≤ n) is an element of dom (Ai) or is a special null value.
Name      RollNo  HomePhone   Address  OfficePhone  Age  GPA
Ram       3       2134234432  Delhi    Null         23   3.25
Rajesh    5       2342345433  Bombay   Null         28   3.21
Ramesh    6       4564576657  Chennai  Null         18   2.89
Rajneesh  2       7686786799  U.P.     345434535    25   3.25
STUDENT
In the above relation, the column headers are the attributes, the rows are
the tuples, STUDENT is the name of the relation, and the schema of
this table is the relation schema.
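
The STUDENT relation above can be sketched in Python as a set of named tuples.
This is only an illustration of "a relation is a set of tuples": using None
for the null value and storing phone numbers as strings are assumptions.

```python
from collections import namedtuple

# Relation schema STUDENT(Name, RollNo, HomePhone, Address, OfficePhone, Age, GPA).
Student = namedtuple(
    "Student", ["Name", "RollNo", "HomePhone", "Address", "OfficePhone", "Age", "GPA"]
)

# Relation state r(STUDENT): a set of tuples; None stands in for the null value.
student_relation = {
    Student("Ram", 3, "2134234432", "Delhi", None, 23, 3.25),
    Student("Rajesh", 5, "2342345433", "Bombay", None, 28, 3.21),
    Student("Ramesh", 6, "4564576657", "Chennai", None, 18, 2.89),
    Student("Rajneesh", 2, "7686786799", "U.P.", "345434535", 25, 3.25),
}

# Adding a tuple that is already present leaves the set unchanged: no duplicates.
student_relation.add(Student("Ram", 3, "2134234432", "Delhi", None, 23, 3.25))

attribute_count = len(Student._fields)   # n, the number of attributes
tuple_count = len(student_relation)      # m, the number of tuples in this state
```

A Python set has no inherent order, which mirrors the fact that the tuples of
a relation are not ordered.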

Characteristics of relations

1.Ordering of tuples in a relation

A relation is defined as a set of tuples. Tuples in a relation do not
have any particular order. Tuple ordering is not a part of the relation
definition, because a relation attempts to represent facts at a
logical or abstract level. For example, tuples in the STUDENT
relation could be logically ordered by name, roll no, address, and
age or by some other attribute.

2.Ordering of values within a Tuple

According to the preceding definition of a relation, the ordering of
values in a tuple is important. However, at a logical level, the order of
attributes and their values is not that important as long as the
correspondence between attributes and values is maintained.

3.Values and Nulls in the tuples

Each value in a tuple is an atomic value. It means it is not divisible into
components within the framework of the basic relational model. Hence,
composite and multivalued attributes are not allowed. This model is
sometimes called the flat relational model. An important concept is that of
nulls, which are used to represent the values of attributes that may be
unknown or may not apply to a tuple. A special value, called null, is used
for these cases.

4. Interpretation (Meaning) of a Relation

The relation schema can be interpreted as a declaration or as a type of
assertion. For example, the schema of the STUDENT relation as given
below asserts that a student entity has a Name, RollNo, HomePhone,
Address, OfficePhone, Age, and GPA. Each tuple in the relation can be
interpreted as a fact or a particular instance of the assertion. For
example, in the following figure the first tuple asserts the fact that there
is a student whose Name is Ram, RollNo is 3, Age is 23 and so
on. An alternative interpretation of a relation schema is as a predicate; in
this case the values in each tuple are interpreted as values that satisfy
the predicate.

Name      RollNo  HomePhone   Address  OfficePhone  Age  GPA
Ram       3       2134234432  Delhi    Null         23   3.25
Rajesh    5       2342345433  Bombay   Null         28   3.21
Ramesh    6       4564576657  Chennai  Null         18   2.89
Rajneesh  2       7686786799  U.P.     345434535    25   3.25

Keys
Any attribute in a table that uniquely identifies each record in the
table is called a key. It can be a single attribute or a combination of
attributes. For example, in the STUDENT table, STUDENT_ID is a key,
since it is unique for each student. In a PERSON table, passport
number, driving license number, phone number, SSN and email address are
keys, since they are unique for each person.

Keys are a very important part of a relational database. They are used to
establish and identify relationships between tables. They also ensure that
each record within a table can be uniquely identified by a combination of
one or more fields within the table.

Primary Key

It is the first and foremost key which is used to uniquely identify a
record. It can be a single attribute or a combination of attributes. For
an entity, there could be multiple keys, as we saw in the PERSON table.
The most suitable key from that list becomes the primary key. In the
PERSON table above, we can select SSN as the primary key, since it is
unique for each person. We can even select passport number or
license number as the primary key, as they are also unique for a person.
However, the selection of a primary key for each entity is based on
the requirements and the developer.
For a student, STUDENT_ID is a primary key and for an employee
EMPLOYEE_ID is a primary key.

A primary key does not accept null values. It is a key by which all
the tuples can be identified uniquely, and it prevents duplicate
rows in a relation.
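
The two properties of a primary key, uniqueness and no nulls, can be checked
with a small sketch using Python's built-in sqlite3 module. The table, the
sample values and the helper try_insert are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# STUDENT_ID is the primary key. NOT NULL is written out explicitly because
# SQLite does not imply it for non-integer primary keys in ordinary tables.
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO student VALUES ('S100', 'Ram')")

def try_insert(student_id, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert("S100", "Rajesh")   # same key as an existing row
null_accepted = try_insert(None, "Ramesh")          # null in the primary key
new_key_accepted = try_insert("S101", "Rajesh")     # a fresh key value

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Only the row with a new, non-null key value is accepted.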

Candidate Key
Candidate keys are defined as the set of fields from which the
primary key can be selected. A candidate key is an attribute or set of
attributes that can act as a primary key for a table to uniquely identify
each record in that table.

As we discussed above, an employee is identified by his ID in his office.
Apart from his ID, does he have any other unique keys, so that he can be
identified from others? Yes, he has passport number, PAN number, SSN
number (if applicable), driving license number, email address etc. These
also identify a specific person uniquely, and we can choose any one of
these unique attributes as the primary key of the table. All of them,
including the one chosen, are candidate keys; those not chosen as the
primary key are also called secondary keys. In our example of the employee
table, EMPLOYEE_ID is best suited for primary key as it is issued by his own
employer. The rest of the attributes, like passport number, SSN, license
number etc., remain candidate keys.
Foreign key
In a company there would be different departments - Accounting, Human
Resource (HR), Development, Quality, etc. An employee who works for that
company works in a specific department. But we know that employee and
department are two different entities, so we do not store the department's
details in the employee table. Instead, we link these two tables by means of
the primary key of one of the tables; in this case, we pick the
primary key of the department table, DEPARTMENT_ID, and add it as a new
attribute/column in the Employee table. Now DEPARTMENT_ID is a foreign key
for the Employee table, and both the tables are related!

Note: The names of the attribute in the two tables can be different. What
matters is how the key is declared when the tables are actually created via
script.

When the primary key of a relation (or a table) is used as an attribute in
another relation (or table) to refer to it, it is called a foreign key there.
For example, there are two relations EMPLOYEE and SALARY which contain
employee details and salary details of employees respectively. An attribute
{emp_id}, which is present in both relations, can be considered as a primary
key in the EMPLOYEE relation and as a foreign key in the SALARY relation.
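
The department example can be sketched with Python's built-in sqlite3 module.
The column names and sample rows are assumptions; the PRAGMA line is needed
because SQLite enforces foreign keys only when this setting is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # enable foreign key enforcement in SQLite

conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " department_id INTEGER REFERENCES department (department_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Accounting')")
conn.execute("INSERT INTO employee VALUES (1, 'Steve', 10)")      # department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ajeet', 99)")  # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us join each employee to the department row it refers to.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e"
    " JOIN department d ON e.department_id = d.department_id"
).fetchall()
```

The second insert fails because its DEPARTMENT_ID value does not match any
primary key value in the department table.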

Super Key
Super Key is defined as a set of attributes within a table that uniquely
identifies each record within a table. Super Key is a superset of Candidate
key.
A superkey is a combination of columns that uniquely identifies any row
within a relational database management system (RDBMS) table. A
candidate key is a closely related concept in which the superkey is reduced
to the minimum number of columns required to uniquely identify each row.

A superkey is a set of one or more attributes that allows us to identify
uniquely a tuple in the relation. For example, in a relation STUDENT with
attributes sid, name, login, age and gpa, the sid attribute is sufficient to
distinguish one student entity or tuple from another. Each relation contains a
default superkey, which is the set of all its attributes. In the STUDENT
relation, {sid}, {login}, {sid, name}, {name, login} and {sid, name, login, age}
are superkeys, and the set of all attributes {sid, name, login, age, gpa} is
also a superkey, which is the default superkey for this relation.

Let’s take an example to understand this: Employee table

Emp_SSN    Emp_Number  Emp_Name
123456789  226         Steve
999999321  227         Ajeet
888997212  228         Chaitanya
777778888  229         Robert

Super keys:

 {Emp_SSN}
 {Emp_Number}
 {Emp_SSN, Emp_Number}
 {Emp_SSN, Emp_Name}
 {Emp_SSN, Emp_Number, Emp_Name}
 {Emp_Number, Emp_Name}

Candidate Keys:

 {Emp_SSN}
 {Emp_Number}
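
The relationship between the two lists can be sketched in Python: every
superkey is an attribute set that contains some candidate key, and the
candidate keys are the minimal superkeys. The function names are assumptions.
The candidate keys are taken as given, since uniqueness is a design decision
and cannot be inferred reliably from four sample rows.

```python
from itertools import combinations

ATTRIBUTES = ["Emp_SSN", "Emp_Number", "Emp_Name"]
CANDIDATE_KEYS = [{"Emp_SSN"}, {"Emp_Number"}]   # taken from the example above

def superkeys(attributes, candidate_keys):
    """Every non-empty attribute subset that contains at least one candidate key."""
    result = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if any(key <= set(subset) for key in candidate_keys):
                result.append(set(subset))
    return result

def minimal(keys):
    """Keep only the keys that have no proper subset among the given keys."""
    return [k for k in keys if not any(other < k for other in keys)]

all_superkeys = superkeys(ATTRIBUTES, CANDIDATE_KEYS)
recovered_candidates = minimal(all_superkeys)
```

With these inputs, all_superkeys contains the six sets listed under "Super
keys", and minimal reduces them back to the two candidate keys.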

Compound key
A compound key is a key formed by combining more than one attribute/column of
the same table. These columns may or may not be keys in their own right. The
compound key acts as a primary key only when all of its columns are taken
together; individually those columns are not keys of this table. In other
words, a unique record is fetched from the table only if we combine more than
one column. If we use them individually, we will not get a unique record.

In the example of the M:N relationship 'Student enrolls for a course',
STUDENT_ID and COURSE_ID, when combined together, give the particular
course in which the student is enrolled. STUDENT_ID or COURSE_ID alone
does not identify a single enrollment.

In the STUDENT_COURSE table, STUDENT_ID 100 alone gives us multiple courses.
To know about a particular course we need both STUDENT_ID and COURSE_ID. In
this case, both the IDs are primary keys in their own tables, but in the
STUDENT_COURSE table they form the primary key when they are combined
together. Hence they are a compound key.
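
A minimal sketch of the STUDENT_COURSE case, using Python's built-in sqlite3
module. The course IDs 200 and 201 and the second student 101 are assumed
values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_course ("
    " student_id INTEGER NOT NULL,"
    " course_id INTEGER NOT NULL,"
    " PRIMARY KEY (student_id, course_id))"   # the key is the pair of columns
)
# Student 100 enrolls in two courses; course 200 has two students.
conn.executemany(
    "INSERT INTO student_course VALUES (?, ?)",
    [(100, 200), (100, 201), (101, 200)],
)

rows_for_student = conn.execute(
    "SELECT COUNT(*) FROM student_course WHERE student_id = 100"
).fetchone()[0]
rows_for_pair = conn.execute(
    "SELECT COUNT(*) FROM student_course WHERE student_id = 100 AND course_id = 200"
).fetchone()[0]

try:
    conn.execute("INSERT INTO student_course VALUES (100, 200)")  # repeated pair
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True
```

Neither column is unique on its own, but the pair is, so only the repeated
pair is rejected.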

Composite key
A composite key is similar to a compound key, but the columns which are part of
a composite key are always keys in that table.

Usage of the term varies between texts. A common definition is: a key that
consists of two or more attributes that together uniquely identify an entity
occurrence is called a composite key, and no attribute that makes up the
composite key is a simple key on its own.

In certain tables a single attribute cannot be used to identify rows
uniquely; a combination of two or more attributes is then used as the
primary key. Such keys are called composite keys.

Surrogate Key
A surrogate key is a kind of primary key whose values are not supplied by the
designer or the user. It is a system-generated value (typically a sequence
number) which uniquely identifies the entity in the system and carries no
meaning for the user.

Secondary or Alternative key

The candidate keys which are not selected as the primary key are known as
secondary keys or alternate keys.
A relation may contain more than one candidate key. If one candidate key has
been chosen as the primary key, then every other candidate key is called an
alternate key in that relation. For example, a STUDENT relation has two
attributes {s_id} and {login_id}. In this case both attributes serve as unique
identifiers for the relation. Hence, both of them are candidate keys. If {s_id}
is chosen as the primary key, then {login_id} becomes an alternate key.

Unique key
A unique key is just like a primary key, with one difference: a primary key
enforces the NOT NULL constraint, but a unique key does not. This means a unique
key allows a column to hold only unique values, plus null. How many nulls are
accepted depends on the DBMS: some systems (for example SQL Server) accept only
one null value in a unique column, while many others (for example PostgreSQL,
MySQL and SQLite) accept several, because nulls are not treated as equal to
each other.

Non-key Attribute
Non-key attributes are attributes other than candidate key attributes in a table.

Non-prime Attribute
Non-prime attributes are attributes other than prime attributes, where a prime
attribute is one that belongs to some candidate key.

Relational database
A relational database is a collection of data items organized as a set of
formally-described tables from which data can be accessed or
reassembled in many different ways without having to reorganize the
database tables. The relational database was invented by E. F. Codd at
IBM in 1970.

The standard user and application program interface to a relational
database is the structured query language (SQL). SQL statements are
used both for interactive queries for information from a relational
database and for gathering data for reports.

In addition to being relatively easy to create and access, a relational
database has the important advantage of being easy to extend. After
the original database creation, a new data category can be added
without requiring that all existing applications be modified.

A relational database is a set of tables containing data fitted into
predefined categories. Each table (which is sometimes called
a relation) contains one or more data categories in columns.
Each row contains a unique instance of data for the categories defined
by the columns. For example, a typical business order entry database
would include a table that described a customer with columns for
name, address, phone number, and so forth. Another table would
describe an order: product, customer, date, sales price, and so forth. A
user of the database could obtain a view of the database that fitted the
user's needs. For example, a branch office manager might like a view
or report on all customers that had bought products after a certain
date. A financial services manager in the same company could, from
the same tables, obtain a report on accounts that needed to be paid.

When creating a relational database, you can define the domain of
possible values in a data column and further constraints that may
apply to that data value. For example, a domain of possible customers
could allow up to ten possible customer names but be constrained in
one table to allowing only three of these customer names to be
specifiable.

The definition of a relational database results in a table of metadata or
formal descriptions of the tables, columns, domains, and constraints.

Relational database management system (RDBMS):-
A relational database management system (RDBMS) is a program that
lets you create, update, and administer a relational database. Most
commercial RDBMSs use the Structured Query Language (SQL) to
access the database, although SQL was invented after the development
of the relational model and is not necessary for its use.

The leading RDBMS products are Oracle, IBM's DB2 and Microsoft's SQL
Server. Despite repeated challenges by competing technologies, as well
as the claim by some experts that no current RDBMS has fully
implemented relational principles, the majority of new corporate
databases are still being created and managed with an RDBMS.

RDBMS stands for Relational Database Management System. RDBMS is
the basis for SQL, and for all modern database systems like MS SQL
Server, IBM DB2, Oracle, MySQL, and Microsoft Access.

A relational database management system (RDBMS) is a database
management system (DBMS) that is based on the relational model as
introduced by E. F. Codd.

An RDBMS stores data in a collection of tables, which may be related by
common fields (database table columns). An RDBMS also provides relational
operators to manipulate the data stored in the database tables.

Constraints:-
Domain Constraints –
A domain constraint specifies the set of values an attribute can
take. The value of each attribute X must be an atomic value from the domain
of X.
The data types associated with domains include integer, character, string,
date, time, currency, etc. An attribute value must be available in the
corresponding domain. For example, an attribute declared over the integer
domain cannot hold a character string.

Tuple Uniqueness Constraints –
A relation is defined as a set of tuples. All tuples (rows) in a relation
must be unique or distinct. If the tuple uniqueness constraint is applied to
a relation, then all the rows of that table must be unique, i.e. it
does not contain duplicate rows. For example, a relation cannot hold two
rows that agree on the value of every attribute.

Single Value Constraints –

A single value constraint requires that each attribute of an entity set has a
single value. If the value of an attribute is missing in a tuple, then we can
fill it with a “null” value. The null value for an attribute specifies that
either the value is not known or the value is not applicable. For example, a
student who has no phone would have null in a phone attribute.

Key Constraints –
A relation is defined as a set of tuples. By definition all the elements of a set are
distinct; hence, all the tuples in a relation must also be distinct. This means that
no two tuples can have the same combination of values for all their attributes. A
key constraint is a statement that a certain subset of the fields of a relation is a
unique identifier for a tuple.
There are three types of key constraints that are most common.

• Primary Key constraint
• Foreign Key constraint
• Unique Key constraint

A PRIMARY KEY constraint is a unique identifier for a row within a
database table. Every table should have a primary key constraint to
uniquely identify each row, and only one primary key constraint can be
created for each table. Primary key constraints are used to enforce
entity integrity.

A UNIQUE constraint enforces the uniqueness of the values in a set of
columns, so no duplicate values are entered. Like primary key constraints,
unique key constraints are used to enforce entity integrity.

A FOREIGN KEY constraint prevents any actions that would destroy links
between tables with the corresponding data values. A foreign key in one
table points to a primary key in another table. Foreign keys prevent
actions that would leave rows with foreign key values when there are no
primary keys with that value. The foreign key constraints are used to
enforce referential integrity.
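
The three key constraints can be observed with Python's built-in sqlite3 module. This is a small sketch under stated assumptions: the department and employee tables, their columns and the sample values are hypothetical and not taken from the notes. SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,                    -- primary key
        email   TEXT UNIQUE,                            -- unique key
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    );
    INSERT INTO department VALUES (10, 'Sales');
    INSERT INTO employee VALUES (1, 'a@example.com', 10);
""")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = violates("INSERT INTO employee VALUES (1, 'b@example.com', 10)")
dup_unique = violates("INSERT INTO employee VALUES (2, 'a@example.com', 10)")
bad_foreign = violates("INSERT INTO employee VALUES (3, 'c@example.com', 99)")
valid_row = violates("INSERT INTO employee VALUES (4, 'd@example.com', 10)")
print(dup_primary, dup_unique, bad_foreign, valid_row)
```

The first three statements are rejected (duplicate primary key, duplicate unique value, foreign key with no matching department), and the last one is accepted.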

Integrity Constraints

Integrity constraints are constraints that are specified on the
database schema and are expected to hold on every valid database state
of that schema. In addition to domain, key, and NOT NULL constraints,
two other types of constraints are considered: the entity integrity
constraint and the referential integrity constraint.

Integrity constraints are used to ensure accuracy and consistency of data
in a relational database.

1. Entity Integrity Constraint

The entity integrity constraint states that primary keys can't be null. There must
be a proper value in the primary key field.

This is because the primary key value is used to identify individual rows in a
table. If there were null values for primary keys, it would mean that we could
not identify those rows.

On the other hand, there can be null values other than primary key fields. Null
value means that one doesn't know the value for that field. Null value is
different from zero value or space.

In the Car Rental database in the Car table each car must have a proper and
unique Reg_No. There might be a car whose rate is unknown - maybe the car is
broken or it is brand new - i.e. the Rate field has a null value. See the picture
below.

The entity integrity constraints assure that a specific row in a table can be
identified.

Picture. Car and CarType tables in the Rent database

2. Referential Integrity Constraint

The referential integrity constraint is specified between two tables and it is
used to maintain the consistency among rows between the two tables.

The rules are:

1. You can't delete a record from a primary table if matching records exist in a
related table.
2. You can't change a primary key value in the primary table if that record has
related records.
3. You can't enter a value in the foreign key field of the related table that
doesn't exist in the primary key of the primary table.
4. However, you can enter a Null value in the foreign key, specifying that the
records are unrelated.

Examples

Rule 1. You can't delete any of the rows in the CarType table that are visible in
the picture since all the car types are in use in the Car table.

Rule 2. You can't change any of the model_ids in the CarType table since all the
car types are in use in the Car table.

Rule 3. The values that you can enter in the model_id field in the Car table must
be in the model_id field in the CarType table.

Rule 4. The model_id field in the Car table can have a null value, which means
that the car type of that car is not known.
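
The four rules, and the entity integrity rule on Reg_No, can be checked with a short sqlite3 sketch. The table and column names Car, CarType, Reg_No, Rate and model_id follow the notes; the model_name column and all data values are assumptions made for illustration, because the original picture is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE CarType (
        model_id   INTEGER PRIMARY KEY,
        model_name TEXT
    );
    CREATE TABLE Car (
        Reg_No   TEXT PRIMARY KEY NOT NULL,  -- entity integrity: never null
        Rate     REAL,                       -- may be null (unknown rate)
        model_id INTEGER REFERENCES CarType(model_id)
    );
    INSERT INTO CarType VALUES (1, 'Compact');
    INSERT INTO Car VALUES ('ABC-111', 50.0, 1);
""")

def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

rule1 = rejected("DELETE FROM CarType WHERE model_id = 1")
rule2 = rejected("UPDATE CarType SET model_id = 2 WHERE model_id = 1")
rule3 = rejected("INSERT INTO Car VALUES ('ABC-222', 40.0, 99)")
rule4 = rejected("INSERT INTO Car VALUES ('ABC-333', NULL, NULL)")
null_key = rejected("INSERT INTO Car VALUES (NULL, 10.0, 1)")
print(rule1, rule2, rule3, rule4, null_key)
```

Rules 1 to 3 are violations and are rejected; the rule 4 insert, with a null foreign key and a null Rate, is accepted; a null Reg_No is rejected.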

Relational database systems are expected to be equipped with a query language
that can assist users in querying the database instances. There are two kinds of
query languages: relational algebra and relational calculus.

Relational Algebra
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It uses operators
to perform queries. An operator can be either unary or binary. They accept
relations as their input and yield relations as their output. Relational algebra is
performed recursively on a relation and intermediate results are also
considered relations.
The relational algebra is a theoretical language with operations that work on
one or more relations to define another relation without changing the original
relation(s).
While using the relational algebra, the user has to specify what is required and
the procedures or steps to obtain the required output. Both the
relational algebra and the relational calculus are formal, non-user-friendly
languages. They have been used as the basis for other, higher-level Data
Manipulation Languages (DMLs) for relational databases. They illustrate the
basic operations required of any DML and serve as the standard of comparison
for other relational languages.

The fundamental operations of relational algebra are as follows:


Select
Project
Union
Set difference
Cartesian product
Rename

Other operations include join, division and intersection.

Unary Relational Operations:

1. The SELECT Operation(σ)


The SELECT operation is used to choose a subset of the tuples from a
relation that satisfies a selection condition. One can consider the
SELECT operation to be a filter that keeps only those tuples that satisfy
a qualifying condition. Alternatively, we can consider the SELECT
operation to restrict the tuples in a relation to only those tuples that
satisfy the condition. The SELECT operation can also be visualized as a
horizontal partition of the relation into two sets of tuples—those tuples
that satisfy the condition and are selected, and those tuples that do not
satisfy the condition and are discarded.

In general, the SELECT operation is denoted by

σ_{<selection condition>}(R)
where the symbol σ (sigma) is used to denote the SELECT operator and
the selection condition is a Boolean expression (condition) specified on
the attributes of relation R. Notice that R is generally a relational
algebra expression whose result is a relation; the simplest such
expression is just the name of a database relation. The relation
resulting from the SELECT operation has the same attributes as R.
The Boolean expression specified in <selection condition> is made up
of a number of clauses of the form
<attribute name> <comparison op> <constant value>
or
<attribute name> <comparison op> <attribute name>
For example, to select the EMPLOYEE tuples whose department is 4, or
those whose salary is greater than $30,000, we can individually specify
each of these two conditions with a SELECT operation as follows:
σ_{Dno=4}(EMPLOYEE)
σ_{Salary>30000}(EMPLOYEE)

Clauses can be connected by the standard Boolean operators and, or, and
not to form a general selection condition. For example, to select the
tuples for all employees who either work in department 4 and make over
$25,000 per year, or work in department 5 and make over $30,000, we
can specify the following SELECT operation:
σ_{(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)}(EMPLOYEE)

The SELECT operator is unary; that is, it is applied to a single relation.
Moreover, the selection operation is applied to each tuple individually;
hence, selection conditions cannot involve more than one tuple. The
degree of the relation resulting from a SELECT operation (its number of
attributes) is the same as the degree of R.
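
The SELECT operation can be sketched in Python as a filter over a list of tuples. Only the attribute names Dno, Salary and Lname come from the text; the sample EMPLOYEE rows are hypothetical, and the condition is passed as an ordinary Python function.

```python
# Hypothetical EMPLOYEE tuples, one dict per tuple.
EMPLOYEE = [
    {"Lname": "Smith",   "Dno": 5, "Salary": 30000},
    {"Lname": "Wong",    "Dno": 5, "Salary": 40000},
    {"Lname": "Zelaya",  "Dno": 4, "Salary": 25000},
    {"Lname": "Wallace", "Dno": 4, "Salary": 43000},
]

def select(relation, condition):
    """sigma_condition(relation): keep the tuples that satisfy condition."""
    return [t for t in relation if condition(t)]

# sigma_{Dno=4}(EMPLOYEE)
dept4 = select(EMPLOYEE, lambda t: t["Dno"] == 4)

# sigma_{(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)}(EMPLOYEE)
combined = select(
    EMPLOYEE,
    lambda t: (t["Dno"] == 4 and t["Salary"] > 25000)
    or (t["Dno"] == 5 and t["Salary"] > 30000),
)
print([t["Lname"] for t in dept4])
print([t["Lname"] for t in combined])
```

The condition is evaluated on one tuple at a time, and every result tuple keeps all attributes of the input, so the degree does not change.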

2. The PROJECT Operation (π)

If we think of a relation as a table, the SELECT operation chooses some
of the rows from the table while discarding other rows. The PROJECT
operation, on the other hand, selects certain columns from the table and
discards the other columns. If we are interested in only certain attributes
of a relation, we use the PROJECT operation to project the relation over
these attributes only. Therefore, the result of the PROJECT operation can
be visualized as a vertical partition of the relation into two relations: one
has the needed columns (attributes) and contains the result of the
operation, and the other contains the discarded columns.
The general form of the PROJECT operation is:
π_{<attribute list>}(R)
where π (pi) is the symbol used to represent the PROJECT operation, and
<attribute list> is the desired sublist of attributes from the attributes of
relation R. Again, notice that R is, in general, a relational algebra
expression whose result is a relation, which in the simplest case is just
the name of a database relation.
For example, to list each employee’s first and last name and salary, we
can use the PROJECT operation as follows:
π_{Lname, Fname, Salary}(EMPLOYEE)
The result of the PROJECT operation has only the attributes specified in
<attribute list> in the same order as they appear in the list. Hence, its
degree is equal to the number of attributes in <attribute list>.
If the attribute list includes only non-key attributes of R, duplicate tuples
are likely to occur. The PROJECT operation removes any duplicate tuples,
so the result of the PROJECT operation is a set of distinct tuples, and
hence a valid relation. This is known as duplicate elimination.
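
Duplicate elimination is easy to see in a small Python sketch of PROJECT. The attribute names follow the text; the sample rows are hypothetical and chosen so that two employees share the same department and salary.

```python
# Hypothetical EMPLOYEE tuples.
EMPLOYEE = [
    {"Lname": "Smith",   "Fname": "John",     "Dno": 5, "Salary": 30000},
    {"Lname": "Wong",    "Fname": "Franklin", "Dno": 5, "Salary": 40000},
    {"Lname": "English", "Fname": "Joyce",    "Dno": 5, "Salary": 30000},
]

def project(relation, attributes):
    """pi_attributes(relation): keep the listed columns, drop duplicates."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)  # attribute-list order
        if row not in result:                  # duplicate elimination
            result.append(row)
    return result

names = project(EMPLOYEE, ["Lname", "Fname", "Salary"])
dept_salary = project(EMPLOYEE, ["Dno", "Salary"])
print(names)
print(dept_salary)
```

Projecting on Lname, Fname and Salary keeps three tuples, while projecting on the non-key attributes Dno and Salary collapses the two (5, 30000) rows into one.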

3. Rename Operation (ρ)

The results of relational algebra expressions are also relations, but without
any name. The rename operation allows us to name the output relation. The
rename operation is denoted by the small Greek letter rho (ρ).
Notation: ρ_x(E)
where the result of expression E is saved under the name x.

Relational Algebra Operations from Set Theory
Several set theoretic operations are used to merge the elements of two
sets in various ways, including UNION, INTERSECTION, and SET
DIFFERENCE (also called MINUS or EXCEPT). These are binary
operations; that is, each is applied to two sets (of tuples). When these
operations are adapted to relational databases, the two relations on
which any of these three operations are applied must have the same type
of tuples; this condition has been called union compatibility or type
compatibility. Two relations R(A1, A2, ..., An) and S(B1, B2, ..., Bn) are
said to be union compatible (or type compatible) if they have the
same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i ≤ n. This means that
the two relations have the same number of attributes and each
corresponding pair of attributes has the same domain.

1. UNION Operation (∪)

The result of this operation, denoted by R ∪ S, is a relation that includes
all tuples that are either in R or in S or in both R and S. Duplicate tuples
are eliminated.

It performs binary union between two given relations and is defined as:

r ∪ s = { t | t ∈ r or t ∈ s }

Notation: r ∪ s
where r and s are either database relations or relation result sets
(temporary relations).

2. INTERSECTION Operation (∩)

The result of this operation, denoted by R ∩ S, is a relation that includes
all tuples that are in both R and S.

Notation: r ∩ s
where r and s are either database relations or relation result sets
(temporary relations).

3. SET DIFFERENCE Operation (-)

The result of set difference operation is tuples, which are present in one
relation but are not in the second relation.
Notation: r − s
Finds all the tuples that are present in r but not in s.

Notice that both UNION and INTERSECTION are commutative
operations; that is,
R ∪ S = S ∪ R and R ∩ S = S ∩ R
Both UNION and INTERSECTION can be treated as n-ary operations
applicable to any number of relations because both are also associative
operations; that is,
R ∪ (S ∪ T) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T)

The MINUS operation is not commutative; that is, in general,
R − S ≠ S − R
Note that INTERSECTION can be expressed in terms of union and set
difference as follows:
R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
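
These identities can be checked directly with Python sets, whose operators |, & and - correspond to UNION, INTERSECTION and SET DIFFERENCE. The two relations below are hypothetical; both hold (first name, last name) tuples, so they are union compatible.

```python
# Two hypothetical union-compatible relations.
R = {("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")}
S = {("Ramesh", "Shah"), ("John", "Smith")}
T = {("Susan", "Yao"), ("John", "Smith")}

union = R | S          # duplicates are eliminated automatically
intersection = R & S
r_minus_s = R - S
s_minus_r = S - R

# INTERSECTION expressed through union and set difference.
via_identity = ((R | S) - (R - S)) - (S - R)

print(sorted(union))
print(sorted(intersection))
print(sorted(via_identity))
```

Here the union has four tuples because ("Ramesh", "Shah") is kept only once, and R − S differs from S − R, which shows that MINUS is not commutative.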

The CARTESIAN PRODUCT (CROSS PRODUCT) Operation

The CARTESIAN PRODUCT operation, also known as CROSS PRODUCT or
CROSS JOIN, is denoted by ×. This is also a binary
set operation, but the relations on which it is applied do not have to be
union compatible. In its binary form, this set operation produces a new
element by combining every member (tuple) from one relation (set) with
every member (tuple) from the other relation (set). In general, the result
of R(A1, A2, ..., An) × S(B1, B2, ..., Bm) is a relation Q with degree n + m
and attributes Q(A1, A2, ..., An, B1, B2, ..., Bm), in that order.
The resulting relation Q has one tuple for each combination of tuples,
one from R and one from S. Hence, if R has nR tuples (denoted as |R| =
nR), and S has nS tuples, then R × S will have nR * nS tuples.
The n-ary CARTESIAN PRODUCT operation is an extension of the above
concept, which produces new tuples by concatenating all possible
combinations of tuples from n underlying relations.

Notation: r × s
where r and s are relations and their output is defined as:

r × s = { q t | q ∈ r and t ∈ s }
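
The degree and cardinality rules of the CARTESIAN PRODUCT can be checked with itertools.product from the Python standard library. The relations R and S below are hypothetical and are represented as lists of plain tuples.

```python
from itertools import product

R = [(1, "a"), (2, "b"), (3, "c")]  # degree n = 2, nR = 3 tuples
S = [("x",), ("y",)]                # degree m = 1, nS = 2 tuples

def cartesian_product(r, s):
    """r x s: concatenate every tuple q of r with every tuple t of s."""
    return [q + t for q, t in product(r, s)]

Q = cartesian_product(R, S)
print(len(Q))  # nR * nS
print(Q[0])    # attributes of R first, then those of S
```

Each result tuple has n + m = 3 components, and the result contains nR * nS = 6 tuples.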

Binary Relational Operations:

1. The DIVISION Operation
The DIVISION operation, denoted by ÷, is useful for a special kind of
query that sometimes occurs in database applications.
In general, the DIVISION operation is applied to two relations R(Z) ÷
S(X), where the attributes of S are a subset of the attributes of R; that is,
X ⊆ Z. Let Y be the set of attributes of R that are not attributes of S; that
is, Y = Z – X (and hence Z = X ∪ Y). The result of DIVISION is a relation
T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and
with tR[X] = tS for every tuple tS in S. This means that, for a tuple t to
appear in the result T of the DIVISION, the values in t must appear in R
in combination with every tuple in S. Note that in the formulation of the
DIVISION operation, the tuples in the denominator relation S restrict the
numerator relation R by selecting those tuples in the result that match all
values present in the denominator.

The same operation is sometimes written as R1(Z) ÷ R2(Y), with the roles of
X and Y swapped: it produces a relation R(X) that includes all tuples t[X] in
R1(Z) that appear in R1 in combination with every tuple from R2(Y), where
Z = X ∪ Y.
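
A small Python sketch may make DIVISION more concrete. It follows the R(Z) ÷ S(X) formulation; the relations and the attribute names Emp and Pno are hypothetical ("which employees work on every project listed in S?"), and the function assumes both relations are non-empty.

```python
# Hypothetical numerator R(Emp, Pno) and denominator S(Pno).
R = [
    {"Emp": "e1", "Pno": 1}, {"Emp": "e1", "Pno": 2},
    {"Emp": "e2", "Pno": 1},
    {"Emp": "e3", "Pno": 1}, {"Emp": "e3", "Pno": 2}, {"Emp": "e3", "Pno": 3},
]
S = [{"Pno": 1}, {"Pno": 2}]

def divide(r, s):
    """R(Z) / S(X): Y-values of R that occur with every tuple of S."""
    x = list(s[0])                       # attributes X of S
    y = [a for a in r[0] if a not in x]  # Y = Z - X
    pairs = {(tuple(t[a] for a in y), tuple(t[a] for a in x)) for t in r}
    needed = {tuple(t[a] for a in x) for t in s}
    candidates = sorted({p[0] for p in pairs})
    return [c for c in candidates if all((c, n) in pairs for n in needed)]

print(divide(R, S))
```

Employee e2 is missing from the result because it is not paired with project 2, while e3 qualifies even though it also works on an extra project.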

2. The JOIN Operation

The JOIN operation, denoted by ⋈, is used to combine related tuples
from two relations into single “longer” tuples. This operation is very
important for any relational database with more than a single relation
because it allows us to process relationships among relations.
The JOIN operation can be specified as a CARTESIAN PRODUCT
operation followed by a SELECT operation. However, JOIN is very
important because it is used very frequently when specifying database
queries.
The general form of a JOIN operation on two relations R(A1, A2, ..., An)
and S(B1, B2, ..., Bm) is

R ⋈_{<join condition>} S

The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, ...,
An, B1, B2, ... , Bm) in that order; Q has one tuple for each combination
of tuples—one from R and one from S—whenever the combination
satisfies the join condition. This is the main difference between
CARTESIAN PRODUCT and JOIN. In JOIN, only combinations of tuples
satisfying the join condition appear in the result, whereas in the
CARTESIAN PRODUCT all combinations of tuples are included in the
result. The join condition is specified on attributes from the two relations
R and S and is evaluated for each combination of tuples. Each tuple
combination for which the join condition evaluates to TRUE is included in
the resulting relation Q as a single combined tuple.
A general join condition is of the form
<condition> AND <condition> AND...AND <condition>
where each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj
is an attribute of S, Ai and Bj have the same domain, and θ (theta) is one
of the comparison operators {=, <, ≤, >, ≥, ≠}.

Types of Joins

Theta (θ) Join

Theta join combines tuples from different relations provided they satisfy
the theta condition. The join condition is denoted by the symbol θ.

Notation: R1 ⋈_θ R2

R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1,
B2, ..., Bm) such that the attributes don’t have anything in common, that is,
R1 ∩ R2 = Φ. Theta join can use all kinds of comparison operators.

INNER Join or EQUI Join

The most common use of JOIN involves join conditions with equality
comparisons only. Such a JOIN, where the only comparison operator used
is =, is called an EQUIJOIN. In the result of an EQUIJOIN we always
have one or more pairs of attributes that have identical values in every
tuple. This is a simple JOIN in which the result is based on matched data
as per the equality condition specified in the query.

Natural Join (⋈)
Natural join does not use any explicit comparison operator. It does not
concatenate the way a Cartesian product does. We can perform a natural join only
if there is at least one common attribute between the two relations. In
addition, the common attributes must have the same name and domain.
Natural join matches tuples whose values on those common attributes are the same
in both relations, and keeps only one copy of each common attribute in the result.
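
The join types described so far can be sketched in a few lines of Python, treating a join as a Cartesian product followed by a selection. The relations, attribute names and data are hypothetical; theta_join assumes the two relations have no attribute names in common, as stated above.

```python
from itertools import product

# Hypothetical relations with disjoint attribute names.
EMPLOYEE = [{"Ename": "Smith", "Dno": 5}, {"Ename": "Wong", "Dno": 4}]
DEPARTMENT = [{"Dnumber": 4, "Dname": "Administration"},
              {"Dnumber": 5, "Dname": "Research"}]

def theta_join(r, s, condition):
    """Cartesian product of r and s, filtered by the join condition."""
    return [{**q, **t} for q, t in product(r, s) if condition(q, t)]

def natural_join(r, s):
    """Equality on all same-named attributes, one copy of each kept."""
    common = set(r[0]) & set(s[0])
    return [{**q, **t} for q, t in product(r, s)
            if all(q[a] == t[a] for a in common)]

# Equijoin: a theta join whose only comparison operator is =.
equi = theta_join(EMPLOYEE, DEPARTMENT, lambda q, t: q["Dno"] == t["Dnumber"])

# Natural join needs a shared attribute name, so Dnumber is renamed to Dno.
DEPT = [{"Dno": d["Dnumber"], "Dname": d["Dname"]} for d in DEPARTMENT]
natural = natural_join(EMPLOYEE, DEPT)

print(equi)
print(natural)
```

The equijoin result carries both Dno and Dnumber with identical values in every tuple, whereas the natural join result has a single Dno column.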

Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join
includes only those tuples with matching attributes and the rest are
discarded in the resulting relation. Therefore, we need to use outer joins
to include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins: left outer join, right outer
join, and full outer join.
A join that includes rows even if they do not have related rows in
the joined table is called as Outer Join.

1. LEFT OUTER JOIN or LEFT JOIN

This join returns all the rows from the left table in conjunction with the
matching rows from the right table. If a row has no matching row in the
right table, the right-table columns are filled with NULL values.

2. RIGHT OUTER JOIN or RIGHT JOIN

This JOIN returns all the rows from the right table in conjunction with
the matching rows from the left table. If a row has no matching row
in the left table, the left-table columns are filled with NULL values.

3. FULL OUTER JOIN or FULL JOIN

This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It
returns rows from both tables: matched rows are combined where the join
condition is met, and NULL values fill the other table's columns when
there is no match.

In other words, an OUTER JOIN lists the matching rows plus the
unmatched rows of ONE of the tables (LEFT or RIGHT) or of BOTH
tables (FULL).

Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.
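
The three outer joins can be sketched in plain Python, using None in place of NULL. The relations EMP and DEPT, their attributes and their data are hypothetical; the join is an equijoin on one shared attribute, and both relations are assumed to be non-empty.

```python
EMP = [{"Dno": 5, "Ename": "Smith"}, {"Dno": 7, "Ename": "Jones"}]
DEPT = [{"Dno": 5, "Dname": "Research"}, {"Dno": 4, "Dname": "Administration"}]

def pad(sample_row):
    """A row of None values with the same attributes as sample_row."""
    return {a: None for a in sample_row}

def left_outer_join(left, right, key):
    """All left rows; unmatched ones get None for the right attributes."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        out += [{**l, **r} for r in matches] or [{**pad(right[0]), **l}]
    return out

def right_outer_join(left, right, key):
    """Mirror image of the left outer join."""
    return left_outer_join(right, left, key)

def full_outer_join(left, right, key):
    """Left outer join plus the right rows that matched nothing."""
    unmatched = [{**pad(left[0]), **r} for r in right
                 if not any(l[key] == r[key] for l in left)]
    return left_outer_join(left, right, key) + unmatched

left = left_outer_join(EMP, DEPT, "Dno")
right = right_outer_join(EMP, DEPT, "Dno")
full = full_outer_join(EMP, DEPT, "Dno")
print(left)
print(right)
print(full)
```

Jones (department 7) survives the left join with Dname set to None, Administration survives the right join with Ename set to None, and the full join contains both of these padded rows as well as the matched Smith row.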

SELF JOIN
A self join is a join in which a table is joined with itself (this models
what is also called a unary relationship), especially when the table has a
FOREIGN KEY which references its own PRIMARY KEY. To join a table to itself
means that each row of the table is combined with itself and with every other
row of the table; the join condition then keeps only the related pairs.
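
A typical self join pairs each employee with his or her manager. The sketch below uses Python's sqlite3 module; the employee table, its columns and its three rows are hypothetical. The table is given two aliases, e and m, so that it can play both roles in the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        emp_name   TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(emp_id)  -- points to own table
    );
    INSERT INTO employee VALUES (1, 'Asha', NULL);
    INSERT INTO employee VALUES (2, 'Ben', 1);
    INSERT INTO employee VALUES (3, 'Chen', 1);
""")

# e is the employee role, m is the manager role of the same table.
pairs = conn.execute("""
    SELECT e.emp_name, m.emp_name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()
print(pairs)
```

Asha has no manager, so her row finds no partner in this inner self join and does not appear in the result.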

Relational Calculus
Relational calculus is a non-procedural query language; instead of
algebra it uses mathematical predicate calculus. The relational
calculus is not the same as the differential and integral calculus of
mathematics, but takes its name from a branch of symbolic logic called
predicate calculus. When applied to databases, it is found in two
forms. These are
• Tuple relational calculus, which was originally proposed by Codd in the
year 1972, and
• Domain relational calculus, which was proposed by Lacroix and Pirotte in
the year 1977.
In first order logic or predicate calculus, a predicate is a truth-valued
function with arguments. When we substitute values for the
arguments, the function yields an expression, called a proposition, which
will be either true or false.

Tuple Relational Calculus

In the tuple relational calculus you find tuples for which a
predicate is true. The calculus depends on the use of tuple variables.
A tuple variable is a variable that ‘ranges over’ a named relation, i.e. a
variable whose only permitted values are tuples of the relation.
Any tuple variable quantified by ‘for all’ (∀) or ‘there exists’ (∃)
is called a bound variable.
Any tuple variable without a ‘for all’ or ‘there exists’ quantifier is
called a free variable.
The conditions used in a tuple expression form a well-formed
formula (WFF). The conditions in the expression are combined
using logical operators like AND, OR and NOT, and quantifiers like ‘for
all’ (∀) or ‘there exists’ (∃). A WFF in which all tuple variables are bound
is called a closed WFF. An open WFF has at least
one free variable.

{t | P(t)} or {t | condition(t)} -- this is known as an expression of the
relational calculus, where t ranges over the resulting tuples and P(t) is the
condition used to fetch t.

{t | EMPLOYEE(t) AND t.SALARY > 10000} - selects the
tuples from the EMPLOYEE relation whose salary is greater than 10000.
It is an example of selecting a range of values.

{t | EMPLOYEE(t) AND t.DEPT_ID = 10} - selects all the tuples of
employees who work for Department 10.
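
Tuple relational calculus expressions map closely onto Python set comprehensions: the loop variable plays the role of the free tuple variable t, and any() and all() play the roles of ∃ and ∀ over bound variables. The EMPLOYEE tuples below are hypothetical, and the third query is an extra illustration that is not in the notes.

```python
from collections import namedtuple

Employee = namedtuple("Employee", "EMP_ID EMP_NAME SALARY DEPT_ID")

# Hypothetical EMPLOYEE relation.
EMPLOYEE = {
    Employee(1, "Alex", 12000, 10),
    Employee(2, "Bina", 9000, 10),
    Employee(3, "Carl", 15000, 20),
}

# {t | EMPLOYEE(t) AND t.SALARY > 10000}
high_paid = {t for t in EMPLOYEE if t.SALARY > 10000}

# {t | EMPLOYEE(t) AND t.DEPT_ID = 10}
dept_10 = {t for t in EMPLOYEE if t.DEPT_ID == 10}

# {t | EMPLOYEE(t) AND (for all u)(EMPLOYEE(u) implies t.SALARY >= u.SALARY)}
# t is free, u is bound by the universal quantifier.
top_earner = {t for t in EMPLOYEE
              if all(t.SALARY >= u.SALARY for u in EMPLOYEE)}

print(sorted(t.EMP_NAME for t in high_paid))
print(sorted(t.EMP_NAME for t in dept_10))
print(sorted(t.EMP_NAME for t in top_earner))
```

The comprehension states which tuples are wanted, not the steps for finding them, which is the non-procedural character of the calculus.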

Domain Relational Calculus

In the tuple relational calculus, you use variables that range over the
tuples of a relation. In the domain relational calculus, you also use
variables, but in this case the variables take their values from the domains of
attributes rather than from tuples of relations. A domain relational calculus
expression has the following general format:
{d1, d2, . . . , dn | F(d1, d2, . . . , dm)}   m ≥ n
where d1, d2, . . . , dn, . . . , dm stand for domain variables and
F(d1, d2, . . . , dm) stands for a formula composed of atoms.

For example, select EMP_ID and EMP_NAME of employees who work for
department 10:

{<EMP_ID, EMP_NAME> | <EMP_ID, EMP_NAME> ∈ EMPLOYEE ∧ DEPT_ID = 10}

Get the name of the department that Alex works for:

{DEPT_NAME | <DEPT_NAME> ∈ DEPT ∧ ∃ DEPT_ID
(<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = Alex)}

Here the subformula in parentheses is evaluated first to get the department ID
of Alex, and then it is used to get the department name from the DEPT relation.

Let us consider another example: select EMP_ID, EMP_NAME and
ADDRESS of the employees from the department where Alex works. What
will be done here?

{<EMP_ID, EMP_NAME, ADDRESS, DEPT_ID> | <EMP_ID,
EMP_NAME, ADDRESS, DEPT_ID> ∈ EMPLOYEE ∧ ∃ DEPT_ID
(<DEPT_ID> ∈ EMPLOYEE ∧ EMP_NAME = Alex)}

First, the subformula in parentheses is evaluated to get the department ID of
Alex, and then all the employees of that department are retrieved.

Other concepts of TRC, like free variable, bound variable and WFF,
remain the same in DRC. The only difference is that DRC is based on the
attributes (domains) of a relation.
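
The three DRC queries above can be imitated with Python comprehensions in which every loop variable stands for a single attribute value (a domain variable) instead of a whole tuple. The relation contents, the addresses and the department names are hypothetical; only the attribute names and the employee name Alex come from the notes.

```python
# Hypothetical relations: EMPLOYEE(EMP_ID, EMP_NAME, ADDRESS, DEPT_ID)
# and DEPT(DEPT_ID, DEPT_NAME).
EMPLOYEE = {
    (1, "Alex", "12 Hill St", 10),
    (2, "Bina", "9 Lake Rd", 10),
    (3, "Carl", "4 Park Ave", 20),
}
DEPT = {(10, "Sales"), (20, "Design")}

# EMP_ID and EMP_NAME of employees who work for department 10.
dept10 = {(emp_id, emp_name)
          for (emp_id, emp_name, address, dept_id) in EMPLOYEE
          if dept_id == 10}

# Name of the department that Alex works for; any() acts as "there exists".
alex_dept = {dept_name
             for (dept_id, dept_name) in DEPT
             if any(d == dept_id and name == "Alex"
                    for (_, name, _, d) in EMPLOYEE)}

# Employees from the department where Alex works.
colleagues = {(emp_id, emp_name, address)
              for (emp_id, emp_name, address, dept_id) in EMPLOYEE
              if any(d == dept_id and name == "Alex"
                     for (_, name, _, d) in EMPLOYEE)}

print(sorted(dept10))
print(alex_dept)
print(sorted(colleagues))
```

In the second and third queries the department ID inside any() is the existentially quantified (bound) variable, and the variables before the for clause are the free ones that appear in the result.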
