BCOM Computers 3RD SEM RDBMS (RELATIONAL DATA BASE MANAGEMENT SYSTEM) 2ND YEAR

This document provides an overview of Relational Database Management Systems (RDBMS), covering fundamental concepts such as data, information, and knowledge, as well as the advantages of DBMS over file-oriented systems. It discusses the architecture of DBMS, the roles of database administrators, and the characteristics and components of DBMS, including data access languages and user roles. Additionally, it highlights the objectives and advantages of database approaches, emphasizing data integrity, security, and efficient access.
B.Com Computer Application (Osmania University)


UNIT-I: BASIC CONCEPTS:


Database Management System - File based system - Advantages of DBMS over file based system –
Database Approach - Logical DBMS Architecture - Three level architecture of DBMS or logical DBMS
architecture – Need for three level architecture - Physical DBMS Architecture - Database Administrator
(DBA) Functions & Role - Data files, indices and Data Dictionary - Types of Database.
Relational and ER Models: Data Models - Relational Model – Domains - Tuple and Relation – Super
keys - Candidate keys - Primary keys and foreign key for the Relations - Relational Constraints - Domain
Constraint – Key Constraint - Integrity Constraint - Update Operations and Dealing with Constraint
Violations - Relational Operations - Entity Relationship (ER) Model – Entities – Attributes –
Relationships - More about Entities and Relationships - Defining Relationship for College Database - ER
Diagram - Conversion of E-R Diagram to Relational Database.

Introduction to Databases:
What is Data?
Raw facts are called data. The word “raw” indicates that they have not been processed.
Ex: The number 89 on its own is data.
What is information?
Processed data is known as information.
Ex: “Marks: 89” attaches a meaning to the number, so it becomes information.
What is Knowledge?
1. Knowledge refers to the practical use of information.
2. Knowledge necessarily involves a personal experience.
DATA/INFORMATION PROCESSING:
The process of converting data (raw facts) into meaningful information is called
data/information processing.

Note: In business processing, knowledge is the most useful of the three for making decisions in any organization.
DIFFERENCE BETWEEN DATA AND INFORMATION:
DATA                                        INFORMATION
1. Raw facts                                1. Processed data
2. It is in unorganized form                2. It is in organized form
3. Data doesn’t help the decision-making    3. Information helps the decision-making
   process                                     process

FILE ORIENTED APPROACH:


The earliest business computer systems were used to process business records and produce
information. They were generally faster and more accurate than equivalent manual systems. These
systems stored groups of records in separate files, and so they were called file processing systems.
➢ A file system is a collection of data. For any data management task, the user has to write the
procedures.
➢ A file system exposes the details of data representation and storage of data.
➢ In a file system, storing and retrieving data cannot be done efficiently.
➢ Concurrent access to data in a file system causes many problems, such as one user reading a file
while another is deleting or updating some information in it.
➢ A file system doesn’t provide a crash recovery mechanism.
Eg. If the system crashes while we are entering data into a file, the content of the file
may be lost.
➢ Protecting a file under a file system is very difficult.
The typical file-oriented system is supported by a conventional operating system. Permanent
records are stored in various files and a number of different application programs are written to
extract records from and add records to the appropriate files.

DISADVANTAGES OF FILE-ORIENTED SYSTEM:


The following are the disadvantages of File-Oriented System:
Data Redundancy and Inconsistency:
Since files and application programs are created by different programmers over a long period
of time, the files are likely to have different formats and the programs may be written in several
programming languages. Moreover, the same piece of information may be duplicated in several
places. This redundancy leads to higher storage and access costs. In addition, it may lead to data
inconsistency, because the various copies of the same data may no longer agree.
Difficulty in Accessing Data:
Conventional file processing environments do not allow needed data to be retrieved in a
convenient and efficient manner. Better data retrieval systems must be developed for general use.
Data Isolation:
Since data is scattered (spread) across various files, and the files may be in different formats, it is
difficult to write new application programs to retrieve the appropriate data.
Concurrent Access Anomalies:
In order to improve the overall performance of the system and obtain a faster response time,
many systems allow multiple users to update the data simultaneously. In such an environment,
interaction of concurrent updates may result in inconsistent data.
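
The anomaly described above is often called a lost update. The following is a minimal, deterministic sketch of it in plain Python rather than real concurrent code; the starting balance of 500 and the two withdrawals (50 and 100) are assumed values chosen only for illustration.

```python
# Deterministic sketch of a "lost update": two users read the same balance,
# then each writes back a value computed from that stale copy.

def run_without_control(balance):
    read_a = balance          # user A reads the balance
    read_b = balance          # user B reads it before A has written
    balance = read_a - 50     # A writes its result
    balance = read_b - 100    # B overwrites it, so A's withdrawal is lost
    return balance

def run_serially(balance):
    balance = balance - 50    # A's whole update finishes first
    balance = balance - 100   # B then works on the value A wrote
    return balance

lost = run_without_control(500)
correct = run_serially(500)
print(lost, correct)
```

The first function ends with a balance that ignores one of the two withdrawals, which is exactly the inconsistency that uncontrolled concurrent updates can produce.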
Security Problems:
Not every user of the database system should be able to access all the data. For example, in
a banking system, payroll personnel need only that part of the database that has information about
the various bank employees. They do not need access to information about customer accounts. In a
file-oriented system it is difficult to enforce such security constraints.

Integrity Problems:
The data values stored in the database must satisfy certain types of consistency constraints. For example,
the balance of a bank account may never fall below a prescribed amount. These constraints are enforced in
the system by adding appropriate code in the various application programs. When new constraints are
added, it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items from different files.
Atomicity Problem:
A computer system, like any other mechanical or electrical device, is subject to failure. In many
applications, it is crucial to ensure that once a failure has occurred and has been detected, the data are
restored to the consistent state that existed prior to the failure. For example, a transfer of funds
between two accounts must either complete entirely or not happen at all.
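
As an illustration of the all-or-nothing behaviour that plain files lack, here is a minimal sketch using Python's built-in sqlite3 module. The table name account, the balances and the fail_midway flag are assumptions made up for this example; the failure is simulated by raising an exception between the two updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as one all-or-nothing unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except RuntimeError:
        return False

ok = transfer(conn, 1, 2, 100, fail_midway=True)
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(ok, balances)
```

Because the debit is rolled back together with the failed transfer, both balances are left as they were before the failure.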

Database
A Database is a collection of related data organised in a way that data can be easily accessed,
managed and updated. A database can be software based or hardware based, with one sole purpose:
storing data.
During the early days of computing, data was collected and stored on tapes, which are sequential-access
media: to reach one record, all the records before it had to be read first. Tapes were slow and bulky, and soon
computer scientists realised that they needed a better solution to this problem.

DBMS
A DBMS is software that allows the creation, definition and manipulation of a database, allowing users to
store, process and analyse data easily. A DBMS provides us with an interface or tool to perform
various operations such as creating a database, creating tables in it, storing data, updating data
and a lot more.
DBMS also provides protection and security to the databases. It also maintains data consistency in
case of multiple users.
Here are some examples of popular DBMS used these days:
• MySQL
• Oracle
• SQL Server
• IBM DB2
• PostgreSQL
• Amazon SimpleDB (cloud based) etc.

Characteristics of Database Management System


A database management system has the following characteristics:

1. Data stored in Tables: Data is never stored directly in the database. Data is stored in tables
created inside the database. A DBMS also allows relationships between tables, which makes
the data more meaningful and connected. You can easily understand what type of data is stored
where by looking at all the tables created in a database.
2. Reduced Redundancy: In the modern world hard drives are very cheap, but earlier, when hard
drives were expensive, unnecessary repetition of data in a database was a big problem. A
DBMS supports Normalisation, which divides the data in such a way that repetition is minimal.
3. Data Consistency: On live data, i.e. data that is being continuously updated and added to,
maintaining the consistency of data can become a challenge. A DBMS handles this by itself.
4. Support for Multiple Users and Concurrent Access: A DBMS allows multiple users to work on
it (update, insert, delete data) at the same time and still manages to maintain data consistency.
5. Query Language: A DBMS provides users with a simple query language, with which data can be
easily fetched, inserted, deleted and updated in a database.
6. Security: The DBMS also takes care of the security of data, protecting it from
unauthorised access. In a typical DBMS, we can create user accounts with different access
permissions, with which we can easily secure our data by restricting user access.
7. Transactions: A DBMS supports transactions, which allow us to better handle and manage data
integrity in real world applications where multi-threading is extensively used.
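
To make the points about related tables and reduced redundancy concrete, the sketch below stores a department name once and lets student rows refer to it by key. It uses Python's built-in sqlite3 module; the tables department and student, their columns and the sample rows are assumptions invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (10, 'Commerce');
INSERT INTO student VALUES (1, 'RAM', 10), (2, 'RAMESH', 10);
""")

# The department name is stored once, so renaming it is a single-row change.
conn.execute(
    "UPDATE department SET dept_name = 'Commerce and Computers' WHERE dept_id = 10")

rows = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s JOIN department AS d ON d.dept_id = s.dept_id
    ORDER BY s.roll_no
""").fetchall()
print(rows)
```

Had the department name been repeated in every student row, the rename would have needed one change per student, with the risk of missing some of them.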

ADVANTAGES OF A DBMS OVER FILE SYSTEM:


Using a DBMS to manage data has many advantages:
Data Independence:
Application programs should be as independent as possible from details of data representation and
storage. The DBMS can provide an abstract view of the data to insulate application code from such
details.
Efficient Data Access:
A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This
feature is especially important if the data is stored on external storage devices.
Data Integrity and Security:
If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on
the data. For example, before inserting salary information for an employee, the DBMS can check that
the department budget is not exceeded. Also, the DBMS can enforce access controls that govern what
data is visible to different classes of users.
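
A simple integrity constraint can be declared once in the table definition instead of being coded into every program. The sketch below uses Python's built-in sqlite3 module and a minimum-balance rule; the table account, the minimum of 500 and the amounts are assumed values for illustration, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 500)  -- assumed minimum balance
    )
""")
conn.execute("INSERT INTO account VALUES (1, 800)")
conn.commit()

try:
    # This withdrawal would leave 400, which violates the declared constraint.
    conn.execute("UPDATE account SET balance = balance - 400 WHERE acc_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
    conn.rollback()

balance = conn.execute("SELECT balance FROM account WHERE acc_no = 1").fetchone()[0]
print(rejected, balance)
```

The database itself refuses the update, so every application that touches the table is held to the same rule.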

Concurrent Access and Crash Recovery:


A database system allows several users to access the database concurrently. Answering
different questions from different users with the same (base) data is a central aspect of an information
system. Such concurrent use of data increases the economy of a system.
An example of concurrent use is the travel database of a large travel agency. The employees
of different branches can access the database concurrently and book journeys for their clients. Each
travel agent sees on his interface whether there are still seats available for a specific journey or
whether it is already fully booked.
A DBMS also protects data from failures such as power failures and crashes by means of
recovery schemes such as backup mechanisms and log files.
Data Administration:
When several users share the data, centralizing the administration of data can offer significant
improvements. Experienced professionals, who understand the nature of the data being managed, and
how different groups of users use it, can be responsible for organizing the data representation to
minimize redundancy and fine-tuning the storage of the data to make retrieval efficient.
Reduced Application Development Time:
DBMS supports many important functions that are common to many applications accessing
data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates
quick development of applications. Such applications are also likely to be more robust than
applications developed from scratch because many important tasks are handled by the DBMS instead
of being implemented by the application.

Disadvantages of DBMS
• Complexity
• Except for open-source systems such as MySQL, licensed DBMSs are generally costly.
• They are large in size

DATABASE APPROACH
The objectives of the database approach include:
1. Data sharability
2. Data availability
3. Data independence
4. Data integrity
5. Data security

1. Data sharability: This objective ensures that a data item developed by one
application can be shared among all the applications. It reduces the level
of unplanned redundancy, which occurs when the same data is stored at multiple locations.
2. Data availability: This objective ensures that the requested data is available to the user in a
meaningful format, which decreases the access time.

3. Data independence: This objective ensures that application programs are written in such a
way that they are independent of storage details. The internal schema describes the physical
storage details and the conceptual schema describes the logical structure, so the conceptual schema
insulates the external schemas (user views) from changes in physical storage.

4. Data integrity: This objective ensures that the data values entered in the database fall within a
specified range and are of the correct format. Data integrity can be achieved by enabling the DBA to have
full control of the database and the operations performed on it.

5. Data security: Data is an important asset of an organization and must be kept confidential. Such
confidential data must be properly secured so that it is not accessed by unauthorized persons.
This can be achieved by employing data security measures.

Components of a DBMS
A database management system (DBMS) consists of several components. Each component plays a very
important role in the database management system environment. The major components of a database
management system are:
• Software
• Hardware
• Data
• Procedures
• Database Access Language
• Users
Software
The main component of a DBMS is the software. It is the set of programs used to handle the database
and to control and manage the overall computerized database. It includes:
1. The DBMS software itself, which is the most important software component in the overall system.
2. The operating system, including the network software used in a network to share the data of the
database among multiple users.
3. Application programs developed in programming languages such as C++ or Visual Basic that
are used to access the database in the database management system. Each program contains
statements that request the DBMS to perform operations on the database. The operations may
include retrieving, updating, deleting data etc. The application programs may run on conventional
or online workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with associated
I/O devices like disk drives), storage devices, I/O channels, and electromechanical devices that form the
interface between computers and real world systems. It is impossible to implement
the DBMS without the hardware devices. In a network, a powerful computer with high data
processing speed and a storage device with large storage capacity is required as the database server.

Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process the
data. In DBMS, databases are defined, constructed and then data is stored, updated and retrieved to
and from the databases. The database contains both the actual (or operational) data and the metadata
(data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the DBMS.
The users that operate and manage the DBMS require documented procedures on how to use or run the
database management system. These may include:
1. Procedure to install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of database.
5. To change the structure of database.
6. To generate the reports of data retrieved from database.

Database Access Language


The database access language is used to move data to and from the database. The users use the
database access language to enter new data, change the existing data in the database and retrieve
required data from databases. The user writes a set of appropriate commands in a database access
language and submits these to the DBMS. The DBMS translates the user commands and sends them to a
specific part of the DBMS called the database engine. The database engine generates a set of
results according to the commands submitted by the user, converts these into a user readable form called
an inquiry report and then displays them on the screen. The administrators may also use the database
access language to create and maintain the databases.
The most popular database access language is SQL (Structured Query Language). Relational
databases are required to have a database query language.
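
The following sketch shows the basic SQL commands for entering, changing, deleting and retrieving data, submitted to a DBMS from a program. It uses SQLite through Python's built-in sqlite3 module only because it needs no installation; the student table and its rows are sample values assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a temporary in-memory database

# Define the structure of the data.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# Enter new data.
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (1, "RAM", 18))
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (2, "RAMESH", 18))

# Change existing data.
conn.execute("UPDATE student SET age = ? WHERE roll_no = ?", (19, 2))

# Remove data.
conn.execute("DELETE FROM student WHERE roll_no = ?", (1,))

# Retrieve the required data; the DBMS returns the result as rows.
result = conn.execute("SELECT roll_no, name, age FROM student").fetchall()
print(result)
```

The program states what data it wants; how the rows are located and stored is left to the database engine.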
Users
The users are the people who manage the databases and perform different operations on the databases
in the database system. There are three kinds of people who play different roles in a database system:
1. Application Programmers
2. Database Administrators
3. End-Users
Application Programmers
The people who write application programs in programming languages (such as Visual Basic, Java, or
C++) to interact with databases are called application programmers.
Database Administrators
A person who is responsible for managing the overall database management system is called database
administrator or simply DBA.
End-Users
The end-users are the people who interact with database management system to perform different
operations on database such as retrieving, updating, inserting, deleting data etc.

Three level architecture of DBMS or logical DBMS architecture
1. Physical Level
2. Conceptual Level
3. External Level

In the above diagram,
• It shows the architecture of DBMS.
• Mapping is the process of transforming requests and responses between the various levels of the
database architecture.
• Mapping is not well suited to small databases, because it takes additional time.
• In External / Conceptual mapping, the DBMS transforms a request on an external schema into a
request against the conceptual schema.
• In Conceptual / Internal mapping, the request is transformed from the conceptual
level to the internal level.
1. Physical Level
• Physical level describes the physical storage structure of data in database.
• It is also known as Internal Level.
• This level is very close to physical storage of data.
• At the lowest level, data is stored in the form of bits with physical addresses on the secondary
storage device.
• At the highest level, it can be viewed in the form of files.
• The internal schema defines the various stored data types. It uses a physical data model.
2. Conceptual Level
• Conceptual level describes the structure of the whole database for a group of users.
• It is also called the logical level.
• Conceptual schema is a representation of the entire content of the database.
• This schema contains all the information needed to build the relevant external records.
• It hides the internal details of physical storage.
3. External Level
• External level is related to the data which is viewed by individual end users.
• This level includes a number of user views or external schemas.
• This level is closest to the user.
• External view describes the segment of the database that is required for a particular user group
and hides the rest of the database from that user group.
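
In SQL, an external view of this kind is commonly defined as a view over the conceptual-level tables. The sketch below uses Python's built-in sqlite3 module; the employee table, the two views directory_view and payroll_view, and the sample rows are assumptions made for illustration. SQLite has no user accounts, so only the view definitions are shown; in a multi-user DBMS each user group would additionally be granted access to its own view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: one table describing the whole entity.
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT,
    dept   TEXT,
    salary INTEGER
);
INSERT INTO employee VALUES (1, 'ASHA', 'Sales', 40000), (2, 'RAVI', 'Accounts', 45000);

-- External level: each user group sees only its own segment.
CREATE VIEW directory_view AS SELECT emp_id, name, dept FROM employee;
CREATE VIEW payroll_view   AS SELECT emp_id, salary FROM employee;
""")

# cursor.description lists the columns each view exposes.
directory_cols = [d[0] for d in conn.execute("SELECT * FROM directory_view").description]
payroll_cols = [d[0] for d in conn.execute("SELECT * FROM payroll_view").description]
print(directory_cols, payroll_cols)
```

The directory view hides salaries and the payroll view hides names and departments, while both are answered from the same stored table.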
PHYSICAL STRUCTURE of DBMS or OVERALL STRUCTURE of DBMS
Components of DBMS are broadly classified as follows :
1. Query Processor :
(a) DML Compiler
(b) Embedded DML pre-compiler
(c) DDL Interpreter
(d) Query Evaluation Engine
2. Storage Manager :
(a) Authorization and Integrity Manager
(b) Transaction Manager
(c) File Manager
(d) Buffer Manager
3. Data Structure :
(a) Data Files
(b) Data Dictionary
(c) Indices
(d) Statistical Data
1. Query Processor Components :
• DML Compiler : It translates DML statements in a query language into low level instructions
that the query evaluation engine understands. It also attempts to transform the user's request into an
equivalent but more efficient form.
• Embedded DML Pre-compiler : It converts DML statements embedded in an application program
to normal procedure calls in the host language. The Pre-compiler must interact with the DML
compiler to generate the appropriate code.
• DDL Interpreter : It interprets the DDL statements and records them in a set of tables containing
metadata, i.e. the data dictionary.
• Query Evaluation Engine : It executes low-level instructions generated by the DML compiler.
2. Storage Manager Components :
They provide the interface between the low-level data stored in the database and application programs
and queries submitted to the system.
• Authorization and Integrity Manager : It tests for the satisfaction of integrity constraints and checks
the authority of users to access data.
• Transaction Manager : It ensures that the database remains in a consistent state despite system
failures and that concurrent transaction execution proceeds without conflicts.
• File Manager : It manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• Buffer Manager : It is responsible for fetching data from disk storage into main memory and
deciding what data to cache in memory.
3. Data Structures :
Following data structures are required as a part of the physical system implementation.
• Data Files : They store the database itself.
• Data Dictionary : It stores metadata (data about data) describing the structure of the database.
• Indices : They provide fast access to data items that hold particular values.
• Statistical Data : It stores statistical information about the data in the database. This information is
used by the query processor to select efficient ways to execute a query.
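
Two of these structures can be inspected directly in SQLite, used here through Python's built-in sqlite3 module as one concrete example of a DBMS. The sketch reads SQLite's catalog table sqlite_master, which plays the role of a data dictionary, and asks the query processor whether it would use an index. The student table and the index name idx_student_name are assumptions chosen for the example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student (name)")

# Data dictionary: SQLite records every table and index in sqlite_master.
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()

# Index: ask the query processor how it would run a lookup by name.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM student WHERE name = ?", ("RAM",)).fetchall()
plan_text = " ".join(str(col) for row in plan_rows for col in row)
uses_index = "idx_student_name" in plan_text

print(catalog)
print(plan_text)
```

The plan mentions the index instead of a full scan of the table, which is the fast access path that indices are meant to provide.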

Database Administrator (DBA) Functions & Role


Installation and Configuration of database:
The DBA is responsible for installing the database software. The DBA configures the software and
upgrades it when needed. There are many database products in the industry, such as Oracle, Microsoft
SQL Server and MySQL, so the DBA decides how the installation and configuration of the chosen
software will take place.

Deciding the hardware device


Depending upon the cost, performance and efficiency of the hardware, it is the DBA who has the duty of
deciding which hardware device will suit the company's requirements. Hardware is the interface
between end users and the database, so it needs to be of the best quality.

Managing Data Integrity


Data integrity should be managed carefully because it keeps the data accurate and protects it from
unauthorized changes. The DBA manages the relationships between the data to maintain data consistency.

Decides Data Recovery and Back up method


If a company has a big database, the database may fail at any time. The DBA is therefore required
to take a backup of the entire database at regular intervals. The DBA has to decide how much data
should be backed up and how frequently the backup should be taken. The recovery of the database is
also done by the DBA if the database has been lost.
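
As a small illustration of taking a backup and reading data back from it, the sketch below uses the backup method of Python's built-in sqlite3 connections (available from Python 3.7). Both databases are in-memory here so the example is self-contained; in practice the backup target would be a file on separate storage, and the table and row are assumed sample data.

```python
import sqlite3

# The "live" database with some committed data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO student VALUES (1, 'RAM')")
live.commit()

# Take a full backup into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate the loss of data in the live database.
live.execute("DELETE FROM student")
live.commit()

remaining = live.execute("SELECT COUNT(*) FROM student").fetchone()[0]
recovered = backup.execute("SELECT roll_no, name FROM student").fetchall()
print(remaining, recovered)
```

Anything committed after the last backup would not be in the copy, which is why the frequency of backups is a decision the DBA has to make.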

Tuning Database Performance


Database performance plays an important role in any business. If users are not able to fetch data
quickly, the company may lose business. By tuning and modifying SQL commands, a DBA can
improve the performance of the database.

Capacity Issues
All databases have limits on how much data they can store, and the physical memory also has
limitations. The DBA has to decide the limit and capacity of the database and handle all the issues related to it.

Database design
The logical design of the database is prepared by the DBA. The DBA is also responsible for physical
design, external model design, and integrity control.

Database accessibility
The DBA writes subschemas to decide the accessibility of the database. The DBA decides who the users
of the database are and which data is to be used by which user. No user has the power to access the
entire database without the permission of the DBA.

Decides validation checks on data


The DBA has to decide which data should be used and what kind of data is accurate for the company. The
DBA therefore puts validation checks on data to make it more accurate and consistent.

Monitoring performance
Even if the database is working properly, it doesn’t mean that there is no task for the DBA. The DBA
still has to monitor the performance of the database, including CPU and memory usage.

Decides content of the database


A database system holds many kinds of content. The DBA decides the fields, the types of the fields,
and the range of values of the content in the database system. One can say that the DBA decides the
structure of the database files.

Provides help and support to user


If any user needs help at any time, it is the duty of the DBA to provide it. The DBA gives complete
support to users who are new to the database.

Database implementation
A database has to be implemented before anyone can start using it, so the DBA implements the database
system. The DBA also has to supervise the loading of the database at the time of its implementation.

Improve query processing performance


Queries made by the users should be executed quickly. As discussed above, users need fast
retrieval of answers, so the DBA works on improving query processing performance.
Types of Database systems:
Database systems evolved through the following stages:
1. File Management System
2. Hierarchical database System
3. Network Database System
4. Relational Database System

File Management System:


The file management system, also called FMS in short, is one in which all data is stored in a
single large file. The main disadvantage of this system is that searching for a record or data takes a long time.
This led to the introduction of the concept of indexing in this system. Even then, the FMS
had a lot of drawbacks; to name a few, updates or modifications to the data could not be handled
easily, and sorting the records took a long time. All these drawbacks led to the introduction of the
Hierarchical Database System.
Hierarchical Database System:
The drawback of the FMS, in which accessing and sorting records took a long time,
was removed in this system by the introduction of a parent-child relationship between records in the database. The
origin of the data is called the root, from which several branches hold data at different levels, and the
last level is called the leaf. The main drawback was that if any modification or addition was made
to the structure, the whole structure needed alteration, which made the task tedious. To
overcome this, the next system, called the Network Database System, was introduced.

Network Database System:


In this system the concept of many-to-many relationships was introduced. It used
the same technology of pointers to define relationships, with the difference that
data items were grouped into sets.

Relational Database System:


In order to overcome the drawbacks of the previous systems, the Relational Database
System was introduced, in which data is organized as tables and each record forms a row with many
fields or attributes in it. Relationships between tables are also formed in this system.

Data Model
Data models show how the data is connected and stored in the system. They show the relationships
between data. A model is basically a conceptualization of entities and their attributes. Originally there
were three main data models in DBMS: network, hierarchical, and relational. These
days there are many more. The different types of data models are:
1. Flat data model
2. Entity relationship model
3. Relational model
4. Record based model
5. Network model
6. Hierarchical model
7. Object oriented data model
8. Context data model

Flat Data Model

The flat data model is the earliest model; in it all the data is kept in the
same plane. Since it was used in the early days, this model was not very scientific.

Flat Data Model

Entity Relationship Data Model

The entity relationship model is based on the notion of real world entities and their relationships.
While formulating a real world scenario into the database model, an entity set is created. This
model depends on two vital things:

• Entity and their attributes


• Relationships among entities

Entity Relationship Model

An entity has real world properties called attributes, and each attribute is defined by a set of
permitted values called its domain. For example, in a university database, a student is an entity,
and name, age and sex are its attributes. The relationships among entities define the logical
associations between entities.

Relational Data Model

The relational model is the most popular and the most extensively used model. In this model the
data is stored in tables, and each table is called a relation. Relations can be normalized,
and the values in a normalized relation are atomic values. Each row in a relation is unique
and is called a tuple; each column contains values from the same domain and is called an
attribute.

Network Data Model

The network model has entities that are organized as a graph, and some entities in
the graph can be accessed through several paths.

Network Model

Hierarchical Data Model

The hierarchical model has one parent entity with several child entities, but at the top there must be
only one entity, called the root. For example, department is the parent entity, called the root, and it has several
child entities such as students, professors and many more.

Hierarchical model
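
A hierarchy of this kind can be sketched as a nested Python dictionary in which every entity has exactly one parent. The names in the tree (Department, Students, Professors and the individual entries) are hypothetical sample data, and path_to is a helper written only for this illustration.

```python
# Each key is an entity; its value holds that entity's children.
tree = {
    "Department": {                      # the single root
        "Students": {"RAM": {}, "RAMESH": {}},
        "Professors": {"Dr. RAO": {}},
    }
}

def path_to(node, target, path=()):
    """Return the path from the root to target, or None if it is absent."""
    for name, children in node.items():
        here = path + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found:
            return found
    return None

route = path_to(tree, "RAMESH")
print(route)
```

Every record is reached by walking down from the root along one path, which is what distinguishes this model from the network model, where several paths may lead to the same entity.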

Object oriented Data Model

The object oriented data model is one of the more advanced data models, and it can hold audio, video and
graphic files. Objects consist of a piece of data and the methods, which are the DBMS instructions that operate on it.

Object Oriented Data Model


Context Data Model

The context data model is a flexible model because it is a collection of several data models, such as
the object oriented data model, the network model and the semi structured model.
Because of this versatility, different types of work can be done with it.

Context Model

Data models thus support different types of users, who differ in the way they interact with the database.
Data models in DBMS have brought a revolutionary change in industry by making
relevant data easier to handle. They help to create and use databases; as we
have seen, there are different types of data models, and we can select one depending on the kind of
structure needed.

Relational Model
The Relational Model was proposed by E.F. Codd to model data in the form of relations or tables.
After designing the conceptual model of the database using an ER diagram, we need to convert the
conceptual model into the relational model, which can be implemented using any RDBMS
such as Oracle or MySQL. So we will see what the Relational Model is.
What is Relational Model?
Relational Model represents how data is stored in Relational Databases. A relational database stores
data in the form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO,
NAME, ADDRESS, PHONE and AGE shown in Table 1.
Table 1: STUDENT

ROLL_NO  NAME    ADDRESS  PHONE       AGE
-----------------------------------------
1        RAM     DELHI    9455123451  18
2        RAMESH  GURGAON  9652431543  18
3        SUJIT   ROHTAK   9156253131  20
4        SURESH  DELHI                18

IMPORTANT TERMINOLOGIES


1. Attribute: Attributes are the properties that define a relation, e.g. ROLL_NO, NAME.
2. Relation Schema: A relation schema gives the name of the relation together with its attributes,
e.g. STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE) is the relation schema for
STUDENT. A collection of relation schemas is called a relational schema.
3. Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of
which is shown as:

1 RAM DELHI 9455123451 18

4. Relation Instance: The set of tuples of a relation at a particular instant of time is called the
relation instance. Table 1 shows the relation instance of STUDENT at a particular time. It can
change whenever there is an insertion, deletion or update in the database.
5. Degree: The number of attributes in the relation is known as the degree of the relation. The
STUDENT relation defined above has degree 5.
6. Cardinality: The number of tuples in a relation is known as its cardinality. The STUDENT
relation defined above has cardinality 4.
7. Column: A column represents the set of values for a particular attribute. The column
ROLL_NO extracted from the relation STUDENT is:

ROLL_NO
1
2
3
4

8. NULL Values: A value that is not known or is unavailable is called a NULL value. It is
represented by a blank space, e.g. PHONE of the STUDENT having ROLL_NO 4 is NULL.
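
As a rough illustration of the terms above, the following Python sketch stores the STUDENT instance of Table 1 as a set of tuples and computes its degree, its cardinality and one column. The helper names (degree, cardinality, column) are made up for this sketch and are not part of any DBMS; None is assumed here as a stand-in for NULL.

```python
# STUDENT relation from Table 1; None stands in for the NULL phone number.
ATTRIBUTES = ("ROLL_NO", "NAME", "ADDRESS", "PHONE", "AGE")
STUDENT = {
    (1, "RAM", "DELHI", "9455123451", 18),
    (2, "RAMESH", "GURGAON", "9652431543", 18),
    (3, "SUJIT", "ROHTAK", "9156253131", 20),
    (4, "SURESH", "DELHI", None, 18),
}


def degree(attributes):
    """Degree = number of attributes in the relation schema."""
    return len(attributes)


def cardinality(relation):
    """Cardinality = number of tuples in the relation instance."""
    return len(relation)


def column(relation, attributes, name):
    """Values of one attribute, listed in ROLL_NO order for readability."""
    i = attributes.index(name)
    return [row[i] for row in sorted(relation, key=lambda r: r[0])]


print(degree(ATTRIBUTES), cardinality(STUDENT))
print(column(STUDENT, ATTRIBUTES, "ROLL_NO"))
```

Because the instance is a set, inserting or deleting a tuple changes the cardinality but never the degree.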

Keys
An important constraint on an entity is the key. The key is an attribute or a group of attributes
whose values can be used to uniquely identify an individual entity in an entity set.

Types of Keys
There are several types of keys. These are described below.
Candidate key
A candidate key is a simple or composite key that is unique and minimal. It is unique because
no two rows in a table may have the same value at any time. It is minimal because every column is
necessary in order to attain uniqueness.


From our COMPANY database example, if the entity is Employee(EID, First Name, Last Name,
SIN, Address, Phone, BirthDate, Salary, DepartmentID), possible candidate keys are:

• EID, or SIN (each is unique on its own)
• First Name and Last Name – assuming there is no one else in the company with the same
name
• Last Name and DepartmentID – assuming two people with the same last name don’t work in
the same department
Composite key
A composite key is composed of two or more attributes, but it must be minimal. Using the
example from the candidate key section, possible composite keys are:
• First Name and Last Name – assuming there is no one else in the company with the same
name
• Last Name and Department ID – assuming two people with the same last name don’t work in
the same department
Primary key
The primary key is a candidate key that is selected by the database designer to be used as an
identifying mechanism for the whole entity set. It must uniquely identify tuples in a table and must
not be null. The primary key is indicated in the ER model by underlining the attribute.
In the following example, EID is the primary key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Secondary key
A secondary key is an attribute used strictly for retrieval purposes (can be composite), for
example: Phone and Last Name.
Alternate key
Alternate keys are all candidate keys not chosen as the primary key.
Foreign key
A foreign key (FK) is an attribute in a table that references the primary key in another table;
its value must either match an existing primary key value or be null. The foreign key and the
primary key it references must be of the same data type. In the COMPANY database example
below, DepartmentID is the foreign key:

Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
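
The "unique and minimal" test for a candidate key can be sketched in Python as below. The three employee rows are invented for illustration (the source gives no Employee data), and the function names are hypothetical. Note that a table instance can only show that a set of columns is not a key; whether it really is a key is a design decision, as the assumptions in the bullets above indicate.

```python
from itertools import combinations

# Hypothetical sample rows; not taken from the COMPANY database.
ROWS = [
    {"EID": 1, "FirstName": "Ann", "LastName": "Lee", "DepartmentID": 10},
    {"EID": 2, "FirstName": "Bob", "LastName": "Lee", "DepartmentID": 20},
    {"EID": 3, "FirstName": "Ann", "LastName": "Roy", "DepartmentID": 10},
]


def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """Unique, and minimal: no proper subset of attrs is already unique."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_candidate_key(ROWS, ("EID",)))
print(is_candidate_key(ROWS, ("FirstName", "LastName")))
print(is_candidate_key(ROWS, ("EID", "LastName")))  # unique but not minimal
```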
Nulls
A null is a special symbol, independent of data type, which means either unknown or inapplicable.
It does not mean zero or blank. Features of null include:
• No data entry
• Not permitted in the primary key
• Should be avoided in other attributes
• Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A “not applicable” condition
• Can create problems when functions such as COUNT, AVERAGE and SUM are used
• Can create logical problems when relational tables are linked
NOTE: The result of a comparison operation is null when either argument is null. The result of an
arithmetic operation is null when either argument is null (except functions that ignore nulls).
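
The behaviour described in the note can be observed with a small experiment. The sketch below assumes SQLite (through Python's built-in sqlite3 module) as the engine, which the source does not name; other systems follow the same rules but may display null differently. The table is a cut-down version of STUDENT with only the columns needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INT PRIMARY KEY, PHONE TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [(1, "9455123451"), (2, "9652431543"), (3, "9156253131"), (4, None)],
)

# Comparison and arithmetic with NULL both yield NULL (None in Python).
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_sum = conn.execute("SELECT 1 + NULL").fetchone()[0]

# COUNT(*) counts rows; COUNT(PHONE) skips rows whose PHONE is NULL.
total, with_phone = conn.execute(
    "SELECT COUNT(*), COUNT(PHONE) FROM STUDENT"
).fetchone()

# "= NULL" never matches; "IS NULL" is the test that finds missing values.
eq_rows = conn.execute("SELECT ROLL_NO FROM STUDENT WHERE PHONE = NULL").fetchall()
is_null_rows = conn.execute(
    "SELECT ROLL_NO FROM STUDENT WHERE PHONE IS NULL"
).fetchall()

print(null_eq, null_sum, total, with_phone, eq_rows, is_null_rows)
```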

Constraints in Relational Model


While designing the relational model, we define conditions that must hold for the data in the
database; these are called constraints. Constraints are checked before any operation (insertion,
deletion or update) is performed on the database. If an operation would violate any constraint, it
fails.
Domain Constraints: These are attribute-level constraints. An attribute can only take values that
lie inside its domain. E.g. if the constraint AGE>0 is applied on the STUDENT relation, inserting a
negative value of AGE will fail.
Key Integrity: Every relation in the database should have at least one set of attributes that
identifies a tuple uniquely. Such a set of attributes is called a key. E.g. ROLL_NO in STUDENT is a
key: no two students can have the same roll number. So a key has two properties:
• It should be unique for all tuples.
• It can’t have NULL values.
Referential Integrity: When an attribute of a relation can only take values from an attribute of
the same relation or of another relation, it is called referential integrity. Let us suppose we have 2
relations:
STUDENT

ROLL_NO  NAME    ADDRESS  PHONE       AGE  BRANCH_CODE
------------------------------------------------------
1        RAM     DELHI    9455123451  18   CS
2        RAMESH  GURGAON  9652431543  18   CS
3        SUJIT   ROHTAK   9156253131  20   ECE
4        SURESH  DELHI                18   IT

BRANCH

BRANCH_CODE  BRANCH_NAME
----------------------------------------------------------
CS           COMPUTER SCIENCE
IT           INFORMATION TECHNOLOGY
ECE          ELECTRONICS AND COMMUNICATION ENGINEERING
CV           CIVIL ENGINEERING

BRANCH_CODE of STUDENT can only take values that are present in BRANCH_CODE of BRANCH;
this is called a referential integrity constraint. The relation that refers to another relation is
called the REFERENCING RELATION (STUDENT in this case), and the relation to which other
relations refer is called the REFERENCED RELATION (BRANCH in this case).
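
A minimal sketch of this constraint being enforced, assuming SQLite via Python's sqlite3 module (the source does not prescribe an engine). SQLite only checks foreign keys after PRAGMA foreign_keys = ON is issued; the branch code 'ME' and the student in the failing insert are invented for the demonstration, and the tables are cut down to the relevant columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute(
    "CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT)"
)
conn.execute(
    """CREATE TABLE STUDENT (
           ROLL_NO INT PRIMARY KEY,
           NAME TEXT,
           BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE)
       )"""
)
conn.executemany(
    "INSERT INTO BRANCH VALUES (?, ?)",
    [("CS", "COMPUTER SCIENCE"), ("IT", "INFORMATION TECHNOLOGY")],
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'RAM', 'CS')")  # 'CS' exists: accepted

# Referencing relation: a branch code missing from BRANCH is rejected.
try:
    conn.execute("INSERT INTO STUDENT VALUES (5, 'NEW', 'ME')")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Referenced relation: a branch that is still referred to cannot be deleted.
try:
    conn.execute("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CS'")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

print(insert_rejected, delete_rejected)
```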

Integrity Constraints
Constraints enforce limits to the data or type of data that can be inserted/updated/deleted
from a table. The whole purpose of constraints is to maintain the data integrity during an
update/delete/insert into a table. In this tutorial we will learn several types of constraints that can be
created in DBMS.

Types of constraints

• NOT NULL
• UNIQUE
• DEFAULT
• CHECK
• Key Constraints – PRIMARY KEY, FOREIGN KEY
• Domain constraints
• Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don’t provide
a value for a particular column while inserting a record into a table, it takes the NULL value by
default. By specifying the NOT NULL constraint, we can be sure that a particular column cannot have
NULL values.
Example:
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (235),
PRIMARY KEY (ROLL_NO)
);

UNIQUE:
The UNIQUE constraint forces a column or set of columns to have unique values. If a column has a
unique constraint, that column cannot contain duplicate values within the table.
Example:
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
DEFAULT:
The DEFAULT constraint provides a default value to a column when there is no value provided while
inserting a record into a table.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
CHECK:
This constraint is used for specifying range of values for a particular column of a table. When this
constraint is being set on a column, it ensures that the specified column must have the value falling in
the specified range.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL CHECK(ROLL_NO >1000) ,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)


);

In the above example we have set the check constraint on ROLL_NO column of STUDENT table.
Now, the ROLL_NO field must have the value greater than 1000.
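
To see these constraints reject bad rows, the sketch below runs the CHECK example table through SQLite using Python's sqlite3 module. The choice of engine is an assumption, and the exact error messages differ between database systems. The inserted names and ages are sample values, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE STUDENT(
           ROLL_NO INT NOT NULL CHECK(ROLL_NO > 1000),
           STU_NAME VARCHAR(35) NOT NULL,
           STU_AGE INT NOT NULL,
           EXAM_FEE INT DEFAULT 10000,
           STU_ADDRESS VARCHAR(35),
           PRIMARY KEY (ROLL_NO)
       )"""
)


def try_insert(sql):
    """Run one INSERT and report 'ok' or the constraint error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


cols = "(ROLL_NO, STU_NAME, STU_AGE)"
valid = try_insert(f"INSERT INTO STUDENT {cols} VALUES (1001, 'RAM', 18)")
bad_check = try_insert(f"INSERT INTO STUDENT {cols} VALUES (5, 'SUJIT', 20)")
bad_null = try_insert("INSERT INTO STUDENT (ROLL_NO, STU_AGE) VALUES (1002, 19)")
bad_dup = try_insert(f"INSERT INTO STUDENT {cols} VALUES (1001, 'RAMESH', 18)")

# EXAM_FEE was not supplied for roll number 1001, so the DEFAULT applies.
fee = conn.execute("SELECT EXAM_FEE FROM STUDENT WHERE ROLL_NO = 1001").fetchone()[0]

print(valid, bad_check, bad_null, bad_dup, fee, sep="\n")
```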

Key constraints:
PRIMARY KEY:
Primary key uniquely identifies each record in a table. It must have unique values and cannot contain
nulls. In the below example the ROLL_NO field is marked as primary key, that means the ROLL_NO
field cannot have duplicate and null values.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
FOREIGN KEY:

Foreign keys are the columns of a table that point to the primary key of another table. They act as a
cross-reference between tables.
Domain constraints:

Each table has a certain set of columns, and each column accepts only one type of data, determined
by its data type. The column does not accept values of any other data type.

Update Operations Dealing with Constraint Violations:


The operations of the relational model can be categorized into retrievals and updates.


Relational Algebra or Relational Operations

The relational algebra is a theoretical, procedural query language. Its operations take one or
more relation instances as input and produce another relation as output, without altering the
original relation(s). Because both the operands and the results are relations, the output of one
operation can be used as the input to another, which allows expressions to be nested in the
relational algebra just as arithmetic operations are nested. This property is called closure:
relations are closed under the algebra, just as numbers are closed under arithmetic operations.
The relational algebra is a relation-at-a-time (or set) language: all tuples are manipulated in
one statement, without the use of a loop. There are several variations of syntax for relational
algebra commands; these notes use a common symbolic notation and present it informally.


The primary operations of relational algebra are as follows:


• Select
• Project
• Union
• Set difference
• Cartesian product
• Rename

➢ Projection (π)
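Projection is used to extract the required columns from a relation.
Example :
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4

π B,C (R)
B C
-----
2 4
2 3
3 4

Note: By default, projection removes duplicate rows, so the tuple (2, 3), which occurs twice in
columns B and C of R, appears only once in the result.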


➢ Selection (σ)
Selection is used to select the required tuples of a relation.
For the above relation,
σ (C>3) (R)
will select the tuples which have C greater than 3.
Note: The selection operator chooses rows only; it does not choose columns, so its result is a
relation with all the attributes of R. The projection operator is applied to the selected tuples
to pick which columns are shown.
Projecting all three columns of the selected tuples,
π A,B,C (σ (C>3) (R)) will show the following tuples.

A B C
-------
1 2 4
4 3 4
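
The two operators can be imitated in a few lines of Python, treating a relation as a set of tuples. This is only a sketch of the idea: the function names project and select are made up here, and a real DBMS evaluates these operations very differently.

```python
# Relation R(A, B, C) from the example above, stored as a set of tuples.
ATTRS = ("A", "B", "C")
R = {(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)}


def project(relation, attrs, keep):
    """π: keep only the named columns; the set removes duplicate rows."""
    idx = [attrs.index(name) for name in keep]
    return {tuple(row[i] for i in idx) for row in relation}


def select(relation, attrs, predicate):
    """σ: keep only the rows for which the predicate is true."""
    return {row for row in relation if predicate(dict(zip(attrs, row)))}


bc = project(R, ATTRS, ("B", "C"))
c_over_3 = select(R, ATTRS, lambda t: t["C"] > 3)

print(sorted(bc))
print(sorted(c_over_3))
```

Both functions return a relation again, which is the closure property: the output of select can be fed straight into project.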

➢ Union Operation (∪)

The union R ∪ S of two relations R and S is a relation that contains all the tuples that are in R,
in S, or in both, with duplicate tuples eliminated. R and S must be union-compatible.

For a union operation to be applied, the following rules must hold −

• R and S must have the same number of attributes.
• Attribute domains must be compatible.

Duplicate tuples are eliminated automatically.

➢ Set difference (−)

For R − S The Set difference operation defines a relation consisting of the tuples that are in relation R,
but not in S. R and S must be union-compatible.
Example:
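
Since the original example is not available here, the following Python sketch uses two small invented relations R and S over the same attributes (Id, Course) to show union and set difference. Python's set operators happen to match the relational definitions, including the removal of duplicates.

```python
# Two hypothetical union-compatible relations over (Id, Course).
R = {(1, "DS"), (2, "DBMS"), (3, "OS")}
S = {(2, "DBMS"), (4, "CN")}

union = R | S        # tuples in R, in S, or in both; (2, "DBMS") appears once
r_minus_s = R - S    # tuples in R that are not in S
s_minus_r = S - R    # set difference is not symmetric

print(sorted(union))
print(sorted(r_minus_s))
print(sorted(s_minus_r))
```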

➢ Rename(ρ)
This is a unary operator which changes the attribute names of a relation without changing any
values. Renaming removes the limitations associated with set operators; for example, two relations
whose attributes are named differently can be given matching names before a set operation is
applied.

Notation: ρ Old Name → New Name ( r )

where r is the table name.

For example, ρ Father → Parent (Paternity)

➢ Cartesian or Cross product ( ×)

The cross product A X B of two relations A and B contains all the attributes of A followed by all
the attributes of B. Each record of A is paired with every record of B. An example is given below.

A
(Name Age Sex)
------------------
Ram  14 M
Sona 15 F
Kim  20 M

B
(Id Course)
-------------
1 DS
2 DBMS

A X B
Name Age Sex Id Course
---------------------------------
Ram  14 M 1 DS
Ram  14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim  20 M 1 DS
Kim  20 M 2 DBMS

Note: if A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘n*m’ tuples.
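
The n*m rule is easy to check with a short Python sketch. itertools.product from the standard library pairs every tuple of A with every tuple of B; lists are used instead of sets only to keep the same row order as the table above.

```python
from itertools import product

A = [("Ram", 14, "M"), ("Sona", 15, "F"), ("Kim", 20, "M")]  # (Name, Age, Sex)
B = [(1, "DS"), (2, "DBMS")]                                  # (Id, Course)

# Each result row is a tuple of A followed by a tuple of B.
AxB = [a + b for a, b in product(A, B)]

for row in AxB:
    print(row)
print(len(AxB) == len(A) * len(B))
```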

➢ Division Operation:


Table 1: STUDENT_SPORTS

ROLL_NO  SPORTS
------------------
1        Badminton
2        Cricket
2        Badminton
4        Badminton

Table 2: ALL_SPORTS

SPORTS
---------
Badminton
Cricket

The division operator A÷B can be applied if and only if:

• The attributes of B are a proper subset of the attributes of A.

When this holds:
• The relation returned by the division operator has attributes = (All attributes of A – All
attributes of B).
• The relation returned contains those tuples of A (restricted to the remaining attributes) which
are associated with every tuple of B.
Consider the relations STUDENT_SPORTS and ALL_SPORTS given in Table 1 and Table 2 above.
To apply the division operator as
STUDENT_SPORTS ÷ ALL_SPORTS

• The operation is valid, as the attributes of ALL_SPORTS are a proper subset of the attributes of
STUDENT_SPORTS.
• The attributes of the resulting relation are {ROLL_NO,SPORTS}-{SPORTS}=ROLL_NO.
• The tuples of the resulting relation are those ROLL_NO values which are associated with all of
B’s tuples {Badminton, Cricket}. ROLL_NO 1 and 4 are associated with Badminton only. ROLL_NO
2 is associated with all tuples of B. So the resulting relation will be:

ROLL_NO
2
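
The same division can be sketched in Python. The function name divide is hypothetical, and the sketch handles only the two-column case used in this example: it keeps a roll number when it is paired with every sport in the divisor.

```python
STUDENT_SPORTS = {
    (1, "Badminton"),
    (2, "Cricket"),
    (2, "Badminton"),
    (4, "Badminton"),
}
ALL_SPORTS = {"Badminton", "Cricket"}


def divide(pairs, divisor):
    """Return the first-column values paired with every value in divisor."""
    candidates = {roll for roll, _ in pairs}
    return {
        roll
        for roll in candidates
        if all((roll, sport) in pairs for sport in divisor)
    }


result = divide(STUDENT_SPORTS, ALL_SPORTS)
print(result)
```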

❖ The Natural Join Operation (⋈)

➢ Queries that combine related tuples from two relations are simplified by the natural join
operation. It combines the following three operations into one. The natural join operation -
▪ Forms a Cartesian product of its argument relations,
▪ Performs a selection that checks equality on the common attributes to remove
unnecessary tuples, and
▪ Removes duplicate attributes.
➢ The natural join is denoted by the symbol ⋈ (JOIN).
➢ The notation for this operation is
▪ Relation1 ⋈ Relation2
Example : Combine only consistent information from the Account and Branch relations.

As shown in figure 3.11, the natural join operation yields only consistent and useful information. It
removes unnecessary tuples as well as duplicate attributes. This makes the retrieval of information
from multiple relations very easy and convenient.
❖ Outer Join Operation :
➢ An extension of the join operation that avoids loss of information.
➢ Computes the join and then adds to the result the tuples from one relation that do not match
any tuple in the other relation.
➢ Uses null values:
▪ null signifies that the value is unknown or does not exist
▪ All comparisons involving null are (roughly speaking) false by definition.
• We shall study the precise meaning of comparisons with nulls later
➢ Table name: Client

NAME    ID
Rahul   10
Vishal  20

➢ Table name: Salesman

ID  CITY
30  Bombay
20  Madras
40  Bombay

➢ Join: Client ⋈ Salesman

NAME    ID  CITY
Vishal  20  Madras
➢ The outer join operation can be divided into three different forms :

▪ Left outer join (⟕)
▪ Right outer join (⟖)
▪ Full outer join (⟗)

▪ Left outer join (⟕)
• The left outer join retains all the tuples of the left relation even though there is no
matching tuple in the right relation.
• For such tuples, the attributes of the right relation are padded with null in the resultant
relation.
• Left outer join: Client ⟕ Salesman

NAME    ID  CITY
Rahul   10  Null
Vishal  20  Madras

▪ Right outer join (⟖)
• The right outer join retains all the tuples of the right relation even though there is no
matching tuple in the left relation.
• For such tuples, the attributes of the left relation are padded with null in the resultant
relation.
• Right outer join: Client ⟖ Salesman

NAME    ID  CITY
Null    30  Bombay
Vishal  20  Madras
Null    40  Bombay

▪ Full outer join (⟗)
• The full outer join retains all the tuples of both relations. It also pads null
values wherever required.
• Full outer join: Client ⟗ Salesman

NAME    ID  CITY
Rahul   10  Null
Null    30  Bombay
Vishal  20  Madras
Null    40  Bombay
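
The four result tables for Client and Salesman can be reproduced with the Python sketch below. The function names are invented for this illustration, None plays the role of Null, and the code is written for these two specific schemas (joined on ID) rather than as a general join.

```python
CLIENT = [("Rahul", 10), ("Vishal", 20)]                      # (NAME, ID)
SALESMAN = [(30, "Bombay"), (20, "Madras"), (40, "Bombay")]   # (ID, CITY)


def natural_join(client, salesman):
    """Pair rows whose ID values are equal; ID appears once in the result."""
    return {
        (name, cid, city)
        for name, cid in client
        for sid, city in salesman
        if cid == sid
    }


def left_outer_join(client, salesman):
    """Natural join plus unmatched client rows, padded with None for CITY."""
    matched_ids = {sid for sid, _ in salesman}
    padded = {(name, cid, None) for name, cid in client if cid not in matched_ids}
    return natural_join(client, salesman) | padded


def right_outer_join(client, salesman):
    """Natural join plus unmatched salesman rows, padded with None for NAME."""
    matched_ids = {cid for _, cid in client}
    padded = {(None, sid, city) for sid, city in salesman if sid not in matched_ids}
    return natural_join(client, salesman) | padded


def full_outer_join(client, salesman):
    """Union of the left and right outer joins."""
    return left_outer_join(client, salesman) | right_outer_join(client, salesman)


print(natural_join(CLIENT, SALESMAN))
print(len(full_outer_join(CLIENT, SALESMAN)))
```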

Entity – Relationship Modeling


The Entity – Relationship Model: It is a detailed logical representation of the data for an
organization or for a business area.
➢ The E – R Model is expressed in terms of entities, relationships, and attributes in the business
environment.
➢ An E – R Model is expressed as an entity – relationship diagram (E – R Diagram), which is a
graphical representation of the E – R Model.
E – R Model Notation:-

Figure. E – R notation symbols: Strong Entity, Weak Entity, Associative Entity, Relationship,
Identifying Relationship, Attribute, Multivalued Attribute, Derived Attribute.

The E – R Model Construct


The E – R Model is constructed from the following building blocks, which are listed below:
A. Entities
B. Attributes
C. Relationships

A. Entities: An entity is a person, place, object, event or concept in the user environment
about which the organization wishes to maintain data.

Example:
Person: Employee, Student, Patient.
Place: City, State, Country.
Object: Machine, Building, Automobile.
Event: Sale, Registration, Renewal.
Concept: Account, Course.

Entity Types & Entity Instance:


An entity type is a collection of entities that share common properties or characteristics. We use
capital letters for names of entity types.
➢ An entity instance is a single occurrence of an entity type.
➢ An entity type is described just once (using metadata in a database), while many instances of
that entity type may be represented by data stored in the database.
Example:
There is one EMPLOYEE entity type in most organizations, but there may be hundreds of instances of
this entity type stored in the database.
Strong Entity & Weak Entity Type
A strong entity type is one that exists independently of other entity types.
Example: STUDENT, EMPLOYEE, AUTOMOBILE & COURSE.
✓ Instances of a strong entity type always have a unique characteristic (identifier).
✓ A weak entity type is an entity type whose existence depends on some other entity type.
✓ The entity type on which the weak entity type depends is called the “identifying owner”.
✓ A weak entity type does not have its own identifier.
Example: EMPLOYEE is a strong entity type with identifier employee_id. DEPENDENT is a weak
entity type, as indicated by the double-lined rectangle.
✓ The relationship between a weak entity type and its owner is called an identifying relationship.
✓ “HAS” is the identifying relationship (indicated by the double-lined diamond symbol).
✓ The attribute dependent name serves as a partial identifier. Dependent name is a composite
attribute that can be broken down into component parts.

Figure. Example of a weak entity: EMPLOYEE, HAS, DEPENDENT.


Characteristic entities
Characteristic entities provide more information about another table. These entities have the
following characteristics:
• They represent multivalued attributes.
• They describe other entities.
• They typically have a one to many relationship.
• The foreign key is used to further identify the characterized table.
• Options for primary key are as follows:
1. Use a composite of foreign key plus a qualifying column
2. Create a new simple primary key.
In the COMPANY database, these might include:
▪ Employee (EID, Name, Address, Age, Salary) – EID is the simple primary key.
▪ EmployeePhone (EID, Phone) – EID is part of a composite primary key. Here, EID is also a
foreign key.

Attributes
An attribute is a descriptive property or characteristics of an entity. The attributes of the entity Customer are
CustNo, Name, Street, City, PostCode, TelNo and Balance.

Types of Attributes
There are a few types of attributes you need to be familiar with. Some of these are to be left as
is, but some need to be adjusted to facilitate representation in the relational model. This first section
will discuss the types of attributes. Later on we will discuss fixing the attributes to fit correctly into
the relational model.
Simple attributes
A simple attribute is an attribute that cannot be broken down into smaller components.
Ex: The attributes of AUTOMOBILE are simple: Vehicle_id, Color, Weight.
Composite attributes
A composite attribute is an attribute that can be broken down into component parts.
Ex: ADDRESS has components such as Street, Number, SubStreet, State, Postcode.

Figure. An example of composite attributes.


Multivalued attributes
Multivalued attributes are attributes that may take more than one value for a given entity
instance.
✓ We indicate a multivalued attribute with an ellipse with double lines.
Ex: A multivalued attribute from the COMPANY database, as seen in the figure, is the
degrees of an employee: BSc, MIT, PhD.

Figure. Example of a multivalued attribute.


Derived attributes
Derived attributes are attributes that contain values calculated from other attributes. An
example of this can be seen in Figure 8.5. Age can be derived from the attribute Birthdate. In this
situation, Birthdate is called a stored attribute, which is physically saved to the database.

Figure. Example of a derived attribute.

RELATIONSHIPS
A relationship is an association among the instances of one or more entity types that is of interest
to the organization.

Relationship Cardinality or Relationship Types


It is a meaningful association between entity types. A relationship is depicted by a diamond symbol
containing the name of the relationship.
Ex:
Figure. EMPLOYEE (Emp_id, EMP_Name, other attributes) is connected to COURSE (Course_id,
Course_Title) by the relationship Complete.

There are three main types of relationship that can exist between entities:
i. one-to-one relationship
ii. one-to-many relationship
iii. many-to-many relationship
i. one-to-one relationship: A one to one (1:1) relationship is the relationship of one entity to
only one other entity, and vice versa. It should be rare in any relational database design. In
fact, it could indicate that two entities actually belong in the same table.

Explanation:
An Order generates only one Invoice, and an Invoice is generated by only one Order.
ii. one-to-many relationship: A one to many (1:M) relationship should be the norm in any
relational database design and is found in all relational database environments. For example,
one customer makes many orders.


Explanation:
Each Customer can make one or more orders and an Order is from one customer.
iii. many-to-many relationship: For a many to many (M:N) relationship, consider the following points:
• It cannot be implemented as such in the relational model.
• It is implemented by breaking it up into two or more 1:M relationships.
• This involves the implementation of a composite entity (a linking table).
• The composite entity table must contain at least the primary keys of the original tables.
• The linking table contains multiple occurrences of the foreign key values.
• Additional attributes may be assigned as needed.

Explanation:
An Order has one or more Products and a Product can be in one or more Orders.
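
As a sketch of how the Order/Product relationship might be broken into two 1:M relationships, the example below builds a linking table in SQLite through Python's sqlite3 module. All table names, column names and rows (ORDERS, PRODUCT, ORDER_ITEM, QUANTITY, 'Pen', 'Book') are invented for illustration; the source only states the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ORDERS  (ORDER_ID INT PRIMARY KEY);
    CREATE TABLE PRODUCT (PRODUCT_ID INT PRIMARY KEY, NAME TEXT);

    -- Linking (composite) table: holds the primary keys of both originals,
    -- plus an additional attribute of the relationship itself.
    CREATE TABLE ORDER_ITEM (
        ORDER_ID   INT REFERENCES ORDERS(ORDER_ID),
        PRODUCT_ID INT REFERENCES PRODUCT(PRODUCT_ID),
        QUANTITY   INT,
        PRIMARY KEY (ORDER_ID, PRODUCT_ID)
    );

    INSERT INTO ORDERS VALUES (1), (2);
    INSERT INTO PRODUCT VALUES (10, 'Pen'), (20, 'Book');
    INSERT INTO ORDER_ITEM VALUES (1, 10, 3), (1, 20, 1), (2, 10, 5);
    """
)

# One order, many products ...
products_in_order_1 = [
    row[0]
    for row in conn.execute(
        "SELECT PRODUCT_ID FROM ORDER_ITEM WHERE ORDER_ID = 1 ORDER BY PRODUCT_ID"
    )
]
# ... and one product, many orders.
orders_with_product_10 = [
    row[0]
    for row in conn.execute(
        "SELECT ORDER_ID FROM ORDER_ITEM WHERE PRODUCT_ID = 10 ORDER BY ORDER_ID"
    )
]

print(products_in_order_1, orders_with_product_10)
```

Each foreign key value (for example ORDER_ID 1) occurs several times in the linking table, which is what the sixth bullet above describes.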

Degree of Relationship Type


The number of entity types participating in a relationship is known as the degree of the
relationship type. There are three degrees of relationship, listed below.
a. Unary (Degree – 1)
b. Binary (Degree – 2)
c. Ternary (Degree – 3)
a. Unary (Degree – 1): It is a relationship between the instances of a single entity type. Unary
relationships are also called recursive relationships.
b. Binary (Degree – 2): It is a relationship between the instances of two entity types and is the
most common type of relationship in data modeling. This relationship has three types.
i. One to one:

Figure. EMPLOYEE, Is – assigned to, PARKING PLACE

➢ It indicates that an employee is assigned one parking place, and each parking place is assigned
to one employee.
ii. One to Many:

Figure. PRODUCT LINE, Is – assigned, PRODUCT

➢ It indicates that a product line may contain several products, and each product belongs to only
one product line.
iii. Many to Many:

Figure. STUDENT, Register for, COURSE

➢ It indicates that a student may register for more than one course, and that each course may
have many student registrations.
c. Ternary (Degree – 3): A ternary relationship is a simultaneous relationship among the instances
of three entity types.


Developing an E – R diagram:
Entity Relationship diagrams are a major data modeling tool. They help organize the data in our
project into entities and define the relationships between those entities.
Components of ERD: There are four components. They are:
i. Entity
ii. Relationship
iii. Cardinality
iv. Attribute
i. Entity: A data entity is anything real or abstract about which we want to store data.
Ex: EMPLOYEE: Employee_id, Employee_name, Address
PAYMENT: Payment_id, Payment_Type.
BOOKS: Book_id, Book_Type.
ii. Relationship: A relationship is a natural association that exists between one or more
entities.
Ex: Employee processes Payment.
iii. Cardinality: Defines the number of occurrences of one entity for a single occurrence of the
related entity.
Ex: An Employee may process many Payments but might not process any Payments,
depending on the nature of his / her job.
iv. Attribute: A data attribute is a characteristic common to all or most instances of a
particular entity.
Ex: Name and Employee_No are attributes of the entity “EMPLOYEE”.


A Simple Example for E – R Diagram

Here we are going to design an Entity Relationship (ER) model for a college database.
Say we have the following statements.
1. A college contains many departments
2. Each department can offer any number of courses
3. Many instructors can work in a department
4. An instructor can work only in one department
5. For each department there is a Head
6. An instructor can be head of only one department
7. Each instructor can teach any number of courses
8. A course can be taught by only one instructor
9. A student can enroll for any number of courses
10. Each course can have any number of students
Good to go. Let's start our design. (Remember our previous topic and the notations we have
used for entities, attributes, relations etc.)
Step 1 : Identify the Entities
What are the entities here?
From the statements given, the entities are
1. Department
2. Course
3. Instructor
4. Student
Step 2 : Identify the relationships
1. One department offers many courses, but one particular course can be offered by only
one department. Hence the cardinality between department and course is One to Many
(1:N)
2. One department has multiple instructors, but an instructor belongs to only one
department. Hence the cardinality between department and instructor is One to Many
(1:N)
3. One department has only one head, and one head can be the head of only one
department. Hence the cardinality is One to One (1:1)
4. One course can be enrolled in by many students, and one student can enroll for many
courses. Hence the cardinality between course and student is Many to Many (M:N)
5. One course is taught by only one instructor, but one instructor teaches many courses.
Hence the cardinality between course and instructor is Many to One (N:1)
Step 3: Identify the key attributes
• "Department_Name" can identify a department uniquely. Hence Department_Name is
the key attribute for the Entity "Department".
• Course_ID is the key attribute for "Course" Entity.
• Student_ID is the key attribute for "Student" Entity.
• Instructor_ID is the key attribute for "Instructor" Entity.
Step 4: Identify other relevant attributes
• For the department entity, the other attribute is location
• For the course entity, the other attributes are course_name, duration
• For the instructor entity, the other attributes are first_name, last_name, phone
• For the student entity, the other attributes are first_name, last_name, phone
Step 5: Draw complete ER diagram
By connecting all these details, we can now draw ER diagram as given below.


UNIT-II: DATABASE INTEGRITY AND NORMALISATION:

Relational Database Integrity - The Keys - Referential Integrity - Entity Integrity -
Redundancy and Associated Problems – Single Valued Dependencies – Normalisation -
Rules of Data Normalisation - The First Normal Form - The Second Normal Form - The
Third Normal Form - Boyce Codd Normal Form - Attribute Preservation - Lossless-join
Decomposition - Dependency Preservation.

File Organization: Physical Database Design Issues - Storage of Database on Hard Disks -
File Organization and Its Types – Heap files (Unordered files) - Sequential File
Organization - Indexed (Indexed Sequential) File Organization - Hashed File Organization
- Types of Indexes - Index and Tree Structure - Multi-Key File Organization - Need for
Multiple Access Paths - Multi-list File Organization - Inverted File Organization.


Data redundancy and associated problems


DBMS Data Redundancy:

It refers to the situation when the same data exists in more than one entity. It may also refer to the fact
that unnecessary or duplicated data is stored at different locations in the database. For instance

Problems caused by Data Redundancy:

There are three types of anomalies that occur when the database is not normalized. These are –
Insertion, update and deletion anomaly. Let’s take an example to understand this.

Example: Suppose a manufacturing company stores the employee details in a table named employee that
has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name,
emp_address for storing employee’s address and emp_dept for storing the department details in which the
employee works. At some point of time the table looks like this:

emp_id  emp_name  emp_address  emp_dept
---------------------------------------
101     Rick      Delhi        D001
101     Rick      Delhi        D002
123     Maggie    Agra         D890
166     Glenn     Chennai      D900
166     Glenn     Chennai      D004

The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two
departments of the company. If we want to update Rick's address, we have to update it in both
rows or the data will become inconsistent. If the correct address is updated in one row but not in
the other, then, as per the database, Rick has two different addresses, which is not correct.
Insert anomaly: Suppose a new employee joins the company who is under training and not yet
assigned to any department. We would not be able to insert this employee into the table if the
emp_dept field doesn't allow nulls.
Delete anomaly: Suppose that at some point the company closes department D890. Deleting the
rows that have emp_dept as D890 would also delete all information about employee Maggie, since she
is assigned only to this department.
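
The update and delete anomalies described above can be reproduced with a few statements. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a full RDBMS; the choice of SQLite and the new address 'Mumbai' are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER, emp_name TEXT, emp_address TEXT, emp_dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
        (166, "Glenn", "Chennai", "D900"),
        (166, "Glenn", "Chennai", "D004"),
    ],
)

# Update anomaly: only one of Rick's two rows is changed (hypothetical new address).
conn.execute(
    "UPDATE employee SET emp_address = 'Mumbai' WHERE emp_id = 101 AND emp_dept = 'D001'"
)
rick_addresses = sorted(
    row[0]
    for row in conn.execute("SELECT DISTINCT emp_address FROM employee WHERE emp_id = 101")
)

# Delete anomaly: closing department D890 removes every trace of Maggie.
conn.execute("DELETE FROM employee WHERE emp_dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE emp_name = 'Maggie'"
).fetchone()[0]

print(rick_addresses)  # Rick now has two different addresses
print(maggie_rows)     # no row for Maggie is left
```

After the partial update the table holds two addresses for the same employee, and after the delete no information about Maggie remains, which is exactly what normalization is meant to prevent.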


To overcome these anomalies we need to normalize the data. Normalization is discussed after the
following section on functional dependencies.

Functional dependency in DBMS


An attribute of a table is said to be dependent on another attribute when the value of the second
attribute uniquely identifies the value of the first within the same table.
For example: Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age. Here Stu_Id
attribute uniquely identifies the Stu_Name attribute of student table because if we know the student id we
can tell the student name associated with it. This is known as functional dependency and can be written as
Stu_Id->Stu_Name or in words we can say Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of the same table, this can be represented as A->B
(attribute B is functionally dependent on attribute A).
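
The definition can be checked mechanically on sample data: A->B holds in a set of rows if no two rows agree on A but differ on B. The sketch below assumes rows are represented as Python dictionaries; the function name fd_holds and the sample student rows are hypothetical. Note that such a check can only show that a dependency is not violated by the sample; whether it holds in general is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for the student table.
students = [
    {"Stu_Id": 1, "Stu_Name": "Asha", "Stu_Age": 19},
    {"Stu_Id": 2, "Stu_Name": "Ravi", "Stu_Age": 20},
    {"Stu_Id": 3, "Stu_Name": "Asha", "Stu_Age": 21},
]

print(fd_holds(students, ["Stu_Id"], ["Stu_Name"]))   # True: each id has one name
print(fd_holds(students, ["Stu_Name"], ["Stu_Age"]))  # False: two students named Asha differ in age
```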
Types of Functional Dependencies

• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency
1. Trivial functional dependency in DBMS with example
The dependency of an attribute on a set of attributes is known as trivial functional dependency if the set of
attributes includes that attribute.
Symbolically: A ->B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A->A & B->B
For example: Consider a table with two columns Student_id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as Student_Id is a subset of
{Student_Id, Student_Name}. That makes sense because if we know the values of Student_Id and
Student_Name then the value of Student_Id can be uniquely determined.
Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial dependencies too.
2. Non trivial functional dependency in DBMS
If a functional dependency X->Y holds true where Y is not a subset of X then this dependency is called
non trivial Functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:


emp_id -> emp_name (emp_name is not a subset of emp_id)


emp_id -> emp_address (emp_address is not a subset of emp_id)
On the other hand, the following dependencies are trivial:
{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
Completely non trivial FD:
If an FD X->Y holds where the intersection of X and Y is empty, then the dependency is said to be a
completely non-trivial functional dependency.
3. Multivalued dependency in DBMS
A multivalued dependency occurs when a table contains more than one independent multivalued
attribute.
For example: Consider a bike manufacturing company which produces two colors (black and red) of
each model every year.

bike_model manuf_year color


M1001 2007 Black
M1001 2007 Red
M2012 2008 Black
M2012 2008 Red
M2222 2009 Black
M2222 2009 Red
Here columns manuf_year and color are independent of each other and dependent on bike_model. In this
case these two columns are said to be multivalued dependent on bike_model. These dependencies can be
represented like this:
bike_model ->> manuf_year
bike_model ->> color
4. Transitive dependency in DBMS
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
For example, X -> Z is a transitive dependency if the following three conditions hold:
• X -> Y
• Y does not -> X
• Y -> Z
Note: A transitive dependency can only occur in a relation of three or more attributes. Identifying this
dependency helps us normalize the database into 3NF (third normal form).


Example: Let’s take an example to understand it better:

Book Author Author_age


Game of Thrones George R. R. Martin 66
Harry Potter J. K. Rowling 49
Dying of the Light George R. R. Martin 66
{Book} -> {Author} (if we know the book, we know the author's name)
{Author} does not -> {Book}
{Author} -> {Author_age}
Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That makes
sense because if we know the book name we can find out the author's age.
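
One way to derive transitive dependencies such as {Book} -> {Author_age} is to compute the closure of an attribute set, that is, every attribute that can be reached by repeatedly applying the given functional dependencies. The following is a small sketch of that standard procedure; the function name closure and the representation of each dependency as a pair of sets are assumptions made for this illustration.

```python
def closure(attributes, fds):
    """Return all attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"Book"}, {"Author"}), ({"Author"}, {"Author_age"})]

book_closure = closure({"Book"}, fds)
author_closure = closure({"Author"}, fds)

print(sorted(book_closure))    # ['Author', 'Author_age', 'Book']
print(sorted(author_closure))  # ['Author', 'Author_age']
```

Author_age appears in the closure of {Book}, so Book -> Author_age holds, while Book does not appear in the closure of {Author}, which matches the condition that Author does not determine Book.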

Normalization
Here are the most commonly used normal forms:

• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Codd normal form (BCNF)

First normal form (1NF)


As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It should
hold only atomic values.
Example: Suppose a company wants to store the names and contact details of its employees. It creates a
table that looks like this:

emp_id   emp_name   emp_address   emp_mobile
101      Herschel   New Delhi     8912312390
102      Jon        Kanpur        8812121212, 9900012222
103      Ron        Chennai       7778881212
104      Lester     Bangalore     9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers each, so the company stored both numbers in the
same field, as you can see in the table above.


This table is not in 1NF: the rule says "each attribute of a table must have atomic (single) values", and the
emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should store the data like this:

emp_id emp_name emp_address emp_mobile


101 Herschel New Delhi 8912312390

102 Jon Kanpur 8812121212

102 Jon Kanpur 9900012222

103 Ron Chennai 7778881212

104 Lester Bangalore 9990000123

104 Lester Bangalore 8123450987
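
The conversion shown above, one row per atomic value, can be sketched in a few lines. The example assumes the unnormalized rows are held as Python tuples whose last element is a list of mobile numbers; the function name to_1nf is hypothetical.

```python
# Unnormalized rows: the last field holds a list of mobile numbers (not atomic).
unnormalized = [
    (101, "Herschel", "New Delhi", ["8912312390"]),
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]


def to_1nf(rows):
    """Emit one row per mobile number so that every field holds a single value."""
    return [
        (emp_id, name, address, mobile)
        for emp_id, name, address, mobiles in rows
        for mobile in mobiles
    ]


normalized = to_1nf(unnormalized)
for row in normalized:
    print(row)
```

The four original rows become six rows, matching the 1NF table above.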

Second normal form (2NF)


A table is said to be in 2NF if both the following conditions hold:
• The table is in 1NF (first normal form)
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach. Since a
teacher can teach more than one subject, the table can have multiple rows for the same teacher. The
table looks like this:

teacher_id subject teacher_age

111 Maths 38

111 Physics 38

222 Biology 38

333 Physics 40

333 Chemistry 40

Candidate Keys: {teacher_id, subject}


Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the
non-prime attribute teacher_age depends on teacher_id alone, which is a proper subset of the candidate key.


This violates the rule for 2NF, which says "no non-prime attribute is dependent on a proper subset
of any candidate key of the table".
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:

teacher_id teacher_age

111 38

222 38

333 40

teacher_subject table:

teacher_id subject

111 Maths

111 Physics

222 Biology

333 Physics

333 Chemistry
Now the tables comply with Second normal form (2NF).
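
A decomposition is only useful if no information is lost, that is, if joining the smaller tables gives back exactly the original table (a lossless-join decomposition). The sketch below checks this for the teacher example using Python sets of tuples; the variable names are chosen for illustration only.

```python
# Original table: (teacher_id, subject, teacher_age)
teacher = {
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
}

# Projections corresponding to the two 2NF tables.
teacher_details = {(tid, age) for tid, _, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _ in teacher}

# Natural join of the two projections on teacher_id.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for tid2, age in teacher_details
    if tid == tid2
}

print(len(teacher_details))  # one row per teacher
print(rejoined == teacher)   # True: the join reproduces the original table
```

Each teacher's age is now stored once, and the join still reproduces all five original rows.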

Third Normal form (3NF)


A table design is said to be in 3NF if both the following conditions hold:
• The table is in 2NF
• No non-prime attribute is transitively dependent on any super key.
An attribute that is not part of any candidate key is known as a non-prime attribute.
In other words, a table is in 3NF if it is in 2NF and, for each functional
dependency X -> Y, at least one of the following conditions holds:
• X is a super key of the table
• Y is a prime attribute of the table
An attribute that is part of one of the candidate keys is known as a prime attribute.

Example: Suppose a company wants to store the complete address of each employee, they create a table
named employee_details that looks like this:

emp_id   emp_name   emp_zip   emp_state   emp_city   emp_district
1001     John       282005    UP          Agra       Dayal Bagh
1002     Ajeet      222008    TN          Chennai    M-City
1006     Lora       282007    TN          Chennai    Urrapakkam
1101     Lilly      292008    UK          Pauri      Bhagwan
1201     Steve      222999    MP          Gwalior    Ratan

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip} and so on
Candidate keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime, as they are not part of any candidate
key.
Here, emp_state, emp_city & emp_district are dependent on emp_zip, and emp_zip is dependent on emp_id.
That makes the non-prime attributes (emp_state, emp_city & emp_district) transitively dependent on the
super key (emp_id). This violates the rule of 3NF.
To make this table comply with 3NF we have to break it into two tables to remove the transitive
dependency:
employee table:

emp_id emp_name emp_zip


1001 John 282005

1002 Ajeet 222008

1006 Lora 282007

1101 Lilly 292008

1201 Steve 222999

employee_zip table:

emp_zip emp_state emp_city emp_district


282005 UP Agra Dayal Bagh

222008 TN Chennai M-City

282007 TN Chennai Urrapakkam

292008 UK Pauri Bhagwan

222999 MP Gwalior Ratan
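
The two 3NF tables can still answer every question the single table could, by joining them on emp_zip. The following sketch builds both tables in an in-memory SQLite database through Python's sqlite3 module; the use of SQLite and the column types are assumptions, since the notes do not name a particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_zip (
    emp_zip TEXT PRIMARY KEY,
    emp_state TEXT, emp_city TEXT, emp_district TEXT
);
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    emp_name TEXT,
    emp_zip TEXT REFERENCES employee_zip (emp_zip)
);
INSERT INTO employee_zip VALUES
    ('282005', 'UP', 'Agra', 'Dayal Bagh'),
    ('222008', 'TN', 'Chennai', 'M-City'),
    ('282007', 'TN', 'Chennai', 'Urrapakkam'),
    ('292008', 'UK', 'Pauri', 'Bhagwan'),
    ('222999', 'MP', 'Gwalior', 'Ratan');
INSERT INTO employee VALUES
    (1001, 'John', '282005'), (1002, 'Ajeet', '222008'), (1006, 'Lora', '282007'),
    (1101, 'Lilly', '292008'), (1201, 'Steve', '222999');
""")

# Rebuild the original employee_details view by joining on the zip code.
rows = conn.execute("""
    SELECT e.emp_id, e.emp_name, z.emp_state, z.emp_city
    FROM employee e JOIN employee_zip z ON e.emp_zip = z.emp_zip
    ORDER BY e.emp_id
""").fetchall()

for row in rows:
    print(row)
```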

Boyce Codd normal form (BCNF)

BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than
3NF. A table complies with BCNF if it is in 3NF and, for every functional dependency X->Y, X is a super
key of the table.
Example: Suppose there is a company wherein employees work in more than one department. The company
stores the data like this:

emp_id   emp_nationality   emp_dept                       dept_type   dept_no_of_emp
1001     Austrian          Production and planning        D001        200
1001     Austrian          Stores                         D001        250
1002     American          design and technical support   D134        100
1002     American          Purchasing department          D134        600

Functional dependencies in the table above:


emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF because neither emp_id nor emp_dept alone is a key, yet each appears on the left
side of a functional dependency.
To make the table comply with BCNF we can break it into three tables like this:
emp_nationality table:


emp_id emp_nationality

1001 Austrian

1002 American
emp_dept table:

emp_dept dept_type dept_no_of_emp

Production and planning D001 200

stores D001 250

design and technical support D134 100

Purchasing department D134 600

emp_dept_mapping table:

emp_id emp_dept
1001 Production and planning

1001 stores

1002 design and technical support

1002 Purchasing department


Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
The design is now in BCNF because in both functional dependencies the left side is a key.
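
The BCNF rule (the left side of every non-trivial functional dependency must be a super key) can be tested with the attribute-closure technique. The sketch below is a simplified check under stated assumptions: the function names are hypothetical, and for each decomposed table only the dependencies that apply to that table are passed in.

```python
def closure(attributes, fds):
    """Return all attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a super key of `relation`."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not set(relation) <= closure(lhs, fds)
    ]


fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]
original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}

before = bcnf_violations(original, fds)
after_nationality = bcnf_violations({"emp_id", "emp_nationality"}, [fds[0]])
after_dept = bcnf_violations({"emp_dept", "dept_type", "dept_no_of_emp"}, [fds[1]])

print(len(before))                    # 2: both dependencies violate BCNF
print(after_nationality, after_dept)  # [] []: no violations after decomposition
```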


UNIT-III: STRUCTURED QUERY LANGUAGE (SQL):

Meaning – SQL commands - Data Definition Language - Data Manipulation Language
- Data Control Language - Transaction Control Language - Queries using Order by –
Where - Group by - Nested Queries. Joins – Views – Sequences - Indexes and
Synonyms - Table Handling.


Introduction to SQL:
What is SQL?
1. SQL is Structured Query Language, a computer language for storing, manipulating and
retrieving data stored in a relational database.
2. SQL is the standard language for relational database systems. Relational database management
systems such as MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server all use SQL
as their standard database language.
Why SQL?
1. Allows users to access data in relational database management systems.
2. Allows users to describe the data.
3. Allows users to define the data in a database and manipulate that data.
4. Allows embedding within other languages using SQL modules, libraries & pre-compilers.
5. Allows users to create and drop databases and tables.
6. Allows users to create views, stored procedures and functions in a database.
7. Allows users to set permissions on tables, procedures and views.
History:
1. 1970 -- Dr. E. F. "Ted" Codd of IBM, known as the father of relational databases, described a
relational model for databases.
2. 1974 -- Structured Query Language appeared.
3. 1978 -- IBM worked to develop Codd's ideas and released a product named System/R.
4. 1986 -- IBM developed the first prototype of a relational database, and SQL was standardized by ANSI.
The first relational database was released by Relational Software, which later became Oracle.
SQL Process:
1. When you execute an SQL command for any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.
2. Various components are included in the process: the Query Dispatcher,
Optimization Engines, the Classic Query Engine, the SQL Query Engine, etc. The classic query engine
handles all non-SQL queries, but the SQL query engine won't handle logical files.

SQL Commands:
SQL is a keyword-based language. It consists of reserved words and user-defined words. A reserved
word has a fixed meaning and must be spelt exactly as required. User-defined words are names chosen by
the user for database objects such as tables, columns, and indexes.
SQL syntax is not case sensitive, so words can be typed in either lower-case or capital letters. SQL
is a free-format language. However, to make statements more readable, it is advisable to use indentation
and line breaks. The SQL notation used throughout this book follows the Backus Naur Form (BNF), which
is described below:
✓ Upper-case letters are used to represent reserved words
✓ Lower-case letters are used to represent user-defined words
✓ A vertical bar ( | ) indicates a choice among alternatives
✓ Curly braces ( { } ) indicate a required element
✓ Square brackets ( [ ] ) indicate an optional element
Data Definition Language (DDL) in DBMS with Examples

Data Definition Language is the set of commands through which data structures are defined. It is used
for creating and modifying the structure of database objects, such as schemas, tables, views and indexes.
Additionally, it assists in storing metadata details in the database.


Some of the common Data Definition Language commands are:
• CREATE
• ALTER
• DROP
• TRUNCATE

1. CREATE- Data Definition language(DDL)


The main use of the CREATE command is to build a new table, and it comes with a predefined syntax. It
creates a component in a relational database management system. Many implementations
extend the syntax of the command to create additional elements, such as user profiles and indexes.
The general syntax for the CREATE command in Data Definition Language is given below:
CREATE TABLE tablename (Column1 DATATYPE, Column2 DATATYPE, Column3 DATATYPE, ..., ColumnN DATATYPE);
OR
CREATE TABLE tablename
( columnName dataType [NOT NULL]
[DEFAULT defaultOption] [CHECK (searchCondition)] [, ...]
[, PRIMARY KEY (listOfColumns)]
[, FOREIGN KEY (listOfForeignKeyColumns)
REFERENCES ParentTableName [(listOfCandidateKeyColumns)]] );
✓ The CREATE TABLE statement creates a table consisting of one or more columns
of the defined data types.
✓ The optional DEFAULT clause provides a default value for a column. Whenever an INSERT
statement fails to specify a value for the column, SQL uses the default value.
✓ NOT NULL is specified to ensure that the column must have a data value.
✓ The remaining clauses are constraints and may be headed by the clause
CONSTRAINT constraintname.
✓ The PRIMARY KEY clause specifies the column(s) that comprise the primary key. It is assumed
by default that the primary key value is NOT NULL.
✓ The FOREIGN KEY clause specifies a foreign key in the child table and its relationship to the
parent table. This clause specifies:
• A listOfForeignKeyColumns, the column(s) that form the foreign key.
• A REFERENCES subclause indicating the parent table that holds the
matching primary key.
For Example
CREATE TABLE PUPIL (PUPIL_ID CHAR(10), STUDENT_NAME CHAR(10));
This DDL statement creates the PUPIL table with columns for the pupil's ID and name.
The data types most often used when creating a table are strings and dates. Database systems
differ in how data types are specified.
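
The optional clauses of CREATE TABLE (NOT NULL, DEFAULT, CHECK, PRIMARY KEY and FOREIGN KEY) can be tried out as follows. This is a sketch that runs the SQL through Python's sqlite3 module; the department and worker tables, their columns and the helper function rejected are hypothetical, and the PRAGMA line is specific to SQLite, which enforces foreign keys only when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: switch on foreign key checks
conn.executescript("""
CREATE TABLE department (
    dept_id   CHAR(4) PRIMARY KEY,
    dept_name VARCHAR(30) NOT NULL
);
CREATE TABLE worker (
    worker_id INTEGER PRIMARY KEY,
    name      VARCHAR(30) NOT NULL,
    salary    NUMERIC DEFAULT 1000 CHECK (salary >= 0),
    dept_id   CHAR(4),
    FOREIGN KEY (dept_id) REFERENCES department (dept_id)
);
INSERT INTO department VALUES ('D001', 'Stores');
INSERT INTO worker (worker_id, name, dept_id) VALUES (1, 'Asha', 'D001');
""")

# No salary was supplied, so the DEFAULT value is used.
default_salary = conn.execute("SELECT salary FROM worker WHERE worker_id = 1").fetchone()[0]


def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


check_failed = rejected("INSERT INTO worker (worker_id, name, salary) VALUES (2, 'Ravi', -5)")
fk_failed = rejected("INSERT INTO worker (worker_id, name, dept_id) VALUES (3, 'Zoe', 'D999')")
not_null_failed = rejected("INSERT INTO worker (worker_id, dept_id) VALUES (4, 'D001')")

print(default_salary, check_failed, fk_failed, not_null_failed)
```

Each of the three rejected statements breaks exactly one constraint: a negative salary (CHECK), a department that does not exist (FOREIGN KEY), and a missing name (NOT NULL).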

❖ Changing a Table Definition


The ALTER TABLE statement modifies a table definition. Its options include:
✓ Adding a new column to a table and dropping an existing column
✓ Adding a new table constraint and dropping an existing table constraint
✓ Setting a default for a column and dropping an existing default for a column
The general syntax of the ALTER command is given below (the exact keywords, MODIFY in particular,
vary between database systems):
ALTER TABLE table_name ADD column_name datatype; (for adding a new column)
ALTER TABLE table_name RENAME TO new_table_name; (for renaming a table)
ALTER TABLE table_name MODIFY column_name datatype; (for modifying a column)
ALTER TABLE table_name DROP COLUMN column_name; (for deleting a column)
For Example
Add a column to the PUPIL table:
ALTER TABLE PUPIL ADD PHONE_NUMBER VARCHAR(10);
Before adding the column:

PUPIL_ID   STUDENT_NAME
97         Albert
98         Sameer

After adding the column (the new column is empty for the existing rows):

PUPIL_ID   STUDENT_NAME   PHONE_NUMBER
97         Albert
98         Sameer
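
The effect of adding a column can be verified directly: existing rows keep their values and receive NULL in the new column. The sketch below uses Python's sqlite3 module; the column name phone_number and its type are assumptions, and SQLite spells the command ALTER TABLE ... ADD COLUMN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pupil (pupil_id CHAR(10), student_name CHAR(10))")
conn.executemany("INSERT INTO pupil VALUES (?, ?)", [("97", "Albert"), ("98", "Sameer")])

# Add the new column; rows that already exist get NULL for it.
conn.execute("ALTER TABLE pupil ADD COLUMN phone_number VARCHAR(10)")

# PRAGMA table_info is SQLite-specific; field 1 of each result row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(pupil)")]
rows = conn.execute("SELECT * FROM pupil ORDER BY pupil_id").fetchall()

print(columns)
print(rows)
```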

3.Drop- Data Definition language(DDL)


With this command, users can delete an index, table or view. A DROP statement in SQL removes a
component from a relational database management system. Some systems allow DROP and other Data
Definition Language commands to occur inside a transaction, so that they can be rolled back.
The general syntax of the DROP command is given below:
DROP TABLE table_name;
DROP DATABASE database_name;
DROP INDEX index_name;
For example:
DROP TABLE Student;

4.Truncate- Data Definition language(DDL)


The TRUNCATE command removes the content of a table but keeps the structure of the table. In simple
language, it removes all the records from the table. Users cannot remove part of the data with this
command. In addition, all space allocated for the data is released by the TRUNCATE command.
The syntax of the TRUNCATE command is given below:
TRUNCATE TABLE table_name;
For example:
TRUNCATE TABLE Student;

DATA MANIPULATION
Data Manipulation Language (DML) is the set of SQL statements used to manage the data in the database.
DML commands are not auto-committed: the modifications they make do not become permanent until the
transaction is committed. DML is used to select, insert, delete and update data in a database, and it is
responsible for all forms of data modification.
The DML commands discussed here are:
1. INSERT INTO : inserts new data into a database table
2. UPDATE : updates data in a database table
3. DELETE : deletes data from a database table


1. INSERT
INSERT is used to add new records or data into an existing database table.
Syntax for INSERT command is as follows:
INSERT INTO tablename [(columnList)]
VALUES (dataValueList)
✓ columnList is optional; if omitted, SQL assumes a list of all columns in the order in which
they were specified when the table was created.
✓ Any column omitted must have been declared to allow NULL when the table was created, unless DEFAULT
was specified when creating the column.
✓ dataValueList must match columnList as follows:
• the number of items in each list must be the same;
• there must be a direct correspondence in the position of items in the two lists;
• the data type of each item in dataValueList must be compatible with the data type of the
corresponding column.
We illustrate the variations of the INSERT statement using the table Supplier as given below.

Example: to add a new row


Query 22: Add a new record as given below to the Supplier table.
Supplier Number: S9996
Supplier Name : NR Tech
Supplier Address : 20 Jalan Selamat, 62000 Kuala Lumpur,
Supplier Tel No: 23456677
Contact Person : Nick
This query can be written as:
INSERT INTO Supplier (SupNo, name, street, city, PostCode, TelNo, ContactPerson)
VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 62000, 23456677, 'Nick');


Since values are supplied for all the columns in the table, the column list may be omitted.
Thus you may write the SQL statement as below:
INSERT INTO Supplier
VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 62000, 23456677, 'Nick');

Example : Insert a row into a specified column


You may insert a new record that supplies values for only some of the columns. However, each mandatory
column, that is, each column defined as NOT NULL in the CREATE TABLE statement, must be supplied
with a value.
Query : Add a new record as given below to the Supplier table.
Supplier Number: S9997
Supplier Name : Total System
Supplier Address : 25 Jalan Tanjung, Kuala Lumpur,
Supplier Tel No: 23456677
In this example, the data provided is not complete: some information, such as the post code
and the contact person, is missing. In this case, you need only specify the names of the columns for
which you have values. You may also omit the column list, but then NULL must be supplied for each
column that has no value.
INSERT INTO Supplier (SupNo, name, street, city, TelNo)
VALUES ('S9997', 'Total System', '25 Jalan Tanjung', 'Kuala Lumpur', 4385667);
You may also write:
INSERT INTO Supplier
VALUES ('S9997', 'Total System', '25 Jalan Tanjung', 'Kuala Lumpur', NULL, 4385667, NULL);
The result of this INSERT operation is given below.
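
Both forms of INSERT can be run against a small Supplier table to confirm that omitted columns end up as NULL. The sketch below uses Python's sqlite3 module; the column types in the CREATE TABLE statement are assumptions, since the notes do not show the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Supplier (
        SupNo TEXT PRIMARY KEY, Name TEXT NOT NULL, Street TEXT, City TEXT,
        PostCode INTEGER, TelNo INTEGER, ContactPerson TEXT
    )
""")

# Full row: the column list can be omitted because every column receives a value.
conn.execute(
    "INSERT INTO Supplier VALUES "
    "('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 62000, 23456677, 'Nick')"
)

# Partial row: only the listed columns receive values; the others become NULL.
conn.execute(
    "INSERT INTO Supplier (SupNo, Name, Street, City, TelNo) "
    "VALUES ('S9997', 'Total System', '25 Jalan Tanjung', 'Kuala Lumpur', 4385667)"
)

partial = conn.execute(
    "SELECT PostCode, ContactPerson FROM Supplier WHERE SupNo = 'S9997'"
).fetchone()
total = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]

print(partial)  # (None, None): Python shows SQL NULL as None
print(total)
```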


2. Update
The UPDATE statement is used to change records that match specified criteria. This is
accomplished by carefully constructing a WHERE clause.
The syntax for the UPDATE statement is given below:
UPDATE TableName
SET columnName1 = dataValue1 [, columnName2 = dataValue2...]
[WHERE searchCondition]
✓ TableName is the name of a table.
✓ The SET clause specifies the names of one or more columns that are to be updated.
✓ The WHERE clause is optional:
• if omitted, the named columns are updated for all rows in the table;
• if specified, only those rows that satisfy searchCondition are updated.
✓ New dataValue(s) must be compatible with the data type of the corresponding column.
We illustrate the variations of the UPDATE statement using the table Employee as given below.

Example 1: Update all rows


Updating may involve modifying a particular column for all records in a table.
Query 1: Give every employee a 10% pay rise.
The UPDATE statement will be as given below:


UPDATE Employee
SET salary = salary*1.10;
The result table from this operation is shown below.

Example 2: Update Specified Rows


Query 2: Increase the salary by 5% for managers only.
If the change applies only to rows that meet a specified criterion, then the WHERE clause needs to be
used in the statement. This can be written as below.
UPDATE Employee
SET salary = salary*1.05
WHERE position = 'Manager';
The result from this operation is given below
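
The two UPDATE statements above can be run one after the other to see the difference between updating all rows and updating selected rows. This sketch uses Python's sqlite3 module with a two-row Employee table; the employees and their salaries are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT, Name TEXT, Position TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("E1", "Asha", "Manager", 2000.0), ("E2", "Ravi", "Clerk", 1000.0)],
)

# Query 1: no WHERE clause, so every row is updated.
conn.execute("UPDATE Employee SET Salary = Salary * 1.10")

# Query 2: the WHERE clause restricts the update to managers.
conn.execute("UPDATE Employee SET Salary = Salary * 1.05 WHERE Position = 'Manager'")

# Round to two decimals to hide floating-point noise.
salaries = {
    name: round(salary, 2)
    for name, salary in conn.execute("SELECT Name, Salary FROM Employee")
}
print(salaries)
```

The manager receives both rises (2000 to 2200 to 2310), while the clerk receives only the first (1000 to 1100).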

3. DELETE
The DELETE statement is used to delete records or rows from an existing table.
The syntax for DELETE statement is given below:
DELETE FROM TableName
[WHERE searchCondition];
✓ TableName can be the name of a base table or an updatable view.
✓ searchCondition is optional; if omitted, all rows are deleted from the table. This does not delete the
table itself. If searchCondition is specified, only those rows that satisfy the condition are deleted.
We illustrate the variations of the DELETE statement using the table Supplier as given below.

Example 1: Delete specified records or rows


Query 1: Delete the supplier named "Total System" from the Supplier table.
You need to use a WHERE clause when you want to delete only specified records. Thus the statement
would be as given below:
DELETE FROM Supplier
WHERE Name = 'Total System';

Example 2: Delete all records or rows


Query 2: Delete all records in the Supplier table.
If we want to delete all records from the Supplier table, then we skip the WHERE clause. Thus the
statement would be written as:
DELETE FROM Supplier;
This command deletes all rows in the Supplier table, but it does not delete the table. This means that
the table structure, attributes, and indexes remain intact.
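
The difference between deleting selected rows, deleting all rows, and dropping the table can be checked as follows. The sketch uses Python's sqlite3 module; the two-column Supplier table and the third supplier are assumptions made to keep the example short, and sqlite_master is SQLite's own catalogue of tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Supplier (SupNo TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO Supplier VALUES (?, ?)",
    [("S9996", "NR Tech"), ("S9997", "Total System"), ("S9998", "Example Supplies")],
)

# Delete only the rows that satisfy the search condition.
deleted_one = conn.execute("DELETE FROM Supplier WHERE Name = 'Total System'").rowcount
left_after_one = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]

# Delete every row: the table becomes empty but still exists.
conn.execute("DELETE FROM Supplier")
left_after_all = conn.execute("SELECT COUNT(*) FROM Supplier").fetchone()[0]
table_exists = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'Supplier'"
).fetchone() is not None

print(deleted_one, left_after_one, left_after_all, table_exists)
```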
Data Control Language (DCL) with Examples in DBMS: A Data Control Language (DCL)
is a language used for controlling privileges in the database. Privileges are required for
performing database operations such as creating sequences, views or
tables. DCL is a part of the Structured Query Language.


Data Control Languages (DCL) Commands


There are two commands in the Data Control Language:
1. Grant Command
The GRANT command gives users access to, or privileges on, the objects of the database.
The general syntax for the GRANT command is given below:
GRANT privilege_name
ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION];
For Example
GRANT ALL ON workers
TO MNO;
In this example, all privileges on the workers table, including viewing and modifying its rows, are
given to the user MNO.
2. Revoke Command
The main purpose of the REVOKE command is to cancel previously granted or denied permissions.
Through the REVOKE command, access to the given privileges can be withdrawn. In simple words, the
permission can be taken back from the user with this command.
The general syntax for the REVOKE command is given below:
REVOKE <privilege list>
ON <relation name or view name>
FROM <user name>;
For Example
REVOKE UPDATE
ON workers
FROM MNO;

Differences between the Grant and Revoke Command


Grant Command                                   Revoke Command

A user is allowed to perform some particular    A user is disallowed from performing some
activities on the database by using the         particular activities by using the Revoke
Grant Command.                                  Command.

The access to privileges for database           The access to privileges for database objects
objects is granted to the other users.          that was previously granted to the users can
                                                be revoked.

Transaction Control Language (TCL)


Transaction Control Language is the portion of a database language used for maintaining
the consistency of the database and managing transactions. A transaction is a set of logically
related SQL statements executed on the data stored in tables.
TCL Commands
There are three commands that come under the TCL:
1. Commit
The COMMIT command makes a transaction permanent: every change made by the transaction is saved to
the database once COMMIT is issued. Here is the general syntax for the COMMIT command:
COMMIT;
For Example
UPDATE STUDENT SET STUDENT_NAME = 'Maria' WHERE STUDENT_NAME = 'Meena';
COMMIT;
These statements replace the wrong student name with the correct one and save the change
permanently in the database. The update transaction completes when COMMIT is issued. Until COMMIT
or ROLLBACK is issued, the 'Meena' record stays locked. After COMMIT, the updated value is
permanently saved in the database and the lock is released.


2. Rollback
This command restores the database to the last committed state. It is also used
with the SAVEPOINT command to jump back to a savepoint within a transaction.
The general syntax for the ROLLBACK command is given below:
ROLLBACK;
ROLLBACK TO savepoint_name;
For example
UPDATE STUDENT SET STUDENT_NAME = 'Manish' WHERE STUDENT_NAME = 'Meena';
ROLLBACK;
This command is used when the user realizes that the wrong student name was entered and wants to
undo the update. Issuing the ROLLBACK command undoes the update.

3. Savepoint
The SAVEPOINT command marks a point within a transaction, so that users can later roll back
to that point whenever it is needed.
The general syntax for the SAVEPOINT command is given below:
SAVEPOINT savepoint_name;
For Example
Following is the table of a school class


Run the following SQL statements on the above table and then watch the results:
INSERT INTO CLASS VALUES (101, 'Rahul');
COMMIT;
UPDATE CLASS SET NAME = 'Tyler' WHERE ID = 101;
SAVEPOINT A;
INSERT INTO CLASS VALUES (102, 'Zack');
SAVEPOINT B;
INSERT INTO CLASS VALUES (103, 'Bruno');
SAVEPOINT C;
SELECT * FROM CLASS;
The result now includes the rows 101 Tyler, 102 Zack and 103 Bruno.

Now roll back to savepoint B:
ROLLBACK TO B;
SELECT * FROM CLASS;
The row inserted after savepoint B (103 Bruno) is gone.

Now roll back to savepoint A:
ROLLBACK TO A;
SELECT * FROM CLASS;
The row inserted after savepoint A (102 Zack) is gone as well, while the update to row 101 remains.
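
The savepoint sequence can be replayed with Python's sqlite3 module, which supports SAVEPOINT and ROLLBACK TO. This is a sketch under stated assumptions: the class table starts empty, the connection is opened with isolation_level=None so that BEGIN and COMMIT are issued explicitly, and a final plain ROLLBACK is added to show the return to the last committed state.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE class (id INTEGER, name TEXT)")


def names():
    """Return the names currently visible in the class table, ordered by id."""
    return [row[0] for row in cur.execute("SELECT name FROM class ORDER BY id")]


cur.execute("BEGIN")
cur.execute("INSERT INTO class VALUES (101, 'Rahul')")
cur.execute("COMMIT")  # 'Rahul' is now permanent

cur.execute("BEGIN")
cur.execute("UPDATE class SET name = 'Tyler' WHERE id = 101")
cur.execute("SAVEPOINT A")
cur.execute("INSERT INTO class VALUES (102, 'Zack')")
cur.execute("SAVEPOINT B")
cur.execute("INSERT INTO class VALUES (103, 'Bruno')")
cur.execute("SAVEPOINT C")
all_rows = names()

cur.execute("ROLLBACK TO B")  # undoes the insert of Bruno
after_b = names()

cur.execute("ROLLBACK TO A")  # undoes the insert of Zack as well
after_a = names()

cur.execute("ROLLBACK")       # abandons the whole transaction, including the update
after_full = names()

print(all_rows, after_b, after_a, after_full)
```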


Row Selection (WHERE clause)

Without a WHERE clause, a SELECT statement retrieves all rows of the specified columns from a table.
To select only some rows, that is, to specify a selection criterion, we use the WHERE clause. The WHERE
clause filters the rows of the tables named in the FROM clause. Omitting the WHERE clause specifies that
all rows are used.
There are five basic search conditions that can be used in a query
✓ Comparison: compares the value of an expression to the value of another expression
✓ Range: tests whether the value of an expression falls within a specified range of values.
✓ Set membership: tests whether a value matches any value in a set of values.
✓ Pattern Match : tests whether a string matches a specified pattern.
✓ Null: tests a column for null (unknown) value.
Each type of these search conditions will be presented in this section.
Example 4: Comparison Search Condition
Query 4: List all employees with a salary greater than RM1000.
SELECT EmpNo, Name, TelNo, Position, Salary
FROM Employee
WHERE Salary > 1000;
This statement keeps only the rows in which the salary is greater than 1000. The result
returned by this statement is shown below.

The comparison operators that can be used in the WHERE clause are =, <>, <, >, <= and >=. In addition,
more complex conditions can be built using the logical operators AND, OR, and NOT.
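
Each of the five search conditions listed above corresponds to a standard SQL construct: a comparison operator, BETWEEN, IN, LIKE and IS NULL. The sketch below tries one condition of each kind through Python's sqlite3 module; the three employees and the helper function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (EmpNo TEXT, Name TEXT, TelNo TEXT, Position TEXT, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("E1", "Asha", "111", "Manager", 3000),
        ("E2", "Ravi", None, "Clerk", 900),
        ("E3", "Anil", "333", "Engineer", 1500),
    ],
)


def names(where):
    """Return the names of the employees that satisfy the given WHERE condition."""
    query = f"SELECT Name FROM Employee WHERE {where} ORDER BY EmpNo"
    return [row[0] for row in conn.execute(query)]


comparison = names("Salary > 1000")                  # comparison
in_range = names("Salary BETWEEN 900 AND 1500")      # range (both ends included)
membership = names("Position IN ('Manager', 'Clerk')")  # set membership
pattern = names("Name LIKE 'A%'")                    # pattern match: names starting with A
is_null = names("TelNo IS NULL")                     # null test

print(comparison, in_range, membership, pattern, is_null)
```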

Sub-Queries/Nested Queries in SQL: Introduction to Nested Queries :


✓ One of the most powerful features of SQL is nested queries.
✓ A nested query is a query that has another query embedded within it; the embedded query is called
a sub query.


✓ When writing a query, we sometimes need to express a condition that refers to a table that must
itself be computed.
✓ A subquery typically appears within the WHERE clause of a query. Subqueries can sometimes
appear in the FROM clause or the HAVING clause.
✓ Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
✓ There are a few rules that subqueries must follow:
1. Subqueries must be enclosed within parentheses.
2. A subquery can have only one column in the SELECT clause, unless multiple columns are in the
main query for the subquery to compare its selected columns.
3. A subquery cannot be immediately enclosed in a set function.
Subqueries with the SELECT Statement:
Subqueries are most frequently used with the SELECT statement. The basic syntax is as follows:
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE condition]);
Ex: SELECT *
FROM customers
WHERE id in
(SELECT id
FROM customers
WHERE salary >4500);
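The example above can be run as a small sketch, assuming Python's built-in sqlite3 module and the seven CUSTOMERS rows that this unit uses in its section on views. The inner query is evaluated first and its id values feed the IN test of the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# Outer query keeps only the rows whose id is returned by the subquery.
result = conn.execute(
    "SELECT id, name FROM customers "
    "WHERE id IN (SELECT id FROM customers WHERE salary > 4500) "
    "ORDER BY id"
).fetchall()
print(result)
```

Komal (salary exactly 4500) is excluded because the condition is strictly greater than 4500.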
Subqueries with the INSERT Statement:
✓ Sub queries also can be used with INSERT statements.
✓ The INSERT statement uses the data returned from the subquery to insert into another table.
✓ The selected data in the subquery can be modified with any of the character, date or number
functions.
Syntax
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ];


Ex:
INSERT into customers_bkp
SELECT * FROM customers
WHERE id in (SELECT id FROM customers) ;
Sub queries with the UPDATE Statement:
✓ The subquery can be used in conjunction with the UPDATE statement.
✓ Either single or multiple columns in a table can be updated when using a subquery with the
UPDATE statement.
Syntax:
UPDATE table_name SET column_name = new_value
[ WHERE column_name OPERATOR
(SELECT column_name FROM table_name
[ WHERE condition ]) ];
EX:
UPDATE CUSTOMERS SET SALARY = SALARY * 0.25
WHERE AGE IN
(SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
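The INSERT and UPDATE examples can be combined into one runnable sketch. It assumes Python's built-in sqlite3 module and the CUSTOMERS sample rows used later in this unit; the backup table is created here with the same columns purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT, salary REAL)"
conn.execute("CREATE TABLE customers " + schema)
conn.execute("CREATE TABLE customers_bkp " + schema)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
        (4, "Chaitali", 25, "Mumbai", 6500.00),
        (5, "Hardik", 27, "Bhopal", 8500.00),
        (6, "Komal", 22, "MP", 4500.00),
        (7, "Muffy", 24, "Indore", 10000.00),
    ],
)

# INSERT with a subquery: copy every customer into the backup table.
conn.execute(
    "INSERT INTO customers_bkp "
    "SELECT * FROM customers WHERE id IN (SELECT id FROM customers)"
)
copied = conn.execute("SELECT COUNT(*) FROM customers_bkp").fetchone()[0]

# UPDATE with a subquery: only customers whose age appears in the
# subquery result (ages 27 and above) are changed.
cur = conn.execute(
    "UPDATE customers SET salary = salary * 0.25 "
    "WHERE age IN (SELECT age FROM customers_bkp WHERE age >= 27)"
)
updated = cur.rowcount
salaries = conn.execute(
    "SELECT id, salary FROM customers WHERE id IN (1, 5) ORDER BY id"
).fetchall()
print(copied, updated, salaries)
```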

Multi-table Queries
So far, we have used the SELECT statement to retrieve data from only one table. Sometimes we need
results that contain columns from more than one table. Thus, we need to perform a join operation to
combine these columns into one result table. To perform a join, we specify the tables to be used in the
FROM clause. The join condition, which specifies the matching or common column(s) of the tables to be
joined, is written in the WHERE clause.
In this section we use the Product, Supplier and Delivery tables to illustrate multi-table queries. The
Product and Delivery tables are shown below.

Product Table

Delivery Table

Example 1: Simple join


Query 1: List the supplier names for each product.
SELECT p.Name AS ProductName, s.Name AS SupplierName
FROM Product p, Supplier s
WHERE s.SuppNo = p.SuppNo;
This statement joins the Product and Supplier tables. Because SuppNo is the column common to both
tables, it is used for the join condition in the WHERE clause. The output for this simple join statement
is shown below.

Example 2: Sorting a join


Query 2: Sort the list of products by supplier name and, for each supplier name, sort the list by
product name in descending order.
SELECT p.Name AS ProductName, s.Name AS SupplierName
FROM Product p, Supplier s
WHERE s.SuppNo = p.SuppNo
ORDER BY s.Name, p.Name DESC;
This statement is similar to the previous example, except that it includes the ORDER BY clause for
sorting purposes. The result is sorted in ascending order by supplier name and, for those suppliers that
have more than one product, the product names are sorted in descending order.


Example 3: Three table join

Query 3: Find the supplier names of the products that were delivered in January 2007. Sort the list by
supplier name.
SELECT s.Name AS SupplierName, p.Name AS ProductName, d.DeliveryDate
FROM Supplier s, Product p, Delivery d
WHERE s.SuppNo = p.SuppNo AND p.ProductNo = d.ProductNo AND
(d.DeliveryDate >= '1-Jan-07' AND d.DeliveryDate <= '31-Jan-07')
ORDER BY s.Name;
This query requires joining three tables. All the join conditions are listed in the WHERE clause. As noted
earlier, the column common to each pair of tables being joined is used as the join condition. To join the
Supplier and Product tables, the supplier number is used, and to join the Product and Delivery tables, the
product number is used. The result from this join is shown below.
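The three-table join can be sketched as follows, assuming Python's built-in sqlite3 module. The column layouts and all rows are hypothetical, since the original Product, Supplier and Delivery tables are not reproduced here. SQLite has no DATE type, so the sketch stores dates as ISO text (YYYY-MM-DD), which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Supplier (SuppNo TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Product  (ProductNo TEXT PRIMARY KEY, Name TEXT, SuppNo TEXT);
CREATE TABLE Delivery (DeliveryNo INTEGER PRIMARY KEY, ProductNo TEXT, DeliveryDate TEXT);
INSERT INTO Supplier VALUES ('S1', 'Acme'), ('S2', 'Best');
INSERT INTO Product  VALUES ('P1', 'Pen', 'S1'), ('P2', 'Ink', 'S2'), ('P3', 'Pad', 'S1');
INSERT INTO Delivery VALUES (1, 'P1', '2007-01-15'), (2, 'P2', '2007-02-03'), (3, 'P3', '2007-01-20');
""")

# Two join conditions link the three tables; the date range is an extra filter.
january = conn.execute("""
    SELECT s.Name, p.Name, d.DeliveryDate
    FROM Supplier s, Product p, Delivery d
    WHERE s.SuppNo = p.SuppNo
      AND p.ProductNo = d.ProductNo
      AND d.DeliveryDate >= '2007-01-01' AND d.DeliveryDate <= '2007-01-31'
    ORDER BY s.Name, p.Name
""").fetchall()
print(january)
```

The product name is added as a second sort key only to make the row order of the sketch deterministic.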

Sorting Results (ORDER BY)


✓ The SQL ORDER BY clause is used to sort the data in ascending or descending order, based on
one or more columns.
✓ Rows are sorted in ascending order by default when neither ASC nor DESC is specified.
Syntax: The basic syntax of ORDER BY clause is as follows:
SELECT column-list FROM table_name
[WHERE condition] [ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], ... columnN [ASC | DESC]];

Example : Single column ordering


Query : List salaries for all employees, arranged in ascending order of salary.
SELECT EmpNo, Name, TelNo, Position, salary
FROM Employee
ORDER BY salary;


✓ If we want to sort the list in descending order, the word DESC must be specified in the ORDER
BY clause after the column name, as shown below.
SELECT EmpNo, Name, TelNo, Position, salary
FROM Employee
ORDER BY salary DESC;

Example : Multicolumn ordering


Query : List the employees sorted by position and, within each position, sorted in descending order of
salary.
This query requires two sort keys. Position is the primary (major) sort key and Salary is the
secondary (minor) sort key. The primary sort key is written first in the list, followed by the minor
keys.
SELECT EmpNo, Name, TelNo, Position, salary
FROM Employee
ORDER BY position, salary DESC;
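The effect of the two sort keys can be checked with a short sketch, assuming Python's built-in sqlite3 module and four hypothetical Employee rows. DESC applies only to the column it follows, so Position is still sorted in ascending order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT, Position TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        ("E1", "Clerk", 900),
        ("E2", "Manager", 3000),
        ("E3", "Clerk", 1200),
        ("E4", "Manager", 2500),
    ],
)

# Major key: Position (ascending by default); minor key: Salary (descending).
ordered = conn.execute(
    "SELECT EmpNo, Position, Salary FROM Employee ORDER BY Position, Salary DESC"
).fetchall()
print(ordered)
```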

SQL JOIN
• A SQL JOIN combines records from two tables.
• A JOIN locates related column values in the two tables.


• A query can contain zero, one, or multiple JOIN operations.


• INNER JOIN is the same as JOIN; the keyword INNER is optional.

Different types of JOINs


• (INNER) JOIN: Select records that have matching values in both tables.
• LEFT (OUTER) JOIN: Select all records from the first (left-most) table, together with any matching right table records.
• RIGHT (OUTER) JOIN: Select all records from the second (right-most) table, together with any matching left table records.
• FULL (OUTER) JOIN: Select all records from both tables, matching left and right table records where possible.

The SQL JOIN syntax


The general syntax is:
SELECT column-names
FROM table-name1 JOIN table-name2
ON column-name1 = column-name2
WHERE condition

The general syntax with INNER is:


SELECT column-names
FROM table-name1 INNER JOIN table-name2
ON column-name1 = column-name2
WHERE condition
Note: The INNER keyword is optional: it is the default as well as the most commonly used JOIN operation.

CUSTOMER (Id, FirstName, LastName, City, Country, Phone)

ORDER (Id, OrderDate, OrderNumber, CustomerId, TotalAmount)

SQL JOIN Examples


Problem: List all orders with customer information
SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM [Order] JOIN Customer
ON [Order].CustomerId = Customer.Id

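+-------------+-------------+-----------+----------+----------------+-------------+
| OrderNumber | TotalAmount | FirstName | LastName | City           | Country     |
+-------------+-------------+-----------+----------+----------------+-------------+
| 542378      | 440.00      | Paul      | Henriot  | Reims          | France      |
| 542379      | 1863.40     | Karin     | Josephs  | Münster        | Germany     |
| 542380      | 1813.00     | Mario     | Pontes   | Rio de Janeiro | Brazil      |
| 542381      | 670.80      | Mary      | Saveley  | Lyon           | France      |
| 542382      | 3730.00     | Pascale   | Cartrain | Charleroi      | Belgium     |
| 542383      | 1444.80     | Mario     | Pontes   | Rio de Janeiro | Brazil      |
| 542384      | 625.20      | Yang      | Wang     | Bern           | Switzerland |
+-------------+-------------+-----------+----------+----------------+-------------+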

SQL LEFT JOIN


• LEFT JOIN returns every record of the first (left-most) table together with any matching second (right-most)
table records; where there is no match, the right table columns are NULL.
• LEFT JOIN and LEFT OUTER JOIN are the same.


The SQL LEFT JOIN syntax


The general syntax is:
SELECT column-names
FROM table-name1 LEFT JOIN table-name2
ON column-name1 = column-name2
WHERE condition

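The general LEFT OUTER JOIN syntax is:

SELECT column-names
FROM table-name1 LEFT OUTER JOIN table-name2
ON column-name1 = column-name2
WHERE condition

SQL LEFT JOIN Example

Problem: List all customers with their orders, whether they placed any order or not
SELECT OrderNumber, TotalAmount, FirstName, LastName, City, Country
FROM Customer C LEFT JOIN [Order] O
ON O.CustomerId = C.Id
ORDER BY TotalAmount
This will list all customers, whether they placed any order or not.
In a DBMS that sorts NULL values first (SQL Server, for example), the ORDER BY TotalAmount shows the
customers without orders first (i.e. TotalAmount is NULL).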

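+-------------+-------------+-----------+----------+---------------+-----------+
| OrderNumber | TotalAmount | FirstName | LastName | City          | Country   |
+-------------+-------------+-----------+----------+---------------+-----------+
| NULL        | NULL        | Diego     | Roel     | Madrid        | Spain     |
| NULL        | NULL        | Marie     | Bertrand | Paris         | France    |
| 542912      | 12.50       | Patricio  | Simpson  | Buenos Aires  | Argentina |
| 542937      | 18.40       | Paolo     | Accorti  | Torino        | Italy     |
| 542897      | 28.00       | Pascale   | Cartrain | Charleroi     | Belgium   |
| 542716      | 28.00       | Maurizio  | Moroni   | Reggio Emilia | Italy     |
| 543028      | 30.00       | Yvonne    | Moncada  | Buenos Aires  | Argentina |
| 543013      | 36.00       | Fran      | Wilson   | Portland      | USA       |
+-------------+-------------+-----------+----------+---------------+-----------+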

SQL RIGHT JOIN


• RIGHT JOIN returns every record of the second (right-most) table together with any matching first (left-most)
table records; where there is no match, the left table columns are NULL.
• RIGHT JOIN and RIGHT OUTER JOIN are the same.


The SQL RIGHT JOIN syntax


The general syntax is:
SELECT column-names
FROM table-name1 RIGHT JOIN table-name2
ON column-name1 = column-name2
WHERE condition

SQL RIGHT JOIN Example


Problem: List customers that have not placed orders
SELECT TotalAmount, FirstName, LastName, City, Country
FROM [Order] O RIGHT JOIN Customer C
ON O.CustomerId = C.Id
WHERE TotalAmount IS NULL
This returns customers that, when joined, have no matching order.

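+-------------+-----------+----------+--------+---------+
| TotalAmount | FirstName | LastName | City   | Country |
+-------------+-----------+----------+--------+---------+
| NULL        | Diego     | Roel     | Madrid | Spain   |
| NULL        | Marie     | Bertrand | Paris  | France  |
+-------------+-----------+----------+--------+---------+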
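The outer-join behaviour described above can be reproduced with a small sketch, assuming Python's built-in sqlite3 module and two hypothetical customers, only one of whom has an order. Because SQLite versions before 3.39 do not support RIGHT JOIN, the "customers without orders" query is written here as the equivalent LEFT JOIN with the table order swapped; that substitution is an assumption of this sketch, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, OrderNumber TEXT,
                      CustomerId INTEGER, TotalAmount REAL);
INSERT INTO Customer VALUES (1, 'Paul', 'Henriot'), (2, 'Diego', 'Roel');
INSERT INTO "Order" VALUES (1, '542378', 1, 440.00);
""")

# LEFT JOIN keeps every customer; order columns are NULL where nothing matches.
all_customers = conn.execute("""
    SELECT O.OrderNumber, O.TotalAmount, C.FirstName
    FROM Customer C LEFT JOIN "Order" O ON O.CustomerId = C.Id
    ORDER BY O.TotalAmount
""").fetchall()

# Customers without orders: same join, filtered on the NULL side.
no_orders = conn.execute("""
    SELECT C.FirstName
    FROM Customer C LEFT JOIN "Order" O ON O.CustomerId = C.Id
    WHERE O.TotalAmount IS NULL
""").fetchall()
print(all_customers, no_orders)
```

SQLite, like SQL Server, sorts NULL before other values in ascending order, so the customer without an order comes first.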

SQL FULL JOIN Statement


• FULL JOIN returns all records from both tables, whether or not the other table has a matching record.
• FULL JOIN can potentially return very large datasets.
• FULL JOIN and FULL OUTER JOIN are the same.

The general syntax is:


SELECT column-names
FROM table-name1 FULL JOIN table-name2
ON column-name1 = column-name2
WHERE condition

SQL FULL JOIN Examples


Problem: Match all customers and suppliers by country


SELECT C.FirstName, C.LastName, C.Country AS CustomerCountry,
S.Country AS SupplierCountry, S.CompanyName
FROM Customer C FULL JOIN Supplier S
ON C.Country = S.Country
ORDER BY C.Country, S.Country
This returns suppliers that have no customers in their country,
and customers that have no suppliers in their country,
and customers and suppliers that are from the same country.

What are views?


➢ A view is nothing more than a SQL statement that is stored in the database with an associated name. A
view is actually a composition of a table in the form of a predefined SQL query.
➢ A view can contain all rows of a table or selected rows from a table. A view can be created from one or
many tables, depending on the SQL query written to create the view.
➢ Views, which are a kind of virtual table, allow users to do the following −
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data in such a way that a user can see and (sometimes) modify
exactly what they need and no more.
• Summarize data from various tables which can be used to generate reports.

Creating Views
Database views are created using the CREATE VIEW statement. Views can be created from a single table,
multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to the specific implementation.
The basic CREATE VIEW syntax is as follows −
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way as you use them in a normal SQL
SELECT query.

Example
Consider the CUSTOMERS table having the following records −

+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+

Following is an example to create a view from the CUSTOMERS table. This view would be used to have
customer name and age from the CUSTOMERS table.

SQL > CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;

Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual table. Following is an example
for the same.

SQL > SELECT * FROM CUSTOMERS_VIEW;

This would produce the following result.


+----------+-----+
| name | age |
+----------+-----+
| Ramesh | 32 |
| Khilan | 25 |
| kaushik | 23 |
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
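That a view is a stored query rather than a copy of the data can be seen in a minimal sketch, assuming Python's built-in sqlite3 module and three of the CUSTOMERS rows. Note that SQLite views are read-only, so the sketch changes the base table and then re-reads the view; updating through the view, as discussed for other DBMSs in this unit, is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ramesh", 32, "Ahmedabad", 2000.00),
        (2, "Khilan", 25, "Delhi", 1500.00),
        (3, "kaushik", 23, "Kota", 2000.00),
    ],
)

# The view stores only the query, not the rows.
conn.execute("CREATE VIEW customers_view AS SELECT name, age FROM customers")
row_count = conn.execute("SELECT COUNT(*) FROM customers_view").fetchone()[0]
age_before = conn.execute(
    "SELECT age FROM customers_view WHERE name = 'Ramesh'"
).fetchone()[0]

# A change to the base table is visible through the view immediately.
conn.execute("UPDATE customers SET age = 35 WHERE name = 'Ramesh'")
age_after = conn.execute(
    "SELECT age FROM customers_view WHERE name = 'Ramesh'"
).fetchone()[0]
print(row_count, age_before, age_after)
```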

The WITH CHECK OPTION


The WITH CHECK OPTION is a CREATE VIEW statement option. The purpose of the WITH CHECK OPTION
is to ensure that all UPDATE and INSERTs satisfy the condition(s) in the view definition.
If they do not satisfy the condition(s), the UPDATE or INSERT returns an error.
The following code block has an example of creating same view CUSTOMERS_VIEW with the WITH CHECK
OPTION.

CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS
WHERE age IS NOT NULL
WITH CHECK OPTION;


The WITH CHECK OPTION in this case should deny the entry of any NULL values in the view's AGE column,
because the view is defined by data that does not have a NULL value in the AGE column.

Updating a View
A view can be updated under certain conditions which are given below −
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
• All NOT NULL columns from the base table must be included in the view in order for the INSERT
query to function.
So, if a view satisfies all the above-mentioned rules then you can update that view. The following code block has
an example to update the age of Ramesh.

SQL > UPDATE CUSTOMERS_VIEW
SET AGE = 35
WHERE name = 'Ramesh';

This would ultimately update the base table CUSTOMERS and the same would reflect in the view itself. Now, try
to query the base table and the SELECT statement would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

Inserting Rows into a View


Rows of data can be inserted into a view. The same rules that apply to the UPDATE command also apply to the
INSERT command.


Here, we cannot insert rows in the CUSTOMERS_VIEW because we have not included all the NOT NULL
columns in this view, otherwise you can insert rows in a view in a similar way as you insert them in a table.

Deleting Rows from a View


Rows of data can be deleted from a view. The same rules that apply to the UPDATE and INSERT commands
apply to the DELETE command.
Following is an example to delete a record having AGE = 22.

SQL > DELETE FROM CUSTOMERS_VIEW
WHERE age = 22;

This would ultimately delete a row from the base table CUSTOMERS and the same would reflect in the view
itself. Now, try to query the base table and the SELECT statement would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+

Dropping Views
Obviously, where you have a view, you need a way to drop the view if it is no longer needed. The syntax is very
simple and is given below −
DROP VIEW view_name;

Following is an example to drop the CUSTOMERS_VIEW view.

DROP VIEW CUSTOMERS_VIEW;

SQL - Using Sequences


A sequence is a set of integers 1, 2, 3, ... that are generated in order on demand. Sequences are frequently used in
databases because many applications require each row in a table to contain a unique value and sequences provide
an easy way to generate them.
This section describes how to use sequences in MySQL.

Using AUTO_INCREMENT column


The simplest way in MySQL to use sequences is to define a column as AUTO_INCREMENT and leave the rest to
MySQL to take care.


Example
Try out the following example. It creates a table and then inserts a few rows into it; no record ID
needs to be given because the ID is auto-incremented by MySQL.

mysql> CREATE TABLE INSECT
-> (
-> id INT UNSIGNED NOT NULL AUTO_INCREMENT,
-> PRIMARY KEY (id),
-> name VARCHAR(30) NOT NULL, # type of insect
-> date DATE NOT NULL, # date collected
-> origin VARCHAR(30) NOT NULL # where collected
-> );
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO INSECT (id,name,date,origin) VALUES
-> (NULL,'housefly','2001-09-10','kitchen'),
-> (NULL,'millipede','2001-09-10','driveway'),
-> (NULL,'grasshopper','2001-09-10','front yard');
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM INSECT ORDER BY id;
+----+-------------+------------+------------+
| id | name        | date       | origin     |
+----+-------------+------------+------------+
|  1 | housefly    | 2001-09-10 | kitchen    |
|  2 | millipede   | 2001-09-10 | driveway   |
|  3 | grasshopper | 2001-09-10 | front yard |
+----+-------------+------------+------------+
3 rows in set (0.00 sec)

Starting a Sequence at a Particular Value


By default, MySQL will start the sequence from 1, but you can specify any other number as well at the time of
table creation.
The following code block has an example where MySQL will start sequence from 100.

mysql> CREATE TABLE INSECT
-> (
-> id INT UNSIGNED NOT NULL AUTO_INCREMENT,
-> PRIMARY KEY (id),
-> name VARCHAR(30) NOT NULL, # type of insect
-> date DATE NOT NULL, # date collected
-> origin VARCHAR(30) NOT NULL # where collected
-> ) AUTO_INCREMENT = 100;



Alternatively, you can create the table and then set the initial sequence value with ALTER TABLE.

mysql> ALTER TABLE t AUTO_INCREMENT = 100;
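The same idea of letting the database generate row IDs can be tried without a MySQL server. The following sketch is an assumption-laden stand-in: it uses Python's built-in sqlite3 module, where an INTEGER PRIMARY KEY AUTOINCREMENT column plays the role of MySQL's AUTO_INCREMENT. The syntax differs from MySQL, but the behaviour shown (IDs assigned automatically in increasing order) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite spelling of an auto-generated key; MySQL would use AUTO_INCREMENT.
conn.execute(
    "CREATE TABLE insect ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"    # type of insect
    " date TEXT NOT NULL,"    # date collected
    " origin TEXT NOT NULL)"  # where collected
)

# No id is supplied; the database assigns 1, 2, 3 in insertion order.
conn.executemany(
    "INSERT INTO insect (name, date, origin) VALUES (?, ?, ?)",
    [
        ("housefly", "2001-09-10", "kitchen"),
        ("millipede", "2001-09-10", "driveway"),
        ("grasshopper", "2001-09-10", "front yard"),
    ],
)
ids = [row[0] for row in conn.execute("SELECT id FROM insect ORDER BY id")]
print(ids)
```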

SQL - Indexes
Indexes are special lookup tables that the database search engine can use to speed up data retrieval.
Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in
the back of a book.
For example, if you want to find all pages in a book that discuss a certain topic, you first refer to
the index, which lists all the topics alphabetically, and are then referred to one or more specific page
numbers.
An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input with
the UPDATE and INSERT statements. Indexes can be created or dropped with no effect on the data.
Creating an index involves the CREATE INDEX statement, which allows you to name the index, to
specify the table and which column or columns to index, and to indicate whether the index is in an
ascending or descending order.
Indexes can also be unique, like the UNIQUE constraint, in that the index prevents duplicate entries in
the column or combination of columns on which there is an index.

The CREATE INDEX Command


The basic syntax of a CREATE INDEX is as follows.

CREATE INDEX index_name ON table_name (column_name [, column_name ]);

Single-Column Indexes
A single-column index is created based on only one table column. The basic syntax is as follows.

CREATE INDEX index_name
ON table_name (column_name);

Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A unique index does not
allow any duplicate values to be inserted into the table. The basic syntax is as follows.

CREATE UNIQUE INDEX index_name
ON table_name (column_name);

Composite Indexes
A composite index is an index on two or more columns of a table. Its basic syntax is as follows.

CREATE INDEX index_name
ON table_name (column1, column2);

When deciding whether to create a single-column index or a composite index, take into consideration the
column(s) that you use very frequently in a query's WHERE clause as filter conditions.
If only one column is used, a single-column index should be the choice. If two or more columns are
frequently used together in the WHERE clause as filters, a composite index would be
the best choice.
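Whether a query actually uses an index can be inspected from the query plan. The sketch below assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement (other DBMSs have their own EXPLAIN variants); the table, rows and index names are hypothetical. It also shows a unique index rejecting a duplicate value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (EmpNo TEXT, Name TEXT, Position TEXT, Salary REAL)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [("E1", "Ali", "Clerk", 900), ("E2", "Bala", "Manager", 3000)],
)

def plan(sql):
    """Return the textual query plan SQLite reports for the given statement."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM Employee WHERE Position = 'Clerk' AND Salary > 1000"
plan_before = plan(query)  # no index yet: full table scan

# Composite index on the two columns that the WHERE clause filters on.
conn.execute("CREATE INDEX idx_emp_pos_sal ON Employee (Position, Salary)")
plan_after = plan(query)   # the plan now names the index

# A unique index also enforces data integrity.
conn.execute("CREATE UNIQUE INDEX idx_emp_empno ON Employee (EmpNo)")
try:
    conn.execute("INSERT INTO Employee VALUES ('E1', 'Dup', 'Clerk', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```

The exact wording of the plan text varies between SQLite versions, so it is best read for the presence of the index name rather than compared literally.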

Implicit Indexes
Implicit indexes are indexes that are automatically created by the database server when an object is
created. Indexes are automatically created for primary key constraints and unique constraints.

The DROP INDEX Command


An index can be dropped using SQL DROP command. Care should be taken when dropping an index
because the performance may either slow down or improve.
The basic syntax is as follows −

DROP INDEX index_name;

When should indexes be avoided?
Although indexes are intended to enhance a database's performance, there are times when they should be
avoided.
The following guidelines indicate when the use of an index should be reconsidered.
• Indexes should not be used on small tables.
• Indexes should not be used on tables that have frequent, large batch update or insert operations.
• Indexes should not be used on columns that contain a high number of NULL values.
• Columns that are frequently manipulated should not be indexed.


UNIT-IV: TRANSACTIONS AND CONCURRENCY MANAGEMENT:

Transactions - Concurrent Transactions - Locking Protocol - Serialisable Schedules - Locks - Two Phase
Locking (2PL) - Deadlock and its Prevention - Optimistic Concurrency Control.
Database Recovery and Security: Database Recovery meaning - Kinds of failures - Failure controlling
methods - Database errors - Backup & Recovery Techniques - Security & Integrity - Database Security -
Authorization.


TRANSACTIONS AND CONCURRENCY CONTROL MANAGEMENT

Transactions:

A transaction is an execution of a program and is seen by the DBMS as a series, or list, of actions. It
differs from an ordinary program in that it results from executing a program written in a high-level
data manipulation language or programming language. A transaction is delimited by the
statements "begin transaction" and "end transaction".

In a transaction, access to the database is accomplished by two operations.

i. Read(x).
ii. Write(x).

The first reads data item x from the database, whereas the second writes data item x to the database.
Consider a transaction Ti which transfers Rs. 100/- from account "A" to account "B". This transaction
is defined as follows:

Ti:
Read(A);
A := A - 100;
Write(A);
Read(B);
B := B + 100;
Write(B);
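The transfer transaction Ti can be sketched as runnable code, assuming Python's built-in sqlite3 module. The account table, the opening balances (500 and 200) and the "insufficient funds" rule are hypothetical additions for illustration. The `with conn:` block commits if both writes succeed and rolls back if an exception is raised, so the transfer happens completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def balance(name):
    """Read(x): fetch the current balance of one account."""
    return conn.execute(
        "SELECT balance FROM account WHERE name = ?", (name,)
    ).fetchone()[0]

def transfer(src, dst, amount):
    """Move amount from src to dst; undo everything if src would go negative."""
    try:
        with conn:  # COMMIT on normal exit, ROLLBACK if an exception escapes
            conn.execute(  # Write(A)
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if balance(src) < 0:
                raise ValueError("insufficient funds")
            conn.execute(  # Write(B)
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

ok = transfer("A", "B", 100)       # succeeds: A = 400, B = 300
failed = transfer("A", "B", 1000)  # rolled back: balances unchanged
final = (balance("A"), balance("B"))
print(ok, failed, final)
```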

ACID Properties or Transaction Properties


A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction
in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly
known as ACID properties − in order to ensure accuracy, completeness, and data integrity.

1. Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of the
transaction or after the execution/abortion/failure of the transaction.

2. Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after the
execution of the transaction as well.
3. Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits, then
the database will hold the modified data. If a transaction commits but the system fails before the
data could be written on to the disk, then that data will be updated once the system springs back
into action.
4. Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction will be
carried out and executed as if it were the only transaction in the system. No transaction will affect
the existence of any other transaction.

Transaction States

A transaction may exist in one of the following six states:
Active: The initial state, when the transaction has just started execution.
Partially Committed: At any given point of time, if the transaction is executing properly, it is moving
towards its COMMIT POINT. The values generated during the execution are all stored in volatile storage.
Failed: The transaction fails for some reason. The temporary values are no longer required, and the
transaction is set to ROLLBACK. It means that any change made to the database by this transaction up to
the point of the failure must be undone. If the failed transaction has withdrawn Rs. 100/- from account A,
then the ROLLBACK operation should add Rs. 100/- to account A.
Aborted: When the ROLLBACK operation is over, the database returns to the BFIM (before image), i.e. its
state before the transaction started. The transaction is now said to have been aborted.
Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All the temporary
values are written to the stable storage and the transaction is said to have been committed.
Terminated: Either committed or aborted, the transaction finally reaches this state.
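The six states can be written down as a small state machine. The sketch below is plain Python; the transition table is an assumption based on the descriptions above and on the usual textbook diagram (in particular, the step from Partially Committed to Failed is not stated explicitly in the text).

```python
# Allowed next states for each transaction state (assumed from the text above).
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed": {"aborted"},
    "committed": {"terminated"},
    "aborted": {"terminated"},
    "terminated": set(),
}

def is_valid_history(path):
    """Return True if every consecutive pair of states is a legal transition."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))

success = is_valid_history(["active", "partially committed", "committed", "terminated"])
failure = is_valid_history(["active", "failed", "aborted", "terminated"])
illegal = is_valid_history(["active", "committed"])  # skips the commit point
print(success, failure, illegal)
```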


The whole process can be described using the following diagram:

CONCURRENT TRANSACTIONS
When more than one transaction is executed by the operating system in a multiprogramming
environment, the instructions of one transaction may be interleaved with those of another
transaction.

➢ Schedule: a chronological execution sequence of transactions is called a schedule. A schedule can
have many transactions in it, each consisting of a number of instructions/tasks.
➢ Serial Schedule: a schedule in which transactions are arranged so that one transaction is
executed first, and the next transaction is executed only when the first has completed its cycle.
Transactions are ordered one after the other. This type of schedule is called a serial schedule
because transactions are executed in a serial manner.

In a multi-transaction environment, serial schedules are considered the benchmark. The order of
instructions within a transaction cannot be changed, but the instructions of two transactions can be
interleaved in an arbitrary order. This does no damage if the two transactions are mutually
independent and work on different segments of data; but if they work on the same data, the
result may vary. Such a varying result may leave the database in an
inconsistent state.

To solve this problem, we allow parallel execution of a transaction schedule only if its transactions
are either serializable or have some equivalence relation between or among them.

Problems of concurrency control


Several problems can occur when concurrent transactions are executed in an uncontrolled manner.
Following are the three problems in concurrency control.
1. Lost updates
2. Dirty read
3. Unrepeatable read

1. Lost update problem


o When two transactions that access the same database items interleave their operations in a way that
makes the value of some database item incorrect, then the lost update problem occurs.
o If two transactions T1 and T2 read a record and then update it, then the effect of the
first update will be overwritten by the second update.

Example:


Here,
o At time t2, Transaction-X reads A's value.
o At time t3, Transaction-Y reads A's value.
o At time t4, Transaction-X writes A's value on the basis of the value seen at time t2.
o At time t5, Transaction-Y writes A's value on the basis of the value seen at time t3.
o So at time t5, the update of Transaction-X is lost because Transaction-Y overwrites it without
looking at its current value.
o This type of problem is known as the Lost Update Problem, as the update made by one transaction
is lost here.
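The interleaving at times t2 to t5 can be replayed step by step in plain Python. This is only a simulation with a dictionary standing in for the database; the starting value of A (100) and the two updates (X subtracts 10, Y adds 20) are hypothetical.

```python
db = {"A": 100}           # shared database item

x_local = db["A"]         # t2: Transaction-X reads A
y_local = db["A"]         # t3: Transaction-Y reads A (same old value)
db["A"] = x_local - 10    # t4: X writes 90, based on what it read at t2
db["A"] = y_local + 20    # t5: Y writes 120, based on what it read at t3

lost_update_result = db["A"]   # X's update has vanished
serial_result = 100 - 10 + 20  # what X followed by Y would have produced
print(lost_update_result, serial_result)
```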

2. Dirty Read
o The dirty read occurs in the case when one transaction updates an item of the database, and then
the transaction fails for some reason. The updated database item is accessed by another transaction
before it is changed back to the original value.
o A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values which
have never formed part of the stable database.

Example:

o At time t2, Transaction-Y writes A's value.
o At time t3, Transaction-X reads A's value.
o At time t4, Transaction-Y rolls back, so A's value changes back to what it was prior to t1.
o So, Transaction-X now holds a value which has never become part of the stable database.
o This type of problem is known as the Dirty Read Problem, as one transaction reads a dirty value
which has not been committed.
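The dirty read can likewise be replayed as a plain Python simulation. The dictionary stands in for the database, and the values (A starts at 100 and is temporarily set to 150) are hypothetical.

```python
db = {"A": 100}            # committed value of A

before_image = db["A"]     # Y remembers the old value so it can roll back
db["A"] = 150              # t2: Transaction-Y writes A (not yet committed)
x_seen = db["A"]           # t3: Transaction-X reads the uncommitted value
db["A"] = before_image     # t4: Y rolls back; A returns to its old value

# X is now working with 150, a value that was never committed.
print(x_seen, db["A"])
```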

3. Inconsistent Retrievals Problem


o Inconsistent Retrievals Problem is also known as unrepeatable read. When a transaction calculates
some summary function over a set of data while the other transactions are updating the data, then
the Inconsistent Retrievals Problem occurs.
o A transaction T1 reads a record and then does some other processing during which the transaction
T2 updates the record. Now when the transaction T1 reads the record, then the new value will be
inconsistent with the previous value.
Example:
Suppose two transactions operate on three accounts.

Transaction-X is summing the balances of all accounts while Transaction-Y is transferring an amount of
50 from Account-1 to Account-3.
o Here, Transaction-X produces the result of 550, which is incorrect. If this result is written to
the database, the database will be left in an inconsistent state because the actual sum is 600.
o Here, transaction-X has seen an inconsistent state of the database.
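The figures 550 and 600 can be reproduced with a plain Python simulation. The individual balances (200, 250 and 150) and the exact interleaving are assumptions chosen to match the totals quoted above, since the original table is not reproduced here.

```python
accounts = {1: 200, 2: 250, 3: 150}    # hypothetical balances, total 600
true_total = sum(accounts.values())

accounts[1] -= 50                      # Y: debit Account-1 (transfer half done)
x_total = accounts[1] + accounts[2] + accounts[3]  # X: sums in the middle of Y
accounts[3] += 50                      # Y: credit Account-3 (transfer complete)

total_after = sum(accounts.values())   # the money was never actually missing
print(true_total, x_total, total_after)
```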
Concurrency Control:-
➢ The coordination of the simultaneous execution of transactions in a multi-user database system is
known as concurrency control.
➢ The objective of concurrency control is to ensure the serializability of transactions in a multi-user
database environment.
➢ Concurrency control is important because the simultaneous execution of transactions over a
shared database can create several data integrity and consistency problems.
❖ The three main problems are
a) Lost updates
b) Uncommitted data
c) Inconsistent retrievals
a) Lost updates:-
➢ The Lost Updates problem occurs when two concurrent transactions T1 and T2 are updating the
same data element and one of the updates is lost.
b) Uncommitted data:-
➢ Uncommitted data occurs when two transactions T1 and T2 are executed concurrently and the first
transaction (T1) is rolled back after the second transaction (T2) has already accessed the
uncommitted data. This violates the Isolation property of transactions.
c) Inconsistent retrievals:-
➢ Inconsistent retrievals occur when a transaction accesses data before and after another transaction
finishes working with that data.
Concurrency Control with Locking Methods:-
➢ A transaction acquires a lock prior to data access; the lock is released (unlocked) when the
transaction is completed, so that another transaction can lock the data item.
➢ Transaction T2 does not have access to a data item that is currently being used by transaction T1.
➢ Most multi-user DBMS automatically initiate and enforce locking procedures.
➢ All lock information is managed by a lock manager, which is responsible for assigning and
policing the locks used by the transactions.
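The role of the lock manager can be illustrated with a minimal sketch in plain Python. The LockManager class and its method names are hypothetical; a real DBMS lock manager also handles lock modes, waiting queues and deadlocks, none of which is modelled here.

```python
class LockManager:
    """Toy lock manager: one exclusive lock per data item."""

    def __init__(self):
        self._owner = {}  # data item -> transaction currently holding its lock

    def acquire(self, txn, item):
        """Grant the lock if the item is free (or already held by txn)."""
        holder = self._owner.get(item)
        if holder is None or holder == txn:
            self._owner[item] = txn
            return True
        return False  # another transaction holds it; the caller must wait

    def release(self, txn, item):
        """Unlock the item, but only if txn is the current holder."""
        if self._owner.get(item) == txn:
            del self._owner[item]

lm = LockManager()
t1_granted = lm.acquire("T1", "x")   # T1 locks data item x
t2_blocked = lm.acquire("T2", "x")   # T2 is refused while T1 holds the lock
lm.release("T1", "x")                # T1 completes and unlocks x
t2_granted = lm.acquire("T2", "x")   # now T2 can lock x
print(t1_granted, t2_blocked, t2_granted)
```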
Lock Granularity :- (or) Locking Level
➢ Lock granularity indicates the level of lock use. Locking can take place at the following level.
a) Database level
b) Table level
c) Page level
d) Row level
e) Field (attribute level)
a) Database level:-
➢ In database level lock the entire database is locked. So if transaction T1 is accessing that database,
then transaction T2 cannot access it.


➢ This level of locking is good for batch processes, but it is unsuitable for a multi-user DBMS, because
thousands of transactions would have to wait for the previous transaction to be completed before the next one
could reserve the entire database. Data access would therefore be slow.
b) Table level:-
➢ In a table level lock the entire table is locked; that is, if transaction T1 is accessing a table, then
transaction T2 cannot access the same table.
➢ If a transaction requires access to several tables, each table may be locked.
➢ Table level locks are less restrictive than database level locks.
➢ Table level locks are not suitable for multi-user DBMS.
➢ The drawback of a table level lock is that transactions T1 and T2 cannot access the same table
even when they try to use different rows; T2 must wait until T1 unlocks the table.
c) Page level:-
➢ In a page level lock, the DBMS locks an entire disk page.
➢ A disk page, or page, is also referred to as a disk block, which is described as a section of a disk.
➢ A page has a fixed size such as 4 KB, 8 KB or 16 KB.
➢ A table can span several pages, and a page can contain several rows of one or more tables.
➢ Page level locks are currently a frequently used multi-user DBMS locking method.
➢ Page level lock is shown in the following fig.

In the above fig. T1 and T2 access the same table while locking different disk pages.
➢ If T2 requires the use of a row located on a page that is locked by T1, T2 must wait until the page
is unlocked.
d) Row level:-
➢ A row level lock is much less restrictive than the other locks. The DBMS allows concurrent
transactions to access different rows of the same table even when the rows are located on the same
page.
➢ The row level locking approach improves the availability of data.
➢ But row level lock management requires high overhead, because a lock exists for each row in a
table of the database that is involved in a conflicting transaction.
e) Field level:-


➢ The field level lock allows concurrent transactions to access the same row as long as they require the use
of different fields (attributes) within that row.
➢ Although field level locking clearly yields the most flexible multi-user data access, it is rarely
implemented in a DBMS because it requires an extremely high level of computer overhead.
Lock Types:-
➢ The DBMS uses different lock types, such as
a) Binary Locks
b) Shared/Exclusive Locks.
a) Binary Locks:-
➢ A binary lock has two states:
a) Locked
b) Unlocked
➢ If an object is locked by a transaction, no other transaction can use that object. The object may be
a database, table, page or row.
➢ If an object is unlocked, any transaction can lock the object for its use.
➢ Every database operation requires that the affected object be locked.
➢ A transaction must unlock the object after its termination. Therefore every transaction requires a
lock and an unlock operation for each data item that is accessed.
➢ Such operations are automatically managed and scheduled by the DBMS.
➢ Every DBMS has a default locking mechanism. If the end user wants to override the default, the
LOCK TABLE and other SQL commands are available.
➢ Using binary locks, the lost update problem is eliminated because the lock is not
released until the WRITE statement is completed.
➢ But binary locks are now considered too restrictive to yield optimal concurrency conditions. For
example, the DBMS will not allow two transactions to read the same database object even though
neither transaction updates the database.
b) Shared/Exclusive Locks:-
➢ A shared lock exists when concurrent transactions are granted read access on the basis of a
common lock. A shared lock produces no conflict as long as all the concurrent transactions are
read only.
➢ An exclusive lock exists when access is reserved specifically for the transaction that locked the
object. The exclusive lock must be used when the potential for conflict exists, that is, when one
transaction READs and another WRITEs the same data item, or when both WRITE it.
➢ So a shared lock is issued when a transaction wants to read data from the database and an
exclusive lock is issued when a transaction wants to update (WRITE) a data item.


➢ Using the Shared/Exclusive locking concept, a lock can have 3 states:
a) Unlocked
b) Shared (read)
c) Exclusive (write)
Example for shared lock:
➢ If transaction T1 has a shared lock on data item X and transaction T2 wants to read data item X,
T2 may also obtain a shared lock on data item X.
Example for exclusive lock:
➢ If transaction T1 has a shared or exclusive lock on data item X and transaction T2 wants an exclusive
lock to update data item X, the exclusive lock cannot be granted to transaction T2, and it must
wait until T1 is committed and releases its lock. So “the exclusive lock is granted if and only if no
other locks are held on the data item”.
➢ Shared/Exclusive locks are more efficient, but they increase the lock manager’s overhead for
the following reasons.
a) The type of lock held must be known before a lock can be granted.
b) The lock type must first be checked, then the lock is issued, and finally the lock is released.
➢ Shared/Exclusive locks can lead to two major problems:
a) The resulting transaction schedule might not be serializable.
b) The schedule might create deadlock.
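The compatibility rule for shared and exclusive locks can be sketched in a few lines of Python. This is a minimal illustration for a single data item; the class name ItemLock and its methods are hypothetical, not part of any DBMS, and lock upgrading and wait queues are left out.

```python
# Minimal sketch of shared/exclusive lock compatibility for one data item.
# The class and method names are hypothetical, not a DBMS API.

class ItemLock:
    def __init__(self):
        self.mode = "unlocked"   # "unlocked", "shared", or "exclusive"
        self.holders = set()     # transactions currently holding the lock

    def request(self, txn, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if self.mode == "unlocked":
            self.mode = mode
            self.holders = {txn}
            return True
        if self.mode == "shared" and mode == "shared":
            self.holders.add(txn)   # readers can share the lock
            return True
        return False                # any combination involving a writer conflicts

    def release(self, txn):
        self.holders.discard(txn)
        if not self.holders:
            self.mode = "unlocked"

x = ItemLock()
grants = [
    x.request("T1", "shared"),      # granted: item was unlocked
    x.request("T2", "shared"),      # granted: shared is compatible with shared
    x.request("T3", "exclusive"),   # refused: readers still hold the lock
]
x.release("T1")
x.release("T2")
grants.append(x.request("T3", "exclusive"))  # granted: no other locks are held
print(grants)
```

The three states of the lock (unlocked, shared, exclusive) appear as the values of the mode attribute.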
Two Phase Locking to ensure Serializability:-
➢ Two phase locking defines how transactions acquire locks.
➢ Two phase locking guarantees Serializability but it does not prevent dead locks.
➢ The two phases are
a) A growing phase, in which a transaction acquires all required locks without unlocking any data.
Once all locks have been acquired, the transaction is at its locked point.
b) A shrinking phase, in which a transaction releases all locks and cannot obtain any new lock.
➢ In two phase locking protocol the transaction acquires all the locks until it reaches its locked
point. When the locked point is reached, the data are modified to conform to the transaction
requirements. Finally the transaction is completed, and then it releases all the locks which are
acquired.
➢ Two phase locking increases the transaction processing cost, and a further drawback is that it might create
deadlocks.
Two Phase Locking –
A transaction is said to follow Two Phase Locking protocol if Locking and Unlocking can be done in two
phases.


1. Growing Phase: New locks on data items may be acquired but none can be released.
2. Shrinking Phase: Existing locks may be released but no new locks can be acquired.
Note – If lock conversion is allowed, then upgrading of lock( from S(a) to X(a) ) is allowed in Growing
Phase and downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.
Let’s see two skeleton transactions, T1 and T2, implementing 2-PL.

Step   T1           T2
1      LOCK-S(A)
2                   LOCK-S(A)
3      LOCK-X(B)
4      …….          ……
5      UNLOCK(A)
6                   LOCK-X(C)
7      UNLOCK(B)
8                   UNLOCK(A)
9                   UNLOCK(C)
10     …….          ……

This is just a skeleton transaction which shows how unlocking and locking works with 2-PL. Note for:
Transaction T1:
• Growing Phase is from steps 1-3.
• Shrinking Phase is from steps 5-7.
• Lock Point at 3
Transaction T2:
• Growing Phase is from steps 2-6.
• Shrinking Phase is from steps 8-9.
• Lock Point at 6
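The phases and lock points noted above can be checked mechanically. The following Python sketch transcribes the skeleton schedule (steps 4 and 10 are omitted because they contain no lock operations) and reports, for each transaction, whether it obeys two phase locking and at which step its lock point falls. The function name analyse and the tuple layout are assumptions made for this illustration.

```python
# Check whether each transaction in a schedule obeys two-phase locking.
# Each entry is (step, transaction, operation, data item).

schedule = [
    (1, "T1", "LOCK-S", "A"),
    (2, "T2", "LOCK-S", "A"),
    (3, "T1", "LOCK-X", "B"),
    (5, "T1", "UNLOCK", "A"),
    (6, "T2", "LOCK-X", "C"),
    (7, "T1", "UNLOCK", "B"),
    (8, "T2", "UNLOCK", "A"),
    (9, "T2", "UNLOCK", "C"),
]

def analyse(schedule):
    """Return {txn: (obeys_2pl, lock_point_step)}."""
    result = {}
    shrinking = set()                      # transactions that have released a lock
    for step, txn, op, _item in schedule:
        ok, lock_point = result.get(txn, (True, None))
        if op.startswith("LOCK"):
            if txn in shrinking:
                ok = False                 # lock acquired after an unlock: not 2PL
            else:
                lock_point = step          # latest lock acquired in the growing phase
        else:
            shrinking.add(txn)             # first unlock starts the shrinking phase
        result[txn] = (ok, lock_point)
    return result

report = analyse(schedule)
print(report)
```

A transaction that locks an item after it has already unlocked another is flagged as not following the protocol.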

Deadlocks:-
➢ A deadlock occurs when two transactions wait indefinitely for each other to unlock data. For
example, a deadlock occurs when two transactions, T1 and T2, exist in the following mode.


T1: access data items X and Y.
T2: access data items Y and X.
➢ T1 and T2 are executing simultaneously, so T1 has locked data item X and T2 has
locked data item Y. Now transaction T1 is waiting to lock data item Y, but it is already locked by
T2. Simultaneously, T2 is waiting to lock X, but it is already locked by T1. So both transactions are
waiting for an item held by the other. This condition is referred to as “Dead Lock”.
➢ In a real world DBMS, many transactions can be executed simultaneously, thereby increasing
the probability of generating deadlocks.
➢ The 3 basic techniques to control dead locks are
a) Dead Lock Prevention:-
➢ A transaction requesting a new lock is aborted when there is the possibility that a dead lock can
occur. If the transaction is aborted, all changes made by this transaction are rolled back, and all
locks obtained by the transaction are released.
➢ This method is used when there is a high probability of deadlock.
b) Dead Lock Detection:-
➢ The DBMS tests the database for dead locks. If a dead lock is found, one of the transactions is
aborted and the other transaction continues.
➢ This method is used when the probability of deadlocks is low.
c) Dead Lock Avoidance:-
➢ The transaction must obtain all of the locks it needs before it can be executed. This technique
avoids the rollback of conflicting transactions.
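One common way to carry out deadlock detection is to search a wait-for graph for a cycle. The notes do not say which test the DBMS uses, so the Python sketch below is an assumption for illustration; the function name find_deadlock and the dictionary layout are hypothetical.

```python
# Deadlock detection sketch: find a cycle in a wait-for graph.
# An edge T1 -> T2 means "T1 is waiting for a lock held by T2".

def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None."""
    visited = set()

    def visit(txn, path):
        if txn in path:
            return path[path.index(txn):]      # cycle found
        if txn in visited:
            return None                        # already explored, no cycle there
        visited.add(txn)
        for holder in wait_for.get(txn, []):
            cycle = visit(holder, path + [txn])
            if cycle:
                return cycle
        return None

    for txn in wait_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None

# T1 holds X and waits for Y; T2 holds Y and waits for X.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2, but T2 waits for nobody.
no_deadlock = find_deadlock({"T1": ["T2"], "T2": []})
print(deadlocked, no_deadlock)
```

Once a cycle is found, one of the transactions in it would be aborted so that the other can continue.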
Concurrency Control with Time Stamping Methods:-
➢ Time stamping methods are used to manage concurrent transaction execution.
➢ In the time stamping approach, a unique time stamp is assigned to each transaction.
➢ Time stamps must have two properties.
a) Uniqueness: It specifies that no two time stamp values can be equal.
b) Monotonicity: It specifies that time stamp values always increase.
➢ The disadvantage of the time stamping approach is that each value stored in the database requires
two additional time stamp fields. They are
a) One field for the last time the field was read
b) Another field for the last update.
➢ So time stamping increases memory needs and the database’s processing overhead.
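The two properties and the two extra fields can be pictured with a small Python sketch. The counter-based time stamp source and the names StoredValue, read and write are assumptions made for illustration; a real DBMS may derive time stamps from the system clock or another source.

```python
import itertools
from dataclasses import dataclass

# Hypothetical time stamp source: a counter yields unique, increasing values.
_clock = itertools.count(start=1)

def next_timestamp():
    return next(_clock)

@dataclass
class StoredValue:
    value: int
    read_ts: int = 0    # time stamp of the last transaction that read the value
    write_ts: int = 0   # time stamp of the last transaction that updated it

def read(item, ts):
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, new_value):
    item.value = new_value
    item.write_ts = ts

t1, t2 = next_timestamp(), next_timestamp()   # unique and monotonically increasing
x = StoredValue(value=10)
read(x, t1)
write(x, t2, 25)
print(t1, t2, x)
```

Every stored value carries the two additional fields, which is where the extra memory and processing overhead comes from.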
Wait/die and Wound/wait Schemes:-
➢ The wait/die and wound/wait schemes are used in the time stamping method.


Example:
➢ Assume that we have two conflicting transactions:
T1 and T2, each with a unique time stamp.
➢ Suppose T1 has a time stamp of 11548789 and T2 has a time stamp of 19562545. So T1 is the older
transaction and T2 is the newer (younger) transaction.
Using the wait/die scheme:-
a) If the transaction requesting the lock is the older of the two transactions, it will wait until the other
transaction is completed and the locks are released.
b) If the transaction requesting the lock is the younger of the two transactions, it will die (rollback) and is
rescheduled using the same time stamp.
➢ That means in the wait/die scheme, the older transaction waits for the younger to complete and
release its locks.
Using the wound/wait scheme:-
a) If the transaction requesting the lock is the older of the two transactions, it will preempt (wound) the
younger transaction (by rolling it back). The younger transaction is rescheduled using the same time
stamp.
b) If the transaction requesting the lock is the younger of the two transactions, it will wait until the other
transaction is completed and the locks are released.
➢ That means in the wound/wait scheme, the older transaction rolls back the younger transaction and
reschedules it.
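The two schemes differ only in what happens for each combination of older and younger transaction, which a short Python sketch makes explicit. The function names and the returned strings are hypothetical; the time stamps are the ones used in the example above.

```python
# Decision rules for the wait/die and wound/wait schemes.
# A smaller time stamp means an older transaction.

def wait_die(requester_ts, holder_ts):
    """Outcome when the requester asks for a lock held by the holder."""
    if requester_ts < holder_ts:          # requester is older
        return "requester waits"
    return "requester dies"               # rolled back, rescheduled with same time stamp

def wound_wait(requester_ts, holder_ts):
    if requester_ts < holder_ts:          # requester is older
        return "holder is wounded"        # holder rolled back, same time stamp kept
    return "requester waits"

T1, T2 = 11548789, 19562545               # T1 is older, T2 is younger

outcomes = {
    "wait/die, T1 requests":   wait_die(T1, T2),
    "wait/die, T2 requests":   wait_die(T2, T1),
    "wound/wait, T1 requests": wound_wait(T1, T2),
    "wound/wait, T2 requests": wound_wait(T2, T1),
}
for case, outcome in outcomes.items():
    print(case, "->", outcome)
```

In both schemes it is always the younger transaction that is rolled back, which keeps the older one from being restarted again and again.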
Concurrency Control with Optimistic methods:-
➢ The optimistic approach is based on the assumption that the majority of the database operations do
not conflict.
➢ The optimistic approach requires neither locking nor time stamping techniques.
➢ Using an optimistic approach, each transaction moves through three phases. They are
a) Read Phase
b) Validation Phase
c) Write Phase.
➢ During the read phase, the transaction reads the database, executes the needed computations, and
makes the updates to a private copy of the data base values. All the update operations of the
transaction are recorded in a temporary update file, which is not accessed by the remaining
transactions.
➢ During the validation phase, the transaction is validated to ensure that the changes made will not affect the
integrity and consistency of the database. If the validation test is positive, the transaction goes to the write
phase. If the validation test is negative, the transaction is restarted and the changes are discarded.
➢ During the write phase, the changes are permanently applied to the database.
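The three phases can be sketched in Python as follows. The notes do not specify the validation test, so the version-number check used here is an assumption made for illustration, as are the class names Database and OptimisticTxn. The sketch is single threaded and only shows the order of the phases.

```python
# Optimistic concurrency sketch. The version-number validation test is an
# assumption made for illustration; the notes do not specify the test.

class Database:
    def __init__(self, values):
        self.values = dict(values)
        self.versions = {key: 0 for key in values}

class OptimisticTxn:
    def __init__(self, db):
        self.db = db
        self.read_versions = {}   # versions seen during the read phase
        self.private_copy = {}    # updates hidden from other transactions

    def read(self, key):
        if key in self.private_copy:
            return self.private_copy[key]
        self.read_versions[key] = self.db.versions[key]
        return self.db.values[key]

    def write(self, key, value):
        self.private_copy[key] = value          # read phase: private update only

    def commit(self):
        # Validation phase: fail if any item read has changed since it was read.
        for key, seen in self.read_versions.items():
            if self.db.versions[key] != seen:
                self.private_copy.clear()       # discard changes; caller restarts
                return False
        # Write phase: apply the private copy to the database.
        for key, value in self.private_copy.items():
            self.db.values[key] = value
            self.db.versions[key] += 1
        return True

db = Database({"balance": 100})
t1, t2 = OptimisticTxn(db), OptimisticTxn(db)
t1.write("balance", t1.read("balance") + 50)
t2.write("balance", t2.read("balance") - 30)
t1_ok = t1.commit()   # validates: nothing changed since T1 read
t2_ok = t2.commit()   # fails: T1 changed the item that T2 read
print(t1_ok, t2_ok, db.values["balance"])
```

Because T2 fails validation, its change is discarded and no update is lost; T2 would simply be restarted.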


Introduction to Database recovery:


A computer system is an electromechanical device subject to failures of various types. The reliability of the
database system is linked to the reliability of the computer system on which it runs. The types of failures
that the computer system is likely to be subjected to include failures of components or subsystems,
software failures, power outages, accidents, unforeseen situations and natural or man-made disasters.
Database recovery techniques are methods of making the database fault tolerant. The aim of a
recovery scheme is to allow database operations to be resumed after a failure with minimum loss of
information at an economically justified cost.

"Database security" is protection of the information contained in the database against unauthorized
access, modification or destruction.
"Database integrity" is the mechanism that is applied to ensure that the data in the database is correct and
consistent.

Database Recovery
Recovery techniques are used to bring a database that does not satisfy consistency requirements into a
consistent state. The inconsistencies may arise from violation of the semantic integrity constraints
specified in the schema, or from violation of certain implicit constraints that are expected to hold
for a database. In other words, if a transaction completes normally then all the changes that it performs on
the database are permanently committed. But if a transaction does not complete normally then none of its
changes are committed. An abnormal termination may be due to several reasons, including:
a) the user may decide to abort the transaction
b) there might be a deadlock
c) there might be a system failure.
So the recovery mechanisms must make sure that a consistent state of the database can be restored under all
circumstances. In case of transaction abort or deadlock the system remains in control, but in case of a system
failure the system loses control because the computer itself fails or some critical data are lost.

Kinds of Failures
When a transaction/program is executed, a number of difficulties can arise that lead to its
abnormal termination. The failures are mainly of two types:
1. Soft failures: In such cases, a CPU, memory or software error abruptly stops the execution of the
current transaction (or all transactions), leading to loss of the state of program execution and the
state/contents of the buffers. These can further be subdivided into two types:
a) Statement failure
b) Program failure


A statement of a program may cause abnormal termination if it does not execute completely. If
an integrity constraint is violated during the execution of a statement, the program terminates
abnormally, and updates already made may not be reflected in the
database, leaving it in an inconsistent state.
A program failure occurs when some code in a program leads to its abnormal termination, e.g.
a program that goes into an infinite loop. In such a case the only way to break the loop is to abort
the program. The part of the program that executed before the abort may have made
some updates to the database, so the database is updated only partially, which leads to an
inconsistent state. Similarly, in case of deadlock, i.e. if one program enters into a deadlock
with some other program, the program has to be restarted to get out of the deadlock, and the
partial updates made by this program leave the database in an inconsistent state.
Thus soft failures can occur due to either statement failure or program failure.
2. Hard failure: Hard failures are those failures in which some data on disk is damaged and cannot be
read anymore. This may be due to many reasons, e.g. a voltage fluctuation in the power supply to
the computer makes it go off, some bad sectors develop on the disk, or there is a disk crash. In all
these cases, the database gets into an inconsistent state.

Failure Controlling Methods


Although failures can be handled using the different recovery techniques
discussed later, these techniques are quite expensive in both time and memory space. It is therefore
more beneficial to avoid the failure through checks than to deploy a recovery technique to
make the database consistent. Recovery from failure also involves manpower that could be used for
other productive work if the failure were avoided. It is therefore important to find ways and means by
which failures can be controlled.
Different methods/techniques can be adopted to control different types of failures. For example, consider a
hard failure, i.e. a system crash. The cause of a system shutdown could be a failure in the power supply unit or
loss of power, due to which information stored on the storage medium can be lost. One method to avoid
loss of data stored on disk due to power failure is to provide an uninterruptible power source by using
voltage stabilizers, batteries or transformers. Since recovery from soft failures is quicker, it is
hard failures that should, as far as possible, be controlled by taking preventive measures. Failure
of system software can be controlled by ensuring that all the functions and statements
used in the program are placed correctly and that debugging is done prior to execution, so
that an appropriate fix can be applied, avoiding inconsistency in the database. Soft failures can also be
controlled by checking the integrity constraints used in a program prior to its execution, or by checking the
preconditions to be satisfied by a statement, so that the program does not go into an infinite loop, terminate
abnormally and leave the database in a corrupt state. If all such precautions are taken in
advance, no extra effort is needed to recover erroneous data in the database.

Recovery Techniques

Several recovery techniques have been proposed for database systems. As we have seen, there are two types of
failures, so we now discuss how to recover from each of them. Hard failure (media failure)
recovery can be done by restoring the last backup copy, or by doing forward
recovery if the system log is intact. Soft failure (system failure) recovery using the log includes
backward recovery as well as forward recovery. So there are two main strategies for performing recovery:
1) Backward Recovery (UNDO)
In this scheme the uncommitted changes made by a transaction to a database are undone, and the
system is reset to some previous consistent state of the database that is free from any errors.

2) Forward Recovery (REDO)
In this scheme the committed changes made by a transaction are reapplied to an earlier copy of the
database.

In simpler words, when a particular error in the system is detected, the recovery system makes an accurate
assessment of the state of the system and then makes appropriate adjustments based on the anticipated
results had the system been error free. Note that the Redo operation must be idempotent,
i.e. executing it several times must be equivalent to executing it once. This characteristic is required to
guarantee correct behaviour of the database even if a failure occurs during the recovery process.
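Backward and forward recovery, and the idempotence of Redo, can be illustrated with a small Python sketch. The log layout (a before image and an after image per change) and the item names A and B are assumptions made for illustration; the notes do not describe the log format.

```python
# Recovery sketch using a log of before and after images.
# The log layout and the values are hypothetical.

log = [
    {"txn": "T1", "item": "A", "before": 100, "after": 150},
    {"txn": "T2", "item": "B", "before": 20, "after": 5},
]
committed = {"T1"}   # T2 had not committed when the failure occurred

def redo(db, log, committed):
    """Forward recovery: reapply after images of committed transactions."""
    for rec in log:
        if rec["txn"] in committed:
            db[rec["item"]] = rec["after"]   # sets a value, so repeating is harmless
    return db

def undo(db, log, committed):
    """Backward recovery: restore before images of uncommitted transactions."""
    for rec in reversed(log):
        if rec["txn"] not in committed:
            db[rec["item"]] = rec["before"]
    return db

backup = {"A": 100, "B": 20}                       # earlier copy of the database
once = redo(dict(backup), log, committed)
twice = redo(redo(dict(backup), log, committed), log, committed)

crashed = {"A": 150, "B": 5}                       # contains T2's uncommitted write
restored = undo(dict(crashed), log, committed)
print(once, twice, restored)
```

Redo is idempotent here because it stores the after image instead of repeating an operation such as "add 50", so running it again after a failure during recovery gives the same database state.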


Error Reporting and Detection Schemes


An error is said to have occurred if the execution of a command to manipulate the database cannot be
successfully completed, either due to inconsistent data or due to the state of the program. For example, there may be
a command in a program to store data in the database. The command itself is valid, but on execution it is
found that there is no space in the database to accommodate the additional data. It can then be said
that an error has occurred. This error is due to the physical state of the database.
Broadly, errors are classified into the following categories:
1. User error: This includes errors in the program (e.g. logical errors) as well as errors made by
online users of the database. These types of errors can be avoided by applying check conditions
in programs or by limiting the access rights of online users, e.g. to read only. Then only update or
insert operations require check routines that perform appropriate checks on the data
entered. In case of error, prompts can be passed to the user to enable him to correct the errors.
2. Consistency error: These errors occur due to an inconsistent state of the database, caused for example by
wrong execution of commands or by the abort of a transaction. To overcome these errors
the database system should include routines that check the consistency of data entered in the
database.
3. System error: These include errors in the database system or the OS, e.g. deadlocks (discussed
earlier under concurrency control). Such errors are fairly hard to detect and require
reprogramming the erroneous components of the system software.

Security & Integrity

Information security is the protection of information against unauthorized disclosure, alteration or
destruction. Database security is the protection of information that is maintained in a database. It deals
with ensuring that only the "right people" get the right access to the "right data". The right people are
those who have the right to access or interact with the database. This ensures that the
confidentiality of the data is maintained. For example, in an educational institution, information about
students' grades and the university's personal information is accessible only to the authorities concerned and not to
everyone. Another example is the medical records of patients in a hospital, which should be
accessible only to health care officials. In computing, the specification of access rules about who has
what type of access to what information is known as the problem of authorization. These access rules are
defined at the time the database is defined. The person who writes access rules is called an authorizer. The
process of ensuring that information and other protected objects are accessed only in authorized ways is
called access control. The term integrity is also applied to data and to the mechanisms that help to ensure its
correctness. Integrity refers to the avoidance of accidental loss of consistency. Protection of database
contents from unauthorized access includes legal and ethical issues, organization policies as well as
database management policies. To protect the database, several levels of security measures are maintained:
1. Physical: The site or sites containing the computer system must be physically secured against
illegal entry of unauthorized persons.
2. Human: Authorization must be given to users carefully, to reduce the chance of a user giving
access to outsiders in exchange for favors.
3. O.S.: Even if foolproof security measures are taken to secure the database system, weakness
in O.S. security may serve as a means of unauthorized access to the database.
4. Network: Since databases allow distributed or remote access through terminals or networks,
software level security within the network software is an important issue to be taken into
consideration.
5. Database system: Within the database, authorization is also assigned according to user needs.
That is, a user may be allowed to read data and issue queries but not
to modify the data. Only some upper level users may be allowed to do so, by giving them
the corresponding access rights within the database itself. It is the responsibility of the database system to ensure
that these authorization restrictions are not violated.
To ensure database security, security at all of the above levels must be maintained.

Authorization
Authorization is the culmination of the administrative policies of the organization. As the name suggests,
authorization is a set of rules that can be used to determine which user has what type of access to which
portion of the database. The person who writes access rules is called an authorizer.
An authorizer may set several forms of authorization on parts of the database. Among them are the
following:
1. Read Authorization: allows reading, but not modification of data.
2. Insert Authorization: allows insertion of new data, but not the modification of existing data, e.g.
insertion of tuple in a relation.
3. Update authorization: allows modification of data, but not its deletion. But data items like
primary-key attributes may not be modified.
4. Delete authorization: allows deletion of data only.
A user may be assigned all, none or a combination of these types of authorization, which are broadly called
access authorizations.
In addition to these manipulation operations, a user may be granted control operations like
1. Add: Allows adding new object types such as new relations (in case of RDB), records and set
types (in case of network model) or record types and hierarchies (in hierarchical model of DB).
2. Drop: Allows the deletion of relations in DB.

3. Alter: Allows addition of new attributes (data items) to a relation or deletion of existing data
items from the database.
4. Propagate Access Control: This is an additional right that allows a user to propagate an
access right he already holds to another user, i.e. if user A has access right R over a relation
S and also has propagate access control, he can propagate his access right R over relation S to
another user B, either fully or in part.
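The access authorizations and the propagate right can be sketched in Python as a table of rights per user and relation. The user names A, B and C, the relation S, and the functions grant and allowed are hypothetical and serve only to illustrate the rules above; in SQL the corresponding facility is the GRANT statement.

```python
# Sketch of access authorizations with a propagate right.
# User names, the relation name "S", and the function names are hypothetical.

grants = {}   # (user, relation) -> set of rights held

def grant(granter, user, relation, rights):
    """Give rights to user. A granter of None stands for the authorizer."""
    if granter is not None:
        held = grants.get((granter, relation), set())
        if "propagate" not in held or not set(rights) <= held:
            return False          # cannot pass on rights that are not held
    grants.setdefault((user, relation), set()).update(rights)
    return True

def allowed(user, relation, right):
    return right in grants.get((user, relation), set())

grant(None, "A", "S", {"read", "update", "propagate"})
grant(None, "C", "S", {"read"})
a_to_b = grant("A", "B", "S", {"read"})      # A propagates part of its rights
c_to_b = grant("C", "B", "S", {"read"})      # C lacks the propagate right
print(a_to_b, c_to_b, allowed("B", "S", "read"), allowed("B", "S", "update"))
```

User B ends up with read access only, because A chose to propagate just part of the rights it holds.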


UNIT 5
Distributed database
A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations that communicate via a computer network.

Features
• Databases in the collection are logically interrelated with each other. Often they represent a single
logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
• The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.

Distributed Database Management System


A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database in a manner as if it were all stored in a single location.

Features
• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by the virtue of which
the distribution becomes transparent to the users.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.

Advantages of Distributed Databases


Following are the advantages of distributed databases over centralized databases.
Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption in
current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a halt.
However, in distributed systems, when a component fails, the system continues to function, possibly
at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local
data itself, thus providing faster response. On the other hand, in centralized systems, all queries have to
pass through the central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.

Types of Distributed Databases or Structure of DDBMS


Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions, as shown in the following illustration.

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
• Autonomous − Each database is independent and functions on its own. The databases are integrated by a
controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS products
and data models. Its properties are −
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in processing user
requests.
Types of Heterogeneous Distributed Databases
• Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.

1. Client-server architecture of distributed system


• A client server architecture has a number of clients and a few servers connected in a network.
• A client sends a query to one of the servers. The earliest available server solves it and replies.
• A Client-server architecture is simple to implement and execute due to centralized server
system.

2. Collaborating server architecture


• Collaborating server architecture is designed to run a single query on multiple servers.
• Servers break a single query into multiple smaller queries, and the combined result is sent to the client.
• Collaborating server architecture has a collection of database servers. Each server is capable
of executing the current transactions across the databases.


3. Middleware architecture
• Middleware architectures are designed in such a way that single query is executed on multiple
servers.
• This system needs only one server which is capable of managing queries and transactions from
multiple servers.
• Middleware architecture uses local servers to handle local queries and transactions.
• The software used to execute queries and transactions across one or more independent
database servers is called middleware.

Data Replication
Data replication is the process in which the data is copied at multiple locations (Different computers or
servers) to improve the availability of data.

Goals of data replication


• Increase the availability of data.
• Speed up the query evaluation.

Types of data replication


There are two types of data replication:
1. Synchronous replication:
In synchronous replication, the replica is updated immediately, as part of the same operation that
changes the relation. So there is no difference between the original data and the replica.
2. Asynchronous replication:
In asynchronous replication, the replica is updated only after the change has been committed on the
database, so the replica may lag behind the original data for a short time.
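
The difference between the two types can be sketched with two in-memory SQLite databases acting as the original and the replica. This is a simplified illustration, not how any particular DBMS implements replication: the ACCOUNT table, the helper names and the pending queue are assumptions made for the example, and a real synchronous system would use a commit protocol between sites.

```python
import sqlite3
from collections import deque

INSERT = "INSERT INTO ACCOUNT VALUES (?, ?)"


def new_copy():
    """Create an empty copy of the (hypothetical) ACCOUNT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ACCOUNT (Id INTEGER PRIMARY KEY, Balance INTEGER)")
    conn.commit()
    return conn


def balances(conn):
    return conn.execute("SELECT Id, Balance FROM ACCOUNT ORDER BY Id").fetchall()


# Synchronous: the change is applied to the original and the replica together,
# and both are committed before the update counts as finished.
primary, replica = new_copy(), new_copy()
for conn in (primary, replica):
    conn.execute(INSERT, (1, 500))
for conn in (primary, replica):
    conn.commit()
sync_equal = balances(primary) == balances(replica)

# Asynchronous: the original commits first; the change is queued and
# applied to the replica at some later time.
primary2, replica2 = new_copy(), new_copy()
pending = deque()
primary2.execute(INSERT, (1, 500))
primary2.commit()
pending.append((1, 500))
lag_visible = balances(primary2) != balances(replica2)  # replica is behind

while pending:  # the replica catches up later
    replica2.execute(INSERT, pending.popleft())
replica2.commit()
caught_up = balances(primary2) == balances(replica2)

print(sync_equal, lag_visible, caught_up)
```

In the asynchronous case a query sent to the replica before it catches up would return stale data, which is the price paid for faster updates on the original.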

Replication Schemes
The three replication schemes are as follows:


1. Full Replication
In the full replication scheme, a complete copy of the database is stored at every site in the
communication network.

Advantages of full replication


• High availability of data, as a copy of the database is available at every location.
• Faster execution of queries, since they can be answered from a local copy.
Disadvantages of full replication
• Concurrency control is difficult to achieve in full replication.
• Update operations are slower, because every copy must be updated.

2. No Replication

No replication means that each fragment is stored at exactly one location.

Advantages of no replication
• Concurrency control overhead is minimized, since there is only one copy of each fragment.
• Easy recovery of data.
Disadvantages of no replication
• Poor availability of data.
• Slows down the query execution process, as multiple clients are accessing the same server.


3. Partial replication

Partial replication means only some fragments are replicated from the database.
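
The three schemes can be told apart by looking at where the copies of each fragment are stored. The short sketch below is only an illustration: the function name replication_scheme, the fragment names F1 and F2, and the site names are all hypothetical.

```python
def replication_scheme(placement, sites):
    """Classify an allocation of fragments to sites.

    placement maps each fragment name to the collection of sites holding a copy.
    """
    all_sites = set(sites)
    copies = [set(held) for held in placement.values()]
    if all(held == all_sites for held in copies):
        return "full"      # every site holds every fragment
    if all(len(held) == 1 for held in copies):
        return "none"      # each fragment is stored at exactly one site
    return "partial"       # only some fragments have extra copies


sites = ["Hyderabad", "Mumbai", "Delhi"]  # hypothetical sites

full = replication_scheme({"F1": sites, "F2": sites}, sites)
none = replication_scheme({"F1": ["Hyderabad"], "F2": ["Mumbai"]}, sites)
partial = replication_scheme({"F1": ["Hyderabad", "Mumbai"], "F2": ["Delhi"]}, sites)
print(full, none, partial)
```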

Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (combination of
horizontal and vertical). Horizontal fragmentation can further be classified into two techniques: primary
horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”
Advantages of Fragmentation
• Since data is stored close to the site of usage, efficiency of the database system is increased.
• Local query optimization techniques are sufficient for most queries since data is locally available.
• Since irrelevant data is not available at the sites, security and privacy of the database system can
be maintained.
Disadvantages of Fragmentation
• When data from different fragments are required, the access time may be very high.
• In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
• Lack of back-up copies of data in different sites may render the database ineffective in case of
failure of a site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical
fragmentation can be used to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered students in a
Student table having the following schema.

STUDENT

Regd_No | Name | Course | Address | Semester | Fees | Marks

Now, the fees details are maintained in the accounts section. In this case, the designer will fragment the
database as follows −

-- Vertical fragment: the primary key plus the fee column only
CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
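
Because every vertical fragment keeps the primary key, the original table can be rebuilt by joining the fragments on that key. The sketch below checks this with an in-memory SQLite database run from Python. The sample rows and the second fragment name STD_INFO are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT,
    Address TEXT, Semester INTEGER, Fees INTEGER, Marks INTEGER
);
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science', 'Hyderabad', 3, 25000, 82),
    (2, 'Ravi', 'Commerce', 'Warangal', 3, 18000, 74);

-- Vertical fragments: each one repeats the primary key Regd_No.
CREATE TABLE STD_FEES AS
    SELECT Regd_No, Fees FROM STUDENT;
CREATE TABLE STD_INFO AS
    SELECT Regd_No, Name, Course, Address, Semester, Marks FROM STUDENT;
""")

# Reconstruction: join the fragments on the shared primary key.
rebuilt = conn.execute("""
    SELECT i.Regd_No, i.Name, i.Course, i.Address, i.Semester, f.Fees, i.Marks
    FROM STD_INFO AS i
    JOIN STD_FEES AS f ON i.Regd_No = f.Regd_No
    ORDER BY i.Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)
```

If a fragment left out Regd_No, there would be no column on which to join, and the reconstruction would not be possible.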

Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more fields.
Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment
must have all columns of the original base table.
For example, in the student schema, if the details of all students of the Computer Science course need to
be maintained at the School of Computer Science, then the designer will horizontally fragment the database
as follows −

CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE Course = 'Computer Science';
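
Horizontal fragments are put back together with a union of their rows. The sketch below checks this in an in-memory SQLite database run from Python. It uses a reduced STUDENT table, and the sample rows and the second fragment name OTHER_STD are assumptions made for the example. The sample has no rows with a missing Course value; such rows would match neither condition and would need a fragment of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT);
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science'),
    (2, 'Ravi', 'Commerce'),
    (3, 'Meena', 'Computer Science');

-- Horizontal fragments: every fragment keeps all columns of STUDENT.
CREATE TABLE COMP_STD AS
    SELECT * FROM STUDENT WHERE Course = 'Computer Science';
CREATE TABLE OTHER_STD AS
    SELECT * FROM STUDENT WHERE Course <> 'Computer Science';
""")

# Reconstruction: the union of the fragments gives back every row.
rebuilt = conn.execute("""
    SELECT * FROM COMP_STD
    UNION ALL
    SELECT * FROM OTHER_STD
    ORDER BY Regd_No
""").fetchall()
original = conn.execute("SELECT * FROM STUDENT ORDER BY Regd_No").fetchall()
print(rebuilt == original)
```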

Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information. However, reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
• At first, generate a set of horizontal fragments; then generate vertical fragments from one or more
of the horizontal fragments.
• At first, generate a set of vertical fragments; then generate horizontal fragments from one or more
of the vertical fragments.
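
The first of these two ways can be sketched as follows, using an in-memory SQLite database run from Python. The reduced STUDENT table, the sample rows and the fragment name COMP_STD_FEES are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (
    Regd_No INTEGER PRIMARY KEY, Name TEXT, Course TEXT, Fees INTEGER
);
INSERT INTO STUDENT VALUES
    (1, 'Asha', 'Computer Science', 25000),
    (2, 'Ravi', 'Commerce', 18000),
    (3, 'Meena', 'Computer Science', 26000);

-- Step 1: horizontal fragment (only the Computer Science rows).
CREATE TABLE COMP_STD AS
    SELECT * FROM STUDENT WHERE Course = 'Computer Science';

-- Step 2: vertical fragment of that horizontal fragment
-- (primary key plus the fee column).
CREATE TABLE COMP_STD_FEES AS
    SELECT Regd_No, Fees FROM COMP_STD;
""")

comp_fees = conn.execute(
    "SELECT Regd_No, Fees FROM COMP_STD_FEES ORDER BY Regd_No"
).fetchall()
print(comp_fees)
```

The resulting fragment holds only the rows and columns that the accounts staff of one school would need, which is why hybrid fragmentation carries the least extraneous information.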
