BCOM Computers 3RD SEM RDBMS (RELATIONAL DATABASE MANAGEMENT SYSTEM) 2ND YEAR
Introduction to Databases:
What is Data?
The raw facts are called data. The word “raw” indicates that they have not been processed.
Ex: 89 is data.
What is information?
Processed data is known as information.
Ex: Marks: 89; once 89 is labelled as marks, it becomes information.
What is Knowledge?
1. Knowledge refers to the practical use of information.
2. Knowledge necessarily involves a personal experience.
DATA/INFORMATION PROCESSING:
The process of converting data (raw facts) into meaningful information is called
data/information processing.
Note: In business data processing, knowledge is the most useful for making decisions in an organization.
DIFFERENCE BETWEEN DATA AND INFORMATION:
DATA:
1. Raw facts
2. It is in unorganized form
3. Data doesn’t help the decision-making process
INFORMATION:
1. Processed data
2. It is in organized form
3. Information helps in the decision-making process
➢ Concurrent access to the data in a file system causes many problems, e.g. one user reading the file
while another is deleting or updating some information.
➢ A file system doesn’t provide a crash recovery mechanism.
Eg. While we are entering some data into a file, if the system crashes, the contents of the file
are lost.
➢ Protecting a file under a file system is very difficult.
The typical file-oriented system is supported by a conventional operating system. Permanent
records are stored in various files and a number of different application programs are written to
extract records from and add records to the appropriate files.
Integrity Problems:
The data values stored in the database must satisfy certain types of consistency constraints. For example,
the balance of a bank account may never fall below a prescribed amount. These constraints are enforced in
the system by adding appropriate code in the various application programs. When new constraints are
added, it is difficult to change the programs to enforce them. The problem is compounded when
constraints involve several data items for different files.
Atomicity Problem:
A computer system, like any other mechanical or electrical device, is subject to failure. In many
applications, it is crucial to ensure that once a failure has occurred and has been detected, the data are
restored to the consistent state that existed prior to the failure.
Database
A Database is a collection of related data organised in a way that data can be easily accessed,
managed and updated. Database can be software based or hardware based, with one sole purpose,
storing data.
During early computer days, data was collected and stored on tapes, which were mostly write-only,
meaning that once data was stored it could not easily be read back or modified. They were slow and bulky, and soon
computer scientists realised that they needed a better solution to this problem.
DBMS
A DBMS is software that allows creation, definition and manipulation of database, allowing users to
store, process and analyse data easily. DBMS provides us with an interface or a tool, to perform
various operations like creating database, storing data in it, updating data, creating tables in the
database and a lot more.
DBMS also provides protection and security to the databases. It also maintains data consistency in
case of multiple users.
Here are some examples of popular DBMS used these days:
• MySQL
• Oracle
• SQL Server
• IBM DB2
• PostgreSQL
• Amazon SimpleDB (cloud based) etc.
1. Data stored in Tables: Data is never directly stored into the database. Data is stored in tables
created inside the database. A DBMS also allows us to define relationships between tables, which makes
the data more meaningful and connected. You can easily understand what type of data is stored
where by looking at all the tables created in a database.
2. Reduced Redundancy: In the modern world hard drives are very cheap, but earlier, when hard
drives were very expensive, unnecessary repetition of data in a database was a big problem. A
DBMS follows normalisation, which divides the data in such a way that repetition is minimal.
3. Data Consistency: On live data, i.e. data that is being continuously updated and added,
maintaining the consistency of data can become a challenge. But a DBMS handles it all by itself.
4. Support Multiple user and Concurrent Access: DBMS allows multiple users to work on
it(update, insert, delete data) at the same time and still manages to maintain the data consistency.
5. Query Language: DBMS provides users with a simple Query language, using which data can be
easily fetched, inserted, deleted and updated in a database.
6. Security: The DBMS also takes care of the security of data, protecting the data from
unauthorised access. In a typical DBMS, we can create user accounts with different access
permissions, using which we can easily secure our data by restricting user access (see the sketch after this list).
7. DBMS supports transactions, which allows us to better handle and manage data integrity in real
world applications where multi-threading is extensively used.
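As a small illustration of point 6, most SQL-based DBMSs let the administrator grant and revoke privileges on individual tables. The following is a minimal sketch; the user name and table are hypothetical, and the exact privilege syntax varies between products:
-- create a user and allow it to read, but not change, a STUDENT table
CREATE USER clerk1 IDENTIFIED BY 'secret';
GRANT SELECT ON STUDENT TO clerk1;
-- later, withdraw the privilege
REVOKE SELECT ON STUDENT FROM clerk1;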
Disadvantages of DBMS
• Its complexity
• Apart from open-source systems such as MySQL, licensed DBMSs are generally costly.
• They are large in size
DATABASE APPROACH
The objectives of the database approach include:
1. Data sharability
2. Data availability
3. Data independency
4. Data integrity
5. Data security
1. Data sharability: The sharability objective ensures that a data item developed by one
application can be shared among all the applications. This objective results in reducing the level
of unplanned redundancy, which basically occurs when the same data is stored at multiple locations.
2. Data Availability: This objective ensures that the requested data is available to the user in a
meaningful format, which results in decreased access time.
3. Data independency: This objective ensures that database programs are written in such a
way that they are independent of the storage details. The internal schema holds the physical
storage details while the external schemas provide the users’ logical views; the conceptual schema
insulates the external schemas from the internal (physical) details.
4. Data integrity: This objective ensures that the data values entered in the database fall within a
specified range and are of the correct format. Data integrity can be achieved by enabling the DBA to have
full control of the database and the operations performed on it.
5. Data Security: Data is an important asset of an organization and must be kept confidential. Such
confidential data must be properly secured so that it is not accessed by unauthorized persons.
This can be achieved by employing data security.
Components of a DBMS
A database management system (DBMS) consists of several components. Each component plays very
important role in the database management system environment. The major components of database
management system are:
• Software
• Hardware
• Data
• Procedures
• Database Access Language
Software
The main component of a DBMS is the software. It is the set of programs used to handle the database
and to control and manage the overall computerized database
1. The DBMS software itself is the most important software component in the overall system.
2. The operating system, including the network software used in a network to share the data of the
database among multiple users.
3. Application programs developed in programming languages such as C++ or Visual Basic that
are used to access the database in the database management system. Each program contains
statements that request the DBMS to perform operations on the database. The operations may
include retrieving, updating and deleting data. The application programs may be run conventionally
or online, from workstations or terminals.
Hardware
Hardware consists of a set of physical electronic devices such as computers (together with associated
I/O devices like disk drives), storage devices, I/O channels, and electromechanical devices that
interface the computers with real-world systems. It is impossible to implement
the DBMS without the hardware devices. In a network, a powerful computer with high data
processing speed and a storage device with large storage capacity is required as the database server.
Data
Data is the most important component of the DBMS. The main purpose of DBMS is to process the
data. In DBMS, databases are defined, constructed and then data is stored, updated and retrieved to
and from the databases. The database contains both the actual (or operational) data and the metadata
(data about data or description about data).
Procedures
Procedures refer to the instructions and rules that help to design the database and to use the DBMS.
The users who operate and manage the DBMS require documented procedures on how to use or run the
database management system. These may include:
1. Procedure to install the new DBMS.
2. To log on to the DBMS.
3. To use the DBMS or application program.
4. To make backup copies of database.
5. To change the structure of database.
6. To generate the reports of data retrieved from database.
Capacity Issues
Every database has limits on how much data it can store, and the physical memory also has
limitations. The DBA has to decide the limit and capacity of the database and handle all issues related to it.
Database design
The logical design of the database is designed by the DBA. Also a DBA is responsible for physical
design, external model design, and integrity control.
Database accessibility
The DBA writes subschemas to decide the accessibility of the database. He decides the users of the database
and also which data is to be used by which user. No user has the power to access the entire database
without the permission of the DBA.
Monitoring performance
Even if the database is working properly, it doesn’t mean that there is no task for the DBA. Of course,
he has to monitor the performance of the database. A DBA monitors the CPU and memory usage.
Database implementation
Database has to be implemented before anyone can start using it. So DBA implements the database
system. DBA has to supervise the database loading at the time of its implementation.
Data Model
Data models show how the data is connected and stored in the system. They show the relationships
between data. A model is basically a conceptualization of attributes and entities. There were
originally three main data models in DBMS: network, hierarchical and relational. But
these days there are many data models, which are listed below.
There are different types of data models; let us see each of them in detail:
1. Flat data model
2. Entity relationship model
3. Relation model
4. Record base model
5. Network model
6. Hierarchical model
7. Object oriented data model
8. Context data model
The flat data model is the first model introduced; in it, all the data used is kept in the
same plane. Since it was introduced very early, this model is not very scientific.
The entity relationship model is based on the notion of real-world entities and their relationships.
While formulating a real-world scenario into the database model, an entity set is created; this
model depends on two vital things: entities and their relationships.
An entity has real-world properties called attributes, and an attribute is defined by a set of values called its
domain. For example, in a university, a student is an entity, the university is the database, and name, age
and sex are attributes. The relationships among entities define the logical associations between
entities.
The relational model is the most popular and most extensively used model. In this model
data is stored in tables, and this structure is called a relation; relations can be normalized,
and normalized relation values are called atomic values. Each row in a relation contains a unique
set of values and is called a tuple; each column contains values from the same domain and is called an
attribute.
Network model has the entities which are organized in a graphical representation and some entities in
the graph can be accessed through several paths.
Network Model
Hierarchical model has one parent entity with several children entity but at the top we should have
only one entity called root. For example, department is the parent entity called root and it has several
children entities like students, professors and many more.
Hierarchical model
The object oriented data model is one of the more advanced data models, and it can hold audio, video and
graphic files. It consists of data pieces and the methods, which are the DBMS instructions.
Context data model is a flexible model because it is a collection of many data models. It is a
collection of the data models like object oriented data model, network model, semi structured model.
So, in this different types of works can be done due to the versatility of it.
Context Model
Therefore, data models support different types of users and differ in how users interact with the database. The data
models in DBMS brought a revolutionary change to industries in the handling of
relevant data. Data models are what help us use and create databases; as we
have seen, there are different types of data models, and depending on the kind of structure needed we
can select the appropriate data model in DBMS.
Relational Model
Relational Model was proposed by E.F. Codd to model data in the form of relations or tables.
After designing the conceptual model of the database using an ER diagram, we need to convert the
conceptual model into the relational model, which can be implemented using any RDBMS language
such as Oracle SQL, MySQL, etc. So we will see what the Relational Model is.
What is Relational Model?
Relational Model represents how data is stored in Relational Databases. A relational database stores
data in the form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO,
NAME, ADDRESS, PHONE and AGE shown in Table 1.
STUDENT
4 SURESH DELHI 18
IMPORTANT TERMINOLOGIES
1. Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME
2. Relation Schema: A relation schema represents name of the relation with its attributes. e.g.;
STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is relation schema for
STUDENT. If a schema has more than 1 relation, it is called Relational Schema.
3. Tuple: Each row in the relation is known as tuple. The above relation contains 4 tuples, one of
which is shown as:
4. Relation Instance: The set of tuples of a relation at a particular instance of time is called as
relation instance. Table 1 shows the relation instance of STUDENT at a particular time. It can
change whenever there is insertion, deletion or updation in the database.
5. Degree: The number of attributes in the relation is known as degree of the relation. The
STUDENT relation defined above has degree 5.
6. Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT
relation defined above has cardinality 4.
7. Column: Column represents the set of values for a particular attribute. The column
ROLL_NO is extracted from relation STUDENT.
ROLL_NO
8. NULL Values: The value which is not known or unavailable is called NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
Keys
An important constraint on an entity is the key. The key is an attribute or a group of attributes
whose values can be used to uniquely identify an individual entity in an entity set.
Types of Keys
There are several types of keys. These are described below.
Candidate key
A candidate key is a simple or composite key that is unique and minimal. It is unique because
no two rows in a table may have the same value at any time. It is minimal because every column is
necessary in order to attain uniqueness.
From our COMPANY database example, if the entity is Employee(EID, First Name, Last Name,
SIN, Address, Phone, BirthDate, Salary, DepartmentID), possible candidate keys are:
• EID, SIN
• First Name and Last Name – assuming there is no one else in the company with the same
name
• Last Name and DepartmentID – assuming two people with the same last name don’t work in
the same department
Composite key
A composite key is composed of two or more attributes, but it must be minimal. Using the
example from the candidate key section, possible composite keys are:
• First Name and Last Name – assuming there is no one else in the company with the same
name
• Last Name and Department ID – assuming two people with the same last name don’t work in
the same department
Primary key
The primary key is a candidate key that is selected by the database designer to be used as an
identifying mechanism for the whole entity set. It must uniquely identify tuples in a table and not be
null. The primary key is indicated in the ER model by underlining the attribute.
• A candidate key is selected by the designer to uniquely identify tuples in a table. It must not be
null.
• A key is chosen by the database designer to be used as an identifying mechanism for the
whole entity set. This is referred to as the primary key. This key is indicated by underlining
the attribute in the ER model.
In the following example, EID is the primary key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Secondary key
A secondary key is an attribute used strictly for retrieval purposes (can be composite), for
example: Phone and Last Name.
Alternate key
Alternate keys are all candidate keys not chosen as the primary key.
Foreign key
A foreign key (FK) is an attribute in a table that references the primary key in another table
OR it can be null. Both foreign and primary keys must be of the same data type. In the COMPANY
database example below, DepartmentID is the foreign key:
Employee(EID, First Name, Last Name, SIN, Address, Phone, BirthDate, Salary, DepartmentID)
Nulls
A null is a special symbol, independent of data type, which means either unknown or inapplicable.
It does not mean zero or blank. Features of null include:
• No data entry
• Not permitted in the primary key
• Should be avoided in other attributes
• Can represent
o An unknown attribute value
o A known, but missing, attribute value
o A “not applicable” condition
• Can create problems when functions such as COUNT, AVERAGE and SUM are used
• Can create logical problems when relational tables are linked
NOTE: The result of a comparison operation is null when either argument is null. The result of an
arithmetic operation is null when either argument is null (except functions that ignore nulls).
4 SURESH DELHI 18 IT
BRANCH
BRANCH_CODE BRANCH_NAME
CS          COMPUTER SCIENCE
IT          INFORMATION TECHNOLOGY
ECE         ELECTRONICS AND COMMUNICATION ENGINEERING
CV          CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take values which are present in BRANCH_CODE of
BRANCH; this is called a referential integrity constraint. The relation which references the other
relation is called the REFERENCING RELATION (STUDENT in this case), and the relation to which
other relations refer is called the REFERENCED RELATION (BRANCH in this case).
Integrity Constraints
Constraints enforce limits to the data or type of data that can be inserted/updated/deleted
from a table. The whole purpose of constraints is to maintain the data integrity during an
update/delete/insert into a table. In this tutorial we will learn several types of constraints that can be
created in DBMS.
Types of constraints
• NOT NULL
• UNIQUE
• DEFAULT
• CHECK
• Key Constraints – PRIMARY KEY, FOREIGN KEY
• Domain constraints
• Mapping constraints
NOT NULL:
The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don’t provide a
value for a particular column while inserting a record into a table, it takes the NULL value by default. By
specifying a NOT NULL constraint, we can be sure that a particular column(s) cannot have NULL values.
Example:
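A minimal sketch in the style of the later examples, using a hypothetical STUDENT table:
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL
);
Here ROLL_NO, STU_NAME and STU_AGE can never be left empty when a row is inserted.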
UNIQUE:
UNIQUE Constraint enforces a column or set of columns to have unique values. If a column has a
unique constraint, it means that particular column cannot have duplicate values in a table
Example:
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
DEFAULT:
The DEFAULT constraint provides a default value to a column when there is no value provided while
inserting a record into a table.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
CHECK:
This constraint is used for specifying range of values for a particular column of a table. When this
constraint is being set on a column, it ensures that the specified column must have the value falling in
the specified range.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL CHECK(ROLL_NO >1000) ,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
In the above example we have set the check constraint on the ROLL_NO column of the STUDENT table.
Now, the ROLL_NO field must have a value greater than 1000.
Key constraints:
PRIMARY KEY:
Primary key uniquely identifies each record in a table. It must have unique values and cannot contain
nulls. In the below example the ROLL_NO field is marked as primary key, that means the ROLL_NO
field cannot have duplicate and null values.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);
FOREIGN KEY:
Foreign keys are columns of a table that point to the primary key of another table. They act as a
cross-reference between tables, as shown in the sketch below.
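The following is a minimal sketch of a foreign key declaration; the COURSE table and its columns are hypothetical, while STUDENT is the table used in the earlier examples:
CREATE TABLE COURSE(
COURSE_ID INT NOT NULL,
COURSE_NAME VARCHAR (35) NOT NULL,
STUDENT_ROLL_NO INT,
PRIMARY KEY (COURSE_ID),
FOREIGN KEY (STUDENT_ROLL_NO) REFERENCES STUDENT (ROLL_NO)
);
Here every value stored in STUDENT_ROLL_NO must already exist in STUDENT.ROLL_NO (or be NULL), which is exactly the referential integrity constraint described earlier.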
Domain constraints:
Each table has a certain set of columns, and each column allows only values of a single type, based on its data
type. A column does not accept values of any other data type.
The relational algebra is a theoretical procedural query language which takes instance of
relations and does operations that work on one or more relations to describe another relation without
altering the original relation(s). Thus, both the operands and the outputs are relations, and so the
output from one operation can turn into the input to another operation which allows expressions to be
nested in the relational algebra, just as you nest arithmetic operations. This property is called closure:
relations are closed under the algebra, just as numbers are closed under arithmetic operations.
The relational algebra is a relation-at-a-time (or set) language in which all tuples are processed by
one statement without the use of a loop. There are several variations of syntax for relational algebra
commands; here we use a common symbolic notation for the commands and present them informally.
➢ Projection (π)
Projection is used to project required column data from a relation.
Example :
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4
π B,C (R)
B C
-----
2 4
2 3
3 4
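In SQL the same projection could be sketched as follows (DISTINCT is needed because relational algebra, unlike plain SQL, removes duplicate tuples automatically):
SELECT DISTINCT B, C
FROM R;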
➢ Selection (σ)
Selection is used to select the tuples of a relation that satisfy a given condition. For example, σ C=4 (R) on the relation R above gives:
A B C
-------
1 2 4
4 3 4
For R ∪ S, The union of two relations R and S defines a relation that contains all the tuples of R, or S,
or both R and S, duplicate tuples being eliminated. R and S must be union-compatible.
For R − S The Set difference operation defines a relation consisting of the tuples that are in relation R,
but not in S. R and S must be union-compatible.
Example:
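A small illustrative sketch, assuming two union-compatible relations R(A) = {1, 2, 3} and S(A) = {3, 4}:
R ∪ S = {1, 2, 3, 4}   (duplicate tuples eliminated)
R − S = {1, 2}         (tuples in R but not in S)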
➢ Rename(ρ)
This is a unary operator which changes attribute names for a relation without changing any
values. Renaming removes the limitations associated with set operators
Cross product between two relations, say A and B: the cross product A X B results in all
the attributes of A followed by all the attributes of B. Each record of A is paired with every record of B.
Below is the example.
A B
(Name Age Sex ) (Id Course)
------------------ -------------
Ram 14 M 1 DS
Sona 15 F 2 DBMS
kim 20 M
AXB
Name Age Sex Id Course
---------------------------------
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: if A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘n*m’ tuples.
➢ Division Operation:
STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
ALL_SPORTS
SPORTS
Badminton
Cricket
The division STUDENT_SPORTS ÷ ALL_SPORTS returns the ROLL_NOs that are associated with every sport in ALL_SPORTS:
ROLL_NO
2
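Division has no single SQL keyword; a common sketch uses a double NOT EXISTS, shown here with the two tables above (table and column names as given):
-- roll numbers for which no sport in ALL_SPORTS is missing from their rows
SELECT DISTINCT S.ROLL_NO
FROM STUDENT_SPORTS S
WHERE NOT EXISTS (
    SELECT A.SPORTS
    FROM ALL_SPORTS A
    WHERE NOT EXISTS (
        SELECT 1
        FROM STUDENT_SPORTS S2
        WHERE S2.ROLL_NO = S.ROLL_NO
          AND S2.SPORTS = A.SPORTS));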
As shown in figure 3.11, the natural join operation yields only consistent and useful information. It removes
unnecessary tuples as well as duplicate attributes. This makes the retrieval of information from multiple
relations very easy and convenient.
❖ Outer Join Operation :
➢ An extension of the join operation that avoids loss of information.
➢ Computes the join and then adds tuples from one relation that do not match tuples in the other
relation to the result of the join.
➢ Uses null values:
▪ null signifies that the value is unknown or does not exist
▪ All comparisons involving null are (roughly speaking) false by definition.
• We shall study precise meaning of comparisons with nulls later
➢ Table name: Client
NAME ID
Rahul 10
Vishal 20
➢ Table name: Salesman
ID CITY
30 Bombay
20 Madras
40 Bombay
➢ Join Client and Salesman
NAME ID CITY
Vishal 20 Madras
➢ The outer join operation can be divided into three different forms: left outer join, right outer join and full outer join.
Vishal 20 Madras
NULL 40 Bombay
[Figure: ER notation symbols: identifying relationship, weak entity, associative entity, attribute, multivalued attribute, derived attribute]
A. Entities: An entity is a person, place, object, event or concept in the user environment
about which the organization wishes to maintain data.
Example:
[Figure: EMPLOYEE HAS DEPENDENT relationship]
Attributes
An attribute is a descriptive property or characteristics of an entity. The attributes of the entity Customer are
CustNo, Name, Street, City, PostCode, TelNo and Balance.
Types of Attributes
There are a few types of attributes you need to be familiar with. Some of these are to be left as
is, but some need to be adjusted to facilitate representation in the relational model. This first section
will discuss the types of attributes. Later on we will discuss fixing the attributes to fit correctly into
the relational model.
Simple attributes
Simple attributes are attributes that cannot be broken down into smaller components.
Ex: For the entity AUTOMOBILE, simple attributes are Vehicle_id, Colour and Weight.
Composite attributes
Composite attributes are attributes that can be broken down into different components.
Ex: ADDRESS has components such as Street, Number, SubStreet, State and Postcode.
RELATIONSHIPS
A relationship is an association among the instances of one or more entity types that is of
interest to the organization.
[Figure: EMPLOYEE Completes COURSE relationship, with attributes such as Emp_id and other attributes]
There are three main types of relationship that can exist between entities:
i. one-to-one relationship
ii. one-to-many relationship
iii. many-to-many relationship
i. one-to-one relationship: A one to one (1:1) relationship is the relationship of one entity to
only one other entity, and vice versa. It should be rare in any relational database design. In
fact, it could indicate that two entities actually belong in the same table.
Explanation:
An Order generates only one invoice and an Invoice is generated by an order.
ii. one-to-many relationship: A one to many (1:M) relationship should be the norm in any
relational database design and is found in all relational database environments. For example,
one customer makes many orders.
Explanation:
Each Customer can make one or more orders and an Order is from one customer.
many-to-many relationship : For a many to many relationship, consider the following points:
• It cannot be implemented as such in the relational model.
• It can be changed into two 1:M relationships.
• It can be implemented by breaking up to produce a set of 1:M relationships.
• It involves the implementation of a composite entity.
• Creates two or more 1:M relationships.
• The composite entity table must contain at least the primary keys of the original tables.
• The linking table contains multiple occurrences of the foreign key values.
• Additional attributes may be assigned as needed.
Explanation:
An Order has one or more product and a Product can be in one or more orders.
b. Binary (Degree 2): It is a relationship between the instances of two entity types and is the
most common type of relationship in data modeling. This relationship has three types.
i. One to one:
➢ It indicates that an employee is assigned one parking place, and each parking place is assigned to
one employee.
ii. One to Many:
➢ It indicates that a product line may contain several products, and each product belongs to only
one product line.
iii. Many to Many:
Developing an E – R diagram:
Entity Relationship diagrams are a major data modeling tool and will help organize the data in our
project into entities and define the relationships between the entities.
Components of ERD: There are four Components, they are
i. Entity
ii. Relationship
iii. Cardinality
iv. Attribute
i. Entity: A data Entity is anything real or abstract about which we want to store data.
Ex: EMPLOYEE: Employee_id, Employee_name, Address
PAYMENT: Payment_id, Payment_Type.
BOOKS: Book_id, Book_Type.
ii. Relationship: A Relationship is a natural association that exists between one or more
entities.
Ex: Employee process Payment.
iii. Cardinality: Define the number of occurrence of one entity for a single occurrence of the
related entity.
Ex: An Employee may process many Payments but might not process any Payments,
depending on the nature of his / her job.
iv. Attribute: A data Attribute is a characteristics common to all or most instances of a
particular entity.
Ex: Name, Employee_No are all attributes of the entity “EMPLOYEE”
Here we are going to design an Entity Relationship (ER) model for a college database .
Say we have the following statements.
1. A college contains many departments
2. Each department can offer any number of courses
3. Many instructors can work in a department
4. An instructor can work only in one department
5. For each department there is a Head
6. An instructor can be head of only one department
7. Each instructor can take any number of courses
8. A course can be taken by only one instructor
9. A student can enroll for any number of courses
10. Each course can have any number of students
Good to go. Let's start our design.(Remember our previous topic and the notations we have
used for entities, attributes, relations etc )
Step 1 : Identify the Entities
What are the entities here?
From the statements given, the entities are
1. Department
2. Course
3. Instructor
4. Student
Step 2 : Identify the relationships
1. One department offers many courses. But one particular course can be offered by only
one department. hence the cardinality between department and course is One to Many
(1:N)
2. One department has multiple instructors . But instructor belongs to only one
department. Hence the cardinality between department and instructor is One to Many
(1:N)
3. One department has only one head and one head can be the head of only one
department. Hence the cardinality is one to one. (1:1)
4. One course can be enrolled by many students and one student can enroll for many
courses. Hence the cardinality between course and student is Many to Many (M:N)
5. One course is taught by only one instructor. But one instructor teaches many courses.
Hence the cardinality between course and instructor is Many to One (N :1)
Step 3: Identify the key attributes
• "Department_Name" can identify a department uniquely. Hence Department_Name is
the key attribute for the Entity "Department".
• Course_ID is the key attribute for "Course" Entity.
• Student_ID is the key attribute for "Student" Entity.
• Instructor_ID is the key attribute for "Instructor" Entity.
Step 4: Identify other relevant attributes
• For the department entity, other attributes are location
• For course entity, other attributes are course_name,duration
• For instructor entity, other attributes are first_name, last_name, phone
• For student entity, first_name, last_name, phone
Step 5: Draw complete ER diagram
By connecting all these details, we can now draw ER diagram as given below.
File Organization: Physical Database Design Issues - Storage of Database on Hard Disks -
File Organization and Its Types – Heap files (Unordered files) - Sequential File
Organization - Indexed (Indexed Sequential) File Organization - Hashed File Organization
- Types of Indexes - Index and Tree Structure - Multi-Key File Organization - Need for
Multiple Access Paths - Multi-list File Organization - Inverted File Organization.
Redundancy refers to the situation where the same data exists in more than one entity. It may also refer to the fact
that unnecessary or duplicated data is stored at different locations in the database.
There are three types of anomalies that occur when the database is not normalized. These are –
Insertion, update and deletion anomaly. Let’s take an example to understand this.
Example: Suppose a manufacturing company stores the employee details in a table named employee that
has four attributes: emp_id for storing employee’s id, emp_name for storing employee’s name,
emp_address for storing employee’s address and emp_dept for storing the department details in which the
employee works. At some point of time the table looks like this:
The above table is not normalized. We will see the problems that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the same in
two rows or the data will become inconsistent. If somehow, the correct address gets updated in one
department but not in other then as per the database, Rick would be having two different addresses, which
is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not
assigned to any department then we would not be able to insert the data into the table if emp_dept field
doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890 then deleting the
rows that are having emp_dept as D890 would also delete the information of employee Maggie since she
is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss
normalization.
Normalization
Here are the most commonly used normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce and Codd normal form (BCNF)
First Normal Form (1NF): a table is in 1NF if each attribute holds only atomic (single) values. For example, consider an employee table:
emp_id  emp_name  emp_address  emp_mobile
102     Jon       Kanpur       8812121212, 9900012222
104     Lester    Bangalore    9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers each, so the company stored them in the same
field, as you can see in the table above.
This table is not in 1NF. The rule says “each attribute of a table must have atomic (single) values”, and the
emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we should store the data like this:
emp_id  emp_name  emp_address  emp_mobile
102     Jon       Kanpur       8812121212
102     Jon       Kanpur       9900012222
104     Lester    Bangalore    9990000123
104     Lester    Bangalore    8123450987
Second Normal Form (2NF): a table is in 2NF if it is in 1NF and no non-prime attribute depends on a proper subset of any candidate key. For example, a school stores teacher data in a table whose candidate key is (teacher_id, subject), while teacher_age depends on teacher_id alone:
teacher_id subject   teacher_age
111        Maths     38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
This violates the rule for 2NF, which says “no non-prime attribute may be dependent on a proper subset
of any candidate key of the table”: teacher_age depends only on teacher_id.
To make the table comply with 2NF we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
Third Normal Form (3NF): a table is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on any candidate key.
An attribute that is not part of any candidate key is known as a non-prime attribute; an attribute that is part of one of the candidate keys is known as a prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each functional
dependency X -> Y at least one of the following conditions holds:
• X is a super key of the table
• Y is a prime attribute of the table
Example: Suppose a company wants to store the complete address of each employee, they create a table
named employee_details that looks like this:
employee_zip table:
BCNF is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A table
complies with BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a super
key of the table.
Example: Suppose there is a company wherein employees work in more than one department. They
store the data like this:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 stores
Introduction to SQL:
What is SQL?
1. SQL is Structured Query Language, a computer language for storing, manipulating and
retrieving data stored in a relational database.
2. SQL is the standard language for relational database systems. All relational database management
systems such as MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server use SQL
as the standard database language.
Why SQL?
1. Allows users to access data in relational database management systems.
2. Allows users to describe the data.
3. Allows users to define the data in a database and manipulate that data.
4. Allows embedding within other languages using SQL modules, libraries & pre-compilers.
5. Allows users to create and drop databases and tables.
6. Allows users to create views, stored procedures and functions in a database.
7. Allows users to set permissions on tables, procedures and views.
History:
1. 1970 -- Dr. E. F. "Ted" Codd of IBM is known as the father of relational databases. He described a
relational model for databases.
2. 1974 -- Structured Query Language appeared.
3. 1978 -- IBM worked to develop Codd's ideas and released a product named System/R.
4. 1986 -- IBM developed the first prototype of a relational database, and it was standardized by ANSI. The first
relational database was released by Relational Software, which later became Oracle.
SQL Process:
1. When you execute an SQL command for any RDBMS, the system determines the best way to
carry out your request, and the SQL engine figures out how to interpret the task.
2. There are various components included in the process, such as the Query Dispatcher,
Optimization Engines, Classic Query Engine and SQL Query Engine. The classic query engine
handles all non-SQL queries, but the SQL query engine won't handle logical files.
SQL Process:
SQL Commands:
SQL is a keyword-based language. It consists of reserved words and user-defined words. Reserved
words have a fixed meaning and must be spelled exactly as required. User-defined words are words that represent
the names of various database objects including tables, columns and indexes; they are defined by the user.
SQL syntax is not case sensitive, so words can be typed in either small or capital letters. SQL is a
free-format language; however, to make it more readable, it is advisable to use indentation and
lineation. The SQL notation used throughout this book follows the Backus Naur Form (BNF), which is
described below:
✓ Uppercase letters are used to represent reserved words
✓ Lower-case letters are used to represent user-defined words
✓ A vertical bar (| ) indicates a choice among alternatives
✓ Curly braces ({}) indicate a required element
✓ Square brackets ([ ]) indicate an optional element
Data Definition Language (DDL) in DBMS with Examples: Data Definition Language can be defined as
a standard for commands through which data structures are defined. It is a computer language used
for creating and modifying the structure of database objects, such as schemas, tables, views, indexes,
etc. Additionally, it assists in storing the metadata details in the database.
Pupil ID PUPIL_Name
97 Albert
98 Sameer
After Adding Column
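The DDL statement that produces this kind of change is ALTER TABLE. A minimal sketch, assuming the PUPIL table shown above and a new, hypothetical MARKS column:
-- add a new column to the existing PUPIL table
ALTER TABLE PUPIL ADD MARKS INT;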
DATA MANIPULATION
Data Manipulation Language (DML) can be defined as the set of syntax elements that are used to manage
the data in the database. The commands of DML are not auto-committed, so modifications made by them
are not permanent until committed. It is the part of the language used to select, insert, delete and
update data in a database, and it serves the users' data requests. This language is responsible for all
forms of data modification in a database.
The main DML commands are:
1. INSERT : inserts new data into a database table
2. UPDATE : updates data in a database table
3. DELETE : deletes data from a database table
1. INSERT
The INSERT statement is used to add new rows to a table. Since we want to insert values for all the columns in the table, we may omit the column list.
Thus we may write the SQL statement as below:
INSERT INTO Supplier
VALUES ('S9996', 'NR Tech', '20 Jalan Selamat', 'Kuala Lumpur', 6200, 23456677, 'Nick');
2. Update
The update statement is used to update or change records that match a specified criteria. This is
accomplished by carefully constructing a where clause.
The syntax for UPDATE statement is given below:
UPDATE TableName
SET columnName1 = dataValue1 [, columnName2 = dataValue2...]
[WHERE searchCondition]
✓ TableName is the name of a table.
✓ SET clause specifies names of one or more columns that are to be updated.
✓ WHERE clause is optional:
✓ if omitted, named columns are updated for all rows in table;
✓ if specified, only those rows that satisfy searchCondition are updated.
✓ New dataValue(s) must be compatible with data type for corresponding column
✓ We illustrate the variation of UPDATE statement using the table Employee as given below
UPDATE Employee
SET salary = salary*1.10;
The result table from this operation is shown below.
3. DELETE
The DELETE statement is used to delete records or rows from an existing table.
The syntax for DELETE statement is given below:
DELETE FROM TableName
[WHERE searchCondition];
✓ TableName can be name of a base table or an updatable view.
✓ searchCondition is optional; if omitted, all rows are deleted from table. This does not delete table. If
search_condition is specified, only those rows that satisfy condition are deleted.
✓ We illustrate the variation of the DELETE statement using the table Supplier as given below.
1. Rollback
Using this command, the database can be restored to the last committed state. Additionally, it is also used
with savepoint command for jumping to a savepoint in a transaction.
The general syntax for the Rollback command is mentioned below:
Rollback to savepoint-name;
For example
UPDATE STUDENT SET STUDENT_NAME = 'Manish' WHERE STUDENT_NAME = 'Meena';
ROLLBACK;
This command is used when the user realizes that he/she has updated the wrong student name and
wants to undo the update. The user can issue the ROLLBACK command and undo
the update. Have a look at the tables below to understand the implementation of this command better.
2. Savepoint
The main use of the Savepoint command is to save a transaction temporarily. This way users can rollback
to the point whenever it is needed.
The general syntax for the savepoint command is mentioned below:
savepoint savepoint-name;
For Example
Following is the table of a school class
Use some SQL queries on the above table and then watch the results
INSERT INTO CLASS VALUES (101, 'Rahul');
COMMIT;
UPDATE CLASS SET NAME = 'Tyler' WHERE ID = 101;
SAVEPOINT A;
INSERT INTO CLASS VALUES (102, 'Zack');
SAVEPOINT B;
INSERT INTO CLASS VALUES (103, 'Bruno');
SAVEPOINT C;
SELECT * FROM CLASS;
The result will look like this.
Now roll back to savepoint B:
ROLLBACK TO B;
SELECT * FROM CLASS;
The table above lists the comparison operators that can be used in the WHERE clause. In addition, a
more complex condition can be built using the logical operators AND, OR and NOT.
✓ When writing a query, we sometimes need to express a condition that refers to a table that must
itself be computed.
✓ A subquery typically appears within the WHERE clause of a query. Subqueries can sometimes
appear in the FROM clause or the HAVING clause.
✓ Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
✓ Here are a few rules that subqueries must follow:
1. Subqueries must be enclosed within parentheses.
2. A subquery can have only one column in the SELECT clause, unless multiple columns are in the
main query for the subquery to compare its selected columns.
3. A subquery cannot be immediately enclosed in a set function.
Subqueries with the SELECT Statement:
Subqueries are most frequently used with the SELECT statement. The basic syntax is as follows:
SELECT column_name
[, column_name ] FROM table1
[, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2]
[WHERE])
Ex: SELECT *
FROM customers
WHERE id in
(SELECT id
FROM customers
WHERE salary >4500);
Subqueries with the INSERT Statement:
✓ Sub queries also can be used with INSERT statements.
✓ The INSERT statement uses the data returned from the subquery to insert into another table.
✓ The selected data in the subquery can be modified with any of the character, date or number
functions.
Syntax
INSERT INTO tablename [ (column1[, column2 ]) ]
SELECT [ *|column1 [, column2 ]
FROM table1 [, table2]
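For example, a sketch of a subquery used with INSERT, assuming a backup table CUSTOMERS_BKP with the same structure as CUSTOMERS:
INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN (SELECT ID FROM CUSTOMERS);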
Multi-table Queries
So far we have retrieved data using the SELECT statement from only one table. Sometimes we need results that
contain columns from more than one table; we then perform a join operation to combine these
columns into one result table. To perform a join, we specify the tables to be used in the FROM
clause, and the join condition that specifies the matching or common column(s) of the tables to be joined is
written in the WHERE clause.
In this section we use Product and Delivery tables, shown below to illustrate the use of these
Multi-table Queries.
Product Table
Delivery Table
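A sketch of such a multi-table query, assuming the Product and Delivery tables share a common ProductNo column (the column names here are illustrative):
SELECT p.ProductNo, p.ProductName, d.DeliveryDate, d.Quantity
FROM Product p, Delivery d
WHERE p.ProductNo = d.ProductNo;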
✓ If we want to sort the list in descending order, the word DESC must be specified in the ORDER
BY clause after the column name, as shown below.
SELECT EmpNo, Name, TelNo, Position, salary
FROM Employee
ORDER BY salary DESC;
SQL JOIN
• A SQL JOIN combines records from two tables.
• A JOIN locates related column values in the two tables.
CUSTOMER
Id
FirstName
LastName
City
Country
Phone
ORDER
Id
OrderDate
OrderNumber
CustomerId
TotalAmount
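A sketch of a join between these two tables using the columns listed above (ORDER is a reserved word in SQL, so the table name usually has to be quoted or renamed):
SELECT C.FirstName, C.LastName, O.OrderNumber, O.TotalAmount
FROM CUSTOMER C
JOIN "ORDER" O ON O.CustomerId = C.Id;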
Creating Views
Database views are created using the CREATE VIEW statement. Views can be created from a single table,
multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to the specific implementation.
The basic CREATE VIEW syntax is as follows −
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE [condition];
You can include multiple tables in your SELECT statement in a similar way as you use them in a normal SQL
SELECT query.
Example
Consider the CUSTOMERS table having the following records −
Following is an example to create a view from the CUSTOMERS table. This view shows the
customer name and age from the CUSTOMERS table.
CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS;
Now, you can query CUSTOMERS_VIEW in a similar way as you query an actual table. Following is an example
for the same.
SELECT * FROM CUSTOMERS_VIEW;
This would produce rows such as:
| Chaitali | 25 |
| Hardik | 27 |
| Komal | 22 |
| Muffy | 24 |
+----------+-----+
The WITH CHECK OPTION
The WITH CHECK OPTION is a CREATE VIEW statement option. Its purpose is to ensure that all UPDATEs and
INSERTs satisfy the condition(s) in the view definition. For example, the view above can be redefined as:
CREATE VIEW CUSTOMERS_VIEW AS
SELECT name, age
FROM CUSTOMERS
WHERE age IS NOT NULL
WITH CHECK OPTION;
The WITH CHECK OPTION in this case should deny the entry of any NULL values in the view's AGE column,
because the view is defined by data that does not have a NULL value in the AGE column.
Updating a View
A view can be updated under certain conditions which are given below −
• The SELECT clause may not contain the keyword DISTINCT.
• The SELECT clause may not contain summary functions.
• The SELECT clause may not contain set functions.
• The SELECT clause may not contain set operators.
• The SELECT clause may not contain an ORDER BY clause.
• The FROM clause may not contain multiple tables.
• The WHERE clause may not contain subqueries.
• The query may not contain GROUP BY or HAVING.
• Calculated columns may not be updated.
• All NOT NULL columns from the base table must be included in the view in order for the INSERT
query to function.
So, if a view satisfies all the above-mentioned rules then you can update that view. The following code block has
an example to update the age of Ramesh.
UPDATE CUSTOMERS_VIEW
SET AGE = 35
WHERE name = 'Ramesh';
This would ultimately update the base table CUSTOMERS and the same would reflect in the view itself. Now, try
to query the base table and the SELECT statement would produce the following result.
+----+----------+-----+-----------+----------+
Here, we cannot insert rows in the CUSTOMERS_VIEW because we have not included all the NOT NULL
columns in this view, otherwise you can insert rows in a view in a similar way as you insert them in a table.
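Rows can be deleted from a view in a similar way. A sketch of such a statement (the WHERE condition here is illustrative):
DELETE FROM CUSTOMERS_VIEW
WHERE age = 22;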
This would ultimately delete a row from the base table CUSTOMERS and the same would reflect in the view
itself. Now, try to query the base table and the SELECT statement would produce the following result.
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
Dropping Views
Obviously, where you have a view, you need a way to drop the view if it is no longer needed. The syntax is very
simple and is given below −
DROP VIEW view_name;
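For example, to drop the CUSTOMERS_VIEW created earlier:
DROP VIEW CUSTOMERS_VIEW;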
SQL - AUTO INCREMENT
Auto-increment allows a unique number to be generated automatically when a new record is inserted into a table.
Example
Try out the following example. It creates a table and then inserts a few rows into it; it is not required to give a record ID because it is auto-incremented by MySQL.
mysql> CREATE TABLE INSECT
    -> (
    -> id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    -> PRIMARY KEY (id),
    -> name VARCHAR(30) NOT NULL,          # type of insect
    -> date DATE NOT NULL,                 # date collected
    -> origin VARCHAR(30) NOT NULL         # where collected
    -> );
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO INSECT (id,name,date,origin) VALUES
    -> (NULL,'housefly','2001-09-10','kitchen'),
    -> (NULL,'millipede','2001-09-10','driveway'),
    -> (NULL,'grasshopper','2001-09-10','front yard');
Query OK, 3 rows affected (0.02 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM INSECT ORDER BY id;
+----+-------------+------------+------------+
| id | name        | date       | origin     |
+----+-------------+------------+------------+
|  1 | housefly    | 2001-09-10 | kitchen    |
|  2 | millipede   | 2001-09-10 | driveway   |
|  3 | grasshopper | 2001-09-10 | front yard |
+----+-------------+------------+------------+
3 rows in set (0.00 sec)
Alternatively, you can create the table and then set the initial sequence value with ALTER TABLE.
SQL - Indexes
Indexes are special lookup tables that the database search engine can use to speed up data retrieval.
Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in
the back of a book.
For example, if you want to reference all pages in a book that discusses a certain topic, you first refer to
the index, which lists all the topics alphabetically and are then referred to one or more specific page
numbers.
An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with
the UPDATE and the INSERT statements. Indexes can be created or dropped with no effect on the data.
Creating an index involves the CREATE INDEX statement, which allows you to name the index, to
specify the table and which column or columns to index, and to indicate whether the index is in an
ascending or descending order.
Indexes can also be unique, like the UNIQUE constraint, in that the index prevents duplicate entries in
the column or combination of columns on which there is an index.
Single-Column Indexes
A single-column index is created based on only one table column. The basic syntax is as follows.
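(Standard SQL form; index_name, table_name and column_name are placeholders.)
CREATE INDEX index_name
ON table_name (column_name);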
Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A unique index does not
allow any duplicate values to be inserted into the table. The basic syntax is as follows.
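(Standard SQL form; the names are placeholders.)
CREATE UNIQUE INDEX index_name
ON table_name (column_name);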
Composite Indexes
A composite index is an index on two or more columns of a table. Its basic syntax is as follows .
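(Standard SQL form; the names are placeholders.)
CREATE INDEX index_name
ON table_name (column1, column2);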
Whether to create a single-column index or a composite index, take into consideration the column(s) that
you may use very frequently in a query's WHERE clause as filter conditions.
Should there be only one column used, a single-column index should be the choice. Should there be two
or more columns that are frequently used in the WHERE clause as filters, the composite index would be
the best choice.
Implicit Indexes
Implicit indexes are indexes that are automatically created by the database server when an object is
created. Indexes are automatically created for primary key constraints and unique constraints.
We can check the INDEX Constraint chapter to see some actual examples of indexes.
When should indexes be avoided?
Although indexes are intended to enhance a database's performance, there are times when they should be
avoided.
The following guidelines indicate when the use of an index should be reconsidered.
• Indexes should not be used on small tables.
• Indexes should not be used on tables that have frequent, large batch update or insert operations.
• Indexes should not be used on columns that contain a high number of NULL values.
• Columns that are frequently manipulated should not be indexed.
UNIT-IV
TRANSACTIONS
AND
CONCURRENCY MANAGEMENT:
Transactions - Concurrent Transactions - Locking Protocol - Serialisable Schedules - Locks - Two Phase
Locking (2PL) - Deadlock and its Prevention - Optimistic Concurrency Control.
Database Recovery and Security: Database Recovery meaning - Kinds of failures - Failure controlling
methods - Database errors - Backup & Recovery Techniques - Security & Integrity - Database Security -
Authorization.
Transactions:
A transaction is a logical unit of work on the database. The two basic database access operations that a
transaction can contain are:
i. Read(x)
ii. Write(x)
The first performs the reading operation of data item x from the database, whereas the second
performs the writing operation of data item x to the database. Consider a transaction Ti which transfers
100/- from account “A” to account “B”. This transaction can be written as follows:
Ti:
read(A);
A := A - 100;
write(A);
read(B);
B := B + 100;
write(B);
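In SQL, the same transfer could be sketched as a single transaction (the ACCOUNT table and its ACC_NO and BALANCE columns are illustrative names, and the statement used to start a transaction varies between DBMSs):
BEGIN;                                                            -- start the transaction
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACC_NO = 'A';   -- debit A
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACC_NO = 'B';   -- credit B
COMMIT;                                                           -- make both changes permanent together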
1. Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of the
transaction or after the execution/abortion/failure of the transaction.
2. Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the database
was in a consistent state before the execution of a transaction, it must remain consistent after the
execution of the transaction as well.
3. Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits, then
the database will hold the modified data. If a transaction commits but the system fails before the
data could be written on to the disk, then that data will be updated once the system springs back
into action.
4. Isolation − In a database system where more than one transaction are being executed
simultaneously and in parallel, the property of isolation states that all the transactions will be
carried out and executed as if it is the only transaction in the system. No transaction will affect
the existence of any other transaction.
Transaction States
There are the following six states in which a transaction may exist:
Active: The initial state when the transaction has just started execution.
Partially Committed: At any given point of time if the transaction is executing properly, then it is going
towards it COMMIT POINT. The values generated during the execution are all stored in volatile storage.
Failed: If the transaction fails for some reason, the temporary values are no longer required and the
transaction is set to ROLLBACK. This means that any change made to the database by this transaction up to
the point of the failure must be undone. If the failed transaction has withdrawn Rs. 100/- from account A,
then the ROLLBACK operation should add Rs. 100/- back to account A.
Aborted: When the ROLLBACK operation is over, the database reaches the BFIM (Before Image, i.e. the state
it was in before the transaction started). The transaction is now said to have been aborted.
Committed: If no failure occurs then the transaction reaches the COMMIT POINT. All the temporary
values are written to the stable storage and the transaction is said to have been committed.
Terminated: Either committed or aborted, the transaction finally reaches this state.
CONCURRENT TRANSACTIONS
When more than one transaction is executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with
instructions of some other transaction.
➢ Schedule: A chronological execution sequence of transactions is called a schedule. A schedule can have
many transactions in it, each consisting of a number of instructions/tasks.
➢ Serial Schedule: A schedule in which transactions are aligned in such a way that one transaction is
executed first; when the first transaction completes its cycle, then the next transaction is executed.
Transactions are ordered one after the other. This type of schedule is called a serial schedule, as
transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered the benchmark. The execution
sequence of instructions within a transaction cannot be changed, but the instructions of two transactions
can be interleaved in any order. This interleaving does no damage if the two transactions are mutually
independent and work on different segments of data, but if they work on the same data the result may
change. This ever-changing result may leave the database in an inconsistent state.
To solve the problem, we allow parallel execution of a transaction schedule only if the transactions in it are
serializable or have some equivalence relation between or among them.
Problems of Concurrency:
1. Lost Update Problem
Example: Suppose transactions X and Y both read and later update the same data item A, interleaved as
described below (the original timing figure is not reproduced here).
Here,
o At time t2, transaction-X reads A's value.
o At time t3, Transaction-Y reads A's value.
o At time t4, Transactions-X writes A's value on the basis of the value seen at time t2.
o At time t5, Transactions-Y writes A's value on the basis of the value seen at time t3.
o So at time t5, the update of Transaction-X is lost because Transaction-Y overwrites it without
looking at its current value.
o This type of problem is known as the Lost Update Problem, as the update made by one transaction is
lost here.
2. Dirty Read
o The dirty read occurs in the case when one transaction updates an item of the database, and then
the transaction fails for some reason. The updated database item is accessed by another transaction
before it is changed back to the original value.
o A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values which
have never formed part of the stable database.
Example:
Transaction-X is computing the sum of all balances while transaction-Y is transferring an amount of 50 from
Account-1 to Account-3.
o Here, transaction-X produces a sum of 550, which is incorrect. If we write this result to the database,
the database will be in an inconsistent state, because the actual sum is 600.
o Here, transaction-X has seen an inconsistent state of the database.
Concurrency Control:-
➢ The coordination of the simultaneous execution of transactions in a multi user database system is
known as concurrency control.
Concurrency Control with Locking Methods:
➢ A lock guarantees exclusive use of a data item to the current transaction. The level at which the lock is
applied (its granularity) can be the entire database, a table, a page, a row or a field.
Database level:-
➢ In a database level lock, the entire database is locked. That means while transaction T1 is using the
database, no other transaction (say T2) can access any object in it.
This level of locking is good for batch processes but it is unsuitable for multi-user DBMSs, because
thousands of transactions would have to wait for the previous transaction to be completed before the next
one could reserve the entire database, so data access would be slow.
Table level:-
➢ In table level lock the entire table is locked that means if transaction T1 is accessing a table then
transaction T2 cannot access the same table.
➢ If a transaction requires access to several tables, each table may be locked.
➢ Table level locks are less restrictive than database level locks.
➢ Table level locks are not suitable for multi-user DBMS.
➢ The drawback of a table level lock is that transactions T1 and T2 cannot access the same table even
when they try to use different rows; T2 must wait until T1 unlocks the table.
Page level:-
➢ In a page level lock, the DBMS will lock on entire disk page.
➢ A disk page or page is also referred as a disk block, which is described as a section of a disk.
➢ A page has a fixed size such as 4k, 8k or 16k.
➢ A table can span several pages, and a page can contain several rows of one or more tables.
➢ Page level locks are currently one of the most frequently used locking methods in multi-user DBMSs.
➢ With page level locking, T1 and T2 can access the same table while locking different disk pages
(the illustrating figure is not reproduced here).
➢ If T2 requires the use of a row located on a page that is locked by T1, T2 must wait until the page
is unlocked.
Row level:-
➢ A row level lock is much less restrictive than the other locks. The DBMS allows concurrent
transactions to access different rows of the same table even if the rows are located on the same
page.
➢ The row level locking approach improves the availability of data.
➢ But row level lock management requires high overhead, because a lock exists for each row of each
table involved in a conflicting transaction.
Field level:-
The field level lock allows concurrent transactions to access the same row as long as they require the use
of different fields (attributes) within that row.
➢ Although field level locking clearly yields the most flexible multi-user data access, it is rarely
implemented in a DBMS because it requires an extremely high level of computing overhead.
Lock Types:-
➢ The DBMS uses different lock types, such as
a) Binary Locks
b) Shared/Exclusive Locks.
a) Binary Locks:-
➢ A binary lock has two states:
a) Locked
b) Unlocked
➢ If an object is locked by a transaction, no other transaction can use that object. The object may be
a database, table, page or row.
➢ If an object is unlocked, any transaction can lock the object for its use.
➢ Every database operation requires that the affected object be locked.
➢ A transaction must unlock the object after its termination. Therefore every transaction requires a
lock and an unlock operation for each data item that is accessed.
➢ Such operations are automatically managed and scheduled by the DBMS.
➢ Every DBMS has a default locking mechanism. If the end user wants to override the default, the
LOCK TABLE and other SQL commands are available.
➢ Using binary locks, the lost update problem is eliminated in concurrency control because the lock is
not released until the WRITE statement is completed.
➢ But binary locks are now considered too restrictive to yield optimal concurrency conditions. For
example, the DBMS will not allow two transactions to read the same database object even though
neither transaction updates the database.
b) Shared/Exclusive Locks:-
➢ A shared lock exists when concurrent transactions are granted read access on the basis of a
common lock. A shared lock produces no conflict as long as all the concurrent transactions are
read only.
➢ An exclusive lock exists when access is reserved specifically for the transaction that locked the
object. The exclusive lock must be used when a conflict exists, i.e. when one transaction wants to
READ and another wants to WRITE the same object.
➢ So a shared lock is issued when a transaction wants to read data from the data base and an
exclusive lock is issued when a transaction wants to update (WRITE) a data item.
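Many SQL dialects let a transaction request these lock modes explicitly. The following Oracle-style statements are only a sketch; the ACCOUNT table is assumed:

-- Shared (read) lock: other transactions may also read, but none may write.
LOCK TABLE ACCOUNT IN SHARE MODE;

-- Exclusive (write) lock: the table is reserved for this transaction until it commits or rolls back.
LOCK TABLE ACCOUNT IN EXCLUSIVE MODE;

In practice the DBMS usually acquires such locks automatically: a SELECT typically takes a shared lock and an UPDATE takes an exclusive lock on the affected data.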
Two Phase Locking (2PL):
According to the two-phase locking protocol, a transaction issues all of its lock requests before releasing any
lock, so its execution is divided into two phases:
1. Growing Phase: New locks on data items may be acquired but none can be released.
2. Shrinking Phase: Existing locks may be released but no new locks can be acquired.
Note – If lock conversion is allowed, then upgrading of lock( from S(a) to X(a) ) is allowed in Growing
Phase and downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.
Let’s see a transaction implementing 2-PL.
Time  T1           T2
1     LOCK-S(A)
2                  LOCK-S(A)
3     LOCK-X(B)
4     …….          ……
5     UNLOCK(A)
6                  LOCK-X(C)
7     UNLOCK(B)
8                  UNLOCK(A)
9                  UNLOCK(C)
10    …….          ……
This is just a skeleton transaction which shows how unlocking and locking works with 2-PL. Note for:
Transaction T1:
• Growing Phase is from steps 1-3.
• Shrinking Phase is from steps 5-7.
• Lock Point at 3
Transaction T2:
• Growing Phase is from steps 2-6.
• Shrinking Phase is from steps 8-9.
• Lock Point at 6
Deadlocks:-
➢ A deadlock occurs when two transactions wait indefinitely for each other to unlock data. For
example, a deadlock occurs when two transactions, T1 and T2, wait on each other in the manner
sketched below.
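A minimal sketch of how such a deadlock can arise, assuming an ACCOUNT table with rows 1 and 2 (the table and column names are illustrative only):

-- Step 1, T1 (session 1):
BEGIN TRANSACTION;
UPDATE ACCOUNT SET balance = balance - 100 WHERE acc_no = 1;  -- T1 now holds a lock on row 1
-- Step 2, T2 (session 2):
BEGIN TRANSACTION;
UPDATE ACCOUNT SET balance = balance - 100 WHERE acc_no = 2;  -- T2 now holds a lock on row 2
-- Step 3, T1:
UPDATE ACCOUNT SET balance = balance + 100 WHERE acc_no = 2;  -- T1 waits for T2's lock on row 2
-- Step 4, T2:
UPDATE ACCOUNT SET balance = balance + 100 WHERE acc_no = 1;  -- T2 waits for T1's lock on row 1: deadlock

Each transaction holds a lock the other one needs, so neither can proceed until the DBMS detects the deadlock and rolls one of them back.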
Deadlock Prevention using Time Stamps:
Deadlocks can be prevented by assigning each transaction a unique time stamp and resolving lock requests
on the basis of transaction age, using either the wait/die or the wound/wait scheme.
Example:
➢ Assume that we have two conflicting transactions:
T1 and T2, each with a unique time stamp.
➢ Suppose T1 has a time stamp of 11548789 and T2 has a time stamp of 19562545. So T1 is the older
transaction and T2 is the newer (younger) transaction.
Using the wait/die scheme:-
a) If the transaction requesting the lock is the older of the two transactions, it will wait until the other
transaction is completed and the locks are released.
b) If the transaction requesting the lock is the younger of the two transactions, it will die (roll back) and is
rescheduled using the same time stamp.
➢ That means in the wait/die scheme, the older transaction waits for the younger to complete and
release its locks.
Using the wound/wait scheme:-
a) If the transaction requesting the lock is the older of the two transactions, it will preempt (wound) the
younger transaction (by rolling it back). The younger transaction is rescheduled using the same time
stamp.
b) If the transaction requesting the lock is the younger of the two transactions, it will wait until the other
transaction is completed and the locks are released.
➢ That means in the wound/wait scheme, the older transaction rolls back the younger transaction and
reschedules it.
Concurrency Control with Optimistic methods:-
➢ The optimistic approach is based on the assumption that the majority of the database operations do
not conflict.
The optimistic approach requires neither locking nor time stamping techniques.
➢ Using an optimistic approach, each transaction moves through three phases. They are
a) Read Phase
b) Validation Phase
c) Write Phase.
➢ During the read phase, the transaction reads the database, executes the needed computations, and
makes the updates to a private copy of the data base values. All the update operations of the
transaction are recorded in a temporary update file, which is not accessed by the remaining
transactions.
During the validation phase, the transaction is validated to ensure that the changes made will not affect the
integrity and consistency of the database. If the validation test is positive, the transaction goes to the write
phase; if it is negative, the transaction is restarted and the changes are discarded.
During the write phase, the changes recorded in the private copy are permanently applied to the database.
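A common way to realise the optimistic approach in SQL is a version number that is checked at write time. This is only a sketch; the PRODUCT table, its columns and the version values are assumed:

-- Read phase: note the version number seen when the row is read (suppose it returns version = 7).
SELECT price, version FROM PRODUCT WHERE prod_id = 101;

-- Validation + write phase: the update succeeds only if no other
-- transaction has changed the row since it was read.
UPDATE PRODUCT
SET price = 550, version = version + 1
WHERE prod_id = 101 AND version = 7;

-- If the UPDATE affects 0 rows, validation has failed: the transaction
-- is restarted and its private changes are discarded.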
"Database security" is protection of the information contained in the database against unauthorized
access, modification or destruction.
"Database integrity" is the mechanism that is applied to ensure that the data in the database is correct and
consistent.
Database Recovery
Recovery techniques are used to bring a database which does not satisfy consistency requirements into a
consistent state. The inconsistencies may arise due to violation of the semantic integrity constraints
specified in the schema, or due to damage to certain implicit constraints that are expected to hold
for a database. In other words, if a transaction completes normally then all the changes that it performs on
the database are permanently committed. But, if transaction does not complete normally then none of its
changes are committed. An abnormal termination may be due to several reasons including:
a) user may decide to abort his transaction
b) there might be a deadlock
c) there might be a system failure.
So the recovery mechanisms must make sure that a consistent state of the database can be restored under all
circumstances. In case of a transaction abort or a deadlock the system remains in control, but in case of a
system failure the system loses control because the computer itself fails or some critical data are lost.
Kinds of Failures
When a transaction/program is made to be executed, a number of difficulties can arise, which leads to its
abnormal termination. The failures are mainly of two types:
1. Soft failures: In such cases, a CPU, memory or software error abruptly stops the execution of the
current transaction (or all transactions), thus leading to the loss of the state of program execution and the
state/contents of the buffers. These can further be subdivided into two types:
a) Statement failure
b) Program failure
A statement of a program may cause abnormal termination if it does not execute completely. If,
during the execution of a statement, an integrity constraint gets violated, it leads to abnormal
termination of the program, due to which any updates made already may not get reflected in the
database, leaving it in an inconsistent state.
A failure of a program can occur if some code in the program leads to its abnormal termination, e.g.
a program which goes into an infinite loop. In such a case the only way to break the loop is to abort
the program. The part of the program executed before abortion may already have made some
updates in the database; hence the database is updated only partially, which leaves it in an
inconsistent state. Also, in case of deadlock, i.e. if one program enters into a deadlock with some
other program, that program has to be restarted to get out of the deadlock, and the partial updates
made by this program again leave the database in an inconsistent state.
Thus soft failures can occur due to either statement failure or program failure.
2. Hard failures: Hard failures are those failures in which some data on the disk gets damaged and cannot
be read anymore. This may be due to many reasons, e.g. a voltage fluctuation in the power supply makes
the computer go off, bad sectors develop on the disk, or there is a disk crash. In all these cases, the
database gets into an inconsistent state.
Failure Controlling Methods:
Failures can be controlled to some extent by taking suitable precautions in advance that avoid
abnormal termination and hence prevent leaving the database in a corrupt state. If all such precautions are
taken in advance, then no extra effort has to be made in recovering erroneous data in the database.
Several recovery techniques have been proposed for database systems. As we have seen, two types of
failures can occur, so now we will discuss how to recover from them. Hard failure or media failure recovery
can be done by restoring the last backup copy and then doing forward recovery if the system log is intact,
while soft failure or system failure recovery using the log includes backward recovery as well as forward
recovery. So there are two main strategies for performing recovery:
1) Backward Recovery (UNDO)
In this scheme, the uncommitted changes made by a transaction to the database are undone; the system is
reset to the previous consistent state of the database that is free from any errors.
In simpler words, when a particular error in the system is detected, the recovery system makes an accurate
assessment of the state of the system and then makes the appropriate adjustments based on the anticipated
results had the system been error free. One thing to be noted is that the Undo operation must be idempotent,
i.e. executing it several times must be equivalent to executing it once. This characteristic is required to
guarantee correct behaviour of the database even if a failure occurs during the recovery process.
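At the SQL level, backward recovery of a single transaction corresponds to a ROLLBACK. The table and values below are assumed purely for illustration:

BEGIN TRANSACTION;
UPDATE ACCOUNT SET balance = balance - 100 WHERE acc_no = 'A';
-- A failure or error is detected before COMMIT, so the uncommitted change is undone:
ROLLBACK;
-- The database is back in the consistent state it was in before the transaction started.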
2) Forward Recovery (REDO)
In this scheme, the committed changes made by transactions are reapplied (redone) to an earlier consistent
copy of the database, for example one restored from the last backup after a media failure. Like Undo, the
Redo operation must be idempotent.
Database Security
Protection of the database contents from unauthorized access includes legal & ethical issues, organization
policies as well as database management policies. To protect the database, several levels of security
measures are maintained:
1. Physical : The site or sites containing the computer system must be physically secured against
illegal entry of unauthorized person.
2. Human : Authorization should be given to users carefully, to reduce the chance of any user giving
access to an outsider in exchange for some favours.
3. O.S. : Even if foolproof security measures are taken to secure the database system, weakness in
the O.S. security may serve as a means of unauthorized access to the database.
4. Network : Since databases allow distributed or remote access through terminals or a network,
software-level security within the network software is an important issue to be taken into
consideration.
5. Database system : Within the database also, authorization is granted according to user needs.
That is to say, a user may be allowed to read data and issue queries but would not be allowed to
deliberately modify the data; only some higher-level users may be allowed to do so, being given
authorized access rights within the database itself. It is the responsibility of the database system to
ensure that these authorization restrictions are not violated.
To ensure database security, security at all the above levels must be maintained.
Authorization
Authorization is the culmination of the administrative policies of the organization. As the name specifies,
authorization is a set of rules that can be used to determine which user has what type of access to which
portion of the database. The person who specifies the access rules is called an authorizer.
An authorizer may set several forms of authorization on parts of the database. Among them are the
following:
1. Read Authorization: allows reading, but not modification of data.
2. Insert Authorization: allows insertion of new data, but not the modification of existing data, e.g.
insertion of tuple in a relation.
3. Update authorization: allows modification of data, but not its deletion. But data items like
primary-key attributes may not be modified.
4. Delete authorization: allows deletion of data only.
A user may be assigned all, none or combination of these types of authorization, which are broadly called
access authorizations.
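In SQL these access authorizations are typically granted with GRANT and withdrawn with REVOKE. The user name and the STUDENT table below are assumed for illustration:

-- Read authorization only
GRANT SELECT ON STUDENT TO clerk1;

-- Insert and update authorization, but no delete
GRANT INSERT, UPDATE ON STUDENT TO clerk1;

-- Withdraw the update authorization later
REVOKE UPDATE ON STUDENT FROM clerk1;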
In addition to these manipulation operations, a user may be granted control operations like
1. Add: Allow adding new object types such as new relations (in case of RDB), records and set
types (in case of network model) or record types and hierarchies (in hierarchical model of DB).
2. Drop: Allows the deletion of relations in DB.
3. Alter: Allows addition of new attributes (data items) to a relation or deletion of existing data
items from the database.
4. Propagate Access Control: This is an additional right that allows a user to propagate an access
right which he already has to some other user, i.e. if user A has access right R over a relation S and
also has the propagate access control, then A can propagate his access right R over relation S to
another user B, either fully or partly.
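Propagate access control corresponds to SQL's GRANT OPTION; the user names and the relation S are used here only as an illustration:

-- User A receives the right R (here SELECT) on relation S and may pass it on to others.
GRANT SELECT ON S TO user_a WITH GRANT OPTION;

-- Executed later by user_a: the same right is propagated to user B.
GRANT SELECT ON S TO user_b;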
UNIT 5
Distributed database
A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations that communicate via a computer network.
Features
• Databases in the collection are logically interrelated with each other. Often they represent a single
logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
• The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.
Distributed DBMS (DDBMS)
A distributed DBMS is the software that manages a distributed database so that it appears to users as a
single database.
Features
• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by the virtue of which
the distribution becomes transparent to the users.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.
Advantages of Distributed Databases
Modular Development − If the system needs to be expanded to new locations or new units, the work simply
requires adding new computers and local data to the new site and finally connecting them to the distributed
system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes to a halt.
However, in distributed systems, when a component fails, the functioning of the system continues, though
possibly at reduced performance. Hence a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local
data itself, thus providing faster response. On the other hand, in centralized systems, all queries have to
pass through the central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.
3. Middleware architecture
• Middleware architectures are designed in such a way that a single query can be executed on multiple
servers.
• This system needs only one server which is capable of managing queries and transactions from
multiple servers.
• Middleware architecture uses local servers to handle local queries and transactions.
• The software used for execution of queries and transactions across one or more independent
database servers is called middleware.
Data Replication
Data replication is the process in which the data is copied at multiple locations (Different computers or
servers) to improve the availability of data.
Replication Schemes
The three replication schemes are as follows:
1. Full Replication
In the full replication scheme, a complete copy of the database is available at almost every location or site
in the communication network.
2. No Replication
In the no replication scheme, each fragment of the database is stored at exactly one site.
Advantages of no replication
• Concurrency conflicts between copies are minimized, since only one copy of each data item exists.
• Easy recovery of data.
Disadvantages of no replication
• Poor availability of data.
• Slows down the query execution process, as multiple clients are accessing the same server.
3. Partial replication
Partial replication means only some fragments are replicated from the database.
Fragmentation
Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are
called fragments. Fragmentation can be of three types: horizontal, vertical, and hybrid (combination of
horizontal and vertical). Horizontal fragmentation can further be classified into two techniques: primary
horizontal fragmentation and derived horizontal fragmentation.
Fragmentation should be done in such a way that the original table can be reconstructed from the
fragments whenever required. This requirement is called “reconstructiveness.”
Advantages of Fragmentation
• Since data is stored close to the site of usage, efficiency of the database system is increased.
• Local query optimization techniques are sufficient for most queries since data is locally available.
• Since irrelevant data is not available at the sites, security and privacy of the database system can
be maintained.
Disadvantages of Fragmentation
• When data from different fragments are required, the access speed may be very low.
• In case of recursive fragmentations, the job of reconstruction will need expensive techniques.
• Lack of back-up copies of data in different sites may render the database ineffective in case of
failure of a site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to
maintain reconstructiveness, each fragment should contain the primary key field(s) of the table. Vertical
fragmentation can be used to enforce privacy of data.
For example, let us consider that a University database keeps records of all registered students in a
Student table having the following schema.
STUDENT (the schema table is not reproduced here)
Now, the fees details are maintained in the accounts section. In this case, the designer will fragment the
database as follows −
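The statement itself is not reproduced in these notes; the following is a sketch assuming the STUDENT table contains, among others, the columns Regd_No (the primary key) and Fees:

-- Vertical fragment holding only the fee details for the accounts section.
-- The primary key Regd_No is repeated so that the original table can be reconstructed by a join.
CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;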
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more fields.
Horizontal fragmentation should also conform to the rule of reconstructiveness. Each horizontal fragment
must have all the columns of the original base table.
For example, in the student schema, if the details of all students of the Computer Science course need to be
maintained at the School of Computer Science, then the designer will horizontally fragment the database
as follows −
CREATE TABLE COMP_STD AS
SELECT * FROM STUDENT
WHERE COURSE = 'Computer Science';
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used.
This is the most flexible fragmentation technique since it generates fragments with minimal extraneous
information. However, reconstruction of the original table is often an expensive task.
Hybrid fragmentation can be done in two alternative ways −
• At first, generate a set of horizontal fragments; then generate vertical fragments from one or more
of the horizontal fragments.
• At first, generate a set of vertical fragments; then generate horizontal fragments from one or more
of the vertical fragments.
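A minimal sketch of the first alternative, reusing the COMP_STD fragment created above and the Regd_No and Fees columns assumed earlier:

-- Vertical fragment derived from the horizontal fragment COMP_STD,
-- keeping only the fee details of Computer Science students.
CREATE TABLE COMP_STD_FEES AS
SELECT Regd_No, Fees
FROM COMP_STD;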