DBMS Note
ON
Database Management System
PREPARED BY:-
SAMBHU PRASAD PANDA
ASST PROF (CSE)
C. V. RAMAN POLYTECHNIC
RATIONALE
Database is the prime area of application development. This paper teaches the methodology of storing
and processing data for commercial applications. It also deals with security and other aspects of DBMS.
Data independence
Entity relationship models
Entity sets and Relationship sets
Attributes
Mapping constraints
E-R Diagram
Relational model
Hierarchical model
Network model
Relational algebra
Different operators: select, project and join, with simple examples
Functional Dependencies
Lossless join
Importance of normalization
First, second and third normal forms
BCNF
Basic concepts,
Locks, Live Lock, Dead Lock,
Serializability (only fundamentals)
DBMS:-
A database system is nothing more than a computer based record keeping system, i.e. a system whose
purpose is to record and maintain data or information.
A DBMS is a software system that allows users to access the data contained in a database.
The objective of DBMS is to provide a convenient and effective method of defining, storing and
retrieving information contained in the database.
The DBMS interfaces with application programs so that the data contained in the database can
be used by multiple applications and users.
A database system involves four major components namely data, hardware, software and user.
Data: - The fundamental unit of a database is data. A database is therefore nothing but a
repository of stored data. The data has two properties, namely integrated and shared.
Integrated means that the data can be uniquely identified in the database, and shared means that
the data can be shared among several different users.
Hardware: - The hardware consists of secondary storage volumes on which the database
resides.
Software: - Between the physical database and the users of the system there is a layer of
software usually called as database management system (DBMS). All requests from users for
access to the database are handled by DBMS.
Users: - The first class of users are the application programmers responsible for writing application
programs that use the database. The application programmers operate on the data in all the usual ways,
i.e. retrieving information, creating new information, and deleting or changing existing information.
The second class of users are the end users, whose task is to access the data of a database
from a terminal. The end users either use a query language or invoke user-written application
programs through commands from the terminal. The third class of users are the database
administrators or DBAs, who have control over the whole system.
APPLICATIONS OF DBMS:-
Credit card transactions-> For purchases on credit cards and generation of monthly statements.
Telecommunications-> For keeping records of calls made, generating monthly bills, maintaining balances
on prepaid calling cards and storing information about the communication network.
Finance-> For storing information about holdings, sales and purchases of financial instruments such as
stocks and bonds.
Manufacturing-> For management of supply chain and for tracking production of items in factories,
inventory of items in warehouses and orders for items.
THREE LEVEL ARCHITECTURE OF DBMS:-
This architecture provides a framework which is extremely useful in describing general database
concepts and for explaining the structure of individual systems.
A generalized database is designed so that it has the capability to transform a query asked by the
user into a form the system can understand, and so retrieve the answer to the query.
It is divided into three levels.
I. External level or view level or user view
II. Conceptual level or logical level or global view
III. Physical level or internal level or internal view
The view at each of these levels is described by a schema. A schema is an outline or a plan that
describes the records and relationships existing in the view. The schema also describes the way in which
entities at one level of abstraction can be mapped to next level.
(Figure: the three level architecture, showing the external/conceptual mapping between the view level
and the conceptual level, and the conceptual/internal mapping between the conceptual level and the
physical or internal level)
1. Physical Level:-
The lowest level of abstraction describes how the data are actually stored. The physical
level describes complex low level data structures in detail. This level is expressed by the physical
schema, which contains the definition of the stored records, the method of representing the data
fields and the access aids used.
The internal level is the one closest to physical storage. Whenever an external user
queries the database, if the response to the query is available at the conceptual level, it is
provided to the user; if not, it is retrieved from the internal level.
DATABASE USERS:-
There are several different types of database system users, differentiated by the way they expect to
interact with the system.
Naive Users:-
They are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously. Example:- ATM user
The typical user interface for a naive user is a forms interface where the user can fill in
appropriate fields of the form. Naive users may also simply read reports generated from
the database.
Application Programmers:-
Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools like RAD tools, programming
languages, fourth generation languages etc. to develop user interfaces.
Sophisticated Users:-
Sophisticated users interact with the system without writing programs. Instead they
form their requests in database query language. They submit each such query to query
processor whose function is to break down DML statements into instructions that the
storage manager understands.
Analysts who submit queries to explore data in the database fall in this category.
A database system provides a data definition language to specify the database schemas.
For example the following SQL statement defines the account table.
Sql> create table account (accountno varchar2 (10), balance number);
Execution of the above DDL statement creates the account table. In addition it updates a special
set of tables called data dictionary.
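For instance (assuming an Oracle database, where part of the data dictionary is exposed through
views such as user_tables), the newly created table can be seen in the dictionary with:
Sql> select table_name from user_tables;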
DDL is used to define the database. This definition includes all the entity sets and their
associated attributes as well as the relationship among the entity sets. It also includes any
constraints that have to be maintained.
The DDL used at the external schema is called the view definition language (VDL) from where
the defining process starts.
A data dictionary contains metadata. A database system consults the data dictionary before reading
or modifying actual data. The schema of a table is an example of metadata, i.e. data about
data.
We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called data storage and definition language (DSDL). These
statements define the implementation details of the database schemas which are usually hidden
from the users.
The DDL provides facilities to specify such consistency constraints. The database system checks
these constraints every time the database is updated.
Information regarding the structure and usage of the data contained in the database is the metadata
maintained in a data dictionary. The term system catalog also describes this metadata. The data
dictionary, which is a database itself, documents the data.
Each database user can consult the data dictionary to learn what each piece of data and the various
synonyms of the data fields mean.
In an integrated system (i.e. a system where the data dictionary is a part of the DBMS) the data
dictionary stores information concerning the external, conceptual and internal levels of the
database. It contains the source of each data field value, the frequency of its use and an audit
trail concerning updates, including the who and when of each update.
DATA MODELS
Data Independence:-
Three levels of abstraction along with the mappings from internal to conceptual and from
conceptual to external level provide two distinct levels of data independence: logical data
independence and physical data independence.
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping
between the external and conceptual level.
Logical data independence also insulates application programs from operations such as
combining two records into one or splitting an existing record into two or more records.
Logical data independence is achieved by providing the external level or user view of the
database. The application programs or users see the database as described by their respective
external views.
Physical data independence is achieved by the presence of the internal level of the database and
the mapping or transformation from conceptual level of database to internal level.
The physical data independence criterion requires that the conceptual level does not specify
storage structures or the access methods (indexing, hashing etc) used to retrieve the data from
the physical storage medium.
Another aspect of data independence allows different interpretation of the same data. The
storage of data is in bits and may change from EBCDIC to ASCII coding.
(E-R Diagram: attributes such as customer_id, customer_name, customer_street, customer_city,
accountno and balance shown as ellipses)
In addition to entities and relationships, the E-R model represents certain constraints to which
the contents of a database must conform. One important constraint is mapping cardinalities,
which express the number of entities to which another entity can be associated via a
relationship set. For example, if each account must belong to only one customer the E-R model
can express that constraint.
The E-R model is widely used in database design.
ENTITY SETS:-
An entity is a thing or object in the real world that is distinguishable from all other objects.
Example:- Each person of an enterprise.
An entity has a set of properties and values. Some set of properties may uniquely identify an
entity. Example:- Aadhar number
An entity may be concrete such as a person or book or it may be abstract like a holiday or a
concept.
An entity set is a set of entities of same type that share the same properties or attributes. For
example the set of all persons who are customers at a given bank can be defined as an entity set
customer. The individual entities that constitute a set are said to be the extension of the entity
set. For example the individual customers that constitute a set are the extension of the entity
set customer.
Entity sets do not need to be disjoint. For example it is possible to define the entity set of all
employees of a bank (employees) and the entity set of all customers of the bank.
ATTRIBUTES:-
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by
each member of an entity set.
Each entity has a value for each of its attributes. For example a particular customer entity may
have the value 321-12 for customer id, value Anil for customer name etc.
For each attribute there is a set of permitted values called domain or value set for that attribute.
RELATIONSHIP SETS:-
A relationship is an association among several entities.
A relationship set is a set of relationships of the same type.
The association between entity sets is referred to as participation; that is, the entity sets E1, E2,
..., En participate in a relationship set R.
A relationship instance in an E-R schema represents an association between the named entities
of the real world enterprise that is being modeled.
A relationship may also have attributes called descriptive attributes. Consider a relationship set
depositor with entity sets customer and account. We could associate the attribute access date
to that relationship to specify the most recent date on which the customer accessed an account.
A relationship instance in a given relationship set must be uniquely identifiable from its
participating entities, without using the descriptive attributes. So if a customer may access an
account on more than one date, instead of using a single access date we use a multivalued
attribute access dates.
However, there can be more than one relationship set involving the same entity sets. For example,
the entity sets customer and loan may participate in a relationship set guarantor in addition to the
customer-loan relationship borrower.
A relationship set that involves two entity sets is called a binary relationship set. Most of the
relationship sets in a database system are binary.
The relationship set works on among employees, branch and job is an example of ternary
relationship.
The number of entity sets that participate in a relationship set is also called the degree of
relationship set. A binary relationship set is of degree two and a ternary relationship set is of
degree three.
(Figure: the record types Department and Employee connected by the set type DEPT_EMP)
In the network model, the relationships as well as the navigation through the database are
predefined at database creation time.
RELATIONAL DATABASE
QUERY LANGUAGE:-
A query language is a language in which a user requests information from the database. These
languages are usually on a level higher than that of a standard programming language.
Query languages can be categorized as either procedural or non-procedural. In a procedural
language the user instructs the system to perform a sequence of operations on the database to
compute the desired result. In a non-procedural language the user describes the desired
information without giving a specific procedure for obtaining that information.
RELATIONAL ALGEBRA:-
The relational algebra is a procedural query language. It consists of a set of operations that take
one or two relations as operands and produce a new relation as a result.
The fundamental operations in the relational algebra are select, project, union, set difference,
rename and Cartesian product.
In addition to the fundamental operations there are several other operations namely set
intersection, natural join, division and assignment.
Fundamental Operations:-
The select, project and rename operations are called unary operations because they operate on
one relation. The other three operations, union, set difference and Cartesian product, operate on pairs of
relations and are therefore called binary operations.
Select Operation:-
The select operation selects tuples that satisfy a given predicate. We use the lower case Greek
letter sigma (σ) to denote selection. The predicate appears as a subscript to σ. Thus to select those
tuples of the loan relation where the branch is “Berhampur”, we write
σ branch_name=”Berhampur” (loan)
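The equivalent query can be sketched in SQL (assuming a loan table with a branch_name column, as
in the expression above):
Sql> select * from loan where branch_name = 'Berhampur';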
Set difference operation:-
The set difference operation, denoted by −, allows us to find tuples that are in one relation but
not in another. We must ensure that set differences are taken between compatible relations: for a set
difference operation R − S to be valid, we require that the relations R and S be of the same arity
and that the domains of the ith attribute of R and the ith attribute of S be the same for all i.
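In Oracle SQL a set difference can be sketched with the minus operator (table and column names
assumed from the examples in this chapter); for example, the names of customers who have a loan
but no account:
Sql> select name from borrower minus select name from depositor;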
Cartesian product operation:-
The Cartesian product operation denoted by “X” allows us to combine information from any two
relations. We write the Cartesian product of relations R1 and R2 as R1 X R2
A relation is by definition a subset of a Cartesian product of a set of domains.
However, since the same attribute name may appear in both R1 and R2, we need to devise a
naming scheme to distinguish between these attributes.
We do this by attaching to an attribute the name of the relation from which it originally
came. For example, the schema of R = borrower X depositor is
(depositor.name, depositor.accno, borrower.name, borrower.loan_no, borrower.accno)
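In SQL the Cartesian product can be sketched simply by listing both relations in the from clause:
Sql> select * from borrower, depositor;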
Set intersection operation:-
Suppose we want to find the customers who have both a loan and an account. Using the set
intersection operation we write
Π depositor_name(depositor)∩ Π borrower_name(borrower)
We can rewrite any relational algebra expression that uses the set intersection operation with a pair
of set difference operations:
R ∩ S = R − (R − S)
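The same query can be sketched in SQL with the intersect operator (column names assumed):
Sql> select name from depositor intersect select name from borrower;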
Natural Join operation:-
The natural join is a simple operation that allows us to combine certain selections and a
Cartesian product into one operation. It is denoted by the join symbol ⋈. The natural join
operation forms a Cartesian product of its two arguments, performs a selection forcing equality
on those attributes that appear in both relation schemas, and finally removes duplicate
attributes.
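In SQL the natural join can be sketched directly (table names assumed from the earlier examples):
Sql> select * from borrower natural join depositor;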
FUNCTIONAL DEPENDENCIES:-
Functional dependencies play a key role in differentiating good database design from bad
database design.
Functional dependencies are constraints on the set of legal relations. They allow us to express
facts about the enterprise that we are modeling with our database.
The notion of functional dependency generalizes the notion of a super key. Consider a relation
schema R and let α ⊆ R and β ⊆ R. The functional dependency
α → β
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1[α] = t2[α], it is also the case that t1[β] = t2[β].
Using functional dependency notation, we say that K is a super key of R if K → R; that is, K is a
super key if whenever t1[K] = t2[K], it is also the case that
t1[R] = t2[R] (that is, t1 = t2).
Functional dependencies allow us to express constraints that we cannot express through super
keys.
Consider the schema
Loan_info_schema = (loan_number, branch_name, customer_name, amount)
on which the set of functional dependencies
loan_number → amount
loan_number → branch_name
holds. However, we would not expect the functional dependency loan_number → customer_name to
hold, since in general a given loan can be made to more than one customer.
We shall use functional dependencies in two ways.
1. To test relations to see whether they are legal under a given set of functional
dependencies. If a relation R is legal under a set F of functional dependencies we say
that R satisfies F.
2. To specify constraints on the set of legal relations. We shall thus concern ourselves with
only those relations that satisfy a given set of functional dependencies. If we wish to
constrain ourselves to relations on schema R that satisfy a set F of functional
dependencies, we say that F holds on R.
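As a quick illustrative check (a sketch; the table and column names follow Loan_info_schema
above), violations of loan_number → amount can be found with:
Sql> select loan_number from loan_info group by loan_number having count(distinct amount) > 1;
Any loan_number returned is associated with more than one amount, i.e. the dependency does not
hold on that relation.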
ANOMALIES:-
Update anomalies: if data items are scattered and are not linked to each other properly, then
when we try to update a data item that has copies scattered at several places, a few instances
get updated properly while a few are left with their old values. This leaves the database in an
inconsistent state.
Deletion anomalies: we try to delete a record, but parts of it are left undeleted because, without
our being aware of it, the data is also saved somewhere else.
Insert anomalies: we try to insert data into a record that does not exist at all.
IMPORTANCE OF NORMALIZATION:-
Searching, sorting, and creating indexes is faster, since tables are narrower, and more rows fit
on a data page.
You usually have more tables.
You can have more clustered indexes (one per table), so you get more flexibility in tuning
queries.
Index searching is often faster, since indexes tend to be narrower and shorter.
More tables allow better use of segments to control physical placement of data.
You usually have fewer indexes per table, so data modification commands are faster.
Fewer null values and less redundant data, making your database more compact.
Triggers execute more quickly if you are not maintaining redundant data.
Data modification anomalies are reduced.
Normalization is conceptually cleaner and easier to maintain and change as your needs change.
The cost of finding rows already in the data cache is extremely low.
Avoids data modification (INSERT/DELETE/UPDATE) anomalies as each data item lives in one
place.
Fewer null values and less opportunity for inconsistency.
Disadvantages of normalization:-
Normalized data requires more CPU, memory and I/O to process, and thus can give reduced
database performance.
Requires more joins to get the desired result. A poorly-written query can bring the database
down.
Maintenance overhead i.e. the higher the level of normalization, the greater the number of
tables in the database.
In the example of First Normal Form there are two rows for Adam, to include the multiple subjects that
he has opted for. While this is searchable, and follows First Normal Form, it is an inefficient use of
space. Also, in that table, while the candidate key is {Student, Subject}, the Age of a student
depends only on the Student column, which is not allowed by Second Normal Form. To
achieve Second Normal Form, we split the subjects out into an independent table
and match them up using the student names as foreign keys.
In the Student table the candidate key will be the Student column, because all the other columns,
i.e. Age, are dependent on it.
New Subject Table introduced for 2NF will be:
Student Subject
Adam Biology
Adam Math
Alex Math
Stuart Math
In the Subject table the candidate key will be the {Student, Subject} columns. Now both the above tables
qualify for Second Normal Form and will never suffer from update anomalies. Although there are a few
complex cases in which a table in Second Normal Form still suffers update anomalies; to handle those
scenarios Third Normal Form exists.
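A sketch of the 2NF split in SQL (column names follow the discussion above; the sizes and the Age
column type are assumed):
Sql> create table student (student varchar2(30) primary key, age number);
Sql> create table subject (student varchar2(30) references student (student), subject varchar2(30), primary key (student, subject));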
Student_Detail Table :
Student_id Student_name DOB Street City State PIN
In this table Student_id is the primary key, but street, city and state depend upon PIN. The dependency
between PIN and the other fields is called a transitive dependency. Hence, to apply 3NF, we need to
move street, city and state to a new table, with PIN as its primary key.
New Student_Detail Table :
Student_id Student_name DOB PIN
Address Table:
PIN Street City State
In the above normalized tables in 3NF, Student_id is a super-key in the Student_Detail relation and
PIN is a super-key in the Address relation. So every determinant in these relations is a super-key,
and both the tables are also in BCNF.
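A sketch of the 3NF split in SQL (column sizes assumed):
Sql> create table address (pin varchar2(6) primary key, street varchar2(30), city varchar2(30), state varchar2(30));
Sql> create table student_detail (student_id varchar2(10) primary key, student_name varchar2(30), dob date, pin varchar2(6) references address (pin));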
QUERIES IN SQL:-
Table:-
A table is a database object that holds user data. The simplest analogy is to think of a table as a
spreadsheet.
The columns of a table are associated with a specific data type.
Oracle ensures that only data which is identical to the data type of the column will be stored
within the column.
Data type:-
1. Character (size):-
This data type is used to store character strings of fixed length. The size in brackets
determines the number of characters the column can hold. The maximum size is 255 characters.
2. Varchar(size)/ Varchar2(size):-
This data type is used to store variable length alphanumeric data. The maximum size is 2000
characters.
3. Date:-
Date data type is used to represent date and time. The standard format is DD-MMM-YY.
4. Number:-
The number data type is used to store fixed or floating point numbers of any magnitude,
with up to 38 digits of precision. The maximum value is 9.99 × 10^124. The precision
P determines the maximum length of the data, whereas the scale S determines the number of
places to the right of the decimal.
5. Long:-
This data type is used to store variable length character strings of up to 2GB. Long data
can be used to store arrays of binary data in ASCII format.
6. RAW/ LONG RAW:-
The raw data type is used to store binary data such as pictures or images. A raw column
can hold a maximum of 255 bytes, and the maximum size of long raw is up to 2GB.
SQL COMMANDS:-
Command to Create a Table:-
Syntax:-
create table tablename ( colname datatype(size), colname datatype (size), .......... , colname
datatype (size));
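Example:- (a definition consistent with the insert and select examples below; column sizes are assumed)
create table student (rollno varchar2(12), name varchar2(20), address varchar2(20), semester varchar2(10));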
Syntax:-
Insert into table name (col1, col2, col3,………., coln) values (expr1, expr2, expr3,........ , exprn);
Example:
Insert into student values ('f1201207005', 'radhika', 'bbsr', '3rd cse');
Insert into student values ('f1201207002', 'rupak', 'ctc', '5th cse');
Insert into student values ('f1201207003', 'jyoti', 'rkl', '5th it');
Insert into student values ('f1201207008', 'ajay', 'bam', '3rd it');
Insert into student values ('f1201207001', 'manoj', 'khd', '5th cse');
In order to view the partial column data we have the following syntax.
Syntax:-
Sql> Select col1, col2, col5 from table name;
Example:-
Select rollno, semester from student;
Syntax:-
Sql> select distinct * from table name;
Example:-
select distinct * from student;
Base Table:
rollno        name     address  semester
f1201207005   radhika  bbsr     3rd cse
f1201207002   rupak    ctc      5th cse
f1201207003   jyoti    rkl      5th it
f1201207008   ajay     bam      3rd it
f1201207001   manoj    khd      5th cse
f1201207003   jyoti    rkl      5th it

Resulting Table:
rollno        name     address  semester
f1201207005   radhika  bbsr     3rd cse
f1201207002   rupak    ctc      5th cse
f1201207003   jyoti    rkl      5th it
f1201207008   ajay     bam      3rd it
f1201207001   manoj    khd      5th cse
Renaming tables:-
To rename a table the syntax is as follows.
Syntax:-
Sql> rename old table name to new table name;
Example:-
Rename student to diploma_student;
To truncate a table:-
To truncate a table the syntax is as follows.
Syntax:-
Sql> truncate table table name;
Example:-
Truncate table student;
Destroying tables:-
To destroy a table with all its contents the syntax is as follows.
Syntax:-
Sql> drop table table name;
Example:-
Drop table student;
TRANSACTION CONCEPT:-
States of transaction:-
A transaction can be considered an atomic operation by the user, but in reality it goes through a
number of states during its lifetime.
(Figure: the states of a transaction: modify, start to commit, commit, abort and rollback, with
transitions for errors detected by the transaction and aborts initiated by the system)
A transaction can end in three possible ways. It can end after a commit operation (a successful
termination). It can detect an error during its processing and decide to abort itself by performing a
rollback operation (a suicidal termination). Or the DBMS can force it to abort (a murderous termination).
We assume that the database is in a consistent state before a transaction starts. A transaction starts
when its first statement is executed; it then becomes active, and we assume it is in the modify
state when it modifies the database.
At the end of modify state there is a transition into one of the following states: start to commit, abort or
error. If the transaction completes the modification state satisfactorily, it enters the start to commit
state where it instructs the DBMS to reflect the changes made by it into a database.
Once all the changes made by transaction are propagated to the database, the transaction is said to be
in commit state and from there the transaction is terminated, the database is once again in a consistent
state. In the interval between start to commit state and commit state some of the data changed by
transaction in the buffer may or may not have been propagated to the database on the non volatile
storage.
There is a possibility that all the modifications done by the transaction cannot be propagated to the
database, due to conflicts or hardware failures. In this case the system forces the transaction into the
abort state.
The abort state could also be entered from the modify state if there are system errors for example
division by zero.
If the system aborts a transaction, it may have to initiate a rollback to undo partial changes made by the
transaction. An aborted transaction that made no changes to the database is terminated without the
need of the rollback.
A transaction that, on execution of its last statement, enters the start to commit state is guaranteed
to reach the commit state, and the modifications made by it are propagated to the database.
The transaction outcome may be successful (if the transaction goes through the commit state),
suicidal (if the transaction goes through the rollback state) or murdered (if the transaction goes
through the abort state). In the last two cases there is no trace of the transaction left in the database,
and only the log indicates that the transaction was ever run.
Any message given to the user by the transaction must be delayed till the end of the transaction, at
which point the user can be notified of success or failure and, in the latter case, of the reasons for
the failure.
Properties of transaction:-
The database system must maintain the following properties of transactions. These properties,
referred to as the ACID (atomicity, consistency, isolation and durability) properties, represent the
transaction paradigm.
Atomicity:-
The atomicity property of a transaction requires that either all the operations of the transaction are
reflected in the database or none are.
Consistency:-
The consistency property of a transaction implies that if the database was in a consistent state before
the start of a transaction, then on termination of a transaction the database will also be in a consistent
state.
Isolation:-
The isolation property of a transaction indicates that actions performed by a transaction will be isolated
or hidden from outside the transaction until the transaction terminates. This property gives the
transactions a measure of relative independence.
Durability:-
The durability property of a transaction ensures that the commit action of a transaction, on its
termination, will be reflected in the database. The permanence of the commit action of a transaction
requires that any failures after the commit operation will not cause loss of updates made by the
transaction.
SCHEDULES:-
A schedule is a list of actions (reading, writing, aborting, or committing) from a set of
transactions, and the order in which two actions of a transaction T appear in a schedule must be the
same as the order in which they appear in T.
Intuitively, a schedule represents an actual or potential execution sequence. For example, the
schedule in Figure below shows an execution order for actions of two transactions T1 and T2. We move
forward in time as we go down from one row to the next. We emphasize that a schedule describes the
actions of transactions as seen by the DBMS.
In addition to these actions, a transaction may carry out other actions, such as reading or writing
from operating system files, evaluating arithmetic expressions, and so on.
Serial schedule:
Schedule that does not interleave the actions of different transactions is called a serial schedule.
Equivalent schedules:
For any database state, the effect (on the set of objects in the database) of executing the first schedule
is identical to the effect of executing the second schedule and these schedules are known as equivalent
schedules.
Serializable schedule: A schedule that is equivalent to some serial execution of the transactions is
known as serializable schedule.
(Note: If each transaction preserves consistency, every serializable schedule preserves consistency.)
Example of a schedule of three transactions (time increases from left to right):

T1: R (A), W (A)
T2:                W (A)
T3:                         W (A)

(This schedule is equivalent to the serial order T1, T2, T3.)
Recoverable Schedule:
For each pair of transactions Ti and Tj, if Tj reads an object previously written by Ti, then Tj commits
after Ti commits.
Avoids-cascading-abort Schedule:
For each pair of transactions Ti and Tj, if Tj reads an object previously written by Ti, then Ti commits
before the read operation of Tj.
Strict Schedule: An object written by T cannot be read or overwritten until T commits or aborts.
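For illustration (hypothetical schedules in the R/W notation used above): suppose T1 writes A and T2
then reads A. If T2 commits only after T1 commits, the schedule is recoverable; but since T2 read A
before T1 committed, an abort of T1 would still cascade to T2. If instead T2 reads A only after T1 has
committed, the schedule also avoids cascading aborts.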
RECOVERABILITY:-
Crash Recovery
Though we are living in a highly technologically advanced era, where hundreds of satellites monitor
the earth and at every second billions of people are connected through information technology,
failure is expected, but it is not always acceptable.
A DBMS is a highly complex system with hundreds of transactions being executed every second. The
availability of a DBMS depends on its complex architecture and on the underlying hardware and system
software. If it fails or crashes amid transactions being executed, it is expected that the system follows
some sort of algorithm or technique to recover from the crash or failure.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as follows:
Transaction failure
Logical errors: where a transaction cannot complete because it has some code error or some
internal error condition.
System errors: where the database system itself terminates an active transaction because
the DBMS is not able to execute it, or has to stop because of some system condition. For example,
in case of deadlock or resource unavailability, the system aborts an active transaction.
System crash
There are problems external to the system that may cause the system to stop abruptly and crash.
Examples include interruptions in the power supply, and failure of the underlying hardware or
software, such as operating system errors.
Disk failure:
In the early days of technology evolution, it was a common problem that hard disk drives or storage
drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or
any other failure which destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into
various categories:
Volatile storage: As the name suggests, this storage does not survive system crashes and is mostly
placed very close to the CPU, for example by embedding it onto the chipset itself. Examples: main
memory, cache memory. It is fast but can store only a small amount of information.
Non-volatile storage: These memories are made to survive system crashes. They are huge in
data storage capacity but slower in access. Examples include hard disks, magnetic
tapes, flash memory and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes and recovers, the DBMS should do the following:
It should check the states of all the transactions which were being executed.
A transaction may have been in the middle of some operation; the DBMS must ensure the atomicity of
the transaction in this case.
It should check whether the transaction can be completed now or needs to be rolled back.
There are two types of techniques which can help the DBMS in recovering, as well as in maintaining
the atomicity of transactions:
Maintaining the logs of each transaction, and writing them onto some stable storage before
actually modifying the database.
Maintaining shadow paging, where the changes are done on volatile memory and later the
actual database is updated.
Log-Based Recovery
A log is a sequence of records which maintains a record of the actions performed by a transaction. It is
important that the logs are written prior to the actual modification and stored on a stable storage
medium, which is failsafe.
The database can be modified using two approaches:
1. Deferred database modification: All logs are written on to the stable storage, and the database is
updated when the transaction commits.
2. Immediate database modification: Each log is followed by an actual database modification; that is,
the database is modified immediately after every operation.
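For illustration, using the <Tn, ...> log-record notation that appears in the checkpoint discussion
below, and with hypothetical data values: a transaction Tn that deducts 100 from A (balance 1000)
and adds it to B (balance 500) would write a log such as:
<Tn, Start>
<Tn, A, 1000, 900>
<Tn, B, 500, 600>
<Tn, Commit>
Each update record stores the data item with its old and new values, which is what lets the recovery
system undo or redo the operation later.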
When more than one transaction is being executed in parallel, the logs are interleaved. At the time of
recovery, it would become hard for the recovery system to backtrack through all the logs and then start
recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space
available in the system. As time passes, the log file may become too big to be handled at all. Checkpointing
is a mechanism where all the previous logs are removed from the system and stored permanently on the
storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the
transactions were committed.
When a system with concurrent transactions crashes and recovers, it behaves in the following manner:
The recovery system reads the logs backwards from the end to the last Checkpoint.
It maintains two lists, undo-list and redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts
the transaction in redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the
transaction in undo-list.
All transactions in the undo-list are then undone and their logs are removed. For all transactions in the
redo-list, their previous logs are removed, they are redone, and their logs are saved again.

LOCKS:-
A lock is a mechanism by which a transaction acquires the right to use a data item: an exclusive lock
(Lock-X) allows the transaction both to read and to modify the item, while a shared lock (Lock-S) allows
it only to read the item. Using locks, the transaction that transfers an amount of 100 from account A to
account B can be written as:
Lock-X (A); (Exclusive Lock, we want to both read A’s value and modify it)
Read A;
A = A – 100;
Write A;
Unlock (A); (Unlocking A after the modification is done)
Lock-X (B); (Exclusive Lock, we want to both read B’s value and modify it)
Read B;
B = B + 100;
Write B;
Unlock (B); (Unlocking B after the modification is done)
And the transaction that deposits 10% of the amount of account A to account C should now be written as:
Lock-S (A); (Shared Lock, we only want to read A's value)
Read A;
Temp = A * 0.1;
Unlock (A); (Unlocking A after the value is read)
Lock-X (C); (Exclusive Lock, we want to both read C's value and modify it)
Read C;
C = C + Temp;
Write C;
Unlock (C); (Unlocking C after the modification is done)
Now let us see how these locking mechanisms help us to create error free schedules. You
should remember that in the previous chapter we discussed an example of an erroneous schedule:
T1                          T2
Read A;
A = A - 100;
                            Read A;
                            Temp = A * 0.1;
                            Read C;
                            C = C + Temp;
                            Write C;
Write A;
Read B;
B = B + 100;
Write B;
With locks, the same interleaving would have to look like this:

T1                          T2
Lock-X (A)
Read A;
A = A - 100;
                            Lock-S (A)
                            Read A;
                            Temp = A * 0.1;
                            Unlock (A)
                            Lock-X (C)
                            Read C;
                            C = C + Temp;
                            Write C;
                            Unlock (C)
Write A;
Unlock (A)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
We cannot prepare a schedule like the above even if we want to, provided that we use locks in
the transactions. See the first statement of T2, which attempts to acquire a shared lock on A. This would
be impossible because T1 has not released its exclusive lock on A, so T2 just cannot get the shared lock it
wants on A. It must wait until the exclusive lock on A is released by T1, and can begin its execution only
after that. So the proper schedule would look like the following:
T1                          T2
Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                            Lock-S (A)
                            Read A;
                            Temp = A * 0.1;
                            Unlock (A)
                            Lock-X (C)
                            Read C;
                            C = C + Temp;
                            Write C;
                            Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
And this automatically becomes a correct schedule. We need not apply any manual effort
to detect or correct the errors that may creep into the schedule when locks are not used.
Consider the proper schedule once again:

T1                          T2
Lock-X (A)
Read A;
A = A - 100;
Write A;
Unlock (A)
                            Lock-S (A)
                            Read A;
                            Temp = A * 0.1;
                            Unlock (A)
                            Lock-X (C)
                            Read C;
                            C = C + Temp;
                            Write C;
                            Unlock (C)
Lock-X (B)
Read B;
B = B + 100;
Write B;
Unlock (B)
The schedule is theoretically correct, but a very strange kind of problem may arise here. T1
releases the exclusive lock on A, and immediately after that a context switch is made. T2
acquires a shared lock on A to read its value, performs a calculation, updates the content of
account C and then issues COMMIT. However, T1 is not finished yet. What if the remaining
portion of T1 encounters a problem (power failure, disk failure etc.) and cannot be committed?
T1 would then have to be rolled back, but T2 has already read, and committed using, a value of A
written by the uncommitted T1, so the rollback cannot be propagated: the schedule is not recoverable.
TIMESTAMP BASED PROTOCOL:-
In timestamp ordering, every transaction T is given a timestamp TS (T) when it enters the system, and
every data item Q carries a W-timestamp (Q), the timestamp of the latest transaction that wrote Q, and
an R-timestamp (Q), the timestamp of the latest transaction that read Q. When a transaction T issues a
read (Q) operation:
1. If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data item Q which
has already been overwritten by some other transaction. Hence the value which T wanted to
read from Q does not exist there anymore, and T would be rolled back.
2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item Q
which has been written and committed by some other transaction earlier. Hence T will be
allowed to read the value of Q, and the R-timestamp of Q should be updated to TS (T).
When a transaction T issues a write (Q) operation:
1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for transaction
T to write its value, and the delay has become so great that it has allowed another transaction to read
the old value of data item Q. In such a case T has lost its relevance and will be rolled back.
2. Else if TS (T) < W-timestamp (Q), then transaction T has been delayed so much that the system has
allowed another transaction to write into the data item Q. In such a case too, T has lost its relevance
and will be rolled back.
3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS (T).
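For illustration (hypothetical timestamps): suppose TS (T) = 5 and T issues read (Q). If
W-timestamp (Q) = 8, rule 1 for reads applies (5 < 8) and T is rolled back, since a younger
transaction has already overwritten the value T should have seen. If instead W-timestamp (Q) = 3,
T reads Q and R-timestamp (Q) is updated to 5.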
LIVE LOCK:-
A live lock is similar to a deadlock, except that the states of the processes involved in the live
lock constantly change with regard to one another, with none progressing. The term was defined
formally during the 1970s, in Babich's 1979 article on program correctness. Live
lock is a special case of resource starvation; the general definition only states that a specific
process is not progressing.
A real-world example of live lock occurs when two people meet in a narrow corridor, and each
tries to be polite by moving aside to let the other pass, but they end up swaying from side to
side without making any progress because they both repeatedly move the same way at the
same time.
Live lock is a risk with some algorithms that detect and recover from deadlock. If more than one
process takes action, the deadlock detection algorithm can be repeatedly triggered. This can be
avoided by ensuring that only one process (chosen arbitrarily or by priority) takes action.
DEAD LOCK:-
A deadlock is a situation in which two or more competing actions are each waiting for the other
to finish, and thus neither ever does.
In a transactional database, a deadlock happens when two processes, each within its own
transaction, update two rows of information in the opposite order. For example, process A
updates row 1 then row 2, in the exact timeframe during which process B updates row 2 then row 1.
Process A can't finish updating row 2 until process B is finished, and process B cannot finish
updating row 1 until process A finishes. No matter how much time is allowed to pass, this situation
will never resolve itself, and because of this, database management systems will typically kill the
transaction of the process that has done the least amount of work.
Deadlock is a common problem in multiprocessing systems, parallel computing and distributed
systems, where software and hardware locks are used to handle shared resources and
implement process synchronization.
In telecommunication systems, deadlocks occur mainly due to lost or corrupt signals instead of
resource contention.
Example:-
A simple computer-based example is as follows. Suppose a computer has three CD
drives and three processes. Each of the three processes holds one of the drives. If each process
now requests another drive, the three processes will be in a deadlock. Each process will be
waiting for the "CD drive released" event, which can be only caused by one of the other waiting
processes. Thus, it results in a circular chain.
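Using the locking notation from the earlier section, a minimal sketch of a deadlock between two
transactions (the data items A and B are hypothetical):

T1                          T2
Lock-X (A)
                            Lock-X (B)
Lock-X (B)   (T1 waits: B is locked by T2)
                            Lock-X (A)   (T2 waits: A is locked by T1)

Each transaction now waits for a lock held by the other, so neither can ever proceed; the DBMS must
detect the cycle and abort one of them.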
SERIALIZABILITY:-
Serializability is a property of a transaction schedule (history). It relates to the isolation property
of a database transaction.
Serializability of a schedule means equivalence to a serial schedule (i.e., sequential with no
transaction overlap in time) with the same transactions. It is the major criterion for the
correctness of concurrent transactions' schedule, and thus supported in all general purpose
database systems.
The rationale behind serializability is the following:
If each transaction is correct by itself, i.e. meets certain integrity conditions, then a
schedule that comprises any serial execution of these transactions is correct (its transactions still
meet their conditions). "Serial" means that transactions do not overlap in time and cannot
interfere with each other, i.e. complete isolation between them exists. Any order of the
transactions is legitimate, if no dependencies among them exist, which is assumed. As a result,
a schedule that comprises any execution (not necessarily serial) that is equivalent to any serial
execution of these transactions is correct.
Schedules that are not serializable are likely to generate erroneous outcomes. Examples are
with transactions that debit and credit accounts with money: if the related schedules are not
serializable, then the total sum of money may not be preserved.
VIEWS:-
Sometimes, for security and other concerns, it is undesirable to have all users see the entire
relation. It would also be beneficial if we could create useful relations for different groups of
users, rather than have them all manipulate the base relations. Any relation that is not part of
the physical database, i.e. a virtual relation that is made available to the users, is known as a view.
It is possible to create views in SQL. A relation view is virtual since no corresponding physical
relation exists. A view represents a different perspective of a base relation or relations.
The result of a query operation on one or more base relations is a relation. Therefore if a user
needs a particular view based on the base relations, it can be defined using a query expression.
To be useful, we assign the view a name and relate it to the query expression.
Create view <view name> as <query expression>
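Example:- (a sketch using the depositor and account tables from the earlier examples; the join
column names are assumed)
Sql> create view customer_accounts as select depositor.name, account.balance from depositor, account where depositor.accno = account.accountno;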
A view is a relation (virtual rather than base) and can be used in query expressions; that is,
queries can be written using views as if they were relations.
Views generally are not stored, since the data in the base relations may change.
The definition of a view in a create view statement is stored in the system catalog. Having been
defined, it can be used as if the view really represented a real relation. However, such a virtual
relation defined by a view is recomputed whenever a query refers to it.
Views or subschemas are used to enforce security. A user is allowed access to only that portion
of the database defined by the user's view.
A number of users may share a view. However, the users may create new views based on the
views allowed.
The advantage of this approach is that the number of objects accessible to a class of users, and
hence the number of entries for it in the authorization matrix, is reduced to one per view. This
reduces the size of the authorization matrix. The disadvantage is that the entire class of users has
the same access rights.
INTEGRITY CONSTRAINT:-
Integrity constraints ensure that any properly authorized access, alteration, deletion or insertion
of the data in the database does not change the consistency and validity of the data. Database integrity
involves the correctness of data; this correctness has to be preserved in the presence of concurrent
operations, errors in users' operations and application programs, and failures in hardware and
software. Constraints are restrictions or rules applied to a database to maintain its integrity.
They are-
1. Data/Entity integrity constraint
2. Referential integrity constraint
1. Data constraint:-
It is the most common integrity constraint also known as domain integrity constraint.
Domain integrity rules are simply the definition of the domains of the attributes or the value set
for the data items. The value that each attribute or data item can be assigned is expressed in the
form of data type, a range of values or a value from a specified set. Example: In the relation
EMPLOYEE the domain of the attribute Salary may be in the range of 12000 to 300000
The domain values supplied for an operation are validated against the domain
constraint. In specifying the domain constraints, null values may or may not be allowed. Thus it
is usual not to allow null values for any attribute that forms part of a primary key of a relation.
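For instance (a sketch; the EMPLOYEE table and the Salary range are taken from the example above),
such a domain constraint can be declared as:
Sql> alter table employee add constraint salary_range check (salary between 12000 and 300000);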
2. Referential integrity constraint:-
It is an implicit integrity constraint which states that, a tuple in one relation that refers
to another relation must refer to an existing tuple in that relation. The referential integrity rule
is explained by the foreign key between two relation schemas R1 and R2.
A set of attributes Fk in relation schema R1 is a foreign key of R1 that references relation
R2 if the following two rules are satisfied:
i) The attributes in the foreign key Fk have the same domain as the primary key attributes Pk of
R2; Fk is said to reference or refer to the relation R2.
ii) A value of Fk in a tuple T1 of the current state r1 of R1 either occurs as a value of Pk for
some tuple T2 in the current state r2 of R2, or is null.
In the former case we have T1[Fk] = T2[Pk], and we say that the tuple T1 references or
refers to the tuple T2. R1 is called the referencing relation and R2 is called the referenced relation.
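A sketch of the rule in SQL, using the account and depositor tables from the earlier examples
(column names and sizes assumed):
Sql> create table account (accountno varchar2(10) primary key, balance number);
Sql> create table depositor (name varchar2(30), accno varchar2(10) references account (accountno));
An insert into depositor with an accno value that does not occur in account would now be rejected,
which is exactly the referential integrity rule.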