CSE CSPC403 DBMS
Consider a transaction that transfers an amount of 800 from account X to account Y. This small
transaction contains several low-level tasks:
X's Account
Open_Account(X)
Old_Balance = X.balance
New_Balance = Old_Balance - 800
X.balance = New_Balance
Close_Account(X)
Y's Account
Open_Account(Y)
Old_Balance = Y.balance
New_Balance = Old_Balance + 800
Y.balance = New_Balance
Close_Account(Y)
Operations of a Transaction:
Following are the main operations of a transaction:
Read(X): The read operation reads the value of X from the database and stores it in a buffer in main
memory.
Write(X): The write operation writes the value from the buffer back to the database.
Let's take the example of a debit transaction on an account, which consists of the following operations:
R(X);
X = X - 500;
W(X);
Let's assume the value of X before the start of the transaction is 4000.
• The first operation reads X's value from the database and stores it in a buffer.
• The second operation decreases the value of X by 500, so the buffer will contain 3500.
• The third operation will write the buffer's value to the database. So X's final value will be 3500.
But it is possible that, because of a hardware, software or power failure, the transaction fails before
finishing all the operations in the set.
For example: If the above debit transaction fails after executing operation 2, then X's value will remain
4000 in the database, which is not acceptable to the bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
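As a rough illustration (not part of any real DBMS interface), the read, write, commit and rollback behaviour of the debit example above can be sketched in Python, using an in-memory dictionary as the "database" and a separate buffer for uncommitted work; all names here are made up for the sketch:

database = {"X": 4000}   # the persistent store (simulated)
buffer = {}              # main-memory buffer holding uncommitted values

def read(item):
    # Read(X): copy the value from the database into the buffer
    buffer[item] = database[item]

def write(item, value):
    # Write(X): update only the buffer; the database is untouched so far
    buffer[item] = value

def commit():
    # Commit: save the buffered work permanently
    database.update(buffer)
    buffer.clear()

def rollback():
    # Rollback: undo the work by discarding the buffer
    buffer.clear()

# The debit transaction: R(X); X = X - 500; W(X); then commit
read("X")
write("X", buffer["X"] - 500)
commit()
print(database["X"])     # 3500; calling rollback() instead of commit() would have left 4000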
States of Transaction
In a database, the transaction can be in one of the following states -
Active state
• The active state is the first state of every transaction. In this state, the transaction is being executed.
• For example: insertion, deletion or updating of a record is done here, but the changes are not yet
saved to the database.
Partially committed
• In the partially committed state, a transaction executes its final operation, but the data is still not
saved to the database.
• In the total mark calculation example, the final step of displaying the total marks is executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all
the effects are now permanently saved on the database system.
Failed state
• If any of the checks made by the database recovery system fails, then the transaction is said to be in
the failed state.
• In the example of total mark calculation, if the database is not able to fire a query to fetch the marks,
then the transaction will fail to execute.
Aborted
• If any of the checks fail and the transaction has reached a failed state then the database recovery
system will make sure that the database is in its previous consistent state. If not then it will abort or
roll back the transaction to bring the database into a consistent state.
• If the transaction fails in the middle of its execution, all the operations it has already executed are
rolled back to restore the previous consistent state.
• After aborting the transaction, the database recovery module will select one of the two operations:
1. Re-start the transaction
2. Kill the transaction
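The state transitions described above can be summarised in a small sketch; the transition table below is only a teaching aid, not part of any DBMS interface:

# Allowed transitions between transaction states (simplified sketch)
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "committed":           set(),   # terminal state
    "aborted":             set(),   # terminal: the system then re-starts or kills the transaction
}

def move(state, next_state):
    # Reject any transition that the state diagram does not allow
    if next_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

s = "active"
s = move(s, "partially committed")
s = move(s, "committed")
print(s)   # committed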
Implementation of Atomicity and Durability
The recovery-management component of a database system can support atomicity and durability by a variety
of schemes. We first consider a simple, but extremely inefficient, scheme called the shadow copy scheme.
This scheme, which is based on making copies of the database, called shadow copies, assumes that only one
transaction is active at a time. The scheme also assumes that the database is simply a file on disk. A pointer
called db-pointer is maintained on disk; it points to the current copy of the database.
In the shadow-copy scheme, a transaction that wants to update the database first creates a complete copy of
the database. All updates are done on the new database copy, leaving the original copy, the shadow copy,
untouched. If at any point the transaction has to be aborted, the system merely deletes the new copy. The old
copy of the database has not been affected.
If the transaction completes, it is committed as follows. First, the operating system is asked to make sure that
all pages of the new copy of the database have been written out to disk. (Unix systems use the fsync system
call for this purpose.) After the operating system has written all the pages to disk, the database system updates the
pointer db-pointer to point to the new copy of the database; the new copy then becomes the current copy of
the database. The old copy of the database is then deleted. The below Figure depicts the scheme, showing the
database state before and after the update.
Furthermore, the implementation does not allow transactions to execute concurrently with one another.
There are practical ways of implementing atomicity and durability that are much less expensive and more
powerful.
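A minimal sketch of the shadow-copy idea, assuming the database is a single file and using a small pointer file in place of db-pointer; the file names and the transform callback are illustrative, not a real recovery manager:

import os, shutil

DB_POINTER = "db-pointer"             # names the current copy of the database

def current_copy():
    with open(DB_POINTER) as f:
        return f.read().strip()

def run_transaction(transform):
    shadow = current_copy()           # the old copy becomes the shadow copy
    new_copy = shadow + ".new"        # a real system would pick a fresh name each time
    shutil.copyfile(shadow, new_copy) # all updates go to the new copy only
    transform(new_copy)               # apply the transaction's updates to the new copy
    with open(new_copy, "r+b") as f:  # make sure all pages of the new copy are on disk
        os.fsync(f.fileno())
    # Commit point: switch db-pointer to the new copy (treated as atomic, as in the scheme)
    with open(DB_POINTER, "w") as f:
        f.write(new_copy)
        f.flush()
        os.fsync(f.fileno())
    os.remove(shadow)                 # the old copy is then deleted

# Aborting a transaction is trivial: delete new_copy and leave db-pointer untouched.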
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A transaction in a
database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as
ACID properties − in order to ensure accuracy, completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all
of its operations are executed or none. There must be no state in the database where a transaction is left
partially completed; the database state is defined either before the execution of the transaction or after its
execution, abortion or failure, never in between.
• Consistency − The database must remain in a consistent state after any transaction. No transaction
should have any adverse effect on the data residing in the database. If the database was in a consistent
state before the execution of a transaction, it must remain consistent after the execution of the
transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if the system
fails or restarts. If a transaction updates a chunk of data in a database and commits, then the database
will hold the modified data. If a transaction commits but the system fails before the data could be
written to the disk, then that data will be written once the system comes back into action.
• Isolation − In a database system where more than one transaction is being executed simultaneously and in
parallel, the property of isolation states that every transaction will be carried out and executed as if it were the
only transaction in the system. No transaction will affect the existence of any other transaction.
Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently. Allowing multiple
transactions to update data concurrently causes several complications with consistency of the data.
Ensuring consistency in spite of concurrent execution of transactions requires extra work; it is far easier to
insist that transactions run serially—that is, one at a time, each starting only after the previous one has
completed.
However, there are two good reasons for allowing concurrency:
Improved throughput and resource utilization:
• A transaction consists of many steps. Some involve I/O activity; others involve CPU activity. The CPU
and the disks in a computer system can operate in parallel. Therefore, I/O activity can be done in parallel
with processing at the CPU.
• The parallelism of the CPU and the I/O system can therefore be exploited to run multiple transactions in
parallel.
• While a read or write on behalf of one transaction is in progress on one disk, another transaction can be
running in the CPU, while another disk may be executing a read or write on behalf of a third transaction.
• All of this increases the throughput of the system—that is, the number of transactions executed in a given
amount of time.
• Correspondingly, the processor and disk utilization also increase; in other words, the processor and disk
spend less time idle, or not performing any useful work.
Reduced waiting time:
• There may be a mix of transactions running on a system, some short and some long.
• If transactions run serially, a short transaction may have to wait for a preceding long transaction to
complete, which can lead to unpredictable delays in running a transaction.
• If the transactions are operating on different parts of the database, it is better to let them run concurrently,
sharing the CPU cycles and disk accesses among them.
• Concurrent execution reduces the unpredictable delays in running transactions.
• Moreover, it also reduces the average response time: the average time for a transaction to be completed
after it has been submitted.
The idea behind using concurrent execution in a database is essentially the same as the idea behind using
multi programming in an operating system.
The database system must control the interaction among the concurrent transactions to prevent them from
destroying the consistency of the database. It is achieved using concurrency-control schemes.
Serializability
Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent
transactions is equivalent to one that executes the transactions serially in some order. It assumes that all
accesses to the database are done using read and write operations. A schedule is called "correct" if we can
find a serial schedule that is "equivalent" to it. Given a set of transactions T1...Tn, two schedules S1 and S2
of these transactions are equivalent if the following conditions are satisfied:
Read-Write Synchronization: If a transaction reads a value written by another transaction in one schedule, then
it also does so in the other schedule.
Write-Write Synchronization: If a transaction overwrites the value of another transaction in one schedule, it
also does so in the other schedule.
When multiple transactions are being executed by the operating system in a multiprogramming environment,
there are possibilities that the instructions of one transaction are interleaved with those of some other transaction.
• Schedule − A chronological execution sequence of transactions is called a schedule. A schedule can
have many transactions in it, each comprising a number of instructions/tasks.
• Serial Schedule − It is a schedule in which transactions are aligned in such a way that one transaction
is executed first. When the first transaction completes its cycle, then the next transaction is executed.
Transactions are ordered one after the other. This type of schedule is called a serial schedule, as
transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered a benchmark. The execution sequence
of the instructions within a transaction cannot be changed, but two transactions can have their instructions executed
in a random fashion. This execution does no harm if two transactions are mutually independent and working
on different segments of data; but in case these two transactions are working on the same data, then the results
may vary. This ever-varying result may bring the database to an inconsistent state.
To resolve this problem, we allow parallel execution of a transaction schedule, if its transactions are either
serializable or have some equivalence relation among them.
Equivalence Schedules
Equivalence between schedules can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they are said to be result equivalent. They may yield
the same result for some value and different results for another set of values. That's why this equivalence is
not generally considered significant.
View Equivalence
Two schedules are said to be view equivalent if the transactions in both the schedules perform similar actions
in a similar manner.
For example −
• If T reads the initial data in S1, then it also reads the initial data in S2.
• If T reads the value written by J in S1, then it also reads the value written by J in S2.
• If T performs the final write on the data value in S1, then it also performs the final write on the data
value in S2.
Conflict Equivalence
Two operations are said to be conflicting if they have the following properties −
• They belong to different transactions.
• They access the same data item.
• At least one of them is a "write" operation.
Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if
and only if −
• Both the schedules contain the same set of transactions.
• The order of every pair of conflicting operations is the same in both the schedules.
Note − View equivalent schedules are view serializable and conflict equivalent schedules are conflict
serializable. All conflict serializable schedules are view serializable too.
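A small sketch of the conflict test, assuming each operation is encoded as a (transaction, action, data item) triple with action "R" or "W"; this is just one convenient encoding, not a fixed notation:

def conflicts(op1, op2):
    # Two operations conflict if they belong to different transactions,
    # access the same data item, and at least one of them is a write.
    t1, a1, q1 = op1
    t2, a2, q2 = op2
    return t1 != t2 and q1 == q2 and "W" in (a1, a2)

def conflicting_pairs(schedule):
    # Ordered pairs of conflicting operations, in schedule order
    return [(schedule[i], schedule[j])
            for i in range(len(schedule))
            for j in range(i + 1, len(schedule))
            if conflicts(schedule[i], schedule[j])]

# Two schedules over the same transactions are conflict equivalent when they
# produce the same set of ordered conflicting pairs.
S1 = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(conflicting_pairs(S1))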
Implementation of Isolation
In order to maintain consistency, a database follows the ACID properties. Among these four properties
(Atomicity, Consistency, Isolation and Durability) Isolation determines how transaction integrity is visible to
other users and systems. It means that a transaction should take place in a system in such a way that it is the
only transaction that is accessing the resources in a database system.
Isolation levels define the degree to which a transaction must be isolated from the data modifications made by
any other transaction in the database system. A transaction isolation level is defined by the following
phenomena –
• Dirty Read – A dirty read is the situation when a transaction reads data that has not yet been
committed. For example, let's say Transaction 1 updates a row and leaves it uncommitted; meanwhile,
Transaction 2 reads the updated row. If Transaction 1 rolls back the change, Transaction 2 will have read
data that is considered never to have existed.
• Non-Repeatable Read – A non-repeatable read occurs when a transaction reads the same row twice and gets
a different value each time. For example, suppose transaction T1 reads some data. Due to concurrency, another
transaction T2 updates the same data and commits. Now if transaction T1 rereads the same data, it will
retrieve a different value.
• Phantom Read – A phantom read occurs when the same query is executed twice, but the rows retrieved by
the two executions are different. For example, suppose transaction T1 retrieves a set of rows that satisfy some
search criteria. Now, transaction T2 inserts some new rows that match the search criteria for
transaction T1. If transaction T1 re-executes the statement that reads the rows, it gets a different set of
rows this time.
Based on these phenomena, the SQL standard defines four isolation levels:
1. Read Uncommitted – Read Uncommitted is the lowest isolation level. At this level, one transaction
may read not-yet-committed changes made by other transactions, thereby allowing dirty reads. At
this level, transactions are not isolated from each other.
2. Read Committed – This isolation level guarantees that any data read is committed at the moment
it is read. Thus it does not allow dirty reads. The transaction holds a read or write lock on the
current row, and thus prevents other transactions from reading, updating or deleting it.
3. Repeatable Read – This is a more restrictive isolation level. The transaction holds read locks on
all rows it references and write locks on all rows it inserts, updates or deletes. Since other
transactions cannot read, update or delete these rows, it avoids non-repeatable reads.
4. Serializable – This is the highest isolation level. A serializable execution is defined to be an
execution of operations in which concurrently executing transactions appear to be executing
serially.
The table below depicts the relationship between isolation levels, read phenomena and locks:
Isolation level      Dirty reads   Non-repeatable reads   Phantoms
Read Uncommitted     May Occur     May Occur              May Occur
Read Committed       Don't Occur   May Occur              May Occur
Repeatable Read      Don't Occur   Don't Occur            May Occur
Serializable         Don't Occur   Don't Occur            Don't Occur
Note that anomaly serializable is not the same as serializable: being free of all three phenomena is
necessary, but not sufficient, for a schedule to be serializable.
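The table above can also be written down as a small lookup, shown here only as a way of re-reading the table programmatically:

# Which read phenomena may occur at each SQL isolation level
PHENOMENA = {
    "READ UNCOMMITTED": {"dirty read": True,  "non-repeatable read": True,  "phantom read": True},
    "READ COMMITTED":   {"dirty read": False, "non-repeatable read": True,  "phantom read": True},
    "REPEATABLE READ":  {"dirty read": False, "non-repeatable read": False, "phantom read": True},
    "SERIALIZABLE":     {"dirty read": False, "non-repeatable read": False, "phantom read": False},
}

def may_occur(level, phenomenon):
    return PHENOMENA[level][phenomenon]

print(may_occur("READ COMMITTED", "dirty read"))      # False
print(may_occur("REPEATABLE READ", "phantom read"))   # True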
Testing of Serializability
Serialization Graph is used to test the Serializability of a schedule.
Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a pair G = (V,
E), where V is a set of vertices and E is a set of edges. The set of vertices contains all the transactions
participating in the schedule. The set of edges contains all edges Ti → Tj for which one of the three
conditions holds:
1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).
• If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are executed before
the first instruction of Tj is executed.
• If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence
graph has no cycle, then S is known as serializable.
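A minimal sketch of this test, assuming a schedule is given as an ordered list of (transaction, action, data item) triples; the example schedule at the end matches the operations of schedule S1 walked through below:

def precedence_graph(schedule):
    # Add an edge Ti -> Tj whenever an operation of Ti conflicts with a later
    # operation of Tj (write-read, read-write or write-write on the same item)
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "W" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    # Depth-first search for a cycle in the precedence graph
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
    visiting, done = set(), set()
    def dfs(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(n) for n in graph if n not in done)

S1 = [("T1", "R", "A"), ("T2", "R", "B"), ("T3", "R", "C"),
      ("T2", "W", "B"), ("T3", "W", "C"), ("T1", "W", "A"),
      ("T2", "W", "A"), ("T1", "W", "C"), ("T3", "W", "B")]
edges = precedence_graph(S1)
print(edges)              # {('T1', 'T2'), ('T2', 'T3'), ('T3', 'T1')}
print(has_cycle(edges))   # True, so S1 is not conflict serializable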
Explanation (for a schedule S1 over transactions T1, T2 and T3, whose operations are processed in the order listed below):
Read(A): In T1, no subsequent writes to A, so no new edges.
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges
Precedence graph for schedule S1: the edges T1 → T2, T2 → T3 and T3 → T1 form a cycle, so schedule S1 is non-serializable.
Explanation (for a schedule S2 over transactions T4, T5 and T6, whose operations are processed in the order listed below):
Read(A): In T4, no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges
Precedence graph for schedule S2: the edges T4 → T5, T4 → T6 and T5 → T6 contain no cycle, so schedule S2 is serializable.
Concurrency control
• Under concurrency control, multiple transactions can be executed simultaneously.
• Concurrent execution may affect the transaction results, so it is important to control the order of execution of those
transactions.
Problems of concurrency control
Several problems can occur when concurrent transactions are executed in an uncontrolled manner. Following
are the three problems in concurrency control.
1. Lost updates
2. Dirty read
3. Unrepeatable read
1. Lost update problem
When two transactions that access the same database items interleave their operations in a way that makes the
value of some database item incorrect, the lost update problem occurs.
If two transactions T1 and T2 read a record and then update it, the effect of the first update
will be overwritten by the second update.
Example:
Transaction-X Time Transaction-Y
- t1 -
Read A t2 -
- t3 Read A
Update A t4 -
- t5 Update A
- t6 -
Here,
• At time t2, transaction-X reads A's value.
• At time t3, Transaction-Y reads A's value.
• At time t4, Transaction-X writes A's value on the basis of the value seen at time t2.
• At time t5, Transaction-Y writes A's value on the basis of the value seen at time t3.
• So at time t5, the update of Transaction-X is lost because Transaction-Y overwrites it without looking at
its current value.
• Such a problem is known as the Lost Update Problem, as the update made by one transaction is lost.
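The interleaving in the table above can be replayed deterministically in a few lines; the starting balance and the two updates below are made up purely for illustration:

A = 100          # shared database item

x_local = A      # t2: Transaction-X reads A into its local buffer
y_local = A      # t3: Transaction-Y reads A into its local buffer
A = x_local + 10 # t4: Transaction-X updates A using the value it read at t2
A = y_local + 20 # t5: Transaction-Y updates A using the stale value it read at t3

print(A)         # 120, not 130: Transaction-X's update has been lost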
2. Dirty Read
• The dirty read occurs in the case when one transaction updates an item of the database, and then the
transaction fails for some reason. The updated database item is accessed by another transaction before it
is changed back to the original value.
• A transaction T1 updates a record which is read by T2. If T1 aborts then T2 now has values which have
never formed part of the stable database.
Example:
Transaction-X Time Transaction-Y
- t1 -
- t2 Update A
Read A t3 -
- t4 Rollback
- t5 -
• At time t2, transaction-Y writes A's value.
• At time t3, Transaction-X reads A's value.
• At time t4, Transaction-Y rolls back, so A's value is changed back to what it was before t2.
• So, Transaction-X now holds a value which has never become part of the stable database.
• Such a problem is known as the Dirty Read Problem, as one transaction reads a dirty value which has
not been committed.
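The dirty read can be replayed in the same style; the uncommitted value lives only in Transaction-Y's buffer, which Transaction-X should never have been allowed to see (the numbers are illustrative):

committed_A = 100                  # value of A in the stable database

uncommitted_A = committed_A + 50   # t2: Transaction-Y updates A but has not committed
x_sees = uncommitted_A             # t3: Transaction-X reads the uncommitted (dirty) value
uncommitted_A = committed_A        # t4: Transaction-Y rolls back, discarding the update

print(x_sees, committed_A)         # 150 100: X holds a value that never existed in the stable database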
3. Inconsistent Retrievals Problem
• Inconsistent Retrievals Problem is also known as unrepeatable read. When a transaction calculates some
summary function over a set of data while the other transactions are updating the data, then the
Inconsistent Retrievals Problem occurs.
• A transaction T1 reads a record and then does some other processing, during which transaction T2
updates the record. When transaction T1 reads the record again, the new value will be
inconsistent with the previous value.
Example:
Suppose two transactions operate on three accounts: Account-1, Account-2 and Account-3.
• Transaction-X is computing the sum of all balances while Transaction-Y is transferring an amount of 50 from
Account-1 to Account-3.
• Here, Transaction-X produces a result of 550, which is incorrect. If we write this result to the
database, the database will be in an inconsistent state, because the actual sum is 600.
• Here, Transaction-X has seen an inconsistent state of the database.
Concurrency Control Protocol
Concurrency control protocols ensure atomicity, isolation, and serializability of concurrent transactions. The
concurrency control protocol can be divided into three categories:
1. Lock based protocol
2. Time-stamp protocol
3. Validation based protocol
Lock Based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read
or write data until it acquires an appropriate lock on it. Locks are of two kinds −
• Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on their uses. If a
lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more
than one transaction to write on the same data item would lead the database into an inconsistent state.
Read locks are shared because no data value is being changed.
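A sketch of the shared/exclusive idea: a lock request is granted only if it is compatible with the locks other transactions already hold. The mode names "S" and "X" are the usual convention, not a specific product's API:

held_locks = {}   # data item -> list of (transaction, mode) currently held

def can_grant(item, txn, mode):
    # Shared locks are compatible with other shared locks;
    # an exclusive lock conflicts with any lock held by another transaction.
    for holder, held_mode in held_locks.get(item, []):
        if holder != txn and (mode == "X" or held_mode == "X"):
            return False
    return True

def lock(item, txn, mode):
    if not can_grant(item, txn, mode):
        return False                      # in a real system the transaction would wait
    held_locks.setdefault(item, []).append((txn, mode))
    return True

print(lock("A", "T1", "S"))   # True  - first shared lock on A
print(lock("A", "T2", "S"))   # True  - shared locks can coexist
print(lock("A", "T3", "X"))   # False - exclusive lock conflicts with the shared locks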
There are four types of lock protocols available −
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation
is performed. Transactions may unlock the data item after completing the ‘write’ operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks.
Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all
the locks are granted, the transaction executes and releases all the locks when all its operations are over. If
all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
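With the same kind of lock table, the pre-claiming idea reduces to "request every lock up front and run only if all of them are granted"; the sketch below illustrates that rule only, not a complete protocol:

locked = set()   # data items currently locked by some transaction

def preclaim_and_run(needed_items, body):
    # Request all the locks the transaction needs before it starts executing
    if any(item in locked for item in needed_items):
        return False                 # not all locks granted: roll back and wait to retry
    locked.update(needed_items)      # all locks granted up front
    try:
        body()                       # execute the transaction's operations
    finally:
        locked.difference_update(needed_items)   # release all locks when done
    return True

# Example: a transfer needs locks on both X and Y before it may start
print(preclaim_and_run({"X", "Y"}, lambda: print("transfer executed")))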