ADBMS Lec5
Sunil Paudel
[email protected]
Transaction Management
Transaction Concept
A transaction is a unit of program execution that
accesses and possibly updates various data items.
A transaction is any one execution of a user program
in a DBMS
A transaction is a series of reads and writes of
database objects
Transaction states
Active transaction
Partially committed transaction
Committed transaction
Failed transaction
Aborted transaction
Example of Fund Transfer
Transaction to transfer $50 from account A to account B:
1. read(A)
2. A := A - 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)
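The six steps above can be sketched in Python as follows. This is only an illustration: the dictionary `accounts` stands in for the database, and the starting balances are assumed values, not taken from the slides.

```python
def transfer(accounts, src, dst, amount):
    """Transfer `amount` from account `src` to account `dst`."""
    a = accounts[src]      # 1. read(A)
    a = a - amount         # 2. A := A - 50
    accounts[src] = a      # 3. write(A)
    b = accounts[dst]      # 4. read(B)
    b = b + amount         # 5. B := B + 50
    accounts[dst] = b      # 6. write(B)


accounts = {"A": 1000, "B": 2000}   # assumed starting balances
total_before = sum(accounts.values())
transfer(accounts, "A", "B", 50)
total_after = sum(accounts.values())
```

The sum A + B is the same before and after the transfer. If execution stopped between steps 3 and 6, the sum would be wrong, which is why the six steps must run as one atomic unit.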
Implementation of Atomicity and
Durability
The recovery-management component of a database system
implements the support for atomicity and durability.
E.g. the shadow-database scheme:
all updates are made on a shadow copy of the database
• db_pointer is made to point to the updated shadow copy after
– the transaction reaches partial commit and
– all updated pages have been flushed to disk.
Implementation of Atomicity and
Durability (Cont.)
db_pointer always points to the current
consistent copy of the database.
If the transaction fails, the old consistent copy pointed
to by db_pointer can be used, and the shadow copy
can be deleted.
The shadow-database scheme:
Assumes that only one transaction is active at a
time.
Assumes disks do not fail
Does not handle concurrent transactions
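A minimal in-memory sketch of the shadow-database idea, assuming a single active transaction. The class name `ShadowDatabase` and the two sample transactions are hypothetical. A real system would flush the updated pages to disk before switching the pointer; here that step is only a comment.

```python
import copy


class ShadowDatabase:
    def __init__(self, initial):
        # db_pointer always refers to the current consistent copy.
        self.db_pointer = dict(initial)

    def run(self, transaction):
        # All updates are made on a shadow copy, never on the current copy.
        shadow = copy.deepcopy(self.db_pointer)
        try:
            transaction(shadow)
        except Exception:
            # Failure: the shadow copy is dropped, the old copy stays current.
            return False
        # (A real system would flush all updated pages to disk here.)
        self.db_pointer = shadow   # switch the pointer to the new copy
        return True


def good_transfer(db):
    db["A"] -= 50
    db["B"] += 50


def failing_transfer(db):
    db["A"] -= 50
    raise RuntimeError("failure between write(A) and write(B)")


db = ShadowDatabase({"A": 1000, "B": 2000})
ok = db.run(good_transfer)
failed = db.run(failing_transfer)
```

After the failed run, db_pointer still refers to the state left by the last successful transaction, so the partial update to A is never visible.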
Concurrent Executions
Multiple transactions are allowed to run
concurrently in the system.
Advantages are:
increased processor and disk utilization, leading to
better transaction throughput
• E.g. one transaction can be using the CPU while another is
reading from or writing to the disk
reduced average response time for transactions: short
transactions need not wait behind long ones.
Concurrency control schemes – mechanisms to
achieve isolation
that is, to control the interaction among the concurrent
transactions in order to prevent them from destroying the
consistency of the database
Schedules
A sequence of instructions that specifies the chronological
order in which instructions of concurrent transactions are
executed
a schedule for a set of transactions must consist of all instructions
of those transactions
must preserve the order in which the instructions appear in each
individual transaction.
A transaction that successfully completes its execution will
have a commit instruction as the last statement
by default, a transaction is assumed to execute a commit instruction as
its last step
A transaction that fails to successfully complete its
execution will have an abort instruction as the last
statement
Schedule 1
Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to
B.
A serial schedule in which T1 is followed by T2:
Schedule 2
• A serial schedule where T2 is followed by T1
Schedule 3
Let T1 and T2 be the transactions defined previously. The following
schedule is not a serial schedule, but it is equivalent to Schedule 1.
Conflicting Instructions
Instructions li and lj of transactions Ti and Tj
respectively, conflict if and only if there exists some
item Q accessed by both li and lj, and at least one of
these instructions wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict.
4. li = write(Q), lj = write(Q). They conflict.
Intuitively, a conflict between li and lj forces a
(logical) temporal order between them.
If li and lj are consecutive in a schedule and they do not
conflict, their results would remain the same even if they
had been interchanged in the schedule.
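The conflict rule can be written as a small predicate. This is a sketch: the `Op` tuple and its field names are assumptions made for the example.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str      # transaction name, e.g. "Ti"
    action: str   # "read" or "write"
    item: str     # data item, e.g. "Q"


def conflicts(a: Op, b: Op) -> bool:
    """Two instructions of different transactions conflict if they access
    the same item and at least one of them is a write."""
    return (a.txn != b.txn
            and a.item == b.item
            and "write" in (a.action, b.action))


cases = [
    conflicts(Op("Ti", "read", "Q"), Op("Tj", "read", "Q")),    # case 1
    conflicts(Op("Ti", "read", "Q"), Op("Tj", "write", "Q")),   # case 2
    conflicts(Op("Ti", "write", "Q"), Op("Tj", "read", "Q")),   # case 3
    conflicts(Op("Ti", "write", "Q"), Op("Tj", "write", "Q")),  # case 4
]
```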
Conflict Serializability
If a schedule S can be transformed into a
schedule S´ by a series of swaps of non-
conflicting instructions, we say that S and S´
are conflict equivalent.
We say that a schedule S is conflict
serializable if it is conflict equivalent to a
serial schedule
Conflict Serializability (Cont.)
Schedule 3 can be transformed into Schedule 6, a serial
schedule where T2 follows T1, by a series of swaps of
non-conflicting instructions.
Therefore Schedule 3 is conflict serializable.
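One common way to test conflict serializability, not described on these slides, is a precedence graph: add an edge Ti -> Tj whenever an instruction of Ti conflicts with a later instruction of Tj, then check that the graph has no cycle. The sketch below assumes a schedule is a list of (transaction, action, item) tuples; both example schedules are hypothetical and are not the schedules from the figures.

```python
def precedence_edges(schedule):
    """Edge (Ti, Tj) for every pair of conflicting instructions where
    Ti's instruction comes first in the schedule."""
    edges = set()
    for i, (ti, ai, qi) in enumerate(schedule):
        for tj, aj, qj in schedule[i + 1:]:
            if ti != tj and qi == qj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph is acyclic."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining transaction.
        free = {n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)}
        if not free:
            return False   # every remaining node is on a cycle
        nodes -= free
    return True


# Hypothetical interleaving: T1 and T2 each update A, then each update B.
interleaved = [
    ("T1", "r", "A"), ("T1", "w", "A"),
    ("T2", "r", "A"), ("T2", "w", "A"),
    ("T1", "r", "B"), ("T1", "w", "B"),
    ("T2", "r", "B"), ("T2", "w", "B"),
]
# Hypothetical schedule with a cycle: T1 reads A, T2 writes A, T1 writes A.
cyclic = [("T1", "r", "A"), ("T2", "w", "A"), ("T1", "w", "A")]
```

For `interleaved` the only edge is T1 -> T2, so it is conflict equivalent to the serial schedule in which T2 follows T1. For `cyclic` the graph contains both T1 -> T2 and T2 -> T1, so no serial order exists.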
[Figure: Schedule 3 and Schedule 6]
View Serializability
Let S and S´ be two schedules with the same set of
transactions. S and S´ are view equivalent if the
following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then
in schedule S’ also transaction Ti must read the initial value of
Q.
2. If in schedule S transaction Ti executes read(Q), and that value
was produced by transaction Tj (if any), then in schedule S’
also transaction Ti must read the value of Q that was produced
by the same write(Q) operation of transaction Tj .
3. The transaction (if any) that performs the final write(Q)
operation in schedule S must also perform the final write(Q)
operation in schedule S’.
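The three conditions can be checked mechanically. The sketch below assumes a schedule is a list of (transaction, action, item) tuples. It records, for every read, which write produced the value (None stands for the initial value), plus the final writer of each item. The transaction names and both schedules are hypothetical.

```python
def view_profile(schedule):
    """Return (reads_from, final_writer) for a schedule."""
    last_write = {}     # item -> (txn, k): the k-th write of item by txn
    write_count = {}
    read_count = {}
    reads_from = set()
    for txn, action, item in schedule:
        if action == "r":
            n = read_count.get((txn, item), 0) + 1
            read_count[(txn, item)] = n
            # Conditions 1 and 2: which write (or the initial value) is read.
            reads_from.add((txn, item, n, last_write.get(item)))
        else:
            k = write_count.get((txn, item), 0) + 1
            write_count[(txn, item)] = k
            last_write[item] = (txn, k)
    # Condition 3: the transaction performing the final write of each item.
    final_writer = {item: w[0] for item, w in last_write.items()}
    return reads_from, final_writer


def view_equivalent(s1, s2):
    same_operations = sorted(s1) == sorted(s2)
    return same_operations and view_profile(s1) == view_profile(s2)


# Hypothetical schedule in which T2 and T3 write Q without reading it.
s = [("T1", "r", "Q"), ("T2", "w", "Q"), ("T1", "w", "Q"), ("T3", "w", "Q")]
serial_123 = [("T1", "r", "Q"), ("T1", "w", "Q"), ("T2", "w", "Q"), ("T3", "w", "Q")]
serial_213 = [("T2", "w", "Q"), ("T1", "r", "Q"), ("T1", "w", "Q"), ("T3", "w", "Q")]
```

Here `s` is view equivalent to the serial order T1, T2, T3 but not to T2, T1, T3, because in the latter T1 would read the value written by T2 instead of the initial value.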
View Serializability (Cont.)
A schedule S is view serializable if it is view
equivalent to a serial schedule.
Every conflict serializable schedule is also
view serializable.
Below is a schedule which is view-serializable
but not conflict serializable.
What serial schedule is the above equivalent to?
Every view serializable schedule that is not conflict
serializable has blind writes.
Other Notions of Serializability
The schedule below produces the same outcome as the
serial schedule < T1, T5 >, yet is not conflict equivalent
or view equivalent to it.
Determining such equivalence requires analysis of
operations other than read and write.
Recovery
Recovery means to restore the database to a
correct state after some failure has rendered the
current state incorrect or suspect
Recovery is based on redundancy
To recover a database, the source for the
recovery must be information that has been
stored redundantly somewhere else
Failure Classification
Transaction failure :
Logical errors: transaction cannot complete due to some
internal error condition
System errors: the database system must terminate an active
transaction due to an error condition (e.g., deadlock)
System crash: a power failure or other hardware or
software failure causes the system to crash.
Fail-stop assumption: non-volatile storage contents are
assumed to not be corrupted by system crash
• Database systems have numerous integrity checks to prevent
corruption of disk data
Disk failure: a head crash or similar disk failure destroys
all or part of disk storage
Destruction is assumed to be detectable: disk drives use
checksums to detect failures
Recovery Algorithms
Recovery algorithms are techniques to ensure
database consistency and transaction atomicity
and durability despite failures
Focus of this chapter
Recovery algorithms have two parts
1. Actions taken during normal transaction processing
to ensure enough information exists to recover from
failures
2. Actions taken after a failure to recover the database
contents to a state that ensures atomicity,
consistency and durability
Storage Structure
Volatile storage:
does not survive system crashes
examples: main memory, cache memory
Nonvolatile storage:
survives system crashes
examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
Stable storage:
a mythical form of storage that survives all failures
approximated by maintaining multiple copies on
distinct nonvolatile media
Stable-Storage Implementation
Maintain multiple copies of each block on separate disks
copies can be at remote sites to protect against disasters such as
fire or flooding.
Failure during data transfer can still result in inconsistent
copies: Block transfer can result in
Successful completion
Partial failure: destination block has incorrect information
Total failure: destination block was never updated
Protecting storage media from failure during data transfer
(one solution):
Execute output operation as follows (assuming two copies of each
block):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information
onto the second physical block.
3. The output is completed only after the second write successfully
completes.
Stable-Storage Implementation (Cont.)
Protecting storage media from failure during data transfer
(cont.):
Copies of a block may differ due to failure during output
operation. To recover from failure:
1. First find inconsistent blocks:
1. Expensive solution: Compare the two copies of every disk block.
2. Better solution:
Record in-progress disk writes on non-volatile storage (Non-volatile RAM or
special area of disk).
Use this information during recovery to find blocks that may be inconsistent,
and only compare copies of these.
Used in hardware RAID systems
2. If either copy of an inconsistent block is detected to have an error
(bad checksum), overwrite it by the other copy. If both have no
error, but are different, overwrite the second block by the first block.
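The two-copy output operation and the recovery rule can be sketched as follows. This is a simulation under stated assumptions: each "disk" is a Python dict, CRC32 from the standard `zlib` module plays the role of the block checksum, and the function names are hypothetical.

```python
import zlib


def make_block(data: bytes):
    return {"data": data, "crc": zlib.crc32(data)}


def is_valid(block):
    return block is not None and zlib.crc32(block["data"]) == block["crc"]


def stable_output(disk1, disk2, block_no, data):
    disk1[block_no] = make_block(data)   # 1. write the first copy
    disk2[block_no] = make_block(data)   # 2. then write the second copy
    # 3. the output counts as complete only at this point


def recover(disk1, disk2, block_no):
    b1, b2 = disk1.get(block_no), disk2.get(block_no)
    if not is_valid(b1) and is_valid(b2):
        disk1[block_no] = dict(b2)       # first copy bad: take the second
    elif is_valid(b1) and (not is_valid(b2) or b1 != b2):
        disk2[block_no] = dict(b1)       # second bad or different: take the first


d1, d2 = {}, {}
stable_output(d1, d2, 0, b"old")
d1[0] = make_block(b"new")       # simulated crash after step 1 of a new output
recover(d1, d2, 0)
after_crash = d2[0]["data"]
d1[0]["data"] = b"garbage"       # simulated corruption: checksum no longer matches
recover(d1, d2, 0)
after_corruption = d1[0]["data"]
```

The sketch compares the copies of one given block; finding which blocks to compare (the expensive versus the better solution above) is left out.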
Data Access
Physical blocks are those blocks residing on the disk.
Buffer blocks are the blocks residing temporarily in main
memory.
Block movements between disk and main memory are
initiated through the following two operations:
input(B) transfers the physical block B to main memory.
output(B) transfers the buffer block B to the disk, and replaces the
appropriate physical block there.
Each transaction Ti has its private work-area in which local
copies of all data items accessed and updated by it are
kept.
Ti's local copy of a data item X is called xi.
Data Access (Cont.)
Transaction transfers data items between system buffer blocks
and its private work-area using the following operations :
read(X) assigns the value of data item X to the local variable xi.
write(X) assigns the value of local variable xi to data item X in the
buffer block.
both these commands may necessitate the issue of an input(BX)
instruction before the assignment, if the block BX in which X resides is
not already in memory.
Transactions
Perform read(X) while accessing X for the first time;
All subsequent accesses are to the local copy.
After last access, transaction executes write(X).
output(BX) need not immediately follow write(X). System can
perform the output operation when it deems fit.
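The four operations can be sketched with dictionaries standing in for the disk, the buffer and a transaction's private work-area. The class name `DataManager` and the sample blocks are assumptions made for the example.

```python
class DataManager:
    def __init__(self, disk):
        self.disk = disk      # physical blocks: block name -> {item: value}
        self.buffer = {}      # buffer blocks currently in main memory
        # Which block each data item resides in.
        self.block_of = {item: b for b, items in disk.items() for item in items}

    def input(self, block):
        # Transfer a physical block to main memory.
        self.buffer[block] = dict(self.disk[block])

    def output(self, block):
        # Transfer a buffer block to disk, replacing the physical block.
        self.disk[block] = dict(self.buffer[block])

    def read(self, item, work_area):
        block = self.block_of[item]
        if block not in self.buffer:
            self.input(block)             # bring the block in first
        work_area[item] = self.buffer[block][item]

    def write(self, item, work_area):
        block = self.block_of[item]
        if block not in self.buffer:
            self.input(block)
        self.buffer[block][item] = work_area[item]


dm = DataManager({"A": {"X": 10}, "B": {"Y": 20}})
t1 = {}                      # private work-area of T1
dm.read("X", t1)             # x1 := X
t1["X"] += 5                 # T1 works on its local copy
dm.write("X", t1)            # X in the buffer block := x1
on_disk_before_output = dm.disk["A"]["X"]
dm.output("A")               # the system decides when to do this
on_disk_after_output = dm.disk["A"]["X"]
```

Between write(X) and output(A) the new value exists only in the buffer, which is exactly the window a recovery scheme has to account for.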
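Example of Data Access
[Figure: buffer blocks A and B in main memory. input(A) and output(B) move blocks between disk and buffer; read(X) and write(Y) move values between the buffer and the private work-areas, which hold x1 and y1 for T1 and x2 for T2.]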
Transactions T1, T2, …, Tn operate on the same DB (consistency constraints).
How to prevent harmful interference between transactions?
=> scheduling techniques based on
- locks
- timestamps and validation
Concurrency Problems – Description
A lost update occurs when a second transaction
reads the state of the database prior to the first
one writing a change, and then stomps on the
first one’s change with its own update
An uncommitted dependency occurs when a
second transaction relies on a change that has
not yet been committed and that is rolled back
after the second transaction has begun
An inconsistent analysis occurs when totals are
calculated during interleaved updates
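A lost update can be reproduced with two interleaved read-modify-write sequences. The sketch is single-threaded and simply fixes the interleaving by hand; the amounts are assumed values.

```python
def interleaved_lost_update(balance):
    t1_local = balance          # T1: read(A)
    t2_local = balance          # T2: read(A), before T1 has written
    balance = t1_local + 50     # T1: write(A)
    balance = t2_local - 30     # T2: write(A) overwrites T1's change
    return balance


def serial_execution(balance):
    balance = balance + 50      # T1 runs to completion
    balance = balance - 30      # then T2 runs
    return balance


lost = interleaved_lost_update(100)
correct = serial_execution(100)
```

Starting from 100, the serial result is 120, while the interleaved run ends at 70: the deposit of 50 made by T1 has disappeared.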
Locking
A transaction locks a portion of the
database to prevent concurrency problems
Exclusive lock – write lock, will lock out all
other transactions
Shared lock – read lock, will lock out
writes, but allow other reads
Lock-Based Protocols
A lock is a mechanism to control concurrent access to a
data item
Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well
as written. X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is
requested using lock-S instruction.
More Deadlock Prevention
Strategies
Following schemes use transaction timestamps for
the sake of deadlock prevention alone.
wait-die scheme — non-preemptive
older transaction may wait for younger one to
release data item. Younger transactions never wait
for older ones; they are rolled back instead.
a transaction may die several times before
acquiring needed data item
wound-wait scheme — preemptive
older transaction wounds (forces rollback of) a
younger transaction instead of waiting for it.
Younger transactions may wait for older ones.
may be fewer rollbacks than wait-die scheme.
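Both schemes reduce to a comparison of timestamps when a transaction requests an item held by another. In this sketch a smaller timestamp means an older transaction; the function names and return strings are assumptions.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits, a younger one dies."""
    if requester_ts < holder_ts:
        return "requester waits"
    return "requester rolled back"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder, a younger one waits."""
    if requester_ts < holder_ts:
        return "holder rolled back"
    return "requester waits"


# The old transaction has timestamp 5, the young one has timestamp 9.
results = {
    "wait-die, old requests from young": wait_die(5, 9),
    "wait-die, young requests from old": wait_die(9, 5),
    "wound-wait, old requests from young": wound_wait(5, 9),
    "wound-wait, young requests from old": wound_wait(9, 5),
}
```

In both schemes the transaction that gets rolled back is always the younger one, so the oldest transaction in the system is never rolled back by the scheme.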
Pitfalls of Lock-Based Protocols (Cont.)
Starvation is also possible if concurrency
control manager is badly designed. For
example:
A transaction may be waiting for an X-lock on an
item, while a sequence of other transactions request
and are granted an S-lock on the same item.
The same transaction is repeatedly rolled back due
to deadlocks.
Concurrency control manager can be designed
to prevent starvation.
The Two-Phase Locking Protocol
This is a protocol which ensures conflict-serializable
schedules.
Phase 1: Growing Phase
transaction may obtain locks
transaction may not release locks
Phase 2: Shrinking Phase
transaction may release locks
transaction may not obtain locks
The protocol assures serializability. It can be proved that
the transactions can be serialized in the order of their lock
points (i.e. the point where a transaction acquired its final
lock).
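Whether a single transaction obeys the two-phase rule can be checked by scanning its lock and unlock instructions in order. The sketch assumes instructions are written as strings such as "lock-X(A)" and "unlock(A)"; the two sample transactions are hypothetical.

```python
def is_two_phase(instructions):
    """True if no lock is obtained after the first unlock."""
    shrinking = False
    for ins in instructions:
        if ins.startswith("unlock"):
            shrinking = True            # the shrinking phase has begun
        elif ins.startswith("lock") and shrinking:
            return False                # lock requested in the shrinking phase
    return True


two_phase = ["lock-X(A)", "read(A)", "write(A)",
             "lock-X(B)",               # lock point: final lock acquired
             "unlock(A)", "read(B)", "write(B)", "unlock(B)"]
not_two_phase = ["lock-X(A)", "read(A)", "write(A)", "unlock(A)",
                 "lock-X(B)", "read(B)", "write(B)", "unlock(B)"]
```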
Implementation of Locking
A lock manager can be implemented as a separate
process to which transactions send lock and unlock
requests
The lock manager replies to a lock request by sending a
lock grant message (or a message asking the
transaction to roll back, in case of a deadlock)
The requesting transaction waits until its request is
answered
The lock manager maintains a data-structure called a
lock table to record granted locks and pending requests
The lock table is usually implemented as an in-memory
hash table indexed on the name of the data item being
locked
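A lock table along these lines can be sketched with a dictionary keyed by data item name. Assumptions in this sketch: it runs in a single thread, requests are served in arrival order (a new request waits if anything is already queued, which keeps a waiting X-lock request from being starved by later S-lock requests), and deadlock handling is left out. The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class LockManager:
    def __init__(self):
        # Lock table: item name -> granted locks and pending requests.
        self.table = defaultdict(lambda: {"granted": [], "waiting": deque()})

    @staticmethod
    def _compatible(mode, granted):
        # Only S is compatible with S; X is compatible with nothing.
        return all(mode == "S" and m == "S" for _, m in granted)

    def lock(self, txn, item, mode):
        """Return True if granted at once, False if the request is queued."""
        entry = self.table[item]
        if not entry["waiting"] and self._compatible(mode, entry["granted"]):
            entry["granted"].append((txn, mode))
            return True
        entry["waiting"].append((txn, mode))
        return False

    def unlock(self, txn, item):
        entry = self.table[item]
        entry["granted"] = [(t, m) for t, m in entry["granted"] if t != txn]
        # Grant queued requests in order for as long as they are compatible.
        while entry["waiting"]:
            t, m = entry["waiting"][0]
            if not self._compatible(m, entry["granted"]):
                break
            entry["granted"].append(entry["waiting"].popleft())


lm = LockManager()
r1 = lm.lock("T1", "Q", "S")   # granted
r2 = lm.lock("T2", "Q", "S")   # granted, S is compatible with S
r3 = lm.lock("T3", "Q", "X")   # queued behind the two S-locks
r4 = lm.lock("T4", "Q", "S")   # queued behind T3, so T3 is not starved
lm.unlock("T1", "Q")
lm.unlock("T2", "Q")
holders_after_readers_leave = list(lm.table["Q"]["granted"])
lm.unlock("T3", "Q")
holders_after_writer_leaves = list(lm.table["Q"]["granted"])
```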
Remaining Topics
1. Physical database administration
2. Database security and authorization
3. Case Study (Oracle and MS SQL)
4. Parallel and Distributed Databases
(including Internet Database)
5. Object Oriented Database
6. Data Mining