DBMS UNIT-5 LONGS
Unit-V-Essay Questions
1. Explain about transaction states with an example.
1A) A transaction in a database is a sequence of operations
performed as a single logical unit of work. During its
execution, a transaction goes through several states to
ensure data integrity and correctness.
The Main Transaction States are:
Active: The transaction is currently executing its operations (read/write/update).
Partially Committed: The transaction has completed its final operation but its results are not yet permanently saved to the database.
Committed: All operations of the transaction have completed successfully and the changes are permanently saved.
Failed: A failure occurs during transaction execution; the transaction cannot proceed.
Aborted: The transaction has been rolled back and the database is restored to its previous consistent state.
Diagram of Transaction States:
Start → Active → Partially Committed → Committed (if all operations succeed)
Active or Partially Committed → Failed → Aborted (if an error occurs)
Explanation of Each State:
1. Active:
o When the transaction starts, it is in the active state.
o It performs all the read and write operations.
2. Partially Committed:
o After the final operation, but before the data is
permanently saved, the transaction enters the
partially committed state.
3. Committed:
o If all operations are completed successfully and
changes are permanently saved, the transaction
moves to the committed state.
o The changes are now permanent and cannot be
undone.
4. Failed:
o If a failure occurs (due to system crash, validation
error, etc.), the transaction enters the failed state.
5. Aborted:
o After failure, the transaction must be rolled back.
o All the changes made by the transaction are
undone, and the system returns to the state before
the transaction started.
Example:
Consider a bank transaction in which ₹500 is transferred from Account A to Account B, in two steps:
Step 1: Deduct ₹500 from Account A.
Step 2: Add ₹500 to Account B.
Transaction Progress:
Initially, the transaction is Active.
Deducting ₹500 from Account A happens → still Active.
Adding ₹500 to Account B happens → transaction
becomes Partially Committed.
If no error occurs, it becomes Committed and the
changes are saved.
If the system crashes before adding ₹500 to Account B, it
Fails and Aborts, rolling back the deduction from
Account A.
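The same transfer can be traced through these states with a small Python sketch (a toy illustration, not tied to any real DBMS; the starting balances of ₹1000 are assumed):

```python
# Minimal sketch of the transaction states for the bank-transfer example.
accounts = {"A": 1000, "B": 1000}      # assumed starting balances

def transfer(amount, fail_before_credit=False):
    state = "ACTIVE"
    old = dict(accounts)               # keep old values so we can roll back
    try:
        accounts["A"] -= amount        # deduct from A (still ACTIVE)
        if fail_before_credit:
            raise RuntimeError("system crash before crediting B")
        accounts["B"] += amount        # final operation executed
        state = "PARTIALLY COMMITTED"
        # ... changes forced to stable storage here ...
        state = "COMMITTED"
    except RuntimeError:
        state = "FAILED"
        accounts.update(old)           # undo all changes made so far
        state = "ABORTED"
    return state

print(transfer(500))                           # COMMITTED, balances updated
print(transfer(500, fail_before_credit=True))  # ABORTED, balances restored
print(accounts)                                # {'A': 500, 'B': 1500}
```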
Summary Table:
Start of transaction: state = Active; example action = Deduct ₹500 from Account A.
Last operation done: state = Partially Committed; example action = Add ₹500 to Account B.
Successful save: state = Committed; example action = Balance updated for both accounts.
Error occurs: state = Failed → Aborted; example action = Undo deduction from Account A.
2. What are ACID properties? Illustrate them through
examples and also explain commit and Rollback.
2A) ACID properties ensure that database transactions are
processed reliably.
ACID stands for:
A (Atomicity): A transaction is all-or-nothing; it is either fully completed or fully undone.
C (Consistency): A transaction transforms the database from one valid state to another valid state.
I (Isolation): Transactions are executed independently, without interfering with each other.
D (Durability): Once a transaction is committed, its changes are permanent, even if the system fails.
Detailed Explanation with Examples:
Atomicity: If any part of the transaction fails, the entire transaction fails and the database remains unchanged. Example: in a bank transfer, deduct ₹500 from Account A and add ₹500 to Account B; if the deduction succeeds but the addition fails, the deduction must also be undone.
Consistency: A transaction must leave the database in a consistent state, maintaining rules like constraints and relationships. Example: after a transfer, the total balance of the bank must remain the same (no money should be lost or created).
Isolation: The intermediate state of a transaction is hidden from other transactions until it completes. Example: two people booking the last seat on a flight; isolation ensures only one booking succeeds.
Durability: Once a transaction is committed, its changes survive even if the system crashes. Example: after booking a ticket, the booking remains confirmed even if the server restarts.
Commit and Rollback:
Commit: saves all the changes made by the transaction permanently into the database; used when a transaction completes successfully without any error.
Rollback: undoes all the changes made by the transaction and restores the database to its previous consistent state; used when a transaction fails or an error occurs during execution.
Simple Example for Commit and Rollback:
Bank Transfer:
o Transaction: Transfer ₹500 from Account A to
Account B.
o If deduction and addition both succeed → Commit
→ Save the changes.
o If addition fails (e.g., Account B not found) →
Rollback → Undo the deduction from Account A.
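A minimal runnable sketch of commit and rollback using Python's built-in sqlite3 module (the accounts table, starting balances, and the transfer helper are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 1000), ('B', 1000)")
conn.commit()

def transfer(amount, to_account):
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
        cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                           (amount, to_account))
        if cur.rowcount == 0:             # e.g. target account not found
            raise ValueError("target account missing")
        conn.commit()                     # both steps succeeded -> save permanently
    except Exception:
        conn.rollback()                   # undo the deduction from Account A

transfer(500, "B")   # commits: A = 500, B = 1500
transfer(500, "C")   # rolls back: balances stay A = 500, B = 1500
print(list(conn.execute("SELECT * FROM accounts")))
```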
3. Explain about Conflict Serializability with an example.
3A) Conflict Serializability is a concept used in database
systems to ensure that the execution of concurrent
transactions is correct and produces the same result as
some serial execution (one after another without overlap).
Definition:
A schedule (order of operations) is said to be conflict
serializable if it can be transformed into a serial schedule
by swapping non-conflicting operations.
Conflicting Operations:
Two operations conflict if they:
Belong to different transactions,
Operate on the same data item, and
At least one of them is a write operation.
Types of Conflicts:
Read-Write Conflict (RW): One transaction reads,
another writes.
Write-Read Conflict (WR): One transaction writes,
another reads.
Write-Write Conflict (WW): Both transactions write.
Example of Conflict Serializability:
Suppose we have two transactions:
T1:
o Read(A)
o Write(A)
T2:
o Read(A)
o Write(A)
Now, consider the following schedule:
S: R1(A) → W1(A) → R2(A) → W2(A)
Where:
R1(A) = T1 reads A,
W1(A) = T1 writes A,
R2(A) = T2 reads A,
W2(A) = T2 writes A.
Checking for Conflicts:
R1(A) and W2(A): Conflict (read-write conflict)
W1(A) and R2(A): Conflict (write-read conflict)
W1(A) and W2(A): Conflict (write-write conflict)
Since there are conflicts, we build a precedence graph:
Edge from T1 to T2: in every conflicting pair (R1(A) and W2(A), W1(A) and R2(A), W1(A) and W2(A)), the T1 operation comes first.
Graph:
T1 → T2
The graph has no cycle, so the schedule is conflict
serializable.
Equivalent Serial Schedule: T1 followed by T2.
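The precedence-graph test for this example can be automated with a short Python sketch (the schedule encoding and helper names are my own):

```python
from itertools import combinations

# Schedule S from the example: (transaction, operation, data item)
schedule = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]

def precedence_edges(sched):
    """Edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(sched, 2):
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Simple depth-first cycle check on the precedence graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()
    def dfs(node):
        visiting.add(node)
        for nxt in graph[node]:
            if nxt in visiting or (nxt not in done and dfs(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(n) for n in graph if n not in done)

edges = precedence_edges(schedule)
print(edges)                                            # {('T1', 'T2')}
print("conflict serializable:", not has_cycle(edges))   # True
```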
Summary Table:
Conflict Serializability: the schedule can be rearranged into a serial order by swapping non-conflicting operations.
Conflict: two operations of different transactions on the same data item, at least one of them a write.
Conflict Types: Read-Write, Write-Read, Write-Write.
How to check: build a precedence graph and check for cycles.
If no cycle: the schedule is conflict serializable.
4. Discuss about View Serializability.
4A) View Serializability is a concept in database systems to
ensure the correctness of a concurrent schedule.
It is a weaker (more general) form of serializability
compared to conflict serializability.
Definition:
A schedule is said to be view serializable if it is view
equivalent to a serial schedule.
Two schedules are view equivalent if:
1. Initial Reads: Each transaction reads the same initial
value of a data item in both schedules.
2. Read From: If a transaction reads a value written by
another transaction in one schedule, it should read the
same value written by the same transaction in the other
schedule.
3. Final Writes: The final value of each data item must be
written by the same transaction in both schedules.
Key Points:
View Equivalence: read and write operations behave the same way in two schedules.
View Serializability: a schedule can be reordered to a serial schedule based on view equivalence.
Comparison to Conflict Serializability: every conflict-serializable schedule is view serializable, but the reverse is not always true.
Example of View Serializability:
Consider three transactions, where T2 and T3 perform blind writes (they write A without reading it first):
T1:
o Read(A)
o Write(A)
T2:
o Write(A)
T3:
o Write(A)
Now, consider the following schedule S:
S: R1(A) → W2(A) → W1(A) → W3(A)
(First T1 reads A, then T2 writes A, then T1 writes A, and finally T3 writes A.)
Checking for View Serializability (against the serial schedule T1 → T2 → T3):
T1 reads the initial value of A in schedule S, and it also reads the initial value in the serial schedule (initial reads match).
No transaction reads a value written by another transaction in either schedule (read-from relations match).
The final write on A is performed by T3 in both schedules (final writes match).
Observation:
Schedule S is not conflict serializable: R1(A) before W2(A) gives the edge T1 → T2, while W2(A) before W1(A) gives the edge T2 → T1, so the precedence graph has a cycle.
But S is view serializable, because it is view equivalent to the serial schedule T1 → T2 → T3.
Thus, schedule S is view serializable (but not conflict serializable).
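For small schedules, view serializability can also be checked by brute force: compute the initial reads, reads-from pairs, and final writes of the schedule and compare them with every serial order. A minimal sketch of this check, using the schedule from the example (helper names are my own):

```python
from itertools import permutations

# Schedule S from the example: (transaction, operation, item)
S = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T3", "W", "A")]

def view_info(sched):
    """Return (reads_from, final_writes): who each read reads from, and the last writer per item."""
    last_writer, reads_from, final_writes = {}, [], {}
    for txn, op, item in sched:
        if op == "R":
            reads_from.append((txn, item, last_writer.get(item)))  # None = initial value
        else:
            last_writer[item] = txn
            final_writes[item] = txn
    return set(reads_from), final_writes

def view_serializable(sched):
    txns = []
    for txn, _, _ in sched:
        if txn not in txns:
            txns.append(txn)
    target = view_info(sched)
    for order in permutations(txns):
        serial = [step for txn in order for step in sched if step[0] == txn]
        if view_info(serial) == target:
            return order                  # a view-equivalent serial order
    return None

print(view_serializable(S))   # ('T1', 'T2', 'T3') -> view serializable
```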
Summary Table:
Purpose: ensure correct execution of concurrent transactions.
Definition: schedules are view equivalent if initial reads, read-from relations, and final writes match.
Strength: weaker than conflict serializability.
Key Feature: allows more schedules than conflict serializability.
Example: a schedule whose reads and final writes match a serial order, even though it is not conflict serializable.
5. Define concurrent execution. Explain about the problems with concurrent execution.
5A) Concurrent execution refers to the simultaneous
execution of multiple transactions in a database system.
Instead of executing one transaction at a time (serial
execution), the operations of different transactions are
interleaved to improve performance and resource utilization.
Concurrent execution allows:
Better CPU and disk utilization,
Higher throughput (more transactions completed in less
time),
Shorter response times for users.
Problems with Concurrent Execution:
Although concurrent execution improves performance, it can
also cause several problems if not properly controlled:
1. Lost Update Problem
Occurs when two transactions update the same data
item, and one update is overwritten by another.
Example:
T1: Read(A); A = A + 10; Write(A)
T2: Read(A); A = A + 20; Write(A)
If T1 and T2 both read A before either writes it back, whichever transaction writes last overwrites the other's result, and one update is lost.
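This interleaving can be simulated step by step in Python (a toy simulation; the starting value A = 100 is assumed):

```python
# Toy simulation of the lost-update interleaving described above.
A = 100            # assumed starting value of the shared data item

a1 = A             # T1: Read(A)
a2 = A             # T2: Read(A)  (reads the same old value)
A = a1 + 10        # T1: Write(A) -> 110
A = a2 + 20        # T2: Write(A) -> 120, T1's update (+10) is lost

print(A)           # 120, not the expected 130
```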
2. Temporary Update Problem (Dirty Read)
A transaction reads data written by another transaction
that has not yet committed.
If the first transaction rolls back, the second transaction
would have read incorrect (dirty) data.
Example:
T1: Write(A = 500)
T2: Read(A)
T1: Rollback
Now T2 has read a value that doesn’t exist anymore.
3. Incorrect Summary Problem
Occurs when one transaction reads some but not all
records, while another transaction is updating those
records.
Example:
T1 calculates the sum of account balances.
T2 is transferring money between accounts during T1’s
calculation.
The sum computed by T1 becomes inaccurate.
4. Unrepeatable Read Problem
A transaction reads the same data twice but gets
different values because another transaction modified it
in between.
Example:
T1: Read(A)
T2: Write(A)
T1: Read(A) → Different value
Summary Table:
Lost Update: one update overwrites another.
Temporary Update (Dirty Read): reading uncommitted data.
Incorrect Summary: reading inconsistent or partial data.
Unrepeatable Read: different values read for the same data item within the same transaction.
6. Discuss in detail about all Lock-Based Protocols.
6A) Lock-based protocols are concurrency control
mechanisms used in database systems to manage access
to data items during concurrent transaction execution.
They ensure serializability and consistency by
controlling how multiple transactions interact with
shared data.
Types of Locks:
1. Shared Lock (S-lock):
o Used when a transaction wants to read a data item.
o Multiple transactions can hold shared locks on the
same data item at the same time.
2. Exclusive Lock (X-lock):
o Used when a transaction wants to write (update) a
data item.
o Only one transaction can hold an exclusive lock on a
data item at any time.
Read operations take a Shared Lock (S-lock); write operations take an Exclusive Lock (X-lock).
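The compatibility between shared and exclusive locks can be sketched as a small lock table in Python (a minimal illustration with invented data structures, not a real DBMS API):

```python
# Which new lock requests are compatible with locks already held on a data item.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

held = {}   # data item -> list of (transaction, lock mode)

def request_lock(txn, item, mode):
    """Grant the lock only if it is compatible with every lock held by other transactions."""
    for other_txn, other_mode in held.get(item, []):
        if other_txn != txn and not COMPATIBLE[(other_mode, mode)]:
            return False                      # caller must wait (or abort)
    held.setdefault(item, []).append((txn, mode))
    return True

print(request_lock("T1", "A", "S"))   # True  - T1 reads A
print(request_lock("T2", "A", "S"))   # True  - shared locks can coexist
print(request_lock("T3", "A", "X"))   # False - a writer must wait for the readers
```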
Types of Lock-Based Protocols:
1. Simplistic Lock Protocol
A transaction locks a data item before accessing it.
After completing the operation, it immediately releases
the lock.
Simple but does not guarantee serializability.
2. Two-Phase Locking Protocol (2PL)
Divides transaction execution into two phases:
o Growing Phase: Transaction acquires all the locks it
needs but cannot release any lock.
o Shrinking Phase: Transaction releases its locks but
cannot acquire any new locks.
Property:
Guarantees conflict serializability.
Diagram:
Growing Phase: Acquire locks →
Shrinking Phase: Release locks
3. Strict Two-Phase Locking (Strict 2PL)
A special version of 2PL.
A transaction holds all its exclusive locks until commit or
abort.
Advantage: Prevents cascading rollbacks.
4. Rigorous Two-Phase Locking
Even shared locks are held until commit or abort.
Transactions behave in an even stricter manner
compared to strict 2PL.
Advantage: Ensures transactions are serializable and
recoverable.
5. Multiple Granularity Locking
Allows locking at various levels: database, table, page, or
row.
Locks can be coarse-grained or fine-grained.
Uses a hierarchy of locks (e.g., intention locks) to
efficiently manage complex transactions.
Summary Table:
Simplistic Locking: lock and unlock immediately around each operation; serializability: No; recoverability: No.
Two-Phase Locking (2PL): growing and shrinking phases; serializability: Yes; recoverability: No.
Strict 2PL: holds exclusive locks till commit; serializability: Yes; recoverability: Yes.
Rigorous 2PL: holds all locks (S and X) till commit; serializability: Yes; recoverability: Yes.
Multiple Granularity Locking: locking at different levels of the database; serializability: Yes; recoverability: Yes.
7. Discuss Timestamp-based protocols.
7A) Introduction:
Timestamp-based protocols are concurrency control
methods that order transactions based on their
timestamps.
Every transaction is assigned a unique timestamp when
it starts.
Older transactions (with smaller timestamps) are given
priority over newer ones.
The main goal is to ensure serializability according to
the transaction timestamps.
How It Works:
Each data item in the database maintains two
timestamps:
1. Read Timestamp (RTS): Latest time the data item
was read.
2. Write Timestamp (WTS): Latest time the data item
was written.
When a transaction tries to read or write a data item, the
system checks these timestamps to ensure no conflict occurs.
Rules:
1. For Read Operation:
If TS(T) < WTS(X):
o Reject the read (Transaction T is trying to read
outdated data).
o Transaction T is rolled back and restarted with a
new timestamp.
Else:
o Allow the transaction to read.
o Update RTS(X) = max(RTS(X), TS(T)).
2. For Write Operation:
If TS(T) < RTS(X) or TS(T) < WTS(X):
o Reject the write (This would violate serializability).
o Transaction T is rolled back.
Else:
o Allow the transaction to write.
o Update WTS(X) = TS(T).
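These rules translate almost directly into code. Below is a minimal Python sketch for a single data item X (the function names and the way rejections are reported are my own simplifications):

```python
# Minimal sketch of basic timestamp ordering for a single data item X.
rts, wts = 0, 0   # RTS(X) and WTS(X)

def read(name, ts):
    """Transaction `name` with timestamp ts tries to Read(X)."""
    global rts
    if ts < wts:
        return f"{name}: read rejected, rollback (X was already overwritten by a newer transaction)"
    rts = max(rts, ts)
    return f"{name}: read allowed, RTS(X) = {rts}"

def write(name, ts):
    """Transaction `name` with timestamp ts tries to Write(X)."""
    global wts
    if ts < rts or ts < wts:
        return f"{name}: write rejected, rollback"
    wts = ts
    return f"{name}: write allowed, WTS(X) = {wts}"

# Replaying the example given below: RTS(X) = 6, WTS(X) = 7.
rts, wts = 6, 7
print(read("T2", 10))   # allowed, RTS(X) becomes 10
print(write("T1", 5))   # rejected: TS(T1) < RTS(X), so T1 is rolled back
```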
Advantages:
Simple to understand and implement.
Ensures serializability automatically without using locks.
No deadlocks (because transactions never wait; they just
rollback if needed).
Disadvantages:
High rollback rate if transactions conflict frequently.
Waste of resources due to repeated rollbacks and
restarts.
Example:
Assume:
Transaction T1: Timestamp 5
Transaction T2: Timestamp 10
Data item X: RTS(X) = 6, WTS(X) = 7
Now, T2 (timestamp 10) tries to read(X):
→ TS(T2) > WTS(X), so the read is allowed, and RTS(X) is updated to max(RTS(X), TS(T2)) = 10.
Suppose T1 (timestamp 5) now tries to write(X):
→ TS(T1) < RTS(X), so T1 is rolled back because it’s too
old to overwrite.
Summary Table:
Assignment: a unique timestamp is given to each transaction.
Maintains: a Read Timestamp (RTS) and a Write Timestamp (WTS) for each data item.
Serializability: maintained automatically.
Deadlocks: none (transactions never wait).
Main issue: rollbacks can be frequent.
8. Describe Optimistic Protocols
8A) Introduction:
Optimistic Protocols are concurrency control methods
that assume conflicts between transactions are rare.
Instead of locking data, transactions are allowed to
execute freely.
Validation is done at the end of the transaction to check
for conflicts.
If conflicts are found, the transaction may be rolled back
and restarted.
Phases of Optimistic Protocol:
Optimistic concurrency control works in three main phases:
1. Read Phase:
The transaction reads data from the database.
Changes are made to local copies (not to the database
immediately).
No locks are acquired during this phase.
2. Validation Phase:
Before committing, the transaction is validated.
The system checks whether the transaction can be
committed without violating serializability.
If there are no conflicts (i.e., no other transaction has
changed the data the current transaction read), the
transaction proceeds.
3. Write Phase:
If validation is successful, the transaction writes its
changes to the database.
Otherwise, the transaction is aborted and restarted.
Validation Rules:
Validation checks if the read set and write set of the
transaction conflict with other committed transactions.
There is no conflict if:
A transaction T1 completes before another transaction T2 starts, or
The write set of T1 and the read set of T2 do not overlap.
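A simplified sketch of the three phases in Python (the transaction class, its read/write sets, and the validation check against previously committed write sets are my own simplifications of the idea, not a full validation algorithm):

```python
# Simplified optimistic concurrency control: read into local copies,
# validate against transactions that committed meanwhile, then write.
database = {"X": 10, "Y": 20}
committed_write_sets = []          # write sets of transactions that committed while we ran

class OptimisticTxn:
    def __init__(self):
        self.local = {}            # local copies (read phase)
        self.write_set = set()

    def read(self, item):
        self.local.setdefault(item, database[item])
        return self.local[item]

    def write(self, item, value):
        self.read(item)
        self.local[item] = value
        self.write_set.add(item)

    def commit(self):
        read_set = set(self.local)
        # Validation phase: fail if something we read was written by a concurrent committer.
        for ws in committed_write_sets:
            if ws & read_set:
                return "abort and restart"
        # Write phase: apply local changes to the database.
        for item in self.write_set:
            database[item] = self.local[item]
        committed_write_sets.append(self.write_set)
        return "committed"

t1 = OptimisticTxn()
t1.write("X", t1.read("X") + 5)
print(t1.commit(), database)       # committed {'X': 15, 'Y': 20}
```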
Advantages:
High concurrency: Many transactions can run
simultaneously without blocking.
No deadlocks: Since no locks are held, deadlocks cannot
occur.
Efficient when conflicts are rare: Very effective in
systems with mostly independent transactions.
Disadvantages:
Costly rollbacks: If conflicts are frequent, many
transactions may have to restart.
Validation overhead: Requires extra computation at
validation time.
Example:
Transaction T1 reads X and Y, modifies X.
Transaction T2 reads X.
If T1 and T2 overlap, validation ensures that T1’s write to
X does not affect T2’s read.
If validation fails, one transaction (usually the younger one) is rolled back.
Summary Table:
Assumption: conflicts are rare.
Locking: no locks used.
Deadlocks: cannot occur.
Phases: Read → Validate → Write.
Main Issue: rollback if validation fails.
9. Explain the concept of deadlock and deadlock prevention policies with an example.
9A) Deadlock Concept:
Deadlock occurs when two or more transactions are
waiting for each other to release resources, but none
can proceed because each transaction is holding a
resource the other needs.
It creates a permanent blocking situation unless
resolved externally.
Deadlock Example:
Transaction T1 locks Resource A and needs Resource B.
Transaction T2 locks Resource B and needs Resource A.
Neither transaction can move forward → Deadlock.
Simple Illustration:
T1: holds A, requests B
T2: holds B, requests A
→ Deadlock occurs!
Deadlock Prevention Policies:
Deadlock prevention ensures that one of the necessary
conditions for deadlock does not occur (like circular wait).
Common Deadlock Prevention Techniques:
1. Wait-Die Scheme:
Assign a timestamp to each transaction.
Older transactions are allowed to wait for younger ones.
Younger transactions requesting resources held by older
ones are aborted ("die") and restarted.
Example:
T1 (timestamp 5) requests a lock held by T2 (timestamp
10).
T1 (older) waits.
2. Wound-Wait Scheme:
Older transactions can force (wound) younger
transactions to abort if they hold the needed resources.
Younger transactions must wait if they need resources
held by older transactions.
Example:
T1 (timestamp 5) requests a lock held by T2 (timestamp
10).
T2 (younger) is wounded (aborted), and T1 proceeds.
3. Resource Ordering:
All resources are globally ordered (R1 < R2 < R3...).
Transactions must request resources in a specific order
to prevent circular waits.
No transaction can request a resource lower in the
order than the one it already holds.
4. Timeouts:
If a transaction waits for too long, it is assumed to be in
deadlock.
The system aborts it automatically.
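The wait-die and wound-wait decisions from techniques 1 and 2 can be written as two small helper functions (a sketch; a smaller timestamp means an older transaction, as in the examples above):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: older transactions wait, younger requesters die (abort)."""
    return "wait" if requester_ts < holder_ts else "abort (die) and restart"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: older requesters wound (abort) the holder, younger requesters wait."""
    return "wound holder (holder aborts)" if requester_ts < holder_ts else "wait"

# The examples above: T1 has timestamp 5 (older), T2 has timestamp 10 (younger).
print(wait_die(5, 10))     # T1 requests a lock held by T2 -> T1 waits
print(wait_die(10, 5))     # T2 requests a lock held by T1 -> T2 dies
print(wound_wait(5, 10))   # T1 requests a lock held by T2 -> T2 is wounded
print(wound_wait(10, 5))   # T2 requests a lock held by T1 -> T2 waits
```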
Summary Table:
Wait-Die: the older transaction waits; the younger one aborts.
Wound-Wait: the older transaction wounds (forces abort of) the younger one; the younger one waits.
Resource Ordering: request resources in a fixed global order.
Timeouts: abort a transaction after it has waited too long.
10. Define locking protocol. Describe the Strict Two-Phase
Locking protocol with an example.
10A) Locking Protocol:
A locking protocol is a set of rules that governs how
locks are applied to resources in a database
management system (DBMS).
The protocol ensures that transactions do not interfere
with each other by restricting access to shared
resources.
Locking helps in preventing data corruption and ensures
serializability, which means the result of executing
transactions concurrently is the same as if they were
executed sequentially.
Types of Locks:
Exclusive Lock (X): The transaction has exclusive access
to the resource. No other transaction can read or modify
the locked data.
Shared Lock (S): The transaction can read the resource,
but cannot modify it. Other transactions can also acquire
shared locks to read the data.
Strict Two-Phase Locking (2PL) Protocol:
Strict Two-Phase Locking is a stricter version of the Two-Phase Locking (2PL) protocol that ensures serializability by dividing a transaction into two phases: the growing phase and the shrinking phase.
In addition to the basic 2PL rule, it requires that a transaction hold all its exclusive (write) locks until it commits or aborts, so those locks are released only at the end of the transaction.
Phases of Strict Two-Phase Locking:
1. Growing Phase:
o The transaction can acquire locks (both shared and
exclusive) but cannot release any locks.
o It can continue acquiring locks until it completes its
read/write operations.
2. Shrinking Phase:
o Once a transaction releases its first lock, it enters
the shrinking phase.
o From this point on, it can only release locks; no
more locks can be acquired.
o This ensures that the transaction cannot change the
state of the database once it starts releasing locks.
Strict Two-Phase Locking Example:
Let’s consider Transaction T1 and Transaction T2, which both
want to access Resource A and Resource B.
1. T1 starts:
o T1 acquires Shared Lock (S) on Resource A
(Growing Phase).
o T1 then acquires Exclusive Lock (X) on Resource B
(Growing Phase).
2. T2 starts:
o T2 tries to acquire an Exclusive Lock (X) on Resource A but cannot get it, because T1 already holds a Shared Lock on A and shared and exclusive locks are incompatible.
3. T1 commits and releases its locks:
o T1 releases the Exclusive Lock (X) on Resource B
(Shrinking Phase).
o T1 releases the Shared Lock (S) on Resource A
(Shrinking Phase).
4. T2 acquires locks:
o After T1 commits and releases its locks, T2 can acquire the Exclusive Lock (X) on Resource A and an Exclusive Lock (X) on Resource B.
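A minimal Python sketch of this scenario, in which a transaction's locks are released only when it commits (the lock-table structure and function names are my own, and for simplicity every lock, shared or exclusive, is held until commit):

```python
# Minimal strict 2PL sketch: locks are granted during the growing phase and
# released only when the transaction commits (or aborts).
locks = {}   # resource -> (mode, set of holders)

def acquire(txn, resource, mode):
    held = locks.get(resource)
    if held is None:
        locks[resource] = (mode, {txn})
        return True
    held_mode, holders = held
    if mode == "S" and held_mode == "S":
        holders.add(txn)                 # shared locks can coexist
        return True
    return txn in holders                # otherwise the requester must wait

def commit(txn):
    for resource in list(locks):
        mode, holders = locks[resource]
        holders.discard(txn)
        if not holders:
            del locks[resource]          # released only at commit time

print(acquire("T1", "A", "S"))   # True  (growing phase of T1)
print(acquire("T1", "B", "X"))   # True
print(acquire("T2", "A", "X"))   # False - T2 must wait for T1
commit("T1")                     # T1's locks released at commit
print(acquire("T2", "A", "X"))   # True  - now T2 can proceed
```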
Key Points of Strict Two-Phase Locking:
Serializability: This protocol ensures that the transaction
schedule is serializable, meaning it produces the same
result as if the transactions were executed one after
another.
Prevents Cascading Aborts: exclusive locks are held until the transaction commits or aborts, so other transactions never read uncommitted values and a failure cannot force cascading rollbacks.
Deadlock Potential: While Strict 2PL ensures
serializability, it can also cause deadlocks. A transaction
may wait indefinitely for another transaction to release
locks, especially in systems with many concurrent
transactions.
Summary Table:
Growing Phase: acquire locks; no release of locks.
Shrinking Phase: release locks; no more locks can be acquired.
11. Explain about the Two-Phase Locking protocol.
11A) Two-Phase Locking Protocol (2PL):
The Two-Phase Locking (2PL) protocol is a locking
mechanism used to ensure serializability in transaction
processing systems.
Serializability refers to the concept that the result of
executing multiple transactions concurrently should be
the same as if they were executed sequentially, one after
the other.
The 2PL protocol divides the execution of a transaction
into two distinct phases:
1. The Growing Phase
2. The Shrinking Phase
Phases of Two-Phase Locking:
1. Growing Phase:
During this phase, a transaction can acquire locks (both
shared and exclusive) on the resources it needs to
access.
The transaction cannot release any locks during this
phase.
The growing phase continues until the transaction
acquires all the necessary locks.
2. Shrinking Phase:
Once the transaction begins releasing locks, it enters the
shrinking phase.
From this point, the transaction cannot acquire any new
locks.
The shrinking phase continues until the transaction has
released all locks and is completed.
In essence, during the growing phase, the transaction
increases its set of locks. During the shrinking phase, it
decreases its set of locks.
Example of Two-Phase Locking:
Let’s consider Transaction T1 and Transaction T2 that both
want to access Resource A and Resource B:
1. T1 starts:
o T1 acquires a Shared Lock (S) on Resource A
(Growing Phase).
o T1 then acquires an Exclusive Lock (X) on Resource
B (Growing Phase).
2. T2 starts:
o T2 tries to acquire an Exclusive Lock (X) on Resource A, but cannot proceed because T1 already holds a Shared Lock (S) on A.
o T2 must wait for T1 to release the lock on A.
3. T1 releases locks:
o T1 releases the Exclusive Lock (X) on Resource B
(Shrinking Phase).
o T1 releases the Shared Lock (S) on Resource A
(Shrinking Phase).
4. T2 proceeds:
o After T1 has released its locks, T2 can acquire the Exclusive Lock (X) on Resource A and an Exclusive Lock (X) on Resource B.
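The two-phase rule itself (no new locks may be acquired after the first release) can be sketched as a small checker class in Python (the class and its methods are invented for illustration; it enforces only the rule and is not a lock manager):

```python
class TwoPhaseTransaction:
    """Tracks a transaction's lock calls and enforces the two-phase rule."""
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False       # becomes True after the first unlock

    def lock(self, resource):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire {resource} in the shrinking phase")
        self.held.add(resource)      # growing phase

    def unlock(self, resource):
        self.shrinking = True        # first release starts the shrinking phase
        self.held.discard(resource)

t1 = TwoPhaseTransaction("T1")
t1.lock("A")        # growing phase
t1.lock("B")        # growing phase
t1.unlock("B")      # shrinking phase begins
t1.unlock("A")
try:
    t1.lock("C")    # violates 2PL
except RuntimeError as err:
    print(err)      # T1: cannot acquire C in the shrinking phase
```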
Advantages of Two-Phase Locking:
Ensures Serializability: By following the two-phase rule,
2PL guarantees that the schedule of transactions is
serializable. This is the primary reason it is widely used
in transaction processing systems.
Prevents interference: conflicting operations on the same data item must wait for each other's locks, which avoids problems such as lost updates.
Disadvantages of Two-Phase Locking:
Deadlocks: Since 2PL allows transactions to hold locks
until the end of their execution, it may cause deadlocks.
For example, Transaction T1 might be waiting for a lock
held by T2, and T2 might be waiting for a lock held by T1,
resulting in a deadlock.
Resource Contention: Transactions may spend a long
time holding locks, which could block other transactions
from accessing needed resources, leading to potential
performance issues.
Increased Overhead: Transactions may need to wait for
locks, especially in high contention environments,
leading to delays in transaction execution.
Summary Table:
Growing Phase: acquiring locks; cannot release locks.
Shrinking Phase: releasing locks; cannot acquire new locks.
12. Explain about Log based recovery with an example.
12A) Log-Based Recovery:
Log-Based Recovery is a technique used in database
management systems (DBMS) to ensure transaction
durability and consistency after system failures such as
power outages, crashes, or software malfunctions.
This recovery mechanism uses a log file that records
every change (or modification) made to the database
during the execution of transactions.
The log file contains a sequential record of all the
operations performed by each transaction. This allows
the system to undo changes from incomplete
transactions or redo changes from committed
transactions during recovery.
Key Components of Log-Based Recovery:
1. Log File: A persistent record that tracks all database
operations.
2. Write-Ahead Logging (WAL): A rule that ensures that
changes are written to the log before they are applied to
the database.
Types of Log Records:
1. Transaction Start (BEGIN):
o Marks the beginning of a transaction. This record is
placed in the log as soon as a transaction begins.
2. Write Operation (UPDATE/WRITE):
o A record that contains the old value and the new
value for a modified data item, marking a change in
the database.
3. Transaction Commit (COMMIT):
o Indicates that a transaction has successfully
completed and is ready to be permanently applied
to the database. This record is logged before
changes are finalized.
4. Transaction Abort (ROLLBACK):
o Indicates that a transaction has been aborted, and
all changes made by that transaction need to be
undone.
Log-Based Recovery Process:
The recovery process is divided into two main steps:
1. UNDO (Backward Recovery):
o If a transaction is not completed and the system
crashes, we need to undo the operations of
incomplete transactions.
o For example, if Transaction T1 started but didn’t
commit before the crash, all changes made by T1
are reversed.
2. REDO (Forward Recovery):
o If a transaction has committed before the crash, the
changes made by that transaction need to be
redone to ensure durability.
o For instance, if Transaction T2 committed but the
changes weren’t written to disk, we redo those
changes from the log to restore consistency.
Example of Log-Based Recovery:
Let's consider Transaction T1 and Transaction T2.
1. Transaction T1 starts and makes a modification on Data
Item X:
o Log Record: T1 Start
o Log Record: T1 Write(X, 10) (T1 updates X to 10)
2. Transaction T2 starts and modifies Data Item Y:
o Log Record: T2 Start
o Log Record: T2 Write(Y, 20) (T2 updates Y to 20)
3. Transaction T1 commits:
o Log Record: T1 Commit
4. System crashes before Transaction T2 commits.
After the Crash:
Step 1: Undo:
o Since Transaction T2 did not commit, we need to
undo its changes. This is achieved by rolling back
the update made to Data Item Y.
Step 2: Redo:
o Transaction T1 committed, so the system redoes the change made by T1, ensuring that Data Item X has the value 10.
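A minimal sketch of this undo/redo pass over the example log in Python (the log tuple format, the assumed old values of 0, and the on-disk state at crash time are simplifications; real systems also use checkpoints):

```python
# Example log at crash time, oldest record first: (txn, action, item, old value, new value)
log = [
    ("T1", "START", None, None, None),
    ("T1", "WRITE", "X", 0, 10),
    ("T2", "START", None, None, None),
    ("T2", "WRITE", "Y", 0, 20),
    ("T1", "COMMIT", None, None, None),
]
database = {"X": 0, "Y": 20}     # assume T2's write reached disk but T1's did not

committed = {txn for txn, action, *_ in log if action == "COMMIT"}

# UNDO: walk the log backwards and reverse the writes of uncommitted transactions.
for txn, action, item, old, new in reversed(log):
    if action == "WRITE" and txn not in committed:
        database[item] = old

# REDO: walk the log forwards and reapply the writes of committed transactions.
for txn, action, item, old, new in log:
    if action == "WRITE" and txn in committed:
        database[item] = new

print(database)   # {'X': 10, 'Y': 0} - T1's update kept, T2's update undone
```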
Advantages of Log-Based Recovery:
1. Durability and Consistency: By using the log to track all
transactions, the system ensures that changes made by
committed transactions are not lost, and uncommitted
changes can be undone.
2. Crash Recovery: It provides an efficient method for
recovery after system failures.
3. Minimal Impact on Performance: Logging only records
changes, making it less resource-intensive than other
recovery methods.
Disadvantages of Log-Based Recovery:
1. Log Size: The log can grow large, especially with long
transactions and a high volume of operations.
2. Complexity: The recovery process, including UNDO and
REDO, can become complex in systems with many
transactions.
Summary Table:
BEGIN: marks the start of a transaction.
WRITE: records an update with its old and new values.
COMMIT: indicates successful transaction completion.
ROLLBACK: indicates transaction failure and undoing of its changes.
13. Construct a B+Tree for the following list of elements: 1, 4,
7, 10, 17, 21, 31, 25, 19, 20, 28, 42. Assume the initial tree is
empty and a node can hold maximum 3 key values.
13A) Here’s how to construct a B+ Tree for the list of
elements 1, 4, 7, 10, 17, 21, 31, 25, 19, 20, 28, 42. Assume
the tree initially starts empty, and each node can hold a
maximum of 3 keys. This means each internal node has 4
child pointers, and the leaf nodes can store up to 3 elements.
Step-by-Step Construction of the B+ Tree:
Step 1: Insert 1
The tree is empty, so we insert 1 into the first leaf node.
Leaf Node: [1]
Step 2: Insert 4
Insert 4 into the leaf node. The node now holds [1, 4].
Leaf Node: [1, 4]
Step 3: Insert 7
Insert 7 into the leaf node. The node now holds [1, 4, 7].
Leaf Node: [1, 4, 7]
Step 4: Insert 10
The current leaf [1, 4, 7] is full (it can hold only 3 keys), so inserting 10 forces a split.
The four keys 1, 4, 7, 10 are divided into two leaves, [1, 4] and [7, 10], and the smallest key of the new right leaf (7) is copied up into a new root.
Root: [7]
Child 1: [1, 4]
Child 2: [7, 10]
Step 5: Insert 17
Since 17 ≥ 7, insert 17 into the second leaf, giving [7, 10, 17].
Root: [7]
Child 1: [1, 4]
Child 2: [7, 10, 17]
Step 6: Insert 21
The leaf [7, 10, 17] is full, so it splits into [7, 10] and [17, 21]; the key 17 is copied up to the root.
Root: [7, 17]
Child 1: [1, 4]
Child 2: [7, 10]
Child 3: [17, 21]
Step 7: Insert 31
Since 31 ≥ 17, insert 31 into the third leaf, giving [17, 21, 31].
Root: [7, 17]
Child 1: [1, 4]
Child 2: [7, 10]
Child 3: [17, 21, 31]
Step 8: Insert 25
The leaf [17, 21, 31] is full; inserting 25 gives the four keys 17, 21, 25, 31, which split into [17, 21] and [25, 31]; the key 25 is copied up to the root.
Root: [7, 17, 25]
Child 1: [1, 4]
Child 2: [7, 10]
Child 3: [17, 21]
Child 4: [25, 31]
Step 9: Insert 19
Since 17 ≤ 19 < 25, insert 19 into the third leaf, giving [17, 19, 21].
Root: [7, 17, 25]
Child 1: [1, 4]
Child 2: [7, 10]
Child 3: [17, 19, 21]
Child 4: [25, 31]
Step 10: Insert 20
The leaf [17, 19, 21] is full; inserting 20 gives the four keys 17, 19, 20, 21, which split into [17, 19] and [20, 21]; the key 20 is copied up to the root.
The root now holds [7, 17, 20, 25], which is more than 3 keys, so the root itself must split: the key 20 is pushed up into a new root, with [7, 17] as the left internal node and [25] as the right internal node.
Root: [20]
Internal 1: [7, 17] → Leaves: [1, 4], [7, 10], [17, 19]
Internal 2: [25] → Leaves: [20, 21], [25, 31]
Step 11: Insert 28
Since 28 ≥ 20 and 28 ≥ 25, insert 28 into the leaf [25, 31], giving [25, 28, 31].
Root: [20]
Internal 1: [7, 17] → Leaves: [1, 4], [7, 10], [17, 19]
Internal 2: [25] → Leaves: [20, 21], [25, 28, 31]
Step 12: Insert 42
The leaf [25, 28, 31] is full; inserting 42 gives the four keys 25, 28, 31, 42, which split into [25, 28] and [31, 42]; the key 31 is copied up to the parent internal node, which becomes [25, 31].
Root: [20]
Internal 1: [7, 17] → Leaves: [1, 4], [7, 10], [17, 19]
Internal 2: [25, 31] → Leaves: [20, 21], [25, 28], [31, 42]
Final B+ Tree:
Root: [20]
Internal node 1: [7, 17] with leaves [1, 4], [7, 10], [17, 19]
Internal node 2: [25, 31] with leaves [20, 21], [25, 28], [31, 42]
Leaf chain (left to right): [1, 4] → [7, 10] → [17, 19] → [20, 21] → [25, 28] → [31, 42]
Explanation:
The B+ Tree keeps all 12 keys in the leaf nodes, in sorted order; the internal nodes hold only separator keys used for searching.
Every node holds at most 3 keys, so internal nodes have at most 4 child pointers; after the root split in Step 10 the tree has three levels.
When a leaf overflows, it splits and the smallest key of the new right leaf is copied up to the parent; when an internal node (including the root) overflows, its middle key is pushed up, which is how the tree grows in height.
All leaf nodes are connected in a linked list for fast sequential access, which is a characteristic of the B+ Tree.
This B+ Tree supports efficient searching, insertion, and deletion of elements.
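To show how the final tree is used, here is a small Python sketch that stores the tree as plain lists and searches from the root down to a leaf (the representation and the bisect-based child selection are my own simplification, not a general B+ Tree implementation):

```python
from bisect import bisect_right

# Final tree from above: root separators, then internal nodes as (separators, child leaves).
root = [20]
internals = [([7, 17], [[1, 4], [7, 10], [17, 19]]),
             ([25, 31], [[20, 21], [25, 28], [31, 42]])]
leaves = [leaf for _, children in internals for leaf in children]   # leaf level, left to right

def search(key):
    """Walk root -> internal node -> leaf, picking the child whose range contains the key."""
    sep_keys, children = internals[bisect_right(root, key)]
    leaf = children[bisect_right(sep_keys, key)]
    return key in leaf

print(search(19), search(42), search(5))        # True True False
print([k for leaf in leaves for k in leaf])     # all 12 keys in sorted order via the leaf chain
```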
14. Explain about Hash based indexing techniques with an
example.
14A) Hash-based indexing is a technique used in databases to
enable efficient data retrieval. It relies on a hash function to
compute an index into an array of buckets or slots, from
which the desired value can be found. This method is widely
used for direct access to records based on a key value.
How Hash-Based Indexing Works:
1. Hash Function:
o A hash function is applied to a key to compute a
hash value.
o The hash value determines the index (or position)
in an array or hash table where the corresponding
record will be stored.
o The idea is to distribute the records uniformly
across the array to minimize collisions (situations
where different keys generate the same hash
value).
2. Hash Table:
o A hash table is a data structure used to store the
hashed records.
o Each entry in the table is a bucket, which can hold
one or more records.
o When a record is inserted, the hash function is
applied to its key to determine the appropriate
bucket.
3. Collision Handling:
o Collisions occur when two keys produce the same
hash value. There are several strategies to handle
collisions:
Chaining: In this method, each bucket is a
linked list. When multiple keys map to the
same bucket, they are added to the list at that
bucket.
Open Addressing: In this method, if a collision
occurs, the algorithm searches for the next
available slot in the array (using techniques
like linear probing, quadratic probing, or
double hashing).
4. Search Operation:
o To retrieve a record, the key is hashed using the
same hash function, which gives the index of the
bucket in the hash table.
o The record is then retrieved from that bucket (or
the linked list in case of chaining).
Example of Hash-Based Indexing:
Suppose we have the following set of records with keys:
(1, "John")
(4, "Alice")
(7, "Bob")
(10, "Eve")
(17, "Charlie")
The hash function used is:
Hash Value = Key % 5
where % represents the modulo operation.
Let's construct the hash table using this hash function:
Step-by-Step Process:
1. Insert (1, "John"):
o Hash value: 1 % 5 = 1
o Insert John in bucket 1.
2. Insert (4, "Alice"):
o Hash value: 4 % 5 = 4
o Insert Alice in bucket 4.
3. Insert (7, "Bob"):
o Hash value: 7 % 5 = 2
o Insert Bob in bucket 2.
4. Insert (10, "Eve"):
o Hash value: 10 % 5 = 0
o Insert Eve in bucket 0.
5. Insert (17, "Charlie"):
o Hash value: 17 % 5 = 2
o Since bucket 2 is already occupied by Bob, we use chaining to add Charlie to the linked list in bucket 2.
Hash Table After Insertion:
Bucket Records
0 (10, "Eve")
1 (1, "John")
2 (7, "Bob") → (17, "Charlie")
3 Empty
4 (4, "Alice")
Search Operation Example:
Suppose we want to search for key = 7:
1. Apply the hash function: 7 % 5 = 2.
2. Check bucket 2: The records in bucket 2 are (7, "Bob") →
(17, "Charlie").
3. Since the key 7 matches the first record in the bucket,
we return Bob.
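The same table can be built in a few lines of Python using a list of five buckets with chaining (a minimal sketch mirroring the example; the records and the key % 5 hash function come from above):

```python
records = [(1, "John"), (4, "Alice"), (7, "Bob"), (10, "Eve"), (17, "Charlie")]

buckets = [[] for _ in range(5)]           # 5 buckets, chaining via Python lists

def insert(key, value):
    buckets[key % 5].append((key, value))  # hash value = key % 5

def search(key):
    for k, v in buckets[key % 5]:          # scan only the chain in that bucket
        if k == key:
            return v
    return None

for key, value in records:
    insert(key, value)

print(buckets[2])      # [(7, 'Bob'), (17, 'Charlie')] - collision handled by chaining
print(search(7))       # Bob
```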
Advantages of Hash-Based Indexing:
1. Efficiency:
o Hashing provides constant time complexity for
search, insert, and delete operations (on average,
O(1)), making it highly efficient for direct access.
2. Simplicity:
o The process is simple to implement, and the
operations are fast.
3. Scalability:
o Hash-based indexing can scale well, especially
when handling large datasets with proper collision
resolution techniques.
Disadvantages of Hash-Based Indexing:
1. Collisions:
o Collisions can degrade the performance, especially
if many records map to the same bucket.
2. No Range Queries:
o Hashing is not suitable for range-based queries
(e.g., finding all records between keys 5 and 10)
because the keys are not stored in any specific
order.
3. Fixed Size:
o The hash table size is typically fixed, and resizing
the table can be expensive.
4. Uneven Distribution:
o If the hash function doesn't distribute keys
uniformly, some buckets may have many records
while others may have very few, leading to
inefficiency.