Database Recovery Process
The database recovery process ensures data integrity and consistency after a
system failure. It restores the database to the most recent consistent state that
existed before the failure. This document explains the core concepts, supports them
with examples, and concludes with a summary.
[START, T1]
[WRITE, T1, A, 1000, 900] // Account A: Before $1000, After $900
[WRITE, T1, B, 500, 600] // Account B: Before $500, After $600
[COMMIT, T1]
In this example:
● The log begins with [START, T1].
● The first WRITE record shows that T1 changed the balance of account A from
$1000 to $900.
● The second WRITE record shows that T1 increased the balance of account B from
$500 to $600.
● Finally, [COMMIT, T1] indicates that the transaction was successfully completed.
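As a rough illustration, the log lines above can be read into simple records. This is a minimal sketch: the `LogRecord` class and the `parse_record` helper are hypothetical names chosen for this example, and the field layout (item, before image, after image) is assumed from the WRITE records shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    kind: str                     # "START", "WRITE", or "COMMIT"
    txn: str                      # transaction identifier, e.g. "T1"
    item: Optional[str] = None    # data item (WRITE records only)
    before: Optional[int] = None  # before image (WRITE records only)
    after: Optional[int] = None   # after image (WRITE records only)


def parse_record(line: str) -> LogRecord:
    """Parse one bracketed log line; a trailing // comment is ignored."""
    body = line.split("//")[0].strip().strip("[]")
    fields = [f.strip() for f in body.split(",")]
    if fields[0] == "WRITE":
        kind, txn, item, before, after = fields
        return LogRecord(kind, txn, item, int(before), int(after))
    kind, txn = fields
    return LogRecord(kind, txn)


log = [parse_record(line) for line in [
    "[START, T1]",
    "[WRITE, T1, A, 1000, 900] // Account A: Before $1000, After $900",
    "[WRITE, T1, B, 500, 600] // Account B: Before $500, After $600",
    "[COMMIT, T1]",
]]
```

Keeping both the before image and the after image in each WRITE record is what later allows a change to be either undone or redone.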
The system log is stored on stable storage, a non-volatile storage medium (e.g., a
redundant disk array) that can withstand system crashes. This ensures that the log
information survives failures, enabling the database to recover to a consistent state.
Suppose a system failure occurs before transaction T1 in the previous example can
commit. The log would look like this:
[START, T1]
[WRITE, T1, A, 1000, 900]
[WRITE, T1, B, 500, 600]
// System Failure occurs here
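Because the log contains no [COMMIT, T1] record, the recovery process must UNDO
T1: it writes the before image from each of T1's WRITE records back to the database
(A back to 1000, B back to 500). This ensures that the database is restored to the
state before T1 began, effectively discarding its partial updates.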
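The rollback of an uncommitted transaction can be sketched as follows. This is an illustration only: log records are modelled as plain tuples, the database as a dictionary, and `undo_uncommitted` is a hypothetical helper name. The on-disk state at the time of the crash is an assumption made for the example.

```python
def undo_uncommitted(log, db):
    """Restore before images for transactions that have no COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Scan backwards so that the oldest before image is restored last.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, item, before, _after = rec
            db[item] = before
    return db


log = [
    ("START", "T1"),
    ("WRITE", "T1", "A", 1000, 900),
    ("WRITE", "T1", "B", 500, 600),
    # crash: no COMMIT record for T1
]
db = {"A": 900, "B": 600}  # assumed on-disk state at the time of the crash
undo_uncommitted(log, db)
print(db)  # {'A': 1000, 'B': 500}
```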
Now, suppose the system failure occurs after transaction T1 commits, but before the
changes are written from the database buffer to the disk. The log would be:
[START, T1]
[WRITE, T1, A, 1000, 900]
[WRITE, T1, B, 500, 600]
[COMMIT, T1]
// System Failure occurs here
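Because the log contains [COMMIT, T1], the recovery process must REDO T1: it
reapplies the after image from each of T1's WRITE records (A becomes 900, B becomes
600). This ensures that the changes made by the committed transaction T1 are applied
to the database, even though they were not written to disk before the failure.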
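Reapplying a committed transaction's changes can be sketched in the same spirit. Again the tuples, the dictionary standing in for the database, and the helper name `redo_committed` are assumptions made for illustration.

```python
def redo_committed(log, db):
    """Reapply after images for every transaction that has a COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    # Scan forwards so that later writes overwrite earlier ones.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, item, _before, after = rec
            db[item] = after
    return db


log = [
    ("START", "T1"),
    ("WRITE", "T1", "A", 1000, 900),
    ("WRITE", "T1", "B", 500, 600),
    ("COMMIT", "T1"),
    # crash: buffered changes never reached the disk
]
db = {"A": 1000, "B": 500}  # assumed on-disk state at the time of the crash
redo_committed(log, db)
print(db)  # {'A': 900, 'B': 600}
```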
Consider a DBMS cache with 3 buffers. Let's trace the following operations:
1. Transaction T1 reads page A.
2. Transaction T2 reads page B.
3. Transaction T1 modifies page A.
4. Transaction T3 reads page C.
5. Transaction T2 commits.
Here's how the DBMS cache might evolve:
● Initial State: Cache is empty.
● T1 reads A:
○ Page A is loaded into buffer 1.
○ Cache directory: { (1, A, 0, 1) } (Buffer 1, Page A, Not Dirty, Pinned)
● T2 reads B:
○ Page B is loaded into buffer 2.
○ Cache directory: { (1, A, 0, 1), (2, B, 0, 1) }
● T1 modifies A:
○ Page A in buffer 1 is modified.
○ Cache directory: { (1, A, 1, 1), (2, B, 0, 1) } (Dirty bit for A is set to 1)
● T3 reads C:
○ Page C is loaded into buffer 3, which is still free, so no page has to be replaced.
○ Cache directory: { (1, A, 1, 1), (2, B, 0, 1), (3, C, 0, 1) }
● T2 commits:
○ Page B is unpinned. It is not dirty, so it does not need to be written back to disk
before its buffer is reused.
○ Cache directory: { (1, A, 1, 1), (2, B, 0, 0), (3, C, 0, 1) }
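The bookkeeping in a cache directory can be mimicked with a small simulation. This is a sketch under stated assumptions rather than how any particular DBMS implements its buffer manager. The `BufferCache` class is hypothetical. It tracks (buffer, page, dirty bit, pin count) entries, always uses a free buffer first, and does not model page replacement at all.

```python
class BufferCache:
    def __init__(self, n_buffers):
        self.n_buffers = n_buffers
        self.entries = {}  # page -> [buffer number, dirty bit, pin count]

    def read(self, page):
        """Load the page into a free buffer if needed, then pin it."""
        if page not in self.entries:
            used = {entry[0] for entry in self.entries.values()}
            free = [b for b in range(1, self.n_buffers + 1) if b not in used]
            if not free:
                raise RuntimeError("no free buffer; replacement is not modelled")
            self.entries[page] = [free[0], 0, 0]
        self.entries[page][2] += 1

    def modify(self, page):
        """Mark a cached page as dirty."""
        self.entries[page][1] = 1

    def unpin(self, page):
        """Release one pin on a cached page."""
        self.entries[page][2] -= 1

    def directory(self):
        return sorted((b, p, d, pin) for p, (b, d, pin) in self.entries.items())


cache = BufferCache(3)
cache.read("A")    # T1 reads page A
cache.read("B")    # T2 reads page B
cache.modify("A")  # T1 modifies page A
cache.read("C")    # T3 reads page C
cache.unpin("B")   # T2 commits and releases page B
print(cache.directory())  # [(1, 'A', 1, 1), (2, 'B', 0, 0), (3, 'C', 0, 1)]
```

In this sketch a page with a nonzero pin count is still in use by a transaction, and a page with the dirty bit set would have to be written to disk before its buffer could be reused.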
Let's revisit transaction T1, which transfers $100 from account A to account B, to
illustrate how write-ahead logging (WAL) works:
[START, T1]
[WRITE, T1, A, 1000, 900]
[WRITE, T1, B, 500, 600]
[COMMIT, T1]
In this example, the log records for START, WRITE (for A), WRITE (for B), and COMMIT
are all written to the stable log before the corresponding database modifications are
written to disk. This ensures that the log contains a complete record of the
transaction's actions, allowing for proper recovery in case of a failure.
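The ordering constraint behind WAL can be sketched as follows. The `WalStore` class and its method names are hypothetical, and in-memory lists and dictionaries stand in for stable storage and the disk. The point of the sketch is only that a data item is never flushed before the log records describing it.

```python
class WalStore:
    def __init__(self):
        self.log_buffer = []  # log records still in main memory
        self.stable_log = []  # log records on stable storage (simulated)
        self.cache = {}       # modified items not yet written to disk
        self.disk = {}        # on-disk database items (simulated)

    def write(self, txn, item, before, after):
        """Log the change first, then apply it in the cache only."""
        self.log_buffer.append(("WRITE", txn, item, before, after))
        self.cache[item] = after

    def flush_log(self):
        """Move all buffered log records to the stable log."""
        self.stable_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def flush_item(self, item):
        """Write a cached item to disk, forcing the log out beforehand."""
        self.flush_log()  # WAL rule: log records reach stable storage first
        self.disk[item] = self.cache.pop(item)


store = WalStore()
store.write("T1", "A", 1000, 900)
store.flush_item("A")
print(store.stable_log)  # [('WRITE', 'T1', 'A', 1000, 900)]
print(store.disk)        # {'A': 900}
```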
In the account transfer example, an in-place update overwrites the disk location
that holds account A's balance, changing it from 1000 to 900.
When a system failure occurs, the recovery process examines the log to find the most
recent [CHECKPOINT] record. The checkpoint marks a point in the log at which the
changes of every transaction that had already committed were written to disk, so
those transactions need no recovery work. The record also lists the transactions that
were still active when the checkpoint was taken. Consider the following log:
[START, T1]
[WRITE, T1, A, 1000, 900]
[COMMIT, T1]
[START, T2]
[WRITE, T2, B, 500, 600]
[START, T3]
[WRITE, T3, C, 200, 300]
[CHECKPOINT, T2, T3] // T2 and T3 are active
[START, T4]
[WRITE, T4, D, 700, 800]
[COMMIT, T2]
[COMMIT, T4]
// System Failure
In this example:
● T1 committed before the checkpoint: no action needed.
● T2 was active at the checkpoint and committed: need to REDO T2.
● T3 was active at the checkpoint and did not commit: need to UNDO T3.
● T4 started after the checkpoint and committed: need to REDO T4.
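The classification above can be derived mechanically from the log. The following is a minimal sketch, assuming tuple-shaped log records and a checkpoint record that lists the active transactions, as in the example. `classify` is a hypothetical helper name.

```python
def classify(log):
    """Return (redo, undo) lists based on the most recent checkpoint."""
    cp_index = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    active = set(log[cp_index][1:])  # transactions active at the checkpoint
    committed = set()
    for rec in log[cp_index + 1:]:
        if rec[0] == "START":
            active.add(rec[1])
        elif rec[0] == "COMMIT":
            active.discard(rec[1])
            committed.add(rec[1])
    # Committed after the checkpoint: REDO. Still active at the crash: UNDO.
    return sorted(committed), sorted(active)


log = [
    ("START", "T1"),
    ("WRITE", "T1", "A", 1000, 900),
    ("COMMIT", "T1"),
    ("START", "T2"),
    ("WRITE", "T2", "B", 500, 600),
    ("START", "T3"),
    ("WRITE", "T3", "C", 200, 300),
    ("CHECKPOINT", "T2", "T3"),
    ("START", "T4"),
    ("WRITE", "T4", "D", 700, 800),
    ("COMMIT", "T2"),
    ("COMMIT", "T4"),
]
redo, undo = classify(log)
print(redo, undo)  # ['T2', 'T4'] ['T3']
```

T1 appears in neither list because it committed before the checkpoint.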
8. Commit Point: Ensuring Atomicity
A transaction reaches its commit point when it has successfully executed all its
operations and the log record containing the COMMIT message has been written to
stable storage. This signifies that the transaction has completed its execution, and its
effects should be durably recorded in the database.
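One way to picture the commit point is a function that returns only after the COMMIT record has been forced out of volatile buffers. In this sketch an ordinary file stands in for stable storage, and the `commit` helper and the log file name are assumptions made for illustration. `os.fsync` asks the operating system to push the data to the device, which approximates writing to stable storage.

```python
import os
import tempfile


def commit(log_file, txn):
    """Append the COMMIT record and force it to the storage device."""
    log_file.write(f"[COMMIT, {txn}]\n")
    log_file.flush()             # move Python's buffer to the operating system
    os.fsync(log_file.fileno())  # ask the OS to write the data to the device
    return True                  # the commit point has been reached


path = os.path.join(tempfile.mkdtemp(), "recovery.log")
with open(path, "a", encoding="utf-8") as f:
    f.write("[START, T1]\n")
    committed = commit(f, "T1")
```

Only after `commit` returns would the transaction be reported as committed. A crash before that point leaves no COMMIT record in the log, so recovery treats the transaction as uncommitted.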