Database Recovery Process

The database recovery process is crucial for maintaining data integrity after system failures, involving the restoration of the database to its last consistent state. Key components include the system log, which records all transactions, and recovery strategies like Undo and Redo to manage uncommitted and committed transactions. Additional concepts such as caching, Write-Ahead Logging, and checkpointing further enhance the efficiency and reliability of database recovery mechanisms.

Uploaded by

sakthivel2310758

The database recovery process is essential for ensuring data integrity and
consistency in the event of a system failure. It involves restoring the database to the
most recent consistent state that existed before the failure. This document explains
the core concepts in detail, supports them with examples, and concludes with a
summary.

1. The System Log


The system log is a sequential, append-only file that records every transaction
and each database modification it makes. It acts as a historical record, enabling
the database to recover from failures.

1.1 Log Structure

Each log record typically contains the following information:


●​ Transaction ID (TID): A unique identifier for each transaction. This allows the
system to differentiate between concurrent transactions.
●​ Log Record Type: Indicates the specific database operation. Common types
include:
○​ START: Marks the beginning of a transaction.
○​ WRITE: Records a data modification (insert, update, or delete).
○​ COMMIT: Indicates the successful completion of a transaction.
○​ ABORT: Indicates the unsuccessful termination of a transaction.
●​ Data Item ID (X): Identifies the specific data item (e.g., a row, a page) that was
affected by the operation.
●​ Before Image (BFIM): The value of the data item before the modification.
●​ After Image (AFIM): The value of the data item after the modification.
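
As a rough illustration, the fields listed above could be modelled as a small Python record. The class name LogRecord and its field names are assumptions made for this sketch; they do not come from any particular DBMS.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LogRecord:
    """One entry in the system log (hypothetical layout, for illustration only)."""
    kind: str                    # "START", "WRITE", "COMMIT", or "ABORT"
    tid: str                     # transaction ID, e.g. "T1"
    item: Optional[str] = None   # data item ID (WRITE records only)
    bfim: Any = None             # before image (WRITE records only)
    afim: Any = None             # after image (WRITE records only)


# A WRITE record: some transaction T1 changes item A from 1000 to 900.
record = LogRecord("WRITE", "T1", item="A", bfim=1000, afim=900)
start = LogRecord("START", "T1")
```

The record is frozen because a log is append-only: once written, an entry is never modified.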
1.2 Example of a System Log

Consider a simple banking database with a table Accounts(account_number, balance).


Let's illustrate a transaction T1 that transfers $100 from account A to account B:

[START, T1]​
[WRITE, T1, A, 1000, 900] // Account A: Before $1000, After $900​
[WRITE, T1, B, 500, 600] // Account B: Before $500, After $600​
[COMMIT, T1]​
In this example:
●​ The log begins with [START, T1].
●​ The first WRITE record shows that T1 changed the balance of account A from
$1000 to $900.
●​ The second WRITE record shows that T1 increased the balance of account B from
$500 to $600.
●​ Finally, [COMMIT, T1] indicates that the transaction was successfully completed.
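
The same log can be written down as plain data and checked. The sketch below uses Python tuples in the order [type, TID, item, BFIM, AFIM]; the tuple layout is an assumption of this example. It verifies that the transfer conserves money: the before images and the after images sum to the same total.

```python
# The four records of T1, written as plain tuples in log order.
log = [
    ("START", "T1"),
    ("WRITE", "T1", "A", 1000, 900),
    ("WRITE", "T1", "B", 500, 600),
    ("COMMIT", "T1"),
]

writes = [rec for rec in log if rec[0] == "WRITE"]
total_before = sum(rec[3] for rec in writes)   # 1000 + 500
total_after = sum(rec[4] for rec in writes)    # 900 + 600
committed = ("COMMIT", "T1") in log
```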

1.3 Importance of Stable Storage

The system log is kept on stable storage: non-volatile storage (e.g., a redundant
disk array) whose contents are assumed to survive system crashes. In practice this
is approximated by replicating the log across independent devices. Because the log
survives failures, the database can be recovered to a consistent state.

2. Recovery Strategies: Undo and Redo in Detail


Recovery strategies dictate how the database system uses the log to recover from
failures. The two fundamental operations are Undo and Redo.
●​ Undo: This operation is applied to transactions that were not successfully
completed (i.e., uncommitted transactions). It restores the affected data items to
their state before the transaction began, effectively undoing any partial changes.
●​ Redo: This operation is applied to transactions that were successfully completed
(i.e., committed transactions). It reapplies the changes made by these
transactions, ensuring that their effects are reflected in the database even if a
failure occurred before those changes were written to disk.
2.1 Undo Example

Suppose a system failure occurs before transaction T1 in the previous example can
commit. The log would look like this:

[START, T1]​
[WRITE, T1, A, 1000, 900]​
[WRITE, T1, B, 500, 600]​
// System Failure occurs here​

In this case, the recovery process would:


1.​ Identify T1 as an uncommitted transaction (no COMMIT record).
2.​ Undo the WRITE operations of T1:
○​ Set the balance of account A back to $1000.
○​ Set the balance of account B back to $500.

This ensures that the database is restored to the state before T1 began, effectively
discarding its partial updates.
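
A minimal sketch of this undo step, assuming the log is a list of tuples [type, TID, item, BFIM, AFIM] and the database is a Python dict. The function name undo_uncommitted and the on-disk state at the time of the crash are assumptions of this example.

```python
def undo_uncommitted(log, db):
    """Restore the BFIM for every WRITE of a transaction with no COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in reversed(log):                     # scan the log backwards
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _tid, item, bfim, _afim = rec
            db[item] = bfim
    return db


log = [
    ("START", "T1"),
    ("WRITE", "T1", "A", 1000, 900),
    ("WRITE", "T1", "B", 500, 600),
    # crash: no COMMIT record for T1
]

# Assumed state of the database on disk at the crash:
# the update to A had been flushed, the update to B had not.
disk = {"A": 900, "B": 500}
undo_uncommitted(log, disk)
```

Writing the BFIM back is harmless for B, which never changed on disk, so the procedure does not need to know which updates were actually flushed.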

2.2 Redo Example

Now, suppose the system failure occurs after transaction T1 commits, but before the
changes are written from the database buffer to the disk. The log would be:

[START, T1]​
[WRITE, T1, A, 1000, 900]​
[WRITE, T1, B, 500, 600]​
[COMMIT, T1]​
// System Failure occurs here​

In this scenario, the recovery process would:


1.​ Identify T1 as a committed transaction (there is a COMMIT record).
2.​ Redo the WRITE operations of T1:
○​ Set the balance of account A to $900.
○​ Set the balance of account B to $600.

This ensures that the changes made by the committed transaction T1 are applied to
the database, even though they were not written to disk before the failure.
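
A minimal sketch of this redo step, under the same assumptions as before: the log is a list of tuples [type, TID, item, BFIM, AFIM], the database is a dict, and the function name redo_committed is made up for this example.

```python
def redo_committed(log, db):
    """Reapply the AFIM for every WRITE of a transaction that has a COMMIT record."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    for rec in log:                               # scan the log forward
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _tid, item, _bfim, afim = rec
            db[item] = afim
    return db


log = [
    ("START", "T1"),
    ("WRITE", "T1", "A", 1000, 900),
    ("WRITE", "T1", "B", 500, 600),
    ("COMMIT", "T1"),
    # crash: buffers for A and B were never flushed
]

disk = {"A": 1000, "B": 500}       # on-disk state still shows the old balances
redo_committed(log, disk)
after_first_redo = dict(disk)
redo_committed(log, disk)          # repeating the redo changes nothing
```

Because redo sets each item to a fixed value rather than applying a delta, it is idempotent: a crash during recovery can be handled by simply running recovery again.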

3. Caching and the DBMS Cache: A Deeper Dive


Database systems employ caching to improve performance by storing frequently
accessed data in faster memory.
●​ Operating System Caching: The operating system manages a buffer cache that
stores disk pages in main memory. When a process requests data, the OS first
checks if it's in the cache. If not (a cache miss), the OS reads the data from disk
into the cache and then provides it to the process.
●​ DBMS Cache: In addition to the OS cache, Database Management Systems often
implement their own cache, known as the DBMS cache or buffer pool. This allows
the DBMS to have more control over which data is cached and how it's managed,
enabling database-specific optimizations.
3.1 DBMS Cache Components

The DBMS cache consists of:


●​ Memory Buffers: A set of memory locations that hold copies of database pages.
●​ Cache Directory (or Cache Table): A data structure that tracks which database
pages are currently stored in the buffers. For each buffer, the directory typically
stores:
○​ Buffer Address: The memory address of the buffer.
○​ Disk Block Address: The address of the corresponding database page on
disk.
○​ Dirty Bit: A flag indicating whether the buffer has been modified since it was
read from disk.
■​ 0: The buffer has not been modified.
■​ 1: The buffer has been modified.
○​ Pin/Unpin Bit: A flag indicating whether the buffer is currently in use by a
transaction.
■​ 0: The buffer is not currently in use and can be replaced.
■​ 1: The buffer is currently in use and should not be replaced.
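
One possible in-memory shape for a directory entry is sketched below. The class and method names are assumptions of this example; real systems usually keep a pin count rather than a single bit, but a boolean matches the description above.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """One row of the cache directory (hypothetical layout)."""
    buffer_id: int         # which memory buffer holds the page
    page_id: str           # disk block address of the page
    dirty: bool = False    # modified since it was read from disk?
    pinned: bool = False   # currently in use by a transaction?

    def can_replace(self) -> bool:
        # Only unpinned buffers may be chosen for replacement.
        return not self.pinned

    def needs_write_back(self) -> bool:
        # A dirty buffer must be written to disk before it is reused.
        return self.dirty


entry = DirectoryEntry(buffer_id=1, page_id="A", pinned=True)
entry.dirty = True     # a transaction modifies the page
```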

3.2 Cache Management

When a transaction needs to access a database page:


1.​ The DBMS first checks the cache directory to see if the page is in the DBMS
cache.
2.​ If the page is in the cache (a cache hit), the DBMS directly accesses the data in
the memory buffer.
3.​ If the page is not in the cache (a cache miss), the DBMS:
○​ Fetches the page from disk into a free buffer in the DBMS cache.
○​ Updates the cache directory to reflect the presence of the new page.
○ If no buffers are free, the DBMS must choose an unpinned buffer to replace
(using a replacement policy such as LRU, Least Recently Used). If the buffer to
be replaced has its dirty bit set, its contents must first be written back to disk.
4.​ The DBMS then provides the data in the buffer to the transaction.
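
The steps above can be sketched as a very small buffer pool with LRU replacement. Everything here is an assumption of the example: the class name BufferPool, the use of a dict as a stand-in for the disk, and a pin count in place of a pin bit. Logging is left out entirely, so this shows cache management only, not a recoverable system.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny LRU buffer pool; `disk` is a dict standing in for the database file."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk
        self.frames = OrderedDict()   # page_id -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def fetch(self, page_id):
        """Return the pinned frame for page_id, loading it from disk on a miss."""
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)          # now most recently used
        else:
            self.misses += 1
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = {"data": self.disk[page_id],
                                    "dirty": False, "pins": 0}
        frame = self.frames[page_id]
        frame["pins"] += 1
        return frame

    def write(self, page_id, value):
        frame = self.frames[page_id]
        frame["data"] = value
        frame["dirty"] = True

    def unpin(self, page_id):
        self.frames[page_id]["pins"] -= 1

    def _evict(self):
        # Walk from least to most recently used and take the first unpinned frame.
        for victim, frame in list(self.frames.items()):
            if frame["pins"] == 0:
                if frame["dirty"]:
                    self.disk[victim] = frame["data"]  # write back before reuse
                del self.frames[victim]
                return
        raise RuntimeError("all buffers are pinned")


disk = {"A": 1000, "B": 500, "C": 200}
pool = BufferPool(capacity=2, disk=disk)
pool.fetch("A"); pool.write("A", 900); pool.unpin("A")
pool.fetch("B"); pool.unpin("B")
pool.fetch("C")    # pool is full: A is least recently used, unpinned, and dirty
```

In this run the third fetch evicts page A, and because A is dirty its new value is written to the dict that plays the role of the disk.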

3.3 Example of Cache Management

Consider a DBMS cache with 3 buffers. Let's trace the following operations:
1.​ Transaction T1 reads page A.
2.​ Transaction T2 reads page B.
3.​ Transaction T1 modifies page A.
4.​ Transaction T3 reads page C.
5.​ Transaction T2 commits.
Here's how the DBMS cache might evolve:
●​ Initial State: Cache is empty.
●​ T1 reads A:
○​ Page A is loaded into buffer 1.
○​ Cache directory: { (1, A, 0, 1) } (Buffer 1, Page A, Not Dirty, Pinned)
●​ T2 reads B:
○​ Page B is loaded into buffer 2.
○​ Cache directory: { (1, A, 0, 1), (2, B, 0, 1) }
●​ T1 modifies A:
○​ Page A in buffer 1 is modified.
○​ Cache directory: { (1, A, 1, 1), (2, B, 0, 1) } (Dirty bit for A is set to 1)
● T3 reads C:
○ Page C is loaded into buffer 3, which is still free, so no page has to be
replaced. (Had the cache been full, only an unpinned buffer could be chosen.)
○ Cache directory: { (1, A, 1, 1), (2, B, 0, 1), (3, C, 0, 1) }
● T2 commits:
○ Page B is unpinned; it is not dirty, so nothing has to be written back.
○ Cache directory: { (1, A, 1, 1), (2, B, 0, 0), (3, C, 0, 1) }

4. Write-Ahead Logging (WAL) in Detail


Write-Ahead Logging (WAL) is a fundamental protocol that ensures database
recoverability. It mandates that log records describing database changes must be
written to stable storage before those changes are applied to the database on disk.

4.1 The WAL Protocol

The WAL protocol consists of the following rules:


1.​ Log Before Write: Before a transaction changes any data on disk, the
corresponding log record (containing the BFIM and/or AFIM) must be written to
the stable log on disk.
2.​ Commit Only After Log Write: A transaction cannot be considered committed
until all its log records, including the COMMIT record, have been written to the
stable log on disk.
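
The two rules can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class WalStore, its method names, and the use of a log sequence number (LSN) to remember which log record describes each buffer are all choices made for this example, and START records are omitted for brevity.

```python
class WalStore:
    """Toy store that enforces the two WAL rules (illustrative only)."""

    def __init__(self, disk):
        self.disk = disk          # database on disk: item -> value
        self.buffers = {}         # cached items: item -> (value, LSN of last WRITE)
        self.log_buffer = []      # log records still in memory: (LSN, record)
        self.stable_log = []      # log records on stable storage
        self.next_lsn = 0

    def _append(self, record):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, record))
        return lsn

    def flush_log(self, up_to_lsn):
        """Move log records with LSN <= up_to_lsn to the stable log, in order."""
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_buffer.pop(0))

    def write(self, tid, item, new_value):
        old_value = self.buffers.get(item, (self.disk.get(item), None))[0]
        lsn = self._append(("WRITE", tid, item, old_value, new_value))
        self.buffers[item] = (new_value, lsn)

    def flush_page(self, item):
        value, lsn = self.buffers[item]
        self.flush_log(lsn)       # rule 1: log before write
        self.disk[item] = value

    def commit(self, tid):
        lsn = self._append(("COMMIT", tid))
        self.flush_log(lsn)       # rule 2: commit only after log write


store = WalStore({"A": 1000, "B": 500})
store.write("T1", "A", 900)
store.write("T1", "B", 600)
store.flush_page("A")             # forces A's WRITE record to the stable log first
stable_after_flush = list(store.stable_log)
store.commit("T1")
```

Note that committing T1 flushes only the log. The buffer for B is still not on disk afterwards, which is exactly the situation the redo operation exists to handle.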
4.2 Why WAL?

WAL ensures that:


● If a system crash occurs before a transaction commits, any of its changes that
reached the database on disk are already described by log records on stable
storage. The recovery process can therefore undo them using the logged BFIMs.
● If a system crash occurs after a transaction commits, the log will contain all the
information needed to redo the transaction's changes, even if those changes
were not yet written to the database on disk.

4.3 Example of WAL

Let's revisit the transaction T1 that transfers $100 from account A to account B, and
illustrate how WAL works:

// Initial state: A = $1000, B = $500

// T1 starts
[START, T1] <-- Log record written to stable log​

// T1 debits $100 from A​
[WRITE, T1, A, 1000, 900] <-- Log record written to stable log​
// Database buffer for A is modified: A = $900 (not yet on disk)​

// T1 credits $100 to B​
[WRITE, T1, B, 500, 600] <-- Log record written to stable log​
// Database buffer for B is modified: B = $600 (not yet on disk)​

// T1 commits​
[COMMIT, T1] <-- Log record written to stable log​
// Database buffers for A and B can now be written to disk​

In this example, the log records for START, WRITE (for A), WRITE (for B), and COMMIT
are all written to the stable log before the actual database modifications are written to
disk. This ensures that the log contains a complete record of the transaction's actions,
allowing for proper recovery in case of a failure.

5. Flushing Strategies: In-Place Update vs. Shadow Update


When a modified buffer in the DBMS cache is written back (flushed) to disk, the
database system uses one of two flushing strategies.
●​ In-Place Update: The modified data overwrites the existing data at the same
location on disk. This is the most common approach.
○​ Advantages: Simpler to implement, requires less disk space.
○​ Disadvantages: Requires a detailed log to support undo and redo operations,
as the original data is overwritten.
●​ Shadow Update: The modified data is written to a different location on disk,
creating a new version of the data. The original data remains unchanged. After
the transaction commits, the system updates a pointer to point to the new data
location.
○​ Advantages: Simplifies recovery, especially for undo operations, as the
original data is preserved.
○​ Disadvantages: More complex to manage disk space (requires garbage
collection to reclaim old versions), can be more I/O intensive.
5.1 Example of In-Place Update

Using the account transfer example, if we use in-place update, the actual disk
location for account A's balance will be overwritten from 1000 to 900.

5.2 Example of Shadow Update


With shadow update, the updated balance of 900 for account A would be written to a
new disk location. Only when the transaction commits is the pointer to account A's
balance switched to this new location. The old location, still holding 1000, is
preserved until it is garbage collected.
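
The pointer switch can be sketched as follows. The class ShadowStore, its block numbering, and the crash_before_commit flag are assumptions of this example; garbage collection of old blocks is not modelled.

```python
class ShadowStore:
    """Toy shadow-update store: new versions go to fresh blocks (illustrative)."""

    def __init__(self, initial):
        # Block number -> stored value, and item -> block number (the pointers).
        self.blocks = dict(enumerate(initial.values()))
        self.current = {item: n for n, item in enumerate(initial)}
        self.next_block = len(self.blocks)

    def read(self, item):
        return self.blocks[self.current[item]]

    def run(self, updates, crash_before_commit=False):
        new_table = dict(self.current)            # private copy of the pointers
        for item, value in updates.items():
            self.blocks[self.next_block] = value  # never overwrite the old block
            new_table[item] = self.next_block
            self.next_block += 1
        if crash_before_commit:
            return                                # pointers untouched: nothing to undo
        self.current = new_table                  # commit: switch the pointers


store = ShadowStore({"A": 1000, "B": 500})
store.run({"A": 900, "B": 600}, crash_before_commit=True)
after_crash = (store.read("A"), store.read("B"))
store.run({"A": 900, "B": 600})
after_commit = (store.read("A"), store.read("B"))
```

After the simulated crash the store still reads the original balances, because the only thing that makes the new blocks visible is the final pointer switch.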

6. Steal/No-Steal and Force/No-Force: Detailed Explanation


These policies determine when modified pages in the DBMS cache are written to disk,
relative to transaction commit.
●​ Steal:
○​ Definition: The DBMS is allowed to write a modified buffer to disk before the
transaction that modified it commits.
○​ Advantage: Allows the DBMS to free up buffer space, potentially improving
performance, especially when transactions modify many pages.
○​ Disadvantage: Requires the ability to undo the changes made by
uncommitted transactions (i.e., UNDO).
●​ No-Steal:
○​ Definition: The DBMS is not allowed to write a modified buffer to disk before
the transaction that modified it commits.
○​ Advantage: Simplifies undo operations, as any changes made by an
uncommitted transaction reside only in memory.
○​ Disadvantage: Can limit buffer availability if transactions hold modified
pages for a long time, potentially degrading performance.
●​ Force:
○​ Definition: All modified buffers of a transaction are written to disk at the time
the transaction commits.
○​ Advantage: Simplifies redo operations, as all changes made by committed
transactions are guaranteed to be on disk.
○​ Disadvantage: Can introduce significant I/O overhead at commit time,
potentially slowing down transaction processing.
●​ No-Force:
○ Definition: Modified buffers of a transaction do not have to be written to disk
when the transaction commits; they may be flushed later, potentially much later.
○​ Advantage: Improves commit performance, as the transaction does not have
to wait for the I/O operations to complete.
○​ Disadvantage: Requires the ability to redo the changes made by committed
transactions that were not yet written to disk at the time of a failure (i.e.,
REDO).

6.1 Common Approach: Steal/No-Force

Most modern database systems employ a steal/no-force approach. This combination
offers good performance by allowing flexible buffer management (steal) and
minimizing the I/O overhead at commit time (no-force). However, it necessitates the
use of both UNDO and REDO recovery mechanisms.
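
The relationship between the four policies and the recovery operations they require can be summarised in a short function. The function name recovery_needs is an assumption of this sketch; the mapping itself follows directly from the definitions above (steal requires UNDO, no-force requires REDO).

```python
def recovery_needs(steal: bool, force: bool) -> str:
    """Name the recovery scheme implied by a buffer policy combination."""
    undo = "UNDO" if steal else "NO-UNDO"      # steal: uncommitted data may be on disk
    redo = "NO-REDO" if force else "REDO"      # no-force: committed data may be missing
    return f"{undo}/{redo}"


combinations = {
    (steal, force): recovery_needs(steal, force)
    for steal in (True, False)
    for force in (True, False)
}
```

For the common steal/no-force combination, recovery_needs(True, False) gives "UNDO/REDO".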

7. Checkpointing: Optimizing Recovery with Examples


As the transaction log grows, the time it takes to recover from a failure increases.
Checkpointing reduces recovery time by periodically flushing modified buffers to
disk and recording in the log which transactions were active at that moment, so that
recovery does not have to reprocess the entire log.

7.1 Checkpoint Process

A checkpoint operation involves the following steps:


1.​ Write Dirty Buffers to Disk: All modified buffers in the DBMS cache are written
to the database on disk. This ensures that the database on disk is consistent up
to the point of the checkpoint for committed transactions.
2.​ Write Checkpoint Record to Log: A special record [CHECKPOINT, T1, T2, ..., Tn]
is written to the log. This record contains:
○​ The identifier CHECKPOINT.
○​ A list of all transactions that are active at the time the checkpoint is taken (T1,
T2, ..., Tn).
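
The two steps can be sketched as one function. The function name take_checkpoint and the use of dicts and sets for the cache, the disk, and the active-transaction list are assumptions of this example. It also assumes that the log records for the flushed buffers are already on stable storage, as WAL requires.

```python
def take_checkpoint(buffers, dirty, disk, log, active):
    """Flush all dirty buffers, then append a CHECKPOINT record to the log."""
    # Step 1: write dirty buffers to the database on disk.
    for item in sorted(dirty):
        disk[item] = buffers[item]
    dirty.clear()
    # Step 2: record the checkpoint together with the active transactions.
    log.append(("CHECKPOINT", *sorted(active)))


buffers = {"A": 900, "B": 600, "C": 300}   # current contents of the DBMS cache
dirty = {"B", "C"}                         # buffers modified since last flush
disk = {"A": 900, "B": 500, "C": 200}
log = [("START", "T2"), ("WRITE", "T2", "B", 500, 600),
       ("START", "T3"), ("WRITE", "T3", "C", 200, 300)]

take_checkpoint(buffers, dirty, disk, log, active={"T2", "T3"})
```

In this run the checkpoint also flushes the uncommitted updates of T2 and T3, which is why the checkpoint record must list them: recovery may later have to undo them.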

7.2 Recovery Using Checkpoints

When a system failure occurs, the recovery process examines the log to find the most
recent [CHECKPOINT] record. Every transaction that committed before this record
has all of its changes on disk. The transactions listed in the record were still
active at that point and must be examined further.

The recovery process then proceeds as follows:


1. Transactions that committed before the checkpoint: Their changes are
guaranteed to be on disk. Therefore, no UNDO or REDO is required for these
transactions.
2. Transactions that committed after the checkpoint: Whether they were active at
the checkpoint or started after it, they may need to be redone, as their changes
might not have been written to disk.
3. Transactions that had not committed by the time of the failure: Whether they
were active at the checkpoint or started after it, they need to be undone.

7.3 Checkpointing Example

[START, T1]​
[WRITE, T1, A, 1000, 900]​
[COMMIT, T1]​
[START, T2]​
[WRITE, T2, B, 500, 600]​
[START, T3]​
[WRITE, T3, C, 200, 300]​
[CHECKPOINT, T2, T3] // T2 and T3 are active​
[START, T4]​
[WRITE, T4, D, 700, 800]​
[COMMIT, T2]​
[COMMIT, T4]​
// System Failure​

In this example:
●​ T1 committed before the checkpoint: no action needed.
●​ T2 was active at the checkpoint and committed: need to REDO T2.
●​ T3 was active at the checkpoint and did not commit: need to UNDO T3.
●​ T4 started after the checkpoint and committed: need to REDO T4.
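
The classification above can be checked mechanically. The sketch below assumes the log is a list of tuples and that it contains at least one CHECKPOINT record; the function name classify is made up for this example.

```python
def classify(log):
    """Return (redo, undo, no_action) transaction sets, using the last checkpoint."""
    cp_index = max(i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT")
    active_at_cp = set(log[cp_index][1:])
    tail = log[cp_index + 1:]

    committed_before = {rec[1] for rec in log[:cp_index] if rec[0] == "COMMIT"}
    committed_after = {rec[1] for rec in tail if rec[0] == "COMMIT"}
    started_after = {rec[1] for rec in tail if rec[0] == "START"}

    candidates = active_at_cp | started_after
    redo = candidates & committed_after      # committed after the checkpoint
    undo = candidates - committed_after      # never committed
    return redo, undo, committed_before


log = [
    ("START", "T1"), ("WRITE", "T1", "A", 1000, 900), ("COMMIT", "T1"),
    ("START", "T2"), ("WRITE", "T2", "B", 500, 600),
    ("START", "T3"), ("WRITE", "T3", "C", 200, 300),
    ("CHECKPOINT", "T2", "T3"),
    ("START", "T4"), ("WRITE", "T4", "D", 700, 800),
    ("COMMIT", "T2"), ("COMMIT", "T4"),
    # system failure
]

redo, undo, no_action = classify(log)
```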

8. Commit Point: Ensuring Atomicity


A transaction reaches its commit point when it has successfully executed all its
operations and the log record containing the COMMIT message has been written to
stable storage. From this point on, the transaction's effects must survive any
failure: if its changes have not yet reached the database on disk, recovery redoes
them from the log. Before this point, the transaction can still be rolled back.

9. Non-Catastrophic Failure Recovery Policies: Deferred vs. Immediate Update


The two main policies for recovery from non-catastrophic transaction failures are
deferred update and immediate update.
●​ Deferred Update (NO-UNDO/REDO):
○​ Update Timing: Database modifications are written to disk only after a
transaction commits.
○​ Log Content: Log records primarily contain redo information (AFIM).
○​ Recovery:
■​ Transaction Failure: If a transaction fails before committing, its changes
in memory are discarded; no UNDO is needed.
■​ System Failure: Only committed transactions need to be redone from the
log.
○​ Advantages: Simpler recovery (no UNDO).
○ Disadvantages: All of a transaction's updates must stay in memory until
commit, so long or update-heavy transactions can exhaust the buffer space.
●​ Immediate Update (UNDO/REDO):
○ Update Timing: Database modifications may be written to disk before a
transaction commits.
○​ Log Content: Log records contain both undo (BFIM) and redo (AFIM)
information.
○​ Recovery:
■​ Transaction Failure: Failed transactions need to be undone using the log.
■​ System Failure: Committed transactions may need to be redone if their
changes haven't been written to disk; uncommitted transactions need to
be undone.
○ Advantages: Modified buffers can be flushed before commit, so large
transactions do not have to fit in the buffer space.
○​ Disadvantages: More complex recovery (requires both UNDO and REDO).
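
A minimal sketch of the deferred update policy is shown below. The class DeferredTxn and its methods are assumptions of this example, and the database and log are plain Python objects. The point it illustrates is that an aborted transaction leaves nothing to undo, and that log records only need the AFIM.

```python
class DeferredTxn:
    """Toy transaction under deferred update (NO-UNDO/REDO), for illustration."""

    def __init__(self, tid, db, log):
        self.tid, self.db, self.log = tid, db, log
        self.pending = {}                         # updates held back until commit
        log.append(("START", tid))

    def write(self, item, value):
        self.pending[item] = value
        self.log.append(("WRITE", self.tid, item, value))   # AFIM only

    def commit(self):
        self.log.append(("COMMIT", self.tid))     # commit point reached first
        self.db.update(self.pending)              # only now touch the database

    def abort(self):
        self.log.append(("ABORT", self.tid))
        self.pending.clear()                      # nothing in the database to undo


db = {"A": 1000, "B": 500}
log = []

failed = DeferredTxn("T1", db, log)
failed.write("A", 900)
failed.abort()
db_after_abort = dict(db)

ok = DeferredTxn("T2", db, log)
ok.write("A", 900)
ok.write("B", 600)
ok.commit()
```

If a crash happened between the COMMIT record and the database update, the AFIMs in the log would be enough to redo T2.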

10. UNDO and REDO Procedures: Detailed


●​ UNDO Procedure:
1.​ The recovery manager scans the log backwards.
2.​ For each log record [WRITE, T, X, BFIM, AFIM] of an uncommitted transaction
T, it sets the value of data item X to BFIM.
3. Because the scan runs backwards, an item written more than once by T ends up
with its oldest BFIM. This restores the database to the state before the
transaction began.
●​ REDO Procedure:
1.​ The recovery manager scans the log forward.
2.​ For each log record [WRITE, T, X, BFIM, AFIM] (or [WRITE, T, X, AFIM] in a
NO-UNDO/REDO system) of a committed transaction T, it sets the value of
data item X to AFIM.
3. Because the scan runs forward, an item written more than once ends up with its
most recent AFIM. This ensures that the effects of committed transactions are
applied to the database.
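
Both procedures can be combined into one recovery routine. The sketch below assumes a tuple-based log [type, TID, item, BFIM, AFIM], a dict as the database, and that no two concurrent transactions write the same item; the function name recover is made up for this example. Each transaction writes its item twice, which shows why the scan direction matters.

```python
def recover(log, db):
    """Undo uncommitted writes (backward scan), then redo committed ones (forward)."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # UNDO pass: newest record first, so the oldest BFIM is applied last.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            db[rec[2]] = rec[3]

    # REDO pass: oldest record first, so the newest AFIM is applied last.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db


log = [
    ("START", "T1"),
    ("WRITE", "T1", "X", 10, 20),
    ("WRITE", "T1", "X", 20, 30),      # T1 never commits
    ("START", "T2"),
    ("WRITE", "T2", "Y", 1, 2),
    ("WRITE", "T2", "Y", 2, 3),
    ("COMMIT", "T2"),
    # system failure
]

# Assumed on-disk state at the crash: T1's changes flushed, T2's not.
disk = {"X": 30, "Y": 1}
recover(log, disk)
```

A forward undo scan would leave X at 20 and a backward redo scan would leave Y at 2; the directions used here give X = 10 and Y = 3.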

Summary

Database recovery is a critical process that ensures data integrity and consistency
following a system failure. The system log plays a central role, recording all database
changes. Recovery strategies employ UNDO and REDO operations to restore the
database to a consistent state. WAL guarantees that the log always holds what
recovery needs, checkpointing limits how much of the log must be processed, and
the buffer policies (steal/no-steal, force/no-force) determine which of UNDO and
REDO are required. Different update policies (deferred and immediate) impact the
complexity and performance of recovery. A thorough understanding of these concepts
is essential for database administrators and developers to build robust and reliable
database systems.
