Aries Recovery Algorithm
Overheads:
Space and I/O (sequential and random) during normal processing and recovery
Failure Modes:
transaction/process, system and media/device
Recovery independence
can recover some pages separately from others
short duration
no deadlock detection
direct addressing (unlike hash table for locks)
often using atomic instructions
latch acquisition/release is much faster than lock acquisition/release
Lock requests
conditional, instant duration, manual duration, commit duration
Buffer Manager
Fix, unfix and fix_new (allocate and fix new pg)
Aries uses steal policy - uncommitted writes may be on disk
dirty pages are written out to disk continuously
Some Notation
rollback
one CLR for each normal log record which is undone
Normal Processing
Transactions add log records
Recovery Phases
Analysis pass
forward from last checkpoint
Redo pass
forward from RedoLSN, which is determined in analysis pass
Undo pass
backwards from end of log, undoing incomplete transactions
Analysis Pass
RedoLSN = min(LSNs of dirty pages recorded in checkpoint)
if no dirty pages, RedoLSN = LSN of checkpoint (pages dirtied later will have higher LSNs)
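The RedoLSN rule above can be sketched in a few lines. This is a minimal illustration, assuming the dirty page information recorded in the checkpoint is available as a dictionary from page id to LSN; the function name and the dictionary layout are hypothetical, not part of the slides.

```python
from typing import Dict


def compute_redo_lsn(checkpoint_lsn: int, checkpoint_dirty_pages: Dict[str, int]) -> int:
    """Return the LSN at which the redo pass should start.

    checkpoint_dirty_pages maps a page id to the LSN recorded for it in
    the checkpoint (hypothetical layout chosen for this sketch).
    """
    if not checkpoint_dirty_pages:
        # No dirty pages at checkpoint time: any page dirtied later
        # has a higher LSN, so redo can start at the checkpoint itself.
        return checkpoint_lsn
    # Otherwise start at the smallest LSN among the recorded dirty pages.
    return min(checkpoint_dirty_pages.values())
```

For example, with a checkpoint at LSN 100 and dirty pages recorded at LSNs 40 and 75, the redo pass would start at 40.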
Redo Pass
Undo Pass
Single scan backwards in log, undoing actions of "loser" transactions
for each transaction, when a log record is found, use prev_LSN fields to find next record to be undone
can skip parts of the log with no records from loser transactions
don't perform any undo for CLRs (note: UndoNxtLSN for CLR indicates next record to be undone; can skip intermediate records of that transaction)
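The backward scan described above might look roughly like the following. It is a sketch under simplifying assumptions: the log is an in-memory dictionary keyed by LSN, the record layout (`LogRec`) is invented for illustration, and the function only reports which records would be undone. It does not apply the undo or write CLRs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRec:
    # Hypothetical record layout for this sketch.
    lsn: int
    txn: str
    kind: str                            # "update" or "CLR"
    prev_lsn: Optional[int]              # previous record of the same transaction
    undo_nxt_lsn: Optional[int] = None   # only meaningful for CLRs


def undo_pass(log: Dict[int, LogRec], losers: Dict[str, int]) -> List[int]:
    """Return the LSNs of update records in the order they would be undone.

    losers maps each loser transaction to the LSN of its last log record.
    """
    next_to_undo = dict(losers)
    undone: List[int] = []
    while next_to_undo:
        # Always pick the largest pending LSN, so the log is visited in
        # one backward sweep and stretches without loser records are skipped.
        txn, lsn = max(next_to_undo.items(), key=lambda kv: kv[1])
        rec = log[lsn]
        if rec.kind == "CLR":
            # CLRs are never undone; UndoNxtLSN jumps over work already undone.
            nxt = rec.undo_nxt_lsn
        else:
            undone.append(rec.lsn)
            nxt = rec.prev_lsn
        if nxt is None:
            del next_to_undo[txn]        # this loser is fully undone
        else:
            next_to_undo[txn] = nxt
    return undone
```

In a log where T1 wrote updates 1 and 3 followed by a CLR 5 (with UndoNxtLSN 1), and T2 wrote updates 2 and 4, the sketch visits 5, 4, 2, 1 and skips record 3, which the CLR shows was already undone.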
Transaction Table
During recovery:
initialized during analysis pass from most recent checkpoint
modified during analysis as log records are encountered, and during undo
generated)
if page is not dirty, store L as RecLSN of the page in dirty pages table
When page is flushed to disk, delete from dirty page table
dirty page table written out during checkpoint
(Thus RecLSN is LSN of earliest log record whose effect is not reflected in page on disk)
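The bookkeeping above can be illustrated with a small class. This is only a sketch: the class and method names are hypothetical, and the value L is simply whatever LSN the caller passes in when a page is about to be updated.

```python
from typing import Dict


class DirtyPageTable:
    """Toy dirty page table mapping page id -> RecLSN (illustrative only)."""

    def __init__(self) -> None:
        self.rec_lsn: Dict[str, int] = {}

    def on_update(self, page_id: str, lsn_l: int) -> None:
        # Only the first update since the last flush sets RecLSN;
        # later updates to an already dirty page leave it unchanged.
        self.rec_lsn.setdefault(page_id, lsn_l)

    def on_flush(self, page_id: str) -> None:
        # Page on disk is now current, so it is no longer dirty.
        self.rec_lsn.pop(page_id, None)

    def snapshot(self) -> Dict[str, int]:
        # Copy that a checkpoint could write out.
        return dict(self.rec_lsn)
```

Because only the first update after a flush records an LSN, the stored RecLSN stays the earliest LSN whose effect may be missing from the disk copy of the page.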
Updates
Page latch held in X mode until log record is logged
so updates on same page are logged in correct order
page latch held in S mode during reads since records may get moved around by update
latch required even with page locking if dirty reads are allowed
Updates (Contd.)
Protocol to avoid deadlock involving latches
deadlocks involving latches and locks were a major problem in System R and SQL/DS
transaction may hold at most two latches at a time
must never wait for lock while holding latch
if both are needed (e.g. record found after latching page): release latch before requesting lock and then reacquire latch (and recheck conditions in case page has changed in between)
Optimization: conditional lock request
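The release-and-reacquire protocol, with the conditional lock request as a fast path, can be sketched with Python's `threading.Lock` standing in for both the page latch and the record lock. The function name and the `recheck` callback are assumptions made for this illustration; a real lock manager would have lock modes and a lock table.

```python
import threading
from typing import Callable


def lock_record_under_latch(page_latch: threading.Lock,
                            record_lock: threading.Lock,
                            recheck: Callable[[], bool]) -> bool:
    """Acquire record_lock without ever waiting for it while latched.

    The caller must already hold page_latch. On return the latch is held
    again. Returns True if the record lock is held and the page state is
    still valid, False if recheck() failed (the lock is then released).
    """
    # Optimization: conditional request, which never blocks under the latch.
    if record_lock.acquire(blocking=False):
        return True
    # Slow path: give up the latch before waiting for the lock.
    page_latch.release()
    record_lock.acquire()        # may wait, but no latch is held here
    page_latch.acquire()
    # The page may have changed while the latch was released.
    if recheck():
        return True
    record_lock.release()
    return False
```

If the conditional request succeeds, the latch is never released and no recheck is needed, which is the point of the optimization.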
Savepoints
Simply notes LSN of last record written by transaction
them
deadlocks can be resolved by rollback to appropriate
Rollback
Scan backwards from last log record of txn
(last log record of txn = transTable[TransID].UndoNxtLSN)
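A rollback that writes one CLR per undone record, and that can stop at a savepoint, might be sketched as follows. The assumptions are loud ones: the log is a Python list whose 1-based positions serve as LSNs, records are plain dictionaries, and the `LastLSN` field in the transaction table is a hypothetical addition used to chain the CLRs. The actual undo of page contents is omitted.

```python
from typing import Dict, List


def rollback(log: List[dict], trans_table: Dict[str, dict],
             txn: str, save_lsn: int = 0) -> None:
    """Undo txn's updates with LSN > save_lsn, appending one CLR each.

    save_lsn = 0 means a total rollback (assumption of this sketch).
    """
    entry = trans_table[txn]
    undo_nxt = entry["UndoNxtLSN"]
    while undo_nxt is not None and undo_nxt > save_lsn:
        rec = log[undo_nxt - 1]              # LSNs are 1-based list positions
        if rec["kind"] == "update":
            # One CLR per undone update; its UndoNxtLSN points past the
            # record just undone, so it is never undone a second time.
            clr = {"lsn": len(log) + 1, "txn": txn, "kind": "CLR",
                   "prev_lsn": entry["LastLSN"],
                   "undo_nxt_lsn": rec["prev_lsn"]}
            log.append(clr)
            entry["LastLSN"] = clr["lsn"]
            undo_nxt = rec["prev_lsn"]
        else:
            # A CLR is not undone; follow its UndoNxtLSN instead.
            undo_nxt = rec["undo_nxt_lsn"]
        entry["UndoNxtLSN"] = undo_nxt
```

Calling `rollback(log, trans_table, "T1", save_lsn=s)` with `s` set to the LSN noted at a savepoint undoes only the work done after that savepoint.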
More on Rollback
Extra logging during rollback is bounded
make sure enough log space is available for rollback in case of system crash, else BIG problem
Transaction Termination
Checkpoints
begin_chkpt record is written first
Checkpoint (contd)
Pages need not be flushed during checkpoint
are flushed on a continuous basis
Transactions may write log records during checkpoint
Can copy dirty_page table fuzzily (hold latch, copy
Restart Processing
Finds checkpoint begin using master record
Do restart_analysis
Do restart_redo
... some details of dirty page table here
dirty page table: pages that were potentially dirty at time of crash/shutdown
Redo Pass
Scan forward from RedoLSN
If log record is an update log record, AND its page is in dirty_page_table, AND LogRec.LSN >= RecLSN of the page in dirty_page_table,
then if pageLSN < LogRec.LSN then perform redo; else just update RecLSN in dirty_page_table
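The redo test above can be written out as a short loop. This is a sketch with invented data structures: log records are dictionaries, `page_lsn` maps each page to its current pageLSN, and "performing redo" is reduced to stamping the page with the record's LSN. The slide does not say what value RecLSN is updated to in the else branch; setting it to pageLSN + 1 is an assumption of this sketch.

```python
from typing import Dict, List


def redo_pass(log: List[dict], redo_lsn: int,
              dirty_page_table: Dict[str, int],
              page_lsn: Dict[str, int]) -> List[int]:
    """Scan forward from redo_lsn and return the LSNs that were redone."""
    redone: List[int] = []
    for rec in log:
        if rec["lsn"] < redo_lsn or rec["kind"] != "update":
            continue
        page = rec["page"]
        # Skip pages that were not dirty, or updates older than RecLSN.
        if page not in dirty_page_table or rec["lsn"] < dirty_page_table[page]:
            continue
        if page_lsn[page] < rec["lsn"]:
            # Effect is missing from the page: redo it and stamp the page.
            page_lsn[page] = rec["lsn"]
            redone.append(rec["lsn"])
        else:
            # Page already reflects this update; tighten RecLSN
            # (pageLSN + 1 is an assumption, see the note above).
            dirty_page_table[page] = page_lsn[page] + 1
    return redone
```

Checking the dirty page table first means a page only has to be read when the table says the update might be missing from it.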
Optimizations of redo
Dirty page table info can be used to pre-read pages during redo
Out of order redo is also possible to reduce disk seeks
Undo Pass
Rolls back loser transactions in reverse order in a single scan of the log
stops when all losers have been fully undone
processing of log records is exactly as in single transaction rollback
Undo Optimizations
Parallel undo
each txn undone separately, in parallel with others
can even generate CLRs and apply them separately, in parallel for a single transaction
pages
Transaction Recovery
Loser transactions can be restarted in some cases
e.g. Mini batch transactions which are part of a larger transaction
Media Recovery
For archival dump
can dump pages directly from disk (bypass buffer, no latching needed) or via buffer, as desired
this is a fuzzy dump, not transaction consistent
begin_chkpt location of most recent checkpoint completed before archival dump starts is noted
called image copy checkpoint
redoLSN computed for this checkpoint and noted as media
corruption
e.g. Application program with direct access to buffer crashes before writing undo log record
recovery mechanism
used also for other operations like creating a file (which can then be used by other txns, before the creator commits)
updates of nested top action commit early and should not be undone
during undo