Troubleshooting techniques @ Dennis Shasha and Philippe Bonnet, 2013.
© Dennis Shasha, Philippe Bonnet 2001 Log Tuning.
Log Tuning
DB Architecture
Hardware [Processor(s), Disk(s), Memory]
Operating System
Concurrency Control, Recovery
Storage Subsystem
Buffer Manager
Review: The ACID properties
• Question: which ACID properties are related to crash recovery?
  – Atomicity & Durability (recovery is also used for Consistency-related rollbacks)
Atomicity and Durability
• Every transaction either commits or aborts. It cannot change its mind.
• Even in the face of failures:
  – Effects of committed transactions should be permanent;
  – Effects of aborted transactions should leave no trace.
[State diagram: Ø → ACTIVE (running, waiting) on BEGIN TRANS; ACTIVE → COMMITTED on COMMIT; ACTIVE → ABORTED on ROLLBACK]
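The transaction states above can be sketched as a small state machine. This is an illustrative sketch only: the names `TxState`, `TRANSITIONS`, `begin` and `step` are assumptions, not part of the original slides.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"        # running or waiting
    COMMITTED = "committed"
    ABORTED = "aborted"


# Only an ACTIVE transaction can move; COMMITTED and ABORTED are terminal,
# which is what "it cannot change its mind" means.
TRANSITIONS = {
    (TxState.ACTIVE, "COMMIT"): TxState.COMMITTED,
    (TxState.ACTIVE, "ROLLBACK"): TxState.ABORTED,
}


def begin():
    """BEGIN TRANS: a new transaction starts in the ACTIVE state."""
    return TxState.ACTIVE


def step(state, event):
    """Apply COMMIT or ROLLBACK, rejecting any transition not in the diagram."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} from {state.name}")
```

Calling `step(TxState.COMMITTED, "ROLLBACK")` raises `ValueError`, since a committed transaction has no outgoing transitions.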
Outages
• Environment
  – Fire in the machine room (Credit Lyonnais, 1996)
• Operations
  – Problem during regular system administration, configuration and operation.
• Maintenance
  – Problem during system repair and maintenance.
• Hardware
  – Fault in the physical devices: CPU, RAM, disks, network.
• Software
  – 99% are Heisenbugs: transient software errors related to timing or overload (failures that occur only once but cause the system to stop).
  – Heisenbugs do not reappear when the system is restarted.
Outages
• A fault-tolerant system must provision for all causes of outages (see case studies).
• Software is the problem
  – Hardware failures cause under 10% of outages.
  – Heisenbugs stop the system without damaging the data.
• Database systems protect integrity against a single hardware failure and some software failures.
[Chart: outages by cause: Software, Hardware, Maintenance, Operations, Environment, Unknown]
From J. Gray and A. Reuter, Transaction Processing: Concepts and Techniques
Motivation
• Atomicity:
  – Transactions may abort (“Rollback”).
• Durability:
  – What if DBMS stops running? (Causes?)
• Desired behavior after system restarts:
  – T1, T2 & T3 should be durable.
  – T4 & T5 should be aborted (effects not seen).
A simple non-logging scheme: Assumptions
• Concurrency control is in effect.
• Updates are happening “in place”.
  – i.e. data is overwritten on (deleted from) the disk.
• A simple scheme to guarantee Atomicity & Durability?
Buffer Mgmt Plays a Key Role
• Force policy: make sure that every update is on disk before commit.
  – Provides durability without REDO logging.
  – But can cause poor performance.
• No Steal policy: no UNCOMMITTED updates are written to disk.
  – Useful for ensuring atomicity without UNDO logging.
  – But can cause poor performance.
Handling the Buffer Pool
• When a Xact commits, force every write to disk?
  – Poor response time.
  – But provides durability.
• When the buffer is full, can we Steal buffer-pool frames from uncommitted Xacts?
  – If not, poor throughput.
  – If so, how can we ensure atomicity?
            No Steal    Steal
Force       Trivial
No Force                Desired
More on Steal and Force
• STEAL (why enforcing Atomicity is hard)
  – To steal frame F: current page in F (say P) is written to disk; some Xact holds lock on P.
  – What if the Xact with the lock on P aborts?
  – Must remember the old value of P at steal time (to support UNDOing the write to page P).
• NO FORCE (why enforcing Durability is hard)
  – What if system crashes before a modified page is written to disk?
  – Write as little as possible, in a convenient place, at commit time, to support REDOing modifications.
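A toy sketch of why STEAL needs UNDO information and NO FORCE needs REDO information. It assumes strict locking (at most one uncommitted writer per page) and a log that is always on stable storage. `TinyStore` and its methods are hypothetical names, and the recovery pass is a deliberate simplification, not the ARIES procedure.

```python
class TinyStore:
    """Toy page store: 'disk' and 'log' survive a crash, 'buffer' does not."""

    def __init__(self, disk):
        self.disk = dict(disk)    # page id -> value on stable storage
        self.buffer = {}          # page id -> modified value in memory
        self.log = []             # (xid, page, old value, new value)
        self.committed = set()    # xids whose commit decision is durable

    def write(self, xid, page, new):
        old = self.buffer.get(page, self.disk[page])
        self.log.append((xid, page, old, new))  # remember old and new values
        self.buffer[page] = new

    def steal(self, page):
        # STEAL: flush a page that may still hold uncommitted data.
        self.disk[page] = self.buffer.pop(page)

    def commit(self, xid):
        # NO FORCE: the commit is recorded, but dirty pages stay in memory.
        self.committed.add(xid)

    def crash_and_recover(self):
        self.buffer.clear()                      # memory is lost
        for xid, page, old, new in self.log:     # REDO committed updates
            if xid in self.committed:
                self.disk[page] = new
        for xid, page, old, new in reversed(self.log):  # UNDO the rest
            if xid not in self.committed:
                self.disk[page] = old
```

In this sketch, a committed update that never reached disk is restored from the new value in the log, and a stolen uncommitted page is rolled back using the old value.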
Logging
• To enable REDO/UNDO: record each update in a log.
• Log: an ordered list of REDO/UNDO actions.
• What should a log record contain?
  – A unique identifier: LSN (log sequence number)
  – Who makes the action? XID (transaction ID)
  – Where does the action happen? pageID, offset, length
  – What is changed? old data, new data
  – And some control info (we will see later)
• Note: only log records that have been written to disk can be used for recovery after a crash!
• REDO and UNDO information in a log.
  – Sequential writes to log (put it on a separate disk).
  – Minimal info written to log, so multiple updates fit in a single log page.
Current database state = current state of data on disks + log
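The log record fields listed above could be represented as follows. This is a minimal sketch: the class name `UpdateRecord`, the helper functions, and the use of a module-level counter for LSNs are illustrative assumptions, and the control information mentioned in the slides is omitted.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class UpdateRecord:
    lsn: int       # unique, increasing log sequence number
    xid: int       # who: the transaction that made the change
    page_id: int   # where: the page ...
    offset: int    # ... the byte offset within the page ...
    length: int    # ... and the number of bytes changed
    old: bytes     # what: before-image, used for UNDO
    new: bytes     # what: after-image, used for REDO


_next_lsn = count(1)  # hypothetical LSN generator


def log_update(log, xid, page_id, offset, old, new):
    """Append an update record to the in-memory log and return it."""
    if len(old) != len(new):
        raise ValueError("old and new images must have the same length")
    rec = UpdateRecord(next(_next_lsn), xid, page_id, offset, len(new), old, new)
    log.append(rec)
    return rec


def redo(page, rec):
    """Reapply the change to a page held as a bytearray."""
    page[rec.offset:rec.offset + rec.length] = rec.new


def undo(page, rec):
    """Restore the before-image on a page held as a bytearray."""
    page[rec.offset:rec.offset + rec.length] = rec.old
```

Because each record carries only the changed bytes rather than the whole page, many such records fit in a single log page.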
Write-Ahead Logging (WAL)
The Write-Ahead Logging Protocol:
• Must force the log record for an update before the corresponding data page gets to disk.
  – Guarantees Atomicity.
• Must write all log records for a Xact before commit.
  – Guarantees Durability.
ARIES algorithms, developed by C. Mohan at IBM Almaden in the early 90’s
http://www.almaden.ibm.com/u/mohan/ARIES_Impact.html
WAL & the Log
• Each log record has a unique Log Sequence Number (LSN).
  – LSNs always increasing.
• Each data page contains a pageLSN.
  – The LSN of the most recent log record for an update to that page.
• System keeps track of flushedLSN.
  – The max LSN flushed so far.
• WAL: before a page is written,
  – pageLSN ≤ flushedLSN
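The rule pageLSN ≤ flushedLSN can be enforced at the point where a data page is written. The sketch below is one possible arrangement under simplifying assumptions: log records are reduced to bare LSNs, "flushing" only moves a counter, and the names `WalLog`, `write_page` and `commit` are hypothetical.

```python
class WalLog:
    """Toy log manager tracking which LSNs have reached stable storage."""

    def __init__(self):
        self.next_lsn = 1
        self.tail = []         # LSNs appended but still only in memory
        self.flushed_lsn = 0   # max LSN flushed so far

    def append(self):
        lsn = self.next_lsn
        self.next_lsn += 1
        self.tail.append(lsn)
        return lsn

    def flush(self, up_to):
        """Force every in-memory record with LSN <= up_to to disk."""
        forced = [lsn for lsn in self.tail if lsn <= up_to]
        self.tail = [lsn for lsn in self.tail if lsn > up_to]
        if forced:
            self.flushed_lsn = max(self.flushed_lsn, forced[-1])


def write_page(log, page_lsn, do_write):
    """Write a data page only once pageLSN <= flushedLSN holds."""
    if page_lsn > log.flushed_lsn:
        log.flush(page_lsn)            # write-ahead: the log goes first
    assert page_lsn <= log.flushed_lsn
    do_write()


def commit(log):
    """Append a commit record and force the log through it."""
    commit_lsn = log.append()
    log.flush(commit_lsn)
    return commit_lsn
```

Note that `commit` forces only the log, not the data pages, which is how WAL supports a NO FORCE buffer policy.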
Log Records
• Possible log record types:
  – Update
  – Commit
  – Abort
  – End (signifies end of commit or abort)
• Compensation Log Records (CLRs)
  – for UNDO actions
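To illustrate how these record types fit together, the sketch below rolls back one transaction: it writes an Abort record, undoes the transaction's updates newest-first while logging one CLR per undone update, and finishes with an End record. Records are plain dictionaries and the field names are assumptions for illustration; real CLRs carry additional control information that is not modelled here.

```python
def log_update(log, pages, xid, page, new):
    """Apply an update in place and log its old and new values."""
    log.append({"type": "update", "xid": xid, "page": page,
                "old": pages[page], "new": new})
    pages[page] = new


def rollback(log, pages, xid):
    """Undo xid's updates newest-first, logging a CLR for each undo."""
    # Snapshot the updates first, since the loop below appends to the log.
    updates = [r for r in log if r["type"] == "update" and r["xid"] == xid]
    log.append({"type": "abort", "xid": xid})
    for rec in reversed(updates):
        pages[rec["page"]] = rec["old"]           # the UNDO action itself
        log.append({"type": "clr", "xid": xid,    # record that it was undone
                    "page": rec["page"], "new": rec["old"]})
    log.append({"type": "end", "xid": xid})
```

Logging the undo actions means that a crash in the middle of a rollback leaves a record of which updates have already been compensated.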
Other Log-Related State
• Transaction Table:
  – One entry per active Xact.
  – Contains XID, status (running/committed/aborted), and lastLSN.
• Dirty Page Table:
  – Dirty pages are buffer pages that have been changed but whose changes are not yet reflected on disk.
  – One entry per dirty page in buffer pool.
  – Contains recLSN: the LSN of the log record which first caused the page to be dirty.
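A minimal sketch of how the two tables above might be maintained as log records are written. The class name `LogState` and its method names are hypothetical; only the table contents (XID, status, lastLSN, recLSN) come from the slides.

```python
class LogState:
    """In-memory bookkeeping kept alongside the log."""

    def __init__(self):
        self.xact_table = {}    # xid -> {"status": ..., "lastLSN": ...}
        self.dirty_pages = {}   # page id -> recLSN

    def on_update(self, lsn, xid, page):
        self.xact_table[xid] = {"status": "running", "lastLSN": lsn}
        # recLSN is the FIRST record that dirtied the page, so an existing
        # entry must not be overwritten by later updates.
        self.dirty_pages.setdefault(page, lsn)

    def on_commit(self, lsn, xid):
        self.xact_table[xid] = {"status": "committed", "lastLSN": lsn}

    def on_end(self, xid):
        # The End record closes the transaction: drop its entry.
        self.xact_table.pop(xid, None)

    def on_page_flushed(self, page):
        # Once the page is on disk it is no longer dirty.
        self.dirty_pages.pop(page, None)
```

The asymmetry is the point of the sketch: lastLSN moves forward with every record a transaction writes, while recLSN stays fixed until the page is flushed.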
Summary