M5
● Lock
○ Variable associated with a data item describing status for operations that can be applied
○ One lock for each item in the database
● Binary locks
○ Two states (values)
■ Locked (1)
● Item cannot be accessed
■ Unlocked (0)
● Item can be accessed when requested
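A minimal sketch of a binary lock table, assuming an in-memory dictionary and a non-blocking `lock_item` that returns `False` where a real system would put the transaction on a wait queue. The class name `BinaryLockTable` and the item name `"X"` are illustrative, not from the slides.

```python
class BinaryLockTable:
    """Toy binary lock table: one lock per item, 1 = locked, 0 = unlocked."""

    def __init__(self):
        self.lock = {}  # item name -> 0 or 1

    def lock_item(self, item):
        # Return True if the lock was acquired, False if the caller must wait.
        if self.lock.get(item, 0) == 0:
            self.lock[item] = 1
            return True
        return False

    def unlock_item(self, item):
        # Mark the item as unlocked so a later request can succeed.
        self.lock[item] = 0


table = BinaryLockTable()
first = table.lock_item("X")   # item was unlocked, so the request succeeds
second = table.lock_item("X")  # item is locked, so the request is refused
table.unlock_item("X")
third = table.lock_item("X")   # succeeds again after the unlock
```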
Two-Phase Locking Techniques for Concurrency Control (cont’d.)
● Lock conversion
○ Transaction that already holds a lock allowed to convert the lock from one state to another
● Upgrading
○ Issue a read_lock operation then a write_lock operation
● Downgrading
○ Issue a read_lock operation after a write_lock operation
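Lock conversion can be sketched with a shared/exclusive lock on a single item. This is a simplified illustration, assuming non-blocking requests that return `False` instead of waiting; the class name `RWLock` and transaction names are hypothetical.

```python
class RWLock:
    """Toy read/write lock for one item, with upgrade and downgrade."""

    def __init__(self):
        self.readers = set()  # transactions holding a read (shared) lock
        self.writer = None    # transaction holding the write (exclusive) lock

    def read_lock(self, txn):
        if self.writer == txn:
            # Downgrading: the write-lock holder converts to a read lock.
            self.writer = None
            self.readers.add(txn)
            return True
        if self.writer is None:
            self.readers.add(txn)
            return True
        return False  # another transaction holds the write lock

    def write_lock(self, txn):
        if self.writer == txn:
            return True
        if self.writer is None and self.readers <= {txn}:
            # Upgrading: allowed only if txn is the sole reader (or no readers).
            self.readers.discard(txn)
            self.writer = txn
            return True
        return False


lock = RWLock()
lock.read_lock("T1")
upgraded = lock.write_lock("T1")   # T1 is the only reader, so upgrade succeeds
blocked = lock.read_lock("T2")     # refused while T1 holds the write lock
downgraded = lock.read_lock("T1")  # T1 converts back to a read lock
shared = lock.read_lock("T2")      # now T2 can share the item
```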
Guaranteeing Serializability by Two-Phase Locking
Figure 5.3 Transactions that do not obey two-phase locking (a) Two transactions T1 and T2 (b) Results of possible serial schedules of T1 and T2 (c) A nonserializable schedule S that uses locks
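A transaction obeys two-phase locking when all of its locking operations precede its first unlock operation. The check below is a sketch under that definition; the operation lists are illustrative assumptions and are not copied from the figure.

```python
def is_two_phase(operations):
    """Return True if no lock request follows an unlock in the sequence."""
    unlocked = False
    for action, _item in operations:
        if action == "unlock":
            unlocked = True  # the shrinking phase has started
        elif action in ("read_lock", "write_lock") and unlocked:
            return False     # a new lock after an unlock violates 2PL
    return True


# Hypothetical transaction that unlocks Y before locking X (not two-phase).
t_bad = [("read_lock", "Y"), ("read_item", "Y"), ("unlock", "Y"),
         ("write_lock", "X"), ("read_item", "X"), ("write_item", "X"),
         ("unlock", "X")]

# Same work, but every lock is acquired before the first unlock.
t_good = [("read_lock", "Y"), ("read_item", "Y"), ("write_lock", "X"),
          ("unlock", "Y"), ("read_item", "X"), ("write_item", "X"),
          ("unlock", "X")]
```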
Variations of Two-Phase Locking
● Basic 2PL
○ Technique described on previous slides
● Conservative (static) 2PL
○ Requires a transaction to lock all the items it accesses before the transaction begins
■ Predeclare read-set and write-set
○ Deadlock-free protocol
● Strict 2PL
○ Transaction does not release exclusive locks until after it commits or aborts
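The all-or-nothing acquisition used by conservative 2PL can be sketched as follows. This is a simplification that treats every lock as exclusive; the function name `acquire_all` and the dictionary-based lock table are assumptions for illustration.

```python
def acquire_all(lock_table, txn, read_set, write_set):
    """Lock every predeclared item for txn, or lock nothing at all."""
    needed = set(read_set) | set(write_set)
    if any(item in lock_table for item in needed):
        return False  # some item is taken: acquire nothing and retry later
    for item in needed:
        lock_table[item] = txn
    return True


locks = {}
t1_started = acquire_all(locks, "T1", {"X"}, {"Y"})
t2_started = acquire_all(locks, "T2", {"Y"}, {"Z"})  # Y is held by T1
```

Because a transaction never holds some locks while waiting for others, no cycle of waiting transactions can form.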
Variations of Two-Phase Locking (cont’d.)
● Rigorous 2PL
○ Transaction does not release any locks until after it commits or aborts
● Concurrency control subsystem responsible for generating read_lock and
write_lock requests
● Locking generally considered to have high overhead
Dealing with Deadlock and Starvation
● Deadlock
○ Occurs when each transaction T in a set is waiting for some item locked by some other
transaction T’
○ Both transactions stuck in a waiting queue
Figure 5.5 Illustrating the deadlock problem (a) A partial schedule of T1′ and T2′ that is
in a state of deadlock (b) A wait-for graph for the partial schedule in (a)
Dealing with Deadlock and Starvation (cont’d.)
● No waiting algorithm
○ If transaction unable to obtain a lock, immediately aborted and restarted later
● Cautious waiting algorithm
○ Deadlock-free
● Deadlock detection
○ System checks to see if a state of deadlock exists
○ Wait-for graph
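Deadlock detection on a wait-for graph amounts to checking the graph for a cycle. The sketch below uses a depth-first search; the graph encoding (a dictionary mapping each transaction to the set of transactions it waits for) is an assumption made for the example.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(txn):
        color[txn] = GRAY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GRAY:
                return True  # back edge: txn waits on its own ancestor
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t)
               for t in list(wait_for))


deadlocked = has_deadlock({"T1": {"T2"}, "T2": {"T1"}})
not_deadlocked = has_deadlock({"T1": {"T2"}, "T2": set()})
```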
Dealing with Deadlock and Starvation (cont’d.)
● Victim selection
○ Deciding which transaction to abort in case of deadlock
● Timeouts
○ If a transaction waits longer than a predefined time, the system aborts it
● Starvation
○ Occurs if a transaction cannot proceed for an indefinite period of time while other transactions
continue normally
○ Solution: first-come-first-served queue
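The first-come-first-served remedy for starvation can be sketched with a FIFO wait queue on one lock. The class `FifoLock` is hypothetical and handles only exclusive locks.

```python
from collections import deque


class FifoLock:
    """Exclusive lock that grants waiting requests in arrival order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, txn):
        # Grant immediately only if the lock is free and nobody is queued.
        if self.holder is None and not self.waiting:
            self.holder = txn
            return True
        self.waiting.append(txn)
        return False

    def release(self):
        # Hand the lock to the longest-waiting transaction, if any.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


fifo = FifoLock()
granted = fifo.request("T1")
fifo.request("T2")
fifo.request("T3")
next_holder = fifo.release()   # T2 arrived before T3
later_holder = fifo.release()
```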
5.2 Concurrency Control Based on Timestamp Ordering
● Timestamp
○ Unique identifier assigned by the DBMS to identify a transaction
○ Assigned in the order submitted
○ Transaction start time
● Concurrency control techniques based on timestamps do not use locks
○ Deadlocks cannot occur
Concurrency Control Based on Timestamp Ordering (cont’d.)
● Generating timestamps
○ Counter incremented each time its value is assigned to a transaction
○ Current date/time value of the system clock
■ Ensure no two timestamps are generated during the same tick of the clock
● General approach
○ Enforce equivalent serial order on the transactions based on their timestamps
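The counter-based way of generating timestamps can be sketched as below. The class name `TimestampGenerator` is illustrative; a real DBMS would also need to make the counter safe under concurrent access.

```python
import itertools


class TimestampGenerator:
    """Hands out strictly increasing, unique transaction timestamps."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_timestamp(self):
        # The counter advances each time a value is assigned.
        return next(self._counter)


gen = TimestampGenerator()
ts_t1 = gen.next_timestamp()
ts_t2 = gen.next_timestamp()  # submitted later, so it gets a larger timestamp
```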
Concurrency Control Based on Timestamp Ordering (cont’d.)
● Basic TO algorithm
○ If conflicting operations detected, later operation rejected by aborting transaction that issued it
○ Schedules produced guaranteed to be conflict serializable
○ Starvation may occur
● Strict TO algorithm
○ Ensures schedules are both strict and conflict serializable
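A sketch of the basic TO checks, using the usual per-item read and write timestamps. The names `read_ts` and `write_ts` are assumed here (the slide text does not name them), and a return value of `False` stands for "abort the issuing transaction".

```python
class Item:
    """Data item carrying the largest read and write timestamps seen so far."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def read_item(item, ts):
    # Reject the read if a younger transaction has already written the item.
    if item.write_ts > ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True


def write_item(item, ts):
    # Reject the write if a younger transaction has read or written the item.
    if item.read_ts > ts or item.write_ts > ts:
        return False
    item.write_ts = ts
    return True


x = Item()
r2 = read_item(x, 2)        # accepted; read_ts becomes 2
w1 = write_item(x, 1)       # rejected: a younger transaction already read x
w3 = write_item(x, 3)       # accepted; write_ts becomes 3
r2_again = read_item(x, 2)  # rejected: x was written by a younger transaction
```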
Concurrency Control Based on Timestamp Ordering (cont’d.)
Figure 5.6 Lock compatibility tables (a) Lock compatibility table for read/write
locking scheme (b) Lock compatibility table for read/write/certify locking scheme
Validation (Optimistic) Techniques and Snapshot Isolation Concurrency
Control
● Optimistic techniques
○ Also called validation or certification techniques
○ No checking is done while the transaction is executing
○ Updates not applied directly to the database until finished transaction is validated
■ All updates applied to local copies of data items
○ Validation phase checks whether any of transaction’s updates violate serializability
■ Transaction committed or aborted based on result
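The read, validate, and write phases can be sketched as follows. This is one simplified validation rule (a transaction fails if anything it read was overwritten by a transaction that committed after it started); the classes `Database` and `Txn` are hypothetical.

```python
class Database:
    def __init__(self):
        self.data = {}
        self.committed_write_sets = []  # one set per commit, in commit order


class Txn:
    def __init__(self, db):
        self.db = db
        self.start = len(db.committed_write_sets)  # commits seen at start
        self.read_set = set()
        self.local = {}  # updates are buffered in local copies

    def read(self, key):
        self.read_set.add(key)
        return self.local.get(key, self.db.data.get(key))

    def write(self, key, value):
        self.local[key] = value

    def commit(self):
        # Validation phase: look only at commits made after this txn began.
        for write_set in self.db.committed_write_sets[self.start:]:
            if write_set & self.read_set:
                return False  # conflict: abort, local copies are discarded
        # Write phase: apply buffered updates to the database.
        self.db.data.update(self.local)
        self.db.committed_write_sets.append(set(self.local))
        return True


db = Database()
db.data["X"] = 1
t1, t2 = Txn(db), Txn(db)
t1.write("X", t1.read("X") + 1)
t2.write("X", t2.read("X") + 10)
t1_ok = t1.commit()
t2_ok = t2.commit()  # fails validation: X changed after T2 read it
```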
Concurrency Control Based on Snapshot Isolation
● Transaction sees data items based on committed values of the items in the
database snapshot
○ Does not see updates that occur after transaction starts
● Read operations do not require read locks
○ Write operations require write locks
● Temporary version store keeps track of older versions of updated items
● Variation: serializable snapshot isolation (SSI)
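How a version store lets a transaction read from its snapshot can be sketched with a list of committed versions per item. The class `VersionStore` and its logical commit clock are assumptions for illustration; write locking and conflict checks are left out.

```python
class VersionStore:
    """Keeps every committed version of each item, tagged with a commit time."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def begin(self):
        # A transaction's snapshot is identified by the clock at its start.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionStore()
store.commit_write("X", 1)
snapshot = store.begin()
store.commit_write("X", 2)            # committed after the snapshot was taken
old_value = store.read("X", snapshot)
new_value = store.read("X", store.begin())
```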
5.5 Granularity of Data Items and Multiple Granularity Locking
Multiple Granularity Level Locking
● Lock can be requested at any level
Figure 5.7 A granularity hierarchy for illustrating multiple granularity level locking
Multiple Granularity Level Locking (cont’d.)
Using Locks for Concurrency Control in Indexes (cont’d.)
● Optimistic approach
○ Request and hold shared locks on nodes leading to the leaf node, with exclusive lock on
the leaf
● B-link tree approach
○ Sibling nodes on the same level are linked at every level
○ Allows shared locks when requesting a page
○ Requires lock be released before accessing the child node
5.7 Other Concurrency Control Issues
● Insertion
○ When new data item is inserted, it cannot be accessed until after operation is completed
● Deletion operation on the existing data item
○ Write lock must be obtained before deletion
● Phantom problem
○ Can occur when a new record being inserted satisfies a condition that a set of records
accessed by another transaction must satisfy
○ Record causing conflict not recognized by concurrency control protocol
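The phantom problem can be illustrated with record-level locks. In this sketch the table contents, field names, and the condition `dno == 5` are all made up for the example.

```python
employees = [
    {"name": "A", "dno": 5},
    {"name": "B", "dno": 4},
]


def names_in_department(dno):
    return [e["name"] for e in employees if e["dno"] == dno]


# Transaction T locks every record that currently satisfies dno == 5.
locked_by_t = set(names_in_department(5))

# Transaction T' inserts a new record that also satisfies the condition.
employees.append({"name": "C", "dno": 5})

# Re-evaluating the condition finds a record that T never locked.
phantoms = [n for n in names_in_department(5) if n not in locked_by_t]
```

Record locks cover only records that existed when they were requested, which is why the new record goes unnoticed.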
Other Concurrency Control Issues (cont’d.)
● Interactive transactions
○ User can input a value of a data item to a transaction T based on some value written to the
screen by transaction T′, which may not have committed
○ Solution approach: postpone output of transactions to the screen until committed
● Latches
○ Locks held for a short duration
○ Do not follow usual concurrency control protocol
5.8 Summary
● NOSQL
○ Not only SQL
○ SQL systems offer many features that not every application uses, and their structure can be restrictive
● Most NOSQL systems are distributed databases or distributed storage
systems
○ Focus on semi-structured data storage, high performance, availability, data replication, and
scalability
Introduction (cont’d.)
6.1 Introduction to NOSQL Systems
● BigTable
○ Google’s proprietary NOSQL system
○ Column-based or wide column store
● DynamoDB (Amazon)
○ Key-value data store
● Cassandra (Facebook)
○ Uses concepts from both key-value store and column-based systems
Introduction to NOSQL Systems (cont’d.)
● Document stores
○ Collections of similar documents
● Individual documents resemble complex objects or XML documents
○ Documents are self-describing
○ Can have different data elements
● Documents can be specified in various formats
○ XML
○ JSON
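Self-describing documents carry their own field names, so two documents in the same collection need not share the same elements. The JSON documents below are made-up examples, parsed with Python's standard `json` module.

```python
import json

collection = [
    json.loads('{"_id": 1, "name": "ProductX", "location": "Bellaire"}'),
    json.loads('{"_id": 2, "name": "ProductY", '
               '"workers": [{"ename": "Smith", "hours": 20}]}'),
]

# Each document describes its own structure through its field names.
field_names = [sorted(doc) for doc in collection]
```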
MongoDB Data Model
Hbase Data Model and Versioning
Hbase Data Model and Versioning (cont’d.)
● Get
○ Retrieves data associated with a single row
● Scan
○ Retrieves all the rows
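The difference between Get and Scan can be shown on a toy in-memory table. This is not the Hbase client API; the functions `get` and `scan`, the row keys, and the `Name:Fname` column are hypothetical.

```python
table = {
    "row2": {"Name:Fname": "Alicia"},
    "row1": {"Name:Fname": "John"},
}


def get(table, row_key):
    """Return the data for one row, or None if the row does not exist."""
    return table.get(row_key)


def scan(table):
    """Return every row as (row key, data) pairs, ordered by row key."""
    return [(key, table[key]) for key in sorted(table)]


one_row = get(table, "row1")
all_keys = [key for key, _ in scan(table)]
```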
Hbase Storage and Distributed System Concepts
● Graph databases
○ Data represented as a graph
○ Collection of vertices (nodes) and edges
○ Possible to store data associated with both individual nodes and individual edges
● Neo4j
○ Open source system
○ Uses concepts of nodes and relationships
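A graph with data on both nodes and edges can be sketched with plain dictionaries. This is not how Neo4j stores data; the node identifiers, labels, relationship type, and properties are invented for the example.

```python
nodes = {
    "e1": {"label": "EMPLOYEE", "Lname": "Smith"},
    "d5": {"label": "DEPARTMENT", "Dname": "Research"},
}

# Each edge (relationship) has a direction, a type, and its own properties.
edges = [
    {"from": "e1", "to": "d5", "type": "WorksFor", "properties": {"hours": 40}},
]


def related(node_id, rel_type):
    """Return the nodes reached from node_id over edges of the given type."""
    return [nodes[e["to"]] for e in edges
            if e["from"] == node_id and e["type"] == rel_type]


departments = related("e1", "WorksFor")
```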
Neo4j (cont’d.)
Figure 6.4 Examples in Neo4j using the Cypher language (a) Creating some nodes
Neo4j (cont’d.)
● Path
○ Traversal of part of the graph
○ Typically used as part of a query to specify a pattern
● Schema optional in Neo4j
● Indexing and node identifiers
○ Users can create indexes for the collection of nodes that have a particular label
○ One or more properties can be indexed
The Cypher Query Language of Neo4j
● Cypher query made up of clauses
● Result from one clause can be the input to the next clause in the query