Concurrency Control For Distributed Systems PDF
Document Details
Uploaded by Deleted User
Tags
Summary
This document covers concurrency control in distributed systems. It discusses ACID properties, concurrent transactions, and different types of anomalies like dirty reads, lost updates, and fuzzy reads. It also details isolation levels.
Full Transcript
CS 427 Distributed Systems Concurrency Control ACID Properties Isolation (I) even though transactions might be running concurrently and have data dependencies, the end result will be as if one of them was executing at a time, so that there was no interference between them This prev...
CS 427 Distributed Systems Concurrency Control ACID Properties Isolation (I) even though transactions might be running concurrently and have data dependencies, the end result will be as if one of them was executing at a time, so that there was no interference between them This prevents a large number of anomalies that will be discussed later. 2 Concurrent TXNs with Dependency Tim e Source: book (Distributed Systems: Concepts and Design, 5 th edition ) 3 Concurrent TXNs with Dependency Tim e Source: book (Designing Data Intensive Applications) 4 Anomalies inherent concurrency in distributed systems creates potential anomalies and unexpected behaviour transactions that are composed of multiple operations and run concurrently can lead to different results depending on how their operations are interleaved Consider transactions T and U where transaction T transfers an amount from account A to account B transaction U transfers an amount from account C to account B initial balances are: A = B = C = $200 5 Scenario 1 ( T before U ) Transaction T balance = B.get_balance(); B.balance = 200 B.set_balance(balance+40) B.balance = 240 A.withdraw(40) A.balance = 160 Transaction U balance = B.get_balance(); B.balance = 240 B.set_balance(balance+80) B.balance = 240+80= Final 320 Value C.withdraw(80) C.balance = 120 6 Scenario 2 ( U before T ) Transaction U balance = B.get_balance(); B.balance = 200 B.set_balance(balance+80) B.balance = 280 C.withdraw(80) C.balance = 120 Transaction T balance = B.get_balance(); B.balance = 280 B.set_balance(balance+40) B.balance = 280+40= Final 320 Value A.withdraw(40) A.balance = 160 7 Why we need concurrency? Serial Execution of Transactions is generally be unacceptable for servers whose resources are shared by many interactive users Servers that supports transactions aim to maximize concurrency transactions are allowed to execute concurrently if this would have the same effect as a serial execution 8 Scenario 3 ( Interleaving ops ) Transaction T Transaction U balance = B.get_balance(); B=200 $ balance = B.get_balance(); B= 200 $ Final B.set_balance(balance+80) This update is lost Value B=280 $ because of the B.set_balance(balance+40) incorrect B=240 $ interleaving of ops A.withdraw(40) A=160$ C.withdraw(80) C=120$ 9 Isolation Levels There is a need for formal models that define what is possible and what is not in the behavior of a system These models are called isolation levels: Serializability Repeatable read Snapshot Isolation Read Committed Read Uncommitted define what is NOT possible, i.e. which anomalies are prevented amongst the ones that are already known stronger isolation levels prevent more anomalies at the cost of performance 10 Isolation Levels Anomalies Weakest Strongest Source: book (Distributed Systems for Practitioners, Ch.2 ) 11 Why we need Isolation Rather than blindly relying on tools ( such as Databases that implement isolation), we need to develop a good understanding of the kinds of concurrency problems that exist, and how to prevent them. Then we can build applications that are reliable and correct, using the tools at our disposal. we will look at several weak (nonserializable) isolation levels that are used in practice, and study race conditions that can and cannot occur, so that you can decide what level is appropriate to your application 12 Anomalies We will study anomalies before examining isolation levels Dirty writes Dirty reads (Fuzzy) non-repeatable reads Phantom reads Lost updates Read skew Write skew In all of the coming figures, we will focus on “what the problem is” not “how to solve it” 13 Dirty Read occurs when a transaction reads a value that has been written by another transaction that is still in-flight and has not been committed yet caused by the interaction between a read operation in one transaction and an earlier write operation in another transaction on the same object. 14 Dirty Read transaction U will have seen a value that never existed, since “a” will be restored to its original value. We say that the transaction U has performed a dirty read. 15 As it has committed, Dirty Read user 2 experiences an anomaly: the mailbox listing shows an unread mess but the counter shows zero unread messa because the counter increment has not yet happened Isolation would have prevented this issue ensuring that user 2 sees either both the inserted email and the updated or neither, but not an inconsistent halfwa 16 Dirty Write occurs when a transaction overwrites a value that has previously been written by another transaction that is still in-flight and has not been committed yet Assume transactions A and B, where transaction A is running the operations [x=1, y=1] transaction B is running the operations [x=2, y=2] A database is consistent if after executing A and B, both x and y have the same value Execution could be interleaved/serial 17 Serial Execution A before B before B A Transaction A Transaction B x=1 x=2 y=1 y=2 Commit OR commit Transaction B Transaction A x=2 x=1 y=2 y=1 commit commit Final Value of x is 2 Final Value of x is 1 Final Value of y is 2 Final Value of y is 1 18 Dirty Write Transaction A Transaction B x=1 x=2 y=2 commit y=1 commit Final Value of x is 2 Final Value of y is 1 19 Dirty Write writes from different transactions that access the same object what happens if the earlier write is part of a transaction that has not yet committed, and the later write overwrites an uncommitted value? 20 Figure 7.5 illustrates a used car sales website on which two people, Alice and Bob, are simultaneously trying to buy the same car. Buying a car requires two database writes the listing on the website needs to be updated to reflect the buyer, and the sales invoice needs to be sent to the buyer In the Figure, the sale is awarded to Bob (because he performs the winning update to the listings table), but the invoice is sent to Alice (because she performs the winning update to the invoices table). 21 Lost Update occurs when two transactions read the same value and try to update it to two different values (commit) (commit) 22 Lost Update the counter should have increased from 42 to 44, sin two increments happened, but it actually only went to 4 because of the race conditio 23 Read Skew occurs when consistency is violated because a transaction can only see partial results of another transaction A client sees different parts of the database at different points in time. Thus, reading data in a state that violates consistency 24 Read Skew The consistency constraint is that friendships are m so if person B is included in person A’s list of friends then A must also be included in B’s list 1- (commit) 2- 1-Delete P2 from friendlist o 2-Delete P2 from friendlist o (commit) The consistency constraint is no longer maintained 25 Read Skew 26 Say Alice has $1,000 of savings at a bank, split across two accounts with $500 (Account 1 and Account 2) Now a transaction transfers $100 from Account 2 to Account 1 When Alice looks at her account balances at different points in time she may see Account1 balance =500$ (at a time before the incoming payment has arrived ) and Account 2 balance =400$ (at a time after the outgoing transfer has been made Account1 + Account 2 has a total of $900—it seems that $100 has vanished into thin air. 27 Write Skew occurs when two transactions read the same data, but then modify disjoint sets of data It is neither a dirty write nor a lost update, because the two transactions are updating two different objects 28 Write Skew Alice and Bob are the two on-call doctors for a particular shift. Both are feeling unwell, so they both decide to request leave. Unfortunately, they happen to click the button to go off call at approximately the same time. No Doctor is on Call ! 29 Phantoms where a write in one transaction changes the result of a search query in another transaction occurs when a transaction does a read and another transaction writes or removes a data item matched by that read while the first transaction is still in flight 30 Phantoms Definition: Phantom reads occur when a transaction reads a set of rows that match a certain condition, and another concurrent transaction inserts, deletes, or updates rows that affect that condition before the first transaction is complete. When the first transaction re-reads the data, it sees a different set of rows. Example: Consider a transaction that reads all records from a "Customers" table where the "Country" is "USA". If, during this transaction, another transaction inserts a new customer from the USA, the first transaction would see this new customer in its subsequent reads, leading to inconsistent results. 31 Phantoms https://dotnettutorials.net/lesson/phantom-read-concurrency-problem-sql-server 32 Fuzzy Read ( Un-Repeatable Read ) Definition: Fuzzy reads, also known as dirty reads, occur when a transaction reads data that has been modified by another ongoing transaction but not yet committed. If the modifying transaction rolls back, the reading transaction will have seen inconsistent or "fuzzy" data. Example: If Transaction A updates a customer's balance but hasn’t committed yet, and Transaction B reads that balance, Transaction B may see an uncommitted and potentially incorrect value. If Transaction A rolls back, Transaction B would have acted on data that no longer exists. 33 Fuzzy Read ( Un-Repeatable Read ) occurs when a value is retrieved twice during a transaction (without it being updated in the same transaction) and the value is different. https://dotnettutorials.net/lesson/phantom-read- 34 References Distributed Systems for Practitioners ( Ch. 2, 3 ) Designing Data Intensive Applications ( Ch. 7 ) Distributed Systems: Concepts and Design, 5th ed. ( Ch. 16 ) 35