Consistency Models in Distributed Systems PDF

# Chapter 6 - Consistency and Replication ## More on Replication * Replicas allow remote sites to continue working in the event of local failures. * It is also possible to protect against data corruption. * Replicas allow data to reside close to where it is used. * Even a large number of replicated "local" systems can improve performance: think of clusters. * This directly supports the distributed systems goal of enhanced scalability. ## Replication and Scalability * Replication is a widely-used scalability technique: think of Web clients and Web proxies. * When systems scale, the first problems to surface are those associated with performance - as the systems get bigger (e.g., more users), they get often slower. * Replicating the data and moving it closer to where it is needed helps to solve this scalability problem. * A problem remains: how to efficiently synchronize all of the replicas created to solve the scalability issue? * Dilemma: adding replicas improves scalability, but incurs the (oftentimes considerable) overhead of keeping the replicas up-to-date!!! * As we shall see, the solution often results in a relaxation of any consistency constraints. ## Replication and Consistency * But if there are many replicas of the same thing, how do we keep all of them up-to-date? How do we keep the replicas consistent? * Consistency can be achieved in a number of ways. We will study a number of consistency models, as well as protocols for implementing the models. * So, what's the catch? * It is not easy to keep all those replicas consistent. ## Data-centric Consistency Models A diagram shows 3 processes, each connected to a separate local copy of a distributed data store. The diagram is captioned: "The general organization of a logical data store, physically distributed and replicated across multiple processes" ## What is a Consistency Model? * A "consistency model" is a CONTRACT between a DS data-store and its processes. * If the processes agree to the rules, the data-store will perform properly and as advertised. * We start with Strict Consistency, which is defined as: * Any read on a data item 'x' returns a value corresponding to the result of the most recent write on 'x' (regardless of where the write occurred). ## Consistency Model Diagram Notation * W₁(x)a – a write by process 'i' to item 'x' with a value of 'a'. That is, 'x' is set to 'a'. * (Note: The process is often shown as 'P₁'). * R₁(x)b – a read by process 'i' from item 'x' producing the value 'b'. That is, reading 'x' returns 'b'. * Time moves from left to right in all diagrams. ## Strict Consistency Two diagrams show 2 processes, each interacting with a data store. There are 2 cases: * **(a)** Process 1 writes to data item 'x' with value 'a', and then process 2 reads from 'x' and receives 'a'. This is described as "A strictly consistent data-store" * **(b)** Process 1 writes to 'x' with value 'a'. Process 2 then attempts to read, but receives NIL (null value). Process 2 then reads again and receives 'a'. This is described as "A data-store that is not strictly consistent". The caption for the diagrams states: "Behavior of two processes, operating on same data item:" The text below the diagrams states: "With Strict Consistency, all writes are instantaneously visible to all processes and absolute global time order is maintained throughout the DS. This is the consistency model "Holy Grail" - not at all easy in the real world, and all but impossible within a DS. So, other, less strict (or "weaker") models have been developed ..." ## Sequential Consistency (1) * A weaker consistency model, which represents a relaxation of the rules. * It is also much easier (possible) to implement. * Definition of "Sequential Consistency": * The result of any execution is the same as if the (read and write) operations by all processes on the data-store were executed in the same sequential order and the operations of each individual process appear in this sequence in the order specified by its program. ## Sequential Consistency (2) Two diagrams show 4 processes, each interacting with a data store. There are 2 cases: * **(a)** Processes 1, 2, 3, and 4 interact with the data store. Process 1 writes value 'a' to 'x'. Process 2 writes value 'b' to 'x'. Process 3 reads 'b' and then 'a'. Process 4 reads 'b', then 'a'. This is described as "A sequentially consistent data store". * **(b)** Processes 1, 2, 3, and 4 interact with the data store. Process 1 writes value 'a' to 'x'. Process 2 writes value 'b' to 'x'. Process 3 reads 'b' and then 'a'. Process 4 reads 'a', then 'b'. This is described as "A data store that is not sequentially consistent". The caption for the diagrams states: "(a) A sequentially consistent data store." and "(b) A data store that is not sequentially consistent." ## Sequential Consistency (3) A table shows 3 processes interacting with a data store. Each process operates on a different data item (x, y, z). Each process also performs a different operation related to the data item, such as assignment and printing. The caption for the table is "Three concurrently-executing processes" ## Client-Centric Consistency Models * The previously studied consistency models concern themselves with maintaining a consistent (globally accessible) data-store in the presence of concurrent read/write operations * Another class of distributed data-store is that which is characterized by the lack of simultaneous updates. Here, the emphasis is more on maintaining a consistent view of things for the individual client process that is currently operating on the data-store. ## More Client-Centric Consistency * How fast should updates (writes) be made available to read-only processes? * Think of most database systems: mainly read. * Think of the DNS: write-write conflicts do no occur, only read-write conflicts. * Think of WWW: as with DNS, except that heavy use of client-side caching is present: even the return of stale pages is acceptable to most users. * These systems all exhibit a high degree of acceptable inconsistency ... with the replicas gradually becoming consistent over time. ## Toward Eventual Consistency * The only requirement is that all replicas will eventually be the same. * All updates must be guaranteed to propagate to all replicas ... eventually! * This works well if every client always updates the same replica. * Things are a little difficult if the clients are mobile. ## Eventual Consistency A diagram shows 3 data replicas connected via a wide-area network, and a portable computer connected to one of the replicas. The computer can connect to any of the replicas. The diagram is drawn in a way that suggests the client is able to simply move around and connect to any of the replicas without needing to worry about the details of the connection. The caption for the diagram is "The principle of a mobile user accessing different replicas of a distributed database" ## Monotonic Reads (1) * A data store is said to provide monotonic-read consistency if the following condition holds: * If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value. ## Monotonic Reads (2) Two diagrams show 2 local copies (L1, L2) of a data store. Each copy is connected to a process, and both processes interact with the data store. There are two cases: * **(a)** Process writes to data item 'x' with value '1' on L1. Process reads value '1' from 'x' on L1. Then process writes '1' to 'x' on L2 and '2' to 'x' on L2. Process reads '2' from 'x' on L2. This is described as "A monotonic-read consistent data store". * **(b)** Process writes to data item 'x' with value '1' on L1. Process reads value '1' from 'x' on L1. Then process writes '2' to 'x' on L2. Process reads '2' from 'x' on L2. This is described as "A data store that does not provide monotonic reads". The caption for the diagrams is "The read operations performed by a single process P at two different local copies of the same data store." and "(a) A monotonic-read consistent data store." and "(b) A data store that does not provide monotonic reads." ## Monotonic Writes (1) * In a monotonic-write consistent store, the following condition holds: * A write operation by a process on a data item x is completed before any successive write operation on x by the same process. ## Monotonic Writes (2) Two diagrams show 2 local copies (L1, L2) of a data store. Each copy is connected to a process, and both processes interact with the data store. There are two cases: * **(a)** Process writes to data item 'x' with value '1' on L1. Process writes '1' to 'x' on L2. Then process writes '2' to 'x' on L2. This is described as "A monotonic-write consistent data store". * **(b)** Process writes to data item 'x' with value '1' on L1. Then process writes '2' to 'x' on L2. This is described as "A data store that does not provide monotonic-write consistency". The caption for the diagrams is "The write operations performed by a single process P at two different local copies of the same data store." and "(a) A monotonic-write consistent data store." and "(b) A data store that does not provide monotonic-write consistency." ## Read Your Writes (1) * A data store is said to provide read-your-writes consistency, if the following condition holds: * The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process. ## Read Your Writes (2) Two diagrams show 2 local copies (L1, L2) of a data store. Each copy is connected to a process, and both processes interact with the data store. There are two cases: * **(a)** Process writes to data item 'x' with value '1' on L1. Process writes '1' to 'x' on L2 and '2' to 'x' on L2. Then process reads '2' from 'x' on L2. This is described as "A data store that provides read-your-writes consistency". * **(b)** Process writes to data item 'x' with value '1' on L1. Then process writes '2' to 'x' on L2. Then process reads '2' from 'x' on L2. This is described as "A data store that does not provide read-your-writes consistency". The caption for the diagrams is "(a) A data store that provides read-your-writes consistency" and "(b) A data store that does not provide read-your-writes consistency". ## Writes Follow Reads (1) * A data store is said to provide writes-follow-reads consistency, if the following holds: * A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read. ## Writes Follow Reads (2) Two diagrams show 2 local copies (L1, L2) of a data store. Each copy is connected to a process, and both processes interact with the data store. There are two cases: * **(a)** Process writes to data item 'x' with value '1' on L1. Process reads '1' from 'x' on L1. Then process writes '1' to 'x' on L2 and '2' to 'x' on L2. This is described as "A writes-follow-reads consistent data store". * **(b)** Process writes to data item 'x' with value '1' on L1. Process reads '1' from 'x' on L1. Then process writes '2' to 'x' on L2. This is described as "A data store that does not provide writes-follow-reads consistency". The caption for the diagrams is "(a) A writes-follow-reads consistent data store" and "(b) A data store that does not provide writes-follow-reads consistency". ## Content Replication and Placement * Regardless of which consistency model is chosen, we need to decide where, when and by whom copies of the data-store are to be placed. A diagram shows a central data store called "Permanent replicas". The data store interacts with 3 different kinds of clients: * **Clients** * **Server-initiated replicas**: which are connected to the central data store via bidirectional arrows. * **Client-initiated replicas**: which are connected to the central data store via unidirectional arrows. The central data store also interacts with **Server-initiated replication** and **Client-initiated replication** via bidirectional arrows. The caption for the diagram is: "Permanent replicas" ## Replica Placement Types * There are three types of replica: 1. **Permanent replicas**: tend to be small in number, organized as COWS (Clusters of Workstations) or mirrored systems. 2. **Server-initiated replicas**: used to enhance performance at the initiation of the owner of the data-store. Typically used by web hosting companies to geographically locate replicas close to where they are needed most. (Often referred to as "push caches"). 3. **Client-initiated replicas**: created as a result of client requests - think of browser caches. Works well assuming, of course, that the cached data does not go stale too soon. ## Server-Initiated Replicas A diagram shows a server, a client, and two replicas. The client connects to one replica (C₁), while the server connects to the second replica (C₂). Both (C₁ and C₂) connect to a single server (P) which in turn connects to the primary server (Q). The caption for the diagram is "Counting access requests from different clients"

Consistency Models in Distributed Systems PDF

Document Details

Tags

Related

Summary

Full Transcript