Podcast
Questions and Answers
Why study Distributed Systems?
Why study Distributed Systems?
Because they are everywhere.
What is a distributed system?
What is a distributed system?
A system consisting of multiple independent components that can interact and collaborate to complete a common task.
Which of the following are examples of distributed system applications? (Select all that apply)
Which of the following are examples of distributed system applications? (Select all that apply)
In a distributed system, independent components share all information with each other.
In a distributed system, independent components share all information with each other.
Signup and view all the answers
What is the primary challenge regarding communication in distributed systems?
What is the primary challenge regarding communication in distributed systems?
Signup and view all the answers
What does the term 'asynchrony' in distributed systems refer to?
What does the term 'asynchrony' in distributed systems refer to?
Signup and view all the answers
Which of the following is NOT one of the eight fallacies in distributed systems?
Which of the following is NOT one of the eight fallacies in distributed systems?
Signup and view all the answers
What are some of the important aspects to consider while choosing a model for distributed systems?
What are some of the important aspects to consider while choosing a model for distributed systems?
Signup and view all the answers
What can be deduced about an in-flight message?
What can be deduced about an in-flight message?
Signup and view all the answers
A cut can be considered consistent if a receive event happens before the cut but there is no corresponding send event.
A cut can be considered consistent if a receive event happens before the cut but there is no corresponding send event.
Signup and view all the answers
What are the properties of channels in a distributed system model as described?
What are the properties of channels in a distributed system model as described?
Signup and view all the answers
The Chandy-Lamport's Snapshot Algorithm captures a consistent ______.
The Chandy-Lamport's Snapshot Algorithm captures a consistent ______.
Signup and view all the answers
What does the FLP Theorem state?
What does the FLP Theorem state?
Signup and view all the answers
What are the three key properties implied by the ability of a system to reach consensus?
What are the three key properties implied by the ability of a system to reach consensus?
Signup and view all the answers
The recorded state in distributed consensus needs to be real and must have actually occurred in the system.
The recorded state in distributed consensus needs to be real and must have actually occurred in the system.
Signup and view all the answers
Which challenge is NOT mentioned as affecting consensus in distributed systems?
Which challenge is NOT mentioned as affecting consensus in distributed systems?
Signup and view all the answers
What does safety refer to in consensus protocols?
What does safety refer to in consensus protocols?
Signup and view all the answers
What does liveness refer to in consensus protocols?
What does liveness refer to in consensus protocols?
Signup and view all the answers
According to the FLP theorem, it is possible to achieve both safety and liveness?
According to the FLP theorem, it is possible to achieve both safety and liveness?
Signup and view all the answers
Name the three types of nodes in a consensus protocol.
Name the three types of nodes in a consensus protocol.
Signup and view all the answers
What is the first phase of the 2-Phase Commit protocol?
What is the first phase of the 2-Phase Commit protocol?
Signup and view all the answers
The 2-Phase Commit protocol guarantees liveness?
The 2-Phase Commit protocol guarantees liveness?
Signup and view all the answers
What is the primary goal of the 3-Phase Commit protocol?
What is the primary goal of the 3-Phase Commit protocol?
Signup and view all the answers
Who wrote the original Paxos paper?
Who wrote the original Paxos paper?
Signup and view all the answers
What is the first phase of the Paxos protocol?
What is the first phase of the Paxos protocol?
Signup and view all the answers
What is checkpointing?
What is checkpointing?
Signup and view all the answers
What is the domino effect in uncoordinated checkpointing?
What is the domino effect in uncoordinated checkpointing?
Signup and view all the answers
Coordinated checkpointing eliminates the need for garbage collection?
Coordinated checkpointing eliminates the need for garbage collection?
Signup and view all the answers
What is the blocking approach in communication-induced checkpoints?
What is the blocking approach in communication-induced checkpoints?
Signup and view all the answers
What does the non-blocking approach in global snapshot algorithm rely on?
What does the non-blocking approach in global snapshot algorithm rely on?
Signup and view all the answers
What are the ACID properties in the context of transactions?
What are the ACID properties in the context of transactions?
Signup and view all the answers
The pessimistic logging approach saves I/O but requires complex recovery.
The pessimistic logging approach saves I/O but requires complex recovery.
Signup and view all the answers
What is a distributed transaction?
What is a distributed transaction?
Signup and view all the answers
What is a key feature of Google Spanner?
What is a key feature of Google Spanner?
Signup and view all the answers
Pessimistic logging incurs high overhead because writes to __________ are slow.
Pessimistic logging incurs high overhead because writes to __________ are slow.
Signup and view all the answers
Which logging approach assumes that the log will always be persistent before failure?
Which logging approach assumes that the log will always be persistent before failure?
Signup and view all the answers
What is the main problem associated with sender-based timing in distributed systems?
What is the main problem associated with sender-based timing in distributed systems?
Signup and view all the answers
Causality tracking allows safe execution of external events without any delays indefinitely.
Causality tracking allows safe execution of external events without any delays indefinitely.
Signup and view all the answers
What are the types of logical clocks mentioned?
What are the types of logical clocks mentioned?
Signup and view all the answers
Logical clocks measure the same kind of 'time' as physical clocks.
Logical clocks measure the same kind of 'time' as physical clocks.
Signup and view all the answers
The happens before relationship is represented by ____ between events.
The happens before relationship is represented by ____ between events.
Signup and view all the answers
What problem does 'clock consistency condition' address?
What problem does 'clock consistency condition' address?
Signup and view all the answers
What does Lamport's clock guarantee?
What does Lamport's clock guarantee?
Signup and view all the answers
Vector clocks provide a weaker consistency than scalar clocks.
Vector clocks provide a weaker consistency than scalar clocks.
Signup and view all the answers
Match the following clock types with their characteristics:
Match the following clock types with their characteristics:
Signup and view all the answers
What is the purpose of making a cut in a distributed system?
What is the purpose of making a cut in a distributed system?
Signup and view all the answers
All cuts in a distributed system are consistent cuts.
All cuts in a distributed system are consistent cuts.
Signup and view all the answers
A network is always secure.
A network is always secure.
Signup and view all the answers
If components in a distributed system fail or are added, the network topology remains unchanged.
If components in a distributed system fail or are added, the network topology remains unchanged.
Signup and view all the answers
Which of the following are properties of a distributed system?
Which of the following are properties of a distributed system?
Signup and view all the answers
The system should recover from component failures without performing wrong actions, which is known as ______.
The system should recover from component failures without performing wrong actions, which is known as ______.
Signup and view all the answers
What is the main goal of a distributed system regarding responses?
What is the main goal of a distributed system regarding responses?
Signup and view all the answers
The CAP theorem states that a distributed system can never simultaneously meet which of the following properties?
The CAP theorem states that a distributed system can never simultaneously meet which of the following properties?
Signup and view all the answers
What is the purpose of Remote Procedure Calls (RPC)?
What is the purpose of Remote Procedure Calls (RPC)?
Signup and view all the answers
What distinguishes synchronous RPC operations from asynchronous RPC operations?
What distinguishes synchronous RPC operations from asynchronous RPC operations?
Signup and view all the answers
The client in a client-server architecture needs to know the server's interface and parameter types.
The client in a client-server architecture needs to know the server's interface and parameter types.
Signup and view all the answers
What is meant by 'exactly once execution' in the context of RPC?
What is meant by 'exactly once execution' in the context of RPC?
Signup and view all the answers
The ideal scenario for command execution in RPC is ______.
The ideal scenario for command execution in RPC is ______.
Signup and view all the answers
In distributed systems, lack of response from the server always indicates a failure.
In distributed systems, lack of response from the server always indicates a failure.
Signup and view all the answers
What role does the RPC runtime play in the architecture of an RPC system?
What role does the RPC runtime play in the architecture of an RPC system?
Signup and view all the answers
Which of the following is not a challenge faced in client-server architecture?
Which of the following is not a challenge faced in client-server architecture?
Signup and view all the answers
What does 'partition tolerance' refer to in a distributed system?
What does 'partition tolerance' refer to in a distributed system?
Signup and view all the answers
Study Notes
What Are Distributed Systems?
- Distributed systems are a collection of computing units that interact with each other via messages and appear as a single coherent entity to the user.
- The independent units do not share all information.
- The communication between units is not perfectly reliable (messages can be delayed, dropped, or delivered multiple times).
- Units work on a common task or goal.
Modeling Distributed Systems
- A simple model of a distributed system represents nodes, messages, and time. It abstracts out the underlying network, directionality of channels, and processing at nodes.
- A more complex model adds a state variable to each node to represent the processing actions happening at the nodes.
- Models should be accurate (represent the actual system) and tractable (allow analysis of the problem).
Challenges in Distributed Systems
- Asynchrony: message latency can be unpredictable or even infinite.
- Failures: nodes or network links can fail, either permanently, transiently, or in a Byzantine way (system performs incorrect actions).
- Consistency: maintaining a single up-to-date copy of data consistent across all nodes is challenging due to concurrency and replication.
Properties of Distributed Systems
- Consistency: The system always gives correct answers.
- Availability: The system always provides responses.
- Partition Tolerance: The system provides responses regardless of failures or delays in nodes or the network.
- Fault-Tolerance: The system recovers from component failures without performing incorrect actions.
- High Availability: The system restores operations and resumes services after component failures.
- Recoverability: Failed components can restart and rejoin the system after a failure is resolved.
Correctness
- Distributed system is considered correct if the output produced for the same set of inputs is the same as the output produced by a single entity.
- Each node works on a separate set of inputs, making it harder to maintain consistency.
- Consistency Model guarantees the ordering of events across all nodes.
- Strict Consistency - all nodes agree on the same global order for all events, almost impossible to achieve due to different time perception by nodes.
- Linearizable - guarantees that transactions appear to happen in linear order despite individual operations occurring simultaneously.
- Serializable - ensures that the outputs of transactions align with some order, even if it doesn't match the actual order of events.
Brewer's CAP Theorem
- A Distributed System cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance at any given time.
- This means a system must trade off one property for another if it were to experience a network partition.
- In a system without network partitions, both Consistency and Availability can be achieved.
- Several systems are classified based on their CAP tradeoffs, with key-value stores like Cassandra and DynamoDB opting for P+A and systems like Megastore and MySQL Cluster choosing P+C.
- In practice, high latency is equivalent to unavailability.
- The PACELC theorem addresses how a system trades off availability and consistency during network partitions (C.A.P). It also explores the trade-offs between latency and consistency (L.C) during normal operation.
Client-Server Architecture
- Clients send requests and data to Server nodes in the distributed system.
- Clients must discover the server's address, establish connections, and manage data transfer for their requests.
- Servers process requests, potentially accessing databases, and send responses back to clients.
- Both clients and servers can reside on the same machine.
Challenges of Client-Server Architecture
- Clients need to discover servers and establish connections (Discovery and Binding).
- Clients need to determine the server's operation and required parameters (Identifying Interface and Parameters).
- Clients and servers must utilize a common data representation scheme (Data Representation).
- Data must be explicitly copied between server and client memory (Explicit Data Management).
- Network and server delays lead to unpredictable execution times (Unpredictable Execution Time).
- It's difficult for clients to distinguish fault sources, making failure handling complicated (Unknown Cause of Failure).
Role of Remote Procedure Call (RPC)
- RPC aims to simplify client-server interactions by addressing the above challenges.
- It hides the complexity of distributed programming, making it feel similar to programming single-node systems.
- RPC provides services for registration, connection establishment, interface, type specification, data management, and failure handling.
- Serialization (marshalling), deserialization (unmarshalling) transform data between network bu er and memory.
Architecture of an RPC System
- RPC systems operate at multiple layers, including the API, Stubs, and RPC Runtime.
- The API serves as the programming interface for client and server applications.
- Stubs manage the connection establishment, data marshaling/unmarshaling, and function calls.
- The RPC Runtime manages connections, message sending/receiving, and failure handling.
- An IDL is used to describe the server's interfaces and functions.
- An RPC compiler translates the IDL into stub code, skeleton server code, and other necessary components.
- A service registry provides a centralized location for servers to announce their services.
Anatomy of an RPC Call
- The client makes a procedure-like function call using the RPC call.
- The client stub processes the arguments and generates a buffer for the RPC request.
- The RPC runtime takes care of sending the message to the server.
- The server receives the message and the RPC runtime passes it to the server stub.
- The server stub unpacks the message and determines the corresponding function.
- The actual local implementation of the function is executed, and the result is returned to the server stub.
- The result is packed into a message buffer and sent back by the server.
- The client stub receives the message, unpacks the result, and places it in the expected memory location.
- This makes the RPC call seem like a regular local function call for the client.
Invocation Semantics of RPC Operations
- Synchronous RPC operations block client threads until the response is received.
- Asynchronous RPC operations allow client threads to continue performing other tasks while waiting for the response.
- Clients can choose to be notified via callbacks when the response arrives.
Execution Guarantees of RPC Operations
- RPC operations can be classified based on their guarantees for message delivery.
- Exactly Once Execution: An ideal scenario where operations execute only once per invocation.
- At Most Once Execution: Operations execute at most once per invocation: the operation may execute once or not at all.
- At Least Once Execution: Operations execute at least once per invocation: the operation may execute once or multiple times.
Examples of RPC Systems
- The system includes sunRPC, SOAP, CORBA, Apache Thrift, and gRPC.
- gRPC uses protocol buffers for defining interfaces and managing data serialization.
- It uses a .proto file to specify the interface, which is compiled to generate a .protoc file.
Time in Distributed Systems
- Time is crucial for ordering events, performing resource allocation, and garbage collection.
- However, time in distributed systems is challenging to measure due to the lack of global synchronization, network delays, and the possibility of failures and malicious nodes.
Logical Clocks
- Logical clocks generate timestamps to infer the ordering of events in distributed systems.
- Three main types of logical clocks exist: Scalar Clocks (Lamport Clocks), Vector Clocks, and Matrix Clocks.
Representing Time and Sequence
- Processes (pi) generate events (eki).
- The 'happens before' relationship (→) indicates the order of events within a process and across messages sent and received.
- Process history (Hi) represents the ordered sequence of events within a process.
- Concurrent events have no 'happens before' relationship between them.
Time Diagrams
- Time diagrams depict processes as horizontal lines with dots representing events.
- Arrows connect send and receive events, indicating message transmission.
- Arrowless dots represent internal events.
Clock Consistency
- Clock consistency refers to logical clocks producing timestamps reflective of the actual relationship between events.
-
Monotonicity: If event
e1
happens before evente2
, then the timestamp ofe1
must be less than the timestamp ofe2
. -
No Implication for Concurrent Events: If events
e1
ande2
are concurrent, their timestamps are not required to have a specific order. -
Strong Clock Consistency: When a clock ensures that
e1
happens beforee2
if and only if the timestamp ofe1
is less than the timestamp ofe2
.
Lamport's Scalar Clock
-
Each node has its own scalar clock.
-
Clock Definition:
- Data Structure: A scalar value represents the timestamp.
- Rules to generate timestamps:
- For internal events
a
andb
wherea
happens beforeb
, the clock is incremented:Ci(b) = Ci(a) + 1
. - For messaging events
a = send_i(m_k)
andb = recv_j(m_k)
, the receiver's clock is updated to the maximum of the sender's clock plus one and its current value:Cj(b) = max(Ci(a) + 1, Cj)
.
- For internal events
-
Clock Correctness:
- Lamport's clock satisfies clock consistency but not strong clock consistency.
- The clock can be used to determine the minimum number of events that happened before a given event across all nodes, but not the exact number.
- The lack of strong consistency can lead to unnecessary ordering enforcement in systems.
Vector Clock
-
A vector of scalars, where the size of the vector is equal to the number of nodes in the system.
-
Clock Definition:
- Each node maintains its own view of time for all nodes.
-
vti[i] = Ci
, whereCi
is the Lamport clock at nodei
. -
vti[j] = Cj0
, whereCj0
is nodei
's perception of nodej
's Lamport clock.
-
Rules for updating Vector Clocks:
-
Rule 1: Increment the clock value at
i
for each internal event:vti[i] = vti[i] + d
. -
Rule 2: On receiving a message, update the local view of the sender's clock:
vti[k] = max(vti[k], vt[k])
, after which the receiver’s clock is updated using Rule 1 and the message is processed.
-
Rule 1: Increment the clock value at
-
Comparing Vector Clocks:
-
vt1 ≤ vt2
ifvt1[i] ≤ vt2[i]
for alli
. -
vt1 < vt2
ifvt1[i] ≤ vt2[i]
for alli
, andvt1[k] < vt2[k]
for at least onek
. -
vt1 k vt2
if neithervt1 < vt2
norvt2 < vt1
holds.
-
-
Clock consistency: Vector clocks are both consistent and strongly consistent.
-
Benefits:
- Provides a stronger guarantee of causality.
- Improves efficiency as unnecessary reordering of concurrent events is avoided.
-
Cost:
- Requires more storage (O(N)) and data to be transmitted with messages.
Matrix Clock
-
Extends the vector clock to a matrix.
-
Clock Definition:
-
mti[i, i] = Ci
, the scalar clock value at nodei
. -
mti[i] = vti
, the vector clock of nodei
. -
mti[k] = vt0k
, nodei
's perception of nodek
's vector clock.
-
-
Benefits:
- Enables garbage collection, as nodes can determine if their perception of another node's time is greater than a certain value.
- If
min(mti[k, j]) > t
, all events atj
that occurred beforet
can be deleted.
-
Cost:
- Requires more storage and complexity.
State in Distributed Systems
- State: The state of a distributed system is defined by the states of the nodes and channels, including process state, channel state, and state transitions.
- Cut: A division in the execution time of the system that separates completed events from events that haven't finished.
- Consistent Cut: A cut that captures a possible ordering of events in the system, even if it may not have actually occurred.
-
Challenges in Capturing State:
-
No Instantaneous Recording:
- Absence of a global clock makes capturing simultaneous events impossible.
- Distributed systems lack a central authority for coordinating snapshots.
- Non-determinism: Multiple possible future events due to concurrent events make ordering and consistency challenging.
-
No Instantaneous Recording:
Chandy-Lamport's Snapshot Algorithm
-
Aims to capture a consistent snapshot of the distributed system.
-
Steps:
- The initiator node saves its local state and sends a marker message to all outgoing channels.
- Upon receiving the first marker on an incoming channel, other nodes:
- Save their local state.
- Mark the state of the incoming channel as empty.
- Send a marker message to all outgoing channels.
- Resume execution, saving incoming messages until a marker is received on that particular channel.
- Once the marker arrives on an incoming channel, the state of that channel is marked, and all received messages since the last state capture become in-flight messages.
-
Assumptions:
- No failures.
- Unidirectional channels.
- FIFO communication channels.
- The algorithm does not disrupt normal processing.
- Each node records its local state and the state of all its incoming channels.
-
Guarantees: A consistent state snapshot, but not necessarily the actual state of the system at any given time.
Properties of State Captured
- The algorithm captures a consistent state that doesn't necessarily reflect the exact execution point.
- The observed global state can be a permutation of the actual state or one of the other global states within the state tree.
- The system's possible states can be depicted as a tree starting from the initial state. Each branch represents a different concurrent event. The algorithm eventually lands on one of the possible states within the tree.
Global State Formal Definition
- The recorded state must be on a path from the initial state to the final state.
- There exists a permutation of the sequence of computations leading from the initial state to the final state that passes through the recorded state.
- The recorded state can be either the initial event or occur after the initial event.
- Similarly, it can be the final event or occur prior to the final event.
Global State Benefits
- Stable Property Detection: Helps perform garbage collection and detect stable properties like deadlock, termination, and token loss.
- Unstable Property Detection: Useful for identifying correctness/consistency issues and detecting unstable properties like buffer overflow, race conditions, and load spikes.
- If a stable property is true in the recorded state, it is also true in the final state.
- If a stable property is false in the recorded state, it is false in the initial state.
- Unstable properties might not remain true in the final state as they are transient.
- Observing an unstable property in the recorded state indicates the potential for its occurrence in the system.
What is Consensus?
- Multiple distributed processes reaching an agreement on a shared state, an action, or a timestamp.
- Crucial for progress in distributed systems.
- In a common scenario, processes agree on the outcome of a transaction.
Consensus Challenges
- Non-determinism
- Lack of a global clock
- Network delays
- Failures
- Malicious behavior
Consensus Properties
- Termination/Liveness: Processes either reach a consensus or terminate.
- Correctness/Agreement/Safety: All processes agree on a single value.
- Validity/Safety: The decided value must be proposed by one of the processes.
Theoretical Possibility of Consensus
- Asynchronous system model: messages may be reordered and delayed but not corrupted.
- At-most one faulty process using the fail-stop model.
- This simplified model allows proving the possibility or impossibility of consensus in complex systems.
FLP Theorem
- It states that in a system with one fault, no consensus protocol can be totally correct.
- There is always an initial bivalent configuration for which an admissible non-deciding schedule exists.
Is Consensus Really Impossible?
- The FLP theorem applies to systems that meet its assumptions.
- Real-world systems often deviate from the idealized model and may achieve consensus.
- Consensus protocols might not terminate if specific conditions are not met.
Goals of Consensus Protocols
- Agreement: Multiple parties must agree on the same value.
- Validity: The chosen value must be valid.
- Safety: Only a proposed value can be chosen, and only a single value is chosen and learned.
- Liveness: Some proposed value is chosen, and any chosen value is learned.
2-Phase Commit (2PC)
- A coordinator coordinates the commit process.
- Phase 1: Vote Collection Phase: Participants vote on the proposed value.
- Phase 2: Decision Phase: The coordinator tallies votes and communicates the decision (commit).
- Blocking Protocol: Blocks if there is a failure, not guaranteeing liveness.
3-Phase Commit (3PC)
- Attempts to address 2PC's blocking problem.
- Phase 1: Prepare Phase: Votes are solicited.
- Phase 2: Pre-Commit Phase: Decision communicated (might lock resources).
- Phase 3: Decision/Commit Phase: Actual commit performed.
- Non-Blocking: Timeout during phases 2 or 3 leads to protocol abort.
- Not suitable for fail-stop with restart due to potential safety issues.
Paxos
- Distributed consensus protocol that handles asynchronous communication and non-Byzantine failures.
- Uses a state machine replication approach where each node replicates the same state machine.
- Decisions are based on majority quorum, enabling fault tolerance.
- Ordering is maintained using timestamps.
- It consists of three phases: Prepare, Accept, and Learn.
Paxos Phases
- Prepare Phase: The initiator proposes an agreement round and sends a prepare request with a proposal number.
- Accept Phase: Participants vote on the proposal based on the proposal number and value.
- Learn Phase: The agreed-upon value is learned by all participants once a quorum agrees to commit.
Paxos Proposal Number
- The proposal number ensures ordering and helps handle fail-restart and delayed messages.
- Each proposal number is unique across all processes, allowing for multiple ongoing proposals.
CRAQ
- A highly scalable distributed database system that uses Paxos for consensus.
- Offers significantly higher write throughput compared to other solutions.
- CRAQ replicas maintain multiple copies of data for increased availability.
- Read throughput slightly drops as write load increases due to the need to manage multiple copies and perform tail checks.
Basics of Failures
- Fault: A flaw in hardware or software.
- Error: Occurs when an activated fault leads to incorrect behavior.
- Failure: Refers to the manifestation of an error during system execution.
Fault Types
- Transient: Occur once and disappear.
- Intermittent: Occur occasionally.
- Permanent: Occur once and persist until removed.
Failure Types
- Fail-Stop: Components stop working.
- Omission: Components fail to send or receive messages.
- Timing: Components fail to meet timing requirements, possibly causing delays.
- Byzantine: Components exhibit arbitrary failures.
System Failure
- System failure can be caused by malicious nodes or software bugs.
- There are four approaches to deal with failures: avoidance, detection, removal, and recovery.
- Avoidance is difficult and costly.
- Detection is commonly done using heartbeats or ping to check node responsiveness.
- Error correction codes detect incorrect execution.
- Removal involves reverting the system to the last clean state.
- Rollback takes the system to a point before the fault occurred and may not be possible if there are external actions by the system.
- Recovery encompasses detecting, removing the root cause, reverting to the last clean state, and resuming clean execution.
- A fault-tolerant system can perform all four actions.
Rollback-Recovery
- Rollback-recovery reverts the system to a previous state when a fault is detected.
- Rollback includes rolling back the effects of messages transmitted after the fault was detected.
- Rollback also includes any updates made to the system state.
- The system needs to roll back to a consistent state, not necessarily a real state.
- Checkpointing and logging are two methods used to choose the consistent cut point.
- The granularity of operation varies: transparent/full-system, transaction-level, and application-specific.
Checkpointing
- Checkpointing periodically saves the application state so the system can revert to the checkpoint on failure.
- It involves flushing the checkpoint to disk or persistent storage during normal operation.
- On failure, checkpoint restoration and system restart are performed.
- Hardware failure may require repair, replacement, or starting a different node.
- Checkpointing allows for instantaneous restart after restoration.
- It requires significant I/O to save the full system state, which can be mitigated with application-specific checkpoints and tracking delta changes.
Logging
- Logging captures information about operations performed to repeat them for state redemption on failure.
- It saves information about changes in state variables.
- Two styles are possible: undo, which stores the original value of changed variables, and redo, which stores the new value of changed variables.
- Logs need to be written to persistent storage.
- Logging requires less I/O to persistent storage and performs I/O while the application is executing.
- Recovery is slower and more complicated with redo logs, which require a log scan to find the most recent value.
Combining Logging and Checkpointing
- The system periodically performs checkpoints.
- Logging is used to save updates between checkpoints, and earlier logs can be discarded.
- This combination limits recovery duration, space, and bandwidth usage.
- It requires detecting a stable consistent cut checkpoint.
Uncoordinated Checkpointing
- Processes take checkpoints independently.
- Recovery requires computing the recovery line, and processes revert to it.
- Failure of one process may force others to roll back to consistent checkpoints.
- It requires storing dependency information or tracking sent messages.
- Uncoordinated checkpointing suffers from the domino effect, where rollback can cascade to earlier points, leading to potentially unnecessary rollbacks.
- It also results in useless checkpoints that cannot be part of a globally consistent state.
- Multiple checkpoints per process need to be stored, leading to excessive and potentially wasteful storage.
- Garbage collection is needed to clean up useless checkpoints and can be complex and time-consuming.
Coordinated Checkpointing
- Processes coordinate checkpointing to ensure a consistent state.
- It eliminates the need for dependency graphs and domino effects.
- It requires only one checkpoint per process.
- Garbage collection is not needed.
- The coordination itself is a disadvantage.
- Delays can occur due to initiator messages.
- Synchronous systems can take checkpoints every T units of time, while reliable and bounded message delivery allows for a round-robin approach.
- Unnecessary checkpoints may occur when nodes are forced to checkpoint even if no changes have occurred.
Communication-Induced Checkpoints
- Consensus protocols determine whether to checkpoint.
- Two approaches are possible: blocking and non-blocking.
- The blocking approach involves a two-phase commit or other consensus protocols.
- The non-blocking approach relies on marker messages and requires a FIFO network.
- It uses periodic independent checkpoints for nodes not communicating with others.
- Nodes snapshot their state when receiving marker messages before processing the actual message.
Logging Approaches
- Logging saves storage but requires a more complex recovery process.
- It aims to capture information enabling the retrieval of a consistent cut.
- It should prevent orphaned events, where receive events are logged without the corresponding send events.
- Three approaches exists: pessimistic logging, optimistic logging, and causality tracking.
Pessimistic Logging
- Each process logs everything to storage before events are propagated.
- It involves logging send messages before sending.
- This approach has high overhead due to slow writes to persistent memory, but can be improved with faster persistent memories.
Optimistic Logging
- Assumes the log will be persistent before failure and allows for the reversal of effects.
- It requires tracking dependencies.
- Incomplete operations need to be identified and reversed during recovery.
- Operations with external effects, such as robot movements, need to be delayed until dependency information is captured.
Causality Tracking
- Operates optimistically when no dependencies are detected.
- Captures dependencies to ensure causally related events are deterministically recorded.
- There is no risk of delaying external events indefinitely.
Choosing a Method
- The right method depends on factors like workload type, possible failure types, and system configuration.
- Improvements in persistent memory speeds may make pessimistic logging more efficient.
- Coordinated checkpointing, while favorable for many features, may cause overhead depending on the network costs and application.
Distributed Transactions (Google Spanner)
- Distributed transactions are similar to regular transactions but executed across multiple nodes.
- They need to guarantee ACID properties (Atomicity, Consistency, Isolation, Durability) but across multiple nodes.
- Google Spanner is a global data management layer used by Google and available as a Cloud DB service.
- It allows applications to use SQL queries for interaction with data, offering higher scalability than MySQL.
- It stores data globally across geographically separated zones and shards it at each zone.
- It replicates data across multiple sites for fault tolerance and availability.
- Spanner consists of multiple components: persistent storage, a distributed file system, and data organized as files.
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.