Full Transcript


Introduction
Dr. Rodrigue Rizk ([email protected])

What is a Distributed System?
❑ "... a system in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport, ACM Turing Award 2013, a renowned computer scientist known for his pioneering work in distributed systems.) https://youtu.be/rkZzg7Vowao
❑ A distributed system is a network of independent nodes that work together to achieve a common goal or task: multiple nodes communicating via a network. "Nodes" can be computers, phones, cars, robots, ...
❑ These systems are designed to share resources, data, and computation power, while appearing to users and applications as a single, unified system.
❑ Distributed systems are foundational in modern computing, enabling scalability, fault tolerance, and resource sharing across large and diverse environments.

Centralized vs. Distributed Systems
❑ Centralized Systems: a single point of control, often leading to single points of failure and scalability issues.
❑ Distributed Systems: multiple independent entities (nodes) working together, which can provide better fault tolerance and scalability.

| Aspect | Centralized Systems | Distributed Systems |
| --- | --- | --- |
| Control | Single central unit | Multiple independent nodes |
| Resource Allocation | Centralized | Distributed |
| Data Consistency | Easier to manage | Complex, requires consistency protocols |
| Scalability | Limited to upgrading the central unit | Horizontal scaling by adding more nodes |
| Fault Tolerance | Single point of failure | Redundant and fault-tolerant design |
| Security | Centralized security measures | Distributed security measures and communication protection |

Historical Evolution
❑ The historical development of distributed systems reflects the evolution of computing from centralized to decentralized architectures, driven by the need for scalability, reliability, and resource sharing.
❑ 1950s–1970s: Centralized computing with early networking experiments (e.g., ARPANET) and initial ideas about distributed systems.
❑ 1980s–1990s: Formalization of distributed systems theory; development of distributed file systems, databases, and client-server architectures.
❑ 2000s–Present: Emergence of cloud computing, big data technologies, containerization, microservices, and modern distributed architectures that address scalability, flexibility, and resource management.

Key Characteristics of Distributed Systems
❑ Distributed Resources: Resources such as data, computation, and storage are spread across multiple machines (nodes). These resources can be geographically distributed or located in different parts of a data center.
❑ Concurrency: Multiple processes or threads can run concurrently on different nodes. This allows the system to perform many operations simultaneously, improving efficiency and performance.
❑ Scalability: Distributed systems are designed to scale horizontally by adding more nodes to the network. This is different from vertical scaling, where resources are added to a single machine.
❑ Fault Tolerance: Distributed systems are built to handle failures gracefully. If one node fails, the system can continue to operate with minimal disruption. Redundancy, replication, and failover mechanisms are often employed to ensure reliability.
❑ Heterogeneity: Distributed systems often consist of different types of hardware, operating systems, and network technologies. They are designed to operate across diverse environments, providing interoperability and compatibility.
❑ Latency: Communication between nodes introduces latency due to network delays. Managing and optimizing latency is crucial for maintaining system performance.
❑ Consistency: Ensuring consistency across distributed data is a major challenge. Techniques such as distributed transactions, consensus algorithms, and eventual consistency models are used to maintain data integrity.

Relationships with other fields
❑ Operating Systems: inter-process communication, scheduling.
❑ Databases: many modern databases are distributed, e.g., Cassandra, MongoDB, Google Spanner.
❑ Computer Networking: distributed systems involve network communication.
❑ Cloud Computing: distributed systems for processing large amounts of data.
- Infrastructure as a Service (IaaS), e.g., AWS EC2, Google Compute Engine.
- Platform as a Service (PaaS), e.g., Google App Engine, Microsoft Azure.
- Microservices architecture.

Why make a system Distributed?
❑ It is inherently distributed: e.g., sending a message from your mobile phone to your friend's phone.
❑ For better reliability: even if one node fails, the system as a whole keeps functioning.
❑ Performance improvement:
- Parallelism / concurrent processing: distributed systems can perform tasks in parallel by leveraging multiple nodes.
- Geographical distribution, reduced latency: by placing nodes in various geographical locations, a distributed system can reduce latency and improve response times by serving requests from nodes closer to the user's location.
❑ To solve bigger problems: e.g., huge amounts of data that can't fit on one machine; handling increased load.

Why NOT make a system Distributed?
❑ Complexity: Designing, implementing, and managing a distributed system is more complex than a centralized system due to issues such as network communication, data consistency, and fault tolerance.
❑ Latency and Synchronization: Network communication can introduce latency, and maintaining consistency across nodes can be challenging.
❑ Security Risks: Distributing resources increases the attack surface, requiring careful management of security protocols and access controls.

Challenges in Distributed Systems
❑ Communication: Reliable communication between nodes is difficult due to network failures, latency, and bandwidth constraints.
❑ Synchronization: Synchronizing clocks, data, and operations across distributed nodes is complex. Techniques like Lamport timestamps and vector clocks are used to address this (see the sketch after this list).
❑ Consistency: Achieving consistency in the presence of network partitions and concurrent operations is challenging. The CAP theorem highlights the trade-offs between Consistency, Availability, and Partition tolerance.
❑ Fault Tolerance: Detecting and recovering from node failures is essential for maintaining system reliability. Techniques like replication, checkpointing, and consensus algorithms (e.g., Paxos, Raft) are used.
❑ Security: Securing a distributed system involves protecting data in transit, ensuring authentication and authorization, and mitigating potential attacks such as Distributed Denial of Service (DDoS).
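To make the synchronization point concrete, here is a minimal sketch of a Lamport logical clock. It is not taken from the slides; the class name and message handling are illustrative assumptions.

```python
# Minimal Lamport logical clock sketch (illustrative; names are assumptions).
class LamportClock:
    def __init__(self):
        self.time = 0  # local logical time

    def tick(self):
        """Local event: increment the logical clock."""
        self.time += 1
        return self.time

    def send(self):
        """Before sending a message, tick and attach the timestamp."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, advance past the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

# Usage: two nodes exchanging one message.
a, b = LamportClock(), LamportClock()
t = a.send()         # a's clock becomes 1
print(b.receive(t))  # b's clock becomes 2, ordering a's send before b's receive
```

The point of the rule in receive() is that a message receipt is always ordered after the corresponding send, even though the two nodes have no shared physical clock.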
Types of Distributed Systems
❑ Client-Server Systems: In a client-server model, client nodes request services and resources from centralized server nodes. Web applications, email services, and databases are common examples.
❑ Peer-to-Peer Systems: In a peer-to-peer (P2P) system, all nodes have equal roles, and they both request and provide services. BitTorrent and blockchain networks are examples of P2P systems.
❑ Distributed Databases: Distributed databases store data across multiple nodes, providing high availability and fault tolerance. Examples include Google Spanner, Cassandra, MongoDB, and Amazon DynamoDB.
❑ Distributed File Systems: allow data to be stored and accessed across multiple nodes, ensuring data redundancy and availability, e.g., Hadoop Distributed File System (HDFS) and Google File System (GFS).
❑ Distributed Computing Systems: distribute computational tasks across multiple nodes, enabling parallel processing and faster execution, e.g., Apache Hadoop, Apache Spark.
❑ Real-Time Distributed Systems: require low-latency communication and rapid processing of data, e.g., high-frequency trading systems and distributed control systems in industrial automation.

Computer Networking

Distributed Systems and Computer Networking
❑ We use a simple abstraction of communication.
❑ Reality is much more complex:
- Various network operators: eduroam, home DSL, cellular data, coffee shop wifi, submarine cable, satellite, ...
- Physical communication: electric current, radio waves, laser, hard drives in a van, ...

Hard drives in a van?!
❑ https://docs.aws.amazon.com/snowball/
❑ High latency, high bandwidth!

Latency
❑ Latency refers to the time delay from the moment a message is sent until it is received. It is typically measured in milliseconds (ms).
❑ Examples:
- In the same building/datacenter ≈ 1 ms
- One continent to another ≈ 100 ms
- Hard drives in a van ≈ 1 day

Bandwidth
❑ Bandwidth refers to the maximum rate at which data can be transmitted over a network link. It is typically measured in bits per second (bps), kilobits per second (Kbps), megabits per second (Mbps), or gigabits per second (Gbps).
❑ It refers to the maximum capacity of a network link or channel to transmit data: the theoretical limit of how much data can be sent over a network path in a given period. It is a measure of the potential maximum speed of the link, not necessarily what is achieved in practice.
❑ Examples:
- 3G cellular data ≈ 1 Mbit/s
- Home broadband ≈ 10 Mbit/s
- Hard drives in a van ≈ 50 TB/box

Throughput
❑ Throughput refers to the actual rate of data transfer achieved over the network, which may be less than the maximum theoretical bandwidth due to factors like network congestion and protocol overhead. It measures the real-world performance of the network.
❑ Throughput is also measured in bits per second (bps), Kbps, Mbps, or Gbps, but reflects the actual data rate experienced by users.
❑ Practical example: if a network has a bandwidth of 1 Gbps, then due to high traffic and overheads the actual throughput might be only 800 Mbps. (A back-of-the-envelope comparison of network links against the hard-drives-in-a-van option follows at the end of this section.)

Client-server example: the web
(Sequence diagram shown on the slides; time flows from top to bottom.)
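Referring back to the latency and bandwidth figures above, here is a rough back-of-the-envelope sketch (not from the slides) of why "hard drives in a van" can beat the network. The numbers reuse the slides' ballpark figures and are assumptions, not measurements.

```python
# Rough comparison: shipping a 50 TB box vs. uploading it over a network link.
# Figures are the slides' approximate numbers; treat them as illustrative assumptions.
DATA_BITS = 50e12 * 8            # 50 TB expressed in bits
GBPS = 1e9                       # 1 Gbit/s link
BROADBAND = 10e6                 # 10 Mbit/s home broadband

def transfer_days(bits, bits_per_second):
    return bits / bits_per_second / 86400  # 86400 seconds per day

print(f"50 TB over 1 Gbit/s  ≈ {transfer_days(DATA_BITS, GBPS):.1f} days")
print(f"50 TB over 10 Mbit/s ≈ {transfer_days(DATA_BITS, BROADBAND):.0f} days")
print("50 TB by van          ≈ 1 day of travel")
```

The van's effective bandwidth is enormous, but every byte arrives with roughly a day of latency, which is exactly the trade-off the "hard drives in a van" slide is pointing at.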
Remote Procedure Call
❑ Our first model for communication in distributed systems is the Remote Procedure Call (RPC). It was a major intellectual breakthrough in distributed computing, with the goal of making the programming of distributed systems look similar, if not identical, to conventional programming – that is, achieving a high level of distribution transparency.
❑ RPC is a protocol that allows a program to execute procedures or functions on a remote server as if they were local to the client machine.
❑ This abstraction simplifies distributed computing by making remote interactions appear like local function calls. The underlying RPC system hides important aspects of distribution, including the encoding and decoding of parameters and results, the passing of messages, and the preservation of the required semantics for the procedure call.
❑ RPC was first introduced by Andrew Birrell and Bruce Nelson in their 1984 paper "Implementing Remote Procedure Calls" (https://dl.acm.org/doi/pdf/10.1145/2080.357392), which described a mechanism for executing procedures on a remote server as if they were local. This was a significant contribution to the field of distributed computing, simplifying the construction of networked applications by abstracting away the complexity of communication between different systems, and it paved the way for many of the developments in distributed systems programming used today.
(Figure from Birrell & Nelson, 1984: the components of the system, and their interactions for a simple call.)
❑ RPC enables a client to invoke a procedure or method on a remote server over a network. The client interacts with the remote server as if it were calling a local function.

RPC Purpose
❑ Abstraction: Hides the complexity of network communication and remote interactions from the developer. Ideally, RPC makes a call to a remote function look the same as a local function call. "Location transparency": the system hides where a resource is located.
❑ Distributed Computing: Facilitates communication between distributed components in a networked environment.

RPC Components
❑ Client: Initiates the RPC request.
❑ Server: Receives and processes the request.
❑ Stubs: Proxy functions on both the client and server sides.
- Client Stub: acts as a proxy for the remote procedure on the client side, handling request marshalling and response unmarshalling.
- Server Stub: acts as a proxy for the remote procedure on the server side, handling request unmarshalling and response marshalling.
❑ RPC Runtime: Handles the communication between client and server, including retransmissions, error handling, and security.

Example
❑ The steps involved in calling a remote procedure doit(a,b) (the return path for the result is not shown on the slide).
❑ doit(a,b) is a two-parameter procedure: parameter a is of type type1, and b of type type2.

How RPC Works - Procedure Call Flow
❑ Client Stub:
- A piece of code on the client side that represents the remote procedure.
- It provides the same interface as the remote procedure.
- The client application calls the local stub instead of the actual remote procedure.
❑ Marshalling:
- The process of packaging the procedure parameters into a message format suitable for transmission over the network.
- The client stub marshals the procedure arguments into a request message.
❑ Network Transmission:
- The request message is sent over the network to the server.
- The message is transmitted using underlying network protocols (e.g., TCP/IP).
❑ Server Stub:
- A piece of code on the server side that receives the request message and unpacks the parameters.
- The server stub unmarshals the request message and invokes the actual remote procedure.
❑ Procedure Execution:
- The server executes the requested procedure with the provided parameters.
- The server-side procedure performs its task and prepares a response.
❑ Response:
- The server stub marshals the results of the procedure into a response message.
- The response message is sent back to the client.
❑ Client Stub:
- The client stub unmarshals the response message and returns the result to the client application.
- The client application receives the result as if it were a local function return value.
(A minimal code sketch of this stub-and-marshalling flow appears later in this section.)

Client-server example: online payments
(Sequence of slides showing the message flow between the client and the payment server; figures not reproduced in this transcript.)

RPC in enterprise systems
❑ Service-oriented architecture (SOA) / microservices: splitting a large software application into multiple services (on multiple nodes) that communicate via RPC.
❑ Different services may be implemented in different languages:
- interoperability: datatype conversions
- Interface Definition Language (IDL): a language-independent API specification

Interface Definition Language
❑ An Interface Definition Language (IDL) is a specification language used to define the interface between client and server in RPC systems.
❑ It allows different programming languages and systems to communicate by defining a common interface that both client and server adhere to.
❑ IDLs are especially useful in distributed systems and RPC frameworks like Google RPC (gRPC), Common Object Request Broker Architecture (CORBA), and Apache Thrift.

Key Features of IDL
❑ Language Agnostic: IDL allows different languages to communicate by defining a common interface. Each language can generate code (called "stubs") from the same IDL file.
❑ Protocol Agnostic: The IDL doesn't specify how communication occurs, only the structure of messages and services. The underlying system (e.g., gRPC or CORBA) determines the transport protocol.
❑ Service and Data Type Definition: IDL specifies the operations (methods) that can be called remotely and the data types (messages) exchanged.

Why Use IDL?
❑ Cross-language Communication: In a distributed system, services may be written in different programming languages. IDL provides a way for these services to communicate without worrying about language differences.
❑ Data Serialization: IDL specifies how data is structured and serialized across the network (e.g., using Protocol Buffers in gRPC).

gRPC
❑ gRPC is a high-performance RPC framework developed by Google in 2015.
❑ It allows communication between services in a distributed system by letting you invoke methods on remote servers as if they were local calls.
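Before looking at gRPC specifics, here is a minimal sketch of the stub-and-marshalling flow described above, using JSON over a TCP socket. It is an illustration under assumed names (call_remote, serve, doit) and an assumed wire format, not the mechanism any particular RPC framework uses.

```python
import json
import socket
import threading

def doit(a, b):          # the "remote" procedure living on the server
    return a + b

# --- Server stub: unmarshal the request, call the procedure, marshal the reply.
def serve(srv):
    conn, _ = srv.accept()
    request = json.loads(conn.recv(4096).decode())          # unmarshal arguments
    result = doit(*request["args"])                          # execute the procedure
    conn.sendall(json.dumps({"result": result}).encode())    # marshal the result
    conn.close()

# --- Client stub: marshal arguments, send, unmarshal the reply.
def call_remote(args, port):
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(json.dumps({"proc": "doit", "args": args}).encode())
        return json.loads(sock.recv(4096).decode())["result"]

srv = socket.create_server(("127.0.0.1", 0))                 # OS picks a free port
port = srv.getsockname()[1]
threading.Thread(target=serve, args=(srv,), daemon=True).start()
print(call_remote([2, 3], port))                              # looks like a local call; prints 5
```

Real frameworks generate the client and server stubs from an IDL instead of hand-writing them, and the RPC runtime adds retransmission, error handling, and security on top of this basic flow.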
Key Features of gRPC
❑ Protocol Buffers: gRPC uses Protocol Buffers (protobuf) as the Interface Definition Language (IDL) for defining service methods and messages. This allows efficient serialization and deserialization of data.
❑ HTTP/2: gRPC uses HTTP/2 as its transport protocol, which enables features like multiplexing, flow control, and efficient binary framing.
❑ Language Agnostic: gRPC supports many programming languages, including Python, Go, Java, C++, etc.
❑ Streaming Support: gRPC supports various communication patterns such as unary (single request-response), client-side streaming, server-side streaming, and bidirectional streaming.

gRPC IDL example
(Protocol Buffers service definition shown on the slide; not reproduced in this transcript.)

MapReduce
❑ MapReduce is a programming model and processing technique used for large-scale data processing, particularly in distributed systems.
❑ It was originally developed by Google and popularized by the Hadoop framework.
❑ MapReduce enables the processing of vast amounts of data across many machines (nodes) in a distributed environment, while abstracting away the complexities of parallelization, fault tolerance, data distribution, and load balancing.
❑ Paper: https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

Key Components of MapReduce
❑ Map: The Map function processes input data and converts it into intermediate key-value pairs.
❑ Reduce: The Reduce function aggregates or summarizes the intermediate results (key-value pairs) produced by the map function.

How MapReduce Works
❑ Input Splitting: The input data is split into chunks (input splits) which are processed independently.
❑ Mapping: The map function is applied to each input split, producing a list of key-value pairs as output.
❑ Shuffling: The system automatically groups the intermediate key-value pairs by key and distributes them to the appropriate reducers.
❑ Reducing: The reduce function is applied to all values corresponding to the same key, producing a final output.
❑ Output: The final output is stored in a distributed file system (e.g., GFS, or HDFS in Hadoop).

Example of MapReduce: Word Count
❑ Problem: Imagine you have a large set of text documents, and you want to count the occurrence of each word across all documents.
❑ Problem Statement: Given a collection of text documents, count the occurrence of each word.
❑ Input: A large text file, say a book or a collection of documents.
❑ Map Phase:
- Input: A single line or block of text.
- Operation: For each word in the block, the map function emits the word as the key and the number 1 as the value.
- Output: Key-value pairs like ("word", 1), ("anotherword", 1).
❑ Shuffling: Intermediate key-value pairs are grouped by key (i.e., all occurrences of the same word are brought together).
❑ Reduce Phase:
- Input: Each key and its list of values (counts).
- Operation: Sum up the values for each key (word).
- Output: A single key-value pair per word with the word count.
❑ Output: The final result is written back to the distributed file system, with each word and its count. (A minimal Python sketch of these map and reduce functions follows.)
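A minimal sketch of the word-count mapper and reducer in plain Python, with the shuffle simulated locally by a dictionary. This is framework-free pseudocode made runnable; in Hadoop or Spark the same logic would be expressed through the framework's own API.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit (word, 1) for every word in one line of input."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: sum all counts that were grouped under the same word."""
    return (word, sum(counts))

lines = ["Hello world", "Hello Hadoop", "Hadoop is great", "Hello world"]

# Shuffle/sort simulated locally: group intermediate pairs by key.
grouped = defaultdict(list)
for line in lines:
    for word, one in map_fn(line):
        grouped[word].append(one)

print([reduce_fn(w, c) for w, c in grouped.items()])
# [('Hello', 3), ('world', 2), ('Hadoop', 2), ('is', 1), ('great', 1)]
```

The concrete walkthrough below traces exactly these steps by hand on the same four input lines.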
Example of MapReduce: Word Count (walkthrough)
❑ Input Data: Let's assume we have the following lines of text as our input data:
Hello world
Hello Hadoop
Hadoop is great
Hello world
❑ Map Phase (Mapper Function): The input text is split into chunks (lines or blocks), and each chunk is processed independently by the mapper. For each line of input, the mapper emits key-value pairs where the key is the word and the value is 1 (indicating that the word was seen once).
- Mapper Input: each line of the input text.
- Mapper Output: for each word in the line, a key-value pair of the word and 1.
❑ Example of Mapper Function:
- For the line "Hello world", the mapper output will be: ("Hello", 1), ("world", 1)
- For the line "Hadoop is great", the mapper output will be: ("Hadoop", 1), ("is", 1), ("great", 1)
❑ Mapper output for all lines:
("Hello", 1), ("world", 1)
("Hello", 1), ("Hadoop", 1)
("Hadoop", 1), ("is", 1), ("great", 1)
("Hello", 1), ("world", 1)
❑ Shuffle and Sort: After the map phase, the system groups all the intermediate key-value pairs by the key (the word). This is called the shuffle and sort phase. The result is that all values associated with the same word are collected together.
Grouped Mapper Output (after shuffle and sort):
("Hello", [1, 1, 1])
("world", [1, 1])
("Hadoop", [1, 1])
("is", [1])
("great", [1])
❑ Reduce Phase (Reducer Function): The reducer processes each key (word) and its list of values (counts). It sums up the values for each word to get the total count of each word.
- Reducer Input: the word and the list of counts.
- Reducer Output: the word and its total count.
❑ Example of Reducer Function:
- For the key "Hello" and the list [1, 1, 1], the reducer sums the values: ("Hello", 3)
- For the key "world" and the list [1, 1], the reducer sums the values: ("world", 2)
Final Reducer Output (word counts):
("Hello", 3)
("world", 2)
("Hadoop", 2)
("is", 1)
("great", 1)
❑ Final Output: The final output of the MapReduce job is a list of words and their corresponding counts:
Hello 3
world 2
Hadoop 2
is 1
great 1

Example of MapReduce: Maximum Temperature
❑ Problem: Find the maximum temperature recorded for each year from a dataset of weather station records.
❑ Input: A dataset where each line represents a weather record with two fields: Date (YYYY-MM-DD) and Temperature (in Celsius).
❑ Sample data:
2021-01-01, -5
2021-01-02, 3
2021-12-31, 2
2022-01-01, -3
2022-05-10, 15
2022-08-25, 30
❑ Map Phase: Each mapper processes a subset of the records and extracts the year and temperature, emitting key-value pairs where the key is the year and the value is the temperature.
Map output:
- For 2021-01-01, -5 → Emit (2021, -5)
- For 2021-01-02, 3 → Emit (2021, 3)
- For 2021-12-31, 2 → Emit (2021, 2)
- For 2022-01-01, -3 → Emit (2022, -3)
- For 2022-05-10, 15 → Emit (2022, 15)
- For 2022-08-25, 30 → Emit (2022, 30)
❑ Shuffle and Sort: All the key-value pairs with the same key (year) are shuffled and grouped together.
Grouped Output:
- (2021, [-5, 3, 2])
- (2022, [-3, 15, 30])
❑ Reduce Phase: Each reducer takes the grouped key-value pairs and processes them. In this case, the reducer calculates the maximum temperature for each year.
Reduce Output:
- For 2021 → max(-5, 3, 2) = 3 → Emit (2021, 3)
- For 2022 → max(-3, 15, 30) = 30 → Emit (2022, 30)
❑ Final Output:
2021: 3°C
2022: 30°C

Benefits of MapReduce
❑ Scalability: MapReduce can process massive amounts of data (petabytes or exabytes) distributed across thousands of nodes.
❑ Fault Tolerance: If a node fails, the MapReduce system can rerun the tasks on another node using replicated data.
❑ Abstraction of Complexity: It abstracts the details of parallel execution, fault tolerance, and data distribution from the programmer.
❑ Data Locality: By moving computation close to where the data is stored, it reduces the overhead of moving large datasets across networks.

Limitations of MapReduce
❑ Latency: MapReduce is optimized for batch processing, not for low-latency, real-time processing.
❑ Fixed Data Flow: The rigid two-phase approach (Map and Reduce) makes it less flexible for certain types of computations.
❑ Iteration and Chaining: It is inefficient for iterative algorithms like machine learning, where multiple MapReduce jobs need to be chained together.

Modern Alternatives to MapReduce
❑ While MapReduce is powerful for batch processing, modern big data frameworks provide more flexible and efficient alternatives:
- Apache Spark: provides an in-memory processing engine, improving performance over MapReduce, especially for iterative algorithms.
- Apache Flink: designed for both stream and batch processing with low latency.
- Google Dataflow: a cloud-based model that generalizes MapReduce with advanced features.

Distributed Systems Models

Models of Distributed Systems
❑ In distributed systems, models provide a structured way to understand and analyze the complex interactions, properties, and behaviors of such systems.
❑ These models serve as a conceptual framework to describe how distributed systems function and how they can handle challenges such as communication, synchronization, and failures.
❑ They describe how components can fail and define the system's behavior in the presence of failures. They help identify failure scenarios and design distributed systems that can tolerate or recover from such failures.
❑ In distributed systems, there are several fundamental problems that arise due to the nature of decentralized control, communication over unreliable networks, and the possibility of failures. Two widely recognized problems are:
- The Two Generals Problem
- The Byzantine Generals Problem

The Two Generals Problem
❑ The Two Generals Problem is a classic problem that illustrates the challenges of achieving coordination and reliable communication between two parties over an unreliable network.
❑ The problem is considered unsolvable in its most general form due to the impossibility of guaranteeing message delivery and achieving consensus.
(Several slides illustrate the two generals exchanging messengers; figures not reproduced in this transcript.)

How should the generals decide?
❑ Option 1: General 1 always attacks, even if no response is received?
- Send lots of messengers to increase the probability that one will get through.
- If all are captured, general 2 does not know about the attack, so general 1 loses.
❑ Option 2: General 1 only attacks if a positive response from general 2 is received?
- Now general 1 is safe.
- But general 2 knows that general 1 will only attack if general 2's response gets through.
- Now general 2 is in the same situation as general 1 in option 1.
❑ No common knowledge: the only way of knowing something is to communicate it.

Summary: The Two Generals Problem
❑ The Two Generals Problem involves two generals (or parties) trying to coordinate an attack.
❑ They communicate by sending messages through an unreliable channel, where a message might get lost.
❑ The main goal is for both generals to reach an agreement on whether to attack or retreat, but since message delivery cannot be guaranteed, there is no way for either general to be absolutely certain that the other received the message.
❑ This problem demonstrates that reliable communication and agreement are impossible over an unreliable channel.

The two generals problem applied
(Slides mapping the two generals onto a real client-server interaction; figures not reproduced in this transcript.)

Byzantine Generals Problem
❑ The Byzantine Generals Problem is a fundamental issue in distributed systems that deals with achieving consensus (agreement) in the presence of faulty or malicious components (often referred to as Byzantine faults).
❑ It was first introduced by Leslie Lamport, Robert Shostak, and Marshall Pease in 1982 as a way to model systems where some of the participants may behave incorrectly or even maliciously, and yet the system must still reach agreement on a common course of action. https://lamport.azurewebsites.net/pubs/byz.pdf

The Byzantine empire (650 CE)
❑ "Byzantine" has long been used to mean "excessively complicated, bureaucratic, devious", e.g., "the Byzantine tax law".

Byzantine Generals Problem – Problem Overview
❑ The Byzantine Generals Problem involves a group of generals, each commanding their own army, who must decide on a common strategy, such as whether to attack or retreat.
❑ However, some of the generals may be traitors (Byzantine generals) who are actively working against the group by sending conflicting or false messages to others.
❑ The challenge is for the loyal generals to reach a consensus despite the presence of these traitors.
(Slides "The Byzantine generals problem" and "Generals who might lie"; figures not reproduced in this transcript.)

The Byzantine generals problem
❑ Each general is either malicious or honest.
❑ Up to f generals might be malicious.
❑ Honest generals don't know who the malicious ones are.
❑ The malicious generals may cooperate.
❑ Nevertheless, honest generals must agree on a plan.
❑ Theorem: 3f + 1 generals in total are needed to tolerate f malicious generals, i.e., fewer than 1/3 may be malicious. For example, tolerating a single traitor (f = 1) already requires 4 generals.
❑ Cryptography (digital signatures) helps, but the problem remains hard.
(Slide on trust relationships and malicious behaviour; figure not reproduced in this transcript.)

Relation
❑ The Two Generals Problem is a special case of the Byzantine Generals Problem:
- In the Two Generals Problem, the communication channel is unreliable, but both generals are assumed to be honest and non-faulty. The only issue is message loss.
- In the Byzantine Generals Problem, communication is reliable, but some generals can be malicious or faulty, sending conflicting or incorrect information to prevent agreement.
❑ While the Two Generals Problem highlights the impossibility of coordination in the presence of unreliable communication, the Byzantine Generals Problem deals with coordination in the presence of faulty or untrustworthy nodes.
❑ The Byzantine Generals Problem is more general because it introduces Byzantine faults, a form of arbitrary failures, and addresses the issue of reaching consensus even when some nodes act maliciously. In both cases the goal is consensus: both generals must agree on a common action (attack or retreat).

System models
❑ We have seen two experiments:
- Two generals problem: a model of networks
- Byzantine generals problem: a model of node behavior
❑ In real systems, both nodes and networks may be faulty!
❑ Capture assumptions in a system model consisting of:
- Network behavior (e.g., message loss)
- Node behavior (e.g., crashes)
- Timing behavior (e.g., latency)
❑ There is a choice of models for each of these parts.

System model: Network Behavior
❑ Assume bidirectional point-to-point communication between two nodes, with one of:
- Reliable (perfect) links: A message is received if and only if it is sent. Messages may be reordered.
- Fair-loss links: Messages may be lost, duplicated, or reordered. If you keep retrying, a message eventually gets through.
- Arbitrary links (active adversary): A malicious adversary may interfere with messages (eavesdrop, modify, drop, spoof, replay).

System model: Network Behavior – Network Partition
❑ A network partition is a condition in distributed systems where a network is split into multiple disjoint sub-networks (or "partitions") that cannot communicate with each other.
❑ This occurs when the communication links between parts of the network fail, effectively isolating some nodes from others.
❑ In a network partition scenario, some nodes are unable to exchange messages with other nodes because they are separated into distinct groups that cannot reach each other. This can lead to issues like inconsistent data, loss of availability, and challenges in achieving consensus.

How does a Network Partition happen?
❑ A network partition can occur due to:
- Hardware failures (e.g., a broken switch or router).
- Software bugs or configuration errors in network devices.
- Natural disasters or physical disruptions that disconnect parts of the infrastructure.
- Congestion or overload on a network link that causes timeouts and packet loss.

Summary: Network Partition
❑ Network partition is a failure mode in distributed systems where nodes become isolated into different groups that cannot communicate.
❑ This can result in challenges related to data consistency, availability, and consensus.

System model: Node Behavior
❑ Each node executes a specified algorithm, assuming one of the following:
- Crash-stop (fail-stop): A node is faulty if it crashes (at any moment). After crashing, it stops executing forever.
- Crash-recovery (fail-recovery): A node may crash at any moment, losing its in-memory state. It may resume executing sometime later. Data stored on disk survives the crash.
- Byzantine (fail-arbitrary): A node is faulty if it deviates from the algorithm. Faulty nodes may do anything, including crashing or malicious behavior.
❑ A node that is not faulty is called "correct".

System model: synchrony (timing) assumptions
❑ Assume one of the following for network and nodes:
- Synchronous: Message latency is no greater than a known upper bound. Nodes execute the algorithm at a known speed.
- Partially synchronous: The system is asynchronous for some finite (but unknown) periods of time, and synchronous otherwise.
- Asynchronous: Messages can be delayed arbitrarily. Nodes can pause execution arbitrarily. No timing guarantees at all.

Violations of synchrony in practice
❑ Networks usually have quite predictable latency, which can occasionally increase:
- Message loss requiring retry
- Congestion/contention causing queueing
- Network/route reconfiguration
❑ Nodes usually execute code at a predictable speed, with occasional pauses:
- Operating system scheduling issues
- Page faults, ...

System models summary
❑ For each of the three parts, pick one:
- Network: reliable, fair-loss, or arbitrary
- Nodes: crash-stop, crash-recovery, or Byzantine
- Timing: synchronous, partially synchronous, or asynchronous
❑ This is the basis for any distributed algorithm.
❑ If your assumptions are wrong, all bets are off!

Availability
❑ An online shop wants to sell stuff 24/7! Service unavailability = downtime = losing money.
❑ Availability = uptime = fraction of time that a service is functioning correctly.
- "Two nines" = 99% up = down 3.7 days/year
- "Three nines" = 99.9% up = down 8.8 hours/year
- "Four nines" = 99.99% up = down 53 minutes/year
- "Five nines" = 99.999% up = down 5.3 minutes/year
❑ Service-Level Objective (SLO): e.g., "99.9% of requests in a day get a response in 200 ms".
❑ Service-Level Agreement (SLA): a contract specifying some SLO and penalties for violation.

Achieving high availability: fault tolerance
❑ Failure: the system as a whole isn't working.
❑ Fault: some part of the system isn't working.
❑ Node fault: crash (crash-stop/crash-recovery), deviating from the algorithm (Byzantine).
❑ Network fault: dropping or significantly delaying messages.
❑ Fault tolerance: the system as a whole continues working despite faults (up to some maximum number of faults).
❑ Single point of failure (SPOF): a node or network link whose fault leads to failure.

Failure detectors
❑ Failure detector: an algorithm that detects whether another node is faulty.
❑ Perfect failure detector: labels a node as faulty if and only if it has crashed.
❑ Typical implementation for crash-stop/crash-recovery: send a message, await a response, and label the node as crashed if no reply arrives within some timeout.
❑ Problem: we cannot tell the difference between a crashed node, a temporarily unresponsive node, a lost message, and a delayed message.

Failure detection in partially synchronous systems
❑ A perfect timeout-based failure detector exists only in a synchronous crash-stop system with reliable links.
❑ Eventually perfect failure detector:
- May temporarily label a node as crashed, even though it is correct.
- May temporarily label a node as correct, even though it has crashed.
- But eventually, it labels a node as crashed if and only if it has crashed.
❑ This reflects the fact that detection is not instantaneous, and we may have spurious timeouts. (A minimal sketch of a timeout-based detector follows.)
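To make the timeout-based approach concrete, here is a minimal sketch in the spirit of an eventually perfect failure detector. The heartbeat transport is abstracted away (replies are fed in by hand), and the class name, node names, and timeout values are illustrative assumptions, not part of the slides.

```python
import time

# Minimal sketch of a timeout-based failure detector (illustrative only).
class EventuallyPerfectDetector:
    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.last_heard = {}       # node -> time of last reply
        self.suspected = set()

    def heard_from(self, node):
        """Record a reply (heartbeat) from `node` and retract any suspicion."""
        self.last_heard[node] = time.monotonic()
        self.suspected.discard(node)   # we were wrong: the node is alive
        self.timeout *= 1.5            # back off to reduce future spurious timeouts

    def check(self, node):
        """Suspect `node` if we have not heard from it within the timeout."""
        silent_for = time.monotonic() - self.last_heard.get(node, 0)
        if silent_for > self.timeout:
            self.suspected.add(node)
        return node in self.suspected

detector = EventuallyPerfectDetector(timeout=0.1)
detector.heard_from("B")
time.sleep(0.2)
print(detector.check("B"))   # True: B is suspected after staying silent too long
detector.heard_from("B")     # a late reply arrives: suspicion retracted, timeout increased
print(detector.check("B"))   # False
```

Increasing the timeout after a false suspicion is one common way to get the "eventually" part: after enough adjustments, correct nodes stop being suspected spuriously, while a genuinely crashed node stays suspected forever.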
