FSS-DS-E02-answers.pdf
2023
This is background and exercise material; only the teaching slides’ content and the book’s chapters constitute the valid class information.

Distributed Systems Exercise 02
Dr. Bruno Rodrigues, Prof. Dr. Burkhard Stiller
Department of Informatics IfI, Communication Systems Group CSG, University of Zurich UZH
[rodrigues|stiller]@ifi.uzh.ch
© 2023 UZH, CSG@IfI

Outline
– Summary Lecture 04
– Review E02
– Release E03

Content DS
Tour d’Horizon on Distributed Systems and Encoding in Distributed Systems
– Language-specific formats
– JSON, XML, binary, ASN.1
– IPFS
Unreliable Communication Systems
– Faults
– Unreliable networks and clocks
– Knowledge, truth, lies
Consistency and Consensus in Distributed Systems
– Guarantees
– Linearizability
– Ordering

Summary DS Part 2: Unreliable Communication Systems

Faults and Partial Failures
An individual computer with “good” software is usually either fully functional or entirely broken, not something in between:
– Deterministic behavior
– In case of internal faults, the system “crashes” instead of returning wrong results
However, distributed systems are not always predictable:
– Buggy software and hardware issues
  • Logic errors, syntax errors, flaky performance, security defects
  • Memory or hard-drive corruption, loose connections
– Non-deterministic behavior
  • Failure of individual nodes (i.e., partial failure) involved in a computation

Clouds vs.
Supercomputing
– Application: Cloud: variety of services (IaaS, PaaS, SaaS) running on virtual machines. Supercomputing: dedicated services running on bare metal.
– Hardware: Cloud: commodity machines supporting a variety of virtualization technologies. Supercomputing: specialized machines built to run dedicated applications on dedicated topologies.
– Topology: Cloud: datacenter tree networks such as fat tree, star-bus. Supercomputing: specialized topologies such as multi-dimensional meshes and torus topologies.
– Geographical distribution: Cloud: geographically distributed. Supercomputing: single location.
– Fault tolerance: Cloud: relatively less reliable servers, software (services), and network. Supercomputing: relatively more reliable servers, software (services), and network.
– Nodes’ distribution: Cloud: relatively higher. Supercomputing: relatively lower.

Discrete-event Dynamic Systems (DEDS)
DEDS model the transition of states in complex distributed systems, relying on asynchronous discrete events.
– Models abstract away from reality; how abstract should they be?
Examples
– Petri nets (example in the following slides)
  • Stochastic (timed) Petri nets: a probabilistic delay is used to trigger transitions
  • Coloured Petri nets: tokens carry a colour, allowing for their differentiation
– Markov chains: probabilities of sequences of random variables
  • Hidden Markov Models (HMM): useful when a probability must be computed for a sequence of hidden events (not directly observable)
– Other examples: automata theory, queueing theory

Summary of Petri Nets
Petri nets are a simple and graphical way to represent distributed systems.
Advantages
– Simple design, easy to learn
– Graphical illustration
– Lots of available tools
Drawbacks
– Nets grow very fast
– No guidelines for correct representation of a net
– Large nets are not straightforward to understand
Special forms allow for more complex representations
– Stochastic
– Coloured
Example of a complex net: the SBB power supply mapped as a stochastic Petri net, used to map available power in the northern part
of Switzerland after a blackout in June 2005 due to a line break in Steinen.

Synchronization with a Time Server
A time server has very accurate time, e.g., an atomic clock or a GPS (Global Positioning System) receiver.
How can a client synchronize with a time server? Problem: messages do not travel instantly.
Cristian’s algorithm
– Estimate the transmission delay to the server: ((T4-T1) – (T3-T2)) / 2
Used in the Network Time Protocol (NTP)
– Cristian’s algorithm is run multiple times
  • Outlier values are ignored to rule out packets delayed due to congestion or longer paths

Clocks: Comparison of Approaches
– Physical clocks: keep the time of day; synchronization via known protocols, e.g., NTP (Network Time Protocol). Pros: simple implementation. Cons: hard to keep synchronized; may not be present in a distributed system.
– Logical clocks: keep track of event ordering; “happened-before” defines causality; timestamps are updated with logical clocks. Pros: provide a notion of occurrence and causality; relatively easy to implement. Cons: a full notion of causality requires (1) larger data structures and (2) a higher exchange of lengthy messages to synchronize.
– Lamport clock (a type of logical clock). Pros: easy to implement and maintain (only an integer); efficient usage of memory.
– Vector clock (a type of logical clock): generates a partial ordering of events in a distributed system. Pros: captures causality; allows comparing events across processes. Cons: difficult to scale, since it keeps the clocks of all processes; often the number of processes is not known.

Mutual Exclusion Problem (1)
Application-level protocol (e.g., processes PID 7323, PID 1783, PID 3234 competing for a resource):
– Enter the “Critical Section (CS)”
– Use the resource exclusively
– Exit the critical section
Requirements
– Safety
  • At most one process may execute in the CS at once
– Liveness
  • Requests to enter and exit the critical section should eventually succeed (no deadlocks or livelocks should occur, and fairness should be enforced)
– 
Ordering
  • Requests are handled in the order of appearance

Mutual Exclusion Problem (2)
Evaluation criteria
– Bandwidth (number of messages)
– Client waiting time to enter the CS
– Vulnerabilities
Alternative approaches to solve the CS problem via mutual exclusion:
– Centralized approach: 3 messages per entry/exit; best-case waiting time to enter the CS: 2 units; issue: crash of the coordinator
– Distributed approach: 2*(n-1) messages per entry/exit; best-case waiting time: 2*(n-1) units; issue: crash of any node
– Token-ring approach: 1 to infinitely many messages per entry/exit; best-case waiting time: 0 to n-1 units; issues: lost token, crash of any node

E02 Review

Review E02 (1/10)
Cloud computing:
– Designed to run different types of services (IaaS, PaaS, SaaS) and, relative to HPC, low-processing applications
– Generic hardware
– Multiple datacenters can belong to a single cloud
– Relatively less reliable
  • Depends on how a user deploys its services
High-performance computing (HPC):
– Designed to run specific services requiring high processing capacity
– Specialized hardware
– Typically deployed in a single location
– The running application is relatively more reliable with respect to its software, data, and network

Review E02 (2/10)
F, T, T, F, F

Review E02 (3/10)
(graphical answer on the slide)

Review E02 (4/10)
Communication, Conflict, Synchronization, Concurrency

Review E02 (5/10)
P = {P1, P2, P3, P4, P5}
T = {T1, T2, T3, T4}
E = {(P1,T1), (T1,P2), (P2,T2), (T2,P5), (T2,P1), (P5,T3), (P3,T3), (T3,P4), (P4,T4), (T4,P3)}

Review E02 (6/10)
The definition is correct (net drawn on the slide with places P1–P5 and transitions T1–T5).

Review E02 (7/10)
X marks the correct answer, i.e., the only incorrect alternative.

Review E02 (8/10)
– (2,1,0): message m1 is sent, increase clock «p2»
– (2,2,0): from «c» to «d», the message is sent within the same process (same as in p1 from «a» to «b»)
Assumptions (cf.
Slide DS part 2, #30):
– No packets are lost
– Clocks are increased only when sending a new message
– When a packet is delivered, only the sender’s logical clock is increased
– (2,2,2): message m2 is sent, increase clock «p3»
Resulting sequence: (2,1,0), (2,2,0), (2,2,2); alternative d is correct.

Review E02 (9/10)
Tclient = Tserver + (T1 – T0)/2
Tclient = 10:54:28.342 + 0.020/2 s
Tclient = 10:54:28.342 + 0.010 s (i.e., 10 ms)
Tclient = 10:54:28.352

Accuracy = (current RTT)/2 – minimal transfer time
1. The client selects the minimum RTT: 20 ms = 0.02 s.
2. The client then estimates its current time: 10:54:28.342 + 0.02/2 s = 10:54:28.352.
3. The accuracy is +/- 10 ms (half of the RTT of 20 ms): 20 ms / 2 – 0 ms = 10 ms.
4. If the minimum transfer time is known to be 8 ms (0.008 s), the setting remains the same, but the accuracy improves to +/- 2 ms: 20 ms / 2 – 8 ms = 2 ms.

Source: George Coulouris et al., Distributed Systems: Concepts and Design, p. 602

Review E02 (10/10)
– It is distributed X
– Token ring could cause a similar issue (node busy or crashed); however, it has no coordinator X

Questions?
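The Petri net defined in Review E02 (5/10) can be exercised with a small "token game" simulation. The sketch below derives each transition's input and output places from the edge set E; the initial marking (one token each in P1 and P3) is an assumption for illustration, not part of the exercise.

```python
# Token-game simulation of the Petri net from Review E02 (5/10).
# The initial marking {P1: 1, P3: 1} is an assumed example.
T = {"T1", "T2", "T3", "T4"}
E = {("P1", "T1"), ("T1", "P2"), ("P2", "T2"), ("T2", "P5"), ("T2", "P1"),
     ("P5", "T3"), ("P3", "T3"), ("T3", "P4"), ("P4", "T4"), ("T4", "P3")}

# Derive input/output places of each transition from the edge set.
inputs = {t: {p for (p, t2) in E if t2 == t} for t in T}
outputs = {t: {p for (t2, p) in E if t2 == t} for t in T}

def enabled(marking, t):
    # A transition is enabled when every input place holds a token.
    return all(marking.get(p, 0) > 0 for p in inputs[t])

def fire(marking, t):
    # Firing consumes one token per input place and produces one per output place.
    m = dict(marking)
    for p in inputs[t]:
        m[p] -= 1
    for p in outputs[t]:
        m[p] = m.get(p, 0) + 1
    return m

m = {"P1": 1, "P3": 1}   # assumed initial marking
m = fire(m, "T1")        # P1 -> T1 -> P2
m = fire(m, "T2")        # P2 -> T2 -> P1, P5
m = fire(m, "T3")        # P5, P3 -> T3 -> P4
print(sorted(p for p, n in m.items() if n > 0))  # -> ['P1', 'P4']
```

Note that T3 needs tokens in both P5 and P3 before it can fire, which is exactly the synchronization pattern the exercise classifies.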
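The vector-clock bookkeeping behind Review E02 (8/10) can be sketched in a few lines. This follows the common textbook rules (increment the local entry on send, element-wise max merge on receive); the exercise's own assumptions on slide DS part 2, #30 differ slightly, so the message sequence below is a generic illustration, not a reproduction of the (2,1,0), (2,2,0), (2,2,2) trace.

```python
# Generic vector clocks for three processes (a sketch; the exercise uses
# slightly different update assumptions, so this shows the mechanism only).
N = 3  # number of processes must be known in advance -- a noted drawback

def send(clock, i):
    """Process i sends a message: increment its own entry, attach the clock."""
    clock = list(clock)
    clock[i] += 1
    return clock, list(clock)  # (new local clock, timestamp on the wire)

def receive(clock, ts):
    """On receipt of timestamp ts: element-wise max merge."""
    return [max(a, b) for a, b in zip(clock, ts)]

p1, p2, p3 = [0] * N, [0] * N, [0] * N
p1, m1 = send(p1, 0)   # p1 sends m1 -> p1 = [1, 0, 0]
p2 = receive(p2, m1)   # p2 merges  -> p2 = [1, 0, 0]
p2, m2 = send(p2, 1)   # p2 sends m2 -> p2 = [1, 1, 0]
p3 = receive(p3, m2)   # p3 merges  -> p3 = [1, 1, 0]
print(p1, p2, p3)      # -> [1, 0, 0] [1, 1, 0] [1, 1, 0]
```

The merge step is what lets vector clocks capture causality across processes, at the cost of carrying one integer per process, the scaling drawback listed in the clock comparison.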
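The Cristian's-algorithm arithmetic from Review E02 (9/10) can be checked numerically. The helper function below is a minimal sketch of the rule "set the client clock to the server time plus half the RTT; accuracy is +/- (RTT/2 minus the minimum transfer time)".

```python
# Cristian's algorithm arithmetic from Review E02 (9/10), in seconds.
def cristian(server_time_s, rtt_s, min_transfer_s=0.0):
    # Client estimate: server time plus half the measured round-trip time.
    estimate = server_time_s + rtt_s / 2
    # Accuracy bound: half the RTT minus the known minimum one-way transfer time.
    accuracy = rtt_s / 2 - min_transfer_s
    return estimate, accuracy

t_server = 10 * 3600 + 54 * 60 + 28.342   # 10:54:28.342 as seconds since midnight

# Measured RTT of 20 ms, no known minimum transfer time:
est, acc = cristian(t_server, 0.020)
print(round(est - t_server, 3), round(acc, 3))   # -> 0.01 0.01 (set +10 ms, +/-10 ms)

# With a known minimum transfer time of 8 ms, the accuracy tightens to +/-2 ms:
est, acc = cristian(t_server, 0.020, 0.008)
print(round(acc, 3))                             # -> 0.002
```

Running this reproduces the slide's numbers: the client adds 10 ms (arriving at 10:54:28.352), and the accuracy bound shrinks from 10 ms to 2 ms once the 8 ms minimum transfer time is known.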