Distributed Systems Lecture Notes PDF

Summary

These are lecture notes on distributed systems, covering definitions and characteristics, examples, hardware and software organization (multiprocessors and multicomputers), communication paradigms (sockets, remote invocation, indirect communication), naming, and the course schedule and assessment strategy.

Full Transcript


Outline: Introduction; Defining Distributed Systems; Characteristics of Distributed Systems; Examples of Distributed Systems; Challenges and Problems; Summary.

This Course (week / topic / hours):
- 1: Introduction to models of distributed systems (2)
- 2: Basic principles and models (2)
- 3-4: Communication and operations (4)
- 5: Naming in distributed systems (2)
- 6: Synchronization in distributed systems (2)
- 7-8: Compatibility of distributed systems (4)
- 9: Midterm exam
- 10: Replication in distributed systems (2)
- 11: Fault detection (2)
- 12: Different models, including object systems (2)
- 13: Distributed file systems (2)
- 14: Document systems (2)
- 15: Discussion + oral exam
- 16: Final exam

Assessment Strategy (method / marks):
- Attendance: 2
- Assignments: 2
- Quiz 1: 6
- Quiz 2: 6
- Midterm: 24
- (Coursework subtotal: 40)
- Final exam: 60
- Total: 100

Introduction
As soon as computers are interconnected and communicating, we have a "distributed system". This approach to interconnectivity has been known by several names: multi-computers, parallel computers, clusters, computational grids, and clouds.

Goals and Motivation
The need for high-performance computers: solving technology problems using computer modeling, simulation, and analysis. Application areas include mechanical design and analysis (CAD/CAM), geographic information systems, life sciences, and aerospace.

Multiprocessor vs. Mainframe
Points of comparison: economics, speed, distribution, reliability, growth, software, networking, security.

What is a Distributed System?
"A collection of independent computers (H/W) that appear to the users of the system as a single computer (S/W)." [Tanenbaum]
The distributed-system software maps a task (job) onto the pool of processors, so that the user feels the system is a single computer (transparency).

"A distributed system is several computers doing something together. Thus, a distributed system has three primary characteristics: multiple computers, interconnections, and shared state." [Michael Schroeder]

A system in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.

In short: independent components connected by a network communicate by passing messages to achieve a common goal, appearing as a single coherent system.

Examples of DS
- World Wide Web
- A cluster of nodes on the cloud (AWS, Azure, GCP)
- Multi-player games
- BitTorrent
- Online banking

Why DS?
- Nature of the application: multiplayer games, P2P file sharing, a client requesting a service.
- Availability despite unreliable components: a service shouldn't fail when one computer does.
- Scale up capacity: more CPU cycles, more memory, more storage, etc.
- Customize computers for specific tasks: e.g., for storage, email, backup.

Distributed Systems Levels
Centralized: a mainframe and dumb terminals. All of the computation is done on the mainframe; each line or keystroke is sent from the terminal to the mainframe.
Moving Towards Distribution
In a client-server system, the clients are workstations or computers in their own right and perform computations and formatting of the data. However, the data and the application which manipulates it ultimately reside on the server.

More Decentralization
In distributed-with-coordinator systems, the nodes or sites depend on a coordinator node with extra knowledge or processing abilities. The coordinator might be used only in case of failures or other problems.

True Decentralization
A true distributed system has no distinguished node acting as a coordinator: all nodes or sites are equals. The nodes may choose to elect one of their own to act as a temporary coordinator or leader.

Types of Distributed Systems
- Client-Server Model: a centralized server provides resources or services to multiple clients. Example: web servers serving multiple users.
- Peer-to-Peer (P2P) Model: each node can act as both a client and a server, sharing resources directly with one another. Example: file-sharing networks like BitTorrent.
- Multi-tier Architecture: a system organized into layers, where each layer offers specific services. Example: three-tier architecture.
- Cloud Computing: distributed resources are provided as services over the internet (IaaS, PaaS, SaaS). Examples: Google Cloud, AWS.
- Grid Computing: a distributed system that connects heterogeneous computing resources to work on complex problems. Example: the SETI@home project.

Flynn's Taxonomy of Computer Architectures
- SISD: Single Instruction / Single Data
- SIMD: Single Instruction / Multiple Data
- MISD: Multiple Instruction / Single Data
- MIMD: Multiple Instruction / Multiple Data

Single Instruction/Single Data (SISD)
Your desktop, before the spread of dual-core CPUs. (PU = Processing Unit.)

SIMD
Processors that execute the same instruction on multiple pieces of data, e.g., NVIDIA GPUs. Each core runs the same set of instructions on different data. Examples: an NVIDIA GPU processing the pixels of an image in parallel; an array processor.
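Where the slides describe SIMD as one instruction stream applied to many data elements, a short data-parallel sketch can make the idea concrete. This is only an analogy in NumPy (our illustration, not from the slides; NumPy may use the CPU's SIMD units internally):

# Illustration only: NumPy applies one operation to many data elements,
# which is the SIMD idea of a single instruction stream over multiple data.
import numpy as np

pixels = np.random.randint(0, 256, size=(480, 640), dtype=np.uint16)

# One logical instruction ("brighten by 30, clamp at 255") applied to
# every pixel at once, instead of an explicit per-pixel Python loop.
brightened = np.minimum(pixels + 30, 255)

print(brightened.shape, brightened.max())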
MISD
Multiple instruction streams operating on a single data stream. Example: a pipeline architecture.

MIMD
Multiple instruction streams on multiple data streams. Two memory organizations:
- Shared memory: all processors are connected to a "globally available" memory, via either software or hardware means.
- Distributed memory: each processor has its own individual memory and has no direct knowledge of other processors' memory.

DS Design Goals
- Heterogeneity: can the system handle a large variety of types of machines and devices?
- Robustness: is the system resilient to host crashes and failures, and to the network dropping messages?
- Scalability: can it handle increased nodes or load without degrading service? Vertical scaling (adding power to existing nodes) and horizontal scaling (adding more nodes).
- Transparency: can the system hide its internal workings from the users? Location, migration, and replication transparency.
- Concurrency: can the server handle multiple clients concurrently?
- Efficiency: is the service fast enough? Does it utilize 100% of all resources?
- Availability: are data and services always there for clients?
- Security: can the system withstand hacker attacks?
- Openness: is the system extensible?

DS Challenges & Problems
Challenges:
- Increasing inter-processor communication.
- The number of processors available to execute on, and processor synchronization.
- Memory size and the bandwidth of the interconnection.
- Analyzing the problem to suit distributed computing.
- Infrastructure to handle distribution and communication.
- HW and SW combined to provide a tool to solve specific problems.

Challenges of multiple computers: concurrent execution; independent failure; autonomous administration; heterogeneity; large numbers.

Challenges of networked communication: asynchronous; unreliable; insecure.

Challenges of the common goal: consistency; transparency.

Problems:
- Hardware problems: memory size and bandwidth; bandwidth of the interconnection; number of processors.
- Software problems: memory and data management; detection of parallelism; process scheduling.

Outline (next lecture): Flynn's taxonomy of computer architectures; distributed system categories; tightly coupled (multiprocessor); loosely coupled (multicomputer).

(The lecture then recaps Flynn's taxonomy: SISD, SIMD, MISD, and MIMD with shared vs. distributed memory, exactly as summarized above.)

Summary of Flynn's Taxonomy (processors / instruction streams / data streams / parallelism / example / application):
- SISD: 1 / 1 / 1 / no parallelism / M6800, M68000, i8086 / adding N numbers.
- MISD: N / N / 1 / yes / pipeline / checking whether z is prime.
- SIMD: N / 1 / N / yes, synchronous / supercomputers, array processors, ICL DAP / adding matrices A and B.
- MIMD: N / N / N / yes, asynchronous / loosely coupled (multicomputer) and tightly coupled (multiprocessor).

Distributed System Categories
- Based on communication method: message passing vs. shared memory; synchronous vs. asynchronous communication.
- Based on resource management: centralized vs. decentralized; load-balancing strategies.
- Based on fault-tolerance mechanisms: passive vs. active replication; checkpointing and rollback recovery.
- Based on data distribution: replicated vs. partitioned data; consistency models (e.g., eventual consistency, strong consistency).

Distributed Computer Systems
- Hardware view: tightly coupled or loosely coupled; the interconnect is either a bus (single or multiple) or a switch (crossbar or M-stage).
- Software view: system software (OS or non-OS) and applications (web applications, grid computation).

To Remember
- Bus: a single network, bus, or cable connects all machines.
- Switched: there are individual wires from one machine to another, and messages route along one of the outgoing wires.
- Sequential program: a sequence of operations for the processor to follow step by step.
- Parallel program: a set of operations for the processors to follow in parallel.

H/W Point of View (tightly coupled vs. loosely coupled):
- Processing elements: n processors on the same board vs. n computers connected together.
- Delay: short vs. long.
- Data rate: high vs. low.
- Usage: a parallel system works on a single problem vs. a DS works on many unrelated problems.
- Communication: shared memory vs. messages.

Tightly Coupled (Multiprocessor)

Bus-Based Multiprocessors (SMP: Symmetric Multi-Processing)
- All CPUs are connected to one bus (backplane).
- One shared, coherent memory.
- Limited to 64 processors.
- Disadvantages: one communication per clock pulse; one memory access at a time; with 4 or 5 processors connected to the bus, it becomes overloaded (the first problem).

Problem: as the number of processors increases, the bus overloads.
Solution: add a high-speed cache memory between the processor and the bus to keep the most recently used data. The CPU does I/O to the cache and accesses main memory only on a cache miss.

Working with a Cache
The cache holds the most recently accessed words. All memory requests go through the cache; if the requested word is in the cache, the cache itself responds to the CPU and no bus request is made.
Problem: memory incoherency. CPU A reads location 123 (value 50) from memory and caches it; CPU A then modifies location 123 in its cache. When CPU B reads location 123 from memory, it gets the old value. Memory is not coherent.

Write-Through Cache
Fix the coherency problem by writing all values through the bus to main memory. When CPU A modifies location 123, the write goes through to main memory, which is now coherent; CPU B then reads the up-to-date value and loads it into its cache.

Snoopy Cache
Add logic to each cache controller to monitor the bus, so that a cached copy which another CPU overwrites can be updated or invalidated. Virtually all bus-based architectures use a snoopy cache.

Switched Multiprocessors
A bus-based architecture does not scale to a large number of CPUs (8+). Instead, divide memory into groups and connect the chunks of memory to the processors with a crossbar switch (n^2 crosspoint switches for n CPUs and n memories).
Disadvantages: no more than one CPU can access the same memory at the same time; complexity; low link utilization; crossbar switches are very expensive (the problem).

Omega Network
Reduce the number of crosspoint switches by adding more switching stages. With n CPUs and n memory modules, we need log2(n) switching stages, each with n/2 switches, for a total of (n log2 n)/2 switches. Better than n^2, but still quite expensive, and delay increases: with 1024 CPUs and memory chunks, there is an overhead of 10 switching stages to memory and 10 back.
Example
If a system of size 8x8 (CPUs and memories) is to be built using either a crossbar or an omega switch, compare the two designs on the basis of the required number of switching elements, the number of switching stages, and the cell delay in terms of the switching-element delay (T).

Item / Crossbar / Omega:
- Number of switching elements: n^2 = 64 vs. (n/2) * log2(8) = 4 * 3 = 12.
- Number of switching stages: n = 8 vs. log2(8) = 3.
- Cell delay in terms of switching-element delay T: 8T vs. T * log2(8) = 3T.
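A small helper that reproduces the formulas used in this example, so other sizes can be compared the same way (our sketch, not part of the slides):

import math

def interconnect_costs(n: int) -> dict:
    """Compare crossbar vs. omega costs using the formulas from the example."""
    stages = int(math.log2(n))          # omega: log2(n) switching stages
    return {
        "crossbar": {"elements": n * n, "stages": n, "delay": f"{n}T"},
        "omega": {
            "elements": (n // 2) * stages,  # n/2 switches per stage
            "stages": stages,
            "delay": f"{stages}T",
        },
    }

# For the 8x8 system above: crossbar 64 elements vs. omega 12, delay 8T vs. 3T.
print(interconnect_costs(8))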
Loosely Coupled (Multicomputer)

Bus-Based Multicomputer
- No shared memory; a communication mechanism is needed on the bus.
- Traffic is much lower than memory-access traffic.
- Need not use a physical system bus: a LAN (local area network) can be used instead. Each node is a CPU with its own memory and a LAN connector, attached to the interconnect.

Switched Multicomputer
- Each computer has direct access to another; CPU-to-CPU communication depends on the organization (e.g., an n-port switch).
- Advantages: flexible and expandable.
- Disadvantages: complex.

Multicomputer Systems
Typical topologies: 2D grid and hypercube.

Distributed Systems: Software Concepts
DS software for multiprocessors: one OS per CPU, asymmetric, or symmetric. DS software for multicomputers: network OS, distributed OS, or time sharing.

System (software coupling / hardware coupling / main goal):
- Network operating system: loosely coupled / loosely coupled / offer local services to remote clients.
- Middleware: an additional layer on top of a NOS / - / provide distribution transparency.
- Distributed operating system: tightly coupled / loosely coupled / provide the user with transparency.
- Multiprocessor time-sharing systems: tightly coupled / tightly coupled / provide the feeling of working with a single processor.

Multiprocessor OS: One OS per CPU
The simplest possible way to organize a multiprocessor operating system: divide memory into as many partitions as there are CPUs, and give each CPU its own private memory and its own private copy of the operating system. The n CPUs then operate as n independent computers.

Asymmetric Multiprocessors (AMP)
Also called master-slave multiprocessing. One copy of the operating system and its tables is present on one CPU (the master) and not on any of the others. All system calls are redirected to the master CPU for processing there. The master CPU may also run user processes if there is CPU time left over.
- Advantages of AMP: simplified control; reduced resource conflicts; specialized processors.
- Disadvantages of AMP: single point of failure; limited scalability; less efficient resource utilization.

Symmetric Multiprocessors (SMP)
In SMP, two or more identical processors are connected to a single shared main memory, have full access to all input/output devices, and are controlled by a single OS instance that treats all processors equally. There is one copy of the operating system in memory, but any CPU can run it.
When a system call is made, the CPU on which it was made traps to the kernel and processes the system call. A mutex (i.e., lock) is associated with the OS: when a CPU wants to run operating-system code, it must first acquire the mutex.
- Advantages of SMP: better resource utilization; scalability; fault tolerance.
- Disadvantages of SMP: complex scheduling; resource contention; increased hardware costs.

SMP vs. AMP:
- Processor types: identical vs. not identical.
- Tasks of the OS: run by any symmetric CPU vs. by the master processor.
- Communication: via shared memory vs. the master manages all the slave processors.
- Task assignment: via a prepared queue list vs. via the master processor.
- Architecture: very complex due to the need for CPU synchronization vs. very simple and straightforward.
- Expense: expensive due to the complicated architectural design vs. comparatively cheaper due to the simpler design.

Multiprocessor Scheduling
On a uniprocessor, scheduling is one-dimensional: the only question that must be answered (repeatedly) is "which process should be run next?" On a multiprocessor, scheduling is two-dimensional: the scheduler has to decide which process to run and which CPU to run it on. This extra dimension greatly complicates scheduling on multiprocessors.
Another complicating factor is that in some systems all the processes are unrelated, whereas in others they come in groups. An example of the former is a timesharing system in which independent users start up independent processes; the processes are unrelated, and each one can be scheduled without regard to the others.

Timesharing
The simplest scheduling algorithm for dealing with unrelated processes (or threads) is to have a single system-wide data structure for ready processes: possibly just a list, but more likely a set of lists for processes at different priorities.
For example, with 16 CPUs all currently busy and a prioritized set of 14 processes waiting to run: the first CPU to finish its current work (or have its process block), say CPU 4, locks the scheduling queues and selects the highest-priority process, A; next, CPU 12 goes idle and chooses process B. As long as the processes are completely unrelated, doing scheduling this way is a reasonable choice.

Multiprocessor Time Sharing
Having a single scheduling data structure used by all CPUs timeshares the CPUs, much as they would be in a uniprocessor system. It also provides automatic load balancing, because it can never happen that one CPU is idle while others are overloaded. Two disadvantages of this approach are the potential contention for the scheduling data structure as the number of CPUs grows, and the usual overhead of doing a context switch when a process blocks for I/O. A context switch may also happen when a process' quantum expires.
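A minimal sketch of the single system-wide ready queue described above, assuming a priority list protected by one lock; that lock is exactly the contention point the next section warns about (names and priorities are illustrative):

# Shared ready queue: every CPU locks it, then pops the highest-priority
# process. With many CPUs, this lock becomes the contended resource.
import heapq
import threading

ready_queue = []                 # (priority, pid); smaller number = higher priority
queue_lock = threading.Lock()

def enqueue(pid: int, priority: int) -> None:
    with queue_lock:
        heapq.heappush(ready_queue, (priority, pid))

def pick_next():
    """Called by a CPU that went idle; returns the pid to run next, or None."""
    with queue_lock:
        return heapq.heappop(ready_queue)[1] if ready_queue else None

for pid, prio in [(101, 3), (102, 1), (103, 2)]:
    enqueue(pid, prio)
print(pick_next())  # 102: the highest-priority waiting process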
Challenges and Overheads (of multiprocessor systems)
- Synchronization: spin locks vs. blocking locks; hardware vs. software support.
- Memory access: the number of accesses increases with the number of CPUs; memory speed must scale with the number of CPUs or constrain the system.
- Caching and consistency: the memory bottleneck increases the need to cache, but multiple copies in caches mean coherency problems.
- System bus: traffic increases with the number of CPUs.
- File system: serializes access if designed conventionally.

Network Operating System (NOS)
- Each computer has its own operating system with networking facilities.
- Computers work independently (they may even run different operating systems).
- Services are tied to individual nodes (ftp, WWW).
- Highly file-oriented (basically, processors share only files).
- Structure: distributed applications run over NOS services on each machine, each machine with its own kernel, connected by the network.

A NOS is an intermediate stage between independent individual workstations and a distributed software environment: independent machines with significant interaction.
- They may run different operating systems.
- Servers support the interaction of the distributed pieces.
- The Network File System (NFS) is the most obvious and strongest combining force.
- Extensive communication, but location-dependent.
- Client/server service model; file servers for file sharing; the WWW service is the newest addition.

NOS for Users
Users are aware of the multiplicity of machines. Access to the resources of various machines is done explicitly by:
- Remote logging into the appropriate remote machine.
- Transferring data from remote machines to local machines via the File Transfer Protocol (FTP).
- Uploading, downloading, accessing, or sharing files through cloud storage.
Users must change paradigms: establish a session, give network-based commands, use a web browser. This is more difficult for users.

PC vs. NOS
The NOS enhances the reach of the client PC by making remote services available as extensions of the local operating system. Although a number of users may have accounts on a PC, only a single account is active on the system at any given time. A NOS supports multiple user accounts at the same time and enables concurrent access to shared resources by multiple clients (a multitasking, multiuser environment).

Multiuser, Multitasking, and Multiprocessor Systems
A NOS server is a multitasking system: the OS is capable of executing multiple tasks at the same time. Some systems are equipped with more than one processor (multiprocessing systems) and can execute multiple tasks in parallel by assigning each task to a different processor. The total amount of work that the server can perform in a given time is greatly enhanced in multiprocessor systems.

NOS: Where to Use?
A NOS can be used in: routers, switches, and hardware firewalls; PCs in peer-to-peer networks; client-server architectures.

Middleware
Middleware is computer software that connects software components or people and their applications. It consists of a set of services that allows multiple processes, running on one or more machines, to interact; it is interoperable in support of distributed systems. Middleware sits "in the middle" between application software that may be working on different operating systems.
- The OS on each computer need not know about the other computers.
- The OSes on different computers need not generally be the same.
- Services are generally (transparently) distributed across computers.
Distributed Operating System (Multicomputer OS)
A multicomputer OS has a totally different structure and complexity from a multiprocessor OS, because of the lack of physically shared memory for storing the data structures needed for system-wide resource management.
- Users are not aware of the multiplicity of machines.
- Access to remote resources is similar to access to local resources.
- Structure: distributed applications run over a distributed OS layer spanning the kernels of all machines, connected by the network.

Fully Distributed Systems
- The programming and user interface provides a virtual uniprocessor, built on top of distributed heterogeneous machines.
- New approaches and implementations of user and system software are generally required; development time is long.
- Execution overhead is the REAL challenge: message latency and overhead are usually the right place to start looking.
- Familiar uniprocessor challenges are all present, and become harder when generalized to a truly distributed implementation: synchronization squared, scheduling multiplied, programming-model complications, shared and virtual memory complications.
- Distribution brings its own complications: load balancing, cache coherence, fault tolerance, process migration.

Migrations
- Data migration: transfer data by transferring the entire file, or only those portions of the file necessary for the immediate task.
- Computation migration: transfer the computation, rather than the data, across the system, via remote procedure calls (RPCs) or a messaging system.
- Process migration: execute an entire process, or parts of it, at different sites, for:
  - Load balancing: distribute processes across the network to even the workload.
  - Computation speedup: subprocesses can run concurrently on different sites.
  - Hardware preference: process execution may require a specialized processor.
  - Software preference: required software may be available at only a particular site.
  - Data access: run the process remotely rather than transfer all the data locally.

Design Questions
- Robustness: can the distributed system withstand failures?
- Transparency: can the distributed system be transparent to the user, both in terms of where files are stored and user mobility?
- Scalability: can the distributed system be scaled to allow the addition of more computation power, storage, or users?

Robustness
Hardware failures can include failure of a link, failure of a site, and loss of a message. A fault-tolerant system can tolerate a certain level of failure; the degree of fault tolerance depends on the design of the system and the specific fault. The more fault tolerance, the better. Robustness involves failure detection, reconfiguration, and recovery.

Failure Detection
Detecting hardware failure is difficult. To detect a link failure, a heartbeat protocol can be used. Assume Site A and Site B have established a link:
- At fixed intervals, the two sites exchange an I-am-up message indicating that they are up and running.
- If Site A does not receive a message within the fixed interval, it assumes either (a) the other site is not up or (b) the message was lost.
- Site A can then send an Are-you-up? message to Site B.
- If Site A does not receive a reply, it can repeat the message or try an alternate route to Site B.
- If Site A does not ultimately receive a reply from Site B, it concludes some type of failure has occurred.
Possible failures: Site B is down; the direct link between A and B is down; the alternate link from A to B is down; the message has been lost. However, Site A cannot determine exactly why the failure occurred.
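A minimal sketch of the heartbeat idea, assuming two processes exchanging UDP datagrams; the peer address, port, and intervals are illustrative, not from the slides:

import socket
import time

# Heartbeat sender (runs at Site B): announce "I-am-up" at a fixed interval.
def send_heartbeats(peer=("siteA.example", 9999)):   # hypothetical address
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(b"I-am-up", peer)
        time.sleep(1.0)

# Heartbeat monitor (runs at Site A): suspect a failure after a silent interval.
def monitor(port=9999, interval=3.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    sock.settimeout(interval)
    while True:
        try:
            sock.recvfrom(64)                  # a heartbeat arrived in time
        except socket.timeout:
            # Either the peer is down or the message was lost; the slides'
            # next step would be an explicit Are-you-up? probe or an
            # alternate route. We cannot tell which failure occurred.
            print("peer silent for", interval, "s: suspected failure")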
Reconfiguration and Recovery
When Site A determines that a failure has occurred, it must reconfigure the system:
- If the link from A to B has failed, this must be broadcast to every site in the system.
- If a site has failed, every other site must be notified that the services offered by the failed site are no longer available.
- When the link or the site becomes available again, this information must again be broadcast to all other sites.

Transparency
The distributed system should appear as a conventional, centralized system to the user. The user interface should not distinguish between local and remote resources. User mobility allows users to log into any machine in the environment and see their own environment.

Scalability
As demands increase, the system should easily accept the addition of new resources to accommodate the increased demand, and react gracefully to increased load. Adding more resources may generate additional indirect load on other resources if one is not careful. Data compression or deduplication can cut down on the storage and network resources used.

Comparison Between Systems (Multiprocessor DOS / Multicomputer DOS / Network OS / Middleware-based OS):
- Degree of transparency: very high / high / low / high.
- Same OS on all nodes: yes / yes / no / no.
- Number of copies of the OS: 1 / N / N / N.
- Basis for communication: shared memory / messages / files / model-specific.
- Resource management: global, central / global, distributed / per node / per node.
- Scalability: no / moderately / yes / varies.
- Openness: closed / closed / open / open.

Communication in DS: Introduction
In distributed systems, communication between nodes is crucial for coordination and data exchange. Communication paradigms describe and classify the set of methods for exchanging data between entities in a distributed system.

Classification of Communication Paradigms
Communication paradigms can be categorized into three types based on where the entities reside:
- Entities on the same computer, same address space: global variables, procedure calls, ...
- Entities on the same computer, different address spaces: files, signals, shared memory, ...
- Entities on networked computers: socket communication, remote invocation, indirect communication.
Networked Communication Paradigms (CP)
- Socket communication: a low-level API for communication using the underlying network protocols.
- Remote invocation: a procedure-call abstraction for communicating between entities.
- Indirect communication: communicating without direct coupling between sender and receiver.

Sockets and Communication Protocols
Definition: sockets provide an interface for sending and receiving data over a network; communication protocols define the rules for data exchange.
Types: TCP sockets (reliable, ordered communication) and UDP sockets (faster, but without guarantees).
Use case: real-time applications like chat systems or online gaming.

Socket Communication
A socket is a communication end-point to which an application can write or read data. Sockets provide an interface for sending and receiving data over a network (at the transport layer), and transport protocols define the rules for data exchange. Each socket is associated with a particular type of transport protocol:
- UDP socket: provides connectionless and unreliable communication.
- TCP socket: provides connection-oriented and reliable communication.

1. UDP Sockets
Messages are sent from the sender process to the receiver process using the UDP protocol. UDP provides connectionless communication, with no acknowledgements or message transmission retries.
Communication mechanism:
- The server opens a UDP socket SS at a known port sp, and SS waits to receive a request: SS.receive(recvPacket).
- The client opens a UDP socket CS at a random port cp and sends a message to the server's IP and port sp: CS.send(msg, ServerIP, sp).
- The server socket SS may send data back to CS: SS.send(msg, recvPacket.IP, recvPacket.port).
- No ACK is sent by the receiver.

UDP Sockets: Design Considerations
- Messages may be delivered out of order; if necessary, the programmer must re-order packets.
- Communication is not reliable: messages might be dropped due to checksum errors or buffer overflows at routers.
- The sender must explicitly fragment a long message into smaller chunks before transmitting; a maximum size of 548 bytes per transmission is suggested.
- The receiver should allocate a buffer big enough to fit the sender's message; otherwise the message will be truncated.
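Following the mechanism above, a minimal UDP pair in the same style as the TCP example shown later (port and messages are illustrative):

# UDP server: socket SS at a known port.
import socket

server_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server_socket.bind(('localhost', 9090))
while True:
    data, addr = server_socket.recvfrom(1024)   # blocks until a datagram arrives
    print(f"Datagram from {addr}: {data.decode()}")
    server_socket.sendto(b"Hello Client", addr) # optional reply; no ACK, no retries

# UDP client: socket CS at a random port.
import socket

client_socket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client_socket.sendto(b"Hello Server", ('localhost', 9090))
reply, _ = client_socket.recvfrom(1024)
print(reply.decode())
client_socket.close()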
2. TCP Sockets
Messages are sent from sender to receiver using the TCP protocol, which provides in-order delivery, reliability, and congestion control.
Communication mechanism:
- The server opens a TCP server socket SS at a known port sp and waits to receive a request (using the accept call).
- The client opens a TCP socket CS at a random port cp, and CS sends a connection-initiation message to the server's IP and port sp.
- The server socket SS allocates a new socket NSS on a random port nsp for the client: nSS = SS.accept().
- CS can then send data to NSS.

Advantages of TCP Sockets
- TCP sockets ensure in-order delivery of messages.
- Applications can send messages of any size.
- TCP sockets ensure reliable communication using acknowledgements and retransmissions.
- TCP's congestion control regulates the sender rate and thus prevents network overload.

Challenges of Socket Communication
1. Network reliability. Issue: network failures, such as packet loss, can disrupt communication. Solution: implement retries and acknowledgment mechanisms to ensure data integrity.
2. Data serialization. Issue: different systems may use different data formats, leading to incompatibility. Solution: use standardized serialization formats (e.g., JSON, Protocol Buffers) for consistent data exchange.
3. Concurrency. Issue: handling multiple clients simultaneously can lead to race conditions and data corruption. Solution: use threading or asynchronous programming to manage multiple connections effectively (see the threaded variant after the Python example below).
4. Blocking operations. Issue: synchronous operations can block the server, making it unresponsive to other clients. Solution: use non-blocking sockets or asynchronous I/O to prevent blocking.
5. Firewall and NAT issues. Issue: firewalls and Network Address Translation (NAT) can block or interfere with socket connections. Solution: use well-known ports and consider techniques like STUN/TURN for NAT traversal.
6. Protocol implementation. Issue: designing a robust communication protocol can be complex, leading to errors in message formatting or sequencing. Solution: clearly define the protocol and implement error-checking mechanisms.
7. Security concerns. Issue: data transmitted over sockets can be intercepted or tampered with. Solution: use encryption (e.g., TLS/SSL) to secure data in transit.
8. Performance bottlenecks. Issue: high latency and low bandwidth can affect performance, especially in real-time applications. Solution: optimize data transfer, minimize message sizes, and use efficient algorithms.
9. Timeouts. Issue: connections may hang indefinitely if no data is received. Solution: implement timeout settings for read and write operations to close stale connections.
10. Error handling. Issue: errors can occur at various stages of communication (e.g., connection failures, read/write errors). Solution: implement comprehensive error handling to manage exceptions gracefully.
Sockets Example (Python)

TCP socket server:

import socket

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(('localhost', 8080))
server_socket.listen(5)
while True:
    client_socket, addr = server_socket.accept()
    print(f"Connection from {addr}")
    client_socket.send(b"Hello Client")
    client_socket.close()

TCP socket client:

import socket

client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(('localhost', 8080))
message = client_socket.recv(1024)
print(message.decode())
client_socket.close()
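To address the concurrency challenge listed earlier, the accept loop can hand each connection to its own thread. A minimal threaded variant of the same server (our sketch, not from the slides):

import socket
import threading

def handle(conn, addr):
    # Each client is served by its own thread, so a slow client
    # does not block the accept loop.
    print(f"Connection from {addr}")
    conn.send(b"Hello Client")
    conn.close()

server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(('localhost', 8080))
server_socket.listen(5)
while True:
    conn, addr = server_socket.accept()
    threading.Thread(target=handle, args=(conn, addr), daemon=True).start()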
Remote Invocation
Remote invocation enables an entity to call a procedure that typically executes on another computer, without the programmer explicitly coding the details of the communication. The underlying middleware takes care of the raw communication, so the programmer can transparently communicate with a remote entity. There are two main types of remote invocation:
- Remote Procedure Calls (RPC)
- Remote Method Invocation (RMI)

Remote Procedure Call (RPC)
RPC is a protocol that allows a program to execute a procedure on a remote server as if it were a local call.
Key features:
- Abstracts the complexities of network communication.
- Provides a synchronous or asynchronous call mechanism.
Example scenario: a client wants to fetch user details from a server.
Client code (pseudo): user_details = rpc_call("getUserDetails", user_id)
Server code (pseudo): def getUserDetails(user_id): fetch the user from the database and return user_data.
Use case: microservices architectures where services need to communicate over a network.

Basic RPC Approach
On machine A (the client), the client program calls add(a, b); the call goes through the client stub and the communication module, which sends a request to machine B (the server). There, the communication module and the server stub (skeleton) deliver the call to the server procedure, e.g. int add(int x, int y) { return x + y; }, and the response travels back the same way. No communication or message passing is visible to the programmer.

Challenges in RPC
- Parameter passing via marshaling: procedure parameters and results have to be transferred over the network as bits.
- Data representation: the data representation has to be uniform, since the architectures of the sender and receiver machines may differ.

Parameter Passing via Marshaling
Parameter marshaling is packing parameters into a message that will be transmitted over the network. The parameters to the procedure and the result have to be marshaled before transmitting them over the network. Two types of parameters can be passed: value parameters and reference parameters.

1. Passing Value Parameters
Value parameters (e.g., integer, float, character) carry complete information about the variable and can be directly encoded into the message. They are passed by call-by-value: changes made by the callee procedure are not reflected in the caller. For example, a call k = add(i, j) marshals the operation name and the two integer values (proc: add, int: val(i), int: val(j)) into the request message that travels from the client OS to the server OS.

2. Passing Reference Parameters
Passing reference parameters like value parameters in RPC leads to incorrect results, for two reasons:
- Reference parameters are invalid at the server: they are valid only within the client's address space. Solution: pass the reference parameter by copying the data that is referenced.
- Changes to reference parameters are not reflected back at the client. Solution: "copy/restore" the data: copy the data that is referenced by the parameter, and copy the value at the server back to the client.

Data Representation
Computers in a DS often have different architectures and operating systems:
- Data-type sizes differ; e.g., a long is 4 bytes on 32-bit Unix but 8 bytes on 64-bit Unix systems.
- Storage formats differ; e.g., Intel stores data in little-endian format, while SPARC uses big-endian.
The client and server therefore have to agree on how simple data is represented in the message, e.g., the format and size of data types such as integer, char, and float.

Remote Procedure Call Types
Remote procedure calls can be synchronous or asynchronous (or deferred synchronous).

Synchronous vs. Asynchronous RPCs
An RPC with strict request-reply semantics blocks the client until the server returns, and blocking wastes resources at the client. Asynchronous RPCs are used when the client does not need a result from the server: the server immediately sends an ACK back to the client, and the client continues execution after receiving the ACK.

Deferred Synchronous RPCs
Asynchronous RPC is also useful when a client wants the results but does not want to be blocked until the call finishes. The client then uses deferred synchronous RPCs: the single request-response RPC is split into two RPCs. First, the client triggers an asynchronous RPC on the server; second, on completion, the server calls back the client to deliver the results.

Remote Method Invocation (RMI)
In RMI, a calling object can invoke a method on a potentially remote object. RMI is similar to RPC, but in a world of distributed objects: the programmer can use the full expressive power of object-oriented programming, and RMI allows passing not only value parameters but also object references.

Remote Objects and Supporting Modules
In RMI, objects whose methods can be invoked remotely are known as "remote objects"; they implement a remote interface. During any method call, the system has to resolve whether the method is being called on a local or a remote object: local calls are made directly on the local object, while remote calls go via remote method invocation. The remote reference module is responsible for translating between local and remote object references.

RMI Control Flow
On machine A (the client), a proxy for remote object B intercepts the call and passes the request to the communication module; on machine B (the server), the communication module, a dispatcher for B's class, and the skeleton deliver the invocation to the remote object B, with the remote reference modules on both sides translating references. The response returns along the same path.

Remote Invocation: Advantages and Disadvantages
- Advantage: the programmer does not have to write socket-communication code.
- Disadvantages: space coupling (where the procedure resides must be known in advance) and time coupling (on the receiver, a process must be explicitly waiting to accept requests for procedure calls).
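As a concrete, hedged illustration of the RPC pattern (not the course's prescribed tool), Python's standard xmlrpc module provides call-by-value RPC with marshaling handled by the middleware; getUserDetails and the port are carried over from the pseudo-code above:

# RPC server: registers a procedure that clients can call remotely.
from xmlrpc.server import SimpleXMLRPCServer

def getUserDetails(user_id):
    # Stand-in for a database lookup (illustrative data).
    return {"id": user_id, "name": "example user"}

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(getUserDetails)
server.serve_forever()

# RPC client: the remote call looks like a local procedure call;
# parameters and results are marshaled by the library.
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
user_details = proxy.getUserDetails(42)
print(user_details)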
Challenges of Remote Invocation
1. Network latency. Issue: the time it takes for messages to travel over the network can introduce delays. Solution: minimize the number of remote calls and use asynchronous communication.
2. Fault tolerance. Issue: network failures, server crashes, or timeouts can disrupt communication and lead to incomplete operations. Solution: implement retry mechanisms, circuit breakers, and fallback strategies.
3. Data consistency. Issue: ensuring data consistency across different nodes can be challenging, especially in distributed systems. Solution: utilize strong consistency models or eventual consistency, based on the application requirements.
4. Serialization overhead. Issue: converting complex data structures to a format suitable for transmission can introduce performance overhead. Solution: use efficient serialization formats (e.g., Protocol Buffers, Thrift) to minimize this overhead.
5. Versioning. Issue: changes in the service interface or data structures can lead to compatibility issues between client and server. Solution: implement versioning strategies to maintain backward compatibility.
6. Security concerns. Issue: RPCs and message passing can expose sensitive data to interception or unauthorized access. Solution: use encryption (e.g., TLS) and authentication mechanisms to secure communications.
7. Complexity of debugging. Issue: debugging distributed systems can be more complex than debugging local applications, due to the involvement of multiple processes and networks. Solution: implement logging and monitoring tools to trace requests and identify issues.
8. Blocking calls. Issue: synchronous RPC calls can block the client while waiting for a response, leading to unresponsive applications. Solution: use asynchronous RPCs or callbacks to handle operations without blocking.
9. Load balancing. Issue: an uneven distribution of requests can overload some servers while others remain underutilized. Solution: implement load-balancing strategies to distribute requests evenly across the available resources.
10. Message ordering. Issue: messages may arrive out of order, which can be problematic for certain applications. Solution: implement mechanisms to order messages or manage state consistency.
11. Protocol complexity. Issue: designing a robust communication protocol can be complex and error-prone. Solution: clearly define the protocol specification and incorporate error-checking and handling mechanisms.

Indirect Communication
Indirect communication uses middleware to provide one-to-many communication and to eliminate space and time coupling:
- Space coupling: the sender and receiver have to know each other's identities.
- Time coupling: the sender and receiver have to be explicitly listening to each other during communication.
Indirect communication can be achieved through: group communication, publish-subscribe, and message queues.
1. Group Communication
Group communication is one-to-many (multicast) communication built on the abstraction of a group:
- The group is represented in the system by a groupId.
- Recipients join the group.
- A sender sends a message to the group, and it is received by all the recipients.

Services provided by the middleware: group membership, and handling the failure of one or more group members.
- Advantages: enables one-to-many communication; efficient use of bandwidth; the identities of the group members need not be available at all nodes.
- Disadvantage: time coupling.

Multicasting
Multicast can be supported using two approaches: network-level multicasting and application-level multicasting.

1. Network-Level Multicast
Each multicast group is assigned a unique IP address; applications "join" the multicast group, and a multicast tree is built by connecting the routers and computers in the group. Network-level multicast is not scalable: each DS may have a number of multicast groups, and each router on the network has to store information about the multicast IP address of each group of each DS.

2. Application-Level Multicast (ALM)
ALM organizes the computers involved in a DS into an overlay network. The computers in the overlay network cooperate to deliver messages to the other computers in the network; the network routers do not directly participate in the group communication, so the overhead of maintaining information at all the Internet routers is eliminated. However, connections between computers in an overlay network may cross several physical links, so ALM may not be optimal.

2. Publish-Subscribe
The publish-subscribe pattern enables publishers to send messages to a multitude of subscribers interested in receiving those messages: publishers send messages to a channel, and subscribers receive messages from that channel. It is a messaging pattern that allows different services in a system to communicate in a decoupled manner. The model's fundamental semantic feature lies in how messages flow from publishers to subscribers: publishers do not directly target subscribers; instead, subscribers express their interest by issuing subscriptions for specific messages.

Publish-subscribe is an event-based communication mechanism: publishers publish events to an event service, and subscribers express interest in particular events. A large number of producers can distribute information to a large number of consumers.

The pub/sub service:
- Stores all the subscriptions associated with the respective subscribers.
- Receives all the notifications from publishers.
- Dispatches all the published notifications to the correct subscribers.
Therefore, publishers and subscribers exchange information without directly knowing each other.

Example Use Cases
- Broadcasting notifications: large-scale systems like Twitter can notify all interested subscribers of specific hashtags. This also applies to other social media platforms.
- Stock trading systems: publishers publish stock price updates to a channel, and different components subscribe to the channel to receive the latest price updates without needing to know about each other's existence. This lets the system update itself in real time, minimizing latency and reducing traffic on the system.
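A minimal in-memory sketch of the pub/sub service described above; the class, channel names, and callback API are our illustration, not a real broker:

from collections import defaultdict

class PubSubService:
    """Stores subscriptions, receives notifications, dispatches to subscribers."""

    def __init__(self):
        self.subscriptions = defaultdict(list)   # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscriptions[channel].append(callback)

    def publish(self, channel, message):
        # The publisher never sees who the subscribers are: decoupling.
        for deliver in self.subscriptions[channel]:
            deliver(message)

bus = PubSubService()
bus.subscribe("stock/AAPL", lambda m: print("trader saw:", m))
bus.subscribe("stock/AAPL", lambda m: print("dashboard saw:", m))
bus.publish("stock/AAPL", {"price": 187.5})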
3. Message-Queuing (MQ) Systems
MQ systems are a refinement of publish-subscribe in which producers deposit messages in a queue and the messages are delivered to consumers through different methods, with the queue taking care of ensuring message delivery. They provide intermediate-term storage capacity for messages (in the form of queues) without requiring the sender or receiver to be active during communication.
Advantages: enables space decoupling and time decoupling.

Traditional request model: the sender sends the message directly to the receiver. Message-queuing model: the sender puts the message into the queue, and the receiver gets the message from the queue.

MQ Workflow
- Message creation: a producer generates a message containing the necessary data and metadata.
- Message enqueue: the producer sends the message to the queue, where it is stored until a consumer retrieves it.
- Message storage: the queue stores the message in a persistent or transient manner, based on its configuration.
- Message dequeue: a consumer retrieves the message from the queue for processing. Depending on the queue's configuration, messages can be consumed in order, based on priority, or even in parallel.
- Acknowledgment: once the consumer processes the message, it may send an acknowledgment back to the broker, confirming that the message has been successfully handled.

Space and Time Decoupling
MQ enables space and time decoupling between sender and receivers: both can be passive during communication. All four combinations work:
1. Sender active, receiver active.
2. Sender active, receiver passive.
3. Sender passive, receiver active.
4. Sender passive, receiver passive.
However, MQ has other types of coupling: the sender and receiver have to know the identity of the queue, and the middleware (the queue) must always be active.
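A minimal sketch of this workflow using Python's in-process queue as a stand-in for a real broker such as RabbitMQ (task names are illustrative):

import queue
import threading

mq = queue.Queue()          # the intermediary: stores messages until consumed

def producer():
    for i in range(3):
        mq.put({"task": f"send-email-{i}"})   # enqueue and move on (decoupled)

def consumer():
    while True:
        msg = mq.get()                         # dequeue (blocks until available)
        print("processed", msg["task"])
        mq.task_done()                         # acknowledgment back to the queue

threading.Thread(target=consumer, daemon=True).start()
producer()
mq.join()   # wait until every enqueued message has been acknowledged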
MQ System Challenges
The architecture of an MQ system has to address the following challenges:
- Placement of the queue: is the queue placed near to the sender or the receiver?
- Identity of the queue: how can the sender and receiver identify the queue's location?
- Intermediate queue managers: can MQ be scaled to a large-scale distributed system?

a. Placement of the Queue
Each application has a specific pattern of inserting and receiving messages, and the MQ system is optimized by placing the queue at a location that improves performance. Typically a queue is placed in one of two locations: source queues (near the source) or destination queues (near the destination). Examples: email messaging is optimized by the use of destination queues, while RSS feeds require source queuing.

b. Identity of the Queue
In MQ systems, queues are generally addressed by names, but the sender and the receiver should be aware of the network location of the queue. A naming service for queues is therefore necessary: a database mapping queue names to network locations is maintained, and this database can be distributed (similar to DNS).

c. Intermediate Queue Managers
Queues are managed by queue managers, which directly interact with the sending and receiving processes. However, queue managers are not scalable in dynamic, large-scale distributed systems: the computers participating in a DS may change (thus changing the topology of the DS), and there is no general naming service available to dynamically map queue names to network locations. The solution is to build an overlay network (e.g., of relays). Relay queue managers (relays) assist in building dynamic, scalable MQ systems: relays act as "routers" that route messages from the sender to the appropriate queue manager.

When to Use Message Queues
- Microservices architecture: microservices need to communicate with each other without tight coupling and cascading failures.
- Task scheduling and background processing: offload time-consuming tasks (such as image processing or sending emails) to a message queue and have background workers (consumers) process them asynchronously.
- Event-driven architectures: propagate events to multiple services or components without direct communication, using a pub/sub message queue.
- Load leveling: queue incoming requests using a message queue and process them at a steady rate, ensuring that the system remains stable under load.
- Reliable communication: use persistent message queues to ensure that messages are not lost and can be retried if delivery fails.

Popular MQ Systems
RabbitMQ, Apache Kafka, Amazon SQS, Google Cloud Pub/Sub, Redis Streams, ActiveMQ.

Summary (Communication in DS)
- Inter-process communication (IPC) provides a low-level communication API, e.g., the socket API.
- Remote invocation lets the programmer transparently invoke a remote function using local procedure-call syntax, e.g., RPC and RMI.
- Indirect communication allows a one-to-many communication paradigm and enables space and time decoupling, e.g., multicasting and message-queue systems.

Naming in DS
Names are used to uniquely identify entities in distributed systems; entities may be processes, remote objects, newsgroups, and so on. Names are mapped to entities' locations using name resolution.

An example of name resolution: the name http://www.cdk5.net:8888/WebExamples/earth.html is resolved by a DNS lookup to a resource ID (IP address, port, file path) such as 55.55.55.55, port 8888, path WebExamples/earth.html; on the local network, the host's IP address is in turn resolved to a MAC address such as 02:60:8c:02:b0:5a.
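The DNS step of that chain can be reproduced with the standard library; a small sketch (the printed addresses will be the real ones, not the illustrative 55.55.55.55 above):

import socket

url_host, url_port = "www.cdk5.net", 8888

# DNS lookup: resolve the human-friendly name to network addresses.
for family, _, _, _, sockaddr in socket.getaddrinfo(url_host, url_port,
                                                    type=socket.SOCK_STREAM):
    print(family.name, sockaddr)   # e.g. AF_INET ('x.x.x.x', 8888)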
Identifier Identifiers are names that uniquely identify entities A true identifier is a name with the following properties: a. An identifier refers to at-most one entity b. Each entity is referred to by at-most one identifier c. An identifier always refers to the same entity (i.e. it is never reused) 30/11/2024 Distributed Systems 4 Naming Systems A naming system is simply a middleware that assists in name resolution Naming systems are classified into three classes based on the type of names used: 1. Flat naming 2. Structured naming 3. Attribute-based naming 30/11/2024 Distributed Systems 5 Flat naming Flat Naming In Flat Naming, identifiers are simply random bits of strings (known as unstructured or flat names) Flat name does not contain any information on how to locate an entity Types of name resolution mechanisms for flat names: 1. Broadcasting 2. Forwarding pointers 3. Home-based approaches 4. Distributed Hash Tables (DHTs) 30/11/2024 Distributed Systems 7 1. Broadcasting Approach: Broadcast the identifier to the complete network. The entity associated with the identifier responds with its current address Example: Address Resolution Protocol (ARP) I am 192.168.0.1. My address is 02:AB:4A:3C:59:85 Resolve an IP address to a MAC address In this application, ▪ IP address is the identifier of the entity ▪ MAC address is the address of the access point Challenges: Who has the identifier Not scalable in large networks 192.168.0.1? ▪ This technique leads to flooding the network with broadcast messages Requires all entities to listen to all requests 30/11/2024 Distributed Systems 8 2. Forwarding Pointers Forwarding Pointers enable locating mobile entities ▪ Mobile entities move from one access point to another When an entity moves from location A to location B, it leaves behind (at A) a reference to its new location at B Name resolution mechanism Follow the chain of pointers to reach the entity Update the entity’s reference when the present location is found Challenges: Reference to at-least one pointer is necessary Long chains lead to longer resolution delays Long chains are prone to failure due to broken links 30/11/2024 Distributed Systems 9 Forwarding Pointers – An Example Stub-Scion Pair (SSP) chains implement remote invocation for mobile entities using forwarding pointers  Server stub is referred to as Scion in the original paper Each forwarding pointer is implemented as a pair: (client stub, server stub)  The server stub contains a local reference to the actual object or a local reference to the remote client stub When object moves from A to B,  It leaves a client stub in its place  It installs a server stub that refers to the new remote client stub on B Process P1 Process P2 Process P3 Process P4 n = Process n; = Remote Object; = Caller Object; = Server stub; = Client stub 30/11/2024 Distributed Systems 10 3. Home-Based Approaches Each entity is assigned a home node Home node is typically static (has fixed access point and address) Home node keeps track of current address of the entity Entity-home interaction: Entity’s home address is registered at a naming service Entity updates the home about its current address (foreign address) whenever it moves Name resolution Client contacts the home to obtain the foreign address Client then contacts the entity at the foreign location 30/11/2024 Distributed Systems 11 Home-Based Approaches – An example Example: Mobile-IP 1. Update home node about the foreign address Mobile entity Home 3a. 
Home-Based Approaches: Challenges
- The home address is permanent for the entity's lifetime: if the entity permanently moves away, a simple home-based approach incurs a higher communication overhead.
- Connection set-up overheads due to communication between the client and the home can be excessive; consider the scenario where the clients are nearer to the mobile entity than to its home.

4. Distributed Hash Table (DHT)
A DHT is a class of decentralized distributed system that provides a lookup service similar to a hash table. (Key, value) pairs are stored on the nodes participating in the DHT, and the responsibility for maintaining the mapping from keys to values is distributed among the nodes; any participating node can retrieve the value for a given key. Data items (e.g., a file name, a hostname, an IP address) are hashed to keys in the distributed network. A representative DHT is Chord.

Chord
Chord assigns an m-bit identifier key (randomly chosen) to each node; each node can be contacted through its network address. Chord also maps each entity (processes, files, etc.) to an m-bit key.
Mapping of entities to nodes: each node is responsible for a set of entities. An entity with key k falls under the control of the node with the smallest identifier id >= k; this node is known as the successor of k, denoted succ(k).

A Naive Key Resolution Algorithm
The main issue in a DHT-based solution is to efficiently resolve a key k into the network location of succ(k): given an entity with key k on node n, how do we find the node succ(k)?
1. All nodes are arranged in a logical ring according to their keys.
2. Each node p keeps track of its immediate neighbors, succ(p) and pred(p).
3. If node p receives a request to resolve key k: if pred(p) < k <= p, then p itself is succ(k) and handles the request; otherwise p forwards the request to its successor.
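A small sketch of this naive resolution on a simulated ring; real Chord nodes hold only their succ/pred pointers and forward messages, whereas here the walk is simulated over a sorted list (our illustration; assumes at least two nodes):

def resolve(k, node_ids, m=8):
    """Naive ring walk: forward until pred(p) < k <= p holds (mod 2**m)."""
    ring = sorted(node_ids)
    i = 0                                    # start at an arbitrary node
    while True:
        p, pred = ring[i], ring[i - 1]       # ring[-1] wraps around in Python
        # Does key k fall in the half-open interval (pred, p] on the circle?
        # The modular comparison handles wrap-around at the top of the ring.
        if (k - pred) % 2**m <= (p - pred) % 2**m and k != pred:
            return p                         # p is succ(k)
        i = (i + 1) % len(ring)              # otherwise forward to succ(p)

nodes = [1, 8, 14, 21, 32, 38]
print(resolve(10, nodes))   # 14: the smallest node id >= 10
print(resolve(40, nodes))   # 1: wraps around the ring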
