Distributed Systems (3rd Edition) - Chapter 01 PDF

Summary

This document provides an introduction to distributed systems, covering key concepts such as definitions, characteristics, and organization. It explores the challenges in managing autonomous nodes and handling resource sharing in a distributed environment. The document touches upon themes such as transparency, synchronization issues, and middleware.

Full Transcript

# Distributed Systems (3rd Edition)

## Chapter 01: Introduction

### Introduction: What is a distributed system?

#### Distributed System

**Definition**

A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system.

**Characteristic features**

* Autonomous computing elements, also referred to as nodes, be they hardware devices or software processes.
* Single coherent system: users or applications perceive a single system, so nodes need to collaborate.

#### Collection of autonomous nodes

**Independent behavior**

Each node is autonomous and will thus have its own notion of time: there is no global clock. This leads to fundamental synchronization and coordination problems.

**Collection of nodes**

* How to manage group membership?
* How to know that you are indeed communicating with an authorized (non)member?

### Organization

**Overlay network**

Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., requires a lookup).

**Overlay types**

A well-known example of overlay networks: peer-to-peer systems.

* **Structured:** each node has a well-defined set of neighbors with whom it can communicate (tree, ring).
* **Unstructured:** each node has references to randomly selected other nodes from the system.

### Coherent system

**Essence**

The collection of nodes as a whole operates the same, no matter where, when, and how interaction between a user and the system takes place.

**Examples**

* An end user cannot tell where a computation is taking place
* Where exactly data is stored should be irrelevant to an application
* Whether or not data has been replicated is completely hidden

**Keyword:** distribution transparency

**The snag:** partial failures

It is inevitable that at any time only a part of the distributed system fails. Hiding partial failures and their recovery is often very difficult and in general impossible.

### Middleware and distributed systems

#### Middleware: the OS of distributed systems

**Same interface everywhere**

[Figure: Applications A, B, and C run on Computers 1 to 4 on top of a shared distributed-system layer (middleware), which sits above Local OS 1 to 4 and the network.]

**What does it contain?**

Commonly used components and functions that need not be implemented by applications separately.

### What do we want to achieve?

* Support sharing of resources
* Distribution transparency
* Openness
* Scalability

### Sharing resources

**Canonical examples**

* Cloud-based shared storage and files
* Peer-to-peer assisted multimedia streaming
* Shared mail services (think of outsourced mail systems)
* Shared Web hosting (think of content distribution networks)

**Observation**

"The network is the computer" (quote from John Gage, then at Sun Microsystems)

### Distribution transparency

**Types**

| Transparency | Description |
|---|---|
| Access | Hide differences in data representation and how an object is accessed |
| Location | Hide where an object is located |
| Relocation | Hide that an object may be moved to another location while in use |
| Migration | Hide that an object may move to another location |
| Replication | Hide that an object is replicated |
| Concurrency | Hide that an object may be shared by several independent users |
| Failure | Hide the failure and recovery of an object |

### Degree of transparency

**Observation**

Aiming at full distribution transparency may be too much:

* There are communication latencies that cannot be hidden
* Completely hiding failures of networks and nodes is (theoretically and practically) *impossible*
  * You cannot distinguish a slow computer from a failing one
  * You can never be sure that a server actually performed an operation before a crash
* Full transparency will *cost performance*, exposing distribution of the system
  * Keeping replicas exactly up-to-date with the master *takes time*
  * Immediately flushing write operations to disk for fault tolerance

**Exposing distribution may be good**

* Making use of location-based services (finding your nearby friends)
* When dealing with users in different time zones
* When it makes it easier for a user to understand what's going on (e.g., when a server does not respond for a long time, report it as failing).

**Conclusion**

Distribution transparency is a nice goal, but achieving it is a different story, and it should often not even be aimed at.
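
**Illustration (not from the slides)**

The point that a slow computer cannot be distinguished from a failing one can be made concrete with a timeout-based heartbeat monitor. The following is a minimal sketch under stated assumptions: the class `HeartbeatMonitor`, its methods, the node names, and the 2-second timeout are all hypothetical and chosen only for illustration. Timestamps are passed in explicitly so the example is deterministic and needs no network.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: suspects a node once it has been silent
    for longer than `timeout` seconds."""
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time at which a heartbeat from `node` arrived.
        self.last_seen[node] = now

    def suspected(self, node: str, now: float) -> bool:
        # A node is suspected if it never reported, or has been silent too long.
        last = self.last_seen.get(node)
        return last is None or now - last > self.timeout


monitor = HeartbeatMonitor(timeout=2.0)
monitor.heartbeat("slow", now=0.0)
monitor.heartbeat("crashed", now=0.0)

# At t = 3.0 both nodes have been silent for longer than the timeout.
# The monitor has no way of telling which one is merely slow.
at_3 = (monitor.suspected("slow", 3.0), monitor.suspected("crashed", 3.0))

# The slow node's delayed heartbeat finally arrives at t = 3.5.
monitor.heartbeat("slow", now=3.5)
at_4 = (monitor.suspected("slow", 4.0), monitor.suspected("crashed", 4.0))

print(at_3)  # (True, True)
print(at_4)  # (False, True)
```

At t = 3.0 the monitor wrongly suspects the slow node, and only the later heartbeat corrects that. Any choice of timeout trades false suspicions against detection delay, which is one reason failure transparency cannot be complete.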
