Topic1 Introduction to distrbuited systems-updated (3).pptx

Distributed Systems (3rd Edition) Chapter 01: Introduction to Distributed Systems DA 210 Outline  What is a distributed system - Autonomous - Coherent system - Middleware and distributed systems  Design goals - Being open - Being scalable - Pitfalls  Types of distributed systems - High performance distributed computing - Distributed information systems - Pervasive systems Introduction: What is a distributed system? Distributed System Definition A distributed system is a collection of autonomous computing elements that appears to its users as a single coherent system. Characteristic features ► Autonomous computing elements, also referred to as nodes, be they hardware devices or software processes. ► Single coherent system: users or applications perceive a single system ⇒ nodes need to collaborate. 3 / 56 Introduction: What is a distributed system? Characteristic 1: Collection of autonomous computing elements Collection of autonomous nodes Independent behavior Each node is autonomous and will thus have its own notion of time: there is no global clock. Leads to fundamental synchronization and coordination problems. Collection of nodes ► How to manage group membership? ► How to know that you are indeed communicating with an authorized (non)member? 4 / 56 Introduction: What is a distributed system? Characteristic 1: Collection of autonomous computing elements Organization Overlay network Each node in the collection communicates only with other nodes in the system, its neighbors. The set of neighbors may be dynamic, or may even be known only implicitly (i.e., requires a lookup). Overlay types Well-known example of overlay networks: peer-to-peer systems. Structured: each node has a well-defined set of neighbors with whom it can communicate (tree, ring). Unstructured: each node has references to randomly selected other nodes from the system. 5 / 56 Introduction: What is a distributed system? Characteristic 2: Single coherent system Coherent system Essence The collection of nodes as a whole operates the same, no matter where, when, and how interaction between a user and the system takes place. Examples ► An end user cannot tell where a computation is taking place ► Where data is exactly stored should be irrelevant to an application ► If or not data has been replicated is completely hidden Keyword is distribution transparency The snag: partial failures It is inevitable that at any time only a part of the distributed system fails. Hiding partial failures and their recovery is often very difficult and in general impossible to hide. 6 / 56 Introduction: What is a distributed system? Middleware and distributed systems Middleware: the OS of distributed systems Same interface everywhere Computer 1 Computer 2 Computer 3 Computer 4 Appl. A Application B Appl. C Distributed-system layer (middleware) Local OS 1 Local OS 2 Local OS 3 Local OS 4 Network What does it contain? Commonly used components and functions that need not be implemented by applications separately. 7 / 56 Introduction: Design goals What do we want to achieve? ► Support sharing of resources ► Distribution transparency ► Openness ► Scalability 8 / 56 Introduction: Design goals Supporting resource sharing Sharing resources Examples ► Cloud-based shared storage and files ► Peer-to-peer assisted multimedia streaming ► Shared mail services (think of outsourced mail systems) ► Shared Web hosting (think of content distribution networks) Observation “The network is the computer” (quote from John Gage, then at Sun Microsystems) 9 / 56 Introduction: Design goals Making distribution transparent Distribution transparency Types Transparency Description Access Hide differences in data representation and how an object is accessed Location Hide where an object is located Relocation Hide that an object may be moved to another location while in use Migration Hide that an object may move to another location Replication Hide that an object is replicated Concurrency Hide that an object may be shared by several independent users Failure Hide the failure and recovery of an object Types of distribution transparency 10 / Introduction: Design goals Making distribution transparent Degree of transparency Observation Aiming at full distribution transparency may be too much: ► There are communication latencies that cannot be hidden ► Completely hiding failures of networks and nodes is (theoretically and practically) impossible ► You cannot distinguish a slow computer from a failing one ► You can never be sure that a server actually performed an operation before a crash ► Full transparency will cost performance, exposing distribution of the system ► Keeping replicas exactly up-to-date with the master takes time ► Immediately flushing write operations to disk for fault tolerance Degree of distribution transparency 10 / 56 Introduction: Design goals Making distribution transparent Degree of transparency Exposing distribution may be good ► Making use of location-based services (finding your nearby friends) ► When dealing with users in different time zones ► When it makes it easier for a user to understand what’s going on (when e.g., a server does not respond for a long time, report it as failing). Conclusion Distribution transparency is a nice a goal, but achieving it is a different story, and it should often not even be aimed at. Degree of distribution transparency 11 / 56 Introduction: Design goals Being open Openness of distributed systems What are we talking about? Be able to interact with services from other open systems, irrespective of the underlying environment: ► Systems should conform to well-defined interfaces ► Systems should easily interoperate ► Systems should support portability of applications ► Systems should be easily extensible Interoperability, composability, and extensibility 12 / 56 Introduction: Design goals Being scalable Scale in distributed systems Observation Many developers of modern distributed systems easily use the adjective “scalable” without making clear why their system actually scales. At least three components ► Number of users and/or processes (size scalability) ► Maximum distance between nodes (geographical scalability) ► Number of administrative domains (administrative scalability) Observation Most systems account only, to a certain extent, for size scalability. Often a solution: multiple powerful servers operating independently in Scalability dimensions 15 / 56 Introduction: Design goals Being scalable Size scalability Root causes for scalability problems with centralized solutions ► The computational capacity, limited by the CPUs ► The storage capacity, including the transfer rate between CPUs and disks ► The network between the user and the centralized service Scalability dimensions 16 / 56 Introduction: Design goals Being scalable Techniques for scaling Hide communication latencies ► Make use of asynchronous communication ► Have separate handler for incoming response ► Problem: not every application fits this model Scaling techniques 22 / 56 Introduction: Design goals Being scalable Techniques for scaling Facilitate solution by moving computations to client Client Server FIRST NAME MAARTEN M A LAST NAME VAN STEEN A E-MAIL [email protected] R T E N Check form Process form Client Server FIRST NAME MAARTEN MAARTEN LAST VAN STEEN VAN STEEN NAME E- [email protected] MVS@VAN- MAIL STEEN.NET Check form Process form Scaling techniques 23 / 56 Introduction: Design goals Being scalable Techniques for scaling Partition data and computations across multiple machines ► Move computations to clients (Java applets) ► Decentralized naming services (DNS) ► Decentralized information systems (WWW) Scaling techniques 24 / 56 Introduction: Design goals Being scalable Techniques for scaling Replication and caching: Make copies of data available at different machines ► Replicated file servers and databases ► Mirrored Web sites ► Web caches (in browsers and proxies) ► File caching (at server and client) Scaling techniques 25 / 56 Introduction: Design goals Being scalable Scaling: The problem with replication Applying replication is easy, except for one thing ► Having multiple copies (cached or replicated), leads to inconsistencies: modifying one copy makes that copy different from the rest. ► Always keeping copies consistent and in a general way requires global synchronization on each modification. ► Global synchronization precludes large-scale solutions. Observation If we can tolerate inconsistencies, we may reduce the need for global synchronization, but tolerating inconsistencies is application dependent. Scaling techniques 26 / 56 Introduction: Design goals Pitfalls Developing distributed systems: Pitfalls Observation Many distributed systems are needlessly complex caused by mistakes that required patching later on. Many false assumptions are often made. False (and often hidden) assumptions ► The network is reliable ► The network is secure ► The network is homogeneous ► The topology does not change ► Latency is zero ► Bandwidth is infinite ► Transport cost is zero ► There is one administrator 27 / 56 Introduction: Types of distributed systems Three types of distributed systems ► High performance distributed computing systems ► Distributed information systems ► Distributed systems for pervasive computing 28 / 56 Introduction: Types of distributed systems High performance distributed computing Parallel computing Observation High-performance distributed computing started with parallel computing Multiprocessor and multicore versus multicomputer M SharedMmemoryM M M M M Private memory Interconnect P P P P P P P Interconnect Processor Memory P 29 / 56 Introduction: Types of distributed systems High performance distributed computing Distributed shared memory systems Observation Multiprocessors are relatively easy to program in comparison to multicomputers, yet have problems when increasing the number of processors (or cores). Solution: Try to implement a shared-memory model on top of a multicomputer. 30 / 56 Introduction: Types of distributed systems High performance distributed computing Cluster computing Essentially a group of high-end systems connected through a LAN ► Homogeneous: same OS, near-identical hardware ► Single managing node Master node Compute node Compute node Compute node Management Component Component Component application of of of parallel parallel parallel application application application Parallel libs Local OS Local OS Local OS Local OS Remote access Standard network network High-speed network Cluster computing 31 / 56 Introduction: Types of distributed systems High performance distributed computing Grid computing The next step: lots of nodes from everywhere ► Heterogeneous ► Dispersed across several organizations ► Can easily span a wide-area network Note To allow for collaborations, grids generally use virtual organizations. In essence, this is a grouping of users (or better: their IDs) that will allow for authorization on resource allocation. Grid computing 32 / 56 Introduction: Types of distributed systems High performance distributed computing Architecture for grid computing The layers Fabric: Provides interfaces to local resources (for querying state and capabilities, locking, etc.) Applications Connectivity: Communication/transaction protocols, e.g., for moving data Collective layer between resources. Also various authentication protocols. Connectivity layer Resource layer Resource: Manages a single resource, such as creating processes or reading data. Fabric layer Collective: Handles access to multiple resources: discovery, scheduling, replication. Application: Contains actual grid applications in a single organization. Grid computing 33 / 56 Introduction: Types of distributed systems High performance distributed computing Cloud computing Google docs aa Svc Software Web services, multimedia, business apps Gmail YouTube, Flickr Application MS Azure Software framework (Java/Python/.Net) Google App engine Storage (databases) Platform aa Svc Platforms Amazon S3 Computation (VM), storage (block, file) Amazon EC2 Infrastructure Infrastructure aa Svc CPU, memory, disk, bandwidth Datacenters Hardware Cloud computing 34 / 56 Introduction: Types of distributed systems High performance distributed computing Cloud computing Make a distinction between four layers ► Hardware: Processors, routers, power and cooling systems. Customers normally never get to see these. ► Infrastructure: Deploys virtualization techniques. Evolves around allocating and managing virtual storage devices and virtual servers. ► Platform: Provides higher-level abstractions for storage and such. Example: Amazon S3 storage system offers an API for (locally created) files to be organized and stored in so-called buckets. ► Application: Actual applications, such as office suites (text processors, spreadsheet applications, presentation applications). Comparable to the suite of apps shipped with OSes. Cloud computing 35 / 56 Introduction: Types of distributed systems Distributed information systems Integrating applications Situation Organizations confronted with many networked applications, but achieving interoperability was painful. Basic approach A networked application is one that runs on a server making its services available to remote clients. Simple integration: clients combine requests for (different) applications; send that off; collect responses, and present a coherent result to the user. Next step Allow direct application-to-application communication, leading to Enterprise Application Integration. 42 / 56 Introduction: Types of distributed systems Distributed information systems Example EAI: (nested) transactions Transaction Primitive Description BEGIN TRANSACTION Mark the start of a transaction END TRANSACTION Terminate the transaction and try to commit ABORT TRANSACTION Kill the transaction and restore the old values READ Read data from a file, a table, or otherwise WRITE Write data to a file, a table, or otherwise Issue: all-or-nothing Nested transaction Subtransaction Subtransaction ► Atomic: happens indivisibly (seemingly) ► Consistent: does not violate system invariants Airline database Hotel database ► Isolated: not mutual interference ► Durable: commit means changes are Two different (independent) databases permanent Distributed transaction processing 43 / 56 Introduction: Types of distributed systems Distributed information systems TPM: Transaction Processing Monitor Server Reply Transaction Request Requests Request Client TP monitor Server application Reply Reply Request Server Reply Observation In many cases, the data involved in a transaction is distributed across several servers. A TP Monitor is responsible for coordinating the execution of a transaction. Distributed transaction processing 44 / 56 Introduction: Types of distributed systems Distributed information systems Middleware and EAI Client Client application application Communication middleware Server-side Server-side Server-side application application application Middleware offers communication facilities for integration Remote Procedure Call (RPC): Requests are sent through local procedure call, packaged as message, processed, responded through message, and result returned as return from call. Message Oriented Middleware (MOM): Messages are sent to logical contact point (published), and forwarded to subscribed applications. Enterprise application integration 45 / 56 Introduction: Types of distributed systems Distributed information systems How to integrate applications File transfer: Technically simple, but not flexible: ► Figure out file format and layout ► Figure out file management ► Update propagation, and update notifications. Shared database: Much more flexible, but still requires common data scheme next to risk of bottleneck. Remote procedure call: Effective when execution of a series of actions is needed. Messaging: RPCs require caller and callee to be up and running at the same time. Messaging allows decoupling in time and space. Enterprise application integration 46 / 56 Introduction: Types of distributed systems Pervasive systems Distributed pervasive systems Observation Emerging next-generation of distributed systems in which nodes are small, mobile, and often embedded in a larger system, characterized by the fact that the system naturally blends into the user’s environment. Three (overlapping) subtypes ► Ubiquitous computing systems: pervasive and continuously present, i.e., there is a continuous interaction between system and user. ► Mobile computing systems: pervasive, but emphasis is on the fact that devices are inherently mobile. ► Sensor (and actuator) networks: pervasive, with emphasis on the actual (collaborative) sensing and actuation of the environment. 47 / 56 Introduction: Types of distributed systems Pervasive systems Ubiquitous systems Core elements 1. (Distribution) Devices are networked, distributed, and accessible in a transparent manner 2. (Interaction) Interaction between users and devices is highly unobtrusive 3. (Context awareness) The system is aware of a user’s context in order to optimize interaction 4. (Autonomy) Devices operate autonomously without human intervention, and are thus highly self-managed 5. (Intelligence) The system as a whole can handle a wide range of dynamic actions and interactions Ubiquitous computing systems 48 / 56 Introduction: Types of distributed systems Pervasive systems Mobile computing Distinctive features ► A myriad of different mobile devices (smartphones, tablets, GPS devices, remote controls, active badges. ► Mobile implies that a device’s location is expected to change over time ⇒ change of local services, reachability, etc. Keyword: discovery. ► Communication may become more difficult: no stable route, but also perhaps no guaranteed connectivity ⇒ disruption-tolerant networking. Mobile computing systems 49 / 56 Introduction: Types of distributed systems Pervasive systems Community detection Issue How to detect your community without having global knowledge? Gradually build your list 1. Node i maintains familiar set Fi and community set Ci , initially both empty. 2. Node i adds j to Ci when |F j |F ∩C i | > λ j| 3. Merge two communities when |Ci ∩Cj | > γ|Ci ∪Cj | Experiments show that λ = γ = 0.6 is good. Mobile computing systems 51 / 56 Introduction: Types of distributed systems Pervasive systems How mobile are people? Experimental results Tracing 100,000 cell-phone users during six months leads to: 1 -2 10 Probability -4 10 -6 10 5 10 50 100 500 1000 Displacement Moreover: people tend to return to the same place after 24, 48, or 72 hours ⇒ we’re not that mobile. Mobile computing systems 52 / 56 Introduction: Types of distributed systems Pervasive systems Sensor networks Characteristics The nodes to which sensors are attached are: ► Many (10s-1000s) ► Simple (small memory/compute/communication capacity) ► Often battery-powered (or even battery-less) Sensor networks 53 / 56 Introduction: Types of distributed systems Pervasive systems Sensor networks as distributed databases Two extremes Sensor network Operator's site Sensor data is sent directly to operator Each sensor can process and Sensor network store data Operator's site Query Sensors send only answers Sensor networks 54 / 56

Topic1 Introduction to distrbuited systems-updated (3).pptx

Document Details

Tags

Related

Full Transcript