Distributed Systems Chapter 1 Slides PDF
University of Engineering and Technology, Lahore
2015
Coulouris, Dollimore, Kindberg and Blair
Summary
These slides, from Coulouris, Dollimore, Kindberg and Blair's book, provide an introduction to distributed systems. The document explores the characterization of distributed systems, covering topics such as system challenges, examples, and trends within computer networks and system design.
Full Transcript
Slides for Chapter 1: Characterization of Distributed Systems
From Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edition 5, © Addison-Wesley 2015

Distributed Systems (Introduction)
Definition: A distributed system is one in which components (software/hardware) located at networked computers communicate and coordinate their actions only by passing messages. [Coulouris]
Definition: It is a collection of autonomous computers linked by a computer network and equipped with distributed systems software (middleware).
Definition: A distributed system is a collection of independent computers that appear to the users of the system as a single computer. [Andrew Tanenbaum]
Definition: A distributed system is several computers doing something together. Thus, a distributed system has three primary characteristics: multiple computers, interconnections, and shared state. [Michael Schroeder]

Distributed Systems (Introduction)
Examples: the Internet and the associated World Wide Web, web search, online gaming, financial trading systems, email, social networks, eCommerce, etc.
The sharing of resources is a main motivation for constructing distributed systems.
Resources may be managed by servers and accessed by clients, or they may be encapsulated as objects and accessed by other client objects.

Figure 2.7: Software and hardware service layers in distributed systems
To solve the problems of heterogeneity, middleware provides a uniform computational model for use by the programmers of servers and distributed applications.
The term middleware applies to a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, operating systems and programming languages.
Examples: the Common Object Request Broker Architecture (CORBA), Java RMI, etc.
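One way middleware masks hardware heterogeneity is by agreeing on a single external data representation, so that machines with different byte orders interpret a message identically. The following is a minimal sketch of that idea, not how CORBA or Java RMI actually marshal data; the wire format and the names marshal and unmarshal are assumptions made for illustration.

```python
import struct
from typing import Tuple

# Hypothetical wire format (an assumption for this sketch):
# a 4-byte unsigned integer followed by an 8-byte float, both in
# network (big-endian) byte order, independent of the host CPU.
WIRE_FORMAT = "!Id"


def marshal(account_id: int, balance: float) -> bytes:
    """Convert local values into the agreed external representation."""
    return struct.pack(WIRE_FORMAT, account_id, balance)


def unmarshal(payload: bytes) -> Tuple[int, float]:
    """Convert the external representation back into local values."""
    account_id, balance = struct.unpack(WIRE_FORMAT, payload)
    return account_id, balance


message = marshal(42, 100.5)
print(len(message), unmarshal(message))
```

Because both sides use the same format string, a sender and a receiver with different native byte orders would still agree on the values; real middleware adds type descriptions, object references and error handling on top of this.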
Distributed Systems (Introduction)
The challenges arising from the construction of distributed systems are:
  heterogeneity (variety and differences) of their components
  openness (which allows components to be added or replaced)
  security
  scalability (the ability to work well when the load or the number of users increases)
  failure handling
  concurrency of components
  transparency
  providing quality of service
Heterogeneity applies to networks, computer hardware, operating systems (OS), programming languages (PL) and implementations by different developers.

Fig. 1.1 (see book for the full text)
Uses of Distributed Systems Technology: selected application domains and associated networked applications
  Finance and commerce: eCommerce, e.g. Amazon and eBay; PayPal; online banking and trading
  The information society: Web information and search engines, ebooks, Wikipedia; social networking: Facebook and MySpace
  Creative industries and entertainment: online gaming, music and film in the home, user-generated content, e.g. YouTube, Flickr
  Healthcare: health informatics, online patient records, monitoring patients
  Education: e-learning, virtual learning environments; distance learning
  Transport and logistics: GPS in route finding systems, map services: Google Maps, Google Earth
  Science: The Grid as an enabling technology for collaboration between scientists
  Environmental management: sensor technology to monitor earthquakes, floods or tsunamis

Distributed Systems - Challenges
Heterogeneity: Heterogeneity (that is, variety and difference) applies to all of the following:
  networks
  computer hardware
  operating systems
  programming languages
  implementations by different developers

Distributed Systems - Challenges
Openness:
  The openness of a system determines whether it can be extended and reimplemented in various ways.
  New resource-sharing services can be added and made available for use by a variety of client programs.
  Openness requires that the specification and documentation of the key software interfaces of the components of a system be made available to software developers.
  Open systems rely on a uniform communication mechanism and published interfaces for access to shared resources.

Distributed Systems - Challenges
Security: Security for information resources has three components: confidentiality (protection against disclosure to unauthorized individuals), integrity (protection against alteration or corruption), and availability (protection against interference with the means to access the resources, i.e. fairness).
Denial of service attacks: Another security problem is that a user may wish to disrupt a service for some reason. This can be done by bombarding the service with such a large number of pointless requests that the serious users are unable to use it.
Security of mobile code: Mobile code needs to be handled with care.
Consider an executable program received as an email attachment: the possible effects of running the program are unpredictable; it may seem to display an interesting picture, but in reality it may access local resources.

Distributed Systems - Challenges
Scalability: A system is described as scalable if it will remain effective when there is a significant increase in the number of resources and the number of users. The number of computers and servers in the Internet has increased dramatically.
The design of scalable distributed systems presents the following challenges:
  Controlling the cost of physical resources
  Controlling the performance loss
  Preventing software resources running out: an example of lack of scalability is shown by the numbers used as Internet (IP) addresses (32 bits in IPv4, extended to 128 bits in IPv6)

Figure 1.6: Growth of the Internet (computers and web servers)
  Date         Computers       Web servers   Percentage
  1993, July   1,776,000       130           0.008
  1995, July   6,642,000       23,500        0.4
  1997, July   19,540,000      1,203,096     6
  1999, July   56,218,000      6,598,697     12
  2001, July   125,888,197     31,299,592    25
  2003, July   ~200,000,000    42,298,371    21
  2005, July   353,284,187     67,571,581    19

Which is a Distributed System: (A) or (B)?
(A) Facebook Social Network Graph among humans
Source: https://www.facebook.com/note.php?note_id=469716398919
(B) The Internet (Internet Mapping Project, color coded by ISPs)

Distributed System Example: Web domains
What are the “entities” (nodes)?
What is the communication medium (links)?
Source: http://www.vlib.us/web/worldwideweb3d.html

Distributed System Example: A Datacenter

Distributed Systems - Challenges
Failure Handling: Computer systems sometimes fail. When faults occur in hardware or software, programs may produce incorrect results or may stop before they have completed the intended computation.
Detecting failures: Some failures can be detected.
For example, checksums can be used to detect corrupted data in a message or a file.
Masking failures: (i) messages can be retransmitted when they fail to arrive; (ii) file data can be written to a pair of disks so that if one is corrupted, the other may still be correct.
Recovery from failures: Recovery involves the design of software so that the state of permanent data can be recovered or ‘rolled back’ after a server has crashed.
Redundancy: Services can be made to tolerate failures by the use of redundant components: (i) two different routes between any two routers in the Internet; (ii) in the DNS, every name table is replicated in at least two different servers; (iii) a database may be replicated in several servers to ensure that the data remains accessible after the failure of any single server.

Distributed Systems - Challenges
Redundancy and resiliency: Relying on just one DNS server creates a single point of failure. If the primary server fails or is compromised by an attack, prospective visitors can no longer access the desired domain. Using secondary servers creates redundancy and makes it less likely that users will experience a disruption of service.
Load balancing: Secondary DNS servers can share the burden of incoming requests to the domain so that the primary server does not become overloaded and deny service to its users. They do this using round-robin DNS, a load balancing technique designed to send roughly equal amounts of traffic to each server.

Distributed Systems - Challenges
Concurrency: Both services and applications provide resources that can be shared by clients in a distributed system. The services and applications generally allow multiple client requests to be processed concurrently. There is therefore a possibility that several clients will attempt to access a shared resource at the same time. For an object to be safe in a concurrent environment, its operations must be synchronized in such a way that its data remains consistent.
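The synchronization requirement for shared objects can be illustrated with a small sketch. It assumes a hypothetical SharedCounter resource and uses threads within one process to stand in for concurrent client requests; a real service would apply the same idea inside its request handlers.

```python
import threading


class SharedCounter:
    """A shared object whose operations are synchronized by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may perform the read-modify-write.
        with self._lock:
            self._value += 1

    def value(self) -> int:
        with self._lock:
            return self._value


def serve_clients(counter: SharedCounter, clients: int,
                  requests_per_client: int) -> int:
    """Run several simulated clients concurrently; return the final count."""

    def client() -> None:
        for _ in range(requests_per_client):
            counter.increment()

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value()


total = serve_clients(SharedCounter(), clients=8, requests_per_client=10_000)
print(total)  # 80000: no update is lost
```

Without the lock, the increment is a separate read and write, so two threads could read the same old value and one update could be lost; the lock keeps the object's data consistent.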
Concurrency is therefore another challenge. It is nevertheless useful in multicore, multiprocessor and distributed computer systems:
  Increased performance from true parallelism
  Increased reliability (fault tolerance)
  Specialized processors (graphics, communication, encryption...)
  Some applications, like email, are inherently distributed

Distributed Systems - Challenges
Transparency: A distributed system should hide its distributed nature from its users, appearing and functioning as a normal centralized system.
Transparency is defined as the concealment (hiding) from the user and the application programmer of the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components.
The two most important transparencies are access and location transparency; their presence or absence most strongly affects the utilization of distributed resources.

Distributed Systems - Challenges
Access transparency: enables local and remote resources to be accessed using identical operations (for example, a GUI with folders that looks the same whether the files inside a folder are local or remote).
Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).
Concurrency transparency: enables several processes to operate concurrently using shared resources without interference between them.
Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs.
Performance transparency: allows the system to be reconfigured to improve performance as loads vary.
Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms.

Distributed Systems - Challenges
We use the term service for a distinct part of a computer system that manages a collection of related resources and presents their functionality to users and applications. For example, we access shared files through a file service, we send documents to printers through a printing service, and we buy goods through an electronic payment service.
The only access we have to the service is via the set of operations that it exports. For example, a file service provides read, write and delete operations on files.
The term server is probably familiar to most readers. It refers to a running program (a process) on a networked computer that accepts requests from programs running on other computers to perform a service and responds appropriately.
The requesting processes are referred to as clients, and the overall approach is known as client-server computing. In this approach, requests are sent in messages from clients to a server and replies are sent in messages from the server to the clients.
When the client sends a request for an operation to be carried out, we say that the client invokes an operation upon the server. A complete interaction between a client and a server, from the point when the client sends its request to when it receives the server’s response, is called a remote invocation.

Characteristics of Distributed Systems (cont.)
No global clock: When programs need to cooperate, they coordinate their actions by exchanging messages. Close coordination often depends on a shared idea of the time at which the programs’ actions occur.
But it turns out that there are limits to the accuracy with which the computers in a network can synchronize their clocks: there is no single global notion of the correct time. This is a direct consequence of the fact that the only communication is by sending messages through a network.

Distributed Systems (Examples & Trends)
We now look at more specific examples of distributed systems to further illustrate the diversity and indeed complexity of distributed systems provision today:
  Web Search
  Pervasive networking and the modern Internet
  Mobile and ubiquitous computing
  Massively multiplayer online games (MMOGs)
  Financial Trading

Distributed Systems (Examples)
Web Search:
  Web search has emerged as a major growth industry in the last decade, and the global number of searches has risen to over 10 billion per calendar month.
  The task of a web search engine is to index the entire contents of the World Wide Web: web pages, multimedia sources, books, etc.
  Google, the market leader in web search technology, has put significant effort into the design of a sophisticated distributed system infrastructure to support search (and Google Earth):
    very large numbers of networked computers located at data centres all around the world
    a very large parallel and distributed computation and storage system that offers fast access to very large datasets

Distributed System Example: A Datacenter

Distributed Systems (Examples)
Pervasive networking and the modern Internet
The modern Internet is a vast interconnected collection of computer networks of many different types: WiFi, WiMAX, Bluetooth and 3G/4G mobile phone networks.
Programs running on the computers connected to it interact by passing messages, employing a common means of communication.
The Internet enables users, wherever they are, to make use of services such as the World Wide Web, email and file transfer.
Intranets: subnetworks operated by companies and other organizations and typically protected by firewalls (preventing unauthorized messages from leaving or entering).
The intranets are linked together by backbones. A backbone is a network link with a high transmission capacity, employing satellite connections, fibre optic cables and other high-bandwidth circuits.

Figure 1.3: A typical portion of the Internet
(Diagram labels: intranet, ISP, backbone, satellite link. Legend: desktop computer, server, network link.)

An Intranet and a distributed system over it
Running over this Intranet is a distributed file system.
What are the “entities” (nodes) in it? What is the communication medium?
(Diagram labels: email server, print and other servers, Web server, file server, local area network, other servers, router/firewall, the rest of the Internet; police, military and other security and law enforcement agencies.)
The router/firewall prevents unauthorized messages from leaving or entering; it is implemented by filtering incoming and outgoing messages.

Distributed Systems (Examples)
Mobile and ubiquitous computing:
Ubiquitous computing: connecting the many small, cheap computational devices that are present in users’ physical environments, including the home, office, etc.
It is a concept in which computing is made to appear anytime and everywhere; in contrast to desktop computing, it can occur using any device, in any location and in any format.
Examples of devices:
  Laptop computers
  Handheld devices, including mobile phones, smart phones, GPS-enabled devices, pagers, personal digital assistants (PDAs), video/digital cameras and smart watches
  Devices embedded in appliances such as washing machines, hi-fi systems, cars and refrigerators

Distributed Systems (Examples)
Figure 1.4: Portable and handheld devices in a distributed system
A user visiting a host organization: the figure shows the user’s home intranet and the host intranet at the site that the user is visiting. Both intranets are connected to the rest of the Internet.
The user has access to three forms of wireless connection.

Distributed Systems (Examples)
Financial Trading
  The sources are typically in a variety of formats, such as Reuters market data events and FIX events (events following the specific format of the Financial Information eXchange protocol).
  Adapters are used to translate the heterogeneous formats into a common internal format.
Fig. 1.2: An example financial trading system

Distributed Systems are layered over networks
  Application              Application layer protocol        Underlying transport protocol
  e-mail                   smtp [RFC 821]                    TCP
  remote terminal access   telnet [RFC 854]                  TCP
  Web                      http [RFC 2068]                   TCP
  file transfer            ftp [RFC 959]                     TCP
  streaming multimedia     proprietary (e.g. RealNetworks)   TCP or UDP
  remote file server       NFS (network file system)         TCP or UDP
  Internet telephony       proprietary (e.g., Skype)         typically UDP
TCP = Transmission Control Protocol
UDP = User Datagram Protocol
(Diagram labels: Distributed System Protocols, e.g. SMTP, the Simple Mail Transfer Protocol; Networking Protocols.)

The Secret of the World Wide Web: the HTTP Standard
HTTP: hypertext transfer protocol
  WWW’s application layer protocol
  client/server model
    client: browser that requests, receives, and “displays” WWW objects
    server: WWW server, which is storing the website, sends objects in response to requests
  http1.0: RFC 1945
  http1.1: RFC 2068
    Leverages the same connection to download images, scripts, etc.
(Diagram: a PC running Explorer and a Mac running Safari each exchange http requests and http responses with a server running the Apache Web server.)

The HTTP Protocol: More
http: TCP transport service:
  client initiates a TCP connection (creates socket) to server, port 80
  server accepts the TCP connection from client
  http messages (application-layer protocol messages) are exchanged between browser (http client) and WWW server (http server)
  TCP connection closed
http is “stateless”:
  server maintains no information about past client requests
Why? Protocols that maintain session “state” are complex!
  past history (state) must be maintained and updated
  if server/client crashes, their views of “state” may be inconsistent, and hence must be reconciled
  RESTful protocols are stateless

HTTP Example
Suppose user enters URL www.cs.uiuc.edu/ (contains text, references to 10 jpeg images)
1a. http client initiates a TCP connection to http server (process) at www.cs.uiuc.edu. Port 80 is default for http server.
1b. http server at host www.cs.uiuc.edu, waiting for a TCP connection at port 80, “accepts” connection, notifying client.
2. http client sends an http request message (containing URL) into TCP connection socket.
3. http server receives request message, forms a response message containing requested object (index.html), sends message into socket.

HTTP Example (cont.)
4. http server closes the TCP connection (if necessary).
5. http client receives a response message containing html file, displays html. Parsing the html file, it finds 10 referenced jpeg objects.
6. Steps 1-5 are then repeated for each of the 10 jpeg objects.
For fetching referenced objects, there are 2 options:
  non-persistent connection: only one object fetched per TCP connection (some browsers create multiple TCP connections simultaneously, one per object)
  persistent connection: multiple objects transferred within one TCP connection

A human as a browser (Client Side)
1. Telnet to your favorite WWW server:
     telnet www.google.com 80
   This opens a TCP connection to port 80 (default http server port) at www.google.com. Anything typed in is sent to port 80 at www.google.com.
2. Type in a GET http request:
     GET /index.html
   or
     GET /index.html HTTP/1.0
   By typing this in (you may need to hit return twice), you send this minimal (but complete) GET request to the http server.
3. Look at the response message sent by the http server! What do you think the response is?
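The telnet exercise above can be reproduced programmatically. The sketch below sends the same minimal GET request over a raw TCP socket. To stay self-contained it talks to a throwaway local server from Python's standard library rather than www.google.com; the HelloHandler class, the fetch function and the page body are assumptions made for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for a WWW server; answers every GET with one page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(host: str, port: int, path: str = "/index.html") -> bytes:
    """Act as the 'human browser': open TCP, send GET, read until close."""
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)  # the blank line ends the request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection (non-persistent)
                break
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return fetch(host, port)
    finally:
        server.shutdown()
        server.server_close()


response = demo()
print(response.decode("ascii"))
```

The response consists of a status line, header lines, a blank line and then the requested object. Calling fetch("www.google.com", 80) would repeat the telnet session against a real server, assuming network access.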
Distributed computing as a utility: Cloud Computing
The term cloud computing is used to capture the vision of computing as a utility.
A cloud is defined as a set of Internet-based application, storage and computing services sufficient to support most users’ needs, thus enabling them to largely or totally dispense with local data storage and application software.

What is cloud computing, in simple terms?
Cloud computing is the delivery of on-demand computing services, from applications to storage and processing power, typically over the internet and on a pay-as-you-go basis.
Rather than owning their own computing infrastructure or data centers, companies can rent access to anything from applications to storage from a cloud service provider.
One benefit of using cloud computing services is that firms can avoid the upfront cost and complexity of owning and maintaining their own IT infrastructure, and instead simply pay for what they use, when they use it.
In turn, providers of cloud computing services can benefit from significant economies of scale by delivering the same services to a wide range of customers, which includes consumer services like Gmail or the cloud back-up of the photos on your smartphone.

Service Models: Cloud Computing
SaaS (Software as a Service) delivers software as an on-demand application over the Internet. Instead of being installed on the local computer, the software is accessed and used over the Internet, which frees users from managing complex software.
PaaS (Platform as a Service) provides a development environment as a service, which users use to run and deploy applications. Different providers offer their own tools, which other users use to develop their own programs. These programs are provided to users through the Internet.
IaaS (Infrastructure as a Service) provides infrastructure that is used by the client (for example storage, networks and firewalls).
IaaS provides multiple resources such as load balancers, virtual local area networks, IP addresses and software.

Figure 7: Web servers and web browsers
(Diagram: browsers send requests across the Internet to web servers.)
  Browser request: http://www.google.com/search?q=obama, served by web server www.google.com
  Browser request: http://www.cdk5.net/, served by web server www.cdk5.net
  Browser request: http://www.w3.org/standards/faq.html#conformance, served by web server www.w3c.org from its file system (standards/faq.html)
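The figure shows how a URL tells the browser which server to contact and which resource to ask for. As a small sketch, Python's standard urllib.parse module can split the figure's URLs into those parts; the describe function and its dictionary keys are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# URLs taken from the figure.
URLS = [
    "http://www.google.com/search?q=obama",
    "http://www.cdk5.net/",
    "http://www.w3.org/standards/faq.html#conformance",
]


def describe(url: str) -> dict:
    """Split a URL into the parts a browser uses to issue a request."""
    parts = urlsplit(url)
    return {
        "server": parts.hostname,         # which web server to contact
        "path": parts.path,               # which resource to request
        "query": parse_qs(parts.query),   # arguments passed to a program
        "fragment": parts.fragment,       # position inside the document
    }


for url in URLS:
    print(describe(url))
```

The fragment (for example conformance) is used by the browser after the page arrives and is not sent to the server, whereas the path and query are sent as part of the GET request.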