dsd-combo-p1.pdf
Distributed System Design / 15-440 Distributed Systems
Recitation Lab: Socket Programming in Python

Review: Communication via Sockets
- Sockets provide a communication mechanism between networked computers.
- A socket is an end-point of communication that is identified by an IP address and port number.
- A client sends requests to a server using a client socket.
- A server receives clients' requests via a listening socket.

Review: Communication via Sockets
[Figure: listening socket and service socket]

Review: Socket Methods
SN  Method and description
1. socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)
   Create a new socket using the given address family, socket type and protocol number. The address family should be AF_INET (the default), AF_INET6, AF_UNIX, AF_CAN, AF_PACKET, or AF_RDS. The socket type should be SOCK_STREAM (the default), SOCK_DGRAM, SOCK_RAW or perhaps one of the other SOCK_ constants. The protocol number is usually zero and may be omitted or in the case where the address family is AF_CAN the protocol should be one of CAN_RAW, CAN_BCM, CAN_ISOTP or CAN_J1939.
2. socket.create_server(address, *, family=AF_INET, backlog=None, reuse_port=False, dualstack_ipv6=False)
   Convenience function which creates a TCP socket bound to address (a 2-tuple (host, port)) and returns the socket object.
3. socket.accept()
   Accept a connection. The socket must be bound to an address and listening for connections. The return value is a pair (conn, address) where conn is a new socket object usable to send and receive data on the connection, and address is the address bound to the socket on the other end of the connection.
4. socket.bind(address)
   Bind the socket to address. The socket must not already be bound.
5. socket.connect(address)
   Connect to a remote socket at address.
6. socket.recv(bufsize[, flags])
   Receive data from the socket. The return value is a bytes object representing the data received. The maximum amount of data to be received at once is specified by bufsize.
   See the Unix manual page recv(2) for the meaning of the optional argument flags; it defaults to zero.
7. socket.sendall(bytes[, flags])
   Send data to the socket. The socket must be connected to a remote socket. The optional flags argument has the same meaning as for recv() above. Unlike send(), this method continues to send data from bytes until either all data has been sent or an error occurs. None is returned on success. On error, an exception is raised, and there is no way to determine how much data, if any, was successfully sent.
8. socket.close()
   Mark the socket closed. The underlying system resource (e.g. a file descriptor) is also closed when all file objects from makefile() are closed. Once that happens, all future operations on the socket object will fail. The remote end will receive no more data (after queued data is flushed).

Multi-Threaded Socket Applications
- Modern distributed systems perform multiple tasks in parallel.
- Servers:
  - Communicate with multiple clients.
  - Store/update records in a database.
  - Perform application logic.
- Clients:
  - Communicate with one or more servers.
  - Display UI to user.
- These systems leverage threaded programming to perform all tasks. Each task is carried out in a separate thread.

Multi-Threaded Socket in Servers
[Figure]

Threads in Python
Threads can be created via the Thread class from the built-in threading library.
1. Create a class MyCustomThread that inherits from the Thread class:
   a) Override the constructor (__init__()) to take the required arguments. A threaded socket operation should supply the client/service socket to the constructor.
   b) Override the run() method to perform the required task.
2. Create a MyCustomThread object and supply the required arguments.
3. Call the start() method on the object.
4. (Optional) Call the join() method on the object to ensure the thread has finished, for example when the thread returns a value.
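The four steps above can be sketched with a task that needs no user input, so the snippet runs unattended. The class name SumThread, the helper parallel_sums and the sample numbers are hypothetical and chosen only for illustration.

```python
from threading import Thread


class SumThread(Thread):  # hypothetical name, for illustration only
    def __init__(self, numbers):
        super().__init__()          # step 1a: extend the constructor
        self.numbers = numbers
        self.result = None          # filled in by run()

    def run(self):                  # step 1b: the task performed by the thread
        self.result = sum(self.numbers)


def parallel_sums(chunks):
    threads = [SumThread(chunk) for chunk in chunks]  # step 2: create objects
    for thread in threads:
        thread.start()                                # step 3: start each thread
    for thread in threads:
        thread.join()                                 # step 4: wait for completion
    return [thread.result for thread in threads]


print(parallel_sums([[1, 2, 3], [10, 20], []]))
```

Reading result only after join() matters here: before join() returns, run() may not have stored the value yet.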
Thread Example

    from threading import Thread

    class MyThread(Thread):
        def __init__(self, message):
            Thread.__init__(self)
            self.message = message  # save the message argument
            self.return_val = None  # variable to store return value

        def run(self):
            user_input = input(self.message)  # get input from the user (blocking)
            self.return_val = user_input  # store the input in return value

        def join(self, *args):
            Thread.join(self, *args)
            return self.return_val  # return the value upon join

    myThread = MyThread('Enter user name.')
    myThread.start()
    print('Thread is now running.')
    username = myThread.join()
    print('Thread is done. Username is:', username)

Hello World Server

    import socket

    HOST = "127.0.0.1"  # localhost
    PORT = 65432  # Port to listen on

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        conn, addr = s.accept()
        with conn:
            print(f"Connected by {addr}")
            while True:  # handle all requests from the client
                data = conn.recv(1024)
                if not data:
                    break
                message = f'I got "{data.decode()}" from you and I am sending it back.'
                conn.sendall(str.encode(message))

Threaded Hello World Server

    import socket
    from threading import Thread

    class ClientThread(Thread):
        # address is the (host, port) pair returned by accept()
        def __init__(self, service_socket: socket.socket, address: tuple):
            Thread.__init__(self)
            self.service_socket = service_socket
            self.address = address

        def run(self):
            print(f"Connected by {self.address}")
            while True:  # handle all requests from the client
                data = self.service_socket.recv(1024)
                if not data:
                    break
                message = f'I got "{data.decode()}" from you and I am sending it back.'
                self.service_socket.sendall(str.encode(message))
                print(f'Sent back "{data.decode()}" to: {self.address}')
            self.service_socket.close()

    HOST = "127.0.0.1"  # localhost
    PORT = 65432  # Port to listen on

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        while True:
            conn, addr = s.accept()
            client_thread = ClientThread(conn, addr)
            client_thread.start()

Hello World Client

    import socket

    HOST = "127.0.0.1"  # The server's hostname or IP address
    PORT = 65432  # The port used by the server

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        print('Connected to:', HOST, PORT)
        message = input('Enter your message to server: ')  # user input to block the client
        s.sendall(message.encode())
        data = s.recv(1024)
        print(f"Server says: {data.decode()}")

Exercises
1. Run the Threaded Hello World example.
2. Create a new threaded server file_server.py that takes a client message containing a file name (either "turtle.jpg" or "eagle.jpg"). It reads the corresponding file and sends it to the client as binary. It should listen on port 6565.
3. Modify the client as follows:
   a) It establishes a connection with the threaded hello world server.
   b) In a new thread, it:
      I. Takes user input: either "turtle.jpg" or "eagle.jpg".
      II. Opens a connection with file_server.py.
      III. Gets the corresponding file as binary.
   c) It sends a message to the threaded hello world server: "The size of <file name> is <size>", where the former is the name of the file and the latter is its size (length of the bytearray).
   You should get 55696 bytes for eagle.jpg and 583163 bytes for turtle.jpg.

Distributed System Design / 15-440 Distributed Systems
Recitation Lab: Socket Programming in Python

Communication via Sockets
- Sockets provide a communication mechanism between networked computers.
- A socket is an end-point of communication that is identified by an IP address and port number.
- A client sends requests to a server using a client socket.
- A server receives clients' requests via a listening socket.

Communication via Sockets
[Figure: Person A (at A's home) and Person B (a guest). Person A is listening, Person B knocks on the door, Person A opens the door, Person B enters.]

Communication via Sockets
[Figure: the same scene with socket terms.]
- A "binds" to his home.
- Person A is listening.
- B sends a request to communicate (Person B knocks on the door).
- A accepts the request (Person A opens the door).
- B is now "connected" with A (Person B enters).

Communication via Sockets
[Figure: Server A and Client B.]
- Server A binds to: (1) an IP address, (2) a port number.
- Server A is listening to requests.
- Client B sends a request to communicate with the server.
- Server A accepts the request.
- Client B is now connected with Server A.

Communication via Sockets
[Figure: listening socket and service socket]

Socket Communication Recipe
1. Server instantiates a socket object. This socket is referred to as the listening socket.
2. Server binds its socket to a specific IP address and port.
3. Server invokes the accept() method, which awaits incoming client connections.
4. Client instantiates a socket object. This socket is referred to as a client socket.
5. Client connects to the server with its IP address and port.
6. On the server side, accept() returns a new socket, referred to as a service socket, which the server uses to read from and write to the client.

Socket Methods
SN  Method and description
1. socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None)
   Create a new socket using the given address family, socket type and protocol number. The address family should be AF_INET (the default), AF_INET6, AF_UNIX, AF_CAN, AF_PACKET, or AF_RDS. The socket type should be SOCK_STREAM (the default), SOCK_DGRAM, SOCK_RAW or perhaps one of the other SOCK_ constants. The protocol number is usually zero and may be omitted or in the case where the address family is AF_CAN the protocol should be one of CAN_RAW, CAN_BCM, CAN_ISOTP or CAN_J1939.
2. socket.create_server(address, *, family=AF_INET, backlog=None, reuse_port=False, dualstack_ipv6=False)
   Convenience function which creates a TCP socket bound to address (a 2-tuple (host, port)) and returns the socket object.
3. socket.accept()
   Accept a connection. The socket must be bound to an address and listening for connections. The return value is a pair (conn, address) where conn is a new socket object usable to send and receive data on the connection, and address is the address bound to the socket on the other end of the connection.
4. socket.bind(address)
   Bind the socket to address. The socket must not already be bound.
5. socket.connect(address)
   Connect to a remote socket at address.
6. socket.recv(bufsize[, flags])
   Receive data from the socket. The return value is a bytes object representing the data received. The maximum amount of data to be received at once is specified by bufsize. See the Unix manual page recv(2) for the meaning of the optional argument flags; it defaults to zero.
7. socket.sendall(bytes[, flags])
   Send data to the socket. The socket must be connected to a remote socket. The optional flags argument has the same meaning as for recv() above. Unlike send(), this method continues to send data from bytes until either all data has been sent or an error occurs. None is returned on success. On error, an exception is raised, and there is no way to determine how much data, if any, was successfully sent.
8. socket.close()
   Mark the socket closed. The underlying system resource (e.g. a file descriptor) is also closed when all file objects from makefile() are closed. Once that happens, all future operations on the socket object will fail. The remote end will receive no more data (after queued data is flushed).
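As a rough illustration of how the methods in the table fit together, the sketch below runs a one-shot echo server and a client in the same process over the loopback interface. The helper names serve_once and echo_roundtrip are hypothetical, port 0 (let the OS pick a free port) is an assumption made to avoid port clashes, and shutdown() is not in the table above; it is used only so the server sees the end of the client's data.

```python
import socket
from threading import Thread


def serve_once(listening_socket):
    """Accept one client and echo back everything it sends."""
    conn, _addr = listening_socket.accept()   # conn is the service socket
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                      # empty bytes: client finished sending
                break
            conn.sendall(data)


def echo_roundtrip(payload):
    # create_server() creates the socket, binds it and starts listening.
    with socket.create_server(("127.0.0.1", 0)) as listening_socket:
        port = listening_socket.getsockname()[1]
        server = Thread(target=serve_once, args=(listening_socket,))
        server.start()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)   # tell the server no more data follows
            chunks = []
            while True:                       # recv() may return partial data
                chunk = client.recv(1024)
                if not chunk:
                    break
                chunks.append(chunk)
        server.join()
    return b"".join(chunks)


print(echo_roundtrip(b"Hello world"))
```

The client sends everything before it starts reading, which is fine for short messages; a very large payload would need the client to read while it writes.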
Transport Protocols
- Socket: endpoint to read and write data.
- Each socket has a network protocol.
- Two types of protocols are used for communicating data/packets over the internet:
  - TCP: Transmission Control Protocol. Connection-oriented (handshake). socket.socket(type=SOCK_STREAM)
  - UDP: User Datagram Protocol. "Connectionless". socket.socket(type=SOCK_DGRAM)

Hello World Server

    import socket

    HOST = "127.0.0.1"  # localhost
    PORT = 65432  # Port to listen on

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((HOST, PORT))
        s.listen()
        conn, addr = s.accept()
        with conn:
            print(f"Connected by {addr}")
            while True:  # handle all requests from the client
                data = conn.recv(1024)
                if not data:
                    break
                message = f'I got "{data.decode()}" from you and I am sending it back.'
                conn.sendall(str.encode(message))

Hello World Client

    import socket

    HOST = "127.0.0.1"  # The server's hostname or IP address
    PORT = 65432  # The port used by the server

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((HOST, PORT))
        s.sendall(b"Hello world")
        data = s.recv(1024)
        print(f"Server says: {data.decode()}")

Exercises
1. Run the Hello World example.
2. Modify the client to send one more message and receive the server's response.
3. Modify the server to keep running after handling a client request.
4. Modify the client to send multiple numbers as a string (separated by spaces); the server calculates and returns their sum.
5. socket.sendall() ensures all the data is sent, but socket.recv(1024) receives a maximum of 1024 bytes. Modify the client and server to handle messages of variable sizes. You might need to add an end-of-file token (e.g. "") to your messages to know when a message ends.

Distributed Systems Design
COMP 6231
Introduction, Lecture 1
Essam Mansour

Course Staff
- Instructor: Dr. Essam Mansour
  - Email: [email protected]
  - Instructor's website: emansour.com
  - Office hours: by appointment. You should arrange for a meeting via email.
- Teaching Assistants:
  - Omij Mangukiya [email protected]
  - Waleed Afandi [email protected]

Teaching Style
- I like interaction in class
- I like to ask questions
- I like to be asked questions
- I like to know (and memorize) your names :)
- I like to give practical assignments and projects
- I like to learn ...

Course Textbook
- Textbook: Distributed Systems, 3rd ed., by Tanenbaum and Van Steen, Prentice Hall 2018
- All lectures will be prepared from this book.
- The book is available on Moodle, plus some other material we may need.

Course Outline
- Introduction (today, Part 1)
  - What, why, why not?
  - Basics
- Distributed Architectures
- Interprocess Communication
  - RPCs, RMI, message- and stream-oriented communication
- Processes and their scheduling
  - Thread/process scheduling, code/process migration, virtualization
- Naming and location management
  - Entities, addresses, access points
- Canonical problems and solutions
  - Mutual exclusion, leader election, clock synchronization, ...
- Resource sharing, replication and consistency
  - DFS, consistency issues, caching and replication
- Fault-tolerance
- Security in distributed systems
- Distributed middleware
- Advanced topics: web, cloud computing, green computing, big data, multimedia, and mobile systems

Course Grading
Type         #   Weight
Project      1   30%
Exams        2   45%
Assignments  5   25%
Table 1: Breakdown of the main activities involved in the course.

Course Grading
Project: 30% of your final score.
- Each team consists of 3 to 4 students.
- Each team will learn one of the following systems:
  - Pregel, GraphLab, PowerGraph, Cassandra, Couchbase, MongoDB, or Elasticsearch. You can propose a parallel system to develop.
- Discuss the concepts of distributed systems in the chosen system.
- Use a real dataset of at least one GB, the larger the better, and
- Demo the capabilities of the chosen system.
- The deliverables of the project are:
  Demo          15% of the final grade
  Presentation  10% of the final grade
  Report         5% of the final grade

Course Grading
Assignments:
- 25% towards your final score
- 4 programming assignments

ID  Topic                  Weight  Release Date  Due Date
1   Socket                 5%      Fri Sept 22   Sun Oct 08
2   Multithreading & MPI   9%      Fri Oct 06    Sun Oct 29
3   Docker                 5%      Fri Oct 27    Sun Nov 12
4   SPARK MapReduce        6%      Fri Nov 10    Sun Nov 26

... and now

Data Becoming Critical to Our Lives
Domains of data: Health, Science, Education, Work, Environment, Finance, ... and more

We Live in a World of Data...
[Figure]

What Do We Do With Data?
Store, Share, Access, Process, Encrypt, ... and more!
We want to do these seamlessly...

How to Store and Process Data at Scale?
A system can be scaled:
- Either vertically (or up) ✓
  - Can be achieved by hardware upgrades (e.g., faster CPU, more memory, and/or larger disk)
- And/or horizontally (or out)
  - Can be achieved by adding more machines

Vertical Scaling
Caveat: Individual computers can still suffer from limited resources with respect to the scale of today's problems.
1. Caches and Memory:
   L1 Cache      16-32 KB/core      4-5 cycles
   L2 Cache      128-256 KB/core    12-15 cycles
   L3 Cache      512 KB-2 MB/core   30-50 cycles
   Main Memory   8 GB-128 GB        300+ cycles
2. Processors:
   - Moore's law still holds.
   - Chip Multiprocessors (CMPs) are now available.
     [Figure: a CMP, a single processor chip with several processors (P), each with an L1 cache, connected through an interconnect to the L2 cache]
   - But up until a few years ago, CPU speed grew at the rate of 55% annually, while the memory speed grew at the rate of only 7%.
     [Figure: processor-memory speed gap between P and M]
   - Even if 100s or 1000s of cores are placed on a CMP, it is a challenge to deliver input data to these cores fast enough for processing.

Vertical Scaling
[Figure: a CMP with four processors, L1 caches, an interconnect, an L2 cache and memory, loading a data set of 4 TB over 4 I/O channels of 100 MB/s each]
- 10000 seconds (or 3 hours) to load the data
- Suffers from limited scalability

How to Store and Process Data at Scale?
A system can be scaled:
- Either vertically (or up)
  - Can be achieved by hardware upgrades (e.g., faster CPU, more memory, and/or larger disk)
- And/or horizontally (or out) ✓
  - Can be achieved by adding more machines

Horizontal Scaling
[Figure: a data set of 4 TB split across 100 machines, each with its own processor, L1 and L2 caches and memory]
- Only 3 minutes to load the data

Requirements
But, this necessitates:
- A way to express the problem in terms of parallel processes and execute them on different machines (Programming and Concurrency Models)
- A way to organize processes (Architectures)
- A way for distributed processes to exchange information (Communication Paradigms)
- A way to locate and share resources (Naming Protocols)
- A way for distributed processes to cooperate, synchronize with one another, and agree on shared values (Synchronization)
- A way to reduce latency, enhance reliability, and improve performance (Caching, Replication, and Consistency)
- A way to enhance load scalability, reduce diversity across heterogeneous systems, and provide a high degree of portability and flexibility (Virtualization)
- A way to recover from partial failures (Fault Tolerance)

Degree of Parallelism
[Figure: task parallelism vs. data parallelism]

So, What is a Distributed System?
A distributed system is:
- One in which components located at networked computers communicate and coordinate their actions only by passing messages
- A collection of independent computers that appear to its users as a single coherent system

Features
Distributed systems imply four main features:
1. Geographical Separation
2. No Common Physical Clock
3. No Common Physical Memory
4. Autonomy and Heterogeneity

Parallel vs. Distributed Systems
Distributed systems contrast with parallel systems, which entail:
1. Strong Coupling
2. A Common Physical Clock
3. A Shared Physical Memory
4. Homogeneity

Another Definition of a Distributed System
A distributed system:
- Multiple connected CPUs working together
- A collection of independent computers that appears to its users as a single coherent system
Examples: parallel machines, networked machines

Distributed System Models
- Cluster computing systems / Data centers
  - LAN with a cluster of servers + storage
  - Linux, Mosix, ...
  - Used by distributed web servers, scientific applications, enterprise applications
- Grid computing systems
  - Cluster of machines connected over a WAN
  - SETI@home
- WAN-based clusters / distributed data centers
  - Google, Amazon, Facebook, ...
- Virtualization and data center
- Cloud Computing

Emerging Models
Distributed Pervasive Systems
- "Smaller" nodes with networking capabilities
- Computing is "everywhere"
- Mobile devices, computers, consumer electronics, personal monitors and sensors, ... and even appliances
We also want to access, share and process our data from all of our devices, anytime, anywhere!

Middleware-based Systems
General structure of a distributed system as middleware.
Transparency in a Distributed System
Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource may be replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk

Scalable Distributed System
Scalability Dimensions
- Size scalability:
  - A system can be scalable with respect to its size.
  - We can easily add more users and resources to the system without any noticeable loss of performance.
- Geographical scalability:
  - The users and resources may lie far apart.
  - The fact that communication delays may be significant is hardly noticed.
- Administrative scalability:
  - The system can still be easily managed even if it spans many independent administrative organizations.

Scaling Techniques
Principles for good decentralized algorithms
- No machine has complete state
- Make decisions based on local information
- A single failure does not bring down the system
- No global clock
Techniques
- Asynchronous communication
- Distribution
- Caching and replication

Comparison between Systems
Item                      Distributed OS    Distributed OS       Network OS   Middleware-based OS
                          (Multiproc.)      (Multicomp.)
Degree of transparency    Very High         High                 Low          High
Same OS on all nodes      Yes               Yes                  No           No
Number of copies of OS    1                 N                    N            N
Basis for communication   Shared memory     Messages             Files        Model specific
Resource management       Global, central   Global, distributed  Per node     Per node
Scalability               No                Moderately           Yes          Varies
Openness                  Depends on OS     Depends on OS        Open         Open

A To-Do List
- Read Chapter 1, Introduction
- Attend the lab
- Start working on your first assignment
- Next lecture: Chapter 2, Architectures

Questions?
Distributed Systems Design
COMP 6231
Architectures, Lecture 2
Essam Mansour

Today...
- Last session:
  - Introduction
- This session: Architectures for distributed systems (Chapter 2)
  - Architectural styles
  - Client-server architectures
  - Decentralized and peer-to-peer architectures

Bird's Eye View of Some Distributed Systems
[Figure: three example systems.
 Google Search: Clients 1 to 3 send search requests to a Google server and receive responses.
 Airline booking: Clients 1 to 3 send reservation requests to Expedia.
 BitTorrent and Skype: Peers 1 to 4 connected to one another (slide annotation: "Is it still the case?").]
How would one characterize these distributed systems?

Simple Characterization of Distributed Systems
a) What are the entities that are communicating in a DS?
   Communicating entities (system-oriented vs. problem-oriented entities)
b) How do the entities communicate?
   Communication paradigms (sockets and RPC)
c) What roles and responsibilities do the entities have?
   This could lead to different organizations (henceforth referred to as architectures)

Architectural Styles
Important styles of architecture for distributed systems:
- Layered architectures
- Object-based architectures
- Data-centered architectures
- Event-based architectures
- Resource-based architectures

Layered Design
- Each layer uses the previous layer to implement new functionality that is exported to the layer above
- Example: multi-tier web apps

Layering
- A complex system is partitioned into layers
- An upper layer utilizes the services of the lower layer
- A vertical organization of services
- For example, a three-layer solution could easily be deployed on a single tier, such as a personal workstation.
- Layering simplifies the design of complex distributed systems by hiding the complexity of the layers below
[Figure: Layers 1 to 3; control flows from layer to layer, with a request flow and a response flow]

Layering: Platform and middleware
Distributed systems can be organized into three layers:
1. Platform
   - Low-level hardware and software layers
   - Provides common services for higher layers
2. Middleware
   - Masks heterogeneity and provides convenient programming models to application programmers
   - Typically, it simplifies application programming by abstracting communication mechanisms
3. Applications
[Figure: layer stack, top to bottom: applications, middleware, operating system, computer and network hardware; the last two form the platform]

Object-based Style
- Each object corresponds to a component
- Components interact via remote procedure calls
- Popular in client-server systems

Resource-oriented Architecture
Example of ROA: Representational State Transfer (REST)
- Basis for RESTful web services
- Resources are identified through a single naming scheme: Uniform Resource Identifier (URI)
- All services offer the same interface (e.g., 4 HTTP operations): Get / Put / Delete / Post
- Messages are fully described
- No state of the caller is kept (stateless execution)
- Example: use HTTP for the API
  http://bucketname.s3.aws.com/objName
- Return JSON objects
  {"name":"test.com","messages":["msg 1","msg 2","msg 3"],"age":100}
Discuss: Service-oriented (SOA) vs. Resource-oriented (ROA)

Event-based architecture
- Communicate via a common repository
- Use a publish-subscribe paradigm
- Consumers subscribe to types of events
- Events are delivered once published by any publisher

Shared data-space
- "Bulletin-board" architecture
- Decoupled in space and time
- Post items to the shared space; consumers pick them up at a later time

Client-Server Architectures
- Most common style: client-server architecture
- Application layering:
  - User-interface level
  - Processing level
  - Data level

Search Engine Example
Search engine architecture with 3 layers
[Figure]

A Spectrum of Choices
[Figure]

Tiering
Tiering is a technique to:
1. Organize the functionality of a service, and
2. Place the functionality into appropriate servers.
A tier is a physical structuring mechanism for the system infrastructure.
[Figure: airline search application with four functions: display UI screen, get user input, get data from database, rank the offers]

A Two-Tiered Architecture
How would you design an airline search application?
[Figure: the Expedia airline search application split across Tier 1 and Tier 2; components shown: display user input screen, get user input, rank the offers, display result to user, airline database]

A Three-Tiered Architecture
How would you design an airline search application?
[Figure: the Expedia airline search application split across Tier 1, Tier 2 and Tier 3; components shown: display user input screen, get user input, rank the offers, display result to user, airline database]

A Three-Tiered Architecture
Tier 1: Presentation logic (user view and controls)
Tier 2: Application logic (application-specific processing)
Tier 3: Data logic (database manager)

Three-tier Web Applications
- The server itself uses a "client-server" architecture
- 3 tiers: HTTP, J2EE and database
- Very common in most web-based applications

Three-Tiered Architecture: Pros and Cons
Advantages:
- Enhanced maintainability of the software (one-to-one mapping from logical elements to physical servers)
- Each tier has a well-defined role
Disadvantages:
- Added complexity due to managing multiple servers
- Added network traffic
- Added latency

Edge-Server Systems
- Edge servers: from client-server to client-proxy-server
- Content distribution networks: proxies cache web content near the edge

Edge-Server Systems
https://en.wikipedia.org/wiki/Edge_computing

Decentralized Peer-to-Peer Architecture
A peer-to-peer (P2P) architecture can be characterized as follows:
1) All nodes are equal (no hierarchy)
   - No Single-Point-of-Failure (SPOF)
2) A central coordinator is not needed
   - But decision making becomes harder
3) The underlying system can scale out indefinitely
   - In principle, no performance bottleneck

Decentralized Peer-to-Peer Architecture
Peer-to-peer systems
- Remove the distinction between a client and a server
- Overlay network of nodes
- Chord: structured peer-to-peer system
  - Uses a distributed hash table to locate objects
  - Data item with key k -> smallest node with id >= k

Client-Server vs Peer-to-Peer
[Figure]

Peer-to-Peer Architecture
A peer-to-peer (P2P) architecture can be characterized as follows:
4) Peers can interact directly, forming groups and sharing contents (or offering services to each other)
   - At least one peer should share the data, and this peer should be accessible
   - Popular data will be highly available (it will be shared by many)
   - Unpopular data might eventually disappear and become unavailable (as more users/peers stop sharing them)
5) Peers can form a virtual overlay network on top of a physical network topology
   - Logical paths do not usually match physical paths (i.e., higher latency)
   - Each peer plays a role in routing traffic through the overlay network

Peer-to-Peer Architecture
Collaborative Distributed Systems
- BitTorrent: collaborative P2P downloads
- Download chunks of a file from multiple peers
  - Reassemble the file after downloading
- Use a global directory (web site) and download a .torrent
  - The .torrent contains info about the file
- Tracker: server that maintains active nodes that have requested chunks
- Force altruism:
  - If P sees Q downloads more than uploads, reduce the rate of sending to Q

P2P Types
Types of P2P architecture:
- Unstructured
- Structured
- Hybrid

P2P Types
[Figure]

P2P Types
Unstructured P2P:
- The architecture does not impose any particular structure on the overlay network
  - Each node picks a random set of nodes and becomes their neighbor
  - Choice of degree impacts network dynamics
- Advantages:
  - Easy to build
  - Highly robust against high rates of churn (i.e., when a great deal of peers frequently join and leave the network)
- Main disadvantage:
  - Peers and contents are loosely coupled, creating a data location problem
  - Searching for data might require broadcasting

P2P Types
Structured P2P:
- The architecture imposes some structure on the overlay network topology
- Main advantage:
  - Peers and contents are tightly coupled (e.g., through hashing), simplifying data location
- Disadvantages:
  - Harder to build
  - For optimized data location, peers must maintain extra metadata (e.g., lists of neighbors that satisfy specific criteria)
  - Less robust against high rates of churn
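The hash-based placement that structured P2P systems rely on can be sketched with the Chord rule quoted earlier: a data item with key k is stored on the smallest node with id >= k. The function name successor, the node ids and the 5-bit identifier space are hypothetical; wrapping around to the smallest node when no id is large enough is how the identifier ring is usually described and is an assumption here. This sketch looks at the full node list in one place, whereas a real Chord node only knows a few neighbors and routes the lookup through the overlay.

```python
from bisect import bisect_left


def successor(node_ids, key, id_space):
    """Return the node responsible for key: the smallest node id >= key.

    If no node id is >= key, wrap around to the smallest node id
    (assumed ring behaviour).
    """
    ring = sorted(node_ids)
    index = bisect_left(ring, key % id_space)  # first position with id >= key
    return ring[index % len(ring)]             # index == len(ring) wraps to 0


# Hypothetical ring with 5-bit identifiers (0..31).
nodes = [1, 4, 9, 11, 14, 18, 20, 21, 28]
print(successor(nodes, 12, 32))  # 14
print(successor(nodes, 4, 32))   # 4
print(successor(nodes, 29, 32))  # 1 (wraps around)
```

Because every peer applies the same rule, any peer can compute where a key belongs without broadcasting, which is the data-location advantage listed above.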
P2P Types
Hybrid P2P:
- The architecture can use some central servers to help peers locate each other
- A combination of P2P and master-slave models
- It offers a trade-off between the centralized functionality provided by the master-slave model and the node equality afforded by the pure P2P model
- In other words, it combines the advantages of the master-slave and P2P models and precludes their disadvantages

Self-Managing Systems
- The system is adaptive
  - Monitors itself and takes action autonomously when needed
  - Autonomic computing, self-managing systems
- Self-*: self-managing, self-healing
- Example: automatic capacity provisioning
  - Vary the capacity of a web server based on demand
[Figure: monitor workload -> compute current/future demand -> adjust allocation]

Feedback Control Model
Use feedback and control theory to design a self-managing system
[Figure]

A To-Do List
- Read Chapter 2, Architectures
- Project teams

Distributed Systems Design
COMP 6231
Processes, Lecture 3 (Part 1)
Essam Mansour

Today...
- Last session:
  - Architectures for distributed systems
- Today's session:
  - Processes
  - Threads
  - Virtualization

Communicating Entities in Distributed Systems
Communicating entities in distributed systems can be classified into two types:
- System-oriented entities
  - Processes
  - Threads
  - Nodes
- Problem-oriented entities
  - Objects (in object-oriented programming based approaches)

Classification of Communication Paradigms
Communication paradigms can be categorized into three types based on where the entities reside. If entities are running on:
1. Same address space: global variables, procedure calls, ...
2. Same computer but different address spaces: files, signals, shared memory, ...
3. Networked computers: socket communication, remote invocation

Processes vs Threads
A program to be executed needs more than just binary code.
It needs:
§ Memory and operating system resources
§ Registers: a register may hold an instruction or a storage address
§ Program counter: keeps track of where the computer is in its program sequence
§ Stack: a data structure that stores information about the active subroutines of a computer program

Processes vs Threads
A computer process
May we have multiple processes of the same program?
Yes, but each process has a separate memory address space, i.e., a process runs independently and is isolated from other processes.

Processes vs Threads
A thread can be defined as the unit of execution within a process.
Threads can be seen as lightweight processes: each thread has its own stack but can access shared data.
§ What does a multithreaded process share?
§ Is it easier to enable communication between threads or between processes?

Processes vs Threads
Figure 3-1. Context switching as the result of interprocess communication

Processes vs Threads
Why use threads?
§ Avoid needless blocking: a single-threaded process will block when doing I/O; in a multithreaded process, the operating system can switch the CPU to another thread in that process.
§ Exploit parallelism: the threads in a multithreaded process can be scheduled to run in parallel on a multiprocessor or multicore processor.
§ Avoid process switching: structure large applications not as a collection of processes, but through multiple threads.

Processes vs Threads
§ Which are more vulnerable to problems: threads or processes?

Processes vs Threads
Figure 3-3.
A multithreaded server organized in a dispatcher/worker model.

Processes vs Threads
Design choice: process over thread, or thread over process?
Use cases:
- The Chrome browser (multiple processes)
- A spreadsheet application (multiple threads)
- A web client (multiple threads)

Virtualization
Why do we need it?
(a) General organization between a program, interface, and system. (b) General organization of virtualizing system A on top of system B.

Virtualization
Virtualization is important:
§ Hardware changes faster than software
§ Ease of portability and code migration
§ Isolation of failing or attacked components

Virtualization
Mimicking interfaces
Four types of interfaces at three different levels:
§ Instruction set architecture: the set of machine instructions, with two subsets:
  § Privileged instructions: allowed to be executed only by the operating system
  § General instructions: can be executed by any program
§ System calls as offered by an operating system
§ Library calls, known as an application programming interface (API)

Ways of virtualization
(a) Separate set of instructions, an interpreter/emulator, running atop an OS.
(b) Low-level instructions, along with a bare-bones minimal operating system.
(c) Low-level instructions, but delegating most work to a full-fledged OS.

Zooming into VMs: performance
§ Control-sensitive instruction: may affect the configuration of a machine (e.g., one affecting the relocation register or interrupt table).
§ Behavior-sensitive instruction: effect is partially determined by context (e.g., POPF sets an interrupt-enabled flag, but only in system mode).

VMs and cloud computing

Containers
§ Emulate OS-level interface with native interface
§ “Lightweight” virtual machines
§ No hypervisor; the OS provides the necessary support
§ Referred to as containers
§ Solaris containers, BSD jails, Linux containers

Docker and Linux Containers
§ Linux containers are a set of kernel features
  § Need user-space tools to manage containers
  § Virtuozzo, OpenVZ, VServer, lxc-tools, Docker
§ What does Docker add to Linux containers?
  § Portable container deployment across machines
  § Application-centric: geared for app deployment
  § Automatic builds: create containers from build files
  § Component re-use
§ Docker containers are self-contained: no external dependencies

VMs vs Containers
§ Containers share the OS kernel of the host
§ The OS provides resource isolation
§ Benefits: fast provisioning, bare-metal-like performance, lightweight

VMs vs Containers
VMs | Containers
Heavyweight | Lightweight
Limited performance | Native performance
Each VM runs in its own OS | All containers share the host OS
Hardware-level virtualization | OS virtualization
Startup time in minutes | Startup time in milliseconds
Allocates required memory | Requires less memory space
Fully isolated and hence more secure | Process-level isolation, possibly less secure

A To-Do List
§ Read Chapter 3, Sections 1, 2 and part of 3
§ Building your team
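Looking back at the dispatcher/worker model of Figure 3-3, the following is a minimal sketch of how such a server might be organized with threads. It assumes a fixed pool of three workers, an in-memory queue standing in for incoming network requests, and a hypothetical `handle` function in place of real blocking I/O; none of these names come from the slides.

```python
import queue
import threading

NUM_WORKERS = 3   # assumed pool size
STOP = object()   # sentinel telling a worker thread to exit


def handle(request: str) -> str:
    """Hypothetical request handler; stands in for blocking work such as a disk read."""
    return request.upper()


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Worker thread: take requests from the shared queue until told to stop."""
    while True:
        request = tasks.get()
        if request is STOP:
            break
        response = handle(request)
        with lock:  # results lives in memory shared by all threads of the process
            results[request] = response


def dispatch(requests):
    """Dispatcher: start the workers, hand each request to the pool, wait for completion."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(NUM_WORKERS)
    ]
    for t in workers:
        t.start()
    for request in requests:
        tasks.put(request)
    for _ in workers:
        tasks.put(STOP)  # one sentinel per worker
    for t in workers:
        t.join()
    return results


responses = dispatch(["get /a", "get /b", "get /c", "get /d"])
print(responses)
```

Because the workers are threads of one process, they can write to the same `results` dictionary directly, which is the cheap communication the slides attribute to threads. The lock is the price of that sharing: a bug in one thread can corrupt state seen by all the others, whereas separate processes would be isolated from each other.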