Distributed Systems PDF Lecture Notes
Summary
This document provides lecture notes on distributed systems, covering topics like process and thread abstractions, context switching, and multithreading. The notes include diagrams and practical examples. The lecture notes appear to be aimed at undergraduate computer science students who are learning about distributed systems.
Full Transcript
Distributed Systems (4th edition, version 01)
Chapter 03: Processes

Outline
3.1 Threads
  3.1.1 Introduction to threads
  3.1.2 Threads in distributed systems
3.2 Virtualization
  3.2.1 Principle of virtualization
  3.2.2 Containers
  3.2.3 Comparing virtual machines and containers
3.4 Servers
  3.4.1 General design issues
  3.4.2 Object servers
  3.4.3 Example: The Apache Web server
  3.4.4 Server clusters
3.5 Code migration
  3.5.1 Reasons for migrating code
  3.5.2 Models for code migration
  3.5.3 Migration in heterogeneous systems

Introduction to threads

Basic idea: we build virtual processors in software, on top of physical processors.
- Processor: provides a set of instructions, along with the capability of automatically executing a series of those instructions.
- Thread: a minimal software processor in whose context a series of instructions can be executed. Saving a thread context implies stopping the current execution and saving all the data needed to continue the execution at a later stage.
- Process: a software processor in whose context one or more threads may be executed. Executing a thread means executing a series of instructions in the context of that thread.
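As a minimal sketch of the definitions above (not from the original notes; the names are illustrative): several threads executing in the context of one process all see, and update, the same memory.

```python
from threading import Thread, Lock

def count_with_threads(n):
    # All threads run inside one process, so they share this state.
    state = {'counter': 0}
    lock = Lock()

    def bump():
        with lock:                  # serialize the read-modify-write
            state['counter'] += 1

    threads = [Thread(target=bump) for _ in range(n)]
    for t in threads: t.start()
    for t in threads: t.join()
    return state['counter']

if __name__ == '__main__':
    print(count_with_threads(4))  # 4: every thread saw and updated the same memory
```

The lock is needed because, as the notes discuss later, a thread may be paused and resumed at any point, including in the middle of the read-modify-write on the shared counter.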
Virtual processors: programs should be able to share the CPU without one program halting the progress of the others. Threads allow multiple programs to share a CPU.

A thread is the basic unit of CPU utilization, consisting of a program counter, a stack, a set of registers, and a thread ID.
- Traditional processes have a single thread of control: one program counter, and one sequence of instructions that can be carried out at any given time.
- Multithreaded applications have multiple threads within a single process. Each thread has its own program counter, stack, and set of registers, while all threads share common code, data, and certain structures such as open files.

Process vs. thread abstraction
- Process abstraction: a virtual computer. It makes the program feel like it has the entire machine to itself, as if a fresh computer had been created, with fresh memory, just to run that program.
- Thread abstraction: a virtual processor. It simulates making a fresh processor inside the virtual computer represented by the process. This new virtual processor runs the same program and shares the same memory as the other threads in the process.
Context switching

Contexts:
- Processor context: the minimal collection of values stored in the registers of a processor used for the execution of a series of instructions (e.g., stack pointer, addressing registers, program counter).
- Thread context: the minimal collection of values stored in registers and memory, used for the execution of a series of instructions (i.e., processor context plus state).
- Process context: the minimal collection of values stored in registers and memory, used for the execution of a thread (i.e., thread context, but now also at least the MMU register values).

Observations:
1. Threads share the same address space. Thread context switching can be done entirely independently of the operating system.
2. Process switching is generally (somewhat) more expensive, as it involves getting the OS in the loop, i.e., trapping to the kernel.
3. Creating and destroying threads is much cheaper than doing so for processes.

Why use threads
Some simple reasons:
- Avoid needless blocking: a single-threaded process will block when doing I/O; in a multithreaded process, the operating system can switch the CPU to another thread in that process.
- Exploit parallelism: the threads in a multithreaded process can be scheduled to run in parallel on a multiprocessor or multicore processor.
- Avoid process switching: structure large applications not as a collection of processes, but through multiple threads.
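The "avoid needless blocking" point can be illustrated with a small sketch (hypothetical; sleep() stands in for a blocking I/O call): two waits that would take about 0.4 s back to back complete in roughly 0.2 s when they overlap in two threads.

```python
from threading import Thread
from time import sleep, monotonic

def fake_io(results, i):
    # Simulate a blocking I/O call with sleep(); while this thread
    # blocks, the OS can run the other thread of the same process.
    sleep(0.2)
    results[i] = i * 10

def run_parallel():
    results = [None, None]
    t0 = monotonic()
    threads = [Thread(target=fake_io, args=(results, i)) for i in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    elapsed = monotonic() - t0
    return results, elapsed

if __name__ == '__main__':
    results, elapsed = run_parallel()
    print(results, round(elapsed, 1))  # both waits overlap: roughly 0.2 s, not 0.4 s
```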
Avoid process switching: avoid the expensive context switching associated with interprocess communication (IPC).

Trade-offs:
- Threads use the same address space, which makes them more prone to errors: there is no support from the OS or hardware to protect threads from using each other's memory.
- Thread context switching may be faster than process context switching.

The cost of a context switch
Consider a simple clock-interrupt handler:
- Direct costs: the actual switch and executing the code of the handler.
- Indirect costs: other costs, notably caused by messing up the cache.

What a context switch may cause (indirect costs): (a) before the context switch; (b) after the context switch (block D is evicted); (c) after accessing block D (block C is evicted). The cache is organized such that a least-recently used (LRU) block of data is removed from the cache when room is needed for a fresh data block.

A simple example in Python (mp.py)

from multiprocessing import Process
from time import gmtime, sleep
from random import randint

def sleeper(name):
    t = gmtime()
    s = randint(1, 20)
    txt = str(t.tm_min)+':'+str(t.tm_sec)+' '+name+' is going to sleep for '+str(s)+' seconds'
    print(txt)
    sleep(s)
    t = gmtime()
    txt = str(t.tm_min)+':'+str(t.tm_sec)+' '+name+' has woken up'
    print(txt)

if __name__ == '__main__':
    p = Process(target=sleeper, args=('eve',))
    q = Process(target=sleeper, args=('bob',))
    p.start(); q.start()  # start the new processes
    print('Main: Waiting for processes to terminate...')
    p.join(); q.join()    # wait for the new processes to finish
    print('Main: Continuing on')

Sample output:
40:23 eve is going to sleep for 14 seconds
40:23 bob is going to sleep for 4 seconds
Main: Waiting for processes to terminate...
40:27 bob has woken up
40:37 eve has woken up
Main: Continuing on

start() is the technique used to start child processes in Python; join() tells the main process to wait until the child processes have finished (a blocking operation).

A simple example in Python (mpthread.py)

from multiprocessing import Process
from threading import Thread
from time import gmtime, sleep
from random import randint

shared_x = randint(10, 99)

def sleeping(name):
    global shared_x
    t = gmtime(); s = randint(1, 20)
    txt = str(t.tm_min)+':'+str(t.tm_sec)+' '+name+' is going to sleep for '+str(s)+' seconds'
    print(txt)
    sleep(s)
    t = gmtime(); shared_x = shared_x + 1
    txt = str(t.tm_min)+':'+str(t.tm_sec)+' '+name+' has woken up, seeing shared x being '
    print(txt+str(shared_x))

def sleeper(name):
    sleeplist = list()
    print(name, 'sees shared x being', shared_x)
    for i in range(3):
        subsleeper = Thread(target=sleeping, args=(name+' '+str(i),))
        sleeplist.append(subsleeper)
    for s in sleeplist: s.start()
    for s in sleeplist: s.join()
    print(name, 'sees shared x being', shared_x)

if __name__ == '__main__':
    p = Process(target=sleeper, args=('eve',))
    q = Process(target=sleeper, args=('bob',))
    p.start(); q.start()
    p.join(); q.join()

Sample output:
eve sees shared x being 71
53:21 eve 0 is going to sleep for 20 seconds
bob sees shared x being 84
53:21 eve 1 is going to sleep for 15 seconds
53:21 eve 2 is going to sleep for 3 seconds
53:21 bob 0 is going to sleep for 8 seconds
53:21 bob 1 is going to sleep for 16 seconds
53:21 bob 2 is going to sleep for 8 seconds
53:24 eve 2 has woken up, seeing shared x being 72
53:29 bob 0 has woken up, seeing shared x being 85
53:29 bob 2 has woken up, seeing shared x being 86
53:36 eve 1 has woken up, seeing shared x being 73
53:37 bob 1 has woken up, seeing shared x being 87
bob sees shared x being 87
53:41 eve 0 has woken up, seeing shared x being 74
eve sees shared x being 74

Note in the output that each process has its own copy of shared_x: the threads within one process see each other's increments (eve's threads count 71 to 74, bob's 84 to 87), but the two processes do not share memory.

Concurrency
When there are more threads than processors, concurrency is simulated by time slicing: the processor switches between threads. On most systems, time slicing happens unpredictably and nondeterministically, meaning that a thread may be paused or resumed at any time.
See https://ocw.mit.edu/ans7870/6/6.005/s16/classes/19-concurrency/

Threads and operating systems
Main issue: should an OS kernel provide threads, or should they be implemented as user-level packages?

User-space solution:
- All operations can be completely handled within a single process ⇒ implementations can be extremely efficient.
- All services provided by the kernel are done on behalf of the process in which a thread resides ⇒ if the kernel decides to block a thread, the entire process will be blocked.
- Threads are typically used when there are many external events: threads block on a per-event basis ⇒ if the kernel cannot distinguish threads, how can it support signaling events to them?

Kernel solution:
The whole idea is to have the kernel contain the implementation of a thread package. This means that all operations are implemented as system calls:
- Operations that block a thread are no longer a problem: the kernel schedules another available thread within the same process.
- Handling external events is simple: the kernel (which catches all events) schedules the thread associated with the event.
- The problem is (or used to be) the loss of efficiency, because each thread operation requires a trap to the kernel.

Conclusion: try to mix user-level and kernel-level threads into a single concept; however, the performance gain has not turned out to generally outweigh the increased complexity.

Combining user-level and kernel-level threads
Basic idea: introduce a two-level threading approach, with kernel threads that can execute user-level threads.
User and kernel threads combined: principle of operation
- A user thread does a system call ⇒ the kernel thread that is executing that user thread blocks. The user thread remains bound to the kernel thread.
- The kernel can schedule another kernel thread that has a runnable user thread bound to it. Note: this user thread can switch to any other runnable user thread currently in user space.
- A user thread calls a blocking user-level operation ⇒ do a context switch to a runnable user thread (which is then bound to the same kernel thread).
- When there are no user threads to schedule, a kernel thread may remain idle, and may even be removed (destroyed) by the kernel.

Using threads at the client side
Multithreaded Web client, hiding network latencies:
- The Web browser scans an incoming HTML page and finds that more files need to be fetched.
- Each file is fetched by a separate thread, each doing a (blocking) HTTP request.
- As files come in, the browser displays them.

Multiple request-response calls to other machines (RPC):
- A client does several calls at the same time, each one by a different thread.
- It then waits until all results have been returned.
- Note: if the calls are to different servers, we may have a linear speed-up.
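The multiple-calls pattern above can be sketched as follows (hypothetical; slow_call stands in for a blocking request-response call to one server):

```python
from threading import Thread
from time import sleep

def slow_call(server, out):
    # Stand-in for a blocking request-response call (RPC) to one server.
    sleep(0.1)
    out[server] = 'reply from ' + server

def call_all(servers):
    # One thread per call; the client then waits until all results are in.
    out = {}
    threads = [Thread(target=slow_call, args=(s, out)) for s in servers]
    for t in threads: t.start()
    for t in threads: t.join()   # wait for all replies
    return out

if __name__ == '__main__':
    print(call_all(['s1', 's2', 's3']))
```

Since every call blocks for about the same time and the threads run concurrently, the total waiting time stays close to that of a single call, which is the source of the (near) linear speed-up mentioned above.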
Multithreaded clients: does it help?

Thread-level parallelism (TLP): let c_i denote the fraction of time that exactly i threads are being executed simultaneously. Then

    TLP = (∑_{i=1}^{N} i · c_i) / (1 − c_0)

with N the maximum number of threads that (can) execute at the same time.

Practical measurements: a typical Web browser has a TLP value between 1.5 and 2.5 ⇒ threads are primarily used for logically organizing browsers.

Using threads at the server side
Improve performance:
- Starting a thread is cheaper than starting a new process.
- Having a single-threaded server prohibits simple scale-up to a multiprocessor system.
- As with clients: hide network latency by reacting to the next request while the previous one is being replied to.

Better structure:
- Most servers have high I/O demands. Using simple, well-understood blocking calls simplifies the structure.
- Multithreaded programs tend to be smaller and easier to understand, due to the simplified flow of control.
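Looking back at the TLP formula defined above, here is a quick numeric check (the fractions c_i are invented for illustration):

```python
def tlp(c):
    # c[i] = fraction of time that exactly i threads run simultaneously,
    # for i = 0..N; TLP = sum(i * c[i] for i in 1..N) / (1 - c[0]).
    numerator = sum(i * ci for i, ci in enumerate(c))
    return numerator / (1 - c[0])

# Hypothetical browser profile: idle 20% of the time, one thread 40%,
# two threads 30%, three threads 10%.
c = [0.2, 0.4, 0.3, 0.1]
print(tlp(c))  # (0.4 + 0.6 + 0.3) / 0.8 = 1.625, within the 1.5-2.5 range
```

Dividing by 1 − c_0 means idle time is excluded: TLP measures the average parallelism while at least one thread is actually running.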
Why multithreading is popular: organization
Dispatcher/worker model

Overview:
Model                    | Characteristics
Multithreading           | Parallelism, blocking system calls
Single-threaded process  | No parallelism, blocking system calls
Finite-state machine     | Parallelism, nonblocking system calls
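The dispatcher/worker model in the table above can be sketched as follows (hypothetical; string handling stands in for real request processing):

```python
from queue import Queue
from threading import Thread

def worker(requests, results):
    # Each worker repeatedly takes a request off the shared queue and
    # handles it; blocking here stalls only this worker, not the others.
    while True:
        req = requests.get()        # blocks until a request arrives
        if req is None:             # sentinel: no more work
            break
        results.put(req.upper())    # stand-in for real request handling

def dispatch(incoming, num_workers=3):
    requests, results = Queue(), Queue()
    workers = [Thread(target=worker, args=(requests, results))
               for _ in range(num_workers)]
    for w in workers: w.start()
    for req in incoming:            # dispatcher hands each request to the pool
        requests.put(req)
    for _ in workers:               # one sentinel per worker to shut down
        requests.put(None)
    for w in workers: w.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return sorted(out)

if __name__ == '__main__':
    print(dispatch(['get /a', 'get /b', 'get /c']))
```

This matches the "multithreading" row of the table: the workers use simple blocking calls, yet requests are still handled in parallel.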