Full Transcript

2. Linux O(1) Scheduler

Problem
For the next four questions, consider a Linux system with the following assumptions:
○ uses the O(1) scheduling algorithm for time-sharing threads
○ must assign a time quantum for thread T1 with priority 110
○ must assign a time quantum for thread T2 with priority 135

Provide answers to the following:
1. Which thread has a "higher" priority (will be serviced first)?
2. Which thread is assigned a longer time quantum?
3. Assume T2 has used its time quantum without blocking. What will happen to the value that represents its priority level when T2 gets scheduled again (lower/decrease, higher/increase, same)?
4. Assume now that T2 blocks for I/O before its time quantum expires. What will happen to the value that represents its priority level when T2 gets scheduled again (lower/decrease, higher/increase, same)?

Solution
1. T1. In the O(1) scheduler, a greater numerical value means a lower priority, so T1 (110) has the higher priority.
2. T1. In the O(1) scheduler, lower-priority tasks get smaller time quanta, so T1, the higher-priority thread, gets the longer time quantum.
3. Increase. Since T2 used up its entire quantum without blocking, it is CPU bound, so its priority level should be lowered, which means the numerical value representing its priority is increased.
4. Decrease. Since T2 blocked, it is more I/O intensive and gets a boost in priority, which means the numerical value representing its priority is decreased.

3. Hardware Counters

Problem
Consider a quad-core machine with a single memory module connected to the CPUs via a shared "bus". On this machine, a CPU instruction takes 1 cycle, and a memory instruction takes 4 cycles. The machine has two hardware counters:
○ a counter that measures IPC
○ a counter that measures CPI

Answer the following:
1. What does IPC stand for in this context?
2. What does CPI stand for in this context?
3. What is the highest IPC on this machine?
4. What is the highest CPI on this machine?

Solution
1. Instructions per cycle.
2. Cycles per instruction.
3. 4. Best-case scenario: if all four cores execute CPU-bound instructions, 4 instructions can be completed in a single cycle.
4. 16. Worst-case scenario: if all four cores issue a memory instruction at the same time, the instructions contend on the shared bus and the shared memory controller, so one of them can take up to 4 * 4 = 16 cycles to complete.

4. Synchronization

Problem
In a multiprocessor system, a thread is trying to acquire a locked mutex.
1. Should the thread spin until the mutex is released or block?
2. Why might it be better to spin in some instances?
3. What if this were a uniprocessor system?

Solution
1. If the owner of the mutex is running on another CPU, spin; otherwise, block.
2. The owner may release the mutex quickly; it may be faster to wait for the owner to finish than to pay the overhead of context switching twice.
3. On a uniprocessor, always block: there is no way for the owner to be running at the same time, since there is only one CPU.

5. Spinlocks

Problem
For the following question, consider a multiprocessor with write-invalidate cache coherence. Determine whether the use of a dynamic (exponential backoff) delay has the same, better, or worse performance than a test-and-test-and-set ("spin on read") lock. Then, explain why. Make a performance comparison using each of the following metrics:
1. Latency
2. Delay
3. Contention

Solution
1. Same. If you look at the pseudocode for the two algorithms, when the lock is free exactly the same operations are executed in both cases.
2. Worse. Because of the backoff delay, the lock may be released while the thread is still delaying, so the overall time it takes to acquire a freed lock is potentially longer.
3. Better. In the delayed algorithm, some of the invalidations are not observed because of the wait, so they do not trigger additional memory accesses; therefore, overall contention on the memory bus/interconnect is reduced.

(Illustrative code sketches for Problems 2-5 follow.)
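For Problem 2, a minimal sketch of how a priority value could map to a timeslice. The constants here (5 ms per priority step, the 140 ceiling of the kernel's 0-139 priority scale) are illustrative assumptions, not the O(1) scheduler's actual formula; the point is only that a larger numerical priority value yields a shorter quantum.

```c
#include <stdio.h>

/* Hypothetical mapping, not the kernel's real formula: it only
   illustrates that a larger priority value (= lower priority)
   gets a shorter timeslice. */
static unsigned timeslice_ms(unsigned prio) {
    return (140 - prio) * 5;   /* assumed 5 ms per priority step */
}

int main(void) {
    printf("T1 (prio 110): %3u ms\n", timeslice_ms(110)); /* 150 ms */
    printf("T2 (prio 135): %3u ms\n", timeslice_ms(135)); /*  25 ms */
    return 0;
}
```

Running it prints a longer quantum for T1 than for T2, matching answers 1 and 2.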
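For Problem 3, the IPC/CPI bounds can be checked with simple arithmetic:

```c
#include <stdio.h>

int main(void) {
    const int cores = 4;
    const int cpu_cycles = 1;  /* a CPU instruction takes 1 cycle */
    const int mem_cycles = 4;  /* a memory instruction takes 4 cycles */

    /* Best case: each core retires one 1-cycle CPU instruction per cycle. */
    int max_ipc = cores / cpu_cycles;        /* = 4 */

    /* Worst case: all cores issue memory instructions at once and
       serialize on the shared bus, so the last one finishes only
       after 4 * 4 cycles. */
    int max_cpi = cores * mem_cycles;        /* = 16 */

    printf("max IPC = %d, max CPI = %d\n", max_ipc, max_cpi);
    return 0;
}
```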
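For Problem 4, a sketch of the "spin if the owner is running, otherwise block" policy. The owner_running flag and the use of sched_yield() are simplifying assumptions: in a real system the scheduler knows whether the owner is on a CPU, and a blocked waiter would sleep on a wait queue rather than yield.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

/* Sketch of an adaptive ("spin-or-block") mutex. owner_running is a
   simplified stand-in that the owner itself maintains. */
typedef struct {
    atomic_int  locked;         /* 0 = free, 1 = held */
    atomic_bool owner_running;  /* true while the owner is on a CPU */
} adaptive_mutex_t;

void adaptive_lock(adaptive_mutex_t *m) {
    while (atomic_exchange_explicit(&m->locked, 1, memory_order_acquire)) {
        if (atomic_load(&m->owner_running)) {
            /* Owner is on another CPU: it may release soon, and spinning
               avoids paying for two context switches (sleep + wakeup). */
            while (atomic_load_explicit(&m->locked, memory_order_relaxed) &&
                   atomic_load(&m->owner_running))
                ;   /* spin on reads until the lock looks free */
        } else {
            /* Owner is not running (always the case on a uniprocessor):
               blocking is the only sensible choice. sched_yield() stands
               in for sleeping on a real wait queue. */
            sched_yield();
        }
    }
    atomic_store(&m->owner_running, true);
}

void adaptive_unlock(adaptive_mutex_t *m) {
    atomic_store(&m->owner_running, false);
    atomic_store_explicit(&m->locked, 0, memory_order_release);
}
```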
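For Problem 5, minimal C11 versions of the two locks being compared. The 1 µs starting delay and 1024 µs cap in the backoff variant are arbitrary choices for illustration.

```c
#include <stdatomic.h>
#include <unistd.h>

static atomic_int lock_word;   /* 0 = free, 1 = held */

/* test-and-test-and-set ("spin on read"): spin on cached reads and only
   issue the atomic exchange once the lock looks free. */
void lock_spin_on_read(void) {
    for (;;) {
        while (atomic_load_explicit(&lock_word, memory_order_relaxed))
            ;   /* read-only spin: no bus traffic while the line stays cached */
        if (!atomic_exchange_explicit(&lock_word, 1, memory_order_acquire))
            return;   /* exchange read 0: the lock is ours */
    }
}

/* test-and-set with dynamic (exponential) backoff: after each failed
   attempt, wait for a delay that doubles up to a cap, then retry. */
void lock_backoff(void) {
    unsigned delay_us = 1;                     /* arbitrary starting delay */
    while (atomic_exchange_explicit(&lock_word, 1, memory_order_acquire)) {
        usleep(delay_us);                      /* back off instead of re-reading */
        if (delay_us < 1024) delay_us *= 2;    /* arbitrary 1024 us cap */
    }
}

void unlock(void) {
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}
```

When the lock is free, both paths execute the same single exchange (same latency); the backoff version may still be sleeping when the lock is released (worse delay), but it issues far fewer memory operations under contention (better contention).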
6. Page Table Size

Problem
Consider a 32-bit (x86) platform running Linux that uses a single-level page table. What is the maximum number of page table entries when the following page sizes are used?
1. regular (4 kB) pages?
2. large (2 MB) pages?

Solution
1. Regular 4 kB pages: 32 - 12 bits = 20 bits => 2^20 entries are needed, one for each virtual page.
2. Large 2 MB pages: 32 - 21 bits = 11 bits => 2^11 entries are needed, one for each virtual page.

7. PIO

Problem
Answer the following questions about PIO:
1. Considering I/O devices, what does PIO stand for?
2. List the steps performed by the OS or the process running on the CPU when sending a network packet using PIO.

Solution
1. Programmed I/O.
2. Write a "command" to the device to send a packet of the appropriate size; copy the header and data into a contiguous memory location; copy the data to the NIC/device data-transfer registers; read the status register to confirm that the copied data was successfully transmitted; repeat the last two steps as many times as needed to transmit the full packet, until the end of the packet is reached.

(Code sketches for Problems 6 and 7 follow.)
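The page-table arithmetic from Problem 6, checked in code:

```c
#include <stdio.h>

int main(void) {
    const unsigned addr_bits = 32;

    /* 4 kB = 2^12 => 12 offset bits; 2 MB = 2^21 => 21 offset bits. */
    unsigned long long regular = 1ULL << (addr_bits - 12);
    unsigned long long large   = 1ULL << (addr_bits - 21);

    printf("4 kB pages: 2^20 = %llu entries\n", regular); /* 1048576 */
    printf("2 MB pages: 2^11 = %llu entries\n", large);   /* 2048 */
    return 0;
}
```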
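For Problem 7, a sketch of the PIO steps from the solution. Every register name (NIC_CMD, NIC_DATA, NIC_STATUS), every address, and the 64-byte transfer width are hypothetical: a real NIC defines its own layout, so this only shows the shape of the loop.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Entirely hypothetical device layout, made up for illustration. */
#define NIC_CMD      (*(volatile uint32_t *)0xFEED0000u)
#define NIC_DATA     ((volatile uint8_t  *)0xFEED0004u)
#define NIC_STATUS   (*(volatile uint32_t *)0xFEED0044u)
#define CMD_SEND     0x1u
#define STATUS_DONE  0x1u
#define CHUNK        64u   /* assumed device transfer-register width */

void pio_send(const uint8_t *hdr, size_t hlen,
              const uint8_t *data, size_t dlen) {
    uint8_t pkt[2048];
    memcpy(pkt, hdr, hlen);             /* step 2: header + data contiguous */
    memcpy(pkt + hlen, data, dlen);
    size_t len = hlen + dlen;

    NIC_CMD = CMD_SEND;                 /* step 1: command the device to send */

    for (size_t off = 0; off < len; off += CHUNK) {
        size_t n = len - off > CHUNK ? CHUNK : len - off;
        for (size_t i = 0; i < n; i++)
            NIC_DATA[i] = pkt[off + i]; /* step 3: fill transfer registers */
        while (!(NIC_STATUS & STATUS_DONE))
            ;                           /* step 4: poll the status register */
    }                                   /* step 5: repeat to end of packet */
}
```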
12. Distributed Applications

Problem
You are designing the new image datastore for an application that stores users' images (like Picasa). The new design must consider the following scale:
○ The current application has 30 million users
○ Each user has on average 2,000 photos
○ Each photo is on average 500 kB
○ Requests are evenly distributed across all images

Answer the following:
1. Would you use replication or partitioning as a mechanism to ensure high responsiveness of the image store?
2. If you have 10 server machines at your disposal, and one of them crashes, what percentage of requests will the image store not be able to respond to, if any?

Solution
1. Partitioning. The current data set is 30 * 10^6 users * 2,000 photos/user * 500 kB/photo ≈ 30 PB, so it cannot be stored on one machine.
2. 10%. The data is partitioned across the 10 machines and requests are evenly distributed across all images, so losing one machine loses access to one tenth of them.

1. Process Creation

Problem
How is a new process created?

Solution
Via fork
○ If we want to create a process that is an exact replica of the calling process
Via fork followed by exec
○ If we want to create a process that is not an exact replica of the calling process
Via fork or fork followed by exec
○ Either of the above two options
(A fork/exec sketch appears at the end of this section.)

Relevant Sections
P2L1: Processes and Process Management
○ Process Life Cycle: Creation

2. Multi-Threading and 1 CPU

Problem
Is there a benefit of multithreading on 1 CPU?

Solution
Yes. The main reason is to hide the latency associated with code that blocks processing (such as a disk I/O request).

Relevant Sections
P2L2: Threads and Concurrency
○ Benefits of Multithreading: Single CPU

5. Signals

Problem
If the kernel cannot see user-level signal masks, then how is a signal delivered to a user-level thread (where the signal can be handled)?

Solution
Recall that all signals are intercepted by a handler that the user-level threading library installs. This handler determines which user-level thread, if any, the signal should be delivered to, and then it takes the appropriate steps to deliver the signal.
Note: If all user-level threads have the signal mask disabled and the kernel-level signal mask is updated, then the signal remains pending for the process.

Relevant Sections
P2L4: Thread Design Considerations
○ All morsels including and after the Interrupts and Signals Intro

6. Solaris Papers

Problem
The implementation of Solaris threads described in the paper "Beyond Multiprocessing: Multithreading the SunOS Kernel" uses four key data structures to support threads. For each of these data structures, list at least two elements they must contain:
1. Process
2. LWP
3. Kernel threads
4. CPU

Solution
The video Kernel Level Structures in Solaris 2.0 in Relevant Sections contains many possible solutions.

Relevant Sections
P2L4: Thread Design Considerations
○ Kernel Level Structures in Solaris 2.0

7. Pipeline Model

Problem
An image web server has three stages with average execution times as follows:
○ Stage 1: read and parse request (10 ms)
○ Stage 2: read and process image (30 ms)
○ Stage 3: send image (20 ms)

You have been asked to build a multi-threaded implementation of this server using the pipeline model. Using a pipeline model, answer the following questions:
1. How many threads will you allocate to each pipeline stage?
2. What is the expected execution time for 100 requests (in sec)?
3. What is the average throughput of this system (in req/sec)?
Assume there are infinite processing resources (CPUs, memory, etc.).

Solution
1. Threads should be allocated as follows:
○ Stage 1 should have 1 thread. This thread will parse a new request every 10 ms.
○ Stage 2 should have 3 threads. The requests parsed by Stage 1 get passed to the threads in Stage 2. Each thread picks up a request and needs 30 ms to process the image. Hence, we need 3 threads in order to pick up a new request as soon as Stage 1 passes one.
○ Stage 3 should have 2 threads, because Stage 2 will finish processing an image and pass on a request every 10 ms (once the pipeline is filled), and each Stage 3 thread needs 20 ms to send an image. Once the pipeline is filled, Stage 3 will be able to pick up a request and send an image every 10 ms.
2. The first request will take 60 ms. The last stage will continue delivering the remaining 99 requests at 10 ms intervals. So the total is 60 ms + (99 * 10 ms) = 1050 ms = 1.05 s.
3. 100 req / 1.05 s ≈ 95.2 req/s.

Relevant Sections
P2L2: Threads and Concurrency
○ Pipeline Pattern
○ Multithreading Patterns Quiz
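For the Process Creation problem above, a minimal fork-followed-by-exec example; /bin/echo and the message are arbitrary choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();            /* child starts as an exact replica */
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        /* Replace the replica's image with a different program. */
        execl("/bin/echo", "echo", "hello from the child", (char *)NULL);
        perror("execl");           /* only reached if exec fails */
        _exit(EXIT_FAILURE);
    }
    waitpid(pid, NULL, 0);         /* parent waits for the child */
    return 0;
}
```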
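The thread counts and timing from the Pipeline Model problem, checked in code. The rate-setting interval here is Stage 1's 10 ms, since it is the fastest stage and feeds the pipeline.

```c
#include <stdio.h>

int main(void) {
    const int stage_ms[] = {10, 30, 20};   /* parse, process, send */
    const int nstages = 3, nreq = 100;
    const int interval = stage_ms[0];      /* a new request enters every 10 ms */

    /* Threads per stage: enough to accept a new request every 10 ms,
       i.e. ceil(stage time / 10 ms). */
    for (int i = 0; i < nstages; i++)
        printf("stage %d: %d thread(s)\n", i + 1,
               (stage_ms[i] + interval - 1) / interval);

    /* First request traverses the whole pipeline (60 ms); afterwards
       one request completes every 10 ms. */
    int first = stage_ms[0] + stage_ms[1] + stage_ms[2];
    int total = first + (nreq - 1) * interval;         /* 1050 ms */
    printf("total = %d ms, throughput = %.1f req/s\n",
           total, nreq * 1000.0 / total);              /* 95.2 req/s */
    return 0;
}
```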
