Concurrency: Locks | Sangmyung University PDF

Summary

These are lecture slides from Sangmyung University on concurrency and locks. Topics covered include the motivation for synchronization, the semantics of locks, lock implementation goals, hardware support for locking (test-and-set, compare-and-swap, load-linked/store-conditional, fetch-and-add), Peterson's algorithm, spin locks and ticket locks, sleeping versus spinning (park/unpark and Linux futexes), two-phase locks, and Lamport's bakery algorithm.

Full Transcript

Concurrency: Locks
성한울, Game Major, School of Software Convergence, College of Convergence Engineering, Sangmyung University

Motivation for Synchronization
▪ Processes can execute concurrently
  ▪ They can be interrupted at any time, leaving their work partially completed
▪ Concurrent (parallel) access to shared data can cause data inconsistency
▪ Build higher-level synchronization primitives in the OS
  ▪ Operations that ensure correct ordering of instructions across threads

Locks: The Basic Idea
▪ Ensure that any critical section executes as if it were a single atomic instruction
▪ An example: the canonical update of a shared variable
  ▪ balance = balance + 1;
▪ Add some code around the critical section

Locks: The Basic Idea (Cont.)
▪ A lock variable holds the state of the lock
  ▪ available (or unlocked or free): no thread holds the lock
  ▪ acquired (or locked or held): exactly one thread holds the lock and presumably is in a critical section

The Semantics of lock()
▪ lock()
  ▪ Try to acquire the lock.
  ▪ If no other thread holds the lock, the thread acquires it and enters the critical section.
  ▪ This thread is said to be the owner of the lock.
▪ Other threads are prevented from entering the critical section while the thread holding the lock is inside.

Lock Implementation Goals
▪ Correctness
  ▪ Mutual exclusion: only one thread in the critical section at a time
  ▪ Progress (deadlock-free): if there are several simultaneous requests, one must be allowed to proceed
  ▪ Bounded waiting (starvation-free): each waiting thread must eventually be allowed to enter
▪ Fairness
  ▪ Each thread waits for roughly the same amount of time
▪ Performance
  ▪ The CPU is not used unnecessarily (e.g., spinning)

Building a Lock
▪ Efficient locks provide mutual exclusion at low cost.
▪ Building a lock requires some help from the hardware and the OS.

Controlling Interrupts
▪ Disable interrupts for critical sections
  ▪ One of the earliest solutions used to provide mutual exclusion
  ▪ Invented for single-processor systems
▪ Problems:
  ▪ Requires too much trust in applications: a greedy (or malicious) program could monopolize the processor
  ▪ Does not work on multiprocessors
  ▪ Code that masks or unmasks interrupts is executed slowly by modern CPUs

Why Is Hardware Support Needed?
▪ First attempt: use a flag denoting whether the lock is held or not
▪ The code (shown on the slide) has problems — why?

Why Is Hardware Support Needed? (Cont.)
▪ Problem 1: no mutual exclusion (assume flag = 0 to begin)
▪ Problem 2: spin-waiting wastes time waiting for another thread
▪ Solution: we need an atomic instruction supported by the hardware!
  ▪ The test-and-set instruction, also known as atomic exchange

Peterson's Algorithm
▪ An algorithm that guarantees atomic access to the critical section without hardware support
▪ Only two threads (tid = 0, 1)
▪ Uses just loads and stores, assumed to be atomic instructions
  ▪ Assumes a single high-level statement is not split into multiple machine instructions

Different Cases: All Work
▪ (Four interleaving cases are worked through as figures on the slides; mutual exclusion holds in each.)

Peterson's Algorithm: Intuition
▪ Mutual exclusion: enter the critical section if and only if
  ▪ the other thread does not want to enter, or
  ▪ the other thread wants to enter, but it is your turn
▪ Progress: both threads cannot wait forever at the while() loop
  ▪ Completes if the other thread does not want to enter
  ▪ The other thread (matching turn) will eventually finish
▪ Bounded waiting (not shown in the examples)
  ▪ Each thread waits at most one critical section
▪ Problems
  ▪ Doesn't work on modern hardware (cache-consistency issues)
  ▪ Doesn't work with 3+ threads

Implementing Synchronization
▪ To implement synchronization, we need atomic operations
  ▪ Atomic operation: no other instructions can be interleaved
▪ Examples of atomic operations
  ▪ Code between interrupts on uniprocessors
    ▪ Disable timer interrupts, don't do any I/O
  ▪ Loads and stores of words
    ▪ Load r1, B
    ▪ Store r1, A
  ▪ Special hardware instructions
    ▪ Test&Set
    ▪ Compare&Swap

Test-and-Set (Atomic Exchange, XCHG)
▪ An instruction to support the creation of simple locks
▪ Returns (tests) the old value pointed to by ptr
▪ Simultaneously updates (sets) that value to new
▪ This sequence of operations is performed atomically

LOCK Implementation with XCHG
▪ (Lock implementation code shown on the slide.)

Compare-and-Swap
▪ Tests whether the value at the address (ptr) is equal to expected
▪ If so, updates the memory location pointed to by ptr with the new value
▪ In either case, returns the actual value at that memory location

Compare-and-Swap (Cont.)
▪ A C-callable x86 version of compare-and-swap (shown on the slide)

Load-Linked and Store-Conditional
▪ The store-conditional succeeds only if no intervening store to the address has taken place
  ▪ Success: return 1 and update the value at ptr to value
  ▪ Failure: the value at ptr is not updated and 0 is returned

Load-Linked and Store-Conditional (Cont.)
▪ (Lock implementation using LL/SC shown on the slide.)

Pthread Locks: mutex
▪ The raw instructions above are too low-level for programmers to use directly
▪ mutex is the name that the POSIX library uses for a lock
▪ Used to provide mutual exclusion between threads
▪ We may use different locks to protect different variables -> increased concurrency (a more fine-grained approach)

Lock Implementation Goals (revisited)
▪ The same goals as before: correctness (mutual exclusion, progress, bounded waiting), fairness, and performance

Basic Spinlocks Are Unfair
▪ The scheduler is independent of locks/unlocks

Fetch-and-Add
▪ Atomically increments a value while returning the old value at a particular address

Ticket Lock
▪ A ticket lock can be built with fetch-and-add
▪ Ensures progress for all threads -> fairness
▪ Do we need fetch-and-add even for unlock? (A plain increment suffices: only the lock holder updates turn.)

So Much Spinning
▪ Hardware-based spin locks are simple and they work
▪ In some cases, though, these solutions can be quite inefficient
▪ Any time a thread gets caught spinning, it wastes an entire time slice doing nothing but checking a value

Evaluating Spin Locks
▪ Correctness: yes
  ▪ The spin lock allows only a single thread to enter the critical section
▪ Fairness: no
  ▪ Spin locks don't provide any fairness guarantees; indeed, a thread may spin forever
▪ Performance:
  ▪ On a single CPU, the performance overheads can be quite painful
  ▪ If the number of threads roughly equals the number of CPUs, spin locks work reasonably well

A Simple Approach: Just Yield
▪ When you are about to spin, give up the CPU to another thread
▪ The yield() system call moves the caller from the running state to the ready state
▪ The cost of a context switch can be substantial, and the starvation problem still exists

Using Queues: Sleeping Instead of Spinning
▪ Use a queue to keep track of which threads are waiting to acquire the lock
▪ park(): put the calling thread to sleep
▪ unpark(threadID): wake the particular thread designated by threadID
▪ (The implementation is walked through over several slides.)

Wakeup/Waiting Race
▪ If the lock is released (by thread A) just before the call to park() (by thread B), thread B could sleep forever
▪ Solaris solves this problem with a third system call: setpark()
  ▪ By calling this routine, a thread can indicate that it is about to park
  ▪ If it happens to be interrupted and another thread calls unpark before park is actually called, the subsequent park returns immediately instead of sleeping

Futex
▪ Linux provides the futex (similar to Solaris's park and unpark)
▪ futex_wait(address, expected)
  ▪ Puts the calling thread to sleep if *address == expected
  ▪ Otherwise, the call returns immediately with the value EAGAIN
▪ futex_wake(address)
  ▪ Wakes one thread that is waiting on the queue

Futex (Cont.)
▪ A futex consists of a kernel-space wait queue attached to an atomic integer in user space
▪ Multiple processes or threads operate on the integer entirely in user space (using atomic operations to avoid interfering with one another)
▪ They resort to relatively expensive system calls only to operate on the wait queue: to wake up waiting processes, or to put the current process on the wait queue
▪ A properly programmed futex-based lock will not use system calls except when the lock is contended; since most operations do not require arbitration between processes, this does not happen in most cases

Futex (Cont.)
▪ A snippet from lowlevellock.h in the nptl library (shown on the slide)
  ▪ The high bit of the integer v tracks whether the lock is held or not
  ▪ All the other bits count the number of waiters

Lock Evaluation
▪ How can we tell whether a lock implementation is good?
▪ Fairness: do threads acquire the lock in the same order as requested?
▪ Performance, under two scenarios:
  ▪ Low contention (few threads; the lock is usually available)
  ▪ High contention (many threads per CPU, each contending)

Spinlock Performance
▪ Fast when…
  ▪ there are many CPUs
  ▪ locks are held for a short time
  ▪ advantage: avoids a context switch
▪ Slow when…
  ▪ there is one CPU
  ▪ locks are held for a long time
  ▪ disadvantage: spinning is wasteful

Spinlock Performance (Cont.)
▪ Why is yield useful? Why doesn't yield solve all performance problems?
▪ Waste…
  ▪ Without yield: O(threads * time_slice)
  ▪ With yield: O(threads * context_switch)
▪ So even with yield, we are slow under high contention
Lock Implementation: Block When Waiting
▪ The lock implementation removes waiting threads from the scheduler's ready queue (e.g., with park() and unpark())
▪ The scheduler runs any thread that is ready
▪ Good separation of concerns

Spin-Waiting vs. Blocking
▪ Each approach is better under different circumstances
▪ Uniprocessor
  ▪ While the waiting process is scheduled, the process holding the lock isn't
  ▪ The waiting process should always relinquish the processor
  ▪ Associate a queue of waiters with each lock (as in the previous implementation)
▪ Multiprocessor
  ▪ While the waiting process is scheduled, the process holding the lock might also be running
  ▪ Whether to spin or block depends on how long, t, it will be before the lock is released
    ▪ Lock released quickly --> spin-wait
    ▪ Lock released slowly --> block
  ▪ Quick and slow are relative to the context-switch cost, C

When to Spin-Wait? When to Block?
▪ If we knew how long, t, it would be before the lock is released, we could choose the optimal behavior
  ▪ How much CPU time is wasted when spin-waiting? t
  ▪ How much is wasted when blocking? C
  ▪ What is the best action when t < C? Spin-wait. When t > C? Block.
▪ Problem:
  ▪ This requires knowledge of the future, and it costs too much overhead to do any special prediction

Two-Phase Waiting
▪ Theory: bound the worst-case performance, i.e., the ratio of actual to optimal cost
▪ When does the worst possible performance occur?
  ▪ Spinning for a very long time, t >> C
  ▪ Ratio: t/C (unbounded)
▪ Algorithm: spin-wait for C, then block --> within a factor of 2 of optimal
▪ Two cases:
  ▪ t < C: optimal spin-waits for t; we spin-wait for t too
  ▪ t > C: optimal blocks immediately (cost C); we spin for C and then block (cost 2C); 2C / C --> a 2-competitive algorithm
▪ An example of competitive analysis

Two-Phase Locks
▪ A two-phase lock exploits the fact that spinning can be useful if the lock is about to be released
▪ First phase
  ▪ The thread spins for a while, hoping to acquire the lock
  ▪ If the lock is not acquired during the first spin phase, a second phase is entered
▪ Second phase
  ▪ The caller is put to sleep
  ▪ The caller is woken up only when the lock becomes free later
Filter Lock
▪ (Details shown as figures on the slides.)

Bakery Algorithm by Lamport
▪ Provides first-come, first-served ordering
▪ Deadlock-free
▪ How?
  ▪ Take a "number"
  ▪ Wait until all lower numbers have been served
▪ Lexicographic order on (label, thread id)
  ▪ (a, i) > (b, j) if a > b, or a = b and i > j
▪ Is overflow possible with label[i]? (32-bit: yes; 64-bit: ??)