Concurrency: Locks | Sangmyung University PDF

Summary

These are lecture slides from Sangmyung University on concurrency and locks. Topics covered include the motivation for synchronization, the semantics of locks, lock implementation goals, hardware support for locking (test-and-set, compare-and-swap, load-linked/store-conditional, fetch-and-add), Peterson's algorithm, spin locks and ticket locks, sleeping versus spinning (park/unpark and Linux futexes), two-phase locks, and Lamport's bakery algorithm.

Full Transcript

Concurrency: Locks
성한울, Game Major, School of Software Convergence, College of Convergence Engineering, Sangmyung University

Motivation for Synchronization
▪ Processes can execute concurrently
  ▪ They can be interrupted at any time, leaving their work partially completed
▪ Concurrent (parallel) access to shared data can cause data inconsistency
▪ Build higher-level synchronization primitives in the OS
  ▪ Operations that ensure correct ordering of instructions across threads

Locks: The Basic Idea
▪ Ensure that any critical section executes as if it were a single atomic instruction
▪ An example: the canonical update of a shared variable
  ▪ balance = balance + 1;
▪ Add some code around the critical section

Locks: The Basic Idea (Cont.)
▪ A lock variable holds the state of the lock
  ▪ available (or unlocked or free): no thread holds the lock
  ▪ acquired (or locked or held): exactly one thread holds the lock and presumably is in a critical section

The Semantics of lock()
▪ lock()
  ▪ Try to acquire the lock.
  ▪ If no other thread holds the lock, the thread acquires it and enters the critical section.
  ▪ This thread is said to be the owner of the lock.
▪ Other threads are prevented from entering the critical section while the thread holding the lock is inside.

Lock Implementation Goals
▪ Correctness
  ▪ Mutual exclusion: only one thread in the critical section at a time
  ▪ Progress (deadlock-free): if there are several simultaneous requests, one must be allowed to proceed
  ▪ Bounded waiting (starvation-free): each waiting thread must eventually be allowed to enter
▪ Fairness
  ▪ Each thread waits for roughly the same amount of time
▪ Performance
  ▪ The CPU is not used unnecessarily (e.g., spinning)

Building a Lock
▪ Efficient locks provide mutual exclusion at low cost.
▪ Building a lock requires some help from the hardware and the OS.

Controlling Interrupts
▪ Disable interrupts for critical sections
  ▪ One of the earliest solutions used to provide mutual exclusion
  ▪ Invented for single-processor systems
▪ Problems:
  ▪ Requires too much trust in applications: a greedy (or malicious) program could monopolize the processor
  ▪ Does not work on multiprocessors
  ▪ Code that masks or unmasks interrupts is executed slowly by modern CPUs

Why Is Hardware Support Needed?
▪ First attempt: use a flag denoting whether the lock is held or not
▪ The code (shown on the slide) has problems — why?

Why Is Hardware Support Needed? (Cont.)
▪ Problem 1: no mutual exclusion (assume flag = 0 to begin)
▪ Problem 2: spin-waiting wastes time waiting for another thread
▪ Solution: we need an atomic instruction supported by the hardware!
  ▪ The test-and-set instruction, also known as atomic exchange

Peterson's Algorithm
▪ An algorithm that guarantees atomic access to the critical section without hardware support
▪ Only two threads (tid = 0, 1)
▪ Uses just loads and stores, assumed to be atomic instructions
  ▪ Assumes a single high-level statement is not split into multiple machine instructions

Different Cases: All Work
▪ (Four interleaving cases are worked through as figures on the slides; mutual exclusion holds in each.)

Peterson's Algorithm: Intuition
▪ Mutual exclusion: enter the critical section if and only if
  ▪ the other thread does not want to enter, or
  ▪ the other thread wants to enter, but it is your turn
▪ Progress: both threads cannot wait forever at the while() loop
  ▪ Completes if the other thread does not want to enter
  ▪ The other thread (matching turn) will eventually finish
▪ Bounded waiting (not shown in the examples)
  ▪ Each thread waits at most one critical section
▪ Problems
  ▪ Doesn't work on modern hardware (cache-consistency issues)
  ▪ Doesn't work with 3+ threads

Implementing Synchronization
▪ To implement synchronization, we need atomic operations
  ▪ Atomic operation: no other instructions can be interleaved
▪ Examples of atomic operations
  ▪ Code between interrupts on uniprocessors
    ▪ Disable timer interrupts, don't do any I/O
  ▪ Loads and stores of words
    ▪ Load r1, B
    ▪ Store r1, A
  ▪ Special hardware instructions
    ▪ Test&Set
    ▪ Compare&Swap

Test-and-Set (Atomic Exchange, XCHG)
▪ An instruction to support the creation of simple locks
▪ Returns (tests) the old value pointed to by ptr
▪ Simultaneously updates (sets) that value to new
▪ This sequence of operations is performed atomically

LOCK Implementation with XCHG
▪ (Lock implementation code shown on the slide.)

Compare-and-Swap
▪ Tests whether the value at the address (ptr) is equal to expected
▪ If so, updates the memory location pointed to by ptr with the new value
▪ In either case, returns the actual value at that memory location

Compare-and-Swap (Cont.)
▪ A C-callable x86 version of compare-and-swap (shown on the slide)

Load-Linked and Store-Conditional
▪ The store-conditional succeeds only if no intervening store to the address has taken place
  ▪ Success: return 1 and update the value at ptr to value
  ▪ Failure: the value at ptr is not updated and 0 is returned

Load-Linked and Store-Conditional (Cont.)
▪ (Lock implementation using LL/SC shown on the slide.)

Pthread Locks: mutex
▪ The raw instructions above are too low-level for programmers to use directly
▪ mutex is the name that the POSIX library uses for a lock
▪ Used to provide mutual exclusion between threads
▪ We may use different locks to protect different variables -> increased concurrency (a more fine-grained approach)

Lock Implementation Goals (revisited)
▪ The same goals as before: correctness (mutual exclusion, progress, bounded waiting), fairness, and performance

Basic Spinlocks Are Unfair
▪ The scheduler is independent of locks/unlocks

Fetch-and-Add
▪ Atomically increments a value while returning the old value at a particular address

Ticket Lock
▪ A ticket lock can be built with fetch-and-add
▪ Ensures progress for all threads -> fairness
▪ Do we need fetch-and-add even for unlock? (A plain increment suffices: only the lock holder updates turn.)

So Much Spinning
▪ Hardware-based spin locks are simple and they work
▪ In some cases, though, these solutions can be quite inefficient
▪ Any time a thread gets caught spinning, it wastes an entire time slice doing nothing but checking a value

Evaluating Spin Locks
▪ Correctness: yes
  ▪ The spin lock allows only a single thread to enter the critical section
▪ Fairness: no
  ▪ Spin locks don't provide any fairness guarantees; indeed, a thread may spin forever
▪ Performance:
  ▪ On a single CPU, the performance overheads can be quite painful
  ▪ If the number of threads roughly equals the number of CPUs, spin locks work reasonably well

A Simple Approach: Just Yield
▪ When you are about to spin, give up the CPU to another thread
▪ The yield() system call moves the caller from the running state to the ready state
▪ The cost of a context switch can be substantial, and the starvation problem still exists

Using Queues: Sleeping Instead of Spinning
▪ Use a queue to keep track of which threads are waiting to acquire the lock
▪ park(): put the calling thread to sleep
▪ unpark(threadID): wake the particular thread designated by threadID
▪ (The implementation is walked through over several slides.)

Wakeup/Waiting Race
▪ If the lock is released (by thread A) just before the call to park() (by thread B), thread B could sleep forever
▪ Solaris solves this problem with a third system call: setpark()
  ▪ By calling this routine, a thread can indicate that it is about to park
  ▪ If it happens to be interrupted and another thread calls unpark before park is actually called, the subsequent park returns immediately instead of sleeping

Futex
▪ Linux provides the futex (similar to Solaris's park and unpark)
▪ futex_wait(address, expected)
  ▪ Puts the calling thread to sleep if *address == expected
  ▪ Otherwise, the call returns immediately with the value EAGAIN
▪ futex_wake(address)
  ▪ Wakes one thread that is waiting on the queue

Futex (Cont.)
▪ A futex consists of a kernel-space wait queue attached to an atomic integer in user space
▪ Multiple processes or threads operate on the integer entirely in user space (using atomic operations to avoid interfering with one another)
▪ They resort to relatively expensive system calls only to operate on the wait queue: to wake up waiting processes, or to put the current process on the wait queue
▪ A properly programmed futex-based lock will not use system calls except when the lock is contended; since most operations do not require arbitration between processes, this does not happen in most cases

Futex (Cont.)
▪ A snippet from lowlevellock.h in the nptl library (shown on the slide)
  ▪ The high bit of the integer v tracks whether the lock is held or not
  ▪ All the other bits count the number of waiters

Lock Evaluation
▪ How can we tell whether a lock implementation is good?
▪ Fairness: do threads acquire the lock in the same order as requested?
▪ Performance, under two scenarios:
  ▪ Low contention (few threads; the lock is usually available)
  ▪ High contention (many threads per CPU, each contending)

Spinlock Performance
▪ Fast when…
  ▪ there are many CPUs
  ▪ locks are held for a short time
  ▪ advantage: avoids a context switch
▪ Slow when…
  ▪ there is one CPU
  ▪ locks are held for a long time
  ▪ disadvantage: spinning is wasteful

Spinlock Performance (Cont.)
▪ Why is yield useful? Why doesn't yield solve all performance problems?
▪ Waste…
  ▪ Without yield: O(threads * time_slice)
  ▪ With yield: O(threads * context_switch)
▪ So even with yield, we are slow under high contention
Lock Implementation: Block When Waiting
▪ The lock implementation removes waiting threads from the scheduler's ready queue (e.g., with park() and unpark())
▪ The scheduler runs any thread that is ready
▪ Good separation of concerns

Spin-Waiting vs. Blocking
▪ Each approach is better under different circumstances
▪ Uniprocessor
  ▪ While the waiting process is scheduled, the process holding the lock isn't
  ▪ The waiting process should always relinquish the processor
  ▪ Associate a queue of waiters with each lock (as in the previous implementation)
▪ Multiprocessor
  ▪ While the waiting process is scheduled, the process holding the lock might also be running
  ▪ Whether to spin or block depends on how long, t, it will be before the lock is released
    ▪ Lock released quickly --> spin-wait
    ▪ Lock released slowly --> block
  ▪ Quick and slow are relative to the context-switch cost, C

When to Spin-Wait? When to Block?
▪ If we knew how long, t, it would be before the lock is released, we could choose the optimal behavior
  ▪ How much CPU time is wasted when spin-waiting? t
  ▪ How much is wasted when blocking? C
  ▪ What is the best action when t < C? Spin-wait. When t > C? Block.
▪ Problem:
  ▪ This requires knowledge of the future, and it costs too much overhead to do any special prediction

Two-Phase Waiting
▪ Theory: bound the worst-case performance, i.e., the ratio of actual to optimal cost
▪ When does the worst possible performance occur?
  ▪ Spinning for a very long time, t >> C
  ▪ Ratio: t/C (unbounded)
▪ Algorithm: spin-wait for C, then block --> within a factor of 2 of optimal
▪ Two cases:
  ▪ t < C: optimal spin-waits for t; we spin-wait for t too
  ▪ t > C: optimal blocks immediately (cost C); we spin for C and then block (cost 2C); 2C / C --> a 2-competitive algorithm
▪ An example of competitive analysis

Two-Phase Locks
▪ A two-phase lock exploits the fact that spinning can be useful if the lock is about to be released
▪ First phase
  ▪ The thread spins for a while, hoping to acquire the lock
  ▪ If the lock is not acquired during the first spin phase, a second phase is entered
▪ Second phase
  ▪ The caller is put to sleep
  ▪ The caller is woken up only when the lock becomes free later
Filter Lock
▪ (Details shown as figures on the slides.)

Bakery Algorithm by Lamport
▪ Provides first-come, first-served ordering
▪ Deadlock-free
▪ How?
  ▪ Take a "number"
  ▪ Wait until all lower numbers have been served
▪ Lexicographic order on (label, thread id)
  ▪ (a, i) > (b, j) if a > b, or a = b and i > j
▪ Is overflow possible with label[i]? (32-bit: yes; 64-bit: ??)