Chapter 4. An Overview of Concurrent Systems

Distributed systems comprise multiple independent pieces of code executing in parallel, or concurrently, on many processing nodes across multiple locations. Any distributed system is hence by definition a concurrent system, even if each node is processing events one at a time. The behavior of the various nodes must of course be coordinated in order to make the application behave as desired. As I described in [[Chapter 3]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch03.html#distributed_systems_essentials), coordinating nodes in a distributed system is fraught with danger. Luckily, our industry has matured sufficiently to provide complex, powerful software frameworks that hide many of these distributed system perils from our applications (most of the time, anyway). The majority of this book focuses on describing how we can utilize these frameworks to build scalable distributed systems.

This chapter, however, is concerned with concurrent behavior in our systems on a single node. By explicitly writing our software to perform multiple actions concurrently, we can optimize the processing and resource utilization on a single node, and hence increase our processing capacity both locally and system-wide. I'll use the Java 7.0 concurrency capabilities for examples, as these are at a lower level of abstraction than those introduced in Java 8.0. Knowing how concurrent systems operate "closer to the machine" is essential foundational knowledge when building concurrent and distributed systems. Once you understand the lower-level mechanisms for building concurrent systems, the more abstract approaches are easier to exploit optimally. And while this chapter is Java-specific, the fundamental problems of concurrent systems don't change when you write systems in other languages. Mechanisms for handling concurrency exist in all mainstream programming languages. [["Concurrency Models"]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#concurrency_models) gives some more details on alternative approaches and how they are implemented in modern languages.

One final point. This chapter is a concurrency *primer*. It won't teach you everything you need to know to build complex, high-performance concurrent systems. It will also be useful if your experience writing concurrent programs is rusty, or you have some exposure to concurrent code in another programming language. The further reading section at the end of the chapter points to more comprehensive coverage of this topic for those who wish to delve deeper.

Why Concurrency?

Think of a busy coffee shop. If everyone orders a simple coffee, then the barista can quickly and consistently deliver each drink. Suddenly, the person in front of you orders a soy, vanilla, no sugar, quadruple-shot iced brew. Everyone in line sighs and starts reading their social media. In two minutes the line is out the door.

Processing requests in web applications is analogous to our coffee example. In a coffee shop, we enlist the help of a new barista to simultaneously make coffees on a different machine to keep the line under control and serve customers quickly. In software, to make applications responsive, we need to somehow process requests in our server in an overlapping manner, handling requests concurrently.

In the good old days of computing, each CPU was only able to execute a single machine instruction at any instant.
If our server application runs on such a CPU, why do we need to structure our software systems to potentially execute multiple instructions concurrently? It all seems slightly pointless. There is actually a very good reason. Virtually every program does more than just execute machine instructions. For example, when a program attempts to read from a file or send a message on the network, it must interact with the hardware subsystem (disk, network card) that is peripheral to the CPU. Reading data from a magnetic hard disk takes around 10 milliseconds (ms). During this time, the program must wait for the data to be available for processing.

Now, even an ancient CPU such as a [[1988 Intel 80386]](https://oreil.ly/NK6NC) can execute [[more than 10 million instructions per second (mips)]](https://oreil.ly/8jWH5). 10 ms is one hundredth of a second. How many instructions could our 80386 execute in a hundredth of a second? Do the math. (Hint---it's a lot!) A lot of wasted processing capacity, in fact.

This is how operating systems such as Linux can run multiple programs on a single CPU. While one program is waiting for an I/O event, the operating system schedules another program to execute. By explicitly structuring our software to have multiple activities that can be executed in parallel, the operating system can schedule tasks that have work to do while others wait for I/O. We'll see in more detail how this works with Java later in this chapter.

In 2001, IBM introduced the world's first multicore processor, a chip with two CPUs---see [[Figure 4-1]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#simplified_view_of_a_multicore_processo) for a simplified illustration. Today, even my laptop has 16 CPUs, or "cores," as they are commonly known. With a multicore chip, a software system that is structured to have multiple parallel activities can be executed concurrently on each core, up to the number of available cores. In this way, we can fully utilize the processing resources on a multicore chip, and thus increase our application's processing capacity.

Figure 4-1. Simplified view of a multicore processor

The primary way to structure a software system as concurrent activities is to use *threads*. Virtually every programming language has its own threading mechanism. The underlying semantics of all these mechanisms are similar---there are only a few primary threading models in mainstream use---but obviously the syntax varies by language. In the following sections, I'll explain how threads are supported in Java, and how we need to design our programs to be safe (i.e., correct) and efficient when executing in parallel. Armed with this knowledge, leaping into the concurrency features supported in other languages shouldn't be too arduous.

**Concurrency Models**

This chapter describes one model for concurrent systems, based on independently executing threads using locks to operate on shared, mutable resources. Concurrency models have been a much studied and explored topic in computer science for roughly the last 50 years. Many theoretical proposals have been put forward, and some of these are implemented in modern programming languages. These models provide alternative approaches for structuring and coordinating concurrent activities in programs. Here's a sampler that you might well encounter in your work:

Hopefully this gives you a feel for the diversity of concurrency models and primitives in modern programming languages.
Luckily, when you know the fundamentals and one model, the rest are straightforward to learn.

Threads

Every software process has a single thread of execution by default. This is the thread that the operating system manages when it schedules the process for execution. In Java, for example, the main() function you specify as the entry point to your code defines the behavior of this thread. This single thread has access to the program's environment and resources such as open file handles and network connections. As the program calls methods in objects instantiated in the code, the program's runtime stack is used to pass parameters and manage variable scopes. Standard programming language runtime stuff, that we all know and love. This is a sequential process.

In your systems, you can use programming language features to create and execute additional threads. Each thread is an independent sequence of execution and has its own runtime stack to manage local object creation and method calls. Each thread also has access to the process's global data and environment. A simple depiction of this scheme is shown in [[Figure 4-2]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#comparing_a_single_threaded_and_multith).

![Comparing a single threaded and multithreaded process](media/image2.png)

Figure 4-2. Comparing a single-threaded and multithreaded process

In Java, we can define a thread using a class that implements the Runnable interface and defines the run() method. Let's look at a simple example:

    class NamingThread implements Runnable {

      private String name;

      public NamingThread(String threadName) {
        name = threadName;
        System.out.println("Constructor called: " + threadName);
      }

      public void run() {
        // Display info about this thread
        System.out.println("Run called : " + name);
        System.out.println(name + " : " + Thread.currentThread());
        // and now terminate ....
      }
    }

To execute the thread, we need to construct a Thread object using an instance of our Runnable and call the start() method to invoke the code in its own execution context. This is shown in the next code example, along with the output of running the code (beneath ===EXECUTION OUTPUT===). Note this example has two threads: the main() thread and the NamingThread. The main thread starts the NamingThread, which executes asynchronously, and then waits for 1 second to give our run() method in NamingThread ample time to complete:

    public static void main(String[] args) {

      NamingThread name0 = new NamingThread("My first thread");

      // Create the thread
      Thread t0 = new Thread(name0);

      // start the thread
      t0.start();

      // delay the main thread for a second (1000 milliseconds)
      try {
        Thread.currentThread().sleep(1000);
      } catch (InterruptedException e) {}

      // Display info about the main thread and terminate
      System.out.println(Thread.currentThread());
    }

    ===EXECUTION OUTPUT===
    Constructor called: My first thread
    Run called : My first thread
    My first thread : Thread[Thread-0,5,main]
    Thread[main,5,main]

For illustration, we also call the static currentThread() method, which returns a reference to the currently executing thread. Printing this reference displays a string containing:

- The thread name (e.g., Thread-0)
- The thread priority (5, the default)
- The name of the thread group the thread belongs to (main)

Note that to start a thread, we call the start() method, not the run() method we define in the Runnable. The start() method contains the internal system magic to create the execution context for a separate thread to execute. If we call run() directly, the code will execute, but no new thread will be created.
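To make the difference concrete, here is a minimal sketch reusing the NamingThread class above (the thread name strings are purely illustrative, not from the book's example):

    public static void main(String[] args) {
      // start() hands the Runnable to a new thread of execution
      Thread t = new Thread(new NamingThread("started thread"));
      t.start();

      // Calling run() directly is just an ordinary method call:
      // it runs on the main thread, and no new thread is created
      new NamingThread("plain method call").run();

      // crude delay so the started thread can finish, as in the example above
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {}
    }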
The run() method will execute as part of the main thread, just like any other Java method invocation that you know and love. You will still have single-threaded code.

In the example, we use sleep() to pause the execution of the main thread and make sure it does not terminate before the NamingThread. This approach, namely coordinating two threads by delaying for an absolute time period (1 second in the example), is not a very robust mechanism. What if for some reason---a slower CPU, a long delay reading disk, additional complex logic in the method---our thread doesn't terminate in the expected timeframe? In this case, main will terminate first---this is not what we intend. In general, if you are using absolute times for thread coordination, you are doing it wrong. Almost always. Like 99.99999% of the time.

A simple and robust mechanism for one thread to wait until another has completed its work is to use the join() method. We could replace the try-catch block in the above example with:

    t0.join();

This method causes the calling thread (in this case, main) to block until the thread referenced by t0 terminates. If the referenced thread has terminated before the call to join(), then the method call returns immediately. In this way we can coordinate, or synchronize, the behavior of multiple threads. Synchronization of multiple threads is in fact the major focus of the rest of this chapter.

Order of Thread Execution

The system scheduler (in Java, this lives in the Java virtual machine [JVM]) controls the order of thread execution. From the programmer's perspective, the order of execution is *nondeterministic*. Get used to that term, I'll use it a lot. The concept of nondeterminism is fundamental to understanding multithreaded code.

I'll illustrate this by building on the earlier NamingThread example. Instead of creating a single NamingThread, I'll create and start up a few. Three, in fact, as shown in the following code example. Again, sample output from running the code is shown beneath the code itself, with numbered callouts referenced in the explanation that follows:

    NamingThread name0 = new NamingThread("thread0");
    NamingThread name1 = new NamingThread("thread1");
    NamingThread name2 = new NamingThread("thread2");

    // Create the threads
    Thread t0 = new Thread(name0);
    Thread t1 = new Thread(name1);
    Thread t2 = new Thread(name2);

    // start the threads
    t0.start();   // (1)
    t1.start();   // (1)
    t2.start();   // (1)

    ===EXECUTION OUTPUT===
    Run called : thread0
    thread0 : Thread[Thread-0,5,main]   (2)
    Run called : thread2   (3)
    Run called : thread1
    thread1 : Thread[Thread-1,5,main]   (4)
    thread2 : Thread[Thread-2,5,main]
    Thread[main,5,main]

The output shown is a sample from just one execution. You can see the code starts three threads sequentially, namely *t0*, *t1*, and *t2* (see 1). Looking at the output, we see thread *t0* completes (see 2) before the others start. Next *t2*'s run() method is called (see 3), followed by *t1*'s run() method, even though *t1* was started before *t2*. Thread *t1* then runs to completion (see 4) before *t2*, and eventually the *main* thread and the program terminate.

This is just one possible order of execution. If we run this program again, we will almost certainly see a different execution trace. This is because the JVM scheduler is deciding which thread to execute, and for how long.
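As an aside, if we want the main thread to wait for all three threads to finish, whatever order the scheduler runs them in, we do not need to guess with sleep(); we can simply join() on each thread. A minimal sketch reusing the NamingThread class above (the final println is illustrative, not from the book's example):

    public static void main(String[] args) throws InterruptedException {
      // Create and start three NamingThread instances (class defined earlier)
      Thread t0 = new Thread(new NamingThread("thread0"));
      Thread t1 = new Thread(new NamingThread("thread1"));
      Thread t2 = new Thread(new NamingThread("thread2"));
      t0.start();
      t1.start();
      t2.start();

      // Block the main thread until each thread has terminated.
      // The threads may finish in any order; each join() simply
      // waits for that thread if it has not yet completed.
      t0.join();
      t1.join();
      t2.join();

      System.out.println("All three threads have terminated");
    }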
Put very simply, once the scheduler has given a thread an execution time slot on a CPU, it can interrupt the thread after a specified time period and schedule another one to run. This interruption is known as preemption. Preemption ensures each thread is given an opportunity to make progress. Hence the threads run independently and asynchronously until completion, and the scheduler decides which thread runs when based on a scheduling algorithm.

There's more to thread scheduling than this, and I'll explain the basic scheduling algorithm used later in this chapter. For now, there is a major implication for programmers: regardless of the order of thread execution---which you don't control---your code should produce correct results. Sounds easy? Read on.

Problems with Threads

The basic problem in concurrent programming is coordinating the execution of multiple threads so that whatever order they are executed in, they produce the correct answer. Given that threads can be started and preempted nondeterministically, any moderately complex program will have essentially an infinite number of possible orders of execution. These systems aren't easy to test.

There are two fundamental problems that all concurrent programs need to avoid. These are race conditions and deadlocks, and these topics are covered in the next two subsections.

**Race Conditions**

Nondeterministic execution of threads implies that the code statements that comprise the threads:

- Will execute sequentially within each individual thread
- Can be interleaved in any order across the collection of threads

Hence, when many threads are executed on a single processor, their execution is *interleaved*. The CPU executes some steps from one thread, then performs some steps from another, and so on. If we are executing on a multicore CPU, then we can execute one thread per core. The statements of each thread execution are still however interleaved in a nondeterministic manner.

Now, if every thread simply does its own thing and is completely independent, this is not a problem. Each thread executes until it terminates, as in our trivial NamingThread example. This stuff is a piece of cake! Why are these thread things meant to be complex?

Unfortunately, totally independent threads are not how most multithreaded systems behave. If you refer back to [[Figure 4-2]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#comparing_a_single_threaded_and_multith), you will see that multiple threads share the global data within a process. In Java this is both global and static data. Threads can use shared data structures to coordinate their work and communicate status across threads.

For example, we may have threads handling requests from web clients, one thread per request. We also want to keep a running total of how many requests we process each day. When a thread completes a request, it increments a global RequestCounter object that all threads share and update after each request. At the end of the day, we know how many requests were processed. A simple and elegant solution indeed. Well, maybe?

The code below shows a very simple implementation that mimics the request counter example scenario. It creates 50,000 threads to update a shared counter.
Note we use a lambda function for brevity to create the threads, and a (really bad idea) 5-second delay in main to allow the threads to finish:

    public class RequestCounter {

      final static private int NUMTHREADS = 50000;
      private int count = 0;

      public void inc() {
        count++;
      }

      public int getVal() {
        return this.count;
      }

      public static void main(String[] args) throws InterruptedException {
        final RequestCounter counter = new RequestCounter();

        for (int i = 0; i < NUMTHREADS; i++) {
          // lambda runnable creation
          Runnable thread = () -> { counter.inc(); };
          new Thread(thread).start();
        }

        Thread.sleep(5000);
        System.out.println("Value should be " + NUMTHREADS + " It is: " + counter.getVal());
      }
    }

What you can do at home is clone this code from [[the book's GitHub repo]](https://oreil.ly/fss-git-repo), run it a few times, and see what results you get. In 10 executions my mean was 49,995. I didn't once get the correct answer of 50,000. Weird. Why?

The answer lies in how abstract, high-level programming language statements, in Java in this case, are executed on a machine. In this example, to perform an increment of a counter, the CPU must:

- Load the current value of the counter from memory into a CPU register
- Increment the value in the register
- Write the updated value from the register back to memory

This simple increment is actually a sequence of three machine-level operations. As [[Figure 4-3]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#increments_are_not_atomic_at_the_machin) shows, at the machine level these three operations are independent and not treated as a single *atomic* operation. By atomic, we mean an operation that cannot be interrupted and hence once started will run to completion.

As the increment operation is not atomic at the machine level, one thread can load the counter value into a CPU register from memory, but before it writes the incremented value back, the scheduler preempts the thread and allows another thread to start. This thread loads the old value of the counter from memory and writes back the incremented value. Eventually the original thread executes again and writes back its incremented value, which just happens to be the same as what is already in memory. This means we've lost an update. From our 10 tests of the counter code above, we see this is happening on average 5 times in 50,000 increments. Hence such events are rare, but even if it happens 1 time in 10 million, you still have an incorrect result.

Figure 4-3. Increments are not atomic at the machine level

When we lose updates in this manner, it is called a race condition. Race conditions can occur whenever multiple threads make changes to some shared state, in this case a simple counter. Essentially, different interleavings of the threads can produce different results.

Race conditions are insidious, evil errors, because their occurrence is typically rare, and they can be hard to detect as most of the time the answer will be correct. Try running the multithreaded counter code example with 1,000 threads instead of 50,000, and you will see this in action. I got the correct answer nine times out of ten. So, this situation can be summarized as "same code, occasionally different results." Like I said, race conditions are evil! Luckily, eradicating them is straightforward if you take a few precautions. The key is to identify and protect *critical sections*.
A critical section is a section of code that updates shared data structures and hence must be executed atomically if accessed by multiple threads. The example of incrementing a shared counter is an example of a critical section. Another is removing an item from a list. We need to delete the head node of the list and move the reference to the head of the list from the removed node to the next node in the list. Both operations must be performed atomically to maintain the integrity of the list. This is a critical section.

In Java, the synchronized keyword defines a critical section. If used to decorate a method, then when multiple threads attempt to call that method on the same shared object, only one is permitted to enter the critical section. All others block until the thread exits the synchronized method, at which point the scheduler chooses the next thread to execute the critical section. We say the execution of the critical section is serialized, as only one thread at a time can be executing the code inside it.

To fix the counter example, you therefore just need to identify the inc() method as a critical section and make it a synchronized method, i.e.:

    synchronized public void inc() {
      count++;
    }

Test it out as many times as you like. You'll always get the correct answer. Slightly more formally, this means any interleaving of the threads that the scheduler throws at us will always produce the correct results.

The synchronized keyword can also be applied to blocks of statements within a method. For example, we could rewrite the above example as:

    public void inc() {
      synchronized(this){
        count++;
      }
    }

Underneath the covers, every Java object has a *monitor lock*, sometimes known as an intrinsic lock, as part of its runtime representation. The monitor is like the bathroom on a long-distance bus---only one person is allowed to (and should!) enter at once, and the door lock stops others from entering when in use. In our totally sanitary Java runtime environment, a thread must acquire the monitor lock to enter a synchronized method or synchronized block of statements. Only one thread can own the lock at any time, and hence execution is serialized. This, very basically, is how Java and similar languages implement critical sections.

As a rule of thumb, you should keep critical sections as small as possible so that the serialized code is minimized. This can have positive impacts on performance and hence scalability. I'll return to this topic later, but I'm really talking about [[Amdahl's law]](https://oreil.ly/kLFs1) again, as introduced in [[Chapter 2]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch02.html#distributed_systems_architectures_an_in). Synchronized blocks are the serialized parts of a system as described by Amdahl, and the longer they execute for, the less potential we have for system scalability.

**Deadlocks**

To ensure correct results in multithreaded code, I explained that we have to restrict the inherent nondeterminism to serialize access to critical sections. This avoids race conditions. However, if we are not careful, we can write code that restricts nondeterminism so much that our program stops. And never continues. This is formally known as a deadlock.

A deadlock occurs when two or more threads are blocked forever, and none can proceed. This happens when threads need exclusive access to a shared set of resources and the threads acquire locks in different orders.
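The scenario described next can be captured in a few lines of Java. Here is a minimal sketch (the class and lock names are illustrative, not taken from the book's example; the short sleeps just make the deadlock easy to reproduce):

    public class LockOrderDeadlock {

      // Two shared locks guarding critical sections A and B
      private static final Object lockA = new Object();
      private static final Object lockB = new Object();

      public static void main(String[] args) {
        Thread thread1 = new Thread(() -> {
          synchronized (lockA) {           // thread 1 acquires A's lock ...
            pause();                       // ... giving thread 2 time to grab B
            synchronized (lockB) {         // ... then waits forever for B
              System.out.println("thread 1 holds both locks");
            }
          }
        });

        Thread thread2 = new Thread(() -> {
          synchronized (lockB) {           // thread 2 acquires B's lock ...
            pause();
            synchronized (lockA) {         // ... then waits forever for A
              System.out.println("thread 2 holds both locks");
            }
          }
        });

        thread1.start();
        thread2.start();
      }

      private static void pause() {
        try { Thread.sleep(100); } catch (InterruptedException e) { }
      }
    }

With the sleeps in place, each thread reliably grabs its first lock before trying for the second, so this program should hang on almost every run; making both threads acquire lockA before lockB removes the deadlock.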
This situation is illustrated in the following scenario, in which two threads need exclusive access to critical sections A and B. Thread 1 acquires the lock for critical section A, and thread 2 acquires the lock for critical section B. Both then block forever as they cannot acquire the locks they need to continue. Two threads sharing access to two shared variables via synchronized blocks:

- Thread 1 enters the synchronized block protecting A, acquiring A's lock
- Thread 2 enters the synchronized block protecting B, acquiring B's lock
- Thread 1 then attempts to enter the synchronized block protecting B, and blocks because thread 2 holds B's lock
- Thread 2 then attempts to enter the synchronized block protecting A, and blocks because thread 1 holds A's lock
- Neither thread can ever proceed

A deadlock, also known as a deadly embrace, causes a program to stop. It doesn't take a vivid imagination to realize that this can cause all sorts of undesirable outcomes. I'm happily texting away while my autonomous vehicle drives me to the bar. Suddenly, the vehicle code deadlocks. It won't end well.

Deadlocks occur in more subtle circumstances than the simple example above. The classic example is the dining philosophers problem. The story goes like this.

Five philosophers sit around a shared table. Being philosophers, they spend a lot of time thinking deeply. In between bouts of deep thinking, they replenish their brain function by eating from a plate of food that sits in front of them. Hence a philosopher is either eating or thinking or transitioning between these two states.

In addition, the philosophers must all be physically very close, highly dexterous, and COVID-19 vaccinated friends, as they share chopsticks to eat. Only five chopsticks are on the table, one placed between each pair of philosophers. When one philosopher wishes to eat, they follow a protocol of picking up their left chopstick first, then their right chopstick. Once they are ready to think again, they first return the right chopstick, then the left.

[[Figure 4-4]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#the_dining_philosophers_problem) depicts our philosophers, each identified by a unique number. As each is either concurrently eating or thinking, we can model each philosopher as a thread.

![The dining philosophers problem](media/image8.png)

Figure 4-4. The dining philosophers problem

The code is shown in [[Example 4-1]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#example_four-onedot_the_philosopher_thr). The shared chopsticks are represented by instances of the Java Object class. As only one thread can hold the monitor lock on an object at any time, these objects are used as entry conditions to the critical sections in which the philosophers acquire the chopsticks they need to eat. After eating, the chopsticks are returned to the table and the lock is released on each so that neighboring philosophers can eat whenever they are ready.

**Example 4-1. The philosopher thread**

    public class Philosopher implements Runnable {

      private final Object leftChopStick;
      private final Object rightChopStick;

      Philosopher(Object leftChopStick, Object rightChopStick) {
        this.leftChopStick = leftChopStick;
        this.rightChopStick = rightChopStick;
      }

      private void LogEvent(String event) throws InterruptedException {
        System.out.println(Thread.currentThread().getName() + " " + event);
        Thread.sleep(1000);
      }

      public void run() {
        try {
          while (true) {
            LogEvent(": Thinking deeply");
            synchronized (leftChopStick) {
              LogEvent(": Picked up left chopstick");
              synchronized (rightChopStick) {
                LogEvent(": Picked up right chopstick -- eating");
                LogEvent(": Put down right chopstick");
              }
              LogEvent(": Put down left chopstick. Ate too much");
            }
          } // end while
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }

To bring the philosophers described in [[Example 4-1]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#example_four-onedot_the_philosopher_thr) to life, we must instantiate a thread for each and give each philosopher access to their neighboring chopsticks. This is done through the thread constructor call at (1) in [[Example 4-2]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#example_four-twodot_dining_philosophers). In the for loop we create five philosophers and start these as independent threads, where each chopstick is accessible to two threads, one as a left chopstick, and one as a right.

**Example 4-2. Dining philosophers---deadlocked version**

    private final static int NUMCHOPSTICKS = 5;
    private final static int NUMPHILOSOPHERS = 5;

    public static void main(String[] args) throws Exception {

      final Philosopher[] ph = new Philosopher[NUMPHILOSOPHERS];
      Object[] chopSticks = new Object[NUMCHOPSTICKS];

      for (int i = 0; i < NUMCHOPSTICKS; i++) {
        chopSticks[i] = new Object();
      }

      for (int i = 0; i < NUMPHILOSOPHERS; i++) {
        Object leftChopStick = chopSticks[i];
        Object rightChopStick = chopSticks[(i + 1) % chopSticks.length];
        ph[i] = new Philosopher(leftChopStick, rightChopStick);   // (1)
        Thread th = new Thread(ph[i], "Philosopher " + i);
        th.start();
      }
    }

Running this code produces the following output on my first attempt. If you run the code you will almost certainly see different outputs, but the final outcome will be the same:

    Philosopher 3 : Thinking deeply
    Philosopher 4 : Thinking deeply
    Philosopher 0 : Thinking deeply
    Philosopher 1 : Thinking deeply
    Philosopher 2 : Thinking deeply
    Philosopher 3 : Picked up left chopstick
    Philosopher 0 : Picked up left chopstick
    Philosopher 2 : Picked up left chopstick
    Philosopher 4 : Picked up left chopstick
    Philosopher 1 : Picked up left chopstick

Ten lines of output, then...nothing! We have a deadlock. This is a classic circular waiting deadlock. Imagine the following scenario:

- Each philosopher becomes hungry at much the same time and picks up their left chopstick
- Each philosopher then tries to pick up their right chopstick, which is already held (as a left chopstick) by their neighbor
- No philosopher can acquire their second chopstick, and none will release the one they hold, so all five block forever

Real philosophers in this situation would figure out some way to proceed by putting down a chopstick or two until one or more of their colleagues can eat. We can sometimes do this in our software by using timeouts on blocking operations. When the timeout expires, a thread releases the critical section and retries, allowing other blocked threads a chance to proceed. This is not optimal though, as blocked threads hurt performance and setting timeout values is an inexact science.

It is much better, therefore, to design a solution to be deadlock-free. This means that one or more threads will always be able to make progress. With circular wait deadlocks, this can be achieved by imposing a resource allocation protocol on the shared resources, so that threads will not always request resources in the same order.

In the dining philosophers problem, we can do this by making sure one of our philosophers picks up their right chopstick first. Let's assume we instruct Philosopher 4 to do this. This leads to a possible sequence of operations such as below:

- Philosopher 0 picks up their left chopstick, chopStick[0]
- Philosopher 1 picks up their left chopstick, chopStick[1]
- Philosopher 2 picks up their left chopstick, chopStick[2]
- Philosopher 3 picks up their left chopstick, chopStick[3]
- Philosopher 4 attempts to pick up their right chopstick, chopStick[0], first, and must wait

In this example, Philosopher 4 must block, as Philosopher 0 already has acquired access to chopStick[0]. With Philosopher 4 blocked, Philosopher 3 is assured access to chopStick[4] and can then proceed to satisfy their appetite.
The fix for the dining philosophers solution is shown in [[Example 4-3]](https://learning.oreilly.com/library/view/foundations-of-scalable/9781098106058/ch04.html#example_four-threedot_solving_the_dinin).

**Example 4-3. Solving the dining philosophers deadlock**

    if (i == NUMPHILOSOPHERS - 1) {
      // The last philosopher picks up the right chopstick first
      ph[i] = new Philosopher(rightChopStick, leftChopStick);
    } else {
      // all others pick up the left chopstick first
      ph[i] = new Philosopher(leftChopStick, rightChopStick);
    }

More formally, we are imposing an ordering on the acquisition of shared resources, such that:

    chopStick[0] < chopStick[1] < chopStick[2] < chopStick[3] < chopStick[4]
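Spliced into the thread-creation loop from Example 4-2, the fix presumably looks something like the following sketch (it simply combines Examples 4-2 and 4-3; all names follow those examples):

    for (int i = 0; i < NUMPHILOSOPHERS; i++) {
      Object leftChopStick = chopSticks[i];
      Object rightChopStick = chopSticks[(i + 1) % chopSticks.length];

      if (i == NUMPHILOSOPHERS - 1) {
        // The last philosopher picks up the right chopstick first,
        // breaking the circular wait
        ph[i] = new Philosopher(rightChopStick, leftChopStick);
      } else {
        // all others pick up the left chopstick first
        ph[i] = new Philosopher(leftChopStick, rightChopStick);
      }

      Thread th = new Thread(ph[i], "Philosopher " + i);
      th.start();
    }

With this change, every philosopher acquires their lower-numbered chopstick first, so a circular wait can no longer form and the philosophers should keep thinking and eating indefinitely rather than stopping after ten lines of output.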