When Parallelism Beats Concurrency _ by The Bored Dev _ Better Programming.pdf
Document Details
Uploaded by Deleted User
Tags
Related
- When Parallelism Beats Concurrency _ by The Bored Dev _ Better Programming.pdf
- Parallelism Theory: American School of Comparative Literature PDF
- Chapter 3 Process PDF
- Java Multithreading (CSCI 2308) - Programming 3 PDF
- Graphics Programming I Optimizations PDF
- Principles Of Programming Languages - Unit 4 - Concurrency - PDF
Full Transcript
19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming Member-only story When Parallelism Beats Concurrency These two are not the same thing The Bored Dev · Follow...
19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming Member-only story When Parallelism Beats Concurrency These two are not the same thing The Bored Dev · Follow Published in Better Programming 10 min read · Jul 1, 2020 Listen Share More https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 1/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 2/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming I’m aware that for many of you, these two concepts probably mean the same thing, or maybe you’d struggle to explain the differences between them. However, they’re actually two very different concepts. This is quite important to understand the way in which we process data nowadays. In both cases, we try to solve a problem faster by increasing the number of workers dedicated to a given task, but the way in which they distribute the work to do is where their biggest difference stands. You’ll be able to see why it’s so important to understand their differences for an efficient, safe, and error-free processing of data. Let’s start by explaining these concepts! Introduction To start with, let’s take a brief look at what we should be understanding as concurrency and parallelism. Concurrency In layman’s terms, concurrency is the situation where, in order to solve a problem, we process it in a way that one single task gets processed concurrently by multiple workers; that said, let’s imagine a big array where we have multiple workers and each worker does some work on the next element to be processed in the array, until we reach the end of the array. https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 3/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming When concurrency takes place, some synchronisation is required in order to access the resource that gets shared among all the existing workers (in our example, this is the array). The complexity and performance overhead involved in this approach could be very significant in some cases; we’ll try to demonstrate this later on in this article. Parallelism On the other hand, parallelism is the situation where, in order to solve a problem, we decide to take a “ Divide and Conquer” approach and split the problem into multiple small problems. This allows us to solve multiple smaller problems in parallel. How would it look like using the same example we’ve shown above? Let’s imagine that we have four parallel workers. A parallel solution would be as shown below: https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 4/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming As you can see, the difference is substantial; we have now four smaller tasks that can be solved independently. Knowing this, we can affirm that the elements get processed sequentially by each worker! When each worker has completed the task, we can then combine their results to produce one single and final result. The main benefit of this is that we don’t need to synchronise the access of the workers to a shared resource; now, each worker has its independent chunk of work to process. I hope the difference between these two is clear, but what’s left? We have one more case, which is quite obvious: sequential processing. In a standard sequential process, we’d be processing our task in sequential order with a single worker. This is the more common approach, and in some cases, it could surprisingly be the fastest! If you need to have a better understanding about concurrency and multi-threading and why we do things in the way we do them nowadays, I’d recommend that you read the book “Java Concurrency in Practice” by Brian Goetz. How Can We Solve It? https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 5/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming If we compare these three approaches in terms of performance, we can almost guarantee that in most cases, the concurrent approach will be less performant by far. Why is that? As we mentioned before, synchronisation between threads to access shared resources causes contention, which affects performance. Let’s go through an example to understand the reasons for this performance impact! Concurrent Approach The problem we’ll be showing is the addition of all the elements in a collection of integers. In the concurrent approach, we’ll be using several threads to add each pair of elements to try to speed things up; so what does it look like? First of all, we are going to create an interface to implement the solution following different approaches. https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 6/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming 1 public interface Processor { 2 Integer process(List input) throws InterruptedException; 3 } Processor.java hosted with ❤ by GitHub view raw Having that, our concurrent implementation would be: https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 7/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming 1 package com.theboreddev.concurrency; 2 3 import com.theboreddev.Processor; 4 5 import java.util.List; 6 import java.util.concurrent.ExecutorService; 7 import java.util.concurrent.Executors; 8 import java.util.concurrent.TimeUnit; 9 import java.util.concurrent.atomic.AtomicInteger; 10 import java.util.concurrent.atomic.LongAccumulator; 11 import java.util.stream.IntStream; 12 13 public class ConcurrentProcessor implements Processor { 14 private static final int THREADS = 4; 15 16 private final LongAccumulator result = new LongAccumulator(Long::sum, 0L); 17 private final AtomicInteger position = new AtomicInteger(0); 18 private final ExecutorService executor = Executors.newFixedThreadPool(THREADS); 19 20 @Override 21 public Integer process(List input) throws InterruptedException { 22 processArrayConcurrently(input.toArray(new Integer[input.size()])); 23 return result.intValue(); 24 } 25 26 private void processArrayConcurrently(Integer[] array) throws InterruptedException 27 28 final Runnable runnable = () -> { 29 while (position.intValue() < array.length) { 30 addElements(array); 31 } 32 }; 33 IntStream.rangeClosed(1, THREADS) 34.forEach(threadNumber -> executor.submit(runnable)); 35 36 executor.awaitTermination(1, TimeUnit.SECONDS); 37 } 38 39 private void addElements(Integer[] array) { 40 int current = position.getAndAdd(2); 41 final int sum = array[current] + array[current + 1]; 42 result.accumulate(sum); 43 } 44 } ConcurrentProcessor.java hosted with ❤ by GitHub view raw https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 8/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming What has been done is basically to create a fixed number of threads, where each thread will be adding a pair of elements in the array each time until the end of the array is reached. You will notice that we’ve made use of two important Java classes available in the java.util.concurrent.atomic package; we’re talking about AtomicInteger and LongAccumulator. These two classes are very helpful to synchronise the access to the current position and the final result; however, they could also have a considerable impact on the performance of a solution. Let’s see why! Atomic variables internals First of all, why have we decided to use AtomicInteger to hold our current position? Well, AtomicInteger is a good fit for those situations where a variable is going to be modified by multiple threads but more importantly, this variable is going to be read very frequently. AtomicInteger uses a combination of “ compare and swap” and volatile variables. “Compare and swap” is an improvement with respect to the use of “synchronized” blocks; however, if a thread tries to set a value that has changed, its write operation will fail and it’ll have to retry. On top of that, the use of volatile implies that in many cases when one thread access that variable, its value has to be retrieved from main memory; this is a considerable performance impact, as access to main memory is much more expensive than access to cache levels. https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 9/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming What about LongAccumulator? This type of class is useful when several threads write to a variable but it never gets read until the operation completes. How does it work? LongAccumulator cleverly utilises a non-blocking approach by allowing multiple writes in parallel to individual counters and then they all get merged at the end of the process. Something like this: https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 10/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming So after understanding how our solution works, we could affirm that the main bottleneck in this implementation would be the synchronisation of the current position in the array among the different threads. Let’s take a look now at our sequential implementation! Sequential Approach The sequential approach is quite straightforward — we just process each element sequentially using a single thread. Let’s see what it looks like: https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 11/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming 1 package com.theboreddev.sequential; 2 3 import com.theboreddev.Processor; 4 5 import java.util.List; 6 7 public class SequentialProcessor implements Processor { 8 9 @Override 10 public Integer process(List input) { 11 return input 12.stream() 13.reduce(Integer::sum) 14.orElse(0); 15 } 16 } SequentialProcessor.java hosted with ❤ by GitHub view raw We are iterating through the elements using a Java Stream and making use of reduce method to combine the elements by adding each pair. We could also use a standard Java loop and iteratively go through each element, keeping the sum in a variable. What would be the problem with this approach? The main problem is that it couldn’t be easily parallelised; it’d need some level of synchronisation using a concurrent approach to be able to get a valid result. So our sequential solution is not bad, although it could be improved in those situations where adding more threads could reduce the amount of time taken to do the job. That takes us to our last approach: the parallel approach. Parallel Approach Implementing a parallel approach used to be much more difficult in the past, but with the introduction of Java Streams since JDK 8 things have become much simpler for us. Actually, our parallel implementation will only be different in one line with respect to our sequential approach, thanks to Java Streams API: https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 12/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming 1 package com.theboreddev.parallelism; 2 3 import com.theboreddev.Processor; 4 5 import java.util.List; 6 7 public class ParallelProcessor implements Processor { 8 9 @Override 10 public Integer process(List input) { 11 return input 12.parallelStream() 13.reduce(Integer::sum) 14.orElse(0); 15 } 16 } ParallelProcessor.java hosted with ❤ by GitHub view raw Very simple, right? So do you think that a parallel execution will always be faster than a sequential execution? Probably most of you will think that definitely it will, but the truth is that it’s not that simple! Why is that? In order to be able to have effective parallel processing, we have to consider different aspects; that means that only under certain conditions, parallelising our code will actually improve the performance of our code. Let’s take a look at these aspects! https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 13/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming Parallel vs. Sequential As we have mentioned, a parallel approach is not always the fastest solution to solve a problem. There are several aspects that we have to keep in consideration before parallelising our code: Is the data big enough to make a difference? In many cases, the amount of data that we have to process is so small that a sequential execution with a single thread will be completed by the time the data is split and distributed to the threads! Is the source easily splittable? If we cannot easily split the data in an even way, this would probably cause an overhead that will make us lose all the benefits of running parallel workers when compared to sequential execution. For example, if your source is an Iterator, you won’t be able to split it easily; however, Java has introduced Spliterator since JDK 8 to have a splittable iterator. Can it be split into completely independent and isolated chunks of work? https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 14/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming Sometimes each step in a step is dependent on the result from the previous step, creating an interdependency that will be impossible to break. These tasks are inherently sequential, and in these cases, it’ll be absolutely pointless to run our task in parallel because we won’t see any benefit. Is merging results cheap or expensive? It won’t be the same having to merge integers as having to merge multiple complex trees generated by different threads; that would be very expensive. We have to always consider this, as we could get unexpected results when parallelising our code. Do we have to keep the order in which we process the elements? In those cases where we have to keep the order in which we receive the elements, our parallelisation options will be very limited. Knowing that, we should know that Java Streams are able to do optimisations when it knows that data is unordered, as we explained in my article “Understanding Java Streams” about how streams use flags to do optimisations. One way to take advantage of that is to use unordered() to express that we don’t care about the order; that will allow optimisations for many short-circuiting operations like findFirst(), findAny() or limit(). Cache misses The way we implement our code could have a significant impact on performance; for example, the use of array-based structures favours the allocation in CPU caches. Our hardware will allocate a chunk of the array when we access the first element, because most of the time, we’ll be accessing more elements in the array. This has a very beneficial impact on performance, as cache access is much faster than main memory access. In the case of parallel executions, if we constantly get cache misses, our threads will be blocked waiting for data to be served from memory; this would mean that we’d lose part of the benefit of running parallel threads. https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 15/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming With Java streams, parallel execution is ALMOST magic, but it’s not as simple as it looks like! OK, so these are some of the things that you’ll have to keep in mind when using parallel executions! Let’s see now some performance results for our implementations to see if all the theory we’ve gone through does actually make sense. Performance Comparison In this section, we’ll be showing the results of some benchmark executions against our implementations to check if that matches what we’ve studied so far. JMH has been used to easily create a benchmark test; we have selected to run the “average time” benchmark among the different types of benchmark available in JMH. In our first run, we’re going to select a size of 1,000,000 elements; based on the results, we’ll see what to do next. First benchmark: 1,000,000 elements The source code to understand how these tests have been run can be found here. Basically, we’ve run two warmup benchmark iterations to give enough time to JIT compiler to do optimisations, and then we’ve taken the results on the third iteration. So, let’s get to the point — what were the results? As expected, the concurrent approach was by far the worst performant, taking 1 second and 38 ms on average. On the other hand, the parallel and sequential https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 16/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming approach had similar results. The sequential approach was the winner by more than 2 ms! As we mentioned earlier, it’s not that easy to write efficient parallel code; I’m thinking that one of the reasons could be that data was not big enough. Let’s run our benchmark again with 10,000,000 instead. Second benchmark: 10,000,000 elements The results to add 10,000,000 elements were: In this case, the clear winner is the parallel approach! We’ve also noticed that the advantage with respect to the concurrent approach has been reduced considerably. So we’ve had to process as much as 10,000,000 elements to notice a significant improvement with parallel processing; however, this is just a very simple example and other aspects could be taken intoOpen consideration in app and further improvements could be made. That’s it from me! I really hope you’ve learned something this time! Conclusion We’ve learned that although Java Streams make it really easy to switch our code to be executed in parallel, things are not always as simple as that. The best thing to do is to always run performance tests when performance is among our business requirements, but please don’t fall into premature optimisations. If making your code faster has no real business benefit in your organisation, then don’t waste your time and effort. https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 17/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming One more thing that always helps is to have a good understanding of how every layer works, from the CPU to the very top of our application’s source code. Having an understanding of the internals and the effects that this could have on performance always helps to anticipate possible issues in terms of performance. A new concurrency model in Java Things have changed considerably in the last few years in terms of how we write code in concurrent models. In the past… codeburst.io Combining multiple API calls with CompletableFuture https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 18/27 19.09.23, 17:15 When Parallelism Beats Concurrency | by The Bored Dev | Better Programming After having had an introduction to the new concurrency paradigm in Java in my previous article “ A new concurrency… medium.com Thank you very much for reading! Programming Java Concurrency Software Development Software Engineering Follow Written by The Bored Dev 1.5K Followers · Writer for Better Programming 💻 More from The Bored Dev and Better Programming https://betterprogramming.pub/when-parallelism-beats-concurrency-5f52d7012944 19/27