7 Memory Management

After reading the previous chapters, it should no longer come as a surprise that the way we handle memory can have a huge impact on performance. The CPU spends a lot of time shuffling data between the CPU registers and the main memory (loading and storing data to and from the main memory). As shown in Chapter 4, Data Structures, the CPU uses memory caches to speed up access to memory, and programs need to be cache-friendly in order to run quickly.

This chapter will reveal more aspects of how computers work with memory so that you know which things must be considered when tuning memory usage. In addition, this chapter covers:

- Automatic memory allocation and dynamic memory management.
- The life cycle of a C++ object and how to manage object ownership.
- Efficient memory management. Sometimes, there are hard memory limits that force us to keep our data representation compact, and sometimes, we have plenty of memory available but need the program to go faster by making memory management more efficient.
- How to minimize dynamic memory allocations. Allocating and deallocating dynamic memory is relatively expensive and, at times, we need to avoid unnecessary allocations to make the program run faster.

We will start this chapter by explaining some concepts that you need to understand before we dig deeper into C++ memory management. This introduction will explain virtual memory and virtual address spaces, stack memory versus heap memory, paging, and swap space.

Computer memory

The physical memory of a computer is shared among all the processes running on a system. If one process uses a lot of memory, the other processes will most likely be affected. But from a programmer's perspective, we usually don't have to bother about the memory that is being used by other processes. This isolation of memory is due to the fact that most operating systems today are virtual memory operating systems, which provide the illusion that a process has all the memory for itself. Each process has its own virtual address space.

The virtual address space

Addresses in the virtual address space that programmers see are mapped to physical addresses by the operating system and the memory management unit (MMU), which is a part of the processor. This mapping or translation happens each time we access a memory address. This extra layer of indirection makes it possible for the operating system to use physical memory for the parts of a process that are currently being used, and back up the rest of the virtual memory on disk. In this sense, we can look at the physical main memory as a cache for the virtual memory space, which resides on secondary storage. The areas of the secondary storage that are used for backing up memory pages are usually called swap space, swap file, or simply pagefile, depending on the operating system.

Virtual memory makes it possible for processes to have a virtual address space bigger than the physical address space, since virtual memory that is not in use does not have to occupy physical memory.

Memory pages

The most common way to implement virtual memory today is to divide the address space into fixed-size blocks called memory pages. When a process accesses memory at a virtual address, the operating system checks whether the memory page is backed by physical memory (a page frame). If the memory page is not mapped in the main memory, a hardware exception occurs, and the page is loaded from disk into memory. This type of hardware exception is called a page fault.
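Page faults can be observed from user code. On POSIX systems (Linux and macOS), a process can query the page size with sysconf() and read its own fault counters with getrusage(); here is a minimal sketch:

#include <sys/resource.h>
#include <unistd.h>
#include <iostream>

int main() {
  // The size of a memory page, in bytes (typically 4096 or 16384)
  std::cout << "Page size: " << sysconf(_SC_PAGESIZE) << " bytes\n";

  // ru_minflt counts faults serviced without disk I/O;
  // ru_majflt counts faults that required loading a page from disk
  rusage usage{};
  if (getrusage(RUSAGE_SELF, &usage) == 0) {
    std::cout << "Minor page faults: " << usage.ru_minflt << '\n'
              << "Major page faults: " << usage.ru_majflt << '\n';
  }
}

A steadily climbing major fault count while a program runs is a warning sign that its working set no longer fits in physical memory.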
A page fault is not an error but a necessary interrupt that loads data from disk into memory. As you may have guessed, though, this is very slow compared to reading data that is already resident in memory.

When there are no more available page frames in the main memory, a page frame has to be evicted. If the page to be evicted is dirty, that is, it has been modified since it was last loaded from disk, it needs to be written to disk before it can be replaced. This mechanism is called paging. If the memory page has not been modified, the memory page is simply evicted.

Not all operating systems that support virtual memory support paging. iOS, for example, does have virtual memory, but dirty pages are never stored on disk; only clean pages can be evicted from memory. If the main memory is full, iOS will start terminating processes until there is enough free memory again. Android uses a similar strategy. One reason for not writing memory pages back to the flash storage of mobile devices is that it drains the battery, and it also shortens the lifespan of the flash storage itself.

The following diagram shows two running processes. They both have their own virtual memory space. Some of the pages are mapped to the physical memory, while some are not. If process 1 needs to use memory in the memory page that starts at address 0x1000, a page fault will occur. The memory page will then be mapped to a vacant memory frame. Also, note that the virtual memory addresses are not the same as the physical addresses. The first memory page of process 1, which starts at the virtual address 0x0000, is mapped to a memory frame that starts at the physical address 0x4000.

Figure 7.1: Virtual memory pages, mapped to memory frames in physical memory. Virtual memory pages that are not in use do not have to occupy physical memory.

Thrashing

Thrashing can happen when a system runs low on physical memory and is, therefore, constantly paging. Whenever a process gets time scheduled on the CPU, it tries to access memory that has been paged out. Loading new memory pages means that other pages first have to be stored on disk. Moving data back and forth between disk and memory is usually very slow; in some cases, this more or less stalls the computer, since the system spends all its time paging. Looking at the system's page fault frequency is a good way to determine whether the program has started thrashing.

Knowing the basics of how memory is being handled by the hardware and the OS is important when optimizing performance. Next, we will see how memory is handled during the execution of a C++ program.

Process memory

The stack and the heap are the two most important memory segments in a C++ program. There is also static storage and thread local storage, but we will talk more about that later.

Actually, to be formally correct, C++ doesn't talk about stack and heap; instead, it talks about the free store, storage classes, and the storage duration of objects. However, since the concepts of stack and heap are widely used in the C++ community, and all the implementations of C++ that we are aware of use a stack to implement function calls and manage the automatic storage of local variables, it is important to understand what stack and heap are. In this book, I will also use the terms stack and heap rather than the storage duration of objects. I will use the terms heap and free store interchangeably and will not make any distinction between them.
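A quick way to see the two areas for yourself is to print the address of a local variable next to the address of a heap allocation. The exact addresses vary between platforms and runs, but the two typically land far apart in the virtual address space. A minimal sketch:

#include <iostream>
#include <memory>

int main() {
  auto on_stack = 0;                        // automatic storage
  auto on_heap = std::make_unique<int>(0);  // dynamic storage
  std::cout << "stack: " << &on_stack << '\n';
  std::cout << "heap:  " << on_heap.get() << '\n';
}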
Both the stack and the heap reside in the process' virtual memory space. The stack is a place where all the local variables reside; this also includes arguments to functions. The stack grows each time a function is called and contracts when a function returns. Each thread has its own stack and, hence, stack memory can be considered thread-safe. The heap, on the other hand, is a global memory area that is shared among all the threads in a running process. The heap grows when we allocate memory with new (or the C library functions malloc() and calloc()) and contracts when we free the memory with delete (or free()). Usually, the heap starts at a low address and grows in an upward direction, whereas the stack starts at a high address and grows in a downward direction. Figure 7.2 shows how the stack and heap grow in opposite directions in a virtual address space.

Figure 7.2: An address space of a process. The stack and the heap grow in opposite directions.

The next sections will provide more details about the stack and the heap, and also explain when we are using each of these memory areas in the C++ programs we write.

Stack memory

The stack differs in many ways compared to the heap. Here are some of the unique properties of the stack:

- The stack is a contiguous memory block.
- It has a fixed maximum size. If a program exceeds the maximum stack size, the program will crash. This condition is called stack overflow.
- The stack memory never becomes fragmented.
- Allocating memory from the stack is (almost) always fast. Page faults are possible but rare.
- Each thread in a program has its own stack.

The code examples that follow in this section will examine some of these properties. Let's start with allocations and deallocations to get a feel for how the stack is used in a program. We can easily find out in which direction the stack grows by inspecting the address of stack-allocated data. The following example code demonstrates how the stack grows and contracts when entering and leaving functions:

void func1() {
  auto i = 0;
  std::cout << "func1(): " << std::addressof(i) << '\n';
}

void func2() {
  auto i = 0;
  std::cout << "func2(): " << std::addressof(i) << '\n';
  func1();
}

int main() {
  auto i = 0;
  std::cout << "main():  " << std::addressof(i) << '\n';
  func2();
  func1();
}

On a typical implementation, the address printed by func2() is lower than the one printed by main(), and the address printed by func1() when called from func2() is lower still, since each nested call pushes a new frame onto the downward-growing stack.

A custom memory allocator

With C++11, writing a custom allocator became much simpler; a minimal allocator now only needs to provide the following:

template <class T>
struct Alloc {
  using value_type = T;
  Alloc();
  template <class U> Alloc(const Alloc<U>&);
  auto allocate(size_t n) -> T*;
  auto deallocate(T*, size_t) -> void;
};

template <class T>
auto operator==(const Alloc<T>&, const Alloc<T>&) -> bool;
template <class T>
auto operator!=(const Alloc<T>&, const Alloc<T>&) -> bool;

It's really not that much code anymore, thanks to the improvements in C++11. The container that uses the allocator actually uses std::allocator_traits, which provides reasonable defaults if the allocator omits them. I recommend you have a look at std::allocator_traits to see what traits can be configured and what the defaults are.

By using malloc() and free(), we can quite easily implement a minimal custom allocator. Here, we will show the old and famous Mallocator, first published in a blog post by Stephan T. Lavavej. Since then, it has been updated for C++11 to make it even slimmer.
Here is how it looks:

template <class T>
struct Mallocator {
  using value_type = T;

  Mallocator() = default;

  template <class U>
  Mallocator(const Mallocator<U>&) noexcept {}

  template <class U>
  auto operator==(const Mallocator<U>&) const noexcept {
    return true;
  }
  template <class U>
  auto operator!=(const Mallocator<U>&) const noexcept {
    return false;
  }

  auto allocate(size_t n) const -> T* {
    if (n == 0) {
      return nullptr;
    }
    if (n > std::numeric_limits<size_t>::max() / sizeof(T)) {
      throw std::bad_array_new_length{};
    }
    void* const pv = malloc(n * sizeof(T));
    if (pv == nullptr) {
      throw std::bad_alloc{};
    }
    return static_cast<T*>(pv);
  }
  auto deallocate(T* p, size_t) const noexcept -> void {
    free(p);
  }
};

Mallocator is a stateless allocator, which means that the allocator instance itself doesn't have any mutable state; instead, it uses global functions for allocation and deallocation, namely malloc() and free(). A stateless allocator should always compare equal to allocators of the same type. It indicates that memory allocated with Mallocator should also be deallocated with Mallocator, regardless of the Mallocator instance. A stateless allocator is the least complicated allocator to write, but it is also limited, since it depends on the global state.

To use our arena as a stack-allocated object, we will need a stateful allocator that can reference the arena instance. Here, the arena class that we implemented really starts to make sense. Say, for example, that we want to use one of the standard containers in a function to do some processing. We know that, most of the time, we are dealing with very small amounts of data that will fit on the stack. But once we use the containers from the standard library, they will allocate memory from the heap, which, in this case, will hurt our performance.

What are the alternatives to using the stack to manage data and avoid unnecessary heap allocations? One alternative is to build a custom container that uses a variation of the small object optimization we looked at for std::string. It is also possible to use a container from Boost, such as boost::container::small_vector, which is based on LLVM's small vector. We advise you to check it out if you haven't already: http://www.boost.org/doc/libs/1_74_0/doc/html/container/non_standard_containers.html

Yet another alternative, though, is to use a custom allocator, which we will explore next. Since we already have an arena template class ready, we could simply create an instance of an arena on the stack and have a custom allocator use it for the allocations. What we then need to do is implement a stateful allocator, which can hold a reference to the stack-allocated arena object.
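The arena class itself was introduced earlier in the chapter. As a reminder of its shape, here is a minimal sketch in the spirit of Howard Hinnant's design (a simplified reconstruction, not the chapter's exact listing): a fixed-size, suitably aligned buffer that hands out memory by bumping a pointer and falls back to the heap when the buffer is exhausted.

#include <cstddef>
#include <cstdint>

template <size_t N>
class Arena {
  static constexpr size_t alignment = alignof(std::max_align_t);
public:
  Arena() noexcept : ptr_{buffer_} {}
  Arena(const Arena&) = delete;
  Arena& operator=(const Arena&) = delete;

  auto allocate(size_t n) -> std::byte* {
    const auto aligned_n = align_up(n);
    const auto available_bytes =
        static_cast<size_t>(buffer_ + N - ptr_);
    if (aligned_n <= available_bytes) {
      auto* r = ptr_;
      ptr_ += aligned_n;   // Bump the pointer; no per-block bookkeeping
      return r;
    }
    // The buffer is exhausted; fall back to the heap
    return static_cast<std::byte*>(::operator new(n));
  }

  auto deallocate(std::byte* p, size_t n) noexcept -> void {
    if (pointer_in_buffer(p)) {
      n = align_up(n);
      if (p + n == ptr_) { // Only the most recent block can be reclaimed
        ptr_ = p;
      }
    } else {
      ::operator delete(p);
    }
  }

private:
  static auto align_up(size_t n) noexcept -> size_t {
    return (n + (alignment - 1)) & ~(alignment - 1);
  }
  auto pointer_in_buffer(const std::byte* p) const noexcept -> bool {
    return std::uintptr_t(p) >= std::uintptr_t(buffer_) &&
           std::uintptr_t(p) <  std::uintptr_t(buffer_) + N;
  }
  alignas(alignment) std::byte buffer_[N];
  std::byte* ptr_{};
};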
Again, this custom allocator that we will implement is a simplified version of Howard Hinnant's short_alloc:

template <class T, size_t N>
struct ShortAlloc {
  using value_type = T;
  using arena_type = Arena<N>;

  ShortAlloc(const ShortAlloc&) = default;
  ShortAlloc& operator=(const ShortAlloc&) = default;

  ShortAlloc(arena_type& arena) noexcept : arena_{&arena} {}

  template <class U>
  ShortAlloc(const ShortAlloc<U, N>& other) noexcept
      : arena_{other.arena_} {}

  template <class U>
  struct rebind {
    using other = ShortAlloc<U, N>;
  };

  auto allocate(size_t n) -> T* {
    return reinterpret_cast<T*>(arena_->allocate(n * sizeof(T)));
  }
  auto deallocate(T* p, size_t n) noexcept -> void {
    arena_->deallocate(reinterpret_cast<std::byte*>(p), n * sizeof(T));
  }

  template <class U, size_t M>
  auto operator==(const ShortAlloc<U, M>& other) const noexcept {
    return N == M && arena_ == other.arena_;
  }
  template <class U, size_t M>
  auto operator!=(const ShortAlloc<U, M>& other) const noexcept {
    return !(*this == other);
  }
  template <class U, size_t M>
  friend struct ShortAlloc;
private:
  arena_type* arena_;
};

The allocator holds a pointer to the arena. This is the only state the allocator has. The functions allocate() and deallocate() simply forward their requests to the arena. The comparison operators ensure that two instances of the ShortAlloc type are using the same arena.

Now, the allocator and arena we implemented can be used with a standard container to avoid dynamic memory allocations. When we are working with small amounts of data, we can handle all allocations using the stack instead. Let's see an example using std::set:

int main() {
  using SmallSet =
      std::set<int, std::less<int>, ShortAlloc<int, 512>>;

  auto stack_arena = SmallSet::allocator_type::arena_type{};
  auto unique_numbers = SmallSet{stack_arena};

  // Read numbers from stdin
  auto n = int{};
  while (std::cin >> n) {
    unique_numbers.insert(n);
  }

  // Print unique numbers
  for (const auto& number : unique_numbers) {
    std::cout << number << '\n';
  }
}
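The same pattern works for other standard containers. As a sketch (the SmallVector alias below is our own illustration, not something from the standard library or the original text), the arena can back a std::vector as long as the requested blocks fit, with anything larger transparently falling back to the heap:

#include <vector>

template <class T, size_t N = 512>
using SmallVector = std::vector<T, ShortAlloc<T, N>>;

int main() {
  auto stack_arena = Arena<512>{};
  auto v = SmallVector<int>{ShortAlloc<int, 512>{stack_arena}};
  v.reserve(64);  // 64 * sizeof(int) bytes, served from the stack arena
  for (auto i = 0; i < 64; ++i) {
    v.push_back(i);
  }
}

One caveat worth keeping in mind with this design: a container holding such a stateful allocator must not outlive the arena it points to, since the allocator stores a raw pointer to the stack-allocated arena.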