ComOrg_CH04 - Memory Hierarchy - PDF

Summary

This document covers the principle of locality (also known as locality of reference) in computer systems, the key characteristics of memory systems, how locality influences the development of a memory hierarchy, and the performance implications of multiple levels of memory.

Full Transcript


# Chapter 4 The Memory Hierarchy: Locality and Performance

## Learning Objectives

After studying this chapter, you should be able to:

- Present an overview of the principle of locality.
- Describe key characteristics of a memory system.
- Discuss how locality influences the development of a memory hierarchy.
- Understand the performance implications of multiple levels of memory.

## 4.1 Principle Of Locality

One of the most important concepts related to computer systems is the principle of locality [DENN05], also referred to as the locality of reference. This principle reflects the observation that during the course of execution of a program, memory references by the processor, for both instructions and data, tend to cluster. Programs typically contain a number of iterative loops and subroutines. Once a loop or subroutine is entered, there are repeated references to a small set of instructions. Similarly, operations on tables and arrays involve access to a clustered set of data words. Over a long period of time, the clusters in use change, but over a short period of time, the processor is primarily working with fixed clusters of memory references.

We can state these observations more specifically. As we discuss in Section 4.3, for different types of memory, memory is accessed and retrieved in units of different sizes, ranging from individual words to large blocks of cache memory to much larger segments of disk memory. Denning observed that locality is based on three assertions [DENN72]:

1. During any interval of time, a program references memory locations non-uniformly. That is, some units of memory are more likely to be accessed than others.
2. As a function of time, the probability that a given unit of memory is referenced tends to change slowly. Put another way, the probability distribution of memory references across the entire memory space tends to change slowly over time.
3. The correlation between immediate past and immediate future memory reference patterns is high, and tapers off as the time interval increases.

Intuitively, the principle of locality makes sense. Consider the following line of reasoning:

1. Except for branch and call instructions, which constitute only a small fraction of all program instructions, program execution is sequential. Hence, in most cases, the next instruction to be fetched immediately follows the last instruction fetched.
2. It is rare to have a long uninterrupted sequence of procedure calls followed by the corresponding sequence of returns. Rather, a program remains confined to a rather narrow window of procedure-invocation depth. Thus, over a short period of time, references to instructions tend to be localized to a few procedures.
3. Most iterative constructs consist of a relatively small number of instructions repeated many times. For the duration of the iteration, computation is therefore confined to a small contiguous portion of a program.
4. In many programs, much of the computation involves processing data structures, such as arrays or sequences of records. In many cases, successive references to these data structures will be to closely located data items.

Numerous studies, stretching back to the early 1970s, confirm these observations; [FEIT15] provides a summary of many of them. A distinction is made in the literature between two forms of locality:
1. **Temporal locality:** Refers to the tendency of a program to reference in the near future those units of memory referenced in the recent past. For example, when an iteration loop is executed, the processor executes the same set of instructions repeatedly. Constants, temporary variables, and working stacks are other constructs that give rise to this form of locality.
2. **Spatial locality:** Refers to the tendency of a program to reference units of memory whose addresses are near one another. That is, if a unit of memory x is referenced at time t, it is likely that units in the range x - k through x + k will be referenced in the near future, for a relatively small value of k. This reflects the tendency of a processor to access instructions sequentially. Spatial locality also reflects the tendency of a program to access data locations sequentially, such as when processing a table of data.

A crude analogy may help illuminate the distinction between these two concepts (Figure 4.1).

- **Temporal locality:** Suppose that Bob is working in an office and spends much of his time dealing with documents in file folders. Thousands of folders are stored in file cabinets in the next room, and for convenience Bob has a file organizer on his desk that can hold a few dozen files. When Bob is temporarily finished working on a file, it is likely that he will need to read or write one of the documents in that file in the near future, so he keeps it in his desk organizer. This is an example of exploiting temporal locality.
- **Spatial locality:** Bob also observes that when he retrieves a folder from the filing cabinets, it is likely that in the near future he will need access to some of the nearby folders as well, so he retrieves the folder he needs plus a few folders on either side at the same time. This is an example of exploiting spatial locality.

Of course, Bob's desktop file organizer soon fills up, so that when he goes to retrieve a folder from the file cabinets, he needs to return folders from his desk. Bob needs some policy for replacing folders. If he focuses on temporal locality, Bob could choose to replace only one folder at a time, on the reasoning that he might need any of the folders currently on his desk in the near future. So Bob could replace, perhaps, the folder that had been on the desk the longest or the one that had been least recently used. If Bob focuses on spatial locality, when he needs a folder not on his desk, he could return and refile all the folders on his desk and retrieve a batch of contiguous folders that includes the one he needs plus enough nearby folders to fill up his desktop organizer.

It is likely that neither policy is optimal. In the first case, he might have to make frequent trips to the next room to get a folder he doesn't have but which is near one he does have. In the second case, he might have to make frequent trips to the next room to get a folder that he had just recently put away. So perhaps a policy of returning and retrieving in batches equal to 10% or 20% of his desktop capacity would be closer to optimal.

For cache memory, temporal locality is traditionally exploited by keeping recently used instruction and data values in cache memory and by exploiting a cache hierarchy. Spatial locality is generally exploited by using larger cache blocks and by incorporating prefetching mechanisms (fetching items of anticipated use) into the cache control logic. Over the years, there has been considerable research on refining these techniques to achieve greater performance, but the basic strategies remain the same.
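As a concrete illustration (a minimal C sketch with hypothetical names, not an example from the text), the fragment below exhibits both forms of locality: the small loop body is executed repeatedly (instruction temporal locality), the running total `sum` is reused on every iteration (data temporal locality), and the array `a` is traversed at consecutive addresses (data spatial locality).

```c
#include <stdio.h>

#define N 1000

int main(void) {
    int a[N];
    long sum = 0;                /* reused on every iteration: data temporal locality */

    for (int i = 0; i < N; i++)  /* the same few loop instructions repeat:            */
        a[i] = i;                /* instruction temporal locality                     */

    for (int i = 0; i < N; i++)
        sum += a[i];             /* consecutive addresses a[0], a[1], ...:            */
                                 /* data spatial locality                             */
    printf("sum = %ld\n", sum);
    return 0;
}
```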
## 4.2 Characteristics Of Memory Systems

The complex subject of computer memory is made more manageable if we classify memory systems according to their key characteristics. The most important of these are listed in Table 4.1.

**Table 4.1**

| Characteristic | Description |
|---|---|
| **Location** | - Internal (e.g., processor registers, cache, main memory) <br>- External (e.g., optical disks, magnetic disks, tapes) |
| **Capacity** | - Number of words <br>- Number of bytes |
| **Unit of Transfer** | - Word <br>- Block |
| **Access Method** | - Sequential <br>- Direct <br>- Random <br>- Associative |
| **Performance** | - Access time <br>- Cycle time <br>- Transfer rate |
| **Physical Type** | - Semiconductor <br>- Magnetic <br>- Optical <br>- Magneto-optical |
| **Physical Characteristics** | - Volatile/nonvolatile <br>- Erasable/nonerasable |
| **Organization** | - Memory modules |

The term **location** in Table 4.1 refers to whether memory is internal or external to the computer. Internal memory is often equated with main memory, but there are other forms of internal memory. The processor requires its own local memory, in the form of registers (e.g., see Figure 2.3). Further, as we will see, the control unit portion of the processor may also require its own internal memory. We will defer discussion of these latter two types of internal memory to later chapters. Cache is another form of internal memory. External memory consists of peripheral storage devices, such as disk and tape, that are accessible to the processor via I/O controllers.

An obvious characteristic of memory is its **capacity**. For internal memory, this is typically expressed in terms of bytes (1 byte = 8 bits) or words. Common word lengths are 8, 16, and 32 bits. External memory capacity is typically expressed in terms of bytes.

A related concept is the **unit of transfer**. For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module. This may be equal to the word length, but is often larger, such as 64, 128, or 256 bits. To clarify this point, consider three related concepts for internal memory:

- **Word:** The "natural" unit of organization of memory. The size of a word is typically equal to the number of bits used to represent an integer and to the instruction length. Unfortunately, there are many exceptions. For example, the CRAY C90 (an older model CRAY supercomputer) has a 64-bit word length but uses a 46-bit integer representation. The Intel x86 architecture has a wide variety of instruction lengths, expressed as multiples of bytes, and a word size of 32 bits.
- **Addressable units:** In some systems, the addressable unit is the word. However, many systems allow addressing at the byte level. In any case, the relationship between the length in bits A of an address and the number N of addressable units is $2^A = N$. For example, a system with 32-bit addresses can address $2^{32}$ units.
- **Unit of transfer:** For main memory, this is the number of bits read out of or written into memory at a time. The unit of transfer need not equal a word or an addressable unit. For external memory, data are often transferred in much larger units than a word, and these are referred to as blocks.

Another distinction among memory types is the method of accessing units of data. These include the following:

- **Sequential access:** Memory is organized into units of data, called records. Access must be made in a specific linear sequence. Stored addressing information is used to separate records and assist in the retrieval process. A shared read-write mechanism is used, and this must be moved from its current location to the desired location, passing and rejecting each intermediate record. Thus, the time to access an arbitrary record is highly variable. Tape units, discussed in Chapter 7, are sequential access.
- **Direct access:** As with sequential access, direct access involves a shared read-write mechanism. However, individual blocks or records have a unique address based on physical location. Access is accomplished by direct access to reach a general vicinity plus sequential searching, counting, or waiting to reach the final location. Again, access time is variable. Disk units, discussed in Chapter 6, are direct access.
- **Random access:** Each addressable location in memory has a unique, physically wired-in addressing mechanism. The time to access a given location is independent of the sequence of prior accesses and is constant. Thus, any location can be selected at random and directly addressed and accessed. Main memory and some cache systems are random access.
- **Associative:** This is a random access type of memory that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously. Thus, a word is retrieved based on a portion of its contents rather than its address. As with ordinary random-access memory, each location has its own addressing mechanism, and retrieval time is constant independent of location or prior access patterns. Cache memories may employ associative access.

From a user's point of view, the two most important characteristics of memory are capacity and performance. Three performance parameters are used:

- **Access time (latency):** For random-access memory, this is the time it takes to perform a read or write operation, that is, the time from the instant that an address is presented to the memory to the instant that data have been stored or made available for use. For non-random-access memory, access time is the time it takes to position the read-write mechanism at the desired location.
- **Memory cycle time:** This concept is primarily applied to random-access memory and consists of the access time plus any additional time required before a second access can commence. This additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively. Note that memory cycle time is concerned with the system bus, not the processor.
- **Transfer rate:** This is the rate at which data can be transferred into or out of a memory unit. For random-access memory, it is equal to 1/(cycle time). For non-random-access memory, the following relationship holds:

$$T_n = T_a + \frac{n}{R}$$

where

- $T_n$ = Average time to read or write n bits
- $T_a$ = Average access time
- $n$ = Number of bits
- $R$ = Transfer rate, in bits per second (bps)
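As a brief worked example (the device parameters here are assumed for illustration, not taken from the text), consider reading a 4 KB block ($n = 32{,}768$ bits) from a disk with an average access time $T_a = 5$ ms and a transfer rate $R = 100$ Mbps:

$$T_n = T_a + \frac{n}{R} = 5\ \text{ms} + \frac{32{,}768\ \text{bits}}{100 \times 10^{6}\ \text{bits/s}} \approx 5\ \text{ms} + 0.33\ \text{ms} \approx 5.33\ \text{ms}$$

For a transfer of this size, the access (positioning) time dominates the total, which is typical of non-random-access devices.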
A variety of physical types of memory have been employed. The most common today are semiconductor memory; magnetic surface memory, used for disk and tape; and optical and magneto-optical memory.

Several physical characteristics of data storage are important. In a **volatile memory**, information decays naturally or is lost when electrical power is switched off. In a **nonvolatile memory**, information once recorded remains without deterioration until deliberately changed; no electrical power is needed to retain information. Magnetic-surface memories are nonvolatile. Semiconductor memory (memory on integrated circuits) may be either volatile or nonvolatile. **Nonerasable memory** cannot be altered, except by destroying the storage unit. Semiconductor memory of this type is known as **read-only memory (ROM)**. Of necessity, a practical nonerasable memory must also be nonvolatile.

For random-access memory, the organization is a key design issue. In this context, organization refers to the physical arrangement of bits to form words. The obvious arrangement is not always used, as is explained in Chapter 6.

## 4.3 The Memory Hierarchy

The design constraints on a computer's memory can be summed up by three questions: How much? How fast? How expensive?

- The question of how much is somewhat open ended. If the capacity is there, applications will likely be developed to use it.
- The question of how fast is, in a sense, easier to answer. To achieve greatest performance, the memory must be able to keep up with the processor. That is, as the processor is executing instructions, we would not want it to have to pause waiting for instructions or operands.
- The final question must also be considered. For a practical system, the cost of memory must be reasonable in relationship to other components.

As might be expected, there is a trade-off among the three key characteristics of memory: capacity, access time, and cost. A variety of technologies are used to implement memory systems, and across this spectrum of technologies, the following relationships hold:

- Faster access time, greater cost per bit
- Greater capacity, smaller cost per bit
- Greater capacity, slower access time

The dilemma facing the designer is clear. The designer would like to use memory technologies that provide for large-capacity memory, both because the capacity is needed and because the cost per bit is low. However, to meet performance requirements, the designer needs to use expensive, relatively lower-capacity memories with short access times.

## Cost and Performance Characteristics

The way out of this dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy. A typical hierarchy is illustrated in Figure 4.6. As one goes down the hierarchy, the following occur:

- Decreasing cost per bit
- Increasing capacity
- Increasing access time
- Decreasing frequency of access of the memory by the processor

## 4.4 Performance Modeling Of A Multilevel Memory Hierarchy

This section provides an overview of performance characteristics of memory access in a multilevel memory hierarchy. To gain insight, we begin with a look at the simplest case of two levels, and then develop models for multiple levels.

### Two-Level Memory Access

In this chapter, reference is made to a cache that acts as a buffer between main memory and the processor, creating a two-level internal memory. In the simplest case, rarely implemented in modern systems, there is a single level of cache to interact with main memory. This two-level architecture exploits locality to provide improved performance over a comparable one-level memory.

The main memory cache mechanism is part of the computer architecture, implemented in hardware and typically invisible to the operating system. There are two other instances of a two-level memory approach that also exploit locality and that are, at least partially, implemented in the operating system: virtual memory and the disk cache. Virtual memory is explored in Chapter 9; disk cache is beyond the scope of this book but is examined in [STAL18].
In this subsection, we look at some of the performance characteristics of two-level memories that are common to all three approaches.

### Operation of Two-Level Memory

The locality property can be exploited in the formation of a two-level memory. The upper-level memory (M1) is smaller, faster, and more expensive (per bit) than the lower-level memory (M2). M1 is used as a temporary store for part of the contents of the larger M2. When a memory reference is made, an attempt is made to access the item in M1. If this succeeds, then a quick access is made. If not, then a block of memory locations is copied from M2 to M1 and the access then takes place via M1. Because of locality, once a block is brought into M1, there should be a number of accesses to locations in that block, resulting in fast overall service.

To express the average time to access an item, we must consider not only the speeds of the two levels of memory, but also the probability that a given reference can be found in M1. We have:

$$T_s = H \times T_1 + (1-H) \times (T_1 + T_2) = T_1 + (1-H) \times T_2$$ (4.2)

where

- $T_s$ = average (system) access time
- $T_1$ = access time of M1 (e.g., cache)
- $T_2$ = access time of M2 (e.g., main memory)
- $H$ = hit ratio (fraction of time reference is found in M1)

Figure 4.8 shows average access time as a function of hit ratio. As can be seen, for a high percentage of hits, the average total access time is much closer to that of M1 than M2.

### Performance

Let us look at some of the parameters relevant to an assessment of a two-level memory mechanism. First consider cost. We have:

$$C_s = \frac{C_1 S_1 + C_2 S_2}{S_1 + S_2}$$ (4.3)

where

- $C_s$ = average cost per bit for the combined two-level memory
- $C_1$ = average cost per bit of upper-level memory M1
- $C_2$ = average cost per bit of lower-level memory M2
- $S_1$ = size of M1
- $S_2$ = size of M2

We would like $C_s \approx C_2$. Given that $C_1 \gg C_2$, this requires $S_1 \ll S_2$. Figure 4.11 shows the relationship.

Next, consider access time. For a two-level memory to provide a significant performance improvement, we need to have $T_s$ approximately equal to $T_1$ ($T_s \approx T_1$). Given that $T_1$ is much less than $T_2$ ($T_1 \ll T_2$), a hit ratio of close to 1 is needed.

So we would like M1 to be small to help hold down cost, and large to improve the hit ratio and therefore the performance. Is there a size of M1 that satisfies both requirements to a reasonable extent? We can answer this question with a series of subquestions:

- What value of hit ratio is needed so that $T_s \approx T_1$?
- What size of M1 will assure the needed hit ratio?
- Does this size satisfy the cost requirement?

To get at this, consider the quantity $T_1 / T_s$, which is referred to as the access efficiency. It is a measure of how close the average access time ($T_s$) is to the M1 access time ($T_1$). From Equation (4.2),

$$\frac{T_1}{T_s} = \frac{1}{1 + (1-H)\dfrac{T_2}{T_1}}$$ (4.4)

Figure 4.12 plots $T_1/T_s$ as a function of the hit ratio H, with the quantity $T_2/T_1$ as a parameter. Typically, on-chip cache access time is about 25 to 50 times faster than main memory access time (i.e., $T_2/T_1$ is 25 to 50), off-chip cache access time is about 5 to 15 times faster than main memory access time (i.e., $T_2/T_1$ is 5 to 15), and main memory access time is about 1000 times faster than disk access time ($T_2/T_1$ = 1000). Thus, a hit ratio in the neighborhood of 0.9 would seem to be needed to satisfy the performance requirement.
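To make Equations (4.2) through (4.4) concrete, here is a minimal C sketch that evaluates them for one hypothetical cache/main-memory pair; all of the parameter values are illustrative assumptions, not figures from the text.

```c
#include <stdio.h>

int main(void) {
    /* Assumed illustrative parameters; not values from the text. */
    double T1 = 1e-9;      /* M1 (cache) access time: 1 ns            */
    double T2 = 50e-9;     /* M2 (main memory) access time: 50 ns     */
    double H  = 0.95;      /* hit ratio                               */
    double C1 = 1e-2;      /* M1 cost per bit, in dollars             */
    double C2 = 1e-5;      /* M2 cost per bit, in dollars             */
    double S1 = 256e3;     /* M1 size, in bits                        */
    double S2 = 8e9;       /* M2 size, in bits                        */

    double Ts  = T1 + (1.0 - H) * T2;                /* Equation (4.2) */
    double Cs  = (C1 * S1 + C2 * S2) / (S1 + S2);    /* Equation (4.3) */
    double eff = 1.0 / (1.0 + (1.0 - H) * T2 / T1);  /* Equation (4.4) */

    printf("Average access time Ts  = %.2f ns\n", Ts * 1e9);
    printf("Combined cost per bit   = %.2e dollars/bit\n", Cs);
    printf("Access efficiency T1/Ts = %.3f\n", eff);
    return 0;
}
```

With these assumptions, $T_s$ = 3.5 ns, $C_s$ is within a few percent of $C_2$, and the access efficiency is only about 0.29, illustrating why a hit ratio well above 0.95 is needed when $T_2/T_1$ is large.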
We can now phrase the question about relative memory size more exactly. Is a hit ratio of, say, 0.8 or better reasonable for $S_1 \ll S_2$? This will depend on a number of factors, including the nature of the software being executed and the details of the design of the two-level memory. The main determinant is, of course, the degree of locality. Figure 4.13 suggests the effect that locality has on the hit ratio. Clearly, if M1 is the same size as M2, then the hit ratio will be 1.0: all of the items in M2 are always also stored in M1. Now suppose that there is no locality; that is, references are completely random. In that case, the hit ratio should be a strictly linear function of the relative memory size. For example, if M1 is half the size of M2, then at any time half of the items from M2 are also in M1 and the hit ratio will be 0.5. In practice, however, there is some degree of locality in the references. The effects of moderate and strong locality are indicated in the figure. Note that Figure 4.13 is not derived from any specific data or model; the figure suggests the type of performance that is seen with various degrees of locality.

So if there is strong locality, it is possible to achieve high values of hit ratio even with relatively small upper-level memory size. For example, numerous studies have shown that rather small cache sizes will yield a hit ratio above 0.75 regardless of the size of main memory (e.g., [AGAR89], [PRZY88], [STRE83], and [SMIT82]). A cache in the range of 1K to 128K words is generally adequate, whereas main memory is now typically in the gigabyte range. When we consider virtual memory and disk cache, we will cite other studies that confirm the same phenomenon, namely that a relatively small M1 yields a high value of hit ratio because of locality.

This brings us to the last question listed earlier: Does the relative size of the two memories satisfy the cost requirement? The answer is clearly yes. If we need only a relatively small upper-level memory to achieve good performance, then the average cost per bit of the two levels of memory will approach that of the cheaper lower-level memory.
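The claim above, that completely random references make the hit ratio a strictly linear function of relative memory size, can be checked with a small simulation. The C sketch below is an illustrative model only (uniformly random block references, random replacement on a miss, and hypothetical sizes S1 and S2); the measured hit ratio converges to S1/S2, here 0.25. Any reuse or sequential structure in the reference stream pushes the hit ratio above this linear baseline, which is the behavior sketched qualitatively in Figure 4.13.

```c
#include <stdio.h>
#include <stdlib.h>

#define S2   1000        /* number of blocks in M2                        */
#define S1   250         /* number of block frames in M1                  */
#define REFS 1000000     /* number of simulated memory references         */

int main(void) {
    int in_m1[S2] = {0};  /* 1 if the block is currently resident in M1   */
    int frames[S1];       /* which M2 block occupies each M1 frame        */
    long hits = 0;

    for (int f = 0; f < S1; f++) {   /* initially fill M1 with blocks 0..S1-1 */
        frames[f] = f;
        in_m1[f] = 1;
    }
    for (long r = 0; r < REFS; r++) {
        int block = rand() % S2;     /* uniformly random reference: no locality */
        if (in_m1[block]) {
            hits++;
        } else {                     /* miss: evict a randomly chosen frame     */
            int victim = rand() % S1;
            in_m1[frames[victim]] = 0;
            frames[victim] = block;
            in_m1[block] = 1;
        }
    }
    printf("measured hit ratio = %.3f (S1/S2 = %.3f)\n",
           (double)hits / REFS, (double)S1 / S2);
    return 0;
}
```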
### Multilevel Memory Access

(I would like to thank Professor Roger Kieckhafer of Michigan Technological University for permission to use his lecture notes in developing this section.)

This subsection develops a model for memory access performance in a memory hierarchy that has more than two levels. The following terminology is used:

- $M_i$ = memory level i, where 1 ≤ i ≤ n, with n levels of memory
- $S_i$ = size, or capacity, of level $M_i$ (bytes)
- $t_i$ = total time needed to access data in level $M_i$; this is the sum of all times in the path to a hit in level $M_i$, and may be an average value
- $h_i$ = hit ratio of level $M_i$; the conditional probability that the data for a memory access is resident in level $M_i$, given that it is not resident in $M_{i-1}$
- $T_s$ = mean time needed to access data

$T_s$ is the performance metric that is of most interest. It measures the average time to access data, regardless of which level of the hierarchy needs to be accessed at the time of the access request. The higher the hit ratio at each level, the lower will be the value of $T_s$. Ideally, we would like $h_1$ to be very close to 1.0, in which case $T_s$ will be very close to $t_1$.

Figure 4.14 is a flowchart that provides a simplified memory access model for a memory hierarchy, which we can use to develop a formula for the average access time. It can be described as follows:

1. The processor generates a memory address request. The first step is to determine if the cache block containing that address resides in the L1 cache (memory level $M_1$). The probability of that is $h_1$. If the desired word is present in the cache, that word is delivered to the processor. The average time required for this operation is $t_1$.
2. For subsequent levels i, 2 ≤ i < n, if the addressed word is not found in $M_{i-1}$, then the memory management hardware checks to determine if it is in $M_i$, which occurs with a probability of $h_i$. If the desired word is present in $M_i$, the word is delivered from $M_i$ to the processor, and the appropriate size block containing the word is copied to $M_{i-1}$. The average time for this operation is $t_i$.
3. In the typical memory hierarchy, $M_n$ is a disk used in a virtual memory scheme. In this case, if the word is not found in any of the preceding levels and is found in $M_n$, then the word must first be moved as part of a page into main memory ($M_{n-1}$), from where it can be transferred to the processor. We designate the total time for this operation as $t_n$.

Each of the $t_i$ consists of a number of components, including checking for the presence of the required word in level i, accessing the data if it is in level i, and transporting the data to the processor. The total value of $t_i$ must also include the time spent checking for a hit on all previous levels and experiencing a miss. If we designate the time expended in determining a miss at level i as $t_{mis,i}$, then $t_i$ must include $t_{mis,1} + t_{mis,2} + \cdots + t_{mis,i-1}$. In addition, Figure 4.14 indicates that the process of accessing memory and delivering a word to the processor is performed in parallel with the process of copying the appropriate block of data to the preceding level in the hierarchy. If the two operations are performed in sequence, then the extra time involved is added to $t_i$.

Looking at Figure 4.14, there are a number of different paths from start to finish. The average time $T_s$ can be expressed as the weighted average of the time of each path:

$$T_s = \sum_{\text{all paths}} [\text{Probability of taking a path} \times \text{Duration of that path}]$$ (4.5)
$$= \sum_{\text{all paths}} \left[\prod(\text{All probabilities in the path}) \times \sum(\text{All times in that path})\right]$$
$$= \sum_{i=1}^{n} \left[ h_i \, t_i \prod_{j=0}^{i-1}(1-h_j) \right]$$

where $h_0$ is assigned the value 0. For example, consider a simple system consisting of one level of cache ($M_1$), main memory ($M_2$), and secondary memory ($M_3$). Then,

$$T_s = h_1 \times t_1 + (1-h_1)h_2 \times t_2 + (1-h_1)(1-h_2) \times t_3$$

Note that Equation (4.5) works whether the time delays for a given path are constants or variables. If the time delays are constant, then $t_i$ is a constant equal to the sum of all the time delays (e.g., checking for presence, data access, and delivery to the CPU). If one or more of the elements in the total time delay are variable, then $t_i$ is the mean time delay, calculated as the sum of the mean time delays of the component delays.

To use this model in designing a memory hierarchy, estimates are needed for the $h_i$ and $t_i$. These can be developed either by simulation or by setting up a real system and varying the sizes of the various $M_i$.
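As a minimal numerical illustration of Equation (4.5), the C sketch below evaluates the three-level example just given; the hit ratios and access times are assumed values chosen only to show the shape of the calculation, not figures from the text.

```c
#include <stdio.h>

int main(void) {
    /* Assumed illustrative parameters for a three-level hierarchy      */
    /* (L1 cache, main memory, disk); not values taken from the text.   */
    double h[] = {0.95, 0.999, 1.0};     /* h1, h2, h3 (last level always hits)  */
    double t[] = {2e-9, 60e-9, 10e-3};   /* t1, t2, t3 in seconds                */
    const int n = 3;

    double Ts = 0.0;
    double miss_so_far = 1.0;            /* running product of (1 - h_j), j < i  */
    for (int i = 0; i < n; i++) {
        Ts += miss_so_far * h[i] * t[i]; /* term for level i+1 in Equation (4.5) */
        miss_so_far *= (1.0 - h[i]);
    }
    printf("Average access time Ts = %.1f ns\n", Ts * 1e9);  /* about 505 ns */
    return 0;
}
```

Even with $h_2 = 0.999$, the disk term dominates $T_s$ (about 500 ns here versus roughly 5 ns from the first two terms), which is why miss rates at the main-memory level of a virtual memory system must be kept extremely small.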
## 4.5 Key Terms, Review Questions, and Problems

### Key Terms

- Access time
- Addressable unit
- Associative memory
- Auxiliary memory
- Cache memory
- Coherence
- Data spatial locality
- Data temporal locality
- Direct access
- Dynamic instruction
- Hit ratio
- Horizontal coherence
- Inclusion
- Instruction cache
- Instruction spatial locality
- Instruction temporal locality
- L1 cache
- L2 cache
- L3 cache
- L4 cache
- Locality
- Locality of reference
- Memory cycle time
- Memory hierarchy
- Multilevel cache
- Multilevel memory
- Random access
- Secondary memory
- Sequential access
- Spatial locality
- Static instruction
- Temporal locality
- Transfer rate
- Unit of transfer
- Vertical coherence
- Word

### Review Questions

4.1 What are the differences among sequential access, direct access, and random access?
4.2 What is the general relationship among access time, memory cost, and capacity?
4.3 How does the principle of locality relate to the use of multiple memory levels?
4.4 What is the distinction between spatial locality and temporal locality?
4.5 In general, what are the strategies for exploiting spatial locality and temporal locality?
4.6 How do data locality and instruction locality relate to spatial locality and temporal locality?

### Problems

4.1 Consider these terms: instruction spatial locality, instruction temporal locality, data spatial locality, data temporal locality. Match each of these terms to one of the following definitions:
a. Locality is quantified by computing the average distance (in terms of number of operand memory accesses) between two consecutive accesses to the same address, for every unique address in the program. The evaluation is performed in four distinct window sizes, analogous to cache block sizes.
b. Locality metric is quantified by computing the average distance (in terms of number of instructions) between two consecutive accesses to the same static instruction, for every unique static instruction in the program that is executed at least twice.
c. Locality for operand memory accesses is characterized by the ratio of the locality metric for the window sizes mentioned in (a).
d. Locality is characterized by the ratio of the locality metric for the window sizes mentioned in (b).

4.2 Consider these two programs:

Program A:
```
for (i = 1; i < n; i++) {
    Z[i] = X[i] * Y[i];
    Z[i] = Z[i] * Z[i];
}
```

Program B:
```
for (i = 1; i < n; i++) {
    Z[i] = X[i] * Y[i];
}
for (i = 1; i < n; i++) {
    Z[i] = Z[i] * Z[i];
}
```

a. The two programs perform the same function. Describe it.
b. Which version performs better, and why?

4.3 Consider the following code:
```
for (i = 0; i < 20; i++) {
    for (j = 0; j < 10; j++) {
        a[i] = a[i] * j;
    }
}
```
a. Give one example of the spatial locality in the code.
b. Give one example of the temporal locality in the code.

4.4 Consider a memory system with the following parameters:
- $T_c$ = 100 ns
- $T_m$ = 1200 ns
- $C_m$ = 10⁻⁴ $/bit
- $C_c$ = 10⁻⁵ $/bit

a. What is the cost of 1 MB of main memory?
b. What is the cost of 1 MB of main memory using cache memory technology?
c. If the effective access time is 10% greater than the cache access time, what is the hit ratio $H$?

4.5 Consider an L1 cache with an access time of 1 ns and a hit ratio of $H$ = 0.95. Suppose that we can change the cache design (size of cache, cache organization) such that we increase $H$ to 0.97, but increase access time to 1.5 ns. What conditions must be met for this change to result in improved performance? Explain why this result makes intuitive sense.
4.6 Consider a single-level cache with an access time of 2.5 ns, a block size of 64 bytes, and a hit ratio of $H$ = 0.95. Main memory uses a block transfer capability that has a first-word (4 bytes) access time of 50 ns and an access time of 5 ns for each word thereafter.
a. What is the access time when there is a cache miss? Assume that the cache waits until the line has been fetched from main memory and then re-executes for a hit.
b. Suppose that increasing the block size to 128 bytes increases $H$ to 0.97. Does this reduce the average memory access time?

4.7 A computer has a cache, main memory, and a disk used for virtual memory. If a referenced word is in the cache, 20 ns are required to access it. If it is in main memory but not in the cache, 60 ns are needed to load it into the cache, and then the reference is started again. If the word is not in main memory, 12 ms are required to fetch the word from the disk, followed by 60 ns to copy it to the cache, and then the reference is started again. The cache hit ratio is 0.9 and the main memory hit ratio is 0.6. What is the average time in nanoseconds required to access a referenced word on this system?

4.8 On the Motorola 68020 microprocessor, a cache access takes two clock cycles. Data access from main memory over the bus to the processor takes three clock cycles in the case of no wait state insertion; the data are delivered to the processor in parallel with delivery to the cache.
a. Calculate the effective length of a memory cycle given a hit ratio of 0.9 and a clocking rate of 16.67 MHz.
b. Repeat the calculations assuming insertion of two wait states of one cycle each per memory cycle. What conclusion can you draw from the results?

4.9 Assume a processor having a memory cycle time of 300 ns and an instruction processing rate of 1 MIPS. On average, each instruction requires one bus memory cycle for instruction fetch and one for the operand it involves.
a. Calculate the utilization of the bus by the processor.
b. Suppose that the processor is equipped with an instruction cache and the associated hit ratio is 0.5. Determine the impact on bus utilization.

4.10 The performance of a single-level cache system for a read operation can be characterized by the following equation:

$$T_a = T_c + (1-H)T_m$$

where $T_a$ is the average access time, $T_c$ is the cache access time, $T_m$ is the memory access time (memory to processor register), and $H$ is the hit ratio. For simplicity, we assume that the word in question is loaded into the cache in parallel with the load to the processor register. This is the same form as Equation (4.2).
a. Define $T_w$ = time to transfer a block between cache and main memory
