Questions and Answers
Which memory level in the memory hierarchy has the fastest access time?
- L1 Cache (correct)
- Main Memory
- Local Secondary Storage
- L3 Cache
Remote secondary storage, like web servers, is generally faster and more expensive per byte than main memory.
False
In a cache system, what is the purpose of the 'valid' bit?
To indicate whether the data in the cache line is valid or not.
In a direct-mapped cache, if the tag does not match, the old line is ________.
Match the cache types with their descriptions:
Which type of cache miss occurs when the cache is initially empty?
In a write-through cache, data is only written to the cache and not to the main memory.
What is 'thrashing' in the context of cache memory?
The number of sets in a cache is determined by the ________ bits in the memory address.
Match the terms related to cache memory with their definitions:
What is the primary advantage of using a set-associative cache over a direct-mapped cache?
Spatial locality refers to the tendency of a program to access the same memory location repeatedly over a short period.
Briefly explain the difference between a 'write-through' and a 'write-back' cache.
In the context of cache writes, ________ means to defer writing to the next level until the line is evicted.
Match the cache write policies with their descriptions:
What is the purpose of the Least Recently Used (LRU) replacement policy in a cache?
Increasing the associativity of a cache always reduces the miss rate.
Explain how conflict misses can occur even when a cache is not full. Provide an example.
In a direct-mapped cache with a block size of 8 bytes, the least significant 3 bits of the address are used as the ________.
Match the following cache concepts with their impact on cache performance:
A program exhibits poor spatial locality. Which of the following cache optimizations would be MOST effective in reducing the miss rate?
A 'dirty bit' in a cache line is used to indicate that the data in the cache line is inconsistent with main memory and needs to be written back.
Explain the concept of cache-friendly code and provide one specific example of how to write code that improves cache performance.
When the set of active cache blocks is larger than the cache itself, a ________ miss occurs.
Match the cache hierarchy levels with their typical characteristics:
In a multi-core processor, what is the main advantage of having a shared L3 cache?
Write-allocate is beneficial when there are few writes to a particular memory location.
What is the significance of 'stride-1 reference patterns' in the context of cache performance?
In E-way set associative caches, an associative memory is an array of ________ and ________ pairs.
Match the cache access outcomes with their corresponding actions:
Consider the following code snippet:
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        a[j][i] += 1;
    }
}
Assuming a is a large 2D array, what can be done to improve the cache performance of this code?
In the memory hierarchy, CPU registers hold words retrieved from the L2 cache.
What is the purpose of cache simulation?
In a direct-mapped cache, the number of lines per set is always _____.
Match the following cache terminology with its corresponding description:
Consider a direct-mapped cache with 4 sets and a block size of 2 bytes. Given the following memory access sequence: 0, 1, 7, 8. Which accesses result in a cache miss?
Temporal locality in cache memory refers to accessing data elements within close physical proximity in memory.
Why is it important to choose the correct cache block size?
To improve the efficiency of matrix multiplication, loops should be arranged to enhance data ________.
Match the cache types with their eviction policies:
Assuming an 8-bit memory address, what is the set index for address 0001 1100 if there are 2 sets and a block size of 16 bytes?
A cache that has 2 blocks and 4 sets is how many way set associative?
Flashcards
Memory Hierarchy
A hierarchical system that utilizes multiple levels of memory with varying speeds and costs to optimize performance.
CPU Registers
Small, fast storage locations within the CPU used to hold data and instructions for immediate use.
Cache Memory
A small, fast memory component that stores frequently accessed data to reduce the average time to access memory.
Main Memory (DRAM)
Larger, slower DRAM-based memory that holds disk blocks retrieved from local secondary storage.
Local Secondary Storage
Local disks that hold files retrieved from disks on remote servers.
Remote Secondary Storage
Distant storage such as Web servers; the largest, slowest, and cheapest level of the hierarchy.
Direct-Mapped Cache
A cache with exactly one line per set (E = 1).
Lines per Set (E)
The number of cache lines in each set; E > 1 gives a set-associative cache.
Cache Block Size (B)
The number of bytes of data stored in each cache block.
Valid Bit
A per-line bit indicating whether the data in the cache line is valid.
Tag (Cache)
The upper address bits stored with a cache line and compared on each access to detect a hit.
Set Index
The address bits that select which set of the cache to search.
Block Offset
The address bits that locate the desired data within a cache block.
Cold (Compulsory) Miss
A miss that occurs because the cache starts out empty.
Conflict Miss
A miss caused by multiple data objects mapping to the same cache set, even when the cache is not full.
Capacity Miss
A miss that occurs when the working set of active blocks is larger than the cache.
Thrashing
Repeated eviction and replacement of the contents of a particular cache set.
Set-Associative Cache
A cache with more than one line per set, which reduces conflict misses relative to a direct-mapped cache.
Replacement Policy
The rule (e.g., random or LRU) for choosing which line in a set to evict on a miss.
LRU (Least Recently Used)
A replacement policy that evicts the line that has gone unused for the longest time.
Write-Through
A write policy that writes data immediately to the next level of the hierarchy.
Write-Back
A write policy that defers writing to the next level until the line is evicted; requires a dirty bit.
Write-Allocate
On a write miss, load the block into the cache and update the line there; pays off when more writes to the location follow.
No-Write-Allocate
On a write miss, write straight to memory without loading the block into the cache.
Temporal Locality
The tendency of a program to access the same memory location repeatedly over a short period.
Spatial Locality
The tendency of a program to access memory locations that are close to one another, such as stride-1 array accesses.
Study Notes
- Systems Programming: Cache II, presented on November 18, 2024.
Caching & The Memory Hierarchy
- Memory hierarchy consists of multiple levels of storage.
- Registers sit at L0, followed by L1 cache (SRAM), L2 cache (SRAM), L3 cache (SRAM), main memory (DRAM), local secondary storage (local disks), and remote secondary storage (Web servers).
- Moving up the hierarchy (toward the CPU), memory becomes smaller, faster, and more costly per byte.
- Moving down the hierarchy (away from the CPU), memory becomes larger, slower, and cheaper per byte.
- CPU registers hold words retrieved from the L1 cache.
- L1 cache holds cache lines retrieved from the L2 cache.
- L2 cache holds cache lines retrieved from the L3 cache.
- L3 cache holds cache lines retrieved from main memory.
- Main memory holds disk blocks retrieved from local disks.
- Local disks hold files retrieved from disks on remote servers.
General Cache Organization: (S, E, B)
- S represents the number of sets in the cache, where S = 2^s (s is the number of set-index bits).
- E represents the number of lines per set.
- B represents the number of bytes per cache block.
- Cache size is calculated as C = S × E × B data bytes.
- Each cache line also includes a valid bit and a tag.
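- Worked example: with (S, E, B) = (4, 1, 2), as in the simulation below, C = 4 × 1 × 2 = 8 data bytes, with s = log2(4) = 2 set-index bits and b = log2(2) = 1 block-offset bit.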
Cache Read Operation
- The process involves locating the appropriate set, checking for a matching tag within any line of that set.
- Hit occurs when a matching tag is found and the line is valid.
- Data retrieval starts at a specific offset within the cache line.
- Address is divided into tag bits (t bits), set index bits (s bits), and block offset bits (b bits).
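As an illustration, here is a minimal C sketch of this address split; the parameters (2 set-index bits, 3 block-offset bits, i.e., 4 sets and 8-byte blocks) and the sample address are assumptions for the example, not values from the slides:

#include <stdint.h>
#include <stdio.h>

#define S_BITS 2  /* s set-index bits: 4 sets (assumed) */
#define B_BITS 3  /* b block-offset bits: 8-byte blocks (assumed) */

int main(void) {
    uint32_t addr   = 0x1C;  /* example address */
    uint32_t offset = addr & ((1u << B_BITS) - 1);              /* low b bits */
    uint32_t set    = (addr >> B_BITS) & ((1u << S_BITS) - 1);  /* next s bits */
    uint32_t tag    = addr >> (B_BITS + S_BITS);                /* remaining t bits */
    printf("tag=%u set=%u offset=%u\n", tag, set, offset);
    return 0;
}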
Direct-Mapped Cache (E = 1)
- Direct-mapped cache has only one line per set.
- Cache block size is assumed to be 8 bytes.
- To find the desired set, the address of the int is split into t tag bits, s set-index bits, and b block-offset bits; the set-index bits select the set.
Direct-Mapped Cache Operation
- If the tag matches and the line is valid, it's a hit.
- The data is retrieved using the block offset.
- If the tag doesn't match, the old line is evicted and replaced.
Direct-Mapped Cache Simulation Example
- A direct-mapped cache with (S, E, B, m) = (4, 1, 2, 4) is considered.
- "Thrashing" happens with repeated eviction and replacement of the contents of a particular set in the cache.
- Example: an access pattern like addresses 0, 8, 0, 8, 0, and 8 cause this repeated eviction.
Types of Cache Misses
- Cold (compulsory) miss: This occurs when the cache is empty.
- Conflict miss: Most caches limit blocks at level k+1 to a small subset of the block positions at level k.
- It can occur when multiple data objects map to the same level k block.
- Capacity miss: The set of active cache blocks (working set) is larger than the cache.
Conflict Misses in Direct-Mapped Caches
- Conflict misses are common in real programs and can cause performance issues.
- They often occur in direct-mapped caches when programs access arrays whose sizes are powers of 2.
- Example: In a function dotprod(float x[8], float y[8]), floats are 4 bytes, a block holds 16 bytes, and the cache has 2 sets (total cache size: 32 bytes).
- If x starts at address 0 and y starts directly after x at address 32, then x[i] and y[i] map to the same cache set, which can cause conflict misses.
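The following sketch shows the access pattern in question; the loop body is an assumed implementation, with only the signature and array layout taken from the slides:

float dotprod(float x[8], float y[8]) {
    float sum = 0.0f;
    for (int i = 0; i < 8; i++)
        sum += x[i] * y[i];  /* x[i] and y[i] fall in the same set, so each access evicts the other's block */
    return sum;
}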
Programmatic Solutions to Conflict Misses
- Pad array sizes away from powers of 2 to prevent direct mapping conflicts.
- Use more lines per set to increase associativity and reduce conflicts.
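As a sketch of the padding fix (the padded size 12 is an assumption; any size that moves y off the power-of-2 boundary works):

float x[12];  /* padded from 8 elements, so y no longer starts at address 32 */
float y[8];   /* x[i] and y[i] now map to different sets */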
E-Way Set Associative Cache
- In the running example, E = 2, meaning two lines per set, with a block size of 8 bytes.
- Associative memory is an array of (key, value) pairs.
- Keys are a concatenation of Tag & Valid bits.
E-Way Set Associative Cache: Read Operation
- E = 2 implies two lines per set.
- Block size is assumed to be 8 bytes.
- Both lines in the set are compared; if a line is valid and its tag matches, the access is a hit.
E-Way Set Associative Cache: No Match
- If no tag matches, one line in the set is selected for eviction and replacement.
- A replacement policy is required, either random or LRU (least recently used).
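A minimal sketch of LRU victim selection, assuming a hypothetical line_t with a per-line timestamp (the struct and field names are illustrative, not from the slides):

typedef struct { int valid; unsigned tag; unsigned last_used; } line_t;

/* Return the index of the least recently used line in an E-way set. */
int choose_victim(const line_t set[], int E) {
    int victim = 0;
    for (int i = 1; i < E; i++)
        if (set[i].last_used < set[victim].last_used)
            victim = i;  /* smallest timestamp = least recently used */
    return victim;
}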
2-Way Set Associative Cache: Simulation
- A 2-way set associative cache with (S, E, B, m) = (2, 2, 2, 4) is considered.
- M = 16 bytes, m = 4 bits; B = 2 bytes/block; S = 2 sets
- E = 2 blocks/set
Intel Core i7 Cache Hierarchy
- L1 i-cache and d-cache: 32 KB, 8-way, Access: 4 cycles.
- L2 unified cache: 256 KB, 8-way, Access: 10 cycles.
- L3 unified cache: 8 MB, 16-way, Access: 40-75 cycles.
- Block size: 64 bytes for all caches.
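- Worked example: the L1 d-cache has S = C / (E × B) = 32768 / (8 × 64) = 64 sets.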
Writing Cache Friendly Code
- Make the common case go fast by focusing on the inner loop.
- Minimize the cache misses in the inner loop.
- Repeated references to variables create good temporal locality.
- Stride-1 reference patterns create good spatial locality.
- Key Idea: Qualitative notion of locality is quantified through our understanding of the cache.
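As a sketch, summing a row-major 2D array in row order yields stride-1 accesses (the array name and dimensions are illustrative):

/* Good locality: the inner loop walks memory in stride-1 order. */
long sum_array(int a[16][16]) {
    long sum = 0;
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++)
            sum += a[i][j];  /* a[i][j] and a[i][j+1] share a cache block */
    return sum;
}

Swapping the two loops would stride by 16 ints per access, touching a new block on almost every reference.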
Cache Write Strategies
- Multiple copies of the same data can exist across L1, L2, possibly L3, and main memory; write strategies must keep these copies consistent.
- Write-through: Write immediately to next level.
- Write-back: Defer write to the next level until line is evicted.
- Must track which cache lines are modified (dirty bit).
- What to do on a write miss?
- Write-allocate: load the block into the cache, then update the line in the cache.
- Works well if more writes to the location follow.
- No-write-allocate: write straight to memory; the data is not loaded into the cache.
- Typical pairings:
- Write-through + no-write-allocate.
- Write-back + write-allocate.
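A hedged sketch of write-back eviction; the wb_line_t struct and the write_back_to_next_level() helper are hypothetical names for illustration:

typedef struct { int valid; int dirty; unsigned tag; } wb_line_t;

/* Hypothetical helper: in a real cache this would copy the block to the next level. */
static void write_back_to_next_level(wb_line_t *line) { (void)line; }

void evict_line(wb_line_t *line) {
    if (line->valid && line->dirty)
        write_back_to_next_level(line);  /* the deferred write happens only on eviction */
    line->valid = 0;                     /* the line is now free for the incoming block */
}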
Rearranging Loops for Locality
- Matrix multiplication is the canonical example:
- Multiply N × N matrices; matrix elements are doubles.
- O(N³) total operations; N reads per source element.
- N values summed per destination.
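One common rearrangement, shown as a sketch (the slides do not give this particular code), is the kij ordering, whose inner loop scans a row of b and a row of c with stride 1:

/* c = c + a*b for n-by-n row-major matrices of doubles. */
void matmul_kij(const double *a, const double *b, double *c, int n) {
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++) {
            double r = a[i*n + k];             /* reused across the whole inner loop */
            for (int j = 0; j < n; j++)
                c[i*n + j] += r * b[k*n + j];  /* stride-1 over both b and c */
        }
}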