Computer Architecture - Pipelining, ILP, Cache, Branch Prediction, and Parallel Computing - PDF

Summary

This document covers fundamental concepts of computer architecture, including pipelining, instruction-level parallelism, cache optimization, branch prediction, parallel computing, and multi-core design, and closes with worked true/false questions on caches and pipelines.

Full Transcript

# Computer Architecture

## Pipelining

- Pipelining is a technique used in processors to increase instruction throughput by overlapping the execution of multiple instructions.
- It divides instruction execution into stages, with each stage handling a part of the instruction cycle (e.g., fetch, decode, execute, memory access, write-back).
- **Stages in a Typical 5-Stage Pipeline:**
  - **Instruction Fetch (IF):** Retrieve the instruction from memory.
  - **Instruction Decode (ID):** Decode the instruction and fetch operands.
  - **Execution (EX):** Perform arithmetic or logical operations.
  - **Memory Access (MEM):** Access data from memory if needed.
  - **Write-Back (WB):** Write the result back to the register file.
- **Advantages:**
  - Increases throughput without increasing clock speed (see the cycle-count sketch at the end of this section).
  - Efficient use of resources.
- **Hazards in Pipelining:**
  - **Structural Hazard:** Arises when hardware resources are insufficient to execute all pipeline stages simultaneously.
  - **Data Hazard:** Occurs when an instruction depends on the result of a previous instruction.
  - **Control Hazard:** Results from branch instructions changing the flow of execution.
- **Solutions for Hazards:**
  - **Data Hazards:** Forwarding, stalls, or reordering instructions.
  - **Control Hazards:** Branch prediction, delayed branching, or pipeline flushing.
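To make the throughput claim concrete, here is a minimal Python sketch of the standard cycle-count argument. It is not part of the original notes: it assumes an ideal hazard-free pipeline that issues one instruction per cycle and uses toy numbers.

```python
# Sketch of the basic pipelining speedup argument: with k stages and n
# instructions, an ideal pipeline finishes in (k + n - 1) cycles instead of
# the n * k cycles a non-pipelined datapath would need. Toy numbers only.

def pipelined_cycles(n_instructions, n_stages):
    return n_stages + n_instructions - 1   # fill the pipe once, then 1 per cycle

def unpipelined_cycles(n_instructions, n_stages):
    return n_instructions * n_stages       # each instruction runs start to finish

n, k = 100, 5
p, u = pipelined_cycles(n, k), unpipelined_cycles(n, k)
print(p, u, round(u / p, 2))   # 104 500 4.81 -> speedup approaches k for large n
```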

## ILP (Instruction-Level Parallelism) & Superscalar

- **Instruction-Level Parallelism (ILP):**
  - ILP is the extent to which multiple instructions can be executed simultaneously.
  - Exploits parallelism in program code to improve performance.
- **Key Techniques:**
  - **Pipelining:** Overlapping instruction execution.
  - **Superscalar Execution:** Multiple functional units execute instructions in parallel.
  - **Out-of-Order Execution:** Reordering instructions to maximize resource utilization.
  - **Speculative Execution:** Executing instructions before their conditions are resolved.
- **Superscalar Processors:**
  - Definition: Processors that execute more than one instruction per clock cycle using multiple pipelines or functional units.
  - Key Features:
    - Parallel instruction dispatch.
    - Dynamic instruction scheduling.
    - Dependency checking to avoid hazards.
- **Limitations of ILP:**
  - Dependencies between instructions.
  - Branch instructions (control hazards).
  - Resource constraints.

## Cache Optimization

- Cache is a small, fast memory located close to the processor to reduce access time for frequently used data.
- **Key Cache Optimizations:**
  - **Cache Hierarchies:** L1, L2, and L3 caches reduce latency progressively.
  - **Associativity:** Direct-mapped, set-associative, and fully associative caches balance speed and hit rate.
  - **Write Policies:** Write-through (write immediately to memory) vs. write-back (write to memory only on eviction).
  - **Prefetching:** Fetch data before it is requested, based on access patterns.
  - **Block Size:** Larger blocks improve spatial locality but may cause higher miss penalties.
  - **Replacement Policies:** Least Recently Used (LRU), random replacement, or FIFO policies manage eviction of cache blocks.
- **Cache Miss Types:**
  - **Compulsory Miss:** First-time access to a block.
  - **Conflict Miss:** Cache block conflicts in direct-mapped or set-associative caches.
  - **Capacity Miss:** Cache is too small to hold the working data set.

## Branch Prediction

- Branch prediction reduces control hazards by guessing the outcome of branch instructions (e.g., if-else, loops) before they are resolved.
- **Techniques:**
  - **Static Prediction:** Always predict "taken" or "not taken."
  - **Dynamic Prediction:** Uses historical data to predict outcomes. Examples:
    - **1-bit Predictor:** Maintains a single bit recording the last outcome.
    - **2-bit Predictor:** Reduces prediction errors by requiring two consecutive mispredictions to change the guess (see the sketch after this section).
    - **Branch Target Buffer (BTB):** Stores the target addresses of recently predicted branches.
    - **Global History Register (GHR):** Tracks recent branch outcomes to identify patterns.
- **Speculative Execution:** Instructions are executed based on the branch prediction. If the prediction is wrong, the pipeline is flushed and execution restarts at the correct branch target.
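As an illustration of the 2-bit scheme above, the following Python sketch models a single saturating counter. It is a toy model added for illustration, not something from the original notes; real predictors keep a table of such counters indexed by the branch address.

```python
# Minimal sketch of one 2-bit saturating counter used in dynamic branch
# prediction. States 0-1 predict "not taken", states 2-3 predict "taken";
# it takes two consecutive mispredictions to flip the prediction.

class TwoBitPredictor:
    def __init__(self, state=2):          # start in "weakly taken" (assumption)
        self.state = state                # valid range 0..3

    def predict(self):
        return self.state >= 2            # True = predict taken

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

# Loop-closing branch: taken seven times, then not taken once at loop exit.
outcomes = [True] * 7 + [False]
p = TwoBitPredictor()
mispredictions = 0
for taken in outcomes:
    if p.predict() != taken:
        mispredictions += 1
    p.update(taken)
print(mispredictions)  # -> 1 (only the final loop-exit branch is mispredicted)
```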
## Parallel Computing

- A computational model in which multiple processes or threads execute simultaneously to solve problems faster.
- **Types of Parallelism:**
  - **Data Parallelism:** The same operation is performed on multiple data elements.
  - **Task Parallelism:** Different tasks are executed concurrently on separate processors.
- **Models of Parallelism:**
  - **Shared Memory:** All processors share a single address space. Requires synchronization to avoid data races.
  - **Distributed Memory:** Each processor has its own memory and communicates via message passing. Used in clusters and supercomputers.
- **Challenges:**
  - Load balancing.
  - Data synchronization.
  - Communication overhead.

## Multi-Core Architecture

- Multi-core processors integrate two or more independent processing units (cores) on a single chip to improve performance and energy efficiency.
- **Key Features:**
  - **Shared Resources:** Cores may share caches, memory controllers, or buses.
  - **Core Communication:** Shared-memory or message-passing techniques.
  - **Power Efficiency:** Running cores at lower clock speeds reduces heat generation.
- **Applications:**
  - Multithreaded applications.
  - Virtualization.

## True/False Questions About Cache

- **Doubling the line size halves the number of tags in the cache. (True)**
  - Since capacity (the amount of data the cache can store) and associativity are fixed, doubling the line size halves the number of lines in the cache. Since there is one tag per line, halving the number of lines halves the number of tags (the sketch after these questions checks the counts numerically).
- **Doubling the associativity doubles the number of tags in the cache. (False)**
  - Since capacity and line size are fixed, doubling the associativity does not change the number of lines in the cache, so the number of tags remains the same.
- **Doubling the cache capacity of a direct-mapped cache usually reduces conflict misses. (True)**
  - Assuming the line size stays the same, doubling the capacity doubles the number of lines, reducing the probability that two blocks map to the same line.
- **Doubling the cache capacity of a direct-mapped cache usually reduces compulsory misses. (False)**
  - Since the line size stays the same, the amount of data loaded into the cache on each miss is unchanged, so the number of compulsory misses remains the same.
- **Doubling the line size usually reduces compulsory misses. (True)**
  - Doubling the line size loads more data into the cache on each miss, so neighboring data that would otherwise cause its own first-reference miss is often already present (better exploitation of spatial locality).
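The tag and line counts asserted in the true/false answers above can be checked numerically. This short Python sketch is added for illustration with arbitrary example parameters; it derives the number of lines, sets, and tags from capacity, line size, and associativity.

```python
# Sketch: derive cache geometry from capacity, line size, and associativity.
# One tag is stored per line, so the tag count always equals the line count.

def cache_geometry(capacity_bytes, line_bytes, associativity):
    lines = capacity_bytes // line_bytes          # total cache lines
    sets = lines // associativity                 # lines grouped into sets
    tags = lines                                  # one tag per line
    return {"lines": lines, "sets": sets, "tags": tags}

base = cache_geometry(32 * 1024, 64, 4)           # 32 KiB, 64 B lines, 4-way
double_line = cache_geometry(32 * 1024, 128, 4)   # doubling the line size
double_assoc = cache_geometry(32 * 1024, 64, 8)   # doubling the associativity

print(base)          # {'lines': 512, 'sets': 128, 'tags': 512}
print(double_line)   # {'lines': 256, 'sets': 64, 'tags': 256}  -> tags halved
print(double_assoc)  # {'lines': 512, 'sets': 64, 'tags': 512}  -> tags unchanged
```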
## Miscellaneous Questions

- **A pipelined datapath must have separate instruction and data memories because the format of instructions is different from the format of data. (False)**
  - Separate memories are needed because the instruction and data memories must be accessed in parallel during the same cycle; otherwise there would be a structural hazard.
- **Allowing ALU instructions to write back their result in the 4th stage rather than the 5th stage improves the performance of a MIPS 5-stage pipeline. (False)**
  - It would cause a structural hazard on the register file's write port. Pipeline performance depends on the rate of instruction completion, not on the latency of individual instructions.
- **In the MIPS 5-stage pipeline, some but not all RAW data hazards can be eliminated by forwarding. (True)**
  - The load instruction has a load-use delay that cannot be eliminated by forwarding (see the sketch after these questions).
- **A Write-After-Read (WAR) hazard can occur in a processor pipeline when: (a)**
  - Some instruction writes to a data word that has been loaded into the cache before the next instruction can read this data word from the cache.
- **When comparing a set-associative cache with a direct-mapped cache of the same capacity (data size), the set-associative cache decreases the cache miss rate but increases the hit time. (True)**
  - The set-associative cache reduces the miss rate because it allows more flexibility in block placement and therefore fewer conflict misses. The hit time increases because a multiplexer is needed to select a block within the set.
- **Each block in a write-through cache has a Modified bit to indicate whether the block is modified or not. (False)**
  - The Modified (dirty) bit is used only by write-back caches to indicate that a block has been modified.
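To see why forwarding removes most but not all RAW hazards, here is a minimal Python sketch, not from the original notes, that counts the stall cycles a MIPS-style pipeline with full forwarding still needs; the instruction encoding and helper names are illustrative assumptions.

```python
# Minimal sketch of RAW-hazard detection in a classic 5-stage pipeline with
# full forwarding. Instructions are modeled as (op, dest, sources) tuples;
# this representation is an assumption made for illustration.

def count_stalls(instrs):
    """Count stall cycles needed when EX->EX and MEM->EX forwarding exist.

    With full forwarding, the only RAW hazard that still stalls is the
    load-use case: a load followed immediately by an instruction that
    reads the loaded register costs one bubble in a MIPS-style pipeline.
    """
    stalls = 0
    for prev, curr in zip(instrs, instrs[1:]):
        op, dest, _ = prev
        _, _, sources = curr
        if op == "lw" and dest in sources:
            stalls += 1  # load-use hazard: forwarding cannot hide the MEM latency
    return stalls

program = [
    ("lw",  "r1", ["r2"]),          # r1 <- MEM[r2]
    ("add", "r3", ["r1", "r4"]),    # needs r1 immediately -> 1 stall
    ("sub", "r5", ["r3", "r6"]),    # satisfied by EX->EX forwarding, no stall
]
print(count_stalls(program))  # -> 1
```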
