TTTK 1153 Computer Organization & Architecture Lecture Notes (PDF)
Dr. Faizan Qamar
Summary
These notes cover computer organization and architecture, focusing on data paths, control design, and pipelining. They include diagrams and examples related to hardware components and the execution of instructions.
Full Transcript
TTTK 1153 Computer Organization & Architecture
Topic 6: Data Path, Control Design, and Pipelining
Lecturer: Dr. Faizan Qamar
Designed by Ts. Dr. Mohd Nor Akmal Khalid

COURSE JOURNEY
▰ Topic 1: Introduction to Computer Systems
▰ Topic 2: Central Processing Unit
▰ Topic 3: Memory & Storage Systems
▰ Topic 4: Input/output Systems & Interconnection
▰ Topic 5: Computer Arithmetic & Instruction Set Architecture
▰ Topic 6: Data Path, Control Design, and Pipelining
▰ Topic 7: Parallel Architectures and Multicore Processors
▰ Topic 8: Advanced Memory Systems
▰ Topic 9: Assembly Language

Introduction to Data Path
The data path is the collection of hardware components responsible for data processing and transfer (data flow): registers, the ALU, memory, and the control unit.
- Includes elements like registers, buses, and arithmetic logic units (ALUs).
- Operates under the direction of the control unit.
- Executes instructions by performing calculations and data manipulations.

Instructions and data flow
[Figure: main memory holding instructions and data; a program counter that increments by 4; 32 registers, each 32 bits wide; functional units that implement instructions.]
Main Memory: This block stores both the instructions and the data that the processor needs to access during execution.
Program Counter (PC): The program counter keeps track of the next instruction to be executed. In this illustration it increments by 4, which corresponds to word-aligned addressing in systems where each instruction occupies 4 bytes.
Registers: These are high-speed storage elements within the CPU. The illustration shows 32 registers, each 32 bits wide.
Functional Units: These units carry out the instructions, for example arithmetic and logic operations.

Execution of Instructions
The data path is responsible for executing the operations specified by the instructions in a program. It performs several tasks between the processor and memory:
- Calculations: Performs arithmetic and logical computations.
- Data Manipulations: Handles data transformation and movement.
- Control of Data Flow: Manages how data moves between different components.

Control Unit Direction
The control unit provides control signals that tell the data path components which operations to perform. It ensures that each instruction is executed correctly by coordinating the timing and control of data movements and operations.

Control Design – Roles of Control Unit
Instruction Interpretation: The control unit decodes fetched instructions and produces the control signals needed to guide the data path.
1. Sequencing of Operations: Manages the order of operations to ensure correct execution.
2. Data Flow Management: Directs data between various components.
3. Synchronization: Coordinates the timing and stages of execution.
Pipeline Management: In pipelined architectures, the control unit handles hazards and maintains smooth instruction flow through the pipeline stages.

Pipelined Architectures
Pipelining is a technique used to increase CPU instruction throughput by overlapping the execution of multiple instructions.
The processor is divided into stages (instruction fetch, decode, execute, memory access, write-back), and each stage processes a different instruction at the same time.
Each stage of the pipeline performs one part of the instruction execution (fetch, decode, execute, etc.), increasing overall processing speed.
In pipelined processors, the data path is organized into stages that allow multiple instructions to be processed simultaneously.

Components of Data Path in Pipelining
We already know that pipelining involves breaking up instructions into five stages:
1. Instruction Fetch (IF)
2. Instruction Decode (ID)
3. Execute (EX)
4. Memory Access (MEM)
5. Write Back (WB)

Instruction Stages
Instruction Fetch (IF): This is the first stage in the pipeline, where the processor fetches the instruction from memory. The Program Counter value is incremented by 4.
Instruction Decode (ID): This stage decodes the instruction and prepares the necessary operands. The instruction is broken down into its components, such as the operation code (opcode), registers, and immediate value. The source operands (data from registers or immediate values) are read, and the necessary control signals are generated.
Execute (EX): This is where the actual computation or operation takes place. Depending on the type of instruction, the ALU (Arithmetic Logic Unit) performs the required operation, such as addition, subtraction, or a logical operation.
Memory Access (MEM): This stage accesses the data memory if the instruction is a memory read or write. Data is read from memory for a load or written to memory for a store. Otherwise the stage does nothing.
Write Back (WB): This is the final stage, where the computed result (such as an ALU result or a loaded data value) is written back to the register file, completing the instruction cycle. Stores have already written to memory in the MEM stage.

Latency & Throughput
Latency: the time it takes for an individual instruction to execute. What's the latency for this implementation?
One instruction takes 5 clock cycles. Cycles per Instruction (CPI) = 5.
Throughput: the number of instructions that execute per unit time. What's the throughput of this implementation? One instruction is completed every 5 clock cycles. Average CPI = 5.

Why is Pipelining Needed?
Underutilization of Functional Units:
- Non-overlapped Execution: If execution is non-overlapped, each functional unit (like the ALU, registers, etc.) is used only once every few cycles.
- Problem: Each functional unit is active only once every five cycles, so the hardware sits idle most of the time.
Effective Instruction Set Architecture (ISA) Design:
- Solution: If the ISA is carefully designed, the functional units can be arranged so that they operate in parallel.
- Benefit: Functional units are better used because multiple stages of execution can be worked on concurrently.
Pipelining:
- Solution: Pipelining overlaps the stages of execution, ensuring that every stage of the pipeline has something to do every clock cycle.
- Benefit: Pipelining improves throughput by keeping the functional units busy, with different stages of multiple instructions processed simultaneously.

Pipelining Analysis
Throughput Improvement: A pipeline with N stages can improve throughput by up to a factor of N.
Conditions for Improvement:
- Same Time per Stage: Each stage must take the same amount of time. If some stages take longer than others, the clock must accommodate the slowest stage and the pipeline becomes less efficient.
- Work for Each Stage: Every stage must always have work to do. If a stage is idle or has no data to process, the pipeline stalls.
- Overhead: Implementing the pipeline itself incurs some overhead, so the gains in throughput are reduced by the cost of creating and managing the pipeline.
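
The "up to a factor of N" claim can be checked with a little arithmetic. The sketch below is a minimal illustration, not part of the lecture: it assumes every stage takes exactly one clock cycle, there are no stalls, and pipeline overhead is zero. The function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline (n_stages cycles);
    after that, one instruction completes per cycle (ideal case, no stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def speedup(n_instructions: int, n_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return cycles_unpipelined(n_instructions, n_stages) / cycles_pipelined(
        n_instructions, n_stages
    )


# Speedup approaches the number of stages only for long instruction streams.
for n in (1, 5, 100, 10_000):
    print(n, cycles_unpipelined(n, 5), cycles_pipelined(n, 5), round(speedup(n, 5), 3))
```

With 5 stages, a single instruction gains nothing (latency is still 5 cycles), 5 instructions take 9 cycles instead of 25, and the ratio only approaches 5 as the instruction count grows.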
The overhead of managing a pipeline and the possible increase in latency are factors that need to be considered.

Pipeline datapath
IF (Instruction Fetch): The instruction is fetched from memory using the address in the Program Counter (PC).
ID (Instruction Decode): The instruction is decoded, and registers are read to fetch operands.
EX (Execute): The ALU performs the operation or, if needed, the address calculation.
MEM (Memory Access): Memory is accessed for load/store instructions.
WB (Write Back): The result is written back to the register file.

Pipeline datapath – Example
One way to visualize pipelining is to consider the execution of each instruction independently, as if it has the datapath all to itself. We can place these datapaths on a timeline to see their relationship.
The stages are represented by the datapath element being used, shaded according to use during the corresponding clock cycle:
- IM for Instruction Memory
- Reg for Register File
- ALU for Arithmetic Logic Unit
- DM for Data Memory
Instruction 1 starts at clock cycle 1 and moves through the stages in subsequent cycles, while Instruction 2 begins at clock cycle 2 and Instruction 3 starts at clock cycle 3.

Pipeline datapath – Example
In reality, instructions do not execute in their own individual datapaths. Instead, they share a common datapath. This shared use is a key characteristic of pipelined processors, where multiple instructions progress through different stages simultaneously, using common resources like the instruction memory (IM), registers, ALU, and data memory (DM).
The first instruction (lw $1, 100($0)) uses the instruction memory (IM) in cycle 1 (CC1). In cycle 2 (CC2), the second instruction (lw $2, 200($0)) is fetched from the same instruction memory.
To ensure instructions do not interfere with each other, registers are used to store intermediate data between cycles. For example, these registers hold values such as register read data and results after each stage, so that data flows smoothly between the stages of different instructions.

Pipeline datapath + registers
1. IF/ID - Instruction Fetch/Decode Register
2. ID/EX - Instruction Decode/Execute Register
3. EX/MEM - Execute/Memory Access Register
4. MEM/WB - Memory Access/Write Back Register
The image shows registers between the pipeline stages (labeled IF/ID, ID/EX, EX/MEM, MEM/WB). These registers store the data that must be passed between stages so that each instruction progresses independently and correctly without interfering with others. They hold intermediate results such as register reads, ALU outputs, and memory data, which are required by the next stage.

Pipeline Datapath for Load Word
Let's walk through the datapath using the load word instruction as an example. Load word (lw) is a good instruction to start with because it is active in every stage of the pipelined datapath.

lw $rt, immed($rs)

    Bits:   31-26    25-21   20-16   15-0
    Field:  opcode   rs      rt      immed

1. Opcode (6 bits)
2. rs (5 bits) - Source register
3. rt (5 bits) - Target register
4. immed (16 bits) - Immediate value or offset
The load word instruction adds immed to the contents of $rs to obtain the address in memory whose contents are written to $rt. That is, address = contents of $rs + immed.

Pipeline Datapath for Load Word – Instruction Fetch (IF)
The instruction is read from memory using the contents of the PC and placed in the IF/ID register. The PC address is incremented by 4 and written back to the PC register, and it is also placed in the IF/ID register in case the instruction needs it later.
In the figures, the right half of a register or memory is shaded when it is being read, and the left half is shaded when it is being written.
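
The lw field layout and the address computation described above can be sketched in a few lines. This is a hedged illustration: the helper names are hypothetical, the register file is modeled as a plain list of 32 integers, and the example word uses the MIPS32 opcode for lw (0x23), which is an assumption based on the MIPS-style syntax used in these notes. The 16-bit immediate is sign-extended before the add.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_i_type(word: int) -> dict:
    """Split a 32-bit instruction word into the fields shown in the layout:
    opcode (bits 31-26), rs (25-21), rt (20-16), immed (15-0)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immed": sign_extend_16(word & 0xFFFF),
    }


def lw_address(word: int, regs: list) -> int:
    """Effective address for lw: contents of $rs plus the sign-extended immediate."""
    fields = decode_i_type(word)
    return (regs[fields["rs"]] + fields["immed"]) & 0xFFFFFFFF


# lw $1, 100($0) under the assumed MIPS32 encoding (opcode 0x23).
LW_EXAMPLE = (0x23 << 26) | (0 << 21) | (1 << 16) | 100
regs = [0] * 32  # hypothetical register file, all zeros
print(decode_i_type(LW_EXAMPLE), lw_address(LW_EXAMPLE, regs))
```

In the pipelined datapath these steps are spread across stages: the field split and sign extension happen in ID, and the add happens in EX.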

Pipeline Datapath for Load Word – Instruction Decode (ID)
The register operands are read from the register file and stored in the ID/EX pipeline register. The 16-bit immediate field is sign-extended to 32 bits and also stored in the ID/EX pipeline register.

Pipeline Datapath for Load Word – Execute or Address Calculation (EX)
From the ID/EX pipeline register, the contents of register $rs and the sign-extended immediate field are taken as inputs to the ALU, which performs an add operation. The sum is placed in the EX/MEM pipeline register.

Pipeline Datapath for Load Word – Memory Access (MEM)
The address stored in the EX/MEM pipeline register is used to access data memory. The data read from memory is stored in the MEM/WB pipeline register.

Pipeline Datapath for Load Word – Write Back (WB)
The data is read from the MEM/WB register and written back to the register file in the middle of the datapath.
It's important to note that any information we need later must be passed from pipeline register to pipeline register while the instruction executes. Because the instructions share the datapath elements, we cannot assume anything from a previous cycle is still there. We must carry the data with us as we move along the data path.

Pipelining Demonstration
Consider the following 5-instruction sequence:

    lw  $10, 20($1)
    sub $11, $2, $3
    add $12, $3, $4
    lw  $13, 24($1)
    add $14, $5, $6

We can start by diagramming the individual datapaths used by every instruction. This allows us to see which stage each instruction is executing in a given clock cycle.

Pipelining Demonstration (Continued…)
In the following slides, we walk through all 9 cycles required to fully complete the 5 instructions in our sample sequence. We will start with clock cycle 1 and highlight the relevant datapath lines according to the instruction being executed in each stage.
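
The 9-cycle walk-through can be reproduced as a text timing diagram. The sketch below is illustrative only and assumes an ideal pipeline in which one instruction enters per cycle and nothing stalls (the five sample instructions have no dependencies on each other). The function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw  $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw  $13, 24($1)",
    "add $14, $5, $6",
]


def pipeline_schedule(instructions):
    """Return one row per instruction; row[c] is the stage occupied in
    clock cycle c+1, or "" if the instruction is not in the pipeline."""
    total_cycles = len(instructions) + len(STAGES) - 1 if instructions else 0
    rows = []
    for i in range(len(instructions)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters IF in cycle i+1
        rows.append(row)
    return rows


def format_diagram(instructions):
    """Render the schedule as a table with one column per clock cycle."""
    rows = pipeline_schedule(instructions)
    n_cycles = len(rows[0]) if rows else 0
    width = max((len(text) for text in instructions), default=0)
    header = " " * width + " | " + " ".join(f"CC{c + 1:<2}" for c in range(n_cycles))
    lines = [header]
    for text, row in zip(instructions, rows):
        lines.append(f"{text:<{width}} | " + " ".join(f"{cell:<4}" for cell in row))
    return "\n".join(lines)


print(format_diagram(PROGRAM))
```

Reading a column top to bottom shows which instruction occupies each stage in that cycle. In clock cycle 5, for example, all five stages are busy at once.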

Designing for Speed in CPU Hardware
Key Challenges in Pipelining:
1. Pipeline Efficiency: Keeping the pipeline full is crucial for optimal performance. Empty pipeline stages lead to wasted cycles and lower throughput.
2. Hidden Optimizations: Many optimizations are managed by the assembler or compiler.
3. Impact on Programmers: Programmers specify high-level logic, while tools automatically optimize for speed.

Designing for Speed in CPU Hardware
Solutions to Maintain Pipeline Performance: Use of instruction reordering and pipeline-friendly design strategies.
1. Rearranging instructions to ensure smooth flow without pipeline stalls.
2. While programmers write code for functionality, the tools (assembler and compiler) optimize the instruction sequence to maximize pipeline efficiency and CPU performance.

Pipelining Hazards
Pipelining hazards are situations in a pipelined processor where the normal flow of instruction execution is disrupted, leading to delays or conflicts. There are three main types of pipelining hazards:
1. Structural hazards: Instructions in different stages need the same resource (e.g., memory).
2. Control hazards: Data not available to make a branch decision.
3. Data hazards: Data not available to perform the next operation.

1. Structural Hazard
Cause: Occurs when two or more instructions require the same hardware resource at the same time.
Example: If one instruction needs to access memory for a load operation while another instruction is being fetched from memory, the two conflict if they share the same memory unit.
Solution: This can be solved by adding more resources (e.g., separating instruction and data caches) or introducing other hardware changes.

2. Control Hazard
Cause: Occurs due to branch instructions. A branch can change the program flow, so the pipeline does not know which instruction to fetch next until the branch is resolved.
Example: In conditional branch instructions, if the decision to jump or not is made late in the pipeline, the instructions fetched in the meantime may be the wrong ones, and subsequent instructions are delayed.
Solution: This can be managed by branch prediction (predicting the outcome of branches to keep the pipeline filled) or delayed branching.

3. Data Hazard
Cause: Occurs when one instruction depends on the result of a previous instruction that has not yet completed its execution.
Types of Data Hazards:
i. Read-after-write (RAW): A subsequent instruction needs data that has not yet been written back by a previous instruction.
ii. Write-after-write (WAW): Two instructions write to the same register, potentially in the wrong order.
iii. Write-after-read (WAR): A later instruction writes to a register before an earlier instruction has read from it, so the earlier instruction may read the wrong value.
Solution:
1. Forwarding (passing the result directly to the next instruction), or
2. Stalls (pausing execution to ensure data is ready) can mitigate data hazards.

Example 1 of Data Hazard
First Instruction: add $s0, $s1, $s2. This instruction writes the result to $s0 in the Write Back (WB) stage.
Second Instruction: add $s4, $s3, $s0. This instruction reads the value of $s0 in the Instruction Decode (ID) stage.
Since the second instruction reads $s0, which is written by the first instruction, there is a data hazard: $s0 is not available until the first instruction reaches the WB stage.
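
A read-after-write dependency like the one in Example 1 can be found mechanically by comparing each instruction's source registers with the destinations of the instructions just ahead of it. The sketch below is a simplified illustration, not how hazard detection hardware is specified in the lecture. The `Instr` type, the `raw_hazards` function, and the `window` parameter are all hypothetical; `window=2` assumes a result written in WB can be read by an instruction decoding in that same cycle.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    sources: Tuple[str, ...]  # registers read


def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for every source
    register whose nearest earlier writer is at most `window` instructions
    ahead (an assumed hazard distance, see the note above)."""
    hazards = []
    for j, consumer in enumerate(program):
        for reg in consumer.sources:
            if reg in ("$0", "$zero"):
                continue  # the zero register never carries a dependency
            # Scan backwards, nearest producer first.
            for i in range(j - 1, max(j - window, 0) - 1, -1):
                if program[i].dest == reg:
                    hazards.append((i, j, reg))
                    break
    return hazards


example_1 = [
    Instr("add $s0, $s1, $s2", "$s0", ("$s1", "$s2")),
    Instr("add $s4, $s3, $s0", "$s4", ("$s3", "$s0")),
]
print(raw_hazards(example_1))
```

For the two add instructions above, the function reports that instruction 1 depends on instruction 0 through $s0.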
When an instruction depends on the result of a previous instruction still in the pipeline, this is a DATA DEPENDENCY.

Solution for Data Dependency
A data dependency is a situation where one instruction depends on the result of a previous instruction still in the pipeline.
SOLUTIONS:
1. Pipeline Stall: Introducing a delay (stall) in the pipeline until the required data is available.
2. Data Forwarding (Bypassing): Immediate transfer of data between pipeline stages to avoid waiting for the instruction to complete.

1. Pipeline Stall
A stall temporarily holds the pipeline at a particular stage (e.g., the Instruction Decode (ID) or Execute (EX) stage) until the required data is available or the branch decision is resolved.
The stalled instruction does not move to the next stage, which creates an empty "bubble" in the pipeline.
This delay ensures that the instruction gets the correct data or makes the right decision before moving forward, but it also reduces the overall throughput of the CPU.

2. Data Forwarding (Bypassing)
Forwarding (or bypassing) is a technique used to resolve data hazards without stalling the pipeline.
When an instruction needs data that is being produced by another instruction, forwarding passes the required data directly from one pipeline stage to another without waiting for the producing instruction to complete all its stages.
In Example 1, forwarding sends the result from the Execute (EX) stage of the first instruction (which produces the value for $s0) directly to the Execute (EX) stage of the second instruction, so there is no need to wait for the first instruction to finish the Write Back (WB) stage.

Example 2 of Data Hazard
In this example, the first instruction is a load word (lw $s0, 0($s2)), which loads data into $s0.
Problem: The value of $s0 isn't known until the instruction reaches the MEM (Memory Access) stage, which comes after the ID (Instruction Decode) and EX (Execute) stages.
The second instruction depends on data from the first instruction that isn't available until after the MEM stage of the first instruction.
Solution: Forwarding alone doesn't resolve this case because the loaded data is not available early enough in the pipeline. Either stall the pipeline or reorder the instructions to resolve the data hazard.

Advanced Pipelining Techniques
Stall for lw hazard: We can stall for one cycle, but we would rather not stall.
Reorder for lw hazard: Try to execute an unrelated instruction between the two instructions.

Pipelining in Real-World

THANK YOU!
Next Lecture: Parallel Architectures and Multicore