Lecture 18: Addressing Modes and Pipelining
Document Details

Uploaded by LucidOpal7220
Harvard University
Summary
This lecture covers addressing modes in assembly language and instruction-level pipelining. It surveys the different addressing modes and how each one locates an operand's data. The overview of pipelining explains how performance is improved by overlapping the execution of instructions.
Full Transcript
ADDRESSING MODES AND PIPELINING
How we access data in some assemblers, and how execution can be optimized.

ADDRESSING MODES
Addressing modes specify the location of an operand and, by implication, how the stated operand is to be interpreted. They specify whether an operand is a constant, a register, or a memory location. The actual location of an operand is its effective address. Certain addressing modes permit us to determine the address of an operand dynamically.

SEVEN MODES TO RULE THEM ALL…
- Immediate addressing: the data is part of the instruction.
- Direct addressing: the address of the data is given in the instruction.
- Register addressing: the data is located in the specified register.
- Indirect addressing: the address of the address of the data is given in the instruction.
- Register indirect addressing: the address of the data is located in the specified register.

…AND IN THE MEMORY FIND THEM?
- Indexed addressing: uses a register (implicitly or explicitly) as an offset, which is added to the address in the operand to determine the effective address of the data. This will be especially useful when we talk about arrays.
- Based addressing: puts the base address in a register (the base register) and uses the operand as the offset. Like indexed addressing – but different (sorry!).
Although these last two addressing modes are effectively identical, the difference is in how the offset addressing is achieved – the end computation is the same.

STACK ADDRESSING
With stack addressing, the operand is assumed to be on the top of the stack, and will therefore be popped and used. There are many variations of stack addressing mode:
- Indirect indexed
- Base/offset
- Self-relative
- Auto-increment / auto-decrement
We won't cover these in detail. (This phrase normally means "so don't expect too much of anything about it on an exam" – you need to learn how to interpret ProfSpeak.)

EXAMPLE
Assuming the numbers given are decimal numbers, what value will be loaded into the accumulator for each addressing mode, given the instruction shown? (The slide's instruction and memory-contents table are not reproduced in this transcript; a hypothetical stand-in is sketched below.)

EXAMPLE
So – did these match the ones you found?
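Since the slide's actual instruction and memory table are missing from the transcript, here is a minimal stand-in sketch. It assumes a LOAD instruction whose operand field is 800, memory location 800 holding 900, location 900 holding 1000, a register R1 holding 800, and an index register X holding 100; all of these numbers and register names are invented for illustration, not taken from the lecture.

```c
#include <stdio.h>

int main(void) {
    /* Simulated memory: mem[address] = contents (all values assumed). */
    unsigned mem[1001] = {0};
    mem[800] = 900;      /* assumed contents of address 800      */
    mem[900] = 1000;     /* assumed contents of address 900      */
    unsigned R1 = 800;   /* assumed contents of register R1      */
    unsigned X  = 100;   /* assumed contents of index register X */
    unsigned acc;

    /* What "LOAD 800" puts in the accumulator under each mode: */
    acc = 800;               /* immediate: the operand itself is the data  */
    printf("immediate:         %u\n", acc);   /* 800  */

    acc = mem[800];          /* direct: the operand is the data's address  */
    printf("direct:            %u\n", acc);   /* 900  */

    acc = mem[mem[800]];     /* indirect: the operand holds the address of
                                the address of the data                    */
    printf("indirect:          %u\n", acc);   /* 1000 */

    acc = R1;                /* register: the data is in register R1       */
    printf("register:          %u\n", acc);   /* 800  */

    acc = mem[R1];           /* register indirect: R1 holds the address    */
    printf("register indirect: %u\n", acc);   /* 900  */

    acc = mem[800 + X];      /* indexed: operand 800 plus index register X */
    printf("indexed:           %u\n", acc);   /* 1000 */

    acc = mem[R1 + 100];     /* based: base register R1 plus operand 100   */
    printf("based:             %u\n", acc);   /* 1000 */

    return 0;
}
```

Note that the indexed and based lines compute the same effective address (900), which is the point made above: the end computation is identical, and only the roles of register and operand are swapped.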
Getting the instructions through the system
INSTRUCTION-LEVEL PIPELINING

PIPELINING, OVERVIEW...
Pipelining is a popular production methodology found in factories. Instead of one large module that assembles a complete product in one unit, the system is broken down into several smaller modules which work in sequence to produce a product. More work is therefore done in parallel.

PIPELINING, COMPUTER SCIENCE STYLE
Some CPUs divide the fetch-decode-execute cycle into smaller steps. These smaller steps can often be executed in parallel, which increases overall throughput. Such parallel execution is called instruction-level pipelining, sometimes abbreviated ILP.

MODIFYING THE CYCLE
Suppose we were to modify the fetch-decode-execute cycle by breaking it down into a series of smaller steps:
1. Fetch instruction
2. Decode opcode
3. Calculate effective addresses of operands
4. Fetch operands
5. Execute instruction
6. Store result
Suppose, now, we have a six-stage pipeline: S1 fetches an instruction, S2 decodes it, S3 determines the effective addresses of operands, and so on. In other words, each of these steps can be considered a stage in the execution pipeline.

ORGANIZING THE STAGES
For every clock cycle, one small step is carried out at each stage, and the stages are overlapped for the instructions in the pipeline:
S1: fetch instruction
S2: decode opcode
S3: calculate effective addresses of operands
S4: fetch operands
S5: execute instruction
S6: store result
During clock cycle 1, stage 1 is fetching instruction 1; during clock cycle 2, it is fetching instruction 2, while during that same cycle, stage 2 is decoding the opcode of instruction 1. A small simulation of this overlap appears below.
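To make the overlap concrete, here is a minimal sketch (my illustration, not from the lecture) that prints which instruction occupies each stage of an ideal six-stage pipeline on each clock cycle, assuming no stalls or flushes; the four-instruction workload is an arbitrary choice.

```c
#include <stdio.h>

#define STAGES 6
#define INSTRUCTIONS 4

int main(void) {
    const char *stage[STAGES] = {
        "S1:fetch", "S2:decode", "S3:calc-ea",
        "S4:fetch-ops", "S5:execute", "S6:store"
    };

    /* With no hazards, instruction i (0-based) occupies stage s on
     * clock cycle i + s + 1, so the whole workload finishes in
     * INSTRUCTIONS + STAGES - 1 cycles. */
    for (int cycle = 1; cycle <= INSTRUCTIONS + STAGES - 1; cycle++) {
        printf("cycle %d:", cycle);
        for (int s = 0; s < STAGES; s++) {
            int i = cycle - 1 - s;   /* which instruction is in stage s */
            if (i >= 0 && i < INSTRUCTIONS)
                printf("  %s=I%d", stage[s], i + 1);
        }
        printf("\n");
    }
    return 0;
}
```

The run completes in 4 + 6 − 1 = 9 cycles, which is exactly the n + k − 1 quantity derived in the next section.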
THEORETICAL THROUGHPUT IMPROVEMENT
We calculate the theoretical performance increase ("speedup") offered by a pipeline as follows. Let tp be the time per stage. Each instruction represents a discrete task, T, in the pipeline, so there are n discrete tasks in the set of instructions. The first task (or instruction) requires k × tp time to complete, given a k-stage pipeline. The remaining n − 1 tasks emerge from the pipeline at a rate of one per cycle, so the time to complete them is (n − 1) × tp. Thus, to complete n tasks using a k-stage pipeline requires:

    k × tp + (n − 1) × tp = (k + n − 1) × tp

WITHOUT A PIPELINE?
If a system does not have a pipeline, then the time required to complete the same n tasks is computed as follows. We assume that tn is the time to process one instruction, and that the relationship tn = k × tp holds; in other words, the total time to process one instruction is the sum of the times to process all k stages we previously had. The time for n tasks is then n × tn.

A CALCULATED IMPROVEMENT
If we divide n × tn by the time it takes to complete n tasks using a pipeline, we find the speedup factor S:

    S = (n × tn) / ((k + n − 1) × tp) = (n × k × tp) / ((k + n − 1) × tp) = (n × k) / (k + n − 1)

Now, taking the limit as n grows large (the k and the 1 in the denominator become irrelevant at scale), we arrive at a theoretical speedup factor of

    S → k as n → ∞

For example, with k = 6 stages and n = 100 instructions, S = 600 / 105 ≈ 5.7, already close to the theoretical maximum of 6.

TOO GOOD TO BE TRUE?
Our rather neat and tidy equations are based on some assumptions. First, we necessarily assume that the architecture supports the fetching of instructions and data in parallel; if it doesn't, there will be no advantage to the pipeline. Second, we assume that the pipeline can be kept full at all times, but in practice this is not guaranteed: pipeline hazards arise that can cause pipeline conflicts and stalls. Can you think of an example where two adjacent instructions might cause a conflict or stall?

FAIL!
An instruction pipeline may stall, or be flushed, for any of the following reasons:
1. Resource conflicts. If two instructions are seeking, for example, data in the same physical block, one must wait for the other; if there is no collision resolution, deadlock can occur. STALL
2. Data dependencies. A later operation may be waiting on the completion of a task higher up the pipeline, and cannot advance to the next stage without the results. STALL
3. Conditional branching. Subsequent instructions may end up being skipped due to a conditional branch statement (the assembly-language equivalent of an if statement), which causes the pipeline to be flushed and a new pipeline process started. FLUSH
A small sketch of a data dependency, and of reordering around it, follows below.
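As one answer to the question posed above, here is a minimal sketch (my illustration; the register-style variable names and values are invented) of a read-after-write data dependency between adjacent instructions, and of the kind of reordering described in the next section: an instruction proven independent is moved into the stall slot so the pipeline has useful work to do.

```c
#include <stdio.h>

int main(void) {
    int r2 = 4, r3 = 6, r5 = 3, r7 = 2, r8 = 9;

    /* Adjacent dependent instructions (a read-after-write hazard):
     * the subtraction reads r1 immediately after the addition writes
     * it, so on a simple pipeline the subtraction must wait in its
     * stage until the result is available. STALL. */
    int r1 = r2 + r3;   /* producer                            */
    int r4 = r1 - r5;   /* consumer: cannot advance without r1 */
    int r6 = r7 * r8;   /* independent of both r1 and r4       */

    /* If a scheduler (in hardware or in the compiler) can prove the
     * multiplication is independent, it may issue it in the stall
     * slot instead:
     *     r1 = r2 + r3;
     *     r6 = r7 * r8;    <- fills the cycle spent waiting on r1
     *     r4 = r1 - r5;
     * The results are identical; only the order of issue changes. */
    printf("r1=%d r4=%d r6=%d\n", r1, r4, r6);
    return 0;
}
```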
HOW CAN WE CORRECT THIS?
Measures can be taken at the software level and at the hardware level to mitigate the effects of these hazards, but they cannot be eliminated:
- Branch prediction algorithms help "guess" which way a conditional branch is likely to go, and will pipeline the likely target instructions; at the very least, the guess will be right some of the time.
- Resource conflict issues can also be predictively checked: if the items being fetched are in the same block, but code analysis indicates that they are not interdependent, then some items may be prefetched independently to a cache, and the request which would have caused a collision is rerouted to fetch from the cache instead.
- In some advanced systems, data dependency can be predetermined, and if intervening instructions aren't sequentially dependent, it may be possible to change their order within the pipeline and execute them first.

NEXT
We shall conclude Chapter 5 with a look at real-world ISA implementations of pipelining. We shall then commence Chapter 6 and look at memory systems in more detail.