CPU Structure and Function (PDF)

Structure of Computer Systems
Course 4: The Central Processing Unit (CPU)

CPU - Central Processing Unit: "classic (idyllic) view"
⚫ Incorporates 2 of the 5 components of von Neumann's classical model:
    ALU: Arithmetic and Logic Unit
    CU: Control Unit
⚫ It is the brain (the intelligent part) of a computer
⚫ Fetches (reads) an instruction, decodes/interprets it, reads the data, executes the instruction and stores the result
⚫ Does its job in a synchronized and sequential way: "one thing at a time"

CPU - Central Processing Unit: today's view
⚫ Contains all kinds of computer components:
    Multiple CPUs:
      ⚫ symmetric, asymmetric
      ⚫ multiple cores
      ⚫ multiple ALUs, specialized ALUs (e.g. floating point, multimedia: MMX, SSE2)
    Memory: multiple levels of cache memory (L0, L1, L2, trace cache)
    Interfaces and peripheral devices (in the case of microcontrollers and DSPs):
      ⚫ Serial channels
      ⚫ Parallel interfaces
      ⚫ Timers, counters
      ⚫ Converters (ADC, DAC)
      ⚫ Network interfaces
    Interrupt system
    Bus controller(s) and arbiter(s)
    Memory management units
⚫ Executes instructions in parallel and in a speculative order
⚫ Intelligence may be distributed in memories and interfaces as well

Where is that nice idyllic image?

Starting with the beginning: a simple computer
⚫ Attributes: sequential, one (accumulator) register, one memory for instructions and data

[Figure: block diagram of the simple computer. Blocks: CG, PhG, PC, IR, MUX, Memory, Dec&CC, ALU, Acc. Signals: Clk, Rst, IR_ld, Sel, Addr, Data, Data in, wr, Op_sel, Inc, PC_ld, Acc_ld, Acc_shl, Acc_shr, Acc_clr.]

Legend:
    CG: clock generator
    PhG: phase generator
    PC: program counter
    IR: instruction register
    Acc: accumulator

A simple computer: how does it work?
⚫ 4 phases:
    IF (instruction fetch): read the instruction into IR
    Dec (decode): decode the instruction and generate the control signals
    PreEx (prepare execution): e.g. read the data from memory
    Exe (execute): e.g.
    addition, subtraction

A simple computer: examples
⚫ Example 1: ADD Acc, M[100h]
    IF: Sel = 0 => Address = PC; IR_ld pulse => IR = ADD 100h
    Dec: Sel = 1 => Address = IR_addr; Inc = 1 => increment PC
    PreEx: Op_sel = code_add => the ALU performs an addition
    Exe: Acc_ld => Acc = Acc + M[100h]
⚫ Example 2: JMP 200h
    IF: Sel = 0 => Address = PC; IR_ld pulse => IR = JMP 200h
    Dec: Inc = 1 => increment PC
    PreEx: PC_ld = 1 => PC = IR_addr = 200h
    Exe: (no action)
⚫ Example 3: SHR Acc
    IF and Dec: the same as above
    PreEx: (no action)
    Exe: Acc_shr = 1 => shift the accumulator one position to the right

[Figure: block diagram of the simple computer (CG, PhG, PC, IR, MUX, Memory, Dec&CC, ALU, Acc), repeated on each example slide.]

A simple computer: homework and extensions
⚫ Homework: try to implement:
    MOV M[addr], Acc
    MOV Acc, M[addr]
    Conditional jump (e.g. if Acc = 0, Acc > 0, ...)
⚫ [...] -> the phase generator should depend on the instruction code
⚫ Multiple internal registers -> 2 buses: input data and output data
⚫ Front panel with 7-segment LEDs and switches
⚫ Increase the number of instructions -> more complex Decoder and Command and Control Unit

A more sophisticated computer, but still simple: the MIPS architecture
Attributes:
⚫ Sequential
⚫ 32 internal registers of 32 bits
⚫ Instructions: fixed length, variable content
⚫ Harvard memory architecture: separate instruction and data memory
⚫ An instruction is executed in 5 phases:
    IF: instruction fetch
    ID: decode the instruction and prepare (read) the data
    Ex: execute the instruction
    M: operation with the memory
    Wb: write back (store the result)
⚫ Instruction types:
    "R" (Register), e.g. ADD $RD, $RS, $RT
    "I" (Immediate), e.g. ADDI $RT, $RS, constant; LW $RT, offset($RS)
    "J" (Jump), e.g.
    J target

MIPS architecture: instruction formats
⚫ Fixed length (4 bytes), but several field layouts

"R": register-type instructions (rd, rs, rt)
    rd: destination register
    rs: source register
    rt: target register (the second source operand in R-type instructions)
    Ex: add $s1, $s2, $s3    ; $s1 = $s2 + $s3

    Opcode   rs       rt       rd       shift    funct
    6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

"I": immediate-type instructions, with an immediate value (constant) (rt, rs, IMM)
    rs: source register
    rt: target register
    Ex: addi $s1, $s2, 55    ; $s1 = $s2 + 55

    Opcode   rs       rt       IMM/Addr
    6 bits   5 bits   5 bits   16 bits

"J": jump-type instructions (LABEL)
    Ex: j et1    ; jump

    Opcode   Target address
    6 bits   26 bits

MIPS architecture: address generation and instruction fetch
[Figure: PC, +4 adder, adder for relative jumps, two multiplexers, program memory and IR. Signals: PC_ld, IR_ld, PC_MUX_Sel1, PC_MUX_Sel2, Address, Instr. code, Op_code, op_address, const., Jump address.]
    PC = PC + 4: increment the PC
    PC = Jump_Address: absolute jump
    PC = PC + Jump_Address: relative jump

MIPS architecture: decode and data preparation
[Figure: instruction register, decoder (DEC) and register block (reg. 0 .. reg. 31) with two output multiplexers. Signals: op_code, op1_ad, op2_ad, Exec cmds., Mem. cmds., WB cmds.; outputs A (data), B (data) and I (immediate value).]

MIPS architecture: execute and memorize
[Figure: ALU and data memory. Inputs A, B, I; Sel_ALU, ex_op_code, Wr_mem, Exe and mem cmds; the ALU Result is also used as the memory Address; memory Data in (Din) and Data out (Dout).]

MIPS architecture: write back the result
[Figure: multiplexer (Sel_rez) selecting between Result and Data out, decoder (DEC) on Dest. reg generating Wr_R0 .. Wr_R31, register block (reg. 0 .. reg. 31); Wr_reg, WB cmds.]

MIPS architecture: the whole picture
[Figure: clock generator (Clk), phase generator, PC with +4, instruction memory, IR, instruction decoder, register block, ALU, data memory.]

Pipeline execution
What does it mean?
⚫ Work as on "an assembly line": the idea comes from car manufacturing in the early 1900s (e.g. Ford's moving assembly line, 1913)
How to do it?
⚫ Specialized components (units) for every phase of instruction execution
⚫ Memorize the partial results in temporary buffers
What can we achieve?
⚫ Higher execution speed at the same clock frequency
⚫ CPI ~ 1

Sequential vs.
pipeline execution

Sequential execution: CPI = 5

    Instr. 1          Instr. 2          Instr. 3
    IF ID Ex M Wb     IF ID Ex M Wb     IF ID Ex M Wb

Pipeline execution: CPI = 1 (in the ideal case)

        T1   T2   T3   T4   T5   T6   T7   T8   T9   T10
    i1  IF   ID   Ex   M    Wb
    i2       IF   ID   Ex   M    Wb
    i3            IF   ID   Ex   M    Wb
    i4                 IF   ID   Ex   M    Wb
    i5                      IF   ID   Ex   M    Wb

Superscalar and superpipeline architectures

Superscalar:
⚫ Multiple pipelines
⚫ 2 instructions are fetched every clock
⚫ CPI = 1/2

                 T1   T2   T3   T4   T5   T6
    instr. i     IF   ID   Ex   M    Wb
    instr. i+1   IF   ID   Ex   M    Wb
    instr. i+2        IF   ID   Ex   M    Wb
    instr. i+3        IF   ID   Ex   M    Wb

Superpipeline:
⚫ Phases require only half a clock period
⚫ CPI = 1/2

[Figure: timing diagram over T1 .. T6 for instr. i, i+1, i+2, i+3, each passing through IF, ID, Ex, M, Wb.]

Pipelined MIPS architecture
[Figure: pipelined MIPS datapath with PC, +4, instruction memory, IR, instruction decoder, register block, ALU (inputs A, B, I) and data memory (addr, Di, Do). The pipeline registers C1, C2 and C3 carry the ex, m and wb commands between the stages IF, ID, Ex, M, Wb.]

Pipeline architecture: there is no free lunch!
Hazard cases:
⚫ Data hazard
    Data dependency between consecutive instructions
⚫ Control hazard
    Jump/branch instructions change the normal (sequential) order of instruction execution
⚫ Structural hazard
    Instructions in different phases use the same structural component (e.g. ALU, registers, memory, bus)
Result: the speed and the efficiency of the pipeline architecture are reduced

Hazard cases in pipeline architectures: data hazard

    MOV AX, 5     IF ID Ex M Wb
    ADD BX, AX    IF ID Ex M      (needs AX, written by MOV AX, 5)
    SUB CX, 5     IF ID Ex M
    MOV DX, CX    IF ID Ex M      (needs CX, written by SUB CX, 5)

[Figure: timing diagram of the four instructions above, with stall phases inserted before the dependent instructions.]

Data hazard types:
⚫ RAW (read after write)
    Occurs very often; avoided through forwarding (see Common data bus)
⚫ WAR (write after read)
    Rare in a classic pipeline; more frequent in superscalar pipelines
⚫ WAW (write after write)
⚫ RAR (read after read): not a hazard

Hazard cases in pipeline architectures: data hazard (cont.)
⚫ Solutions:
    Detection and stall phases
      ⚫ an instruction with an unresolved data dependency waits in the "instruction fetch" stage until the data is available
      ⚫ the following instructions are also stalled
    Register renaming
      ⚫ multiple copies of a register (see alias registers for Pentium Pro)
      ⚫ instructions with no logical dependency between them can get different copies of the same register
      ⚫ avoids artificial data dependencies caused by the limited number of internal registers
    Forwarding (see Common data bus)
      ⚫ transfer a result in advance, before it is written to its final place (register or memory location)
    Out-of-order execution
      ⚫ speculative execution (see Pentium Pro architecture)

Hazard cases in pipeline architectures: structural hazard

    IF ID Ex M  Wb
       IF ID Ex Wb              <- instruction with no memory phase
          IF ID Ex M  Wb
             IF ID Ex M  Wb

    Two instructions are using the register block in different phases.

⚫ Solutions:
    Detection and stall phases
    Redundant functional units (see Pentium processors)
    Harvard memory organization: separate code and data memory (see microcontrollers)
    Multiple buses (see DSPs)
    Out-of-order execution

Hazard cases in pipeline architectures: control hazard

    JE et1                IF ID Ex
    ADD AX, BX               IF ID Ex M
    SUB CX, DX                  IF ID Ex M
    ...
    et1: MOV SI, 1234h             IF ID Ex M Wb

    ADD and SUB enter the pipeline before the outcome of JE is known; if the jump is taken, their work must be discarded.

⚫ Solutions:
    Stall phases
    Branch prediction
    Out-of-order execution

Pipeline architecture: solving hazard cases
⚫ Detect hazard cases and introduce "stall" phases
⚫ Rearrange instructions in order to reduce the dependences between consecutive instructions
Methods:
⚫ Static scheduling: done before program execution; the optimization is made by the compiler or by the user
⚫ Dynamic scheduling: done during program execution; the optimization is made by the processor (out-of-order execution)
⚫ Branch prediction techniques

Static vs.
dynamic scheduling

Static scheduling:
⚫ The optimal order of instructions is established by the compiler, based on information about the structure of the pipeline
⚫ Advantage: it is done once and benefits every execution of the code
⚫ Drawbacks: the compiler must know the structure of the hardware (e.g. pipeline stages, phases of every instruction); the compiler must be changed when the processor version changes

Dynamic scheduling:
⚫ The hardware has the capacity to reorder instructions to avoid or reduce the effect of hazard cases
⚫ Advantages: the processor knows its own structure best; the optimization can be better matched to the hardware; some dependences are revealed only at run time
⚫ Drawbacks: reordering decisions are made every time the code is executed; more complex hardware is needed
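
To make the effect of stall phases and of static scheduling more concrete, here is a minimal Python sketch. It is not taken from the course. It assumes a 5-phase pipeline (IF, ID, Ex, M, Wb) without forwarding, in which a dependent instruction is held back until its ID phase coincides with the Wb phase of the producer (the register block is assumed to be written and then read within the same clock). The names `Instr`, `RAW_DISTANCE`, `issue_cycles` and `total_cycles` are hypothetical, and the source/destination registers of the x86-style instructions are filled in by hand.

```python
from typing import List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                # assembly text, for display only
    dest: Optional[str]      # register written by the instruction
    srcs: Tuple[str, ...]    # registers read by the instruction

STAGES = 5        # IF, ID, Ex, M, Wb
RAW_DISTANCE = 3  # assumption: a consumer must enter IF at least 3 clocks
                  # after its producer (no forwarding, ID overlaps Wb)

def issue_cycles(program: List[Instr]) -> List[int]:
    """Clock in which each instruction enters IF (in-order, stalls on RAW)."""
    issue: List[int] = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for i, ins in enumerate(program):
        cycle = issue[-1] + 1 if issue else 1
        for reg in ins.srcs:
            if reg in last_writer:
                cycle = max(cycle, issue[last_writer[reg]] + RAW_DISTANCE)
        issue.append(cycle)
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return issue

def total_cycles(program: List[Instr]) -> int:
    """Clock in which the last instruction leaves Wb."""
    return issue_cycles(program)[-1] + STAGES - 1

mov_ax = Instr("MOV AX, 5", "AX", ())
add_bx = Instr("ADD BX, AX", "BX", ("BX", "AX"))
sub_cx = Instr("SUB CX, 5", "CX", ("CX",))
mov_dx = Instr("MOV DX, CX", "DX", ("CX",))

original = [mov_ax, add_bx, sub_cx, mov_dx]    # order from the data hazard slide
reordered = [mov_ax, sub_cx, add_bx, mov_dx]   # same result, dependences spread out

print(total_cycles(original))    # 12 clocks -> CPI = 3.0
print(total_cycles(reordered))   # 9 clocks  -> CPI = 2.25
```

Under these assumptions the ideal, hazard-free run of four instructions would take 8 clocks. Moving the independent SUB CX, 5 between MOV AX, 5 and ADD BX, AX is the kind of rearrangement a compiler performs in static scheduling; a dynamically scheduled processor would find the same opportunity at run time.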

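The R, I and J instruction formats from the MIPS slides can be illustrated by packing the fields into a 32-bit word. The sketch below is only an illustration: the field widths come from the format tables, while the numeric opcodes, the funct code and the register numbers ($s1 = 17, $s2 = 18, $s3 = 19) are those of the standard MIPS32 instruction set and do not appear in the slides. The function names are hypothetical.

```python
def _field(value: int, bits: int, name: str) -> int:
    """Check that an unsigned field value fits in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value

def encode_r(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """R format: opcode(6) rs(5) rt(5) rd(5) shift(5) funct(6)."""
    return (_field(opcode, 6, "opcode") << 26
            | _field(rs, 5, "rs") << 21
            | _field(rt, 5, "rt") << 16
            | _field(rd, 5, "rd") << 11
            | _field(shamt, 5, "shamt") << 6
            | _field(funct, 6, "funct"))

def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """I format: opcode(6) rs(5) rt(5) immediate(16, two's complement)."""
    if not -(1 << 15) <= imm < (1 << 16):
        raise ValueError(f"imm={imm} does not fit in 16 bits")
    return (_field(opcode, 6, "opcode") << 26
            | _field(rs, 5, "rs") << 21
            | _field(rt, 5, "rt") << 16
            | (imm & 0xFFFF))

def encode_j(opcode: int, target: int) -> int:
    """J format: opcode(6) target address(26)."""
    return _field(opcode, 6, "opcode") << 26 | _field(target, 26, "target")

S1, S2, S3 = 17, 18, 19  # MIPS32 register numbers of $s1, $s2, $s3

# add $s1, $s2, $s3 : opcode 0, funct 0x20 (rd = $s1, rs = $s2, rt = $s3)
print(hex(encode_r(0, S2, S3, S1, 0, 0x20)))   # 0x2538820
# addi $s1, $s2, 55 : opcode 8 (rt = $s1, rs = $s2)
print(hex(encode_i(8, S2, S1, 55)))            # 0x22510037
# j with an example 26-bit target field of 0x100 : opcode 2
print(hex(encode_j(2, 0x100)))                 # 0x8000100
```

Note that the assembly operand order (rd, rs, rt) differs from the order of the fields in the encoded word (rs, rt, rd), which is why the arguments of `encode_r` are not in the same order as the operands of `add`.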