csc25-chapter_02.pdf

CSC-25 High Performance Architectures
Lecture Notes – Chapter II
Processor Operation, RISC & CISC

Denis Loubach
[email protected]
Department of Computer Systems
Computer Science Division – IEC
Aeronautics Institute of Technology – ITA
1st semester, 2024

Detailed Contents

Processor Operation
  The Five Basic Parts - Control
  Control Unit
  Instruction Set Architectures Classes
  Processor with Accumulator and Microprogram
Instructions Set Architecture Approaches
  Instructions Design - RISC vs. CISC
  Overview Analysis
Control of a GPR Processor - MIPS
  Background
  Instruction Layout
  Main Instructions
  Addressing Modes
  Control for Single Cycle Datapath - SCD
  Control Signals
  Instruction Fetch Unit (1/4)
  SCD During Add and Subtract
  Instruction Fetch Unit (2/4)
  SCD During Load
  SCD During Store
  SCD During Conditional Branch
  Instruction Fetch Unit (3/4)
  SCD During Jump
  Instruction Fetch Unit (4/4)
  Drawback of Single Cycle Processor
References

1st semester, 2024 Loubach CSC-25 High Performance Architectures ITA 2/52

Outline

- Processor Operation
- Instructions Set Architecture Approaches
- Control of a GPR Processor - MIPS
- References

Processor Operation

The Five Basic Parts - Control

1. memory
2. arithmetic logic unit (part of datapath)
3. program control unit (control path)
4. input equipment
5. output equipment

[Figure: the original von Neumann machine. Blocks shown: Memory, Arithmetic Logical Unit, Acc, Control Unit, Input, Output]

The instruction set architecture defines the processor's instructions, which in turn drive the design of the control path and the datapath.

Control Unit

Generally, the execution of an instruction can be split into different phases:

1. operation code fetch
2. operation code decode
3. operands fetch
4. effective instruction execution
5. results store

Phases involving memory access can be 10× slower than the other phases.

Instruction Set Architectures Classes

The type of internal storage in a processor is the most basic differentiation:

- stack
  - instructions and operands are stored in a memory following this abstract data type
  - operands are implicitly on the top of stack - TOS
- accumulator
  - instructions involve this special register and, in some cases, the memory
  - one operand is implicitly the accumulator
- register
  - instructions involve operands in various general purpose registers - GPR, or even the memory
  - only explicit operands

[Figure: operand locations for the four major ISA classes. Lighter shades stand for inputs, darker shades for outputs]

Code sequence for C = A + B

| Stack  | Accumulator | Register (register-memory) | Register (load-store) |
|--------|-------------|----------------------------|-----------------------|
| Push A | Load A      | Load R1, A                 | Load R1, A            |
| Push B | Add B       | Add R3, R1, B              | Load R2, B            |
| Add    | Store C     | Store R3, C                | Add R3, R1, R2        |
| Pop C  |             |                            | Store R3, C           |

Separate instructions for memory access:

- stack: push and pop
- load-store: load and store
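As a rough illustration of the code sequences for C = A + B, the sketch below runs the stack and load-store versions on two tiny interpreters. The tuple encoding of instructions, the dictionary used as memory, and the function names are assumptions made for this example only; they are not part of the lecture material.

```python
# Minimal sketch: two hypothetical mini-interpreters for the stack and
# load-store ISA classes. Memory is a dict from variable name to value.

def run_stack(program, mem):
    """Stack machine: operands are implicitly on the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])       # memory -> top of stack
        elif op == "Pop":
            mem[args[0]] = stack.pop()       # top of stack -> memory
        elif op == "Add":
            b, a = stack.pop(), stack.pop()  # both operands are implicit
            stack.append(a + b)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem


def run_load_store(program, mem):
    """Load-store machine: only Load and Store access memory."""
    regs = {}
    for op, *args in program:
        if op == "Load":
            regs[args[0]] = mem[args[1]]     # memory -> register
        elif op == "Store":
            mem[args[1]] = regs[args[0]]     # register -> memory
        elif op == "Add":
            dst, src1, src2 = args           # all operands are explicit registers
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem


STACK_PROG = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
LOAD_STORE_PROG = [
    ("Load", "R1", "A"),
    ("Load", "R2", "B"),
    ("Add", "R3", "R1", "R2"),
    ("Store", "R3", "C"),
]

stack_result = run_stack(STACK_PROG, {"A": 2, "B": 3})
load_store_result = run_load_store(LOAD_STORE_PROG, {"A": 2, "B": 3})
print(stack_result["C"], load_store_result["C"])
```

Both programs leave the same value in C. The difference is where the operands live while the addition happens: on an implicit stack in the first case, in explicitly named registers in the second.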
Processor with Accumulator and Microprogram

A simple case

[Figure: processor with accumulator - ACC; data register - DR; address register - AR; program counter - PC; instruction register - IR. Blocks shown: ALU, ACC, DR, Memory, PC, IR, AR, Control Unit]
Microprogram

BEGIN
processor active?
- FALSE: END
- TRUE: fetch and decode the next instruction
  1. AR ← PC
  2. DR ← Mem[AR]
  3. IR ← DR[opcode]
  4. PC ← PC + 1
  5. decode IR and run the matching routine below, then check "processor active?" again

Routines selected by the opcode in IR:

- #1 Load: AR ← DR[addr]; DR ← Mem[AR]; ACC ← DR
- #2 Store: AR ← DR[addr]; DR ← ACC; Mem[AR] ← DR
- #5 Addition: AR ← DR[addr]; DR ← Mem[AR]; ACC ← ACC + DR
- #6 Subtraction: AR ← DR[addr]; DR ← Mem[AR]; ACC ← ACC - DR
- #7 Multiplication: AR ← DR[addr]; DR ← Mem[AR]; ACC ← ACC × DR
- #15 Jump: PC ← DR[addr]
- #16 Jump if positive: ACC > 0? TRUE: PC ← DR[addr]; FALSE: no action
- #21 Stop: halt processor

[Figure: the same microprogram shown next to the accumulator processor diagram (ALU, ACC, DR, Memory, PC, IR, AR, Control Unit)]

Program pseudo-code example

T1 = F + G
T1 = (H - I) * T1
T2 = E * F
X = A + B
X = ((C - D) * X - T2) / T1

Equivalent to:

$X = \frac{(C - D) \times (A + B) - (E \times F)}{(H - I) \times (F + G)}$

How can the corresponding assembly code be written for a processor with accumulator?

[Figure: 4-bit adder; 1-bit mux; binary multiplication (shift left and add)]
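The microprogram listed above can be sketched as a small Python simulator. The opcode numbers are taken from the microprogram. Everything else is an assumption made for illustration: each instruction word is an (opcode, addr) tuple, memory is a dictionary, and the loop returns to the "processor active?" test after every routine.

```python
# Minimal sketch of the accumulator processor's microprogram.
# Opcode numbers follow the microprogram; the word layout is hypothetical.
LOAD, STORE, ADD, SUB, MULT, JUMP, JPOS, STOP = 1, 2, 5, 6, 7, 15, 16, 21


def run(mem, pc=0):
    """Run until Stop is executed and return the final accumulator value."""
    acc = 0
    active = True
    while active:                      # "processor active?"
        # Fetch: AR <- PC; DR <- Mem[AR]; IR <- DR[opcode]; PC <- PC + 1
        ar = pc
        dr = mem[ar]
        ir, addr = dr                  # DR[opcode] and DR[addr]
        pc += 1
        # Decode IR and run the matching routine
        if ir == LOAD:
            ar = addr
            dr = mem[ar]
            acc = dr
        elif ir == STORE:
            ar = addr
            dr = acc
            mem[ar] = dr
        elif ir in (ADD, SUB, MULT):
            ar = addr
            dr = mem[ar]
            if ir == ADD:
                acc = acc + dr
            elif ir == SUB:
                acc = acc - dr
            else:
                acc = acc * dr
        elif ir == JUMP:
            pc = addr
        elif ir == JPOS:
            if acc > 0:
                pc = addr
        elif ir == STOP:
            active = False             # halt processor
        else:
            raise ValueError(f"unknown opcode: {ir}")
    return acc


# Example program: add mem[23] to mem[22] once per unit of the counter in mem[20].
mem = {
    0: (LOAD, 22), 1: (ADD, 23), 2: (STORE, 22),   # total = total + step
    3: (LOAD, 20), 4: (SUB, 21), 5: (STORE, 20),   # counter = counter - 1
    6: (JPOS, 0),                                  # repeat while counter > 0
    7: (STOP, 0),
    20: 3,   # counter
    21: 1,   # constant one
    22: 0,   # total
    23: 5,   # step
}
final_acc = run(mem)
print(mem[22], mem[20], final_acc)
```

Every routine that touches a variable goes through AR and DR, so each executed instruction costs one memory access for the fetch plus one for the operand, except the jumps and Stop.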
[Figure: logic gates]

Considering the following phases:

1. operation code fetch
2. operation code decode
3. operands fetch
4. effective instruction execution
5. results store

Is there an intrinsic performance problem with the accumulator architecture? Which one?

Assembly code for the pseudo-code example: accumulator vs. GPR (register-memory)

| #  | Accumulator | GPR (register-memory) |
|----|-------------|-----------------------|
| 1  | Ld F        | Ld R1, F              |
| 2  | Add G       | Add R1, G             |
| 3  | Sto T1      | Ld R2, H              |
| 4  | Ld H        | Sub R2, I             |
| 5  | Sub I       | Mult R1, R2           |
| 6  | Mult T1     | Ld R2, E              |
| 7  | Sto T1      | Mult R2, F            |
| 8  | Ld E        | Ld R3, A              |
| 9  | Mult F      | Add R3, B             |
| 10 | Sto T2      | Ld R4, C              |
| 11 | Ld A        | Sub R4, D             |
| 12 | Add B       | Mult R3, R4           |
| 13 | Sto X       | Sub R3, R2            |
| 14 | Ld C        | Div R3, R1            |
| 15 | Sub D       | Sto X, R3             |
| 16 | Mult X      |                       |
| 17 | Sub T2      |                       |
| 18 | Div T1      |                       |
| 19 | Sto X       |                       |

|                     | Accumulator | GPR (register-memory) |
|---------------------|-------------|-----------------------|
| instructions        | 19          | 15                    |
| instruction fetches | 19          | 15                    |
| operand fetches     | 19          | 11                    |
| memory accesses     | 38          | 26                    |

The GPR version makes about 31.6% (12/38) fewer memory accesses.

Instructions Set Architecture Approaches

Instructions Design - RISC vs. CISC

Instructions with very different execution times, or with very different numbers of phases, are not suitable for a production line (pipeline).

Why not create simple instructions with small differences in phase execution time and the same number of phases?

What about creating powerful instructions that solve common problems, rather than simple instructions that solve almost nothing?
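Returning to the accumulator vs. GPR comparison earlier in this part, the memory-access counts can be checked with a short script. It assumes, as the comparison does, that every instruction fetch costs one memory access and that every instruction with a memory operand costs one more. The register naming rule (R followed by digits) and the helper name `count_accesses` are assumptions for this sketch.

```python
import re

# Listings transcribed from the accumulator vs. GPR comparison.
ACC_PROG = [
    "Ld F", "Add G", "Sto T1", "Ld H", "Sub I", "Mult T1", "Sto T1",
    "Ld E", "Mult F", "Sto T2", "Ld A", "Add B", "Sto X", "Ld C",
    "Sub D", "Mult X", "Sub T2", "Div T1", "Sto X",
]
GPR_PROG = [
    "Ld R1, F", "Add R1, G", "Ld R2, H", "Sub R2, I", "Mult R1, R2",
    "Ld R2, E", "Mult R2, F", "Ld R3, A", "Add R3, B", "Ld R4, C",
    "Sub R4, D", "Mult R3, R4", "Sub R3, R2", "Div R3, R1", "Sto X, R3",
]

REGISTER = re.compile(r"R\d+")  # assumed naming rule: R1, R2, ...


def count_accesses(program):
    """Return (instruction fetches, operand accesses, total memory accesses)."""
    fetches = len(program)              # one memory access per instruction fetch
    operand_accesses = 0
    for line in program:
        _mnemonic, _, rest = line.partition(" ")
        operands = [operand.strip() for operand in rest.split(",")]
        # Any operand that is not a register name is a memory variable.
        if any(REGISTER.fullmatch(operand) is None for operand in operands):
            operand_accesses += 1
    return fetches, operand_accesses, fetches + operand_accesses


acc_counts = count_accesses(ACC_PROG)
gpr_counts = count_accesses(GPR_PROG)
saving = (acc_counts[2] - gpr_counts[2]) / acc_counts[2]
print(acc_counts, gpr_counts, f"{saving:.1%}")
```

The saving comes entirely from the four GPR instructions whose operands are both registers, plus the temporaries T1 and T2 that no longer need to be stored to and reloaded from memory.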
Some background

- late '70s – the idea of high-level language computer architecture - HLLCA came up
- early '80s – Ditzel and Patterson argued that simpler architectures would be the best approach, and presented the idea of the reduced instruction set computer - RISC
- at the same time, some designers related to VAX refuted that idea and continued to build computers based on the complex instruction set computer - CISC

RISC and CISC development proceeded in parallel, competing for the market.

The RISC computers had three major seminal projects:

- Berkeley RISC processor, led by Patterson
- Stanford MIPS processor, led by Hennessy
- IBM 801

Berkeley RISC

- RISC I and II (1980 to ∼1983)
- 16- and 32-bit instructions

Stanford MIPS

- 1981 to 1984
- 32-bit instructions

These university projects were widely adopted by industry after their conclusion.

IBM never launched the IBM 801 into the market, but created the RS 6000 in 1990, which was the first superscalar RISC.

To help settle the RISC vs. CISC debate, VAX designers compared the VAX 8700 and the MIPS M2000 in the early '90s.

VAX:

- powerful addressing modes
- powerful instructions
- efficient instruction coding
- few registers

MIPS:

- simple instructions
- simple addressing modes
- fixed length instruction format
- large number of registers
- pipelining

The two computers had similar organizations and equal clock times.

On average, MIPS executes about 2× as many instructions as the VAX. The CPI for the VAX is almost 6× the MIPS CPI, yielding almost a 3× performance advantage for MIPS (6/2 = 3, since the clock times are equal).

VAX was discontinued and replaced by the Alpha, a 64-bit MIPS-like architecture.

[Figure: ratio of MIPS to VAX in instructions executed and performance in clock cycles, based on SPEC89 programs]

Finally, just one CISC has survived this debate: the x86

- high chip volume
- binary compatibility with PC software
- internal translation from CISC to RISC
- enough scaling to support extra hardware

Some info from the embedded market

- personal mobile devices - PMDs, home appliances, specialized computers
- critical: cost and power
- use RISC compilers and architectures

In 2000, the number of embedded processors sold was more than twice the number of x86 processors, and more than 90% of them were RISC.

Overview Analysis

Stack 1 0 address add tos
