Chapter 2: Processor Operation, RISC & CISC

Processor Operation

The Five Basic Parts

The five basic parts, introduced in Chapter 1, are the parts of the original von Neumann machine: memory, the arithmetic logic unit (part of the datapath), the program control unit (control path), input equipment, and output equipment. The instruction set architecture defines the processor's instructions, which in turn drive the design of the control and data paths.

The Control Unit

The execution of an instruction can generally be split into five phases: (i) operation code fetch, (ii) operation code decode, (iii) operand fetch, (iv) effective instruction execution, and (v) result store. The phases involving memory access, e.g., writing data back to memory, can be about 10 times slower than the other phases, so they have a significant impact on program performance.

Instruction Set Architecture Classes

The type of internal storage in a processor is the most basic distinction among instruction set architecture classes. The main classes are the following, as shown in Fig. 2.1.

Stack. In the stack storage type, both instructions and operands are stored in memory, with operands organized following this abstract data type. The arithmetic logic unit (ALU) operands are implicitly on the top of stack (TOS).

Accumulator. In the accumulator storage type, instructions involve a special register, the accumulator (ACC), and in some cases also the memory. One operand is implicitly the ACC register.

Figure 2.1: Operand locations for the four major ISA classes. Lighter shades stand for inputs, darker ones for outputs.

Register. In the register storage type, instructions involve operands in various general-purpose registers (GPRs), in the load-store architecture, or also in memory, in the register-memory architecture. Only explicit operands are allowed.
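The operand-location differences among these classes can be made concrete with a toy interpreter. The following sketch (illustrative Python, not any real ISA; the opcode names and memory model are assumptions for this example) evaluates the same computation on a stack machine, where the ALU takes its operands implicitly from the top of stack, and on an accumulator machine, where one operand is implicitly the ACC register:

```python
# Toy illustration of two ISA classes: stack vs. accumulator.
# Memory is modeled as a dict mapping variable names to values.

def run_stack(program, mem):
    """Stack class: ALU operands come implicitly from the top of stack (TOS)."""
    stack = []
    for op, *args in program:
        if op == "PUSH":                 # memory -> stack
            stack.append(mem[args[0]])
        elif op == "ADD":                # implicit operands: the two TOS values
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":                # stack -> memory
            mem[args[0]] = stack.pop()

def run_acc(program, mem):
    """Accumulator class: one operand is implicitly the ACC register."""
    acc = 0
    for op, *args in program:
        if op == "LOAD":                 # memory -> ACC
            acc = mem[args[0]]
        elif op == "ADD":                # ACC = ACC + memory operand
            acc = acc + mem[args[0]]
        elif op == "STORE":              # ACC -> memory
            mem[args[0]] = acc

mem1 = {"A": 2, "B": 3}
run_stack([("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")], mem1)

mem2 = {"A": 2, "B": 3}
run_acc([("LOAD", "A"), ("ADD", "B"), ("STORE", "C")], mem2)

print(mem1["C"], mem2["C"])  # both compute C = A + B, i.e., 5 5
```

Note that the stack program needs four instructions and the accumulator program three, for the same expression; the difference comes entirely from where the operands implicitly live.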
Some Code Examples

Consider the expression C = A + B and the corresponding code sequence in each of the four major ISA classes.

The code for the stack class is in Listing 2.1. The operands must first be pushed from memory onto the stack; the ALU then takes its operands implicitly from the top of stack.

Listing 2.1: Stack.
Push A
Push B
Add
Pop C

The code for the accumulator class is in Listing 2.2. Only one operand needs to be fetched from memory, since the ALU takes one operand implicitly from the ACC register and the other from a memory position.

Listing 2.2: Accumulator.
Load A
Add B
Store C

The code for the register class, in its register-memory variant, is in Listing 2.3. Operands are explicitly registers and memory positions.

Listing 2.3: Register-memory.
Load R1, A
Add R3, R1, B
Store R3, C

The code for the register class, in its load-store variant, is in Listing 2.4. Operands are explicit registers only, so the operands must first be loaded from memory into registers before the Add instruction can execute.

Listing 2.4: Load-store.
Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C

Notice that the stack and load-store classes both have separate instructions for memory access: push and pop in the stack class, and load and store in the load-store class.

Processor with Accumulator and Microprogram

Fig. 2.2 illustrates the architecture of a processor with accumulator. The accumulator (ACC) register is connected to the ALU as its implicit operand. There are also the data register (DR), the address register (AR), the program counter (PC), and the instruction register (IR).

Figure 2.2: Processor with accumulator architecture (ALU, ACC, DR, AR, PC, IR, control unit, and memory).

Fig. 2.3 shows the representation of the microprogram for the processor with accumulator architecture.
Figure 2.3: Microprogram for the processor with accumulator architecture. The left-hand side is the fetch cycle (AR ← PC; DR ← Mem[AR]; IR ← DR[opcode]; PC ← PC + 1); the right-hand side has one branch per operation code: #1 load, #2 store, #5 addition, #6 subtraction, #7 multiplication, #15 jump, #16 jump if positive, and #21 stop.

On the figure's left-hand side (Fig. 2.3), the microprogram behaves as in Listing 2.5.

Listing 2.5: Microprogram pseudo-code, first half.
BEGIN                     // processor starts
if (processor is active)  // check if the processor is ACTIVE
    AR = PC               // address register gets the program counter value
    DR = Mem[AR]          // data register gets the memory position value
                          // indexed by the address register
    IR = DR[opcode]       // instruction register gets the operation code
                          // field of the data register contents
    PC = PC + 1           // program counter is incremented to the
                          // next instruction
    switch (IR)           // dispatch on the operation code to execute it
else
    // processor is NOT active
    END                   // execution ends

On the right-hand side of Fig. 2.3, the microprogram behaves as in Listing 2.6.

Listing 2.6: Microprogram pseudo-code, second half.
switch (IR)          // dispatch on the operation code to execute it
case 1:              // operation "LOAD"
    AR = DR[addr]    // address register gets the address field
                     // of the data register contents
    DR = Mem[AR]     // data register gets the memory position
                     // indexed by the address register
    ACC = DR         // accumulator gets the data register value
case 2:              // operation "STORE"
    AR = DR[addr]
    DR = ACC         // data register gets the accumulator value
    Mem[AR] = DR     // memory position indexed by the address
                     // register gets the data register value
...
case 5:              // operation "ADDITION"
    AR = DR[addr]
    DR = Mem[AR]
    ACC = ACC + DR   // accumulator gets the accumulator plus
                     // the data register value
...
case 15:             // operation "JUMP"
    PC = DR[addr]    // program counter gets the address field
                     // of the data register
case 16:             // operation "JUMP IF POSITIVE"
    if (ACC > 0)     // if the accumulator is greater than zero
        PC = DR[addr]
...
case 21:             // operation "STOP"
    halt the processor

Now, take the program pseudo-code example shown in Listing 2.7.

Listing 2.7: The result of X.
T1 = F + G
T1 = (H - I) * T1
T2 = E * F
X = A + B
X = ((C - D) * X - T2) / T1

The pseudo-code in the previous listing is equivalent to the following expression:

X = ((C - D) * (A + B) - (E * F)) / ((H - I) * (F + G))    (2.1)

How can the corresponding assembly code for a processor with accumulator be written and executed? The underlying hardware can be built from simple blocks: a 4-bit adder circuit (Fig. 2.4), a 1-bit multiplexer (Fig. 2.5), and even binary multiplication, implemented with "shift left" and additions.

Figure 2.4: A 4-bit adder circuit.
Figure 2.5: A 1-bit mux circuit.

As a reminder of the basic logic gates used to implement such circuits, along with their truth tables, Figs. 2.6 to 2.8 give their schemes.

Figure 2.6: AND logic gate.
Figure 2.7: NOT logic gate.
Figure 2.8: NAND logic gate.

Consider again the five phases: (i) operation code fetch, (ii) operation code decode, (iii) operand fetch, (iv) effective instruction execution, and (v) result store. There is an intrinsic performance problem in the accumulator architecture: every operand fetch and every result store goes to memory. Listing 2.8 is the accumulator code version implementing the expression "X" from Eq. (2.1) and Listing 2.7.

Listing 2.8: Code for the accumulator architecture.
Ld F
Add G
Sto T1
Ld H
Sub I
Mult T1
Sto T1
Ld E
Mult F
Sto T2
Ld A
Add B
Sto X
Ld C
Sub D
Mult X
Sub T2
Div T1
Sto X

The code in Listing 2.8 has 19 instructions. It performs 19 instruction fetches and 19 operand fetches or result stores, i.e., 38 memory accesses in total.

The GPR (register-memory) code version, on the other hand, is given in Listing 2.9.

Listing 2.9: Code for the GPR (register-memory) architecture.
Ld R1, F
Add R1, G
Ld R2, H
Sub R2, I
Mult R1, R2
Ld R2, E
Mult R2, F
Ld R3, A
Add R3, B
Ld R4, C
Sub R4, D
Mult R3, R4
Sub R3, R2
Div R3, R1
Sto X, R3

The code in Listing 2.9 has 15 instructions instead of the 19 in the accumulator version. It performs 15 instruction fetches and 11 operand fetches or result stores, i.e., 26 memory accesses in total. That is 31.5% (12/38) fewer memory accesses than the accumulator version.

Instruction Set Architecture Approaches

Instruction Design: RISC vs. CISC

Instructions with very different execution times, or a very different number of phases, are not suitable for a production line, i.e., a pipeline. Given this, why not create simple instructions with the same number of phases and only small differences in per-phase execution time? Or, conversely, why not create powerful instructions that solve common problems, rather than simple instructions that each accomplish very little?

Here comes some background. In the late '70s, the idea of high-level language computer architecture (HLLCA) came up. In the early '80s, Ditzel and Patterson argued that simpler architectures would be the better approach, and presented the idea of the reduced instruction set computer (RISC). At the same time, some designers related to the VAX refuted that idea and continued to build computers based on the complex instruction set computer (CISC) approach. RISC and CISC development then proceeded in parallel, competing in the market. To help settle the RISC vs.
CISC debate, VAX designers compared the VAX 8700 and the MIPS M2000 in the early '90s. The VAX had powerful addressing modes, powerful instructions, efficient instruction encoding, and few registers. MIPS had simple instructions, simple addressing modes, a fixed-length instruction format, a large number of registers, and pipelining. The comparison assumed that the computers had similar organizations and equal clock time. On average, MIPS executes about twice as many instructions as the VAX, but the CPI for the VAX is almost six times the MIPS CPI, yielding almost a threefold performance advantage for MIPS, as Fig. 2.9 shows. The VAX was then discontinued and replaced by the Alpha, a 64-bit MIPS-like architecture.

Figure 2.9: Ratio of MIPS to VAX in instructions executed and performance in clock cycles, based on SPEC89 programs.

Overview Analysis

The following analyses show the number of addresses used when performing an operation in the four major ISA classes.

For the stack (this uses just the top of stack):

0 address | add tos
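The instruction-count vs. CPI trade-off in the VAX/MIPS comparison above can be checked with the basic performance equation, time = instructions × CPI × clock period. A minimal sketch, using the approximate ratios reported in the text (equal clock assumed):

```python
# Approximate ratios from the VAX 8700 vs. MIPS M2000 comparison:
instr_ratio = 2.0  # MIPS executes about 2x as many instructions as the VAX
cpi_ratio = 6.0    # the VAX CPI is about 6x the MIPS CPI

# With equal clock period t:
# time_VAX / time_MIPS = (1 * cpi_ratio * t) / (instr_ratio * 1 * t)
speedup = cpi_ratio / instr_ratio
print(speedup)  # 3.0: MIPS comes out about three times faster
```

The point is that instruction count alone is misleading: MIPS runs twice as many instructions, yet finishes about three times sooner because each instruction completes in far fewer cycles.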