Building a Datapath (RISC-V) PDF

Summary

This document discusses the design of a datapath for a RISC-V processor, focusing on the components and control signals required for various instruction types. It explains how arithmetic-logical and memory instructions can be executed.

Full Transcript

... will make it clear that we are concatenating buses to form a wider bus. Arrows are also added to help clarify the direction of the flow of data between elements. Finally, color indicates a control signal as opposed to a signal that carries data; this distinction will become clearer as we proceed through this chapter.

Check Yourself
True or false: Because the register file is both read and written on the same clock cycle, any RISC-V datapath using edge-triggered writes must have more than one copy of the register file.

Elaboration: There is also a 64-bit version of the RISC-V architecture, and, naturally enough, most paths in its implementation would be 64 bits wide.

4.3 Building a Datapath

A reasonable way to start a datapath design is to examine the major components required to execute each class of RISC-V instructions. Let’s start at the top by looking at which datapath elements each instruction needs, and then work our way down through the levels of abstraction. When we show the datapath elements, we will also show their control signals. We use abstraction in this explanation, starting from the bottom up.

datapath element: A unit used to operate on or hold data within a processor. In the RISC-V implementation, the datapath elements include the instruction and data memories, the register file, the ALU, and adders.

Figure 4.5a shows the first element we need: a memory unit to store the instructions of a program and supply instructions given an address. Figure 4.5b shows the program counter (PC), which as we saw in Chapter 2 is a register that holds the address of the current instruction. Lastly, we will need an adder to increment the PC to the address of the next instruction. This adder, which is combinational, can be built from the ALU described in detail in Appendix A simply by wiring the control lines so that the control always specifies an add operation. We will draw such an ALU with the label Add, as in Figure 4.5c, to indicate that it has been permanently made an adder and cannot perform the other ALU functions.

program counter (PC): The register containing the address of the instruction in the program being executed.

[Figure 4.5: a. Instruction memory (input: Instruction address; output: Instruction); b. Program counter (PC); c. Adder (Add, output: Sum)]
FIGURE 4.5 Two state elements are needed to store and access instructions, and an adder is needed to compute the next instruction address. The state elements are the instruction memory and the program counter. The instruction memory need only provide read access because the datapath does not write instructions. Since the instruction memory only reads, we treat it as combinational logic: the output at any time reflects the contents of the location specified by the address input, and no read control signal is needed. (We will need to write the instruction memory when we load the program; this is not hard to add, and we ignore it for simplicity.) The program counter is a 32-bit register that is written at the end of every clock cycle and thus does not need a write control signal. The adder is an ALU wired to always add its two 32-bit inputs and place the sum on its output.

To execute any instruction, we must start by fetching the instruction from memory. To prepare for executing the next instruction, we must also increment the program counter so that it points at the next instruction, 4 bytes later. Figure 4.6 shows how to combine the three elements from Figure 4.5 to form the portion of a datapath that fetches instructions and increments the PC to obtain the address of the next sequential instruction.

Now let’s consider the R-format instructions (see Figure 2.19 on page 127). They all read two registers, perform an ALU operation on the contents of the registers, and write the result to a register. We call these instructions either R-type instructions or arithmetic-logical instructions (since they perform arithmetic or logical operations). This instruction class includes add, sub, and, and or, which were introduced in Chapter 2. Recall that a typical instance of such an instruction is add x1, x2, x3, which reads x2 and x3 and writes the sum into x1.

The processor’s 32 general-purpose registers are stored in a structure called a register file. A register file is a collection of registers in which any register can be read or written by specifying the number of the register in the file. The register file contains the register state of the computer. In addition, we will need an ALU to operate on the values read from the registers.

register file: A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.

R-format instructions have three register operands, so we will need to read two data words from the register file and write one data word into the register file for each instruction. For each data word to be read from the registers, we need an input to the register file that specifies the register number to be read and an output from the register file that will carry the value that has been read from the registers. To write a data word, we will need two inputs: one to specify the register number to be written and one to supply the data to be written into the register. The register file always outputs the contents of whatever register numbers are on the Read register inputs. Writes, however, are controlled by the write control signal, which must be asserted for a write to occur at the clock edge. Figure 4.7a shows the result; we need a total of four inputs (three for register numbers and one for data) and two outputs (both for data).
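The register file behavior described above (two read ports that always follow their register-number inputs, and one write port that takes effect only at a clock edge when the write control signal is asserted) can be illustrated with a small software model. This is a minimal sketch, not a hardware description: the class name `RegisterFile` and the method names `read` and `clock_edge` are hypothetical, and the rule that x0 always reads as zero comes from the RISC-V base ISA rather than from this section.

```python
class RegisterFile:
    """Illustrative model: 32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32  # x0..x31, each holding a 32-bit value

    def read(self, read_reg1, read_reg2):
        # Reads are combinational: the outputs always reflect whatever
        # register numbers are on the two Read register inputs.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write, write_reg, write_data):
        # A write happens only at the clock edge, and only if RegWrite
        # is asserted. Writes to x0 are discarded (assumption taken
        # from the RISC-V base ISA, where x0 is hardwired to zero).
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(True, 2, 7)   # earlier cycles place values in x2 and x3
rf.clock_edge(True, 3, 5)

# One cycle of "add x2, x2, x3": the read sees the old x2, and the
# new value is written at the clock edge that ends the cycle.
a, b = rf.read(2, 3)
rf.clock_edge(True, 2, a + b)
print(a, b, rf.read(2, 3)[0])  # prints: 7 5 12
```

Separating `read` from `clock_edge` mirrors the point made in the text: reading and writing the same register in one cycle is safe because the read returns the value from an earlier cycle, and the new value only becomes visible afterward.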
The register number inputs are 5 bits wide to specify one of 32 registers (32 = 2^5), whereas the data input and two data output buses are each 32 bits wide. Figure 4.7b shows the ALU, which takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit signal if the result is 0. The 4-bit control signal of the ALU is described in detail in Appendix A; we will review the ALU control shortly when we need to know how to set it.

[Figure 4.6: PC, Instruction memory (Read address, Instruction), and an Add unit with constant input 4]
FIGURE 4.6 A portion of the datapath used for fetching instructions and incrementing the program counter. The fetched instruction is used by other parts of the datapath.

[Figure 4.7: a. Registers (inputs: Read register 1, Read register 2, Write register, Write data, RegWrite; outputs: Read data 1, Read data 2); b. ALU (input: 4-bit ALU operation; outputs: Zero, ALU result)]
FIGURE 4.7 The two elements needed to implement R-format ALU operations are the register file and the ALU. The register file contains all the registers and has two read ports and one write port. The design of multiported register files is discussed in Section A.8 of Appendix A. The register file always outputs the contents of the registers corresponding to the Read register inputs on the outputs; no other control inputs are needed. In contrast, a register write must be explicitly indicated by asserting the write control signal. Remember that writes are edge-triggered, so that all the write inputs (i.e., the value to be written, the register number, and the write control signal) must be valid at the clock edge. Since writes to the register file are edge-triggered, our design can legally read and write the same register within a clock cycle: the read will get the value written in an earlier clock cycle, while the value written will be available to a read in a subsequent clock cycle. The inputs carrying the register number to the register file are all 5 bits wide, whereas the lines carrying data values are 32 bits wide. The operation to be performed by the ALU is controlled with the ALU operation signal, which will be 4 bits wide, using the ALU designed in Appendix A. We will use the Zero detection output of the ALU shortly to implement conditional branches.

Next, consider the RISC-V load register and store register instructions, which have the general form lw x1, offset(x2) or sw x1, offset(x2). These instructions compute a memory address by adding the base register, which is x2, to the 12-bit signed offset field contained in the instruction. If the instruction is a store, the value to be stored must also be read from the register file where it resides in x1. If the instruction is a load, the value read from memory must be written into the register file in the specified register, which is x1. Thus, we will need both the register file and the ALU from Figure 4.7.

Furthermore, we will need a unit to sign-extend the 12-bit offset field in the instruction to a 32-bit signed value, and a data memory unit to read from or write to. The data memory must be written on store instructions; hence, data memory has read and write control signals, an address input, and an input for the data to be written into memory. Figure 4.8 shows these two elements.

sign-extend: To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.

[Figure 4.8: a. Data memory unit (inputs: Address, Write data, MemWrite, MemRead; output: Read data); b. Immediate generation unit (Imm Gen)]
FIGURE 4.8 The two units needed to implement loads and stores, in addition to the register file and ALU of Figure 4.7, are the data memory unit and the immediate generation unit. The memory unit is a state element with inputs for the address and the write data, and a single output for the read result. There are separate read and write controls, although only one of these may be asserted on any given clock. The memory unit needs a read signal, since, unlike the register file, reading the value of an invalid address can cause problems, as we will see in Chapter 5. The immediate generation unit (ImmGen) has a 32-bit instruction as input that selects a 12-bit field for load, store, and branch if equal that is sign-extended into a 32-bit result appearing on the output (see Chapter 2). We assume the data memory is edge-triggered for writes. Standard memory chips actually have a write enable signal that is used for writes. Although the write enable is not edge-triggered, our edge-triggered design could easily be adapted to work with real memory chips. See Section A.8 of Appendix A for further discussion of how real memory chips work.

The beq instruction has three operands, two registers that are compared for equality, and a 12-bit offset used to compute the branch target address relative to the branch instruction address. Its form is beq x1, x2, offset. To implement this instruction, we must compute the branch target address by adding the sign-extended offset field of the instruction to the PC. There are two details in the definition of branch instructions (see Chapter 2) to which we must pay attention:

- The instruction set architecture specifies that the base for the branch address calculation is the address of the branch instruction.
- The architecture also states that the offset field is shifted left 1 bit so that it is a half word offset; this shift increases the effective range of the offset field by a factor of 2.

branch target address: The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the RISC-V architecture, the branch target is given by the sum of the offset field of the instruction and the address of the branch.

To deal with the latter complication, we will need to shift the offset field by 1.

As well as computing the branch target address, we must also determine whether the next instruction is the instruction that follows sequentially or the instruction at the branch target address. When the condition is true (i.e., two operands are equal), the branch target address becomes the new PC, and we say that the branch is taken. If the operands are not equal, the incremented PC should replace the current PC (just as for any other normal instruction); in this case, we say that the branch is not taken.

branch taken: A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches.

branch not taken (or untaken branch): A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

Thus, the branch datapath must do two operations: compute the branch target address and test the register contents. (Branches also affect the instruction fetch portion of the datapath, as we will deal with shortly.) Figure 4.9 shows the structure of the datapath segment that handles branches. To compute the branch target address, the branch datapath includes an immediate generation unit, from Figure 4.8, and an adder. To perform the compare, we need to use the register file shown in Figure 4.7a to supply two register operands (although we will not need to write into the register file). In addition, the equality comparison can be done using the ALU we designed in Appendix A. Since that ALU provides an output signal that indicates whether the result was 0, we can send both register operands to the ALU with the control set to subtract two values.
If the Zero signal out of the ALU unit is asserted, we know that the register values are equal. Although the Zero output always signals if the result is 0, we will be using it only to implement the equality test of conditional branches. Later, we will show exactly how to connect the control signals of the ALU for use in the datapath.

The branch instruction operates by adding the PC with the 12 bits of the instruction shifted left by 1 bit. Simply concatenating 0 to the branch offset accomplishes this shift, as described in Chapter 2.

Creating a Single Datapath

Now that we have examined the datapath components needed for the individual instruction classes, we can combine them into a single datapath and add the control to complete the implementation. This simplest datapath will attempt to execute all instructions in one clock cycle. This means that no datapath resource can be used more than once per instruction, so any element needed more than once must be duplicated. We therefore need a memory for instructions separate from one for data. Although some of the functional units will need to be duplicated, many of the elements can be shared by different instruction flows.

[Figure 4.9: PC from instruction datapath and Imm Gen feeding an Add unit (Sum: Branch target); Registers (Read register 1, Read register 2, Write register, Write data, RegWrite, Read data 1, Read data 2) feeding the ALU (4-bit ALU operation); Zero output to branch control logic]
FIGURE 4.9 The portion of a datapath for a branch uses the ALU to evaluate the branch condition and a separate adder to compute the branch target as the sum of the PC and immediate (the branch displacement). Control logic is used to decide whether the incremented PC or branch target should replace the PC, based on the Zero output of the ALU.

To share a datapath element between two different instruction classes, we may need to allow multiple connections to the input of an element, using a multiplexor and control signal to select among the multiple inputs.
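As a rough illustration of the branch handling described in this section, the sketch below models immediate generation, the target adder, the Zero test, and the selection of the next PC. It is an assumption-laden software model, not the book's hardware: the function names `sign_extend`, `imm_gen`, and `next_pc` are hypothetical, the bit positions are the ones this transcript gives for load, store, and conditional branch immediates, and the two example instruction words were encoded by hand for this sketch.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide field to a signed Python integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_gen(instr):
    """Pick the 12-bit immediate field using opcode bits 6 and 5."""
    bit6 = (instr >> 6) & 1
    bit5 = (instr >> 5) & 1
    if bit6:    # conditional branch: bits 31, 7, 30:25, 11:8
        field = ((((instr >> 31) & 1) << 11) | (((instr >> 7) & 1) << 10)
                 | (((instr >> 25) & 0x3F) << 4) | ((instr >> 8) & 0xF))
    elif bit5:  # store: bits 31:25 and 11:7
        field = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
    else:       # load: bits 31:20
        field = (instr >> 20) & 0xFFF
    return sign_extend(field, 12)


def next_pc(pc, instr, reg_a, reg_b):
    """beq: the ALU subtracts, and its Zero output selects the next PC."""
    zero = ((reg_a - reg_b) & 0xFFFFFFFF) == 0
    # Shifting left by 1 is the same as concatenating a 0 to the offset.
    target = (pc + (imm_gen(instr) << 1)) & 0xFFFFFFFF
    return target if zero else (pc + 4) & 0xFFFFFFFF


BEQ_BACK_8 = 0xFE208CE3  # hand-encoded: beq x1, x2, -8
LW_OFF_12 = 0x00C12083   # hand-encoded: lw x1, 12(x2)

print(imm_gen(LW_OFF_12))                # prints: 12
print(next_pc(0x100, BEQ_BACK_8, 5, 5))  # taken: prints 248
print(next_pc(0x100, BEQ_BACK_8, 5, 6))  # not taken: prints 260
```

The final `target if zero else ...` expression plays the role of the multiplexor that chooses between the incremented PC and the branch target.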
EXAMPLE: Building a Datapath

The operations of arithmetic-logical (or R-type) instructions and the memory instructions datapath are quite similar. The key differences are the following:

- The arithmetic-logical instructions use the ALU, with the inputs coming from the two registers. The memory instructions can also use the ALU to do the address calculation, although the second input is the sign-extended 12-bit offset field from the instruction.
- The value stored into a destination register comes from the ALU (for an R-type instruction) or the memory (for a load).

Show how to build a datapath for the operational portion of the memory-reference and arithmetic-logical instructions that uses a single register file and a single ALU to handle both types of instructions, adding any necessary multiplexors.

ANSWER

To create a datapath with only a single register file and a single ALU, we must support two different sources for the second ALU input, as well as two different sources for the data stored into the register file. Thus, one multiplexor is placed at the ALU input and another at the data input to the register file. Figure 4.10 shows the operational portion of the combined datapath.

Now we can combine all the pieces to make a simple datapath for the core RISC-V architecture by adding the datapath for instruction fetch (Figure 4.6), the datapath from R-type and memory instructions (Figure 4.10), and the datapath for branches (Figure 4.9). Figure 4.11 shows the datapath we obtain by composing the separate pieces. The branch instruction uses the main ALU to compare two register operands for equality, so we must keep the adder from Figure 4.9 for computing the branch target address. An additional multiplexor is required to select either the sequentially following instruction address (PC + 4) or the branch target address to be written into the PC.
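The answer to the example, one ALU and one register file shared through two multiplexors, can be sketched in software as follows. This is only an illustration under stated assumptions: `mux`, `alu_add`, and `step` are hypothetical names, the ALU is fixed to addition to keep the focus on the multiplexors, memory is modeled as a dictionary keyed by address, and the multiplexor polarity (ALUSrc = 1 selects the immediate, MemtoReg = 1 selects the memory output) is how the labels of Figure 4.10 are read here.

```python
def mux(select, in0, in1):
    """2:1 multiplexor: select = 0 passes in0, select = 1 passes in1."""
    return in1 if select else in0


def alu_add(a, b):
    # Stand-in ALU fixed to "add" (an assumption for this sketch).
    return (a + b) & 0xFFFFFFFF


def step(regs, mem, rs1, rs2, rd, imm, ctrl):
    """One cycle of the shared R-type / load / store datapath."""
    read1, read2 = regs[rs1], regs[rs2]
    # First multiplexor: second ALU input is a register or the immediate.
    alu_result = alu_add(read1, mux(ctrl["ALUSrc"], read2, imm))
    read_data = mem.get(alu_result, 0) if ctrl["MemRead"] else 0
    if ctrl["MemWrite"]:
        mem[alu_result] = read2
    # Second multiplexor: register write data is the ALU result or memory.
    if ctrl["RegWrite"]:
        regs[rd] = mux(ctrl["MemtoReg"], alu_result, read_data)


R_TYPE = dict(ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0)
LOAD = dict(ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0)
STORE = dict(ALUSrc=1, MemtoReg=0, RegWrite=0, MemRead=0, MemWrite=1)

regs, mem = [0] * 32, {}
regs[2], regs[3] = 100, 7
step(regs, mem, 2, 3, 0, 8, STORE)   # sw x3, 8(x2)
step(regs, mem, 2, 0, 1, 8, LOAD)    # lw x1, 8(x2)
step(regs, mem, 1, 3, 4, 0, R_TYPE)  # add x4, x1, x3
print(mem[108], regs[1], regs[4])    # prints: 7 7 14
```

Only the control values differ between the three calls, which is the point of the example: the same datapath elements serve every instruction class, and the multiplexors decide which connections are active.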
[Figure 4.10: Registers (Read register 1, Read register 2, Write register, Write data, RegWrite, Read data 1, Read data 2), Imm Gen, a multiplexor controlled by ALUSrc at the second ALU input, ALU (4-bit ALU operation, Zero, ALU result), Data memory (Address, Write data, Read data, MemWrite, MemRead), and a multiplexor controlled by MemtoReg at the register file Write data input]
FIGURE 4.10 The datapath for the memory instructions and the R-type instructions. This example shows how a single datapath can be assembled from the pieces in Figures 4.7 and 4.8 by adding multiplexors. Two multiplexors are needed, as described in the example.

[Figure 4.11: the elements of Figures 4.6, 4.9, and 4.10 combined, with an additional multiplexor controlled by PCSrc selecting between PC + 4 and the branch target]
FIGURE 4.11 The simple datapath for the core RISC-V architecture combines the elements required by different instruction classes. The components come from Figures 4.6, 4.9, and 4.10. This datapath can execute the basic instructions (load-store register, ALU operations, and branches) in a single clock cycle. Just one additional multiplexor is needed to integrate branches.

Check Yourself

I. Which of the following is correct for a load instruction? Refer to Figure 4.10.
a. MemtoReg should be set to cause the data from memory to be sent to the register file.
b. MemtoReg should be set to cause the correct register destination to be sent to the register file.
c. We do not care about the setting of MemtoReg for loads.

II. The single-cycle datapath conceptually described in this section must have separate instruction and data memories, because
a. the formats of data and instructions are different in RISC-V, and hence different memories are needed;
b. having separate memories is less expensive;
c. the processor operates in one clock cycle and cannot use a (single-ported) memory for two different accesses within that clock cycle.
Now that we have completed this simple datapath, we can add the control unit. The control unit must be able to take inputs and generate a write signal for each state element, the selector control for each multiplexor, and the ALU control. The ALU control is different in a number of ways, and it will be useful to design it first before we design the rest of the control unit.

Elaboration: The immediate generation logic must choose between sign-extending a 12-bit field in instruction bits 31:20 for load instructions, bits 31:25 and 11:7 for store instructions, or bits 31, 7, 30:25, and 11:8 for the conditional branch. Since the input is all 32 bits of the instruction, it can use the opcode bits of the instruction to select the proper field. RISC-V opcode bit 6 happens to be 0 for data transfer instructions and 1 for conditional branches, and RISC-V opcode bit 5 happens to be 0 for load instructions and 1 for store instructions. Thus, bits 5 and 6 can control a 3:1 multiplexor inside the immediate generation logic that selects the appropriate 12-bit field for load, store, and conditional branch instructions.

4.4 A Simple Implementation Scheme

In this section, we look at what might be thought of as a simple implementation of our RISC-V subset. We build this simple implementation using the datapath of the last section and adding a simple control function. This simple implementation covers load word (lw), store word (sw), branch if equal (beq), and the arithmetic-logical instructions add, sub, and, and or.

The ALU Control

The RISC-V ALU in Appendix A defines the four following combinations of four control inputs:

ALU control lines    Function
0000                 AND
0001                 OR
0010                 add
0110                 subtract

Depending on the instruction class, the ALU will need to perform one of these four functions. For load and store instructions, we use the ALU to compute the memory address by addition.
For the R-type instructions, the ALU needs to perform one of the four actions (AND, OR, add, or subtract), depending on the value of the 7-bit funct7 field (bits 31:25) and 3-bit funct3 field (bits 14:12) in the instruction (see Chapter 2). For the conditional branch if equal instruction, the ALU subtracts two operands and tests to see if the result is 0.
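The mapping from instruction class to ALU function described above can be sketched as a small model. This is a hedged illustration rather than the control logic the book goes on to design: the names `alu`, `alu_control`, and the instruction-class strings are hypothetical, the four control codes are the ones listed in this section, and the funct7/funct3 values for add, sub, and, and or come from the RISC-V base ISA encoding rather than from this section.

```python
# Control-line encodings listed in this section.
AND, OR, ADD, SUB = 0b0000, 0b0001, 0b0010, 0b0110


def alu(control, a, b):
    """Return (32-bit result, Zero flag) for one of the four functions."""
    if control == AND:
        result = a & b
    elif control == OR:
        result = a | b
    elif control == ADD:
        result = a + b
    elif control == SUB:
        result = a - b
    else:
        raise ValueError("unsupported ALU control value")
    result &= 0xFFFFFFFF
    return result, result == 0


# (funct7, funct3) pairs for the four R-type instructions, taken from
# the RISC-V base ISA encoding (not stated in this section).
R_TYPE_FUNCT = {
    (0b0000000, 0b000): ADD,  # add
    (0b0100000, 0b000): SUB,  # sub
    (0b0000000, 0b111): AND,  # and
    (0b0000000, 0b110): OR,   # or
}


def alu_control(instr_class, funct7=0, funct3=0):
    """Choose the ALU control lines from the instruction class."""
    if instr_class in ("load", "store"):
        return ADD                      # address = base + offset
    if instr_class == "beq":
        return SUB                      # equal operands give Zero = 1
    return R_TYPE_FUNCT[(funct7, funct3)]


print(alu(alu_control("beq"), 9, 9))                       # prints: (0, True)
print(alu(alu_control("load"), 100, 8))                    # prints: (108, False)
print(alu(alu_control("rtype", 0b0100000, 0b000), 9, 4))   # prints: (5, False)
```

Only the beq case uses the Zero flag here, matching the text's note that the Zero output is used solely for the equality test of conditional branches.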
