(The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258-pages-5.pdf

2.5 Representing Instructions in the Computer 87 Elaboration: Two’s complement gets its name from the rule that the unsigned sum of an n-bit number and its n-bit negative is 2n; hence, the negation or complement of a number x is 2n − x, or its “two’s complement.” one’s complement A A third alternative representation to two’s complement and sign and magnitude is called notation that represents one’s complement. The negative of a one’s complement is found by inverting each bit, from the most negative value by 10 … 000two and the 0 to 1 and from 1 to 0, or x. This relation helps explain its name since the complement of most positive value by x is 2n − x − 1. It was also an attempt to be a better solution than sign and magnitude, and 01 … 11two, leaving an several early scientific computers did use the notation. This representation is similar to equal number of negatives two’s complement except that it also has two 0s: 00 … 00two is positive 0 and 11 … 11two and positives but ending is negative 0. The most negative number, 10 … 000two, represents −2,147,483,647ten, up with two zeros, one and so the positives and negatives are balanced. One’s complement adders did positive (00 … 00two) and need an extra step to subtract a number, and hence two’s complement dominates today. one negative (11 … 11two). A final notation, which we will look at when we discuss floating point in Chapter 3, The term is also used to is to represent the most negative value by 00 … 000two and the most positive value by mean the inversion of 11 … 11two, with 0 typically having the value 10 … 00two. This representation is called a every bit in a pattern: 0 to biased notation, since it biases the number such that the number plus the bias has a 1 and 1 to 0. non-negative representation. biased notation A notation that represents the most negative value by 00 … 000two and the 2.5 Representing Instructions in the Computer most positive value by 11 … 11two, with 0 typically having the We are now ready to explain the difference between the way humans instruct value 10 … 00two, thereby computers and the way computers see instructions. biasing the number such Instructions are kept in the computer as a series of high and low electronic signals and that the number plus the bias has a non-negative may be represented as numbers. In fact, each piece of an instruction can be considered representation. as an individual number, and placing these numbers side by side forms the instruction. The 32 registers of RISC-V are just referred to by their number, from 0 to 31. EXAMPLE Translating a RISC-V Assembly Instruction into a Machine Instruction Let’s do the next step in the refinement of the RISC-V language as an example. We’ll show the real RISC-V language version of the instruction represented symbolically as add x9, x20, x21 first as a combination of decimal numbers and then of binary numbers. ANSWER The decimal representation is 0 21 20 0 9 51 88 Chapter 2 Instructions: Language of the Computer Each of these segments of an instruction is called a field. The first, fourth, and sixth fields (containing 0, 0, and 51 in this case) collectively tell the RISC-V computer that this instruction performs addition. The second field gives the number of the register that is the second source operand of the addition operation (21 for x21), and the third field gives the other source operand for the addition (20 for x20). The fifth field contains the number of the register that is to receive the sum (9 for x9). Thus, this instruction adds register x20 to register x21 and places the sum in register x9. This instruction can also be represented as fields of binary numbers instead of decimal: 0000000 10101 10100 000 01001 0110011 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits instruction format A This layout of the instruction is called the instruction format. As you can see form of representation of from counting the number of bits, this RISC-V instruction takes exactly 32 bits—a an instruction composed word. In keeping with our design principle that simplicity favors regularity, RISC-V of fields of binary numbers. instructions are all 32 bits long. To distinguish it from assembly language, we call the numeric version of machine instructions machine language and a sequence of such instructions machine code. language Binary It would appear that you would now be reading and writing long, tiresome representation used for strings of binary numbers. We avoid that tedium by using a higher base than communication within a binary that converts easily into binary. Since almost all computer data sizes are computer system. multiples of 4, hexadecimal (base 16) numbers are popular. As base 16 is a power hexadecimal Numbers of 2, we can trivially convert by replacing each group of four binary digits by a in base 16. single hexadecimal digit, and vice versa. Figure 2.4 converts between hexadecimal and binary. FIGURE 2.4 The hexadecimal–binary conversion table. Just replace one hexadecimal digit by the corresponding four binary digits, and vice versa. If the length of the binary number is not a multiple of 4, go from right to left. Because we frequently deal with different number bases, to avoid confusion, we will subscript decimal numbers with ten, binary numbers with two, and hexadecimal numbers with hex. (If there is no subscript, the default is base 10.) By the way, C and Java use the notation 0xnnnn for hexadecimal numbers. 2.5 Representing Instructions in the Computer 89 Binary to Hexadecimal and Back EXAMPLE Convert the following 8-digit hexadecimal and 32-bit binary numbers into the other base: eca8 6420hex 0001 0011 0101 0111 1001 1011 1101 1111two Using Figure 2.4, the answer is just a table lookup one way: ANSWER eca8 6420hex 1110 1100 1010 1000 0110 0100 0010 0000two And then the other direction: 0001 0011 0101 0111 1001 1011 1101 1111two 1357 9bdfhex RISC-V Fields RISC-V fields are given names to make them easier to discuss: funct7 rs2 rs1 funct3 rd opcode 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits Here is the meaning of each name of the fields in RISC-V instructions: n opcode: Basic operation of the instruction, and this abbreviation is its opcode The field that traditional name. denotes the operation and n rd: The register destination operand. It gets the result of the operation. format of an instruction. n funct3: An additional opcode field. n rs1: The first register source operand. n rs2: The second register source operand. n funct7: An additional opcode field. 90 Chapter 2 Instructions: Language of the Computer A problem occurs when an instruction needs longer fields than those shown above. For example, the load register instruction must specify two registers and a constant. If the address were to use one of the 5-bit fields in the format above, the largest constant within the load register instruction would be limited to only 25−1 or 31. This constant is used to select elements from arrays or data structures, and it often needs to be much larger than 31. This 5-bit field is too small to be useful. Hence, we have a conflict between the desire to keep all instructions the same length and the desire to have a single instruction format. This conflict leads us to the final hardware design principle: Design Principle 3: Good design demands good compromises. The compromise chosen by the RISC-V designers is to keep all instructions the same length, thereby requiring distinct instruction formats for different kinds of instructions. For example, the format above is called R-type (for register). A second type of instruction format is I-type and is used by arithmetic operands with one constant operand, including addi, and by load instructions. The fields of the I-type format are immediate rs1 funct3 rd opcode 12 bits 5 bits 3 bits 5 bits 7 bits The 12-bit immediate is interpreted as a two’s complement value, so it can represent integers from −211 to 211−1. When the I-type format is used for load instructions, the immediate represents a byte offset, so the load word instruction can refer to any word within a region of ±211 or 2048 bytes (±28 or 512 words) of the base address in the base register rd. We see that more than 32 registers would be difficult in this format, as the rd and rs1 fields would each need another bit, making it harder to fit everything in one word. Let’s look at the load register instruction from page 77: lw x9, 32(x22) // Temporary reg x9 gets A Here, 22 (for x22) is placed in the rs1 field, 32 is placed in the immediate field, and 9 (for x9) is placed in the rd field. We also need a format for the store word instruction, sw, which needs two source registers (for the base address and the store data) and an immediate for the address offset. The fields of the S-type format are immediate[11:5] rs2 rs1 funct3 immediate[4:0] opcode 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits The 12-bit immediate in the S-type format is split into two fields, which supply the lower 5 bits and upper 7 bits. The RISC-V architects chose this design because it keeps the rs1 and rs2 fields in the same place in all instruction formats. (Figure 4.14.5 shows how this split simplifies the hardware.) Keeping the instruction formats as 2.5 Representing Instructions in the Computer 91 similar as possible reduces hardware complexity. Similarly, the opcode and funct3 fields are the same size in all locations, and they are always in the same place. In case you were wondering, the formats are distinguished by the values in the opcode field: each format is assigned a distinct set of opcode values in the first field (opcode) so that the hardware knows how to treat the rest of the instruction. Figure 2.5 shows the numbers used in each field for the RISC-V instructions covered so far. FIGURE 2.5 RISC-V instruction encoding. In the table above, “reg” means a register number between 0 and 31 and “address” means a 12-bit address or constant. The funct3 and funct7 fields act as additional opcode fields. Translating RISC-V Assembly Language into Machine Language EXAMPLE We can now take an example all the way from what the programmer writes to what the computer executes. If x10 has the base of the array A and x21 corresponds to h, the assignment statement A = h + A + 1; is compiled into lw x9, 120(x10) // Temporary reg x9 gets A add x9, x21, x9 // Temporary reg x9 gets h+A addi x9, x9, 1 // Temporary reg x9 gets h+A+1 sw x9, 120(x10) // Stores h+A+1 back into A ANSWER What is the RISC-V machine language code for these three instructions? 92 Chapter 2 Instructions: Language of the Computer For convenience, let’s first represent the machine language instructions using decimal numbers. From Figure 2.5, we can determine the three machine language instructions: immediate rs1 funct3 rd opcode 120 10 2 9 3 funct7 rs2 rs1 funct3 rd opcode 0 9 21 0 9 51 immediate rs1 funct3 rd opcode 1 9 0 9 19 immediate[11:5] rs2 rs1 funct3 immediate[4:0] opcode 3 9 10 2 24 35 The lw instruction is identified by 3 (see Figure 2.5) in the opcode field and 2 in the funct3 field. The base register 10 is specified in the rs1 field, and the destination register 9 is specified in the rd field. The offset to select A (120 = 30 × 4) is found in the immediate field. The add instruction that follows is specified with 51 in the opcode field, 0 in the funct3 field, and 0 in the funct7 field. The three register operands (9, 21, and 9) are found in the rd, rs1, and rs2 fields. The subsequent addi instruction is specified with 19 in the opcode field and 0 in the funct3 field. The register operands (9 and 9) are found in the rd and rs1 fields, and the constant addend 1 is found in the immediate field. The sw instruction is identified with 35 in the opcode field and 2 in the funct3 field. The register operands (9 and 10) are found in the rs2 and rs1 fields, respectively. The address offset 120 is split across the two immediate fields. Since the upper part of the immediate holds bits 5 and above, we can decompose the offset 120 by dividing by 25. The upper part of the immediate holds the quotient, 3, and the lower part holds the remainder, 24. Since 120ten = 0000011 11000two, the binary equivalent to the decimal form is: immediate rs1 funct3 rd opcode 000011110000 01010 010 01001 0000011 funct7 rs2 rs1 funct3 rd opcode 0000000 01001 10101 000 01001 0110011 immediate rs1 funct3 rd opcode 000000000001 01001 000 01001 0010011 immediate[11:5] rs2 rs1 funct3 immediate[4:0] opcode 0000011 01001 01010 010 11000 0100011 2.5 Representing Instructions in the Computer 93 Elaboration: RISC-V assembly language programmers aren’t forced to use addi when working with constants. The programmer simply writes add, and the assembler generates the proper opcode and the proper instruction format depending on whether the operands are all registers (R-type) or if one is a constant (I-type); see Section 2.12. We use the explicit names in RISC-V for the different opcodes and formats as we think it is less confusing when introducing assembly language versus machine language. Elaboration: Although RISC-V has both add and sub instructions, it does not have a subi counterpart to addi. This is because the immediate field represents a two’s complement integer, so addi can be used to subtract constants. The desire to keep all instructions the same size conflicts with the desire to have Hardware/ as many registers as possible. Any increase in the number of registers uses up at Software least one more bit in every register field of the instruction format. Given these constraints and the design principle that smaller is faster, most instruction sets Interface today have 16 or 32 general-purpose registers. Figure 2.6 summarizes the portions of RISC-V machine language described in this section. As we shall see in Chapter 4, the similarity of the binary representations of related instructions simplifies hardware design. These similarities are another example of regularity in the RISC-V architecture. FIGURE 2.6 RISC-V architecture revealed through Section 2.5. The three RISC-V instruction formats so far are R, I, and S. The R-type format has two source register operand and one destination register operand. The I-type format replaces one source register operand with a 12-bit immediate field. The S-type format has two source operands and a 12-bit immediate field, but no destination register operand. The S-type immediate field is split into two parts, with bits 11–5 in the leftmost field and bits 4–0 in the second-rightmost field. 94 Chapter 2 Instructions: Language of the Computer The BIG Today’s computers are built on two key principles: 1. Instructions are represented as numbers. Picture 2. Programs are stored in memory to be read or written, just like data. These principles lead to the stored-program concept; its invention let the computing genie out of its bottle. Figure 2.7 shows the power of the concept; specifically, memory can contain the source code for an editor program, the corresponding compiled machine code, the text that the compiled program is using, and even the compiler that generated the machine code. One consequence of instructions as numbers is that programs are often shipped as files of binary numbers. The commercial implication is that computers can inherit ready-made software provided they are compatible with an existing instruction set. Such “binary compatibility” often leads industry to align around a small number of instruction set architectures. Memory Accounting program (machine code) Editor program (machine code) C compiler Processor (machine code) Payroll data Book text Source code in C for editor program FIGURE 2.7 The stored-program concept. Stored programs allow a computer that performs accounting to become, in the blink of an eye, a computer that helps an author write a book. The switch happens simply by loading memory with programs and data and then telling the computer to begin executing at a given location in memory. Treating instructions in the same way as data greatly simplifies both the memory hardware and the software of computer systems. Specifically, the memory technology needed for data can also be used for programs, and programs like compilers, for instance, can translate code written in a notation far more convenient for humans into code that the computer can understand. 2.6 Logical Operations 95 What RISC-V instruction does this represent? Choose from one of the four options below. Check Yourself funct7 rs2 rs1 funct3 rd opcode 32 9 10 000 11 51 1. sub x9, x10, x11 2. add x11, x9, x10 3. sub x11, x10, x9 4. sub x11, x9, x10 If a person is age 40ten, what is their age in hexidecimal? 2.6 Logical Operations “Contrariwise,” Although the first computers operated on full words, it soon became clear that continued Tweedledee, it was useful to operate on fields of bits within a word or even on individual bits. “if it was so, it might Examining characters within a word, each of which is stored as 8 bits, is one example be; and if it were so, it of such an operation (see Section 2.9). It follows that operations were added to would be; but as programming languages and instruction set architectures to simplify, among other it isn’t, it ain’t. things, the packing and unpacking of bits into words. These instructions are called That’s logic.” logical operations. Figure 2.8 shows logical operations in C, Java, and RISC-V. Lewis Carroll, Alice’s Adventures in Wonderland, 1865 FIGURE 2.8 C and Java logical operators and their corresponding RISC-V instructions. One way to implement NOT is to use XOR with one operand being all ones (FFFF FFFF FFFF FFFFhex).

(The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258-pages-5.pdf

Document Details

Tags

Related

Full Transcript

Upgrade to continue