(The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258.pdf

2 Instructions: Language of the Computer 2.1 Introduction 68 I speak Spanish 2.2 Operations of the Computer Hardware 69 to God, Italian to 2.3 Operands of the Computer Hardware 73 women, French to 2.4 Signed and Unsigned Numbers 80 men, and German 2.5 Representing Instructions in the Computer 87 2.6 Logical Operations 95 to my horse. 2.7 Instructions for Making Decisions 98 Charles V, Holy Roman Emperor 2.8 Supporting Procedures in Computer (1500–1558) Hardware 104 2.9 Communicating with People 114 Computer Organization and Design RISC-V Edition. DOI: http://dx.doi.org/10.1016/B978-0-12-820331-6.00002-8 © 2016 2021 Elsevier Inc. All rights reserved. 2.10 RISC-V Addressing for Wide Immediates and Addresses 120 2.11 Parallelism and Instructions: Synchronization 128 2.12 Translating and Starting a Program 131 2.13 A C Sort Example to Put it All Together 140 2.14 Arrays versus Pointers 148 2.15 Advanced Material: Compiling C and Interpreting Java 151 2.16 Real Stuff: MIPS Instructions 152 2.17 Real Stuff: ARMv7 (32-bit) Instructions 153 2.18 Real Stuff: ARMv8 (64-bit) Instructions 157 2.19 Real Stuff: x86 Instructions 158 2.20 Real Stuff: The Rest of the RISC-V Instruction Set 167 2.21 Going Faster: Matrix Multiply in C 168 2.22 Fallacies and Pitfalls 170 2.23 Concluding Remarks 172 2.24 Historical Perspective and Further Reading 174 2.25 Self-Study 175 2.26 Exercises 178 The Five Classic Components of a Computer 68 Chapter 2 Instructions: Language of the Computer 2.1 Introduction To command a computer’s hardware, you must speak its language. The words of a computer’s language are called instructions, and its vocabulary is called an instruction set The instruction set. In this chapter, you will see the instruction set of a real computer, vocabulary of commands both in the form written by people and in the form read by the computer. We understood by a given introduce instructions in a top-down fashion. Starting from a notation that looks like architecture. a restricted programming language, we refine it step-by-step until you see the actual language of a real computer. Chapter 3 continues our downward descent, unveiling the hardware for arithmetic and the representation of floating-point numbers. You might think that the languages of computers would be as diverse as those of people, but in reality, computer languages are quite similar, more like regional dialects than independent languages. Hence, once you learn one, it is easy to pick up others. The chosen instruction set is RISC-V, which was originally developed at UC Berkeley starting in 2010. To demonstrate how easy it is to pick up other instruction sets, we will also take a quick look at two other popular instruction sets. 1. MIPS is an elegant example of the instruction sets designed since the 1980s. In several respects, RISC-V follows a similar design. 2. The Intel x86 originated in the 1970s, but still today powers both the PC and the Cloud of the post-PC era. This similarity of instruction sets occurs because all computers are constructed from hardware technologies based on similar underlying principles and because there are a few basic operations that all computers must provide. Moreover, computer designers have a common goal: to find a language that makes it easy to build the hardware and the compiler while maximizing performance and minimizing cost and energy. This goal is time-honored; the following quote was written before you could buy a computer, and it is as true today as it was in 1946: It is easy to see by formal-logical methods that there exist certain [instruction sets] that are in abstract adequate to control and cause the execution of any sequence of operations.… The really decisive considerations from the present point of view, in selecting an [instruction set], are more of a practical nature: simplicity of the equipment demanded by the [instruction set], and the clarity of its application to the actually important problems together with the speed of its handling of those problems. Burks, Goldstine, and von Neumann, 1946 The “simplicity of the equipment” is as valuable a consideration for today’s computers as it was for those of the 1940s. The goal of this chapter is to teach an instruction set that follows this advice, showing both how it is represented in hardware and the relationship between high-level programming languages and this 2.2 Operations of the Computer Hardware 69 more primitive one. Our examples are in the C programming language; Section 2.15 shows how these would change for an object-oriented language like Java. By learning how to represent instructions, you will also discover the secret of computing: the stored-program concept. Moreover, you will exercise your “foreign stored-program language” skills by writing programs in the language of the computer and running them concept The idea that on the simulator that comes with this book. You will also see the impact of programming instructions and data of many types can be stored languages and compiler optimization on performance. We conclude with a look at the in memory as numbers historical evolution of instruction sets and an overview of other computer dialects. and thus be easy to We reveal our first instruction set a piece at a time, giving the rationale along with change, leading to the the computer structures. This top-down, step-by-step tutorial weaves the components stored-program computer. with their explanations, making the computer’s language more palatable. Figure 2.1 gives a sneak preview of the instruction set covered in this chapter. Elaboration: RISC-V is an open architecture that is controlled by RISC-V International, not a proprietary architecture that is owned by a company like ARM, MIPS, or x86. In 2020, more than 200 companies are members of RISC-V International, and its popularity is growing rapidly. There must certainly be instructions 2.2 Operations of the Computer Hardware for performing the fundamental Every computer must be able to perform arithmetic. The RISC-V assembly arithmetic operations. language notation Burks, Goldstine, and von Neumann, 1946 add a, b, c instructs a computer to add the two variables b and c and to put their sum in a. This notation is rigid in that each RISC-V arithmetic instruction performs only one operation and must always have exactly three variables. For example, suppose we want to place the sum of four variables b, c, d, and e into variable a. (In this section, we are being deliberately vague about what a “variable” is; in the next section, we’ll explain in detail.) The following sequence of instructions adds the four variables: add a, b, c // The sum of b and c is placed in a add a, a, d // The sum of b, c, and d is now in a add a, a, e // The sum of b, c, d, and e is now in a Thus, it takes three instructions to sum the four variables. The words to the right of the double slashes (//) on each line above are comments for the human reader, so the computer ignores them. Note that unlike other programming languages, each line of this language can contain at most one instruction. Another difference from C is that comments always terminate at the end of a line. 70 Chapter 2 Instructions: Language of the Computer FIGURE 2.1 RISC-V assembly language revealed in this chapter. This information is also found in Column 1 of the RISC-V Reference Data Card at the front of this book. 2.2 Operations of the Computer Hardware 71 FIGURE 2.1 (Continued). The natural number of operands for an operation like addition is three: the two numbers being added together and a place to put the sum. Requiring every instruction to have exactly three operands, no more and no less, conforms to the philosophy of keeping the hardware simple: hardware for a variable number of operands is more complicated than hardware for a fixed number. This situation illustrates the first of three underlying principles of hardware design: Design Principle 1: Simplicity favors regularity. We can now show, in the two examples that follow, the relationship of programs written in higher-level programming languages to programs in this more primitive notation. Compiling Two C Assignment Statements into RISC-V EXAMPLE This segment of a C program contains the five variables a, b, c, d, and e. Since Java evolved from C, this example and the next few work for either high-level programming language: a = b + c; d = a − e; The compiler translates from C to RISC-V assembly language instructions. Show the RISC-V code produced by a compiler. ANSWER A RISC-V instruction operates on two source operands and places the result in one destination operand. Hence, the two simple statements above compile directly into these two RISC-V assembly language instructions: add a, b, c sub d, a, e 72 Chapter 2 Instructions: Language of the Computer Compiling a Complex C Assignment into RISC-V EXAMPLE A somewhat complicated statement contains the five variables f, g, h, i, and j: f = (g + h) − (i + j); What might a C compiler produce? The compiler must break this statement into several assembly instructions, ANSWER since only one operation is performed per RISC-V instruction. The first RISC-V instruction calculates the sum of g and h. We must place the result somewhere, so the compiler creates a temporary variable, called t0: add t0, g, h // temporary variable t0 contains g + h Although the next operation is subtract, we need to calculate the sum of i and j before we can subtract. Thus, the second instruction places the sum of i and j in another temporary variable created by the compiler, called t1: add t1, i, j // temporary variable t1 contains i + j Finally, the subtract instruction subtracts the second sum from the first and places the difference in the variable f, completing the compiled code: sub f, t0, t1 // f gets t0 − t1, which is (g + h) − (i + j) Check For a given function, which programming language likely takes the most lines of Yourself code? Put the three representations below in order. 1. Java 2. C 3. RISC-V assembly language Elaboration: To increase portability, Java was originally envisioned as relying on a software interpreter. The instruction set of this interpreter is called Java bytecodes (see Section 2.15), which is quite different from the RISC-V instruction set. To get performance close to the equivalent C program, Java systems today typically compile Java bytecodes into the native instruction sets like RISC-V. Because this compilation is normally done much later than for C programs, such Java compilers are often called Just In Time (JIT) compilers. Section 2.12 shows how JITs are used later than C compilers in the start-up process, and Section 2.13 shows the performance consequences of compiling versus interpreting Java programs. 2.3 Operands of the Computer Hardware 73 2.3 Operands of the Computer Hardware Unlike programs in high-level languages, the operands of arithmetic instructions are restricted; they must be from a limited number of special locations built directly in hardware called registers. Registers are primitives used in hardware design that are also visible to the programmer when the computer is completed, so you can think of registers as the bricks of computer construction. The size of a register in the RISC-V architecture is 32 bits; groups of 32 bits occur so frequently that they are given the name word in the RISC-V architecture. (Another popular size is a word A natural unit group of 64 bits, called a doubleword in the RISC-V architecture.) of access in a computer, One major difference between the variables of a programming language and registers usually a group of 32 bits; corresponds to the size of is the limited number of registers, typically 32 on current computers, like RISC-V. (See a register in the RISC-V Section 2.25 for the history of the number of registers.) Thus, continuing in our architecture. top-down, stepwise evolution of the symbolic representation of the RISC-V language, in this section we have added the restriction that the three operands of RISC-V doubleword Another arithmetic instructions must each be chosen from one of the 32-bit registers. natural unit of access in a The reason for the limit of 32 registers may be found in the second of our three computer, usually a group of 64 bits. underlying design principles of hardware technology: Design Principle 2: Smaller is faster. A very large number of registers may increase the clock cycle time simply because it takes electronic signals longer when they must travel farther. Guidelines such as “smaller is faster” are not absolutes; 31 registers may not be faster than 32. Even so, the truth behind such observations causes computer designers to take them seriously. In this case, the designer must balance the craving of programs for more registers with the designer’s desire to keep the clock cycle fast. Another reason for not using more than 32 is the number of bits it would take in the instruction format, as Section 2.5 demonstrates. Chapter 4 shows the central role that registers play in hardware construction; as we shall see in that chapter, effective use of registers is critical to program performance. Although we could simply write instructions using numbers for registers, from 0 to 31, the RISC-V convention is x followed by the number of the register, except for a few register names that we will cover later. Compiling a C Assignment Using Registers EXAMPLE It is the compiler’s job to associate program variables with registers. Take, for instance, the assignment statement from our earlier example: f = (g + h) − (i + j); 74 Chapter 2 Instructions: Language of the Computer The variables f, g, h, i, and j are assigned to the registers x19, x20, x21, x22, and x23, respectively. What is the compiled RISC-V code? The compiled program is very similar to the prior example, except we replace ANSWER the variables with the register names mentioned above plus two temporary registers, x5 and x6, which correspond to the temporary variables above: add x5, x20, x21 // register x5 contains g + h add x6, x22, x23 // register x6 contains i + j sub x19, x5, x6 f gets x5 – x6, which is (g + h) − (i + j) // Memory Operands Programming languages have simple variables that contain single data elements, as in these examples, but they also have more complex data structures—arrays and structures. These composite data structures can contain many more data elements than there are registers in a computer. How can a computer represent and access such large structures? Recall the five components of a computer introduced in Chapter 1 and repeated on page 67. The processor can keep only a small amount of data in registers, but computer memory contains billions of data elements. Hence, data structures data transfer instruction A command (arrays and structures) are kept in memory. that moves data between As explained above, arithmetic operations occur only on registers in RISC-V memory and registers. instructions; thus, RISC-V must include instructions that transfer data between memory and registers. Such instructions are called data transfer instructions. address A value used to To access a word in memory, the instruction must supply the memory address. delineate the location of Memory is just a large, single-dimensional array, with the address acting as the a specific data element index to that array, starting at 0. For example, in Figure 2.2, the address of the third within a memory array. data element is 2, and the value of memory is 10. 3 100 2 10 1 101 0 1 Address Data Processor Memory FIGURE 2.2 Memory addresses and contents of memory at those locations. If these elements were words, these addresses would be incorrect, since RISC-V actually uses byte addressing, with each word representing 4 bytes. Figure 2.3 shows the correct memory addressing for sequential word addresses. 2.3 Operands of the Computer Hardware 75 The data transfer instruction that copies data from memory to a register is traditionally called load. The format of the load instruction is the name of the operation followed by the register to be loaded, then register and a constant used to access memory. The sum of the constant portion of the instruction and the contents of the second register forms the memory address. The real RISC-V name for this instruction is lw, standing for load word. Compiling an Assignment When an Operand Is in Memory Let’s assume that A is an array of 100 words and that the compiler has associated the variables g and h with the registers x20 and x21 as before. Let’s also assume EXAMPLE that the starting address, or base address, of the array is in x22. Compile this C assignment statement: g = h + A; Although there is a single operation in this assignment statement, one of the ANSWER operands is in memory, so we must first transfer A to a register. The address of this array element is the sum of the base of the array A, found in register x22, plus the number to select element 8. The data should be placed in a temporary register for use in the next instruction. Based on Figure 2.2, the first compiled instruction is lw x9, 8(x22) // Temporary reg x9 gets A (We’ll be making a slight adjustment to this instruction, but we’ll use this simplified version for now.) The following instruction can operate on the value in x9 (which equals A) since it is in a register. The instruction must add h (contained in x21) to A (contained in x9) and put the sum in the register corresponding to g (associated with x20): add x20, x21, x9 // g = h + A The register added to form the address (x22) is called the base register, and the constant in a data transfer instruction (8) is called the offset. In addition to associating variables with registers, the compiler allocates data Hardware/ structures like arrays and structures to locations in memory. The compiler can then Software place the proper starting address into the data transfer instructions. Since 8-bit bytes are useful in many programs, virtually all architectures today Interface address individual bytes. Therefore, the address of a word matches the address of one of the 4 bytes within the word, and addresses of sequential words differ by 76 Chapter 2 Instructions: Language of the Computer 4. For example, Figure 2.3 shows the actual RISC-V addresses for the words in Figure 2.2; the byte address of the third word is 8. Computers divide into those that use the address of the leftmost or “big end” byte as the word address versus those that use the rightmost or “little end” byte. RISC-V belongs to the latter camp, referred to as little-endian. Since the order matters only if you access the identical data both as a word and as four individual bytes, few need to be aware of the “endianness.” Byte addressing also affects the array index. To get the proper byte address in the code above, the offset to be added to the base register x22 must be 8 × 4, or 32, so that the load address will select A and not A[8/4]. (See the related Pitfall on page 172 of Section 2.22.) The instruction complementary to load is traditionally called store; it copies data from a register to memory. The format of a store is similar to that of a load: the name of the operation, followed by the register to be stored, then the base register, and finally the offset to select the array element. Once again, the RISC-V address is specified in part by a constant and in part by the contents of a register. The actual RISC-V name is sw, standing for store word. 12 100 8 10 4 101 0 1 Byte Address Data Processor Memory FIGURE 2.3 Actual RISC-V memory addresses and contents of memory for those words. The changed addresses are highlighted to contrast with Figure 2.2. Since RISC-V addresses each byte, word addresses are multiples of 4: there are 4 bytes in a doubleword. alignment restriction Elaboration: In many architectures, words must start at addresses that are multiples of A requirement that data 4. This requirement is called an alignment restriction. (Chapter 4 suggests why alignment be aligned in memory on leads to faster data transfers.) RISC-V and Intel x86 do not have alignment restrictions, but natural boundaries. MIPS does. Hardware/ As the addresses in loads and stores are binary numbers, we can see why the DRAM for main memory comes in binary sizes rather than in decimal sizes. That is, in gibibytes Software (230) or tebibytes (240), not in gigabytes (109) or terabytes (1012); see Figure 1.1. Interface 2.3 Operands of the Computer Hardware 77 Compiling Using Load and Store Assume variable h is associated with register x21 and the base address of the array A is in x22. What is the RISC-V assembly code for the C assignment EXAMPLE statement below? A = h + A; Although there is a single operation in the C statement, now two of the operands are in memory, so we need even more RISC-V instructions. The first ANSWER two instructions are the same as in the prior example, except this time we use the proper offset of 32 for byte addressing in the load register instruction to select A, and the add instruction places the sum in x9: lw x9, 32(x22) // Temporary reg x9 gets A add x9, x21, x9 // Temporary reg x9 gets h + A The final instruction stores the sum into A, using 48 (4 × 12) as the offset and register x22 as the base register. sw x9, 48(x22) // Stores h + A back into A Load word and store word are the instructions that copy words between memory and registers in the RISC-V architecture. Some brands of computers use other instructions along with load and store to transfer data. An architecture with such alternatives is the Intel x86, described in Section 2.17. Many programs have more variables than computers have registers. Consequently, Hardware/ the compiler tries to keep the most frequently used variables in registers and places Software the rest in memory, using loads and stores to move variables between registers and memory. The process of putting less frequently used variables (or those needed Interface later) into memory is called spilling registers. The hardware principle relating size and speed suggests that memory must be slower than registers, since there are fewer registers. This suggestion is indeed the case; data accesses are faster if data are in registers instead of memory. Moreover, data are more useful when in a register. A RISC-V arithmetic instruction can read two registers, operate on them, and write the result. A RISC-V data transfer instruction only reads one operand or writes one operand, without operating on it. 78 Chapter 2 Instructions: Language of the Computer Thus, registers take less time to access and have higher throughput than memory, making data in registers both considerably faster to access and simpler to use. Accessing registers also uses much less energy than accessing memory. To achieve the highest performance and conserve energy, an instruction set architecture must have enough registers, and compilers must use registers efficiently. Elaboration: Let’s put the energy and performance of registers versus memory into perspective. Assuming 32-bit data, registers are roughly 200 times faster (0.25 vs. 50 nanoseconds) and are 10,000 times more energy efficient (0.1 vs. 1000 picoJoules) than DRAM in 2020. These large differences led to caches, which reduce the performance and energy penalties of going to memory (see Chapter 5). Constant or Immediate Operands Many times a program will use a constant in an operation—for example, incrementing an index to point to the next element of an array. In fact, more than half of the RISC-V arithmetic instructions have a constant as an operand when running the SPEC CPU2006 benchmarks. Using only the instructions we have seen so far, we would have to load a constant from memory to use one. (The constants would have been placed in memory when the program was loaded.) For example, to add the constant 4 to register x22, we could use the code lw x9, AddrConstant4(x3) // x9 = constant 4 add x22, x22, x9 // x22 = x22 + x9 (where x9 == 4) assuming that x3 + AddrConstant4 is the memory address of the constant 4. An alternative that avoids the load instruction is to offer versions of the arithmetic instructions in which one operand is a constant. This quick add instruction with one constant operand is called add immediate or addi. To add 4 to register x22, we just write addi x22, x22, 4 // x22 = x22 + 4 Constant operands occur frequently; indeed, addi is the most popular instruction in most RISC-V programs. By including constants inside arithmetic instructions, operations are much faster and use less energy than if constants were loaded from memory. The constant zero has another role, which is to simplify the instruction set by offering useful variations. For example, you can negate the value in a register by using the sub instruction with zero for the first operand. Hence, RISC-V dedicates register x0 to be hard-wired to the value zero. Using frequency to justify the inclusions of constants is another example of the great idea from Chapter 1 of making the common case fast. 2.3 Operands of the Computer Hardware 79 Given the importance of registers, what is the rate of increase in the number of Check registers in a chip over time? Yourself 1. Very fast: They increased as fast as Moore’s Law, which predicted doubling the number of transistors on a chip every 24 months. 2. Very slow: Since programs are usually distributed in the language of the computer, there is inertia in instruction set architecture, and so the number of registers increases only as fast as new instruction sets become viable. Elaboration: Although the RISC-V registers in this book are 32 bits wide, the RISC-V architects conceived multiple variants of the ISA. In addition to this variant, known as RV32, a variant named RV64 has 64-bit registers, whose larger addresses make RV64 better suited to processors for servers and smart phones. Elaboration: The RISC-V offset plus base register addressing is an excellent match to structures as well as arrays, since the register can point to the beginning of the structure and the offset can select the desired element. We’ll see such an example in Section 2.13. Elaboration: The register in the data transfer instructions was originally invented to hold an index of an array with the offset used for the starting address of an array. Thus, the base register is also called the index register. Today’s memories are much larger, and the software model of data allocation is more sophisticated, so the base address of the array is normally passed in a register since it won’t fit in the offset, as we shall see. Elaboration: The migration from 32-bit address computers to 64-bit address computers left compiler writers a choice of the size of data types in C. Clearly, pointers should be 64 bits, but what about integers? Moreover, C has the data types int, long int, and long long int. The problems come from converting from one data type to another and having an unexpected overflow in C code that is not fully standard compliant, which unfortunately is not rare code. The table below shows the two popular options: Operating System pointers int long int long long int Microsoft Windows 64 bits 32 bits 32 bits 64 bits Linux, Most Unix 64 bits 32 bits 64 bits 64 bits 80 Chapter 2 Instructions: Language of the Computer 2.4 Signed and Unsigned Numbers First, let’s quickly review how a computer represents numbers. Because people have 10 fingers, we are taught to think in base 10, but numbers may be represented in any base. For example, 123 base 10 = 1111011 base 2. Numbers are kept in computer hardware as a series of high and low electronic signals, and so they are considered base 2 numbers. (Just as base 10 numbers are called decimal numbers, base 2 numbers are called binary numbers.) A single digit of a binary number is thus the “atom” of computing, since all binary digit Also information is composed of binary digits or bits. This fundamental building block called bit. One of can be one of two values, which can be thought of as several alternatives: high or the two numbers in low, on or off, true or false, or 1 or 0. base 2, 0 or 1, that are Generalizing the point, in any number base, the value of ith digit d is the components of information. d Basei where i starts at 0 and increases from right to left. This representation leads to an obvious way to number the bits in the doubleword: simply use the power of the base for that bit. We subscript decimal numbers with ten and binary numbers with two. For example, 1011two represents (1 × 23) + (0 × 22) + (1 × 21) + (1 × 20)ten = (1 × 8) + (0 × 4) + (1 × 2) + (1 × 1)ten = 8 + 0 + 2 + 1ten = 11ten We number the bits 0, 1, 2, 3, … from right to left in a doubleword. The drawing below shows the numbering of bits within a RISC-V word and the placement of the number 1011two: least significant bit The Since words are drawn vertically as well as horizontally, leftmost and rightmost rightmost bit in an may be unclear. Hence, the phrase least significant bit is used to refer to the RISC-V word. rightmost bit (bit 0 above) and most significant bit to the leftmost bit (bit 31). The RISC-V word is 32 bits long, so it can represent 232 different 32-bit patterns. most significant bit The leftmost bit in an RISC-V It is natural to let these combinations represent the numbers from 0 to 232 −1 word. (4,294,967,295ten): 2.4 Signed and Unsigned Numbers 81 00000000 00000000 00000000 00000000two = 0ten 00000000 00000000 00000000 00000001two = 1ten 00000000 00000000 00000000 00000010two = 2ten... ... 11111111 11111111 11111111 11111101two = 4,294,967,293ten 11111111 11111111 11111111 11111110two = 4,294,967,294ten 11111111 11111111 11111111 11111111two = 4,294,967,295ten That is, 32-bit binary numbers can be represented in terms of the bit value times a power of 2 (here xi means the ith bit of x): For reasons we will shortly see, these positive numbers are called unsigned numbers. Hardware/ Base 2 is not natural to human beings; we have 10 fingers and so find base 10 natural. Why didn’t computers use decimal? In fact, the first commercial Software computer did offer decimal arithmetic. The problem was that the computer still Interface used on and off signals, so a decimal digit was simply represented by several binary digits. Decimal proved so inefficient that subsequent computers reverted to all binary, converting to base 10 only for the relatively infrequent input/output events. Keep in mind that the binary bit patterns above are simply representatives of numbers. Numbers really have an infinite number of digits, with almost all being 0 except for a few of the rightmost digits. We just don’t normally show leading 0s. Hardware can be designed to add, subtract, multiply, and divide these binary bit patterns, as we’ll see in Chapter 3. If the number that is the proper result of such operations cannot be represented by these rightmost hardware bits, overflow is overflow when the said to have occurred. It’s up to the programming language, the operating system, results of an operation are and the program to determine what to do if overflow occurs. larger than be represented Computer programs calculate both positive and negative numbers, so we need a in a register representation that distinguishes the positive from the negative. The most obvious solution is to add a separate sign, which conveniently can be represented in a single bit; the name for this representation is sign and magnitude. Alas, sign and magnitude representation has several shortcomings. First, it’s not obvious where to put the sign bit. To the right? To the left? Early computers tried both. Second, adders for sign and magnitude may need an extra step to set the sign 82 Chapter 2 Instructions: Language of the Computer because we can’t know in advance what the proper sign of the sum will be. Finally, a separate sign bit means that sign and magnitude has both a positive and a negative zero, which can lead to problems for inattentive programmers. Because of these shortcomings, sign and magnitude representation was soon abandoned. In the search for a more attractive alternative, the question arose as to what would be the result for unsigned numbers if we tried to subtract a large number from a small one. The answer is that it would try to borrow from a string of leading 0s, so the result would have a string of leading 1s. Given that there was no obvious better alternative, the final solution was to pick the representation that made the hardware simple: leading 0s mean positive, and leading 1s mean negative. This convention for representing signed binary numbers is called two’s complement representation (an Elaboration on page 81 explains this unusual name): 00000000 00000000 00000000 00000000two = 0ten 00000000 00000000 00000000 00000001two = 1ten 00000000 00000000 00000000 00000010two = 2ten... ... 01111111 11111111 11111111 11111101two = 2,147,483,645ten 01111111 11111111 11111111 11111110two = 2,147,483,646ten 01111111 11111111 11111111 11111111two = 2,147,483,647ten 10000000 00000000 00000000 00000000two = − 2,147,483,648ten 10000000 00000000 00000000 00000001two = − 2,147,483,647ten 10000000 00000000 00000000 00000010two = − 2,147,483,646ten … ... 11111111 11111111 11111111 11111101two = − 3ten 11111111 11111111 11111111 11111110two = − 2ten 11111111 11111111 11111111 11111111two = − 1ten The positive half of the numbers, from 0 to 2,147,483,647ten (231−1), use the same representation as before. The following bit pattern (1000 … 0000two) represents the most negative number -2,147,483,648ten (−231). It is followed by a declining set of negative numbers: -2,147,483,647ten (1000 … 0001two) down to −1ten (1111 … 1111two). Two’s complement does have one negative number that has no corresponding positive number: -2,147,483,648ten. Such imbalance was also a worry to the inattentive programmer, but sign and magnitude had problems for both the programmer and the hardware designer. Consequently, every computer today uses two’s complement binary representations for signed numbers. 2.4 Signed and Unsigned Numbers 83 Two’s complement representation has the advantage that all negative numbers have a 1 in the most significant bit. Thus, hardware needs to test only this bit to see if a number is positive or negative (with the number 0 is considered positive). This bit is often called the sign bit. By recognizing the role of the sign bit, we can represent positive and negative 32-bit numbers in terms of the bit value times a power of 2: The sign bit is multiplied by −231, and the rest of the bits are then multiplied by positive versions of their respective base values. Binary to Decimal Conversion EXAMPLE What is the decimal value of this 32-bit two’s complement number? 11111111 11111111 11111111 11111100two Substituting the number’s bit values into the formula above: ANSWER We’ll see a shortcut to simplify conversion from negative to positive soon. Just as an operation on unsigned numbers can overflow the capacity of hardware to represent the result, so can an operation on two’s complement numbers. Overflow occurs when the leftmost retained bit of the binary bit pattern is not the same as the infinite number of digits to the left (the sign bit is incorrect): a 0 on the left of the bit pattern when the number is negative or a 1 when the number is positive. 84 Chapter 2 Instructions: Language of the Computer Hardware/ Signed versus unsigned applies to loads as well as to arithmetic. The function of a Software signed load is to copy the sign repeatedly to fill the rest of the register—called sign Interface extension—but its purpose is to place a correct representation of the number within that register. Unsigned loads simply fill with 0s to the left of the data, since the number represented by the bit pattern is unsigned. When loading a 32-bit word into a 32-bit register, the point is moot; signed and unsigned loads are identical. RISC-V does offer two flavors of byte loads: load byte unsigned (lbu) treats the byte as an unsigned number and thus zero-extends to fill the leftmost bits of the register, while load byte (lb) works with signed integers. Since C programs almost always use bytes to represent characters rather than consider bytes as very short signed integers, lbu is used practically exclusively for byte loads. Hardware/ Unlike the signed numbers discussed above, memory addresses naturally start Software at 0 and continue to the largest address. Put another way, negative addresses make no sense. Thus, programs want to deal sometimes with numbers that can Interface be positive or negative and sometimes with numbers that can be only positive. Some programming languages reflect this distinction. C, for example, names the former integers (declared as int in the program) and the latter unsigned integers (unsigned int). Some C style guides even recommend declaring the former as signed int to keep the distinction clear. Let’s examine two useful shortcuts when working with two’s complement numbers. The first shortcut is a quick way to negate a two’s complement binary number. Simply invert every 0 to 1 and every 1 to 0, then add one to the result. This shortcut is based on the observation that the sum of a number and its inverted representation must be 111 … 111two, which represents −1. Since x x 1 , therefore x x 1 0 or x 1 x. (We use the notation x to mean invert every bit in x from 0 to 1 and vice versa.) Negation Shortcut EXAMPLE Negate 2ten, and then check the result by negating −2ten. 2ten = 00000000 00000000 00000000 00000010two ANSWER 2.4 Signed and Unsigned Numbers 85 Negating this number by inverting the bits and adding one, Going the other direction, 11111111 11111111 11111111 11111110two is first inverted and then incremented: Our next shortcut tells us how to convert a binary number represented in n bits to a number represented with more than n bits. The shortcut is to take the most significant bit from the smaller quantity—the sign bit—and replicate it to fill the new bits of the larger quantity. The old nonsign bits are simply copied into the right portion of the new doubleword. This shortcut is called sign extension. Sign Extension Shortcut EXAMPLE Convert 16-bit binary versions of 2ten and −2ten to 32-bit binary numbers. The 16-bit binary version of the number 2 is ANSWER 00000000 00000010two = 2ten It is converted to a 32-bit number by making 16 copies of the value in the most significant bit (0) and placing that in the left of the word. The right part gets the old value: 00000000 00000000 00000000 00000010two = 2tenM 86 Chapter 2 Instructions: Language of the Computer Let’s negate the 16-bit version of 2 using the earlier shortcut. Thus, 0000 0000 0000 0010two becomes 1111 1111 1111 1101two + 1two = 1111 1111 1111 1110two Creating a 32-bit version of the negative number means copying the sign bit 16 times and placing it on the left: 11111111 11111111 11111111 11111110two = −2ten This trick works because positive two’s complement numbers really have an infinite number of 0s on the left and negative two’s complement numbers have an infinite number of 1s. The binary bit pattern representing a number hides leading bits to fit the width of the hardware; sign extension simply restores some of them. Summary The main point of this section is that we need to represent both positive and negative integers within a computer, and although there are pros and cons to any option, the unanimous choice since 1965 has been two’s complement. Elaboration: For signed decimal numbers, we used “−” to represent negative because there are no limits to the size of a decimal number. Given a fixed data size, binary and hexadecimal (see Figure 2.4) bit strings can encode the sign; therefore, we do not normally use “+” or “−” with binary or hexadecimal notation. Check What is the decimal value of this 64-bit two’s complement number? Yourself 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11111000two 1) −4ten 2) −8ten 3) −16ten 4) 18,446,744,073,709,551,608ten What is the decimal value if it is instead a 64-bit unsigned number? 2.5 Representing Instructions in the Computer 87 Elaboration: Two’s complement gets its name from the rule that the unsigned sum of an n-bit number and its n-bit negative is 2n; hence, the negation or complement of a number x is 2n − x, or its “two’s complement.” one’s complement A A third alternative representation to two’s complement and sign and magnitude is called notation that represents one’s complement. The negative of a one’s complement is found by inverting each bit, from the most negative value by 10 … 000two and the 0 to 1 and from 1 to 0, or x. This relation helps explain its name since the complement of most positive value by x is 2n − x − 1. It was also an attempt to be a better solution than sign and magnitude, and 01 … 11two, leaving an several early scientific computers did use the notation. This representation is similar to equal number of negatives two’s complement except that it also has two 0s: 00 … 00two is positive 0 and 11 … 11two and positives but ending is negative 0. The most negative number, 10 … 000two, represents −2,147,483,647ten, up with two zeros, one and so the positives and negatives are balanced. One’s complement adders did positive (00 … 00two) and need an extra step to subtract a number, and hence two’s complement dominates today. one negative (11 … 11two). A final notation, which we will look at when we discuss floating point in Chapter 3, The term is also used to is to represent the most negative value by 00 … 000two and the most positive value by mean the inversion of 11 … 11two, with 0 typically having the value 10 … 00two. This representation is called a every bit in a pattern: 0 to biased notation, since it biases the number such that the number plus the bias has a 1 and 1 to 0. non-negative representation. biased notation A notation that represents the most negative value by 00 … 000two and the 2.5 Representing Instructions in the Computer most positive value by 11 … 11two, with 0 typically having the We are now ready to explain the difference between the way humans instruct value 10 … 00two, thereby computers and the way computers see instructions. biasing the number such Instructions are kept in the computer as a series of high and low electronic signals and that the number plus the bias has a non-negative may be represented as numbers. In fact, each piece of an instruction can be considered representation. as an individual number, and placing these numbers side by side forms the instruction. The 32 registers of RISC-V are just referred to by their number, from 0 to 31. EXAMPLE Translating a RISC-V Assembly Instruction into a Machine Instruction Let’s do the next step in the refinement of the RISC-V language as an example. We’ll show the real RISC-V language version of the instruction represented symbolically as add x9, x20, x21 first as a combination of decimal numbers and then of binary numbers. ANSWER The decimal representation is 0 21 20 0 9 51 88 Chapter 2 Instructions: Language of the Computer Each of these segments of an instruction is called a field. The first, fourth, and sixth fields (containing 0, 0, and 51 in this case) collectively tell the RISC-V computer that this instruction performs addition. The second field gives the number of the register that is the second source operand of the addition operation (21 for x21), and the third field gives the other source operand for the addition (20 for x20). The fifth field contains the number of the register that is to receive the sum (9 for x9). Thus, this instruction adds register x20 to register x21 and places the sum in register x9. This instruction can also be represented as fields of binary numbers instead of decimal: 0000000 10101 10100 000 01001 0110011 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits instruction format A This layout of the instruction is called the instruction format. As you can see form of representation of from counting the number of bits, this RISC-V instruction takes exactly 32 bits—a an instruction composed word. In keeping with our design principle that simplicity favors regularity, RISC-V of fields of binary numbers. instructions are all 32 bits long. To distinguish it from assembly language, we call the numeric version of machine instructions machine language and a sequence of such instructions machine code. language Binary It would appear that you would now be reading and writing long, tiresome representation used for strings of binary numbers. We avoid that tedium by using a higher base than communication within a binary that converts easily into binary. Since almost all computer data sizes are computer system. multiples of 4, hexadecimal (base 16) numbers are popular. As base 16 is a power hexadecimal Numbers of 2, we can trivially convert by replacing each group of four binary digits by a in base 16. single hexadecimal digit, and vice versa. Figure 2.4 converts between hexadecimal and binary. FIGURE 2.4 The hexadecimal–binary conversion table. Just replace one hexadecimal digit by the corresponding four binary digits, and vice versa. If the length of the binary number is not a multiple of 4, go from right to left. Because we frequently deal with different number bases, to avoid confusion, we will subscript decimal numbers with ten, binary numbers with two, and hexadecimal numbers with hex. (If there is no subscript, the default is base 10.) By the way, C and Java use the notation 0xnnnn for hexadecimal numbers. 2.5 Representing Instructions in the Computer 89 Binary to Hexadecimal and Back EXAMPLE Convert the following 8-digit hexadecimal and 32-bit binary numbers into the other base: eca8 6420hex 0001 0011 0101 0111 1001 1011 1101 1111two Using Figure 2.4, the answer is just a table lookup one way: ANSWER eca8 6420hex 1110 1100 1010 1000 0110 0100 0010 0000two And then the other direction: 0001 0011 0101 0111 1001 1011 1101 1111two 1357 9bdfhex RISC-V Fields RISC-V fields are given names to make them easier to discuss: funct7 rs2 rs1 funct3 rd opcode 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits Here is the meaning of each name of the fields in RISC-V instructions: n opcode: Basic operation of the instruction, and this abbreviation is its opcode The field that traditional name. denotes the operation and n rd: The register destination operand. It gets the result of the operation. format of an instruction. n funct3: An additional opcode field. n rs1: The first register source operand. n rs2: The second register source operand. n funct7: An additional opcode field. 90 Chapter 2 Instructions: Language of the Computer A problem occurs when an instruction needs longer fields than those shown above. For example, the load register instruction must specify two registers and a constant. If the address were to use one of the 5-bit fields in the format above, the largest constant within the load register instruction would be limited to only 25−1 or 31. This constant is used to select elements from arrays or data structures, and it often needs to be much larger than 31. This 5-bit field is too small to be useful. Hence, we have a conflict between the desire to keep all instructions the same length and the desire to have a single instruction format. This conflict leads us to the final hardware design principle: Design Principle 3: Good design demands good compromises. The compromise chosen by the RISC-V designers is to keep all instructions the same length, thereby requiring distinct instruction formats for different kinds of instructions. For example, the format above is called R-type (for register). A second type of instruction format is I-type and is used by arithmetic operands with one constant operand, including addi, and by load instructions. The fields of the I-type format are immediate rs1 funct3 rd opcode 12 bits 5 bits 3 bits 5 bits 7 bits The 12-bit immediate is interpreted as a two’s complement value, so it can represent integers from −211 to 211−1. When the I-type format is used for load instructions, the immediate represents a byte offset, so the load word instruction can refer to any word within a region of ±211 or 2048 bytes (±28 or 512 words) of the base address in the base register rd. We see that more than 32 registers would be difficult in this format, as the rd and rs1 fields would each need another bit, making it harder to fit everything in one word. Let’s look at the load register instruction from page 77: lw x9, 32(x22) // Temporary reg x9 gets A Here, 22 (for x22) is placed in the rs1 field, 32 is placed in the immediate field, and 9 (for x9) is placed in the rd field. We also need a format for the store word instruction, sw, which needs two source registers (for the base address and the store data) and an immediate for the address offset. The fields of the S-type format are immediate[11:5] rs2 rs1 funct3 immediate[4:0] opcode 7 bits 5 bits 5 bits 3 bits 5 bits 7 bits The 12-bit immediate in the S-type format is split into two fields, which supply the lower 5 bits and upper 7 bits. The RISC-V architects chose this design because it keeps the rs1 and rs2 fields in the same place in all instruction formats. (Figure 4.14.5 shows how this split simplifies the hardware.) Keeping the instruction formats as 2.5 Representing Instructions in the Computer 91 similar as possible reduces hardware complexity. Similarly, the opcode and funct3 fields are the same size in all locations, and they are always in the same place. In case you were wondering, the formats are distinguished by the values in the opcode field: each format is assigned a distinct set of opcode values in the first field (opcode) so that the hardware knows how to treat the rest of the instruction. Figure 2.5 shows the numbers used in each field for the RISC-V instructions covered so far. FIGURE 2.5 RISC-V instruction encoding. In the table above, “reg” means a register number between 0 and 31 and “address” means a 12-bit address or constant. The funct3 and funct7 fields act as additional opcode fields. Translating RISC-V Assembly Language into Machine Language EXAMPLE We can now take an example all the way from what the programmer writes to what the computer executes. If x10 has the base of the array A and x21 corresponds to h, the assignment statement A = h + A + 1; is compiled into lw x9, 120(x10) // Temporary reg x9 gets A add x9, x21, x9 // Temporary reg x9 gets h+A addi x9, x9, 1 // Temporary reg x9 gets h+A+1 sw x9, 120(x10) // Stores h+A+1 back into A ANSWER What is the RISC-V machine language code for these three instructions? 92 Chapter 2 Instructions: Language of the Computer For convenience, let’s first represent the machine language instructions using decimal numbers. From Figure 2.5, we can determine the three machine language instructions: immediate rs1 funct3 rd opcode 120 10 2 9 3 funct7 rs2 rs1 funct3 rd opcode 0 9 21 0 9 51 immediate rs1 funct3 rd opcode 1 9 0 9 19 immediate[11:5] rs2 rs1 funct3 immediate[4:0] opcode 3 9 10 2 24 35 The lw instruction is identified by 3 (see Figure 2.5) in the opcode field and 2 in the funct3 field. The base register 10 is specified in the rs1 field, and the destination register 9 is specified in the rd field. The offset to select A (120 = 30 × 4) is found in the immediate field. The add instruction that follows is specified with 51 in the opcode field, 0 in the funct3 field, and 0 in the funct7 field. The three register operands (9, 21, and 9) are found in the rd, rs1, and rs2 fields. The subsequent addi instruction is specified with 19 in the opcode field and 0 in the funct3 field. The register operands (9 and 9) are found in the rd and rs1 fields, and the constant addend 1 is found in the immediate field. The sw instruction is identified with 35 in the opcode field and 2 in the funct3 field. The register operands (9 and 10) are found in the rs2 and rs1 fields, respectively. The address offset 120 is split across the two immediate fields. Since the upper part of the immediate holds bits 5 and above, we can decompose the offset 120 by dividing by 25. The upper part of the immediate holds the quotient, 3, and the lower part holds the remainder, 24. Since 120ten = 0000011 11000two, the binary equivalent to the decimal form is: immediate rs1 funct3 rd opcode 000011110000 01010 010 01001 0000011 funct7 rs2 rs1 funct3 rd opcode 0000000 01001 10101 000 01001 0110011 immediate rs1 funct3 rd opcode 000000000001 01001 000 01001 0010011 immediate[11:5] rs2 rs1 funct3 immediate[4:0] opcode 0000011 01001 01010 010 11000 0100011 2.5 Representing Instructions in the Computer 93 Elaboration: RISC-V assembly language programmers aren’t forced to use addi when working with constants. The programmer simply writes add, and the assembler generates the proper opcode and the proper instruction format depending on whether the operands are all registers (R-type) or if one is a constant (I-type); see Section 2.12. We use the explicit names in RISC-V for the different opcodes and formats as we think it is less confusing when introducing assembly language versus machine language. Elaboration: Although RISC-V has both add and sub instructions, it does not have a subi counterpart to addi. This is because the immediate field represents a two’s complement integer, so addi can be used to subtract constants. The desire to keep all instructions the same size conflicts with the desire to have Hardware/ as many registers as possible. Any increase in the number of registers uses up at Software least one more bit in every register field of the instruction format. Given these constraints and the design principle that smaller is faster, most instruction sets Interface today have 16 or 32 general-purpose registers. Figure 2.6 summarizes the portions of RISC-V machine language described in this section. As we shall see in Chapter 4, the similarity of the binary representations of related instructions simplifies hardware design. These similarities are another example of regularity in the RISC-V architecture. FIGURE 2.6 RISC-V architecture revealed through Section 2.5. The three RISC-V instruction formats so far are R, I, and S. The R-type format has two source register operand and one destination register operand. The I-type format replaces one source register operand with a 12-bit immediate field. The S-type format has two source operands and a 12-bit immediate field, but no destination register operand. The S-type immediate field is split into two parts, with bits 11–5 in the leftmost field and bits 4–0 in the second-rightmost field. 94 Chapter 2 Instructions: Language of the Computer The BIG Today’s computers are built on two key principles: 1. Instructions are represented as numbers. Picture 2. Programs are stored in memory to be read or written, just like data. These principles lead to the stored-program concept; its invention let the computing genie out of its bottle. Figure 2.7 shows the power of the concept; specifically, memory can contain the source code for an editor program, the corresponding compiled machine code, the text that the compiled program is using, and even the compiler that generated the machine code. One consequence of instructions as numbers is that programs are often shipped as files of binary numbers. The commercial implication is that computers can inherit ready-made software provided they are compatible with an existing instruction set. Such “binary compatibility” often leads industry to align around a small number of instruction set architectures. Memory Accounting program (machine code) Editor program (machine code) C compiler Processor (machine code) Payroll data Book text Source code in C for editor program FIGURE 2.7 The stored-program concept. Stored programs allow a computer that performs accounting to become, in the blink of an eye, a computer that helps an author write a book. The switch happens simply by loading memory with programs and data and then telling the computer to begin executing at a given location in memory. Treating instructions in the same way as data greatly simplifies both the memory hardware and the software of computer systems. Specifically, the memory technology needed for data can also be used for programs, and programs like compilers, for instance, can translate code written in a notation far more convenient for humans into code that the computer can understand. 2.6 Logical Operations 95 What RISC-V instruction does this represent? Choose from one of the four options below. Check Yourself funct7 rs2 rs1 funct3 rd opcode 32 9 10 000 11 51 1. sub x9, x10, x11 2. add x11, x9, x10 3. sub x11, x10, x9 4. sub x11, x9, x10 If a person is age 40ten, what is their age in hexidecimal? 2.6 Logical Operations “Contrariwise,” Although the first computers operated on full words, it soon became clear that continued Tweedledee, it was useful to operate on fields of bits within a word or even on individual bits. “if it was so, it might Examining characters within a word, each of which is stored as 8 bits, is one example be; and if it were so, it of such an operation (see Section 2.9). It follows that operations were added to would be; but as programming languages and instruction set architectures to simplify, among other it isn’t, it ain’t. things, the packing and unpacking of bits into words. These instructions are called That’s logic.” logical operations. Figure 2.8 shows logical operations in C, Java, and RISC-V. Lewis Carroll, Alice’s Adventures in Wonderland, 1865 FIGURE 2.8 C and Java logical operators and their corresponding RISC-V instructions. One way to implement NOT is to use XOR with one operand being all ones (FFFF FFFF FFFF FFFFhex). 96 Chapter 2 Instructions: Language of the Computer The first class of such operations is called shifts. They move all the bits in a word to the left or right, filling the emptied bits with 0s. For example, if register x19 contained 00000000 00000000 00000000 00001001two = 9ten and the instruction to shift left by 4 was executed, the new value would be: 00000000 00000000 00000000 10010000two = 144ten The dual of a shift left is a shift right. The actual names of the two RISC-V shift instructions are shift left logical immediate (slli) and shift right logical immediate (srli). The following instruction performs the operation above, if the original value was in register x19 and the result should go in register x11: slli x11, x19, 4 // reg x11 = reg x19 = x11 or x20 < 0, goto IndexOutOfBounds 2.7 Instructions for Making Decisions 103 Case/Switch Statement Most programming languages have a case or switch statement that allows the programmer to select one of many alternatives depending on a single value. The simplest way to implement switch is via a sequence of conditional tests, turning the switch statement into a chain of if-then-else statements. Sometimes the alternatives may be more efficiently encoded as a table of addresses of alternative instruction sequences, called a branch address table or branch address branch table, and the program needs only to index into the table and then branch table Also called to the appropriate sequence. The branch table is therefore just an array of double- branch table. A table of words containing addresses that correspond to labels in the code. The program addresses of alternative instruction sequences. loads the appropriate entry from the branch table into a register. It then needs to branch using the address in the register. To support such situations, computers like RISC-V include an indirect jump instruction, which performs an unconditional branch to the address specified in a register. In RISC-V, the jump-and-link register instruction (jalr) serves this purpose. We’ll see an even more popular use of this versatile instruction in the next section. Although there are many statements for decisions and loops in programming Hardware/ languages like C and Java, the bedrock statement that implements them at the instruction set level is the conditional branch. Software Interface I. C has many statements for decisions and loops, while RISC-V has few. Check Which of the following does or does not explain this imbalance? Why? Yourself 1. More decision statements make code easier to read and understand. 2. Fewer decision statements simplify the task of the underlying layer that is responsible for execution. 3. More decision statements mean fewer lines of code, which generally reduces coding time. 4. More decision statements mean fewer lines of code, which generally results in the execution of fewer operations. II. Why does C provide two sets of operators for AND (& and &&) and two sets of operators for OR (| and ||), while RISC-V doesn’t? 1. Logical operations AND and OR implement & and |, while conditional branches implement && and ||. 2. The previous statement has it backwards: && and || correspond to logical operations, while & and | map to conditional branches. 3. They are redundant and mean the same thing: && and || are simply inherited from the programming language B, the predecessor of C. 104 Chapter 2 Instructions: Language of the Computer Supporting Procedures in Computer 2.8 Hardware procedure A stored A procedure or function is one tool programmers use to structure programs, both subroutine that performs to make them easier to understand and to allow code to be reused. Procedures a specific task based allow the programmer to concentrate on just one portion of the task at a time; on the parameters with parameters act as an interface between the procedure and the rest of the program which it is provided. and data, since they can pass values and return results. We describe the equivalent to procedures in Java in Section 2.15, but Java needs everything from a computer that C needs. Procedures are one way to implement abstraction in software. You can think of a procedure like a spy who leaves with a secret plan, acquires resources, performs the task, covers his or her tracks, and then returns to the point of origin with the desired result. Nothing else should be perturbed once the mission is complete. Moreover, a spy operates on only a “need to know” basis, so the spy can’t make assumptions about the spymaster. Similarly, in the execution of a procedure, the program must follow these six steps: 1. Put parameters in a place where the procedure can access them. 2. Transfer control to the procedure. 3. Acquire the storage resources needed for the procedure. 4. Perform the desired task. 5. Put the result value in a place where the calling program can access it. 6. Return control to the point of origin, since a procedure can be called from several points in a program. As mentioned above, registers are the fastest place to hold data in a computer, so we want to use them as much as possible. RISC-V software follows the following convention for procedure calling in allocating its 32 registers: n x10–x17: eight parameter registers in which to pass parameters or return values. n x1: one return address register to return to the point of origin. jump-and-link instruction An In addition to allocating these registers, RISC-V assembly language includes an instruction that branches instruction just for the procedures: it branches to an address and simultaneously to an address and saves the address of the following instruction to the destination register rd. The simultaneously saves the jump-and-link instruction (jal) is written address of the following instruction in a register jal x1, ProcedureAddress // jump to (usually x1 in RISC-V). ProcedureAddress and write return address to x1 2.8 Supporting Procedures in Computer Hardware 105 The link portion of the name means that an address or link is formed that points to the calling site to allow the procedure to return to the proper address. This “link,” stored in register x1, is called the return address. The return address return address A link to is needed because the same procedure could be called from several parts of the the calling site that allows program. a procedure to return to the proper address; in To support the return from a procedure, computers like RISC-V use an indirect RISC-V it is stored in jump, like the jump-and-link instruction (jalr) introduced above to help with register x1. case statements: jalr x0, 0(x1) caller The program that instigates a procedure and The jump-and-link register instruction branches to the address stored in provides the necessary register x1—which is just what we want. Thus, the calling program, or caller, puts parameter values. the parameter values in x10–x17 and uses jal x1, X to branch to procedure X (sometimes named the callee). The callee then performs the calculations, places callee A procedure that executes a series of stored the results in the same parameter registers, and returns control to the caller using instructions based on jalr x0, 0(x1). parameters provided by Implicit in the stored-program idea is the need to have a register to hold the address the caller and then returns of the current instruction being executed. For historical reasons, this register is almost control to the caller. always called the program counter, abbreviated PC in the RISC-V architecture, program counter although a more sensible name would have been instruction address register. The jal (PC) The register instruction actually saves PC + 4 in its designation register (usually x1) to link to the containing the address byte address of the following instruction to set up the procedure return. of the instruction in the program being executed. Elaboration: The jump-and-link instruction can also be used to perform an unconditional branch within a procedure by using x0 as the destination register. Since stack A data structure for spilling registers x0 is hard-wired to zero, the effect is to discard the return address: organized as a last-in- first-out queue. jal x0, Label // unconditionally branch to Label stack pointer A value denoting the most Using More Registers recently allocated address in a stack that shows Suppose a compiler needs more registers for a procedure than the eight argument where registers should registers. Since we must cover our tracks after our mission is complete, any registers be spilled or where old needed by the caller must be restored to the values that they contained before the register values can be procedure was invoked. This situation is an example in which we need to spill registers found. In RISC-V, it is to memory, as mentioned in the Hardware/Software Interface section on page 75. register sp, or x2. The ideal data structure for spilling registers is a stack—a last-in-first-out queue. A stack needs a pointer to the most recently allocated address in the stack to show where the next procedure should place the registers to be spilled or where old push Add element to register values are found. In RISC-V, the stack pointer is register x2, also known stack. by the name sp. The stack pointer is adjusted by one word for each register that is saved or restored. Stacks are so popular that they have their own buzzwords for pop Remove element transferring data to and from the stack: placing data onto the stack is called a push, from stack. and removing data from the stack is called a pop. 106 Chapter 2 Instructions: Language of the Computer By historical precedent, stacks “grow” from higher addresses to lower addresses. This convention means that you push values onto the stack by subtracting from the stack pointer. Adding to the stack pointer shrinks the stack, thereby popping values off the stack. Compiling a C Procedure That Doesn’t Call Another Procedure EXAMPLE Let’s turn the example on page 72 from Section 2.2 into a C procedure: int leaf_example (int g, int h, int i, int j) { int f; f = (g + h) − (i + j); return f; } What is the compiled RISC-V assembly code? The parameter variables g, h, i, and j correspond to the argument registers ANSWER x10, x11, x12, and x13, respectively, and f corresponds to x20. The compiled program starts with the label of the procedure: leaf_example: The next step is to save the registers used by the procedure. The C assignment statement in the procedure body is identical to the example on page 73, which uses two temporary registers (x5 and x6). Thus, we need to save three registers: x5, x6, and x20. We “push” the old values onto the stack by creating space for three words (12 bytes) on the stack and then store them: addi sp, sp, -12 // adjust stack to make room for 3 items sw x5, 8(sp) // save register x5 for use afterwards sw x6, 4(sp) // save register x6 for use afterwards sw x20, 0(sp) // save register x20 for use afterwards Figure 2.10 shows the stack before, during, and after the procedure call. The next three statements correspond to the body of the procedure, which follows the example on page 73: add x5, x10, x11 // register x5 contains g + h add x6, x12, x13 // register x6 contains i + j sub x20, x5, x6 // f = x5 − x6, which is (g + h) − (i + j) 2.8 Supporting Procedures in Computer Hardware 107 To return the value of f, we copy it into a parameter register: addi x10, x20, 0 // returns f (x10 = x20 + 0) Before returning, we restore the three old values of the registers we saved by “popping” them from the stack: lw x20, 0(sp) // restore register x20 for caller lw x6, 4(sp) // restore register x6 for caller lw x5, 8(sp) // restore register x5 for caller addi sp, sp, 12 // adjust stack to delete 3 items The procedure ends with a branch register using the return address: jalr x0, 0(x1) // branch back to calling routine In the previous example, we used temporary registers and assumed their old values must be saved and restored. To avoid saving and restoring a register whose value is never used, which might happen with a temporary register, RISC-V software separates 19 of the registers into two groups: n x5−x7 and x28−x31: temporary registers that are not preserved by the callee (called procedure) on a procedure call n x8−x9 and x18−x27: saved registers that must be preserved on a procedure call (if used, the callee saves and restores them) This simple convention reduces register spilling. In the example above, since the caller does not expect registers x5 and x6 to be preserved across a procedure call, High address SP SP Contents of register x5 Contents of register x6 SP Contents of register x20 Low address (a) (b) (c) FIGURE 2.10 The values of the stack pointer and the stack (a) before, (b) during, and (c) after the procedure call. The stack pointer always points to the “top” of the stack, or the last word in the stack in this drawing. 108 Chapter 2 Instructions: Language of the Computer we can drop two stores and two loads from the code. We still must save and restore x20, since the callee must assume that the caller needs its value. Nested Procedures Procedures that do not call others are called leaf procedures. Life would be simple if all procedures were leaf procedures, but they aren’t. Just as a spy might employ other spies as part of a mission, who in turn might use even more spies, so do procedures invoke other procedures. Moreover, recursive procedures even invoke “clones” of themselves. Just as we need to be careful when using registers in procedures, attention must be paid when invoking nonleaf procedures. For example, suppose that the main program calls procedure A with an argument of 3, by placing the value 3 into register x10 and then using jal x1, A. Then suppose that procedure A calls procedure B via jal x1, B with an argument of 7, also placed in x10. Since A hasn’t finished its task yet, there is a conflict over the use of register x10. Similarly, there is a conflict over the return address in register x1, since it now has the return address for B. Unless we take steps to prevent the problem, this conflict will eliminate procedure A’s ability to return to its caller. One solution is to push all the other registers that must be preserved on the stack, just as we did with the saved registers. The caller pushes any argument registers (x10–x17) or temporary registers (x5-x7 and x28-x31) that are needed after the call. The callee pushes the return address register x1 and any saved registers (x8- x9 and x18-x27) used by the callee. The stack pointer sp is adjusted to account for the number of registers placed on the stack. Upon the return, the registers are restored from memory, and the stack pointer is readjusted. Compiling a Recursive C Procedure, Showing Nested Procedure Linking EXAMPLE Let’s tackle a recursive procedure that calculates factorial: int fact (int n) { if (n < 1) return (1); else return (n * fact(n − 1)); } What is the RISC-V assembly code? The parameter variable n corresponds to the argument register x10. The compiled program starts with the label of the procedure and then saves two registers on the stack, the return address and x10: ANSWER fact: 2.8 Supporting Procedures in Computer Hardware 109 addi sp, sp, -8 // adjust stack for 2 items sw x1, 4(sp) // save the return address sw x10, 0(sp) // save the argument n The first time fact is called, sw saves an address in the program that called fact. The next two instructions test whether n is less than 1, going to L1 if n ≥ 1. addi x5, x10, -1 // x5 = n - 1 bge x5, x0, L1 // if (n - 1) >= 0, go to L1 If n is less than 1, fact returns 1 by putting 1 into a value register: it adds 1 to 0 and places that sum in x10. It then pops the two saved values off the stack and branches to the return address: addi x10, x0, 1 // return 1 addi sp, sp, 8 // pop 2 items off stack jalr x0, 0(x1) // return to caller Before popping two items off the stack, we could have loaded x1 and x10. Since x1 and x10 don’t change when n is less than 1, we skip those instructions. If n is not less than 1, the argument n is decremented and then fact is called again with the decremented value: L1: addi x10, x10, -1 // n >= 1: argument gets (n − 1) jal x1, fact // call fact with (n − 1) The next instruction is where fact returns; its result is in x10. Now the old return address and old argument are restored, along with the stack pointer: addi x6, x10, 0 // return from jal: move result of fact (n - 1) to x6: lw x10, 0(sp) // restore argument n lw x1, 4(sp) // restore the return address addi sp, sp, 8 // adjust stack pointer to pop 2 items Next, argument register x10 gets the product of the old argument and the result of fact(n - 1), now in x6. We assume a multiply instruction is available, even though it is not covered until Chapter 3: mul x10, x10, x6 // return n * fact (n − 1) Finally, fact branches again to the return address: jalr x0, 0(x1) // return to the caller 110 Chapter 2 Instructions: Language of t

(The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258.pdf

Document Details

Tags

Related

Full Transcript

Upgrade to continue