(The Morgan Kaufmann Series in Computer Architecture and Design) David A. Patterson, John L. Hennessy - Computer Organization and Design RISC-V Edition_ The Hardware Software Interface-Morgan Kaufmann-102-258-pages-8.pdf
Document Details
Uploaded by MagnanimousCloisonnism
Vilnius University
Tags
Related
Full Transcript
104 Chapter 2 Instructions: Language of the Computer Supporting Procedures in Computer 2.8 Hardware procedure A stored A procedure or function is one tool programmers use to structu...
104 Chapter 2 Instructions: Language of the Computer Supporting Procedures in Computer 2.8 Hardware procedure A stored A procedure or function is one tool programmers use to structure programs, both subroutine that performs to make them easier to understand and to allow code to be reused. Procedures a specific task based allow the programmer to concentrate on just one portion of the task at a time; on the parameters with parameters act as an interface between the procedure and the rest of the program which it is provided. and data, since they can pass values and return results. We describe the equivalent to procedures in Java in Section 2.15, but Java needs everything from a computer that C needs. Procedures are one way to implement abstraction in software. You can think of a procedure like a spy who leaves with a secret plan, acquires resources, performs the task, covers his or her tracks, and then returns to the point of origin with the desired result. Nothing else should be perturbed once the mission is complete. Moreover, a spy operates on only a “need to know” basis, so the spy can’t make assumptions about the spymaster. Similarly, in the execution of a procedure, the program must follow these six steps: 1. Put parameters in a place where the procedure can access them. 2. Transfer control to the procedure. 3. Acquire the storage resources needed for the procedure. 4. Perform the desired task. 5. Put the result value in a place where the calling program can access it. 6. Return control to the point of origin, since a procedure can be called from several points in a program. As mentioned above, registers are the fastest place to hold data in a computer, so we want to use them as much as possible. RISC-V software follows the following convention for procedure calling in allocating its 32 registers: n x10–x17: eight parameter registers in which to pass parameters or return values. n x1: one return address register to return to the point of origin. jump-and-link instruction An In addition to allocating these registers, RISC-V assembly language includes an instruction that branches instruction just for the procedures: it branches to an address and simultaneously to an address and saves the address of the following instruction to the destination register rd. The simultaneously saves the jump-and-link instruction (jal) is written address of the following instruction in a register jal x1, ProcedureAddress // jump to (usually x1 in RISC-V). ProcedureAddress and write return address to x1 2.8 Supporting Procedures in Computer Hardware 105 The link portion of the name means that an address or link is formed that points to the calling site to allow the procedure to return to the proper address. This “link,” stored in register x1, is called the return address. The return address return address A link to is needed because the same procedure could be called from several parts of the the calling site that allows program. a procedure to return to the proper address; in To support the return from a procedure, computers like RISC-V use an indirect RISC-V it is stored in jump, like the jump-and-link instruction (jalr) introduced above to help with register x1. case statements: jalr x0, 0(x1) caller The program that instigates a procedure and The jump-and-link register instruction branches to the address stored in provides the necessary register x1—which is just what we want. Thus, the calling program, or caller, puts parameter values. the parameter values in x10–x17 and uses jal x1, X to branch to procedure X (sometimes named the callee). The callee then performs the calculations, places callee A procedure that executes a series of stored the results in the same parameter registers, and returns control to the caller using instructions based on jalr x0, 0(x1). parameters provided by Implicit in the stored-program idea is the need to have a register to hold the address the caller and then returns of the current instruction being executed. For historical reasons, this register is almost control to the caller. always called the program counter, abbreviated PC in the RISC-V architecture, program counter although a more sensible name would have been instruction address register. The jal (PC) The register instruction actually saves PC + 4 in its designation register (usually x1) to link to the containing the address byte address of the following instruction to set up the procedure return. of the instruction in the program being executed. Elaboration: The jump-and-link instruction can also be used to perform an unconditional branch within a procedure by using x0 as the destination register. Since stack A data structure for spilling registers x0 is hard-wired to zero, the effect is to discard the return address: organized as a last-in- first-out queue. jal x0, Label // unconditionally branch to Label stack pointer A value denoting the most Using More Registers recently allocated address in a stack that shows Suppose a compiler needs more registers for a procedure than the eight argument where registers should registers. Since we must cover our tracks after our mission is complete, any registers be spilled or where old needed by the caller must be restored to the values that they contained before the register values can be procedure was invoked. This situation is an example in which we need to spill registers found. In RISC-V, it is to memory, as mentioned in the Hardware/Software Interface section on page 75. register sp, or x2. The ideal data structure for spilling registers is a stack—a last-in-first-out queue. A stack needs a pointer to the most recently allocated address in the stack to show where the next procedure should place the registers to be spilled or where old push Add element to register values are found. In RISC-V, the stack pointer is register x2, also known stack. by the name sp. The stack pointer is adjusted by one word for each register that is saved or restored. Stacks are so popular that they have their own buzzwords for pop Remove element transferring data to and from the stack: placing data onto the stack is called a push, from stack. and removing data from the stack is called a pop. 106 Chapter 2 Instructions: Language of the Computer By historical precedent, stacks “grow” from higher addresses to lower addresses. This convention means that you push values onto the stack by subtracting from the stack pointer. Adding to the stack pointer shrinks the stack, thereby popping values off the stack. Compiling a C Procedure That Doesn’t Call Another Procedure EXAMPLE Let’s turn the example on page 72 from Section 2.2 into a C procedure: int leaf_example (int g, int h, int i, int j) { int f; f = (g + h) − (i + j); return f; } What is the compiled RISC-V assembly code? The parameter variables g, h, i, and j correspond to the argument registers ANSWER x10, x11, x12, and x13, respectively, and f corresponds to x20. The compiled program starts with the label of the procedure: leaf_example: The next step is to save the registers used by the procedure. The C assignment statement in the procedure body is identical to the example on page 73, which uses two temporary registers (x5 and x6). Thus, we need to save three registers: x5, x6, and x20. We “push” the old values onto the stack by creating space for three words (12 bytes) on the stack and then store them: addi sp, sp, -12 // adjust stack to make room for 3 items sw x5, 8(sp) // save register x5 for use afterwards sw x6, 4(sp) // save register x6 for use afterwards sw x20, 0(sp) // save register x20 for use afterwards Figure 2.10 shows the stack before, during, and after the procedure call. The next three statements correspond to the body of the procedure, which follows the example on page 73: add x5, x10, x11 // register x5 contains g + h add x6, x12, x13 // register x6 contains i + j sub x20, x5, x6 // f = x5 − x6, which is (g + h) − (i + j) 2.8 Supporting Procedures in Computer Hardware 107 To return the value of f, we copy it into a parameter register: addi x10, x20, 0 // returns f (x10 = x20 + 0) Before returning, we restore the three old values of the registers we saved by “popping” them from the stack: lw x20, 0(sp) // restore register x20 for caller lw x6, 4(sp) // restore register x6 for caller lw x5, 8(sp) // restore register x5 for caller addi sp, sp, 12 // adjust stack to delete 3 items The procedure ends with a branch register using the return address: jalr x0, 0(x1) // branch back to calling routine In the previous example, we used temporary registers and assumed their old values must be saved and restored. To avoid saving and restoring a register whose value is never used, which might happen with a temporary register, RISC-V software separates 19 of the registers into two groups: n x5−x7 and x28−x31: temporary registers that are not preserved by the callee (called procedure) on a procedure call n x8−x9 and x18−x27: saved registers that must be preserved on a procedure call (if used, the callee saves and restores them) This simple convention reduces register spilling. In the example above, since the caller does not expect registers x5 and x6 to be preserved across a procedure call, High address SP SP Contents of register x5 Contents of register x6 SP Contents of register x20 Low address (a) (b) (c) FIGURE 2.10 The values of the stack pointer and the stack (a) before, (b) during, and (c) after the procedure call. The stack pointer always points to the “top” of the stack, or the last word in the stack in this drawing. 108 Chapter 2 Instructions: Language of the Computer we can drop two stores and two loads from the code. We still must save and restore x20, since the callee must assume that the caller needs its value. Nested Procedures Procedures that do not call others are called leaf procedures. Life would be simple if all procedures were leaf procedures, but they aren’t. Just as a spy might employ other spies as part of a mission, who in turn might use even more spies, so do procedures invoke other procedures. Moreover, recursive procedures even invoke “clones” of themselves. Just as we need to be careful when using registers in procedures, attention must be paid when invoking nonleaf procedures. For example, suppose that the main program calls procedure A with an argument of 3, by placing the value 3 into register x10 and then using jal x1, A. Then suppose that procedure A calls procedure B via jal x1, B with an argument of 7, also placed in x10. Since A hasn’t finished its task yet, there is a conflict over the use of register x10. Similarly, there is a conflict over the return address in register x1, since it now has the return address for B. Unless we take steps to prevent the problem, this conflict will eliminate procedure A’s ability to return to its caller. One solution is to push all the other registers that must be preserved on the stack, just as we did with the saved registers. The caller pushes any argument registers (x10–x17) or temporary registers (x5-x7 and x28-x31) that are needed after the call. The callee pushes the return address register x1 and any saved registers (x8- x9 and x18-x27) used by the callee. The stack pointer sp is adjusted to account for the number of registers placed on the stack. Upon the return, the registers are restored from memory, and the stack pointer is readjusted. Compiling a Recursive C Procedure, Showing Nested Procedure Linking EXAMPLE Let’s tackle a recursive procedure that calculates factorial: int fact (int n) { if (n < 1) return (1); else return (n * fact(n − 1)); } What is the RISC-V assembly code? The parameter variable n corresponds to the argument register x10. The compiled program starts with the label of the procedure and then saves two registers on the stack, the return address and x10: ANSWER fact: 2.8 Supporting Procedures in Computer Hardware 109 addi sp, sp, -8 // adjust stack for 2 items sw x1, 4(sp) // save the return address sw x10, 0(sp) // save the argument n The first time fact is called, sw saves an address in the program that called fact. The next two instructions test whether n is less than 1, going to L1 if n ≥ 1. addi x5, x10, -1 // x5 = n - 1 bge x5, x0, L1 // if (n - 1) >= 0, go to L1 If n is less than 1, fact returns 1 by putting 1 into a value register: it adds 1 to 0 and places that sum in x10. It then pops the two saved values off the stack and branches to the return address: addi x10, x0, 1 // return 1 addi sp, sp, 8 // pop 2 items off stack jalr x0, 0(x1) // return to caller Before popping two items off the stack, we could have loaded x1 and x10. Since x1 and x10 don’t change when n is less than 1, we skip those instructions. If n is not less than 1, the argument n is decremented and then fact is called again with the decremented value: L1: addi x10, x10, -1 // n >= 1: argument gets (n − 1) jal x1, fact // call fact with (n − 1) The next instruction is where fact returns; its result is in x10. Now the old return address and old argument are restored, along with the stack pointer: addi x6, x10, 0 // return from jal: move result of fact (n - 1) to x6: lw x10, 0(sp) // restore argument n lw x1, 4(sp) // restore the return address addi sp, sp, 8 // adjust stack pointer to pop 2 items Next, argument register x10 gets the product of the old argument and the result of fact(n - 1), now in x6. We assume a multiply instruction is available, even though it is not covered until Chapter 3: mul x10, x10, x6 // return n * fact (n − 1) Finally, fact branches again to the return address: jalr x0, 0(x1) // return to the caller 110 Chapter 2 Instructions: Language of the Computer Hardware/ A C variable is generally a location in storage, and its interpretation depends both on its type and storage class. Example types include integers and characters (see Software Section 2.9). C has two storage classes: automatic and static. Automatic variables Interface are local to a procedure and are discarded when the procedure exits. Static variables exist across exits from and entries to procedures. C variables declared outside all procedures are considered static, as are any variables declared using the keyword global pointer The static. The rest are automatic. To simplify access to static data, some RISC-V register that is reserved to compilers reserve a register x3 for use as the global pointer, or gp. point to the static area. Figure 2.11 summarizes what is preserved across a procedure call. Note that several schemes preserve the stack, guaranteeing that the caller will get the same data back on a load from the stack as it stored onto the stack. The stack above sp is preserved simply by making sure the callee does not write above sp; sp is itself preserved by the callee adding exactly the same amount that was subtracted from it; and the other registers are preserved by saving them on the stack (if they are used) and restoring them from there. FIGURE 2.11 What is and what is not preserved across a procedure call. If the software relies on the global pointer register, discussed in the following subsections, it is also preserved. procedure frame Also Allocating Space for New Data on the Stack called activation record. The segment of the stack The final complexity is that the stack is also used to store variables that are local containing a procedure’s to the procedure but do not fit in registers, such as local arrays or structures. The saved registers and local segment of the stack containing a procedure’s saved registers and local variables is variables. called a procedure frame or activation record. Figure 2.12 shows the state of the stack before, during, and after the procedure call. Some RISC-V compilers use a frame pointer fp, or register x8 to point to the first word of the frame of a procedure. A stack pointer might change during the frame pointer A value procedure, and so references to a local variable in memory might have different denoting the location of offsets depending on where they are in the procedure, making the procedure the saved registers and local variables for a given harder to understand. Alternatively, a frame pointer offers a stable base register procedure. within a procedure for local memory-references. Note that an activation record 2.8 Supporting Procedures in Computer Hardware 111 High address FP FP SP SP FP Saved argument registers (if any) Saved return address Saved saved registers (if any) Local arrays and SP structures (if any) Low address (a) (b) (c) FIGURE 2.12 Illustration of the stack allocation (a) before, (b) during, and (c) after the procedure call. The frame pointer (fp or x8) points to the first word of the frame, often a saved argument register, and the stack pointer (sp) points to the top of the stack. The stack is adjusted to make room for all the saved registers and any memory-resident local variables. Since the stack pointer may change during program execution, it’s easier for programmers to reference variables via the stable frame pointer, although it could be done just with the stack pointer and a little address arithmetic. If there are no local variables on the stack within a procedure, the compiler will save time by not setting and restoring the frame pointer. When a frame pointer is used, it is initialized using the address in sp on a call, and sp is restored using fp. This information is also found in Column 4 of the RISC-V Reference Data Card at the front of this book. appears on the stack whether or not an explicit frame pointer is used. We’ve been avoiding using fp by avoiding changes to sp within a procedure: in our examples, the stack is adjusted only on entry to and exit from the procedure. Allocating Space for New Data on the Heap In addition to automatic variables that are local to procedures, C programmers need space in memory for static variables and for dynamic data structures. Figure 2.13 shows the RISC-V convention for allocation of memory when running the Linux operating system. The stack starts in the high end of the user addresses space (see Chapter 5) and grows down. The first part of the low end of memory is reserved, followed by the home of the RISC-V machine code, traditionally called the text text segment The segment. Above the code is the static data segment, which is the place for constants segment of a UNIX object and other static variables. Although arrays tend to be a fixed length and thus are a file that contains the good match to the static data segment, data structures like linked lists tend to grow machine language code for routines in the source and shrink during their lifetimes. The segment for such data structures is traditionally file. called the heap, and it is placed next in memory. Note that this allocation allows the stack and heap to grow toward each other, thereby allowing the efficient use of memory as the two segments wax and wane. 112 Chapter 2 Instructions: Language of the Computer SP 0000 003f ffff fff0hex Stack Dynamic data Static data 0000 0000 1000 0000hex Text PC 0000 0000 0040 0000hex Reserved 0 FIGURE 2.13 The RISC-V memory allocation for program and data. These addresses are only a software convention, and not part of the RISC-V architecture. The stack pointer is initialized to 0000 003f ffff fff0hex and grows down toward the data segment. At the other end, the program code (“text”) starts at 0000 0000 0040 0000hex. The static data starts immediately after the end of the text segment; in this example, we assume that address is 0000 0000 1000 0000hex. Dynamic data, allocated by malloc in C and by new in Java, is next. It grows up toward the stack in an area called the heap. This information is also found in Column 4 of the RISC-V Reference Data Card at the front of this book. C allocates and frees space on the heap with explicit functions. malloc() allocates space on the heap and returns a pointer to it, and free() releases space on the heap to which the pointer points. C programs control memory allocation, which is the source of many common and difficult bugs. Forgetting to free space leads to a “memory leak,” which ultimately uses up so much memory that the operating system may crash. Freeing space too early leads to “dangling pointers,” which can cause pointers to point to things that the program never intended. Java uses automatic memory allocation and garbage collection primarily to avoid such bugs. Figure 2.14 summarizes the register conventions for the RISC-V assembly language. This convention is another example of making the common case fast: most procedures can be satisfied with up to eight argument registers, twelve saved registers, and seven temporary registers without ever going to memory. Elaboration: What if there are more than eight parameters? The RISC-V convention is to place the extra parameters on the stack just above the frame pointer. The procedure then expects the first eight parameters to be in registers x10 through x17 and the rest in memory, addressable via the frame pointer. As mentioned in the caption of Figure 2.12, the frame pointer is convenient because all references to variables in the stack within a procedure will have the same offset. The frame pointer is not necessary, however. The RISC-V C compiler only uses a frame pointer in procedures that change the stack pointer in the body of the procedure. 2.8 Supporting Procedures in Computer Hardware 113 FIGURE 2.14 RISC-V register conventions. This information is also found in Column 2 of the RISC-V Reference Data Card at the front of this book. Elaboration: Some recursive procedures can be implemented iteratively without using recursion. Iteration can significantly improve performance by removing the overhead associated with recursive procedure calls. For example, consider a procedure used to accumulate a sum: int sum (int n, int acc) { if (n > 0) return sum(n − 1, acc + n); else return acc; } Consider the procedure call sum(3,0). This will result in recursive calls to sum(2,3), sum(1,5), and sum(0,6), and then the result 6 will be returned four times. This recursive call of sum is referred to as a tail call, and this example use of tail recursion can be implemented very efficiently (assume x10 = n, x11 = acc, and the result goes into x12): sum: ble x10, x0, sum_exit // go to sum_exit if n (wow open tab at bar is great) Fourth line of the 2.9 Communicating with People keyboard poem “Hatless Atlas,” 1991 (some Computers were invented to crunch numbers, but as soon as they became give names to ASCII characters: “!” is “wow,” commercially viable they were used to process text. Most computers today offer “(” is open, “|” is bar, 8-bit bytes to represent characters, with the American Standard Code for Information and so on). Interchange (ASCII) being the representation that nearly everyone follows. Figure 2.15 summarizes ASCII. FIGURE 2.15 ASCII representation of characters. Note that upper- and lowercase letters differ by exactly 32; this observation can lead to shortcuts in checking or changing upper- and lowercase. Values not shown include formatting characters. For example, 8 represents a backspace, 9 represents a tab character, and 13 represents a carriage return. Another useful value is 0 for null, the value the programming language C uses to mark the end of a string.