120 Chapter 2 Instructions: Language of the Computer

The Big Picture on page 94 reminds us that instructions are also represented as numbers, so the bit pattern could represent the MIPS machine language instruction

    011000 10011 00001 01010 00000 000000

that corresponds to the assembly language instruction multiply (see Chapter 3):

    mult $t2, $s3, $at

If you accidentally give a word-processing program an image, it will try to interpret it as text and you will see bizarre images on the screen. You will get into similar problems if you give text data to a graphics display program. This unrestricted behavior is why file systems have a naming convention of the suffix giving the type of file (e.g., .jpg, .pdf, or .txt) to enable the program to check for mismatches by file name to reduce the occurrence of such embarrassing scenarios.

2.10 RISC-V Addressing for Wide Immediates and Addresses

Although keeping all RISC-V instructions 32 bits long simplifies the hardware, there are times where it would be convenient to have 32-bit or larger constants or addresses. This section starts with the general solution for large constants, and then shows the optimizations for instruction addresses used in branches.

Wide Immediate Operands

Although constants are frequently short and fit into the 12-bit fields, sometimes they are bigger. The RISC-V instruction set includes the instruction Load upper immediate (lui) to load a 20-bit constant into bits 12 through 31 of a register. The rightmost 12 bits are filled with zeros. This instruction allows, for example, a 32-bit constant to be created with two instructions. lui uses a new instruction format, U-type, as the other formats cannot accommodate such a large constant.

EXAMPLE: Loading a 32-Bit Constant

What is the RISC-V assembly code to load this 32-bit constant into register x19?
    00000000 00111101 00000101 00000000

ANSWER

First, we would load bits 12 through 31 with that bit pattern, which is 976 in decimal, using lui:

    lui x19, 976          // 976 decimal = 0000 0000 0011 1101 0000

The value of register x19 afterward is:

    00000000 00111101 00000000 00000000

The next step is to add in the lowest 12 bits, whose decimal value is 1280:

    addi x19, x19, 1280   // 1280 decimal = 00000101 00000000

The final value in register x19 is the desired value:

    00000000 00111101 00000101 00000000

Elaboration: In the previous example, bit 11 of the constant was 0. If bit 11 had been set, there would have been an additional complication: the 12-bit immediate is sign-extended, so the addend would have been negative. This means that in addition to adding in the rightmost 12 bits of the constant, we would have also subtracted 2^12. To compensate for this error, it suffices to add 1 to the constant loaded with lui, since the lui constant is scaled by 2^12.

Hardware/Software Interface

Either the compiler or the assembler must break large constants into pieces and then reassemble them into a register. As you might expect, the immediate field’s size restriction may be a problem for memory addresses in loads and stores as well as for constants in immediate instructions.

Hence, the symbolic representation of the RISC-V machine language is no longer limited by the hardware, but by whatever the creator of an assembler chooses to include (see Section 2.12). We stick close to the hardware to explain the architecture of the computer, noting when we use the enhanced language of the assembler that is not found in the processor.

Addressing in Branches

The RISC-V branch instructions use a RISC-V instruction format with a 12-bit immediate. This format can represent branch addresses from −4096 to 4094, in multiples of 2. For reasons revealed shortly, it is only possible to branch to even addresses.
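Returning to the wide-immediate example and its Elaboration above: the split of a 32-bit constant into a lui part and an addi part, including the adjustment needed when bit 11 is set, can be sketched in Python. This is a minimal illustration rather than anything taken from an actual assembler; the function names split_wide_immediate and reassemble are hypothetical.

```python
def split_wide_immediate(value):
    """Split a 32-bit constant into (upper20, lower12) for lui + addi.

    lower12 is returned as the signed value that addi would add after
    sign-extending its 12-bit immediate.
    """
    value &= 0xFFFFFFFF
    lower12 = value & 0xFFF
    if lower12 >= 0x800:
        # Bit 11 is set, so addi sign-extends the immediate to a
        # negative number (lower12 - 2^12).
        lower12 -= 0x1000
    # Subtracting a negative lower12 bumps the upper part by 1, which is
    # the compensation described in the Elaboration.
    upper20 = ((value - lower12) >> 12) & 0xFFFFF
    return upper20, lower12


def reassemble(upper20, lower12):
    """Model lui followed by addi on a 32-bit register."""
    return ((upper20 << 12) + lower12) & 0xFFFFFFFF


# The constant from the example: upper part 976, lower part 1280.
print(split_wide_immediate(0x003D0500))
# A constant with bit 11 set: the upper part becomes 977.
print(split_wide_immediate(0x003D0800))
```

In the second call the lower part comes back as −2048, and 977 × 2^12 − 2048 gives back the original constant.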
The SB-type format consists of a 7-bit opcode, a 3-bit function code, two 5-bit register operands (rs1 and rs2), and a 12-bit address immediate. The address uses an unusual encoding, which simplifies datapath design but complicates assembly. The instruction

    bne x10, x11, 2000   // if x10 != x11, go to location 2000 (decimal) = 0111 1101 0000

could be assembled into the S format (it’s actually a bit more complicated, as we will see in Section 4.4):

    imm[12:6]  rs2    rs1    funct3  imm[5:1]  opcode
    0011111    01011  01010  001     01000     1100011

where the opcode for conditional branches is 1100011 (binary) and bne’s funct3 code is 001 (binary).

The unconditional jump-and-link instruction (jal) uses an instruction format with a 20-bit immediate. This instruction consists of a 7-bit opcode, a 5-bit destination register operand (rd), and a 20-bit address immediate. The link address, which is the address of the instruction following the jal, is written to rd.

Like the SB-type format, the UJ-type format’s address operand uses an unusual immediate encoding, and it cannot encode odd addresses. So,

    jal x0, 2000   // go to location 2000 (decimal) = 0111 1101 0000

could be assembled into the U format (Section 4.4 will show the actual format for jal):

    imm[20:1]             rd     opcode
    00000000001111101000  00000  1101111

If addresses of the program had to fit in this 20-bit field, it would mean that no program could be bigger than 2^20, which is far too small to be a realistic option today. An alternative would be to specify a register that would always be added to the branch offset, so that a branch instruction would calculate the following:

    Program counter = Register + Branch offset

This sum allows the program to be as large as 2^32 and still be able to use conditional branches, solving the branch address size problem. Then the question is, which register? The answer comes from seeing how conditional branches are used.
Conditional branches are found in loops and in if statements, so they tend to branch to a nearby instruction. For example, about half of all conditional branches in SPEC benchmarks go to locations less than 16 instructions away. Since the program counter (PC) contains the address of the current instruction, we can branch within ±2^10 words of the current instruction, or jump within ±2^18 words of the current instruction, if we use the PC as the register to be added to the address. Almost all loops and if statements are smaller than 2^10 words, so the PC is the ideal choice. This form of branch addressing is called PC-relative addressing.

PC-relative addressing: An addressing regime in which the address is the sum of the program counter (PC) and a constant in the instruction.

Like most recent computers, RISC-V uses PC-relative addressing for both conditional branches and unconditional jumps, because the destination of these instructions is likely to be close to the branch. On the other hand, procedure calls may require jumping more than 2^18 words away, since there is no guarantee that the callee is close to the caller. Hence, RISC-V allows very long jumps to any 32-bit address with a two-instruction sequence: lui writes bits 12 through 31 of the address to a temporary register, and jalr adds the lower 12 bits of the address to the temporary register and jumps to the sum.

Since RISC-V instructions are 4 bytes long, the RISC-V branch instructions could have been designed to stretch their reach by having the PC-relative address refer to the number of words between the branch and the target instruction, rather than the number of bytes. However, the RISC-V architects wanted to support the possibility of instructions that are only 2 bytes long, so the branch instructions represent the number of halfwords between the branch and the branch target.
Thus, the 20-bit address field in the jal instruction can encode a distance of ±2^19 halfwords, or ±1 MiB from the current PC. Similarly, the 12-bit field in the conditional branch instructions is also a halfword address, meaning that it represents a 13-bit byte address.

EXAMPLE: Showing Branch Offset in Machine Language

The while loop on page 100 was compiled into this RISC-V assembler code:

    Loop: slli x10, x22, 2     // Temp reg x10 = i * 4
          add  x10, x10, x25   // x10 = address of save[i]
          lw   x9, 0(x10)      // Temp reg x9 = save[i]
          bne  x9, x24, Exit   // go to Exit if save[i] != k
          addi x22, x22, 1     // i = i + 1
          beq  x0, x0, Loop    // go to Loop
    Exit:

If we assume we place the loop starting at location 80000 in memory, what is the RISC-V machine code for this loop?

ANSWER

The assembled instructions and their addresses are:

    Address  Instruction
    80000    0000000 00010 10110 001 01010 0010011
    80004    0000000 11001 01010 000 01010 0110011
    80008    0000000 00000 01010 010 01001 0000011
    80012    0000000 11000 01001 001 01100 1100011
    80016    0000000 00001 10110 000 10110 0010011
    80020    1111111 00000 00000 000 01101 1100011

Remember that RISC-V instructions have byte addresses, so addresses of sequential words differ by 4. The bne instruction on the fourth line adds 3 words or 12 bytes to the address of the instruction, specifying the branch destination relative to the branch instruction (12 + 80012) and not using the full destination address (80024). The branch instruction on the last line does a similar calculation for a backwards branch (−20 + 80020), corresponding to the label Loop.

Elaboration: In Chapters 2 and 3, we pretend that branches and jumps use S and U formats for pedagogical reasons. Conditional branches and unconditional jumps use formats, called SB and UJ, that match the lengths and functions of the fields in the S and U types, but the bits are swirled around.
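As a rough illustration of how the bits are swirled around, the Python sketch below packs a byte offset into the conditional branch layout defined by the RISC-V specification: imm[12] and imm[10:5] occupy the top seven bits, while imm[4:1] and imm[11] sit just above the opcode. The function names encode_branch and fields are hypothetical and exist only for this example. Applied to the bne and beq instructions in the loop above, it produces the same bit patterns as the rows at addresses 80012 and 80020.

```python
def encode_branch(offset, rs1, rs2, funct3, opcode=0b1100011):
    """Pack a conditional branch into a 32-bit word (SB-type layout)."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("offset must be even and within -4096..4094")
    imm = offset & 0x1FFF              # 13-bit two's complement offset
    bit12 = (imm >> 12) & 0x1
    bit11 = (imm >> 11) & 0x1
    bits10_5 = (imm >> 5) & 0x3F
    bits4_1 = (imm >> 1) & 0xF         # bit 0 is always 0 and is not stored
    return (bit12 << 31 | bits10_5 << 25 | rs2 << 20 | rs1 << 15
            | funct3 << 12 | bits4_1 << 8 | bit11 << 7 | opcode)


def fields(word):
    """Format a 32-bit word as 7-5-5-3-5-7 bit groups, as in the table."""
    b = format(word, "032b")
    return " ".join([b[0:7], b[7:12], b[12:17], b[17:20], b[20:25], b[25:32]])


# bne x9, x24, Exit: the target is 12 bytes ahead of the branch.
print(fields(encode_branch(12, rs1=9, rs2=24, funct3=0b001)))
# beq x0, x0, Loop: the target is 20 bytes behind the branch.
print(fields(encode_branch(-20, rs1=0, rs2=0, funct3=0b000)))
```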
The rationale for SB and UJ makes more sense once you understand hardware, as Chapter 4 explains. SB and UJ simplify the hardware but give the assembler (and your author) a little more to do. Figures 4.17 and 4.18 show the hardware savings.

Hardware/Software Interface

Most conditional branches are to a nearby location, but occasionally they branch far away, farther than can be represented in the 12-bit address in the conditional branch instruction. The assembler comes to the rescue just as it did with large addresses or constants: it inserts an unconditional branch to the branch target, and inverts the condition so that the conditional branch decides whether to skip the unconditional branch.

EXAMPLE: Branching Far Away

Given a branch on register x10 being equal to zero,

    beq x10, x0, L1

replace it by a pair of instructions that offers a much greater branching distance.

ANSWER

These instructions replace the short-address conditional branch:

        bne x10, x0, L2
        jal x0, L1
    L2:

[Figure 2.17: diagram of the instruction fields used by each addressing mode. 1. Immediate addressing (immediate, rs1, funct3, rd, op). 2. Register addressing (funct7, rs2, rs1, funct3, rd, op), with the operand in a register. 3. Base addressing (immediate, rs1, funct3, rd, op), where a register plus the immediate selects a byte, halfword, or word in memory. 4. PC-relative addressing (imm, rs2, rs1, funct3, imm, op), where the PC plus the immediate selects a word in memory.]

FIGURE 2.17 Illustration of four RISC-V addressing modes. The operands are shaded in color. The operand of mode 3 is in memory, whereas the operand for mode 2 is a register. Note that versions of load and store access bytes, halfwords, and words. For mode 1, the operand is part of the instruction itself. Mode 4 addresses instructions in memory, with mode 4 adding a long address to the PC. Note that a single operation can use more than one addressing mode. Add, for example, uses both immediate (addi) and register (add) addressing.

RISC-V Addressing Mode Summary

Multiple forms of addressing are generically called addressing modes. Figure 2.17 shows how operands are identified for each addressing mode. The addressing modes of the RISC-V instructions are the following:

1. Immediate addressing, where the operand is a constant within the instruction itself.
2. Register addressing, where the operand is a register.
3. Base or displacement addressing, where the operand is at the memory location whose address is the sum of a register and a constant in the instruction.
4. PC-relative addressing, where the branch address is the sum of the PC and a constant in the instruction.

addressing mode: One of several addressing regimes delimited by their varied use of operands and/or addresses.

FIGURE 2.18 RISC-V instruction encoding. All instructions have an opcode field, and all formats except U-type use the funct3 field. R-type instructions use the funct7 field, and immediate shifts (slli, srli, srai) use the funct6 field.

Decoding Machine Language

Sometimes you are forced to reverse-engineer machine language to create the original assembly language. One example is when looking at a “core dump.” Figure 2.18 shows the RISC-V encoding of the opcodes for the RISC-V machine language. This figure helps when translating by hand between assembly language and machine language.

EXAMPLE: Decoding Machine Code

What is the assembly language statement corresponding to this machine instruction?

    00578833 (hexadecimal)

ANSWER

The first step is converting hexadecimal to binary:

    0000 0000 0101 0111 1000 1000 0011 0011

To know how to interpret the bits, we need to determine the instruction format, and to do that we first need to determine the opcode. The opcode is the rightmost 7 bits, or 0110011. Searching Figure 2.18 for this value, we see that the opcode corresponds to the R-type arithmetic instructions.
Thus, we can parse the binary format into fields listed in Figure 2.19:

    funct7   rs2    rs1    funct3  rd     opcode
    0000000  00101  01111  000     10000  0110011

We decode the rest of the instruction by looking at the field values. The funct7 and funct3 fields are both zero, indicating the instruction is add. The decimal values for the register operands are 5 for the rs2 field, 15 for rs1, and 16 for rd. These numbers represent registers x5, x15, and x16. Now we can reveal the assembly instruction:

    add x16, x15, x5

Figure 2.19 shows all the RISC-V instruction formats. Figure 2.1 on pages 70–71 shows the RISC-V assembly language revealed in this chapter. The next chapter covers RISC-V instructions for multiply, divide, and arithmetic for real numbers.

FIGURE 2.19 Four RISC-V instruction formats. Figure 4.14.6 reveals the missing RISC-V formats for conditional branch (SB) and unconditional jumps (UJ), whose formats match the lengths of the fields in the S and U types, but the bits are swirled around. The rationale for SB and UJ makes more sense once you have an understanding of hardware given in Chapter 4, as SB and UJ simplify the hardware but give the assembler a little more to do.

Check Yourself

I. What is the range of byte addresses for conditional branches in RISC-V (K = 1024)?

1. Addresses between 0 and 4K − 1
2. Addresses between 0 and 8K − 1
3. Addresses up to about 2K before the branch to about 2K after
4. Addresses up to about 4K before the branch to about 4K after

II. What is the range of byte addresses for the jump-and-link instruction in RISC-V (M = 1024K)?

1. Addresses between 0 and 512K − 1
2. Addresses between 0 and 1M − 1
3. Addresses up to about 512K before the branch to about 512K after
4. Addresses up to about 1M before the branch to about 1M after

2.11 Parallelism and Instructions: Synchronization

Parallel execution is easier when tasks are independent, but often they need to cooperate.
Cooperation usually means some tasks are writing new values that others must read. To know when a task is finished writing so that it is safe for another to read, the tasks need to synchronize. If they don’t synchronize, there is a danger of a data race, where the results of the program can change depending on how events happen to occur.

data race: Two memory accesses form a data race if they are from different threads to the same location, at least one is a write, and they occur one after another.

For example, recall the analogy of the eight reporters writing a story on pages 44–45 of Chapter 1. Suppose one reporter needs to read all the prior sections before writing a conclusion. Hence, he or she must know when the other reporters have finished their sections, so that there is no danger of sections being changed afterwards. That is, they had better synchronize the writing and reading of each section so that the conclusion will be consistent with what is printed in the prior sections.

In computing, synchronization mechanisms are typically built with user-level software routines that rely on hardware-supplied synchronization instructions. In this section, we focus on the implementation of lock and unlock synchronization operations. Lock and unlock can be used straightforwardly to create regions where only a single processor can operate, called a mutual exclusion, as well as to implement more complex synchronization mechanisms.

The critical ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location. That is, nothing else can interpose itself between the read and the write of the memory location. Without such a capability, the cost of building basic synchronization primitives will be high and will increase unreasonably as the processor count increases.
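To make the idea of a mutual exclusion region concrete, here is a small Python sketch in which several threads increment a shared counter. It is only an analogy at the software level: Python's threading.Lock stands in for the lock and unlock operations, and the function name count_with_lock and its parameters are hypothetical. Each increment is a read followed by a write, so the lock ensures that no other thread can interpose itself between the two.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Acquire (lock) before the read-modify-write and release
            # (unlock) afterwards; only one thread is inside at a time.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


# With the lock held around every update, no increment is lost, so the
# result equals num_threads * increments.
print(count_with_lock())
```

Without the lock, two threads could read the same old value and each write back that value plus one, losing an update; that is the kind of outcome the data race definition above warns about.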
