3.3 Multiplication PDF

3.3 Multiplication 193 Some programming languages allow two’s complement integer arithmetic on Check variables declared byte and half, whereas RISC-V only has integer arithmetic Yourself operations on full words. As we recall from Chapter 2, RISC-V does have data transfer operations for bytes and halfwords. What RISC-V instructions should be generated for byte and halfword arithmetic operations? 1. Load with lb, lh; arithmetic with add, sub, mul, div, using and to mask result to 8 or 16 bits after each operation; then store using sb, sh. 2. Load with lb, lh; arithmetic with add, sub, mul, div; then store using sb, sh. Elaboration: One feature not generally found in general-purpose microprocessors is saturating operations. Saturation means that when a calculation overflows, the result is set to the largest positive number or the most negative number, rather than a modulo calculation as in two’s complement arithmetic. Saturation is likely what you want for media operations. For example, the volume knob on a radio set would be frustrating if, as you turned it, the volume would get continuously louder for a while and then immediately very soft. A knob with saturation would stop at the highest volume no matter how far you turned it. Multimedia extensions to standard instruction sets often offer saturating arithmetic. Elaboration: The speed of addition depends on how quickly the carry into the high- order bits is computed. There are a variety of schemes to anticipate the carry so that the worst-case scenario is a function of the log2 of the number of bits in the adder. These anticipatory signals are faster because they go through fewer gates in sequence, but it takes many more gates to anticipate the proper carry. The most popular is carry lookahead, which Section A.6 in Appendix A describes. 3.3 Multiplication Now that we have completed the explanation of addition and subtraction, we are ready to build the more vexing operation of multiplication. Multiplication is First, let’s review the multiplication of decimal numbers in longhand to remind vexation, Division is ourselves of the steps of multiplication and the names of the operands. For reasons as bad; The rule of that will become clear shortly, we limit this decimal example to using only the three doth puzzle me, digits 0 and 1. Multiplying 1000ten by 1001ten: And practice drives me mad. Anonymous, Elizabethan manuscript, 1570 194 Chapter 3 Arithmetic for Computers The first operand is called the multiplicand and the second the multiplier. The final result is called the product. As you may recall, the algorithm learned in grammar school is to take the digits of the multiplier one at a time from right to left, multiplying the multiplicand by the single digit of the multiplier, and shifting the intermediate product one digit to the left of the earlier intermediate products. The first observation is that the number of digits in the product is considerably larger than the number in either the multiplicand or the multiplier. In fact, if we ignore the sign bits, the length of the multiplication of an n-bit multiplicand and an m-bit multiplier is a product that is n + m bits long. That is, n + m bits are required to represent all possible products. Hence, like add, multiply must cope with overflow because we frequently want a 32-bit product as the result of multiplying two 32-bit numbers. In this example, we restricted the decimal digits to 0 and 1. With only two choices, each step of the multiplication is simple: 1. Just place a copy of the multiplicand (1 × multiplicand) in the proper place if the multiplier digit is a 1, or 2. Place 0 (0 × multiplicand) in the proper place if the digit is 0. Although the decimal example above happens to use only 0 and 1, multiplication of binary numbers must always use 0 and 1, and thus always offers only these two choices. Now that we have reviewed the basics of multiplication, the traditional next step is to provide the highly optimized multiply hardware. We break with tradition in the belief that you will gain a better understanding by seeing the evolution of the multiply hardware and algorithm through multiple generations. For now, let’s assume that we are multiplying only positive numbers. Sequential Version of the Multiplication Algorithm and Hardware This design mimics the algorithm we learned in grammar school; Figure 3.3 shows the hardware. We have drawn the hardware so that data flow from top to bottom to resemble more closely the paper-and-pencil method. Let’s assume that the multiplier is in the 32-bit multiplier register and that the 64-bit product register is initialized to 0. From the paper-and-pencil example above, it’s clear that we will need to move the multiplicand left one digit each step, as it may be added to the intermediate products. Over 32 steps, a 32-bit multiplicand would move 32 bits to the left. Hence, we need a 64-bit multiplicand register, initialized with the 32-bit multiplicand in the right half and zero in the left half. This register is then shifted left 1 bit each step to align the multiplicand with the sum being accumulated in the 64-bit product register. 3.3 Multiplication 195 Multiplicand Shift left 64 bits Multiplier 64-bit ALU Shift right 32 bits Product Control test Write 64 bits FIGURE 3.3 First version of the multiplication hardware. The Multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. (Appendix A describes ALUs.) The 32-bit multiplicand starts in the right half of the Multiplicand register and is shifted left 1 bit on each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts with the product initialized to 0. Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register. Figure 3.4 shows the three basic steps needed for each bit. The least significant bit of the multiplier (Multiplier0) determines whether the multiplicand is added to the Product register. The left shift in step 2 has the effect of moving the intermediate operands to the left, just as when multiplying with paper and pencil. The shift right in step 3 gives us the next bit of the multiplier to examine in the following iteration. These three steps are repeated 32 times to obtain the product. If each step took one clock cycle, this algorithm would require almost 200 clock cycles to multiply two 32-bit numbers. The relative importance of arithmetic operations like multiply varies with the program, but addition and subtraction may be anywhere from 5 to 100 times more popular than multiply. Accordingly, in many applications, multiply can take several clock cycles without significantly affecting performance. However, Amdahl’s Law (see Section 1.10) reminds us that even a moderate frequency for a slow operation can limit performance. This algorithm and hardware are easily refined to take one clock cycle per step. The speed up comes from performing the operations in parallel: the multiplier and multiplicand are shifted while the multiplicand is added to the product if the multiplier bit is a 1. The hardware only has to ensure that it tests the right bit of the multiplier and gets the preshifted version of the multiplicand. The hardware is usually further optimized to halve the width of the adder and registers by noticing where there are unused portions of registers and adders. Figure 3.5 shows the revised hardware. 196 Chapter 3 Arithmetic for Computers Start Multiplier0 = 1 1. Test Multiplier0 = 0 Multiplier0 1a. Add multiplicand to product and place the result in Product register 2. Shift the Multiplicand register left 1 bit 3. Shift the Multiplier register right 1 bit No:

3.3 Multiplication PDF

Document Details

Tags

Related

Summary

Full Transcript

Upgrade to continue