Lecture 18 Fixed-Point Arithmetic

Document Details

Uploaded by CleanestAntigorite3616

University of Tripoli

David Money Harris and Sarah L. Harris

Tags

floating point arithmetic, digital design, computer architecture, number systems

Summary

This lecture document discusses floating-point arithmetic, including fixed-point numbers, the IEEE 754 standard, and floating-point operations. It provides examples to illustrate the concepts of normalization and various steps in floating-point addition and multiplication. The examples use binary representations and scientific notation.

Full Transcript

Floating Point Arithmetic
Chapter 5: Digital Design and Computer Architecture, 2nd Edition
David Money Harris and Sarah L. Harris

Outline
1- Floating Point Numbers
2- IEEE Standard for Floating-Point Arithmetic (IEEE 754)
3- Floating Point Addition
4- Floating Point Multiplication

Number Systems
Binary representations:
- Positive numbers: unsigned binary
- Negative numbers: sign magnitude, two’s complement
What about fractions? How do computers calculate fractions?

Numbers with Fractions
Two common notations:
- Fixed-point: the binary point is fixed. Some of the bits represent the integer part, and the rest represent the fraction.
- Floating-point: the binary point floats to the right of the most significant 1. This is scientific notation, with a mantissa and an exponent.
The advantage of floating-point representation over fixed-point and integer representation is that it supports a much wider range of values.

Fixed-Point Numbers
6.75 using 4 integer bits and 4 fraction bits: 01101100, read as 0110.1100₂ = 2² + 2¹ + 2⁻¹ + 2⁻² = 6.75
- The binary point is implied.
- The number of integer and fraction bits must be agreed upon beforehand.

Floating-Point Numbers
Floating-point numbers allow the representation of very large and very small numbers. In general, a number is written in scientific notation as a sign, mantissa (M), base (B), and exponent (E): ± M × Bᴱ.
For example, write 273₁₀ in scientific notation:
273 = 2.73 × 10²
- In the example, M = 2.73, B = 10, and E = 2.

IEEE Standard 754 floating point
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point computation, established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations of the time, which made them difficult to use reliably and reduced their portability.
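The fixed-point example from the slides (6.75 with 4 integer bits and 4 fraction bits) can be checked with a short Python sketch. The helper names `to_fixed` and `from_fixed` are hypothetical and not part of the lecture. The sketch assumes unsigned values only and rounds to the nearest representable step.

```python
def to_fixed(value, int_bits=4, frac_bits=4):
    """Encode a non-negative number as an unsigned fixed-point bit string."""
    # Multiplying by 2**frac_bits moves the binary point to the right end.
    scaled = round(value * (1 << frac_bits))
    if not 0 <= scaled < (1 << (int_bits + frac_bits)):
        raise ValueError("value does not fit in the chosen format")
    return format(scaled, f"0{int_bits + frac_bits}b")


def from_fixed(bits, frac_bits=4):
    """Decode an unsigned fixed-point bit string back to a Python float."""
    # Dividing by 2**frac_bits puts the implied binary point back.
    return int(bits, 2) / (1 << frac_bits)


print(to_fixed(6.75))          # 01101100, read as 0110.1100
print(from_fixed("01101100"))  # 6.75
```

Both functions must be called with the same `frac_bits`, which mirrors the point that the number of integer and fraction bits has to be agreed upon beforehand.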
IEEE Standard 754 floating point is the most common representation today for real numbers on computers, including Intel-based PCs, Macs, and most Unix platforms.

IEEE Standard 754 floating point
1. The sign of the mantissa (S): 0 represents a positive number, while 1 represents a negative number.
2. The biased exponent: the exponent field needs to represent both positive and negative exponents. A bias is added to the actual exponent in order to get the stored exponent.
3. The normalised mantissa (M): the mantissa is the part of a number in scientific notation, or of a floating-point number, that consists of its significant digits. Because a binary digit can be only 0 or 1, a normalised mantissa is one with exactly one 1 to the left of the binary point.

Floating-Point Precision
Single-Precision:
- 32-bit
- 1 sign bit, 8 exponent bits, 23 fraction bits
- bias = 127
Double-Precision:
- 64-bit
- 1 sign bit, 11 exponent bits, 52 fraction bits
- bias = 1023

IEEE 754 Floating-Point Number
[Figure: IEEE 754 floating-point number format]

Single Precision Example
Example: represent the value 228₁₀ using a 32-bit floating point representation.

Floating-Point Representation 1
1. Convert decimal to binary: 228₁₀ = 11100100₂
2. Write the number in “binary scientific notation”. The binary point floats to the right of the most significant 1: 11100100₂ = 1.11001₂ × 2⁷
3. Fill in each field of the 32-bit floating point number:
- The sign bit is positive (0)
- The 8 exponent bits represent the value 7
- The remaining 23 bits are the mantissa

Floating-Point Representation 2
The first bit of the mantissa is always 1:
- 228₁₀ = 11100100₂ = 1.11001₂ × 2⁷
So there is no need to store it: it is an implicit leading 1. Store just the fraction bits in the 23-bit field.

Floating-Point Representation 3
Biased exponent: bias = 127 (01111111₂)
- Biased exponent = bias + exponent
- Exponent of 7 is stored as: 127 + 7 = 134 = 10000110₂
The fields are: sign 0, exponent 10000110, fraction 110 0100 0000 0000 0000 0000
The IEEE 754 32-bit floating-point representation of 228₁₀ in hexadecimal: 0x43640000

Floating-Point Example
Write -58.25₁₀ in floating point (IEEE 754).
1. Convert decimal to binary: 58.25₁₀ = 111010.01₂
2. Write in binary scientific notation: 1.1101001₂ × 2⁵
3. Fill in fields:
- Sign bit: 1 (negative)
- 8 exponent bits: (127 + 5) = 132 = 10000100₂
- 23 fraction bits: 110 1001 0000 0000 0000 0000
In hexadecimal: 0xC2690000

Floating-Point: Special Cases
Number   Sign   Exponent   Fraction
0        X      00000000   00000000000000000000000
∞        0      11111111   00000000000000000000000
-∞       1      11111111   00000000000000000000000
NaN      X      11111111   non-zero
(X = don’t care: the bit may be 0 or 1)

Floating-Point: Rounding
Arithmetic results that fall outside of the available precision must round to a neighboring number. The rounding modes are: round down, round up, round toward zero, and round to nearest. The default rounding mode is round to nearest.
Recall that a number overflows when its magnitude is too large to be represented. Likewise, a number underflows when it is too tiny to be represented. In round to nearest mode, overflows are rounded up to ±∞, and underflows are rounded down to 0.
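The two encoding examples above (228 and -58.25) can be reproduced with a minimal Python sketch that follows the same steps: normalise, add the bias, and drop the implicit leading 1. The function `encode_single` is a hypothetical helper written for illustration. It assumes a finite, non-zero, normal input, and it truncates rather than rounds any fraction bits beyond the 23 available.

```python
import math
import struct


def encode_single(value):
    """Encode a finite, non-zero, normal number as a 32-bit IEEE 754 pattern."""
    if value == 0 or not math.isfinite(value):
        raise ValueError("this sketch handles only finite, non-zero values")
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    exponent = 0
    # Normalise so that 1 <= magnitude < 2 (one 1 left of the binary point).
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1
    biased = exponent + 127  # single-precision bias
    if not 1 <= biased <= 254:
        raise ValueError("exponent outside the normal single-precision range")
    # Drop the implicit leading 1 and keep 23 fraction bits (truncating).
    fraction = int((magnitude - 1) * (1 << 23))
    return (sign << 31) | (biased << 23) | fraction


print(f"{encode_single(228):#010x}")     # 0x43640000
print(f"{encode_single(-58.25):#010x}")  # 0xc2690000

# Cross-check against the platform's own single-precision encoding.
reference = struct.unpack(">I", struct.pack(">f", 228.0))[0]
print(reference == encode_single(228))   # True
```

The cross-check uses the standard `struct` module, which packs a Python float into IEEE 754 single precision, so the hand-built fields can be compared with a known-good encoding.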
Floating-Point Addition
Floating-point addition is more involved than addition with two’s complement numbers. The steps for adding floating-point numbers with the same sign are as follows:
1) Extract exponent and fraction bits
2) Prepend leading 1 to form mantissa
3) Compare exponents
4) Shift smaller mantissa if necessary
5) Add mantissas
6) Normalize mantissa and adjust exponent if necessary
7) Round result
8) Assemble exponent and fraction back into floating-point format

Floating-Point Addition/Subtraction
[Figure: Floating-Point Addition/Subtraction]

Floating-Point Addition Example
Add the following floating point numbers: 7.875 + 0.1875 = 8.0625
0.1875 = 0.0011₂ → 1.1₂ × 2⁻³
7.875 = 111.111₂ → 1.11111₂ × 2²

Floating-Point Multiplication
Multiply A = 2.5 and B = 1.5 using single-precision IEEE 754 representation.
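The eight addition steps listed above can be traced on the 7.875 + 0.1875 example with a short Python sketch that works directly on the 32-bit patterns. The names `float_to_bits`, `bits_to_float`, and `add_same_sign` are hypothetical. The sketch assumes two normal numbers with the same sign, and it rounds by truncation (toward zero) rather than the default round to nearest. It does not handle zero, infinity, NaN, or exponent overflow.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def add_same_sign(a_bits, b_bits):
    """Add two normal single-precision numbers that share the same sign."""
    sign = a_bits >> 31
    # 1) Extract exponent and fraction bits.
    exp_a, frac_a = (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    exp_b, frac_b = (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF
    # 2) Prepend the leading 1 to form 24-bit mantissas.
    man_a, man_b = frac_a | (1 << 23), frac_b | (1 << 23)
    # 3) Compare exponents; keep the larger one in exp_a.
    if exp_a < exp_b:
        exp_a, exp_b, man_a, man_b = exp_b, exp_a, man_b, man_a
    # 4) Shift the smaller mantissa right to align the binary points.
    man_b >>= exp_a - exp_b
    # 5) Add mantissas.
    total = man_a + man_b
    exp = exp_a
    # 6) Normalize: a carry into bit 24 means the sum is 2 or more.
    if total >> 24:
        total >>= 1
        exp += 1
    # 7) Round: bits shifted out above are simply dropped (truncation).
    # 8) Assemble sign, exponent, and fraction (leading 1 removed).
    return (sign << 31) | (exp << 23) | (total & 0x7FFFFF)


result = add_same_sign(float_to_bits(7.875), float_to_bits(0.1875))
print(bits_to_float(result))  # 8.0625
```

In this example the exponents differ by 5, so the mantissa of 0.1875 is shifted right by five places before the add. The sum then carries out of the mantissa, which is why step 6 shifts it right once and raises the exponent from 2 to 3.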
