Floating Point Arithmetic - Unit 1 PDF

Uploaded by PropitiousVariable7089

Marwadi University

Summary

This document covers the topic of floating point arithmetic, including errors, truncation, and normalized forms. It features solved examples of addition, subtraction, multiplication and division, along with multiple-choice questions. This resource appears to be from Marwadi University.

Full Transcript

Faculty of Computer Applications
Bachelors of Computer Applications
BCA - Sem 2
Mathematics - 2
05BC3201
Unit 1 Floating Point Arithmetic
NAAC A+ Accredited University

UNIT 1: FLOATING POINT ARITHMETIC

Course Content:
1) Errors: Addition Operation, Subtraction Operation, Division Operation, Multiplication Operation
2) Types of Errors: Data Errors, Truncation Errors, Round off Errors, Computational Errors
3) Measure of Accuracy: Absolute Error, Relative Error

❖ Error:
An error is defined as the difference between the actual value and the approximate value obtained from experimental observation or from numerical computation. Let x denote the actual value of a quantity and xa denote its approximate value. Then the error is defined as

Error = Actual Value – Approximate Value
∴ E = x – xa

Example: Find the error when the actual value is 0.987642 and the approximate value is 0.987630.
Here x = 0.987642 and xa = 0.987630
∴ E = x – xa = 0.987642 – 0.987630 = 0.000012

Truncation Error
Truncation errors occur when some digits of a number are discarded. They mainly occur in two situations:
(i) When a number is represented in normalized floating-point form, because only a few digits fit in the mantissa (for example, only four digits in our hypothetical computer).
(ii) When a number is converted from one number system to another.

Example: Truncate the last 4 digits of the following numbers.
(i) 0.90012346   Truncated value: 0.9001
(ii) 0.8912390821   Truncated value: 0.891239
(iii) 0.98567342   Truncated value: 0.9856

Round off Error
Errors that occur due to rounding off digits are called round off errors.
- Rounding off a number is similar to truncation, but the last retained digit may be adjusted depending on the first discarded digit.
- Suppose a number is required to be rounded off to the nth decimal place.
Then 1 is added to the nth decimal digit if the (n + 1)th digit is from 5 to 9, and the nth digit is kept unchanged if the (n + 1)th digit is from 0 to 4.

Example: Round off the last 4 digits of the following numbers.
(i) 0.90012346   Rounded off value: 0.9001
(ii) 0.8912390821   Rounded off value: 0.891239
(iii) 0.98567342   Rounded off value: 0.9857

Example: Find the truncated and rounded off values of the following.
(i) 0.789021567 (last 3 digits)
Truncated value = 0.789021
Rounded off value = 0.789022
(ii) 0.0098790123 (last 4 digits)
x = 0.98790123 × 10^-2
Truncated value = 0.9879 × 10^-2
Rounded off value = 0.9879 × 10^-2
(iii) 0.0034563216789 (last 4 digits)
x = 0.34563216789 × 10^-2
Truncated value = 0.3456321 × 10^-2
Rounded off value = 0.3456322 × 10^-2
(iv) 1.5678912 (last 3 digits)
x = 0.15678912 × 10^1
Truncated value = 0.15678 × 10^1
Rounded off value = 0.15679 × 10^1

❖ Normalized Floating Point Form:
When a decimal number is written in normalized (standard) floating-point form, the digit to the left of the decimal point must be zero and the first digit to the right of the decimal point must be non-zero. The notation En denotes multiplication by 10 to the power n; for example, abc E5 = abc × 10^5. In abc E5, the part 'abc' is called the mantissa and 5 is called the exponent.

Examples: Convert the following values to normalized floating-point form.
(1) 0.0312 E3   Answer: 0.312 E2
(2) 0.009723 E9   Answer: 0.9723 E7
(3) 1.2375 E5   Answer: 0.12375 E6
(4) 0.145744 E18   Answer: 0.145744 E18
(5) 10.371 E7   Answer: 0.10371 E9
(6) 4.4440 E2   Answer: 0.4444 E3
(7) 9.9008 E104   Answer: 0.99008 E105
(8) 0.00294 E(-17)   Answer: 0.294 E(-19)
(9) 0.45703 E(-21)   Answer: 0.45703 E(-21) (already normalized; 0.4570 E(-21) if the mantissa is limited to four digits)
(10) 12.314 E(-24)   Answer: 0.12314 E(-22) (0.1231 E(-22) if the mantissa is limited to four digits)

Addition of Floating Point numbers:
Condition: The exponents must be equal. Use the larger of the two exponents: shift the mantissa of the number with the smaller exponent to the right by the difference of the exponents, keeping only four decimal places.
Note: Express the sum in standard (normalized) floating-point form.
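To check the worked examples mechanically, here is a minimal Python sketch of the four-digit decimal model used in this unit. It is an illustration under stated assumptions, not part of the course material: a number is stored as a (mantissa, exponent) pair of a standard-library `Decimal` and an `int`, the mantissa length is assumed to be four digits (`DIGITS`), and the helper names `fit`, `normalize`, `add`, `subtract` and `fmt` are hypothetical. Chopping uses `ROUND_DOWN` (truncation) and rounding off uses `ROUND_HALF_UP`; before adding, the operand with the smaller exponent is shifted right and cut to four decimal places, following the addition rule above.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

DIGITS = 4  # assumed mantissa length of the hypothetical four-digit computer


def fit(m, digits=DIGITS, mode=ROUND_DOWN):
    """Keep `digits` places after the decimal point.

    ROUND_DOWN chops the extra digits (truncation); ROUND_HALF_UP adds 1 to the
    last kept digit when the first dropped digit is 5 to 9 (rounding off).
    """
    return m.quantize(Decimal(1).scaleb(-digits), rounding=mode)


def normalize(m, e, digits=DIGITS, mode=ROUND_DOWN):
    """Rewrite m E e so that 0.1 <= |m| < 1, then fit the mantissa."""
    if m == 0:
        return Decimal(0), 0
    while abs(m) >= 1:              # e.g. 1.2721 E2 -> 0.12721 E3
        m, e = m.scaleb(-1), e + 1
    while abs(m) < Decimal("0.1"):  # e.g. 0.0766 E(-1) -> 0.766 E(-2)
        m, e = m.scaleb(1), e - 1
    m = fit(m, digits, mode)
    if abs(m) >= 1:                 # rounding up may carry, e.g. 0.99996 -> 1.0000
        m, e = m.scaleb(-1), e + 1
    return m, e


def add(a, b, digits=DIGITS):
    """Add two (mantissa, exponent) pairs using the larger exponent."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                     # make `a` the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb = fit(mb.scaleb(eb - ea), digits)  # shift right; extra digits are chopped
    return normalize(ma + mb, ea, digits)


def subtract(a, b, digits=DIGITS):
    """Compute a - b as a + (-b)."""
    mb, eb = b
    return add(a, (-mb, eb), digits)


def fmt(pair):
    """Format a (mantissa, exponent) pair in the E notation of this unit."""
    m, e = pair
    return f"{m} E{e}" if e >= 0 else f"{m} E({e})"


D = Decimal
print(fit(D("0.98567342")), fit(D("0.98567342"), mode=ROUND_HALF_UP))  # 0.9856 0.9857
print(fmt(normalize(D("10.371"), 7, digits=5)))                        # 0.10371 E9
print(fmt(add((D("0.3254"), 2), (D("0.5462"), 5))))                    # 0.5465 E5
print(fmt(add((D("0.07"), -1), (D("0.66"), -3))))                      # 0.7660 E(-2)
print(fmt(subtract((D("0.7288"), 5), (D("0.7254"), 5))))               # 0.3400 E3
```

Chopping the shifted operand before adding mirrors the hand method in these notes. Hardware binary floating point (IEEE 754) instead rounds the exact result of each operation, so it can differ from this model in the last digit.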
Examples: Add the following floating point numbers.
(1) 0.3254 E5 and 0.5464 E5
0.3254 E5 + 0.5464 E5 = 0.8718 E5
(2) 0.3254 E2 and 0.5462 E5
0.3254 E2 = 0.0003 E5 (shifting from E2 to E5)
0.3254 E2 + 0.5462 E5 = 0.0003 E5 + 0.5462 E5 = 0.5465 E5
(3) 0.5467 E5 and 0.7253 E3
0.7253 E3 = 0.0072 E5 (shifting from E3 to E5)
0.5467 E5 + 0.7253 E3 = 0.5467 E5 + 0.0072 E5 = 0.5539 E5
(4) 0.7254 E2 and 0.5467 E2
0.7254 E2 + 0.5467 E2 = 1.2721 E2 = 0.12721 E3 ≈ 0.1272 E3
(5) 0.07 E(-1) and 0.66 E(-3)
0.66 E(-3) = 0.0066 E(-1)
0.0700 E(-1) + 0.66 E(-3) = 0.0700 E(-1) + 0.0066 E(-1) = 0.0766 E(-1) = 0.7660 E(-2)

Subtraction of Floating Point numbers:
Condition: The exponents must be equal. Use the larger of the two exponents, shifting the mantissa of the number with the smaller exponent as in addition.
Note: Express the difference in standard (normalized) floating-point form.

Examples: Subtract the following floating point numbers.
(1) Subtract 0.7254 E5 from 0.7288 E5
0.7288 E5 – 0.7254 E5 = 0.0034 E5 = 0.3400 E3
(2) Subtract 0.7253 E2 from 0.5467 E5
Now 0.7253 E2 = 0.0007 E5
∴ 0.5467 E5 – 0.7253 E2 = 0.5467 E5 – 0.0007 E5 = 0.5460 E5
(3) Subtract 0.7254 E(-99) from 0.7288 E(-99)
0.7288 E(-99) – 0.7254 E(-99) = 0.0034 E(-99) = 0.3400 E(-101)
(4) Subtract 0.5423 E(-1) from 0.6298 E2
Now 0.5423 E(-1) = 0.0005 E2
∴ 0.6298 E2 – 0.5423 E(-1) = 0.6298 E2 – 0.0005 E2 = 0.6293 E2
(5) Subtract 0.2834 E(-99) from 0.5492 E(-97)
Now 0.2834 E(-99) = 0.0028 E(-97)
∴ 0.5492 E(-97) – 0.2834 E(-99) = 0.5492 E(-97) – 0.0028 E(-97) = 0.5464 E(-97)

Multiplication of Floating Point numbers:
Note: The mantissas are multiplied and the exponents are added. The product is expressed in standard floating-point form.

Examples: Multiply the following floating point numbers.
(1) 0.6543 E5 and 0.2253 E3
0.6543 E5 × 0.2253 E3 = (0.6543 × 0.2253) E(5 + 3) = 0.14741379 E8 ≈ 0.1474 E8
(2) 0.1234 E5 by 0.1111 E13
0.1234 E5 × 0.1111 E13 = (0.1234 × 0.1111) E(5 + 13) = 0.01370974 E18 = 0.1370974 E17 ≈ 0.1370 E17
(3) 0.1234 E(-75) by 0.1111 E37
0.1234 E(-75) × 0.1111 E37 = (0.1234 × 0.1111) E(-75 + 37) = 0.01370974 E(-38) = 0.1370974 E(-39) ≈ 0.1370 E(-39)
(4) 0.1235 E20 by 0.1298 E(-11)
0.1235 E20 × 0.1298 E(-11) = (0.1235 × 0.1298) E(20 – 11) = 0.01603030 E9 = 0.1603030 E8 ≈ 0.1603 E8
Note: Normalize the full product before cutting the mantissa to four digits. Cutting first (0.0160 E9 = 0.1600 E8) would lose the last significant digit.

Division of Floating Point Numbers:
Note: The mantissa of the first number is divided by the mantissa of the second, and the exponent of the second number is subtracted from the exponent of the first. The quotient is written in standard form.

Examples: Divide the following floating point numbers.
(1) Divide 0.8888 E5 by 0.2000 E3
0.8888 E5 ÷ 0.2000 E3 = (0.8888 ÷ 0.2000) E(5 – 3) = 4.444 E2 = 0.4444 E3
(2) Divide 0.9998 E5 by 0.1000 E(-99)
0.9998 E5 ÷ 0.1000 E(-99) = (0.9998 ÷ 0.1000) E(5 – (-99)) = 9.998 E104 = 0.9998 E105

Absolute Error: (EA)
Absolute error is defined as the absolute value of the difference between the actual value and the approximate value of the observation.
- If x is the actual (true) value and xa is the approximate value, then the absolute error is calculated by
Absolute Error = | Actual Value – Approximate Value |
∴ EA = | x – xa |

Example: Let x = 0.00458529. Find the absolute error if x is truncated to 3 decimal digits.
Solution: Given x = 0.00458529
∴ x = 0.458529 × 10^-2
∴ xa = 0.458 × 10^-2 (after truncating the mantissa to 3 decimal places)
∴ Absolute Error EA = | x – xa |
= | 0.458529 × 10^-2 – 0.458 × 10^-2 |
= | (0.458529 – 0.458) × 10^-2 |
= 0.000529 × 10^-2
= 0.529 × 10^(-2 + (-3))
= 0.529 × 10^-5

Example: Let x = 0.00458529. Find the absolute error if x is rounded off to three decimal digits.
Solution: Given x = 0.00458529
∴ x = 0.458529 × 10^-2
∴ xa = 0.459 × 10^-2 (after rounding off the mantissa to three decimal places)
∴ Absolute Error EA = | x – xa |
= | 0.458529 × 10^-2 – 0.459 × 10^-2 |
= | -0.000471 × 10^-2 |
= 0.000471 × 10^-2
= 0.471 × 10^-5

Relative Error: (ER)
Relative error is the ratio of the error to the actual value of a variable. If x is the actual value and xa is the approximate value, then the relative error is given by
ER = (x – xa) / x

Example: Let x = 0.00599821. Find the relative error if x is truncated to 3 decimal digits.
Solution: Here x = 0.00599821
∴ x = 0.599821 × 10^-2
∴ xa = 0.599 × 10^-2
Now, Relative Error ER = (x – xa) / x
= (0.599821 × 10^-2 – 0.599 × 10^-2) / (0.599821 × 10^-2)
= (0.000821 × 10^-2) / (0.599821 × 10^-2)
= (0.821 × 10^-5) / (0.599821 × 10^-2)
≈ 1.36874 × 10^(-5 – (-2))
= 1.36874 × 10^-3
= 0.136874 × 10^(-3 + 1)
= 0.136874 × 10^-2

Example: Let x = 0.00597621. Find the relative error if x is rounded off to three decimal digits.
Solution: x = 0.597621 × 10^-2
xa = 0.598 × 10^-2
ER = (x – xa) / x
= (0.597621 × 10^-2 – 0.598 × 10^-2) / (0.597621 × 10^-2)
= (-0.000379 × 10^-2) / (0.597621 × 10^-2)
≈ -0.634181 × 10^-3

M.C.Q.'s:
1) Which of the following is the correct normalized floating point form of 0.009723 E9?
a) 0.9723 E7
b) 0.9723 E8
c) 0.9723 E11
d) None of these
2) Which of the following is the correct normalized floating point form of 1.2375 E5?
a) 0.1238 E6
b) 0.1237 E6
c) 0.12375 E4
d) None of these
3) Which of the following is the correct form of 0.12347891 after truncating the last 4 digits?
a) 0.1235
b) 0.7891
c) 0.1291
d) 0.1234
4) 0.1234 E75 × 0.1111 E37 = ______
a) 0.1370 E75
b) 0.1370 E111
c) 0.1370 E34
d) None of these
5) Which of the following is the correct formula for relative error?
a) Er = (x – xa) / x
b) Er = | x – xa |
c) Er = (x + xa) / x
d) Er = (x · xa) / x
6) Which of the following is the correct formula for absolute error?
a) Ea = x – xa
b) Ea = | x + xa |
c) Ea = | x – xa |
d) Ea = x / xa
7) 0.5467 E2 + 0.7254 E2 = _____
a) 0.1272 E3
b) 0.1272 E2
c) 0.1272 E4
d) 0.1272 E1
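The multiplication, division and error formulas of this unit can be checked with a short Python sketch. It is an illustration under stated assumptions, not the course's own method: each number is a (mantissa, exponent) pair built on the standard `decimal` module, the mantissa is assumed to hold four digits (`DIGITS`), and the helpers `normalize`, `multiply`, `divide`, `approximate`, `absolute_error`, `relative_error` and `fmt` are hypothetical names. The full product or quotient is normalized first and only then cut to four digits, so a leading zero in the mantissa product does not cost a significant digit (for example, 0.1235 E20 × 0.1298 E(-11) gives 0.1603 E8). `approximate` chops or rounds the normalized mantissa of x to reproduce xa.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

DIGITS = 4  # assumed mantissa length, matching the four-digit examples


def normalize(m, e, digits=DIGITS, mode=ROUND_DOWN):
    """Shift m so that 0.1 <= |m| < 1 (adjusting e), then keep `digits` digits.

    mode=ROUND_DOWN chops (truncation); mode=ROUND_HALF_UP rounds off.
    """
    if m == 0:
        return Decimal(0), 0
    while abs(m) >= 1:
        m, e = m.scaleb(-1), e + 1
    while abs(m) < Decimal("0.1"):
        m, e = m.scaleb(1), e - 1
    m = m.quantize(Decimal(1).scaleb(-digits), rounding=mode)
    if abs(m) >= 1:  # rounding can carry into a new digit, e.g. 0.99996 -> 1.0000
        m, e = m.scaleb(-1), e + 1
    return m, e


def multiply(a, b, digits=DIGITS):
    """Multiply the mantissas and add the exponents, then normalize."""
    (ma, ea), (mb, eb) = a, b
    return normalize(ma * mb, ea + eb, digits)


def divide(a, b, digits=DIGITS):
    """Divide the mantissas and subtract the exponents (first minus second)."""
    (ma, ea), (mb, eb) = a, b
    return normalize(ma / mb, ea - eb, digits)


def approximate(x, digits, mode=ROUND_DOWN):
    """Value of x after its normalized mantissa is cut to `digits` digits."""
    m, e = normalize(x, 0, digits, mode)
    return m.scaleb(e)


def absolute_error(x, xa):
    """EA = |x - xa|"""
    return abs(x - xa)


def relative_error(x, xa):
    """ER = (x - xa) / x, keeping the sign as defined in this unit."""
    return (x - xa) / x


def fmt(pair):
    """Format a (mantissa, exponent) pair in the E notation of this unit."""
    m, e = pair
    return f"{m} E{e}" if e >= 0 else f"{m} E({e})"


D = Decimal
print(fmt(multiply((D("0.1235"), 20), (D("0.1298"), -11))))  # 0.1603 E8
print(fmt(divide((D("0.9998"), 5), (D("0.1000"), -99))))     # 0.9998 E105

x = D("0.00458529")
ea_chop = absolute_error(x, approximate(x, 3))
ea_round = absolute_error(x, approximate(x, 3, ROUND_HALF_UP))
print(fmt(normalize(ea_chop, 0, 3)), fmt(normalize(ea_round, 0, 3)))  # 0.529 E(-5) 0.471 E(-5)

x = D("0.00599821")
er = relative_error(x, approximate(x, 3))
print(fmt(normalize(er, 0, 6, ROUND_HALF_UP)))  # 0.136874 E(-2)
```

Because `relative_error` keeps the sign of x – xa, it returns a negative value whenever rounding pushes xa above x, as happens for x = 0.00597621.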