Introduction to Numerical Analysis PDF

Document Details

InvaluableSimile6630

Uploaded by InvaluableSimile6630

CVM-USM

drlmpaleta

Tags

numerical analysis algorithms computer arithmetic mathematics

Summary

This document provides an introduction to numerical analysis, emphasizing the theory, applications, and implementation of numerical methods. It covers mathematical software libraries and scientific computing environments, and touches on important topics like algorithms and floating-point arithmetic.

Full Transcript

Introduction to Numerical Analysis Prepared by drlmpaleta Contents 1 Introduction 2 1.1 Learning objectives...............................

Introduction to Numerical Analysis Prepared by drlmpaleta Contents 1 Introduction 2 1.1 Learning objectives............................ 2 1.2 Algorithms................................ 3 1.3 Good algorithms............................. 5 2 Mathematical Software 5 2.1 Mathematical Software Libraries.................... 6 2.2 Scientific Computing Environments................... 6 2.3 Computer Arithmetic.......................... 8 2.4 Floating-point arithmetic and round-off error............. 8 2.5 Truncation and rounding........................ 9 2.6 Simplified model for computer arithmetic............... 11 2.7 An algorithm for floating-point addition................ 11 2.8 Exercises................................. 13 3 Measuring the error 14 3.1 Absolute error.............................. 14 3.2 Relative error............................... 14 3.3 Course themes.............................. 16 3.4 Desirable properties of numerical methods............... 16 3.5 Algorithms and pseudocode....................... 16 3.6 Absolute/relative error.......................... 16 4 Floating point arithmetic 16 4.1 Floating point number systems..................... 16 4.2 Implementation on a computer..................... 16 4.3 Rounding errors.............................. 16 4.4 Cancellation................................ 16 1 1 Introduction 1.1 Learning objectives At the end of this unit, students will be able to: 1. define numerical analysis 2. give some examples and applications of numerical analysis The field of numerical analysis, broadly speaking, is concerned with using algorithms to solve problems of continuous mathematics. Because obtaining exact solutions is infeasible or impossible, we must obtain approximations. The primary goal is to derive methods for ‘numerically’ solving problems - using a computer to do the calculations - and to understand the relevant properties. The field is situated somewhere between pure analysis and computer science, and draws from both. Some examples in the umbrella of numerical analysis: 1. Theory (mostly math) Convergence (limits of sequences that approach the true solution) Finite-dimensional spaces for approximation Discrete analogues of continuous processes 2. Applied (somewhere in between) Derivation of (practical) numerical methods Intuition for interpreting results, measuring error Adapting/generalizing methods to get desired properties 3. Implementation (mostly computer science) Translating methods to actual code Efficient implementation Packaging algorithms for general use The applied side of numerical methods is often called scientific computing to distinguish it from the theory, but really all aspects are intertwined. In this course, we focus more on the first two aspects and address the last one in less depth. Hopefully, you will be convinced by the end that an understanding of the underlying mathematics is valuable, even when one is concerned with practical results. 2 1.2 Algorithms A numerical algorithm is a sequence of steps that takes an input and returns some output that (approximately) solves some mathematical problem. Such algorithms can be considered at three levels: M athematical description =⇒ Algorithm =⇒ Implementation(code) On one end, the code has complete computational detail (variables, memory, control structure etc.), and on the other end, the method is described in abstract terms - as a mathematical process. We will often work with this‘high-level’ description of the algorithm and leave out computational details. In between the purely mathematical and the code is an algorithm in pseudocode. ( At this level, the computational aspects are determined and laid out, so that the pseudocode could be more or less directly translated to code. At its most precise, pseudocode is specific enough that any two users who implement it should get code that does the same thing. Example 1.1 (Algorithms and Pseudocode): The Fibonacci numbers are defined by F0 = F1 = 1, FJ = Fj−1 + Fj−2 , j ≥ 2. This is the ’mathematical description’ - it tells us exactly how to generate the numbers. But it does not specify how it should be done, and there are decisions to be made! Algorithm 1 takes an integer N > 0 and generates all the Fibonacci numbers up to FN (typeset using the algorithmcx package). As long as it is readable and precise, the notation in pseudocode is up to you (there are a variety of styles). Because the problem is trivial, the algorithm is just (1) written as a for loop. Figure 1: Fibonacci numbers Algorithm 1 3 Now suppose we only need to generate the nth number, and not the whole sequence. An efficient algorithm to do so should minimize waste (unneeded storage or computations). Algorithm 2uses only 3 variables instead of a whole array by re-using them. For the code, there are only a few things to worry about - for instance, in Algorithm 1, the length N + 1 array should be initialized at the start so it does not have to grow in the for loop. In both, we may wish to throw an error if N is so large that the result would overflow, a practical concern that does not have much to do with the algorithm. Figure 2: Fibonacci numbers Algorithm 2 4 1.3 Good algorithms Some desirable properties for numerical methods are... 1. Accuracy (Is the output close to the exact value?) How should we measure ‘error’ and ‘accuracy’ for a given problem? Given a tolerance ϵ, can the algorithm find a solution to within ϵ? Can the algorithm tell us the error (how do we know the solution is close)? 2. Efficiency Time efficiency: Is the algorithm fast? Space efficiency: how much memory is needed? How do the above scale with the size of the problem? 3. Stability/Reliability Do small changes in inputs lead to small changes in the solution? How c an we control errors that propagate through the algorithm? How much human attention is required? (Ideally, one should be able to feed it inputs, get an output and not have to worry about what happens inside.) 4. Other concerns Can the method handle a wide range of inputs? (How general is it?) Is it simple or convenient to implement? An algorithm that drastically fails at any of the three is probably useless. An ideal algorithm is one that is accurate, efficient, and reliable. Such algorithms are rare - most of the time, there are trade-offs involved. For most problems, there are a variety of methods with different properties - and numerical analysis provides us with the intuition to choose which is the right one to use. 2 Mathematical Software To be able to solve interesting computational problems, we will often rely on mathematical softwares written by professionals. Our primary goal is to become intelligent users,, rather than creators, of mathematical software. Befoe citing some specific sources of good mathematical software, let us summarize the desirable characteristics that such software should posses, in no particular order of importance: 5 Reliability: always works correctly for easy problems. Robustness: usually works for hard problems, but fails gracefully and informatively when does fail. Software robustness refers to the ability of a software system to continue functioning correctly and reliably even in the face of unexpected or abnormal inputs or situations. These might include things like incorrect user input, network failures, hardware malfunctions, or even deliberate attempts to disrupt the system Accuracy: produces results as accurate as warranted by the problem and input data, preferably with an estimate of the accuracy achieved Efficiency: requires execution time and storage that are close to the minimum possible for problem being solved Portability: adapts with little or no change to new computing environments. Software portability is the possibility to use the same software in different environments. It applies to the software that is available for two or more different platforms or can be recompiled for them Maintainability: is easy to understand and modify Usability: has a convenient and well-document user interface Applicability: solves a broad range of problems These properties often conflict, and it is rare indeed that satisfies all of them. Nevertheless, this list gives mathematical software users some idea what qualifies to look for and developers some worthy goals to strive for. 2.1 Mathematical Software Libraries Several widely available sources of general-purpose mathematical software are listed here. The software listed is written in Fortran unless otherwise noted. 2.2 Scientific Computing Environments The software libraries just listed contain subroutines that are meant to be called by user-written programs, usually in a conventional programming language such as Fortranor C. An increasingly popular alternative for scientific computing is interactive environments that provide powerful, conveniently accessible, built-in mathematical capabilities, often combined with sophisticated graphics and a very high-level programming language designed for rapid prototyping of new algorithms. One of the most widely used such computing environments is MATLAB, which is a propriety commercial product of The MathWorks, Inc. (www.mathworks.com) MATLAB, which stands for MATrix LABoratory, is an interactive system that integrates extensive mathematical capabilities, with powerful scientific visualization. 6 Similar pacakages are freely available via the world-wide web, include octave (www.che.wisc.edu/octave), RLAB (rlab.sourceforge.net), and Scilab (www-rocq.inria.fr/scilab). Other similar commercial products include GAUSS, HiQ, IDL, Mathcad, and PV-WAVE. During the SEAMS School on Computing, we use FreeFEM++. Another family of interactive computing environments is based primarily on symbolic rather than numeric computation, often called computer algebra. These packages, which include Axiom, Derive, Macsyma, Maple, Mathematica, MuPAD and Reduce, provide many of the same mathematical and graphical capabilities, and in addition provide symbolic differentiation, integration, equation solving, polynomial manipulation as well as arbitrary precision arithmetic. MATLAB also has symbolic computation capabilities via a ”Symbolic Math” toolbox based on Maple. 7 2.3 Computer Arithmetic Computers allocate a fixed amount of storage to every number they use. Each number is stores as a fixed string of digits. In practice, so-called floating point numbers are used. A computer using four digit decimal arithmetic with floating point numbers would store The number (p, q) is called a floating point number p is called the MANTISSA (or REAL PART) and q is the CHARACTERISTIC or INDEX or EXPONENT. The mantissa is always a fixed number of digits and the index must lie in some range. Typically, −256 < IN DEX < 256. If the INDEX goes outside that range then we get underflow (less than -256) or overflow (greater than 256). Some computers/systems automatically replace underflow by the special number 0 (zero). Overflow always gives some sort of error. We note that the mantissa is always of the form 0... and the digit after the decimal point is always non-zero. This, the third example above 0.03 is stored as (0.3000, −1) and not as (0.0300, 0). We also note that there is no representation of zero. A computer normally has some special representation for this number. We further note, as in the fourth example, that the representation may not be exact. Finally, it should be remembered that in practice computers do not use decimal numbers. They actually use binary numbers. There is often some error in converting decimal numbers to or from binary numbers. 2.4 Floating-point arithmetic and round-off error Errors in floating-point arithmetic are more subtle than errors in integer arithmetic since, in contrast to integers, floating-point numbers can be just a little bit wrong. A result that appears to be reasonable may therefore contain errors, and it may be difficult to judge how large the error is. A simple example will illustrate. √ Example 2.1 On a typical calculator we compute x = 2, then y = x2 , and √ finally z = y − 2, i.e., the result should be z = ( 2)2 − 2 = 0, which of course is 0. The result reported by the calculator is 8 z = −1.38032020120975 × 10−16. This is a simple example of round-off error. The aim of this section is to explain why computations like the one in example 2.1 give this obviously wrong result. To do this we will give a very highlevel introduction to computer arithmetic and discuss the potential for errors in the four elementary arithmetic operations addition, subtraction, multiplication, and division with floating-point numbers. Perhaps a bit surprisingly, the conclusion will be that the most critical operation is addition/subtraction, which in certain situations may lead to dramatic errors. A word of warning is necessary before we continue. In this chapter, and in particular in this section, where we focus on the shortcomings of computer arithmetic, some may conclude that round-off errors are so dramatic that we had better not use a computer for serious calculations. This is a misunderstanding. The computer is an extremely powerful tool for numerical calculations that you should use whenever you think it may help you, and most of the time it will give you answers that are much more accurate than you need. However, you should be alert and aware of the fact that in certain cases errors may cause considerable problems. 2.5 Truncation and rounding Floating point numbers on most computers use binary representation, and we know that in the binary numeral system all real numbers that are fractions on the form ab , with a an integer and b = 2k for some positive integer k can be represented exactly 1 1 (provided a and b are not too large). This means that numbers like 2, 4 and 5 16 are represented exactly. On the other hand, it is easy to forget that numbers like 0.1 and 3.43 are not represented exactly. And of course all numbers that cannot be represented exactly in the decimal system cannot be represented exactly 1 5 in the binary system either. These numbers include fractions like 3 and 12 as well as all irrational numbers. Even before we start doing arithmetic we therefore have the challenge of finding good approximations to these numbers that cannot be represented exactly within the floating-point model being used. We will distinguish between two different ways to do this, truncation and rounding. Definition 2.2 (Truncation). A number is said to be truncated to m digits when each digit except the m leftmost ones is replaced by 0. Example 2.3 (Examples of truncation). The number 0.33333333 truncated to 4 digits is 0.3333, while 128.4 truncated to 2 digits is 120, and 0.67899 truncated to 4 digits is 0.6789. Note that truncating a positive number a to an integer is equivalent to applying the floor function to a, i.e., if the result is b then 9 b = ⌊a⌋. Truncation is a very simple way to convert any real number to a floating point number : We just start computing the digits of the number, and stop as soon as we have all the required digits. However, the error will often be much larger than necessary, as in the last example above. Rounding is an alternative to truncation that in general will make the error smaller, but is more complicated. Definition 2.4 (Rounding). A number is said to be rounded to m digits when it is replaced by the nearest number with the property that all digits beyond position m is 0. Example 2.5 (Examples of rounding). The number 0.33333333 rounded to 4 digits is 0.3333. The result of rounding 128.4 to 2 digits is 130, while 0.67899 rounded to 4 digits is 0.6790. Rounding is something most of us do regularly when we go shopping, as well as in many other contexts. However, there is one slight ambiguity in the definition of rounding: What to do when the given number is halfway between two m digit numbers. The standard rule taught in school is to round up in such situations. For example, we would usually round 1.15 to 2 digits as 1.2, but 1.1 would give the same error. For our purposes this is ok, but from a statistical point of view it is biased since we always round up when in doubt. Example 2.6 Using the 3 digit floating point arithmetic find the answer of the calculation a+b∗c M= b+c where a = 11.13, b = 1.247and c = −0.145. Identify the rounding error at each stage of the calculation and the total effect of rounding error. Solution: a := (+0.111, +2) rounding error = −0.0003 b := (+0.125, +1) rounding error = +0.003 c =: (−0.145, +0) rounding error = 0 Each calculation is performed as a series of single operations each with their own rounding error. Thus, we compute: 10 X := b ∗ c Y := a + X Z := b + c M := Y /Z and we obtain X := (−0.181, 0) rounding error=+0.00025 Y := a + X rounding error=-0.019 Z := b + c rounding error=+0.005 M := Y /Z rounding error=+0.00018 Thus the computed answer is 9.82. The exact answer is 9.8812. Hence, the total effect of rounding error, i.e. the computed value minus the exact value is -0.06. 2.6 Simplified model for computer arithmetic The details of computer arithmetic are technical, but luckily the essential features of both the arithmetic and the round-off errors are present if we use the same model of floating-point numbers. Recall that any real number a may be written as a normalised decimal number a × 10n , where the number a is in the range 0.1 ≤ a < 1 and is called the significand, while the integer n is called the exponent. Remark 2.7 (Simplified model of floating-point numbers). Most of the shortcomings of floating-point arithmetic become visible in a computer model that uses 4 digits for the significand, 1 digit for the exponent plus an optional sign for both the significand and the exponent. Two typical normalised (decimal) numbers in this model are 0.4521 × 101 and −0.910−5. The examples in this section will use this model, but the results can be generalised to a floating-point model in base β, 2.7 An algorithm for floating-point addition In order to understand how round-off errors occur in addition and subtraction, we need to understand the basic algorithm that is used. Since a − b = a + (−b), it is sufficient to consider addition of numbers. The basic procedure for adding floating-point numbers is simple (in reality it is more involved than what is stated here). 11 Algorithm: To add two floating-point numbers a and b on a computer, the following steps are performed: 1. The number with largest absolute value, say a, is written in normalised form a = α × 10n , and the other number b is written as b = β × 10n with the sam exponent as a and the same number of digits for the significand β. 2. The significands α and β are added, γ =α+β 3. The results c = γ × 10n is converted to normalized form. This apparently simple algorithm contains a serious pitfall which is most easily seen from some examples. Let us first consider a situation where everything works out nicely Example 2.8 (Standard case). Suppose that a = 5.645 and b = 7.821. We convert the numbers to normal form and obtain a = 0.5645 × 101 , b = 0.7821 × 101. We add the two significands 0.5645 + 0.7821 = 1.3466 so the correct answer is 1.3466 × 101. The last step is to convert this to normal form. In exact arithmetic this would yield the result 0.13466 × 102. However, this is not in normal form since the significand has five digits. We therefore perform rounding, 0.13466 ≈ 0.1347, and get the final result 0.1347 × 102. Previous example shows that we easily get an error when normalised numbers are added and the result converted to normalised form with a fixed number of digits for the significand. In this first example all the digits of the result are correct, so the error is far from serious. Example 2.9 (One large and one small number). If a = 42.34 and b = 0.0033 we convert the largest number to normal form 42.34 = 0.4234 × 102. 12 The smaller number b is then written in the same form (same exponent) 0.0033 = 0.000033 × 102. The significand in this second number must be rounded to four digits, and the result of this is 0.0000. The addition therefore becomes 0.4234 × 102 + 0.0000 × 102 = 0.4234 × 102. The error in the previous example may seem serious, but once again the result is correct to four decimal digits, which is the best we can hope for when we only have this number of digits available. Example 2.10 (Subtraction of two similar numbers II). Suppose that a = 10/7 and b = −1.42. Conversion to normal form yields 10 ≈ a = 0.1429 × 101 , b = −0.142 × 101. 7 Adding the significands yield 0.1429 − 0.142 = 0.0009. When this is converted to normal form, the result is 0.9000 × 10−3 while the true result rounded to four correct digits is 0.8571 × 10−3. 2.8 Exercises 1. a. Round 1.2386 to 1 digit b. Round 85.001 to 1 digit. c. Round 100 to 1 digit. d. Round 10.473 to 3 digits. e. Truncate 10.473 to 3 digits. f. Round 4525 to 3 digits. 2. The earliest programming languages would provide only the method of truncation for rounding non-integer numbers. This can lead sometimes lead to large errors as 2.000 − 0.001 = 1.999 would be rounded of to 1 if truncated to an integer. Express rounding a number to the nearest integer in terms of truncation. 3. Use the floating-point model defined in this chapter with 4 digits for the significand and 1 digit for the exponent, and use algorithm XXX to do the calculations below in this model. 13 a. 12.24 + 4.23. b. 9.834 + 2.45. c. 2.314 − 2.273. d. 23.45 − 23.39. e. 1 + x − ex for x = 0.1. 3 Measuring the error In the previous section we saw that errors usually occur during floating point computations, and we therefore need a way to measure this error. Suppose we have a number a and an approximation a. We are going to consider two ways to measure the error in such an approximation, the absolute error and the relative error. 3.1 Absolute error The first error measure is the most obvious, it is essentially the difference between a and a. Definition 3.1 (Absolute error). Suppose a is an approximation to the number a. The absolute error in this approximation is given by |a − a|. If a = 100 and a = 100.1 the absolute error is 0.1, whereas if a = 1 and a = 1.1 the absolute error is still 0.1. Note that if a is an approximation to a, then a is an equally good approximation to a with respect to the absolute error. 3.2 Relative error For some purposes the absolute error may be what we want, but in many cases it is reasonable to say that the error is smaller in the first example above than in the second, since the numbers involved are bigger. The relative error takes this into account. Definition 3.2 (Relative error). Suppose that a is an approximation to the nonzero number a. The relative error in this approximation is given by |a − a|. |a| We note that the relative error is obtained by scaling the absolute error with the size of the number that is approximated. If we compute the relative errors in 14 the approximations above we find that it is 0.001 when a = 100 and a = 100.1, while when a = 1 and a = 1.1 the relative error is 0.1. In contrast to the absolute error, we see that the relative error tells us that the approximation in the first case is much better than in the latter case. In fact we will see in a moment that the relative error gives a good measure of the number of digits that a and ˜a have in common. We use concepts which are closely related to absolute and relative errors in many everyday situations. One example is profit from investments, like bank accounts. Suppose you have invested some money and after one year, your profit is 100 (in your local currency). If your initial investment was 100, this is a very good profit in one year, but if your initial investment was 10 000 it is not so impressive. If we let a denote the initial investment and ˜a the investment after one year, we see that, apart from the sign, the profit of 100 corresponds to the ’absolute error’ in the two cases. On the other hand, the relative error corresponds to profit measured in %, again apart from the sign. This says much more about how good the investment is since the profit is compared to the initial investment. In the two cases here, we find that the profit in % is 1 = 100% in the first case and 0.01 = 1% in the second. Another situation where we use relative measures is when we give the concentration of an ingredient within a substance. If for example the fat content in milk is 3.5% we know that a grams of milk will contain 0.035a grams of fat. In this case a denotes how many grams that are not fat. Then the difference a − a is the amount of fat and the equation a − a = 0.035a can be written a − −a = 0.035 a which shows the similarity with relative error. 15 3.3 Course themes 3.4 Desirable properties of numerical methods 3.5 Algorithms and pseudocode 3.6 Absolute/relative error 4 Floating point arithmetic 4.1 Floating point number systems 4.2 Implementation on a computer 4.3 Rounding errors 4.4 Cancellation 16

Use Quizgecko on...
Browser
Browser