Numerical Analysis Guide PDF
Document Details
Tags
Summary
This guide provides an introduction to numerical analysis, covering topics such as algebraic computation, computer programming errors, non-linear functions and their approximation, finite difference methods, numerical systems to linear systems, and mathematical modeling of first-order differential equations and difference equations. The material is geared towards students in engineering, science, mathematics, and computer science.
Full Transcript
TABLE OF CONTENT Page no CHAPTER 1 Algebraic Computation. 1-9 Computer programming and errors. 10-16 CHAPTER 2...
TABLE OF CONTENT Page no CHAPTER 1 Algebraic Computation. 1-9 Computer programming and errors. 10-16 CHAPTER 2 NON-LINEAR FUNCTIONS APPROXIMATION TO FUNCTIONS Non-linear algebraic and transcendental equations 17-20 ROOTS OF NON-LINEAR FUNCTIONS Bisection method. 21-23 Newton Raphson method and Secant method. 24-28 False position method. 29-32 Simple iteration method. 33-34 CHAPTER 3 FINITE DIFFERENCE METHODS FINITE DIFFERENCE # 1 35-40 FINITE DIFFERENCE # 2 FORWARD,BACKWARD , CENTRAL DIFFERENCE NOTATIONS The Shift operator 𝑬𝑬. 40-40 The Forward difference operator ∆. 41-41 The Backward difference operator 𝛁𝛁. 41-42 The Central difference operator 𝜹𝜹. 42-43 Differences displays. 43-46 Lagrange Interpolation formula. 46-48 CHAPTER 4 1|Page NUMERICAL SYSTEM TO LINEAR SYSTEM LU decomposition method. 49-54 Gauss-Seidel method. 55-57 Curve fitting. 58-66 CHAPTER 5 MATHEMATICAL MODELLING First –Order Differential Equations and Applications. 67-76 Difference Equations and Applications. 76-87 REFERENCES 88 2|Page SECTION A CHAPTER 1 INTRODUCTION Numerical Analysis is an introductory text for students of engineering, science, mathematics, and computer science. Its goals are straightforward: to describe algorithms for solving science and engineering problems, and to discuss the mathematical underpinnings of the algorithms. Numerical Analysis is so rich with ideas that there is a clear danger of presenting it as a bag of neat, but unrelated, tricks. For a deep understanding, it is essential for readers to learn more than how to code Newton’s method, bisection method and Finite difference method. They must absorb the big ideas, the ones that permeate numerical analysis and unify its competing concerns. 1.1 ALGEBRAIC COMPUTATION 1.1.1 COMPUTER ARITHMETIC This is a scientific area that refers to the study and development of algorithms and software for manipulating mathematical expressions and other mathematical objects. The Fundamentals of Algorithms The study of algorithms has an ancient pedigree. The study of computers as we know them may date back a hundred years or so. The study of algorithms dates back a millennia or more. In fact the word algorithm is derived from the name of the great Islamic Mathematician, Astronomer, Geographer and all-round polymath, Muhammad ibn Musa al-Khwarizmi, who was a member of Dar Al-Hikmah (the House of Knowledge) in Baghdad in the 800s. Algorithms are just precise ways of achieving some task. Follow the algorithm and job is done. It may be a computer following the algorithm but it doesn't have to be. For most of history it has been people doing so. Al-Khwarizmi was interested in algorithms for solving algebraic equations and on calculation using our "modern" Hindu-Arabic positional number system which he introduced to the western world. Back in the 9th century Islam, having ways to do things such as calculating shares in inheritance was an important requirement of the Qur'an. It was vital to be sure you had a way of calculating such things that was guaranteed to get the right answer. That is what the study of algorithms is about, though algorithms can be devised to do much more than simple algebra. Every computer gadget you ever used is following algorithms to do whatever it does Computer Scientists both invent algorithms and study their properties. Algorithms have been devised to beat humans at games, fly planes, recognize faces, process DNA, send money around the world, crack codes, navigate you home, control your washing machine, detect your movements, write down the words you speak, paint works of art, write jokes, control nuclear power plants... You name it. Any individual program, in fact, will involve a whole range of algorithms some simple, some complex. Software is a program and other operating information used by a computer. This relates to the use of machines, such as computers to manipulate mathematical equations. 3|Page How computers communicates Most of computers use ASCII code to represent text {words,sentences and paragraphs} which makes it possible for the transfer of data from one computer to the other. ASCII is an acronym for American Standard Code for Information Interchange {This is a character set not a language}. This represents English characters as numbers, with each letter being assigned a number from 0 to 127. For example ASCII code for upper case M is 77. Computers can only understand numbers, so an ASCII code is the numerical representation of a character such as 'a' or '@' or an action of some sort. ASCII was developed a long time ago and now the non-printing characters are rarely used for their original purpose. Below is the ASCII character table and this includes descriptions of the first 32 non-printing characters. ASCII was actually designed for use with teletypes and so the descriptions are somewhat obscure. If someone says they want your CV however in ASCII format, all this means is they want 'plain' text with no formatting such as tabs, bold or underscoring - the raw format that any computer can understand. This is usually so they can easily import the file into their own applications without issues. Notepad.exe creates ASCII text, or in MS Word you can save a file as 'text only' Importance of ASCII Code This helps in the process of allowing symbols represented on the keyboard to be printed on the screen. All letters, digits, punctuation symbols and many more other things are given codes. As an example ASCII code for “a” is 97 and for “A” is 65. If you open a notepad you can demonstrate this , first make sure that Num-Lock is on. Hold down the alt key and keeping it held down type 65 on the numeric keyboard { the one on the right of your keyboard}. Let go of the alt key. An “A” should appear on the notepad. Try this for another numbers between 0 and 255 you will be able to get characters and symbols that are not on your keyboard. The software in your computer is able to recognise what each key is defined to do. If you change your software in your computer you can have your key board giving different characters than the ones on your keyboard. ASCII has limited set of characters and cannot support the Chinese and Japanese languages as they have thousands of different characters. To use Chinese and Japanese characters you need character set called Unicode which is on modern computers. Other code is EBCDIC which is an acronym for Extended Binary Coded Decimal Interchange Code. Which does not store characters in time and this can create problems alphabetising “words”. This EBCDIC code is an IBM designed code for representing characters as numbers. IBM is an acronym for International Business Machines, which is a leading U.S computer manufacturer. The disadvantage of ASCII is that it is biased to English language, so other countries cannot write programs in ASCII. 4|Page Storage of characters When you write 123 using text editor, the file does not store 123, instead it stores the ASCII code for the character “1”; ”2”; ”3”which is 31, 32, 33 in hexadecimal or 0011 0001, 0010 0010, 0011 0011 in binary. ASCII TABLE Decimal # Binary # Octal # Hexadecimal # Character Base 10 Base 2 Base 8 Base 16 0 0000 0 0 NUL 1 0001 1 1 SOH 2 0010 2 2 STX 3 0011 3 3 ETX 4 0100 4 4 EOT 5 0101 5 5 ENQ 6 0110 6 6 ACK 7 0111 7 7 BEL 8 1000 10 8 BS 9 1001 11 9 TAB 10 1010 12 A LF 11 1011 13 B VT 12 1100 14 C FF 13 1101 15 D CR 14 1110 16 E SO 15 1111 17 F SI The above table is a block of 4 bits called nibble {this is half a byte} and can hold a maximum number of 1111=15 in decimal. The following table is a block of 8 bits called a byte and can hold a maximum number of 11111111=255 in decimal. 5|Page Extended ASCII Codes Most computers manipulate data in 8 bit-bytes for each character. From 0 to 127, characters used to be stored as 7 bits. Bit is a single numeric value either “1” or “0” and a byte is sequence of bits, usually 8bits=1 byte {e.g. 11000000 = 1byte}. Ever since Extended ASCII is introduced which is 128 extra characters from 128 to 255 characters each character is stored as 8 bits. The first 32 characters are control codes which Microsoft word does not display on screen and are non-printable. 6|Page An acronym for BITS is Binary Intelligent Transfer Service. The only way to understand BITS is to compare them to something you know, which is digit: A digit is a single place that can hold numerical values between 0 and 9. Computers happen to use base 2 number system also known as binary system, reason being that it is easy to implement them with the current electronic technology and are relatively cheap as compared to base 10. Digits are normally combined to together to create large numbers. For example 5,357 has four digits. It is understood that 7 is filling the 1st (first) place or unit place, 5 the 10th (tenths) place, 3 the 100s (hundreds) place and 6 the 1000s (thousands) place. If you want to be explicit you could express this number as: (5×1000) +(3×100) + (5×10) +(7×1) =5,357 Another way to express this would be to use powers of 10 as: (5× 103 ) +(3× 102 ) + (5× 101 ) +(7× 100 ) =5,357 The above equation is in polynomial form of degree 3 {raised by base 10}, where “5”, “3”, “5”, and “7” are the leading coefficient and “10” the base of a polynomial. BITS have only two possible values: 0 and 1, therefore a binary number is composed of only 0s and 1s like 1011. How do you figure out what the value of the binary number is? We do it similar way as we did for 5,357 but this time we use base 2 instead of base 10. Now we have: (1011)2 = 1× 23 + 0× 22 + 1× 21 + 1× 20 = 11 Why Computers Use Binary Binary numbers – seen as strings of 0's and 1's – are often associated with computers. But why is this? Why can't computers just use base 10 instead of converting to and from binary? Isn't it more efficient to use a higher base, since binary (base 2) representation uses up more "spaces"? I was recently asked this question by someone who knows a good deal about computers. But this question is also often asked by people who aren't so tech-savvy. Either way, the answer is quite simple. WHAT IS "DIGITAL"? A modern-day "digital" computer, as opposed to an older "analog" computer, operates on the principle of two possible states of something – "on" and "off". This directly corresponds to there either being an electrical current present, or said electrical current being absent. The "on" state is assigned the value "1", while the "off" state is assigned the value "0". The term "binary" implies "two". Thus, the binary number system is a system of numbers based on two possible digits – 0 and 1. This is where the strings of binary digits come in. 7|Page Each binary digit, or "bit", is a single 0 or 1, which directly corresponds to a single "switch" in a circuit. Add enough of these "switches" together, and you can represent more numbers. So instead of 1 digit, you end up with 8 to make a byte. (A byte, the basic unit of storage, is simply defined as 8 bits; the well-known kilobytes, megabytes, and gigabytes are derived from the byte, and each is 1,024 times as big as the other. There is a 1024-fold difference as opposed to a 1000-fold difference because 1024 is a power of 2 but 1000 is not.) DOES BINARY USE MORE STORAGE THAN DECIMAL? On first glance, it seems like the binary representation of a number 10010110uses up more space than its decimal (base 10) representation 150. After all, the first is 8 digits long and the second is 3 digits long. However, this is an invalid argument in the context of displaying numbers on screen, since they're all stored in binary regardless! The only reason that 150 is "smaller" than 10010110 is because of the way we write it on the screen (or on paper). Increasing the base will decrease the number of digits required to represent any given number, but taking directly from the previous point, it is impossible to create a digital circuit that operates in any base other than 2, since there is no state between "on" and "off" (unless you get into quantum computers... more on this later). WHAT ABOUT OCTAL AND HEX? Octal (base 8) and hexadecimal (base 16) are simply a "shortcut" for representing binary numbers, as both of these bases are powers of 2. 3 octal digits = 2 hex digits = 8 binary digits = 1 byte. It's easier for the human programmer to represent a 32-bit integer, often used for 32- bit colour values, as FF00EE99 instead of11111111000000001110111010011001. Read the Bitwise Operators article for a more in-depth discussion of this. NON-BINARY COMPUTERS Imagine a computer based on base-10 numbers. Then, each "switch" would have 10 possible states. These can be represented by the digits (known as "bans" or "dits", meaning "decimal digits") 0 through 9. In this system, numbers would be represented in base 10. This is not possible with regular electronic components of today, but it is theoretically possible on a quantum level. Is this system more efficient? Assuming the "switches" of a standard binary computer take up the same amount of physical space (nanometers) as these base-10 switches, the base-10 computer would be able to fit considerably more processing power into the same physical space. So although the question of binary being "inefficient" does have some validity in theory, but not in practical use today. WHY DO ALL MODERN-DAY COMPUTERS USE BINARY THEN? Simple answer: Computers weren't initially designed to use binary... rather, binary was determined to be the most practical system to use with the computers we did design. Full answer: We only use binary because we currently do not have the technology to create "switches" that can reliably hold more than two possible states. (Quantum computers aren't exactly on sale at the moment.) The binary system was chosen only because it is quite easy to distinguish the presence of an electric current from an absence of electric current, especially when working with trillions of such connections. And using any other number base in this 8|Page system ridiculous, because the system would need to constantly convert between them. That's all there is to it. DECIMAL NUMBERS The commonly used scientific digits are the so called real numbers. The basis arithmetic operations performed by a computer are addition, subtraction, division and multiplication. This real numbers are first converted into machine language which consists of 0 and 1(binary system). Representing decimal numbers in a polynomial form (9998)10 can be represented as a polynomial in, the form (9998)10 = 9 × 103 + 9 × 102 + 9 × 101 + 8 × 100 = 𝑎𝑎3 𝛽𝛽3 + 𝑎𝑎2 𝛽𝛽2 + 𝑎𝑎1 𝛽𝛽1 + 𝑎𝑎0 𝛽𝛽0 where 𝑎𝑎3 = 9 is called the leading coefficient and 𝛽𝛽 is called the base of the polynomial. Definition: Let 𝑎𝑎0 , 𝑎𝑎1 , 𝑎𝑎2 , 𝑎𝑎3 , … , 𝑎𝑎𝑛𝑛 be n+1 numbers with 𝑎𝑎𝑛𝑛 = 0 then the function 𝑃𝑃(𝑧𝑧) = 𝑎𝑎𝑛𝑛 𝑧𝑧 𝑛𝑛 + 𝑎𝑎𝑛𝑛−1 𝑧𝑧 𝑛𝑛−1 + 𝑎𝑎𝑛𝑛−2 𝑧𝑧 𝑛𝑛−2 + ⋯ + 𝑎𝑎1 𝑧𝑧 + 𝑎𝑎0. is called a polynomial of degree n where also 𝑎𝑎𝑛𝑛 is called the leading coefficient. BINARY SYSTEM A non-negative integer N will be represented in the binary system as 𝑁𝑁 = (𝑎𝑎𝑛𝑛 𝑎𝑎𝑛𝑛−1 𝑎𝑎𝑛𝑛−2 … 𝑎𝑎1 𝑎𝑎0 )2 = 𝑎𝑎𝑛𝑛 2𝑛𝑛 + 𝑎𝑎𝑛𝑛−1 2𝑛𝑛−1 + 𝑎𝑎𝑛𝑛−2 2𝑛𝑛−2 + ⋯ + 𝑎𝑎1 21 + 𝑎𝑎0 20 Where the coefficients 𝑎𝑎𝑛𝑛 are either 0 or 1. Note that N is again represented as a polynomial, but now in the base 2. Many computer systems used in scientific work operate internally in the binary system. We the users of computers, however, prefer to work in the more familiar decimal system. It is therefore necessary to have some means of converting from decimal to binary when the information is submitted to the computer, and from binary for output purposes. Conversion of a binary to decimal number may be accomplished directly 9|Page (11100)2 = 1 × 24 + 1 × 23 + 1 × 22 + 0 × 21 + 0 × 20 = 28 (1101)2 = 1 × 23 + 1 × 22 + 0 × 21 + 0 × 20 = 13 HORNER’S ALGORITHM Given the coefficients 𝑎𝑎0 , 𝑎𝑎1, 𝑎𝑎2 , 𝑎𝑎3 , … , 𝑎𝑎𝑛𝑛 of the polynomial 𝑃𝑃(𝑧𝑧) = 𝑎𝑎𝑛𝑛 𝑧𝑧 𝑛𝑛 + 𝑎𝑎𝑛𝑛−1 𝑧𝑧 𝑛𝑛−1 + 𝑎𝑎𝑛𝑛−2 𝑧𝑧 𝑛𝑛−2 + ⋯ + 𝑎𝑎1 𝑧𝑧 + 𝑎𝑎0. and 𝑧𝑧 = 𝛽𝛽 (base of the polynomial). Compute recursively the numbers 𝑏𝑏𝑛𝑛 , 𝑏𝑏𝑛𝑛−1 , 𝑏𝑏𝑛𝑛−2 , … , 𝑏𝑏0 𝑏𝑏𝑛𝑛 = 𝑎𝑎𝑛𝑛 𝑏𝑏𝑛𝑛−1 = 𝑎𝑎𝑛𝑛−1 + 𝑏𝑏𝑛𝑛 𝑧𝑧 𝑏𝑏𝑛𝑛−2 = 𝑎𝑎𝑛𝑛−2 + 𝑏𝑏𝑛𝑛−1 𝑧𝑧 𝑏𝑏𝑛𝑛−3 = 𝑎𝑎𝑛𝑛−3 + 𝑏𝑏𝑛𝑛−2 𝑧𝑧 ⋮ ⋮ ⋮ 𝑏𝑏0 = 𝑎𝑎0 + 𝑏𝑏1 𝑧𝑧 The decimal equivalent of (1101)2 computed using the Horner’s algorithm 𝑏𝑏3 = 1 𝑏𝑏2 = 1 + 2 × 1 = 3 𝑏𝑏1 = 0 + 2 × 3 = 6 𝑏𝑏0 = 1 + 2 × 6 = 13 This implies that (1101)2 = (13)10 And the decimal equivalent of (10000)2 is 𝑏𝑏4 = 1 𝑏𝑏3 = 0 + 2 × 1 = 2 𝑏𝑏2 = 0 + 2 × 2 = 4 10 | P a g e 𝑏𝑏1 = 0 + 2 × 4 = 8 𝑏𝑏0 = 0 + 2 × 8 = 16 This implies that (10000)2 = (16)10 Converting a decimal integer N into its binary equivalent can also be accomplished by Horner’s algorithm. Conversion of a decimal numbers to binary numbers Example : Convert (156)10 into binary system using the Horner’s algorithm. (156)10 = 1 × 102 + 5 × 101 + 6 × 100 First convert the following decimal digits into binary form (1)10 = (1)2 (5)10 = (101)2 (6)10 = (110)2 (10)10 = (1010)2 (156)10 = 1 × 102 + 5 × 101 + 6 × 100 = (1)2 × (1010)22 + (101)2 × (1010)12 + (110)2 × (1010)02 Using the algorithm (𝛽𝛽 = (1010)2) 𝑏𝑏2 = 𝑎𝑎2 = (1)2 𝑏𝑏1 = (1111)2 𝑏𝑏0 = (10011100)2 Verify the answer (10011100)2 = (156)10 OCTAL SYSTEM The octal system uses base (𝛽𝛽 = 8) 𝑁𝑁 = (𝑎𝑎𝑛𝑛 𝑎𝑎𝑛𝑛−1 𝑎𝑎𝑛𝑛−2 … 𝑎𝑎1 𝑎𝑎0 )8 = 𝑎𝑎𝑛𝑛 8𝑛𝑛 + 𝑎𝑎𝑛𝑛−1 8𝑛𝑛−1 + 𝑎𝑎𝑛𝑛−2 8𝑛𝑛−2 + ⋯ + 𝑎𝑎1 81 + 𝑎𝑎0 80 11 | P a g e Example : Convert the following decimal numbers into octal numbers. a) (9978)10 b) (998)10 Solution: (9978)10 = 9 × 103 + 9 × 102 + 7 × 101 + 8 × 100 We first convert the following decimal digits into octal digits (9)10 = (11)8 (8)10 = (10)8 (7)10 = (7)8 (10)10 = (12)8 (9978)10 = 9 × 103 + 9 × 102 + 7 × 101 + 8 × 100 = (11)8 × (12)38 + (11)8 × (12)28 + (7)8 × (12)18 + (10)8 × (12)08 Where (𝛽𝛽 = 12) 𝑏𝑏3 = 𝑎𝑎3 = (11)8 𝑏𝑏2 = 𝑎𝑎2 + 𝑏𝑏3 (𝛽𝛽) = (11)8 + (11)8 × (12)8 = (143)8 𝑏𝑏1 = 𝑎𝑎1 + 𝑏𝑏2 (𝛽𝛽) = (7)8 + (143)8 × (12)8 = (1745)8 𝑏𝑏0 = 𝑎𝑎0 + 𝑏𝑏1 (𝛽𝛽) = (10)8 + (1745)8 × (12)8 = (23372)8 Verify the answer 12 | P a g e (23372)8 = 2 × 84 + 3 × 83 + 3 × 82 + 7 × 81 + 2 × 80 = 8192 + 1536 + 192 + 56 + 2 = (9978)10 EXERCISES: 1. Convert the following binary numbers to decimal number: a) (1010)2 b) (100110)2 c) (100110101)2 d) (11101111010)2 e) (11101110)2 f) (10101111010)2 g) (111011110100011)2 2. Convert the following decimal numbers to binary form using Horner’s algorithm: a) (4832)10 b) (4921)10 c) (428)10 d) (998)10 e) (189)10 f) (46832)10 g) (96832)10 h) (668328)10 3. Convert the following decimal numbers to octal form using Horner’s algorithm: a) (4832)10 b) (4921)10 c) (428)10 d) (998)10 e) (189)10 f) (46832)10 g) (96832)10 h) (668328)10 13 | P a g e 1.COMPUTER ARITHMETIC AND ERRORS How can we define ‘error’ in a computation? In its simplest form, it is the difference between the exact answer 𝐴𝐴, say, and the computed answer, 𝐴𝐴̃. Hence, we can write, 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 = 𝐴𝐴̃ − 𝐴𝐴. Since we are usually interested in the magnitude or absolute value of the error we can also define 𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 = 𝐴𝐴̃ − 𝐴𝐴 In practical calculations, it is important to obtain an upper bound on the error i.e. a number, 𝐸𝐸, such that, 𝐴𝐴̃ − 𝐴𝐴 < 𝐸𝐸 Clearly, we would like 𝐸𝐸 to be small! In practice we are often more interested in so-called ‘relative error’ than absolute error and we define, 𝐴𝐴̃ − 𝐴𝐴 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 = |𝐴𝐴| This is often expressed as a percentage. Hence, an ‘error’ of 10−5 may be a good or bad ‘relative error’ depending on the answer. For example , 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 = 1000 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = 10−5 𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 = 1 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = 10−5 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 = 10−5 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = 10−5 𝑣𝑣𝑣𝑣𝑣𝑣𝑣𝑣 𝑏𝑏𝑏𝑏𝑏𝑏 What are the possible sources of error in a computation? 1. Human error 2. Truncation error 14 | P a g e 3. Rounding error A typical ‘human error’ is Arithmetic error Programming error These errors can be very hard to detect unless they give obviously incorrect solutions. In discussing errors, we shall assume that human errors are not present. 1.1 Truncation error A truncation error is present when some infinite process is approximated by a finite process. For example, consider the Taylor series expansion 𝑥𝑥 𝑥𝑥 2 𝑥𝑥 𝑛𝑛 𝑒𝑒 = 1 + 𝑥𝑥 + + ⋯ + +⋯ 2! 𝑛𝑛! If this formula is used to calculate 𝑓𝑓 = 𝑒𝑒 0.1 we get: (0.1)2 (0.1)3 𝑓𝑓 = 1 + 0.1 + + +⋯ 2! 3! Where do we stop calculation? How many terms do we include? Theoretically the calculation will never stop. There are always more terms to add on. If we do stop after a finite number of terms, we will not get the exact answer. For example, if we take the first five terms as the approximation we get, (0.1)2 (0.1)3 (0.1)4 𝑓𝑓 = 1 + 0.1 + + + = 𝑓𝑓̃ ≈ 1.105 2! 3! 4! For this calculation, the truncation error TE (i.e. the sum of the terms that have been chopped off) is, (0.1)5 (0.1)6 𝑇𝑇𝑇𝑇 = 𝑓𝑓̃ − 𝑓𝑓 = − − −⋯ 5! 6! The numerical analyst might try and estimate the size of the truncation error, i.e. |𝑇𝑇𝑇𝑇|. In this example, we can easily get a rough estimate. (0.1)5 0.1 (0.1)2 (0.1)3 |𝑇𝑇𝑇𝑇| = 1 + + + +⋯ 5! 6! 6×7 6×7×8 15 | P a g e (0.1)5 (0.1)2 (0.1)3 ≤ 1 + 0.1 + + +⋯ 5! 1×2 1×2×3 (0.1)5 0.1 0.00001 ≤ 𝑒𝑒 ≅ × 1.105 ≈ 10−7 5! 120 ∴ the error in truncating to five terms is approximately 10−7 𝑎𝑎𝑎𝑎 𝑥𝑥 = 0.1. In general it is much harder to estimate the truncation error! 1.2 Rounding error In order to introduce the idea of a rounding error, consider the calculation of 𝑓𝑓̃ above. 1=1 (0.1) = 0.1 1! (0.1)2 = 0.005 2! (0.1)3 = 0.0001666̇ 3! (0.1)4 = 0.000004166̇ 4! 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 = 1.105170833̇ = 𝑓𝑓̃ The exact answer to the truncated problem, 𝑓𝑓̃ is an infinite string of digits and , as such, is not very useful. Since we know that it is in error in the seventh decimal place we could round it to six or seven decimal places. For example, rounding to six decimal places gives, 𝑓𝑓̃ ≅ 1.105171 = 𝑓𝑓 where the usual rounding process has been adopted; namely, if the next figure is 0,1,2,3 or 4 round down; 5,6,7,8 or 9 round up. The difference between 𝑓𝑓̃ 𝑎𝑎𝑎𝑎𝑎𝑎 𝑓𝑓 𝑓𝑓̃ − 𝑓𝑓 = 0.000000166̇ = 𝑅𝑅𝑅𝑅 16 | P a g e is the rounding error RE. Using the usual rounding process (and rounding to six 1 decimal places) the rounding error is always bounded by 10−6. Thus , in 2 computing the answer, 𝑒𝑒 0.1 ≈ 𝑓𝑓̃ = 1.105171 two errors are present and we have, 𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸 = 𝑓𝑓̃ − 𝑓𝑓 = 𝑓𝑓̃ − 𝑓𝑓 ̅ + 𝑓𝑓 ̅ − 𝑓𝑓 = 𝑅𝑅𝑅𝑅 + 𝑇𝑇𝑇𝑇 |𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸𝐸| ≤ |𝑅𝑅𝑅𝑅| + |𝑇𝑇𝑇𝑇| 1 1 ≈ 10−6 + 10−7 ≈ 10−6 2 2 Note that in this case the actual error is dominated by ROUNDING. 1.3 COMPUTER ARITHMETIC Computers allocate a fixed amount of storage to every number they use. Each number is stored as a string of digits. In practice, so-called floating point numbers are used. A computer using four digit decimal arithmetic with floating point numbers would store 37.31 𝑎𝑎𝑎𝑎 (0.3731,2) = 0.3731 × 102 0.00004717 𝑎𝑎𝑎𝑎 (0.4717, −4) = 0.4717 × 10−4 0.03 𝑎𝑎𝑎𝑎 (0.3000, −1) = 0.3 × 10−1 14.211 𝑎𝑎𝑎𝑎 (0.1421,2) = 0.1421 × 102 The number pair (𝑝𝑝, 𝑞𝑞) is called a floating point number. 𝑝𝑝 is called the MANTISSA (or REAL PART) and 𝑞𝑞 is the CHARACTERISTIC(or INDEX or EXPONENT). The mantissa is always a fixed number of digits and the index must lie some range. Typically −256 < 𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼𝐼 < 256. If the INDEX goes outside that range then we get underflow(less than -256) or overflow (greater than 256). Some computers/systems automatically replace 17 | P a g e underflow by the special number 0 (zero). Overflow always gives some sort of error. We note that the mantissa is always of the form 0. … and the digit after the decimal point is always no-zero. Thus, in the third example above 0.03 is stored as (0.3000,-1) and not as (0.0300,0). We also note that there is no representation of zero. A computer normally has some special representation for this number. We further note, as in the fourth example, that the representation may not be exact. Finally it should be remembered that in practice computers do not use decimal numbers. They actually use binary numbers. There is often some error in converting decimal numbers to or from binary numbers. Rounding errors are therefore always present since we can never be certain that a computation has been done exactly. For example, a computer working with four digit, decimal, floating point arithmetic with, 𝐴𝐴 = (0.3333,2), 𝐵𝐵 = (0.4625,3) would compute 𝐴𝐴 + 𝐵𝐵 → (+0.4958,3) ≠ 𝐴𝐴 + 𝐵𝐵 𝐴𝐴 − 𝐵𝐵 → (−0.4292,3) ≠ 𝐴𝐴 − 𝐵𝐵 𝐴𝐴 × 𝐵𝐵 → (+0.1542,5) ≠ 𝐴𝐴 × 𝐵𝐵 and none is exact. In a more challenging computation like solving 𝐴𝐴𝑥𝑥̅ = 𝑏𝑏 , where A is a 1000×1000 matrix , there are millions of floating point calculations in all of which small errors are present! Can we be certain of the result? How accurate is it? Can we find algorithms that overcome the problem? All these questions are considered by Numerical Analysts. 1.4 CANCELLATION ERROR In many books ‘cancellation’ or more precisely ‘cancellation error’ is listed as a fourth source of error. This is not strictly a new source of error but rather a consequence of rounding and truncation errors leading to severe loss of accuracy in certain circumstances. Suppose, for example, that we have two numbers that we know really accurately, 18 | P a g e 1 −6 𝑎𝑎 = 0.642136 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 𝑡𝑡𝑡𝑡 10 2 1 −6 𝑏𝑏 = 0.642125 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎 𝑡𝑡𝑡𝑡 10 2 Then, 𝑎𝑎 − 𝑏𝑏 = 0.000011 and this quantity will contain an error bounded by |𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 𝑖𝑖𝑖𝑖 𝑎𝑎| + |𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 𝑖𝑖𝑖𝑖 𝑏𝑏| ≤ 1 × 10−6 The relative error in 𝑎𝑎 − 𝑏𝑏 is about 10% and is therefore unacceptably large when the relative errors in 𝑎𝑎 and 𝑏𝑏 are only 0.0005%. Moreover, if the errors in the data, 1 𝑎𝑎 and , were 10−4 then the answer would be meaningless! 2 1.5 Examples Example 1 Using 3 digit floating point arithmetic find the answer of the calculation, 𝑎𝑎 + 𝑏𝑏 ∗ 𝑐𝑐 𝑀𝑀 = 𝑏𝑏 + 𝑐𝑐 when 𝑎𝑎 = 11.13, 𝑏𝑏 = 1.247 𝑎𝑎𝑎𝑎𝑎𝑎 𝑐𝑐 = −0.145. Identify the rounding error at each stage of the calculation and the total effect of rounding error. Solution The representation of 𝑎𝑎, 𝑏𝑏, 𝑎𝑎𝑎𝑎𝑎𝑎 𝑐𝑐 as three digit floating point numbers are: 𝑎𝑎 ≔ (+0.111, +2) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = −0.0003 𝑏𝑏 ≔ (+0.125, +1) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = +0.003 𝑐𝑐 ≔ (−0.145,0) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = 0 Each calculation is performed as a series of single operations each with their own rounding error. Thus we compute: 𝑋𝑋 ≔ 𝑏𝑏 ∗ 𝑐𝑐 19 | P a g e 𝑌𝑌 ≔ 𝑎𝑎 + 𝑋𝑋 𝑍𝑍 ≔ 𝑏𝑏 + 𝑐𝑐 𝑀𝑀 ≔ 𝑌𝑌 𝑍𝑍 and we obtain, 𝑋𝑋 ≔ (−0.181,0) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = +0.00025 𝑌𝑌 ≔ (+0.109, +2) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = −0.019 𝑍𝑍 ≔ (0.111, +1) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = +0.005 𝑀𝑀 ≔ (+0.982, +1) 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 = +0.00018 Thus the computed answer is 9.82. The exact answer is 9.8812. Hence, the total effect of rounding error, i.e. the computed value minus the exact value, is -0.06. EXERCISES 1. If the exact answer is 𝐴𝐴 and the computed answer is 𝐴𝐴̃, find the absolute and relative error when a) 𝐴𝐴 = 10.147, 𝐴𝐴̃ = 10.159 b) 𝐴𝐴 = 0.0047, 𝐴𝐴̃ = 0.0045 c) 𝐴𝐴 = 0.671 × 1012 , 𝐴𝐴̃ = 0.669 × 1012 2. Let 𝑎𝑎 = 0.471 × 10−2 𝑎𝑎𝑎𝑎𝑎𝑎 𝑏𝑏 = −0.185 × 10−4. Use 3 digit floating point arithmetic to compute 𝑎𝑎 + 𝑏𝑏, 𝑎𝑎 − 𝑏𝑏, 𝑎𝑎 ∗ 𝑏𝑏 𝑎𝑎𝑎𝑎𝑎𝑎 𝑎𝑎 𝑏𝑏. Find the rounding error in each case. 20 | P a g e CHAPTER 2 Non-linear algebraic and transcendental equations The first non-linear equation encountered in algebra courses is usually the quadratic equation 𝑎𝑎𝑥𝑥 2 + 𝑏𝑏𝑏𝑏 + 𝑐𝑐 = 0 and all students will be familiar with the formula for its roots: −𝑏𝑏 ± √𝑏𝑏 2 − 4𝑎𝑎𝑎𝑎 𝑥𝑥 = 2𝑎𝑎 The formula for the roots of a general cubic is somewhat more complicated and that for a general quartic usually takes several pages to describe! We are spared further effort by a theorem which states that there is no such formula for general polynomials of degree higher than four. Accordingly, except in special cases (for example, when factorization is easy), we prefer in practice to use a numerical method to solve polynomial equations of degree higher than two. Another class of nonlinear equations consists of those which involve transcendental functions such as 𝑒𝑒 𝑥𝑥 , 𝑙𝑙𝑙𝑙𝑙𝑙, 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑎𝑎𝑎𝑎𝑎𝑎 𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡. Useful analytic solutions of such equations are rare so that we are usually forced to use numerical methods. 1. A transcendental equation We shall use a simple mathematical problem to show that transcendental equations do arise quite naturally. Consider the height of a liquid in a cylindrical tank of radius r and horizontal axis, when the tank is a quarter full (see Figure 2). Denote the height of the liquid by h (DB in the diagram). The condition to be satisfied is that the area of the segment ABC should be ¼ of the area of the circle. This task reduces to 𝑟𝑟 2 𝜃𝜃 where is the area of the sector OAB, the triangle OAD. Hence 2 𝜋𝜋 𝜋𝜋 2𝜃𝜃 − 2𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 = 𝑜𝑜𝑜𝑜 𝑥𝑥 + 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 = 0 , 𝑤𝑤ℎ𝑒𝑒𝑒𝑒𝑒𝑒 𝑥𝑥 = − 2𝜃𝜃 2 2 21 | P a g e 𝜋𝜋 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 2𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 = 𝑠𝑠𝑠𝑠𝑠𝑠2𝜃𝜃 = sin − 𝑥𝑥 = 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 2 FIGURE 2. Cylindrical tank (cross-section).. When we have solved the transcendental equation 𝑓𝑓(𝑥𝑥) = 𝑥𝑥 + 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 = 0 𝜋𝜋 𝑥𝑥 we obtain h from ℎ = 𝑂𝑂𝑂𝑂 − 𝑂𝑂𝑂𝑂 = 𝑟𝑟 − 𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟𝑟 = 𝑟𝑟 1 − 𝑐𝑐𝑐𝑐𝑐𝑐 − 4 2 2. Locating roots Let it be required to find some or all of the roots of the nonlinear f(x) = 0. Before we use a numerical method (Bisection method, False position method ,Newton Raphson method and Simple iteration method), we should have some idea about the number, nature and approximate location of the roots. The usual approach involves the construction of graphs and perhaps a table of values of the function f, in order to confirm the information obtained from the graph. We will now illustrate this approach by a few examples. a) 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 − 𝑥𝑥 + 0.5 = 0 If we do not have a calculator or computer available to immediately plot the graph of f(x )= sin x - x + 0.5, 22 | P a g e we can separate f into two parts, sketch two curves on a single set of axes, and find out whether they intersect. Thus we sketch. 𝑦𝑦 = 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 𝑎𝑎𝑎𝑎𝑎𝑎 𝑦𝑦 = 𝑥𝑥 − 0.5.Since |𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠| ≤ 1, we are only interested in the interval -0.5 ≤ x ≤ 1.5 (outside which |x - 0.5| > 1). Thus we deduce from Fig. 3 that the equation has only one real root, near x =1.5 as follows: 𝑥𝑥 1.5 1.45 1.49 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 0.9975 0.9927 0.9967 𝑓𝑓(𝑥𝑥) -0.0025 0.0427 0.0067 We now know that the root lies between 1.49 and 1.50, and we can use a numerical method to obtain a more accurate answer as is discussed in later Steps. b) 𝑒𝑒 −0.2𝑥𝑥 = 𝑥𝑥(𝑥𝑥 − 2)(𝑥𝑥 − 3) Again, we sketch two curves: 𝑦𝑦 = 𝑒𝑒 −0.2𝑥𝑥 𝑎𝑎𝑎𝑎𝑎𝑎 𝑦𝑦 = 𝑥𝑥(𝑥𝑥 − 2)(𝑥𝑥 − 3) In order to sketch the second curve, we use the three obvious zeros at x = 0, 2, and 3, as well as the knowledge that x(x - 2) (x - 3) is negative for x < 0 and 2 < x < 3, but positive and increasing steadily for x > 3. We deduce from the graph (Fig. 4) that there are three real roots, near x = 0.2, 1.8, and 3. 1, and tabulate as follows (with 𝑓𝑓(𝑥𝑥) = 𝑒𝑒 −0.2𝑥𝑥 − 𝑥𝑥(𝑥𝑥 − 2)(𝑥𝑥 − 3)) : 𝑥𝑥 0.2 0.15 1.8 1.6 3.1 3.2 𝑒𝑒 −0.2𝑥𝑥 0.9608 0.9704 0.6977 0.7261 0.5379 0.5273 23 | P a g e 𝑥𝑥(𝑥𝑥 − 2)(𝑥𝑥 1.0080 0.7909 0.4320 0.8960 0.3410 0.7680 − 3) 𝑓𝑓(𝑥𝑥) -0.0472 0.1796 0.2657 -0.1699 0.1969 -0.2407 We conclude that the roots lie between 0.15 and 0.2, 1.6 and 1.8, and 3.1 and 3.2, respectively. Note that the values in the table were calculated to an accuracy of at least 5SD. For example, working to 5S accuracy, we have f (0.15) = 0.97045- 0.79088= 0.17957, which is then rounded to 0.1796. Thus the entry in the table for f(0.15) is 0.1796 and not 0.1795 as one might expect from calculating 0.9704 - 0.7909. EXERCISES Locate the roots of the equation x+cos x=0. Use curve sketching to roughly locate all the roots of the equations: a) x + 2 cos x = 0. b) x + ex= 0. c) x(x - 1) - ex= 0. d) x(x - 1 - sin x = 0. 24 | P a g e The bisection method The bisection method, suitable for implementation on a computer allows to find the roots of the equation f (x) = 0, based on the following theorem: Theorem: If f is continuous for x between a and b and if f (a) and f(b) have opposite signs, then there exists at least one real root of f (x) = 0 between a and b. 1. Procedure: Suppose that a continuous function f is negative at x = a and positive at x = b, so that there is at least one real root between a and b. (As 𝑎𝑎+𝑏𝑏 a rule, a and b may be found from a graph of f.) If we calculate𝑓𝑓 , 2 which is the function value at the point of bisection of the interval 𝑎𝑎 < 𝑥𝑥 < 𝑏𝑏, there are three possibilities: 𝑎𝑎+𝑏𝑏 (𝑎𝑎+𝑏𝑏) 1. 𝑓𝑓 2 = 0, in which case 2 is the root; 𝑎𝑎+𝑏𝑏 (𝑎𝑎+𝑏𝑏) 2. 𝑓𝑓 2 < 0, in which case the root lies between 2 and 𝑏𝑏 ; 𝑎𝑎+𝑏𝑏 (𝑎𝑎+𝑏𝑏) 3. 𝑓𝑓 2 > 0, in which case the root lies between 𝑎𝑎 and 2. Presuming there is just one root, in Case 1 the process is terminated. In either Case 2 or Case 3, the process of bisection of the interval containing the root can be repeated until the root is obtained to the desired accuracy. In Figure 5, the successive points of bisection are denoted by x1 , x2, and x3. 2. Effectiveness: The bisection method is almost certain to give a root. Provided the conditions of the above theorem hold; it can only fail if the accumulated error in the calculation of f at a bisection point gives it a small 25 | P a g e negative value when actually it should have a small positive value (or vice versa); the interval subsequently chosen would then be wrong. This can be overcome by working to sufficient accuracy, and this almost- assured convergence is not true of many other methods of finding roots. One drawback of the bisection method is that it applies only to roots of f about which f (x) changes sign. In particular, double roots can be overlooked; one should be careful to examine f(x) in any range where it is small, so that repeated roots about which f (x) does not change sign are otherwise evaluated (for example, see Steps 9 and 10). Of course, such a close examination also avoids another nearby root being overlooked. Finally, note that bisection is rather slow; after n iterations the interval (𝑏𝑏−𝑎𝑎) containing the root is of length 𝑛𝑛. However, provided values of f can be 2 generated readily, as when a computer is used, the rather large number of iterations which can be involved in the application of bisection, is of relatively little consequence. 3. Example a) Solve 3xex = 1 to three decimal places by the bisection method. Consider f(x) = 3x - ex, which changes sign in the interval 0.25 < x < 0.27: one tabulates (working to 4D ) as follows: 𝑥𝑥 3x ex f(x) 0.25 0.75 0.7788 -0.0288 0.27 0.81 0.7634 0.0466 (Ascertain graphically that there is just one root!) Denote the lower and upper endpoints of the interval bracketing the root at the 𝑛𝑛-th iteration by 𝑎𝑎𝑛𝑛 and 𝑏𝑏𝑛𝑛 , respectively (with 𝑎𝑎1 = 0.25 and 𝑏𝑏1 = 0.27). Then the approximation to the root at the n-th iteration is given 𝑎𝑎 +𝑏𝑏 by 𝑥𝑥𝑛𝑛 = 𝑛𝑛 𝑛𝑛. Since the root is either in[𝑎𝑎𝑛𝑛 , 𝑏𝑏𝑛𝑛 ] or [𝑥𝑥𝑛𝑛 , 𝑏𝑏𝑛𝑛 ] and both 2 (𝑏𝑏𝑛𝑛 −𝑎𝑎𝑛𝑛 ) intervals are of length , we see that 𝑥𝑥𝑛𝑛 will be accurate to three 2 (𝑏𝑏𝑛𝑛 −𝑎𝑎𝑛𝑛 ) decimal places when < 5× 10-4. Proceeding to bisection: 2 n 𝑎𝑎𝑛𝑛 𝑏𝑏𝑛𝑛 (𝑎𝑎𝑛𝑛 + 𝑏𝑏𝑛𝑛 ) 3𝑥𝑥𝑛𝑛 𝑒𝑒 −𝑥𝑥𝑛𝑛 𝑓𝑓(𝑥𝑥𝑛𝑛 ) 𝑥𝑥𝑛𝑛 = 2 1 0.25 0.27 0.26 0.78 0.7711 0.0089 2 0.25 0.26 0.255 0.765 0.7749 -0.0099 3 0.255 0.26 0.2575 0.7725 0.7730 -0.0005 4 0.2575 0.26 0.2588 0.7763 0.7720 0.0042 26 | P a g e 5 0.2575 0.2588 0.2581 0.7744 0.7725 0.0019 6 0.2575 0.2581 0.2578 (Note that the values in the table are displayed to only 4D.) Hence the root accurate to three decimal places is 0.258. b) Use the bisection method to solve 𝑓𝑓(𝑥𝑥) = 𝑥𝑥 2 − 3. Let 𝜀𝜀𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 = 0.01 and 𝜀𝜀𝑎𝑎𝑎𝑎𝑎𝑎 = 0.01 and start with the interval [1,2]. n 𝒂𝒂𝒏𝒏 𝒃𝒃𝒏𝒏 f(𝒂𝒂𝒏𝒏 ) f(𝒃𝒃𝒏𝒏 ) 𝒂𝒂𝒏𝒏 + 𝒃𝒃𝒏𝒏 f(𝒄𝒄𝒏𝒏 ) update Width 𝒄𝒄𝒏𝒏 = 𝒃𝒃𝒏𝒏 − 𝒂𝒂𝒏𝒏 𝟐𝟐 1 1.0 2.0 -2.0 1.0 1.5 -0.75 𝑎𝑎𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.5 2 1.5 2.0 -0.75 1.0 1.75 0.062 𝑏𝑏𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.25 3 1.5 1.75 -0.75 0.0625 1.625 -0.359 𝑎𝑎𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.125 4 1.625 1.75 -03594 0.0625 1.6875 -0.1523 𝑎𝑎𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.0625 5 1.6875 1.75 -0.1523 0.0625 1.7188 -0.0457 𝑎𝑎𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.0313 6 1.7188 1.75 -0.0457 0.0625 1.7344 0.0081 𝑏𝑏𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.0156 7 1.7188 1.7344 -0.0457 0.0081 1.7266 -0.0189 𝑎𝑎𝑛𝑛 = 𝑐𝑐𝑛𝑛 0.0078 Thus, with the seventh iteration , we note that the final interval [1.7266,17344] has a width less than 0.01 and |𝑓𝑓(1.7344)| < 0.01 and therefore we choose 𝑏𝑏 = 1.7344 to be our approximation of the root. EXERCISES a. Use the bisection method to find the root of the equation x+cosx = 0. correct to two decimal places (2D ). b. Use the bisection method to find to 3D the positive root of the equation x - 0.2sinx - 0.5=0. c. Each equation in Exercises 2(a)-2(c) above has only one root. For each equation use the bisection method to find the root correct to 2 D. 1 d. Use the bisection method to solve 𝑓𝑓(𝑥𝑥) = 𝑥𝑥 + − 3𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 with the 𝑥𝑥 interval [ 0.7 , 0.9] , work to 4-decimal. 27 | P a g e The Newton-Raphson iterative method The Newton-Raphson method is suitable for implementation on a computer. It is a process for the determination of a real root of an equation f (x) = 0, given just one point close to the desired root. It can be viewed as a limiting case of the secant method or as a special case of simple iteration. 1. Procedure Let x0 denote the known approximate value of the root of f(x) = 0 and h the difference between the true value 𝛼𝛼 and the approximate value, i.e., 𝛼𝛼 = 𝑥𝑥0 + ℎ The second degree, terminated Taylor expansion about x0 is ℎ2 ′′ 𝑓𝑓(𝛼𝛼) = 𝑓𝑓(𝑥𝑥0 + ℎ) = 𝑓𝑓(𝑥𝑥0 ) + ℎ𝑓𝑓 ′ (𝑥𝑥0 ) + 𝑓𝑓 (𝜉𝜉) 2! where 𝜉𝜉 = 𝑥𝑥0 + 𝜃𝜃ℎ , 0 < 𝜃𝜃 < 1, lies between 𝛼𝛼 𝑎𝑎𝑎𝑎𝑎𝑎 𝑥𝑥0. Ignoring the remainder term and writing 𝑓𝑓(𝛼𝛼) = 0. 𝑓𝑓(𝑥𝑥0 ) 𝑓𝑓(𝑥𝑥0 ) + ℎ𝑓𝑓 ′ (𝑥𝑥0 ) ≈ 0 ,whence ℎ ≈ − and consequently, 𝑓𝑓′ (𝑥𝑥0 ) 𝑓𝑓(𝑥𝑥0 ) 𝑥𝑥1 = 𝑥𝑥0 − 𝑓𝑓′ (𝑥𝑥0 ) should be a better estimate of the root than x0. Even better approximations may be obtained by repetition (iteration) of the process, which then becomes 𝑓𝑓(𝑥𝑥𝑛𝑛 ) 𝑥𝑥𝑛𝑛+1 = 𝑥𝑥𝑛𝑛 − 𝑓𝑓′ (𝑥𝑥𝑛𝑛 ) 28 | P a g e The geometrical interpretation is that each iteration provides the point at which the tangent at the original point cuts the x-axis (Figure 9). Thus the equation of the tangent at (xn, f (xn)) is y - f(x0) = f '(x0)(x - x0) so that (x1, 0) corresponds to -f(x0) = f '(x0)(x1 - x0), 𝑓𝑓(𝑥𝑥0 ) whence x1 = x0 -. 𝑓𝑓′ (𝑥𝑥0 ) 2. Example We will use the Newton-Raphson method to find the positive root of the equation sin x = x2, correct to 3D. It will be convenient to use the method of false position to obtain an initial approximation. Tabulation yields 𝑥𝑥 𝑓𝑓(𝑥𝑥) = 𝑠𝑠𝑠𝑠𝑠𝑠𝑠𝑠 − 𝑥𝑥 2 0 0 0.25 0.1849 0.5 0.2294 0.75 0.1191 29 | P a g e 1 −0.1585 With numbers displayed to 4D, we see that there is a root in the interval 0.75 < x < 1 at approximately 1 0.75 0.1191 𝑥𝑥0 = −0.1585 − 0.1191 1 −0.1585 1 0.2380 = − 0.2777 (−0.1189 − 0.1191) = 0.2777 = 0.8573 Next, we will use the Newton-Raphson method: 𝑓𝑓(0.8573) = sin(0.8573) − (0.8573)2 = 0.7561 − 0.7349 = 0.0211 and 𝑓𝑓 ′ (𝑥𝑥) = 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 − 2𝑥𝑥 yielding 𝑓𝑓 ′ (0.8573) = 0.6545 − 1.7145 = −1.0600 Consequently, a better approximation is 0.0211 𝑥𝑥1 = 0.8573 + = 0.8573 + 0.0200 = 0.8772 1.0600 Repeating this step, we obtain 𝑓𝑓(𝑥𝑥1 ) = 𝑓𝑓(0.8772) = −0.0005 and 𝑓𝑓 ′ (𝑥𝑥1 ) = 𝑓𝑓 ′ (0.8772) = −1.1151 0.0005 so that 𝑥𝑥2 = 0.8772 − = 0.8772 − 0.0005 = 0.8767 1.1151 Since f(x2) = 0.0000, we conclude that the root is 0.877 to 3D. 3. Convergence 𝑓𝑓(𝑥𝑥) If we write ∅(𝑥𝑥) = 𝑥𝑥 − 𝑓𝑓′(𝑥𝑥), the Newton-Raphson iteration expression 30 | P a g e 𝑓𝑓(𝑥𝑥𝑛𝑛 ) 𝑥𝑥𝑛𝑛+1 = 𝑥𝑥𝑛𝑛 − 𝑓𝑓 ′ (𝑥𝑥𝑛𝑛 ) may be rewritten 𝑥𝑥𝑛𝑛+1 = ∅(𝑥𝑥𝑛𝑛 ) We have observed that, in general, the iteration method converges when |∅′ (𝑥𝑥)| < 1 near the root. In the case of Newton-Raphson, we have ′ [𝑓𝑓 ′ (𝑥𝑥)]2 − 𝑓𝑓(𝑥𝑥)𝑓𝑓 ′′ (𝑥𝑥) 𝑓𝑓(𝑥𝑥)𝑓𝑓 ′′ (𝑥𝑥) ∅ (𝑥𝑥) = 1 − = [𝑓𝑓 ′ (𝑥𝑥)]2 [𝑓𝑓 ′ (𝑥𝑥)]2 so that the criterion for convergence is ′′ ′ 2 𝑓𝑓(𝑥𝑥)𝑓𝑓 (𝑥𝑥) < 𝑓𝑓 (𝑥𝑥) i.e., convergence is not as assured as, say, for the bisection method. 4. Rate of convergence The second degree terminated Taylor expansion about xn is 𝑒𝑒𝑛𝑛2 ′′ 𝑓𝑓(𝛼𝛼) = 𝑓𝑓(𝑥𝑥𝑛𝑛 + 𝑒𝑒𝑛𝑛 ) = 𝑓𝑓(𝑥𝑥𝑛𝑛 ) + 𝑒𝑒𝑛𝑛 𝑓𝑓 ′ (𝑥𝑥𝑛𝑛 ) + 𝑓𝑓 (𝜉𝜉𝑛𝑛 ) 2! where 𝑒𝑒𝑛𝑛 = 𝛼𝛼 − 𝑥𝑥𝑛𝑛 is the error at the n-th iteration and 𝜉𝜉𝑛𝑛 = 𝑥𝑥𝑛𝑛 + 𝜃𝜃𝑒𝑒𝑛𝑛 ,0 < 𝜃𝜃 < 1. Since 𝑓𝑓(𝛼𝛼) = 0 , we find 𝑓𝑓(𝑥𝑥𝑛𝑛 ) 𝑒𝑒𝑛𝑛2 𝑓𝑓 ′′ (𝜉𝜉𝑛𝑛 ) 0= + (𝛼𝛼 − 𝑥𝑥𝑛𝑛 ) + ′ 𝑓𝑓 (𝑥𝑥𝑛𝑛 ) 2𝑓𝑓 ′ (𝑥𝑥𝑛𝑛 ) But, by the Newton-Raphson formula, 𝑓𝑓(𝑥𝑥𝑛𝑛 ) 𝑒𝑒𝑛𝑛2 𝑓𝑓 ′′ (𝜉𝜉𝑛𝑛 ) 0= + (𝛼𝛼 − 𝑥𝑥𝑛𝑛 ) + ′ 𝑓𝑓 (𝑥𝑥𝑛𝑛 ) 2𝑓𝑓 ′ (𝑥𝑥𝑛𝑛 ) whence the error at the (n + 1)-th iteration is 𝑒𝑒𝑛𝑛+1 = 𝛼𝛼 − 𝑥𝑥𝑛𝑛+1 31 | P a g e 𝑒𝑒𝑛𝑛2 𝑓𝑓 ′′ (𝜉𝜉𝑛𝑛 ) =− 2𝑓𝑓 ′ (𝑥𝑥𝑛𝑛 ) 𝑒𝑒𝑛𝑛2 𝑓𝑓 ′′ (𝛼𝛼) ≈− 2𝑓𝑓 ′ (𝛼𝛼) 𝑓𝑓 ′′ (𝛼𝛼) ≈ 4𝑓𝑓 ′ (𝛼𝛼) provided en is sufficiently small. This result states that the error at the (n + 1)-th iteration is proportional to the square of the error at the nth iteration; hence, if 𝑓𝑓 ′′ (𝛼𝛼) ≈ 4𝑓𝑓 ′ (𝛼𝛼), an answer correct to one decimal place at one iteration should be accurate to two places at the next iteration, four at the next, eight at the next, etc. This quadratic - second-order convergence - outstrips the rate of convergence of the methods of bisection and false position! In relatively little used computer programs, it may be wise to prefer the methods of bisection or false position, since convergence is virtually assured. However, for hand calculations or for computer routines in constant use, the Newton-Raphson method is usually preferred. 5. The square root One application of the Newton-Raphson method is in the computation of square roots. Since a½ is equivalent to finding the positive root of x2 = a. i.e., f(x) = x2 - a = 0. Since f '(x) = 2x, we have the Newton-Raphson iteration formula: 𝑎𝑎 2 − 𝑎𝑎) (𝑥𝑥𝑛𝑛 𝑥𝑥𝑛𝑛 + 𝑥𝑥𝑛𝑛 xn+1 = xn − = , 2𝑥𝑥𝑛𝑛 2 a formula known to the ancient Greeks. Thus, if a = 16 and x0 = 5, we find to 3D x1 = (5 + 3.2)/2 = 4.1, x2 = (4.1 + 3.9022)/2 = 4.0012, and x3 = (4.0012 + 3.9988)/2 = 4.0000. EXERCISES 1. Use the Newton-Raphson method to find to 4S the (positive) root of 3xex=1? 2.Derive the Newton-Raphson iteration formula 32 | P a g e (𝑥𝑥𝑛𝑛𝑘𝑘 − 𝑎𝑎) 𝑥𝑥𝑛𝑛+1 = 𝑥𝑥𝑛𝑛 − 𝑘𝑘𝑥𝑥𝑛𝑛𝑘𝑘−1 for finding the k-th root of a. 3. Compute the square root of 10 to 5 significant digits from an initial guess. 4. Use the Newton-Raphson method to find to 4D the root of the equation x cos x = 0. Method of false position As mentioned in the Prologue, the method of false position dates back to the ancient Egyptians. It remains an effective alternative to the bisection method for solving the equation f(x) = 0 for a real root between a and b, given that f (x) is continuous and f (a) and f(b) have opposite signs. The algorithm is suitable for automatic computation. 1. PROCEDURE The curve y = f(x) is not generally a straight line. However, one may join the points (a,f(a)) and (b,f(b)) by the straight line 𝑦𝑦 − 𝑓𝑓(𝑎𝑎) 𝑥𝑥 − 𝑎𝑎 = 𝑓𝑓(𝑏𝑏) − 𝑓𝑓(𝑎𝑎) 𝑏𝑏 − 𝑎𝑎 Thus straight line cuts the x-axis at (X, 0) where 𝑦𝑦 − 𝑓𝑓(𝑎𝑎) 𝑥𝑥 − 𝑎𝑎 = 𝑓𝑓(𝑏𝑏) − 𝑓𝑓(𝑎𝑎) 𝑏𝑏 − 𝑎𝑎 so that 𝑦𝑦 − 𝑓𝑓(𝑎𝑎) 𝑥𝑥 − 𝑎𝑎 = 𝑓𝑓(𝑏𝑏) − 𝑓𝑓(𝑎𝑎) 𝑏𝑏 − 𝑎𝑎 33 | P a g e Suppose that f(a) is negative and f(b) is positive. As in the bisection method, there are the three possibilities : 1. f(x) = 0, when case x is the root ; 2. f(x) < 0, when the root lies between x and b ; 3. f(x)>0, when the root lies between x and a. Again, in Case 1, the process is terminated, in either Case 2 or Case 3, the process can be repeated until the root is obtained to the desired accuracy. In Fig. 6, the successive points where the straight lines cut the axis are denoted by x1, x2, x3. 2. EFFECTIVENESS AND THE SECANT METHOD Like the bisection method, the method of false position has almost assured convergence, and it may converge to a root faster. However, it may happen that most or all the calculated values of X are on the same side of the root, in which case convergence may be slow (Fig. 7). This is avoided in the secant method, which resembles the method of false position except that no attempt is made to ensure that the root is enclosed; starting with two approximations to the root (x0, x1), further approximations x2, x3,… are computed from 𝑥𝑥𝑛𝑛 − 𝑥𝑥𝑛𝑛−1 𝑥𝑥𝑛𝑛+1 = 𝑥𝑥𝑛𝑛 − 𝑓𝑓(𝑥𝑥𝑛𝑛 ) 𝑓𝑓(𝑥𝑥𝑛𝑛 ) − 𝑓𝑓(𝑥𝑥𝑛𝑛−1 ) 34 | P a g e There is no longer convergence, but the process is simpler (the sign of f(xn+1) is not tested) and often converges faster. With respect to speed of convergence of the secant method, one has at the (n+1)th step: Hence, expanding in terms of the Taylor series, 𝑒𝑒 2 𝑒𝑒𝑛𝑛−1 𝑓𝑓(𝛼𝛼) − 𝑒𝑒𝑛𝑛 𝑓𝑓 ′ (𝛼𝛼) + 2!𝑛𝑛 𝑓𝑓 ′′ (𝛼𝛼) − ⋯ 𝑒𝑒𝑛𝑛+1 = [𝑓𝑓(𝛼𝛼) − 𝑒𝑒𝑛𝑛 𝑓𝑓 ′ (𝛼𝛼) + ⋯ ] − [𝑓𝑓(𝛼𝛼) − 𝑒𝑒𝑛𝑛−1 𝑓𝑓 ′ (𝛼𝛼) + ⋯ ] 𝑒𝑒 2 𝑒𝑒𝑛𝑛 𝑓𝑓(𝛼𝛼) − 𝑒𝑒𝑛𝑛−1 𝑓𝑓 ′ (𝛼𝛼) + 𝑛𝑛−1 ′′ (𝛼𝛼) − ⋯ 2! 𝑓𝑓 − [𝑓𝑓(𝛼𝛼) − 𝑒𝑒𝑛𝑛 𝑓𝑓 ′ (𝛼𝛼) + ⋯ ] − [𝑓𝑓(𝛼𝛼) − 𝑒𝑒𝑛𝑛−1 𝑓𝑓 ′ (𝛼𝛼) + ⋯ ] 𝑓𝑓 ′′ (𝛼𝛼) ≈ − 2𝑓𝑓′(𝛼𝛼) 𝑒𝑒𝑛𝑛−1 𝑒𝑒𝑛𝑛 where we have used the fact that f(α)=0. Thus we see that en+1 is proportional to enen-1, which may be expressed in mathematical notation as 𝑒𝑒𝑛𝑛+1 ≈ 𝑒𝑒𝑛𝑛−1 𝑒𝑒𝑛𝑛 35 | P a g e We seek k such that 2 𝑘𝑘+1 1+√5 𝑒𝑒𝑛𝑛+1 ≈ 𝑒𝑒𝑛𝑛𝑘𝑘 ≈ 𝑒𝑒𝑛𝑛𝑘𝑘 , 𝑒𝑒𝑛𝑛−1 𝑒𝑒𝑛𝑛 ≈ 𝑒𝑒𝑛𝑛−1 , ⟹ 𝑘𝑘 2 ≈ 𝑘𝑘 + 1, ⟹ 𝑘𝑘 ≈ 2 ≈ 1.618. Hence the speed of convergence is faster than linear (k =1 ), but slower than quadratic (k=2). This rate of convergence is sometimes referred to as superlinear convergence. 3. EXAMPLE Use the method of false position to solve 3𝑥𝑥𝑒𝑒 𝑥𝑥 = 1, stopping when |𝑓𝑓(𝑥𝑥𝑛𝑛 )| < 5 ∗ 10−6 with 𝑓𝑓(𝑥𝑥) = 3𝑥𝑥 − 𝑒𝑒 −𝑥𝑥. Then f (x1) =f (0.257637) = 3(0.257637) − 0.772875 = 0.772912 - 0.772875 = 0.000036. The student may verify that doing one more iteration of the method of false position yields an estimate x2 = 0.257628 for which the function value is less than 5*10-6. Since x1 and x2 agree to 4D, we conclude that the root is 0.2576, correct to 4D. EXERCISES a. Use the method of false position to find the smallest root of the equation f (x) = 2 sin x + x - 2 = 0, stopping when |𝑓𝑓(𝑥𝑥𝑛𝑛 )| < 5 ∗ 10−5. b. Compare the results obtained when you use i. the bisection method, ii. the method of false position, and iii. the secant method with starting values 0.7 and 0.9 to solve the equation 3sin x = x + 1/x. iv. Use the method of false position to find the root of the equation 𝑓𝑓(𝑥𝑥) ≡ 𝑥𝑥 + 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 = 0, stopping when |𝑓𝑓(𝑥𝑥𝑛𝑛 )| < 5 ∗ 10−6. 36 | P a g e The method of simple iteration The method of simple iteration involves writing the equation f(x) = 0 in a form x = f(x), suitable for the construction of a sequence of approximations to some root in a repetitive fashion. 1. Procedure The iteration procedure follows: In some way, we obtain a rough approximation x0 of the desired root, which may then be substituted into the right-hand side to give a new approximation, x1=φ(x0). The new approximation is again substituted into the right-hand side to give a further approximation x2=φ(x1), etc., until (hopefully) a sufficiently accurate approximation to the root is obtained. This repetitive process, based on xn+1 = φ(xn) is called simple iteration; provided that |xn+1 - xn| decreases as n increases, the process tends to α = φ(α), where α denotes the root. 2. Example Use simple iteration to find the root of the equation 3xex = 1 to an accuracy of 4D. One first writes x = e-x/3 = φ(x). 37 | P a g e Assuming x0 = 1, successive iterations yield x1 = 0.12263, x2 = 0.29486, x3 = 0.24821, x4 = 0.26007, x5 = 0.25700, x6 = 0.25779, x7 = 0.25759, x8 = 0.25764. Thus, we see that after eight iterations the root is 0.2576 to 4D. A graphical interpretation of the first three iterations is shown in Fig. 8. 3. Convergence Whether or not an iteration procedure converges or indeed at all, depends on the choice of the function φ(x) as well as the starting value x0. For example, the equation x² = 3 has two real roots ±√3(= ±1.732).It can be given the form x = 3/x = φ(x) which suggests the iteration xn+1 = 3/xn. However, if the starting value x0 = 1 is used, the successive iterations yield 3 3 3 𝑥𝑥1 = = 3 , 𝑥𝑥2 = = 1 , 𝑥𝑥3 = = 3 , 𝑒𝑒𝑒𝑒𝑒𝑒! 𝑥𝑥0 𝑥𝑥1 𝑥𝑥2 We can examine the convergence of the iteration process x = φ (xn) to α = φ (α) with the aid of the Taylor series ∅(𝛼𝛼) = ∅(𝑥𝑥𝑘𝑘 ) + (𝛼𝛼 − 𝑥𝑥𝑘𝑘 )∅′ (𝜁𝜁𝑘𝑘 ), 𝑘𝑘 = 0,1, … , 𝑛𝑛, where ζk is a point between the root and the approximation xk. We have 38 | P a g e 𝛼𝛼 − 𝑥𝑥1 = ∅(𝛼𝛼) − ∅(𝑥𝑥0 ) = (𝛼𝛼 − 𝑥𝑥0 )∅′ (𝜁𝜁0 ) 𝛼𝛼 − 𝑥𝑥2 = ∅(𝛼𝛼) − ∅(𝑥𝑥1 ) = (𝛼𝛼 − 𝑥𝑥1 )∅′ (𝜁𝜁1 )...... 𝛼𝛼 − 𝑥𝑥𝑛𝑛+1 = ∅(𝛼𝛼) − ∅(𝑥𝑥𝑛𝑛 ) = (𝛼𝛼 − 𝑥𝑥 𝑛𝑛 )∅′ (𝜁𝜁𝑛𝑛 ) Multiplying the n + 1 rows together and cancelling the common factors α − x1, α − x2, ··· , α − xn leaves |𝛼𝛼 − 𝑥𝑥𝑛𝑛+1 | = |𝛼𝛼 − 𝑥𝑥0 ||∅′ (𝜁𝜁0 )||∅′ (𝜁𝜁1 )| … |∅′ (𝜁𝜁𝑛𝑛 )| , whence |𝛼𝛼 − 𝑥𝑥𝑛𝑛+1 | = |𝛼𝛼 − 𝑥𝑥0 ||∅′ (𝜁𝜁0 )||∅′ (𝜁𝜁1 )| … |∅′ (𝜁𝜁𝑛𝑛 )| , so that the absolute error |α − xn+1| can be made as small as we please by sufficient iteration if |φ '|< 1 in the neighbourhood of the root. Note that φ(x) = 3/x has derivative |φ '(x)| = |-3/x²| > 1 for |x| < 3½. 1. Assuming x0 = 1, show by simple iteration that one root of the equation 2x - 1 -2sinx = 0 is 1.4973. 2. Use simple iteration to find (to 4D) the root of the equation x + cos x = 0. CHAPTER 3 FINITE DIFFERENCES 1 Tables Historically speaking, numerical analysts have always been concerned with tables of numbers, and many techniques have been developed for dealing with mathematical functions, represented in this way. For example, the value of the function at an untabulated point may be required, so that a interpolation is necessary. It is also possible to estimate the derivative or the definite integral of a tabulated function, using some finite processes to 39 | P a g e approximate the corresponding (infinitesimal) limiting procedures of calculus. In each case, it has been traditional to use finite differences. Another application of finite differences, which is outside the scope of this book, is the numerical solution of partial differential equations. 1. Tables of values Many books contain tables of mathematical functions. One of the most comprehensive is the Handbook of Mathematical Functions, edited by Abramowitz and Stegun (see the Bibliography for publication details), which also contains useful information about numerical methods. Although most tables use constant argument intervals, some functions do change rapidly in value in particular regions of their argument, and hence may best be tabulated using intervals varying according to the local behaviour of the function. Tables with varying argument intervals are more difficult to work with, however, and it is common to adopt uniform argument intervals wherever possible. As a simple example, consider the 6S table of the exponential function over 0.10 (0.01 ) 0.14 (a notation which specifies the domain 0.10 𝑥𝑥 𝑓𝑓(𝑥𝑥) = 𝑒𝑒 𝑥𝑥 0.10 1.10517 0.11 1.11628 0.12 1.12750 0.13 1.13883 0.14 1.15027 It is extremely important that the interval between successive values is small enough to display the variation of the tabulated function, because usually the value of the function will be needed at some argument value between values specified (for example, 𝒆𝒆𝒙𝒙 𝒂𝒂𝒂𝒂 𝒙𝒙 = 𝟎𝟎. 𝟏𝟏𝟏𝟏𝟏𝟏 from the above table). If the table is constructed in this manner, we can obtain such intermediate values to a reasonable accuracy by using a polynomial representation (hopefully, of low degree) of the function f. 2. Finite differences 40 | P a g e Since Newton, finite differences have been used extensively. The construction of a table of finite differences for a tabulated function is simple: One obtains first differences by subtracting each value from the succeeding value in the table, second differences by repeating this operation on the first differences, and so on for higher order differences. From the above table of 𝑒𝑒 𝑥𝑥 𝑓𝑓𝑓𝑓𝑓𝑓 𝑥𝑥 = 0.10(0.01)0.14 one has the (note the standard layout, with decimal points and leading zeros omitted from the differences): Differences 𝑥𝑥 𝑓𝑓(𝑥𝑥) = 𝑒𝑒 𝑥𝑥 1st 2nd 3rd 0.10 1.10517 1111 0.11 1.11628 11 1122 0 0.12 1.12750 11 1133 0 0.13 1.13883 11 1144 1 0.14 1.15027 12 1156 1 0.15 1.16183 12 1168 −1 0.16 1.17351 11 1179 2 0.17 1.18530 13 1192 0.18 1.19722 41 | P a g e (In this case, the differences must be multiplied by 10-5 for comparison with the function values.) 3. Influence of round-off errors Consider the difference table given below for 𝑓𝑓(𝑥𝑥) = 𝑒𝑒 𝑥𝑥 : 0.1(0.05)0.5 to 6S, constructed as in Section 2. As before, differences of increasing order decrease rapidly in magnitude, but the third differences are irregular. This is largely a consequence of round-off errors, as tabulation of the function to 7S and differencing to fourth order illustrates (compare Exercise 3 ). Differences 𝑥𝑥 𝑓𝑓(𝑥𝑥) = 𝑒𝑒 𝑥𝑥 1st 2nd 3rd 0.10 1.10517 5666 0.15 1.16183 291 5957 15 0.20 1.22140 306 6263 14 0.25 1.28403 320 6583 18 0.30 1.34986 338 6921 16 0.35 1.41907 354 7275 20 0.40 1.49182 374 7649 18 0.45 1.56831 392 8041 42 | P a g e 0.50 1.64872 Although the round-off errors in f should be less than 1/2 in the last significant place, they may accumulate; the greatest error that can be obtained corresponds to: Differences Tabular error 1st 2nd 3rd 4th 5th 6th 1 + 2 −1 +2 1 − 2 -4 +1 +8 1 -2 + 2 −1 +4 -16 +2 -8 +32 1 − 2 +1 -4 +16 1 -2 +8 + 2 −1 +4 1 − 2 +2 +1 1 + 2 A rough working criterion for the expected fluctuations (noise level) due to round-off error is shown in the table: 43 | P a g e Order of differences 1 2 3 4 5 6 Expected error limits ±1 ±2 ±3 ±6 ±12 ±22 EXERCISES 1. Construct the difference table for the function f (x) = x3 for x = 0(1) 6. 2. Construct difference tables for each of the polynomials: a) 2𝑥𝑥 − 1 for 𝑥𝑥 = 0(1)3. b) 3𝑥𝑥 2 + 2𝑥𝑥 − 4 for 𝑥𝑥 = 0(1)4. c) 2𝑥𝑥 3 + 3𝑥𝑥 − 3 for 𝑥𝑥 = 0(1)5. 3.Construct a difference table for the function 𝑓𝑓(𝑥𝑥) = 𝑒𝑒 𝑥𝑥 , given to 7D for x = 0.1(0.05) 0.5 𝑥𝑥 𝑓𝑓(𝑥𝑥) 𝑥𝑥 𝑓𝑓(𝑥𝑥) 𝑥𝑥 𝑓𝑓(𝑥𝑥) 0.10 1.105171 0.25 1.284025 0.40 1.491825 0.15 1.161834 0.30 1.349859 0.45 1.568312 0.20 1.221403 0.35 1.419068 0.50 1.648721 FINITE DIFFERENCES 2 Forward, backward, central difference notations There are several different notations for the single set of finite differences, described in the preceding Step. We introduce each of these three notations in terms of the so-called shift operator, which we will define first. 1. The shift operator E Let 𝑓𝑓𝑗𝑗 ≡ 𝑓𝑓 𝑥𝑥𝑗𝑗 , 𝑤𝑤ℎ𝑒𝑒𝑒𝑒𝑒𝑒 𝑥𝑥𝑗𝑗 = 𝑥𝑥0 + 𝑗𝑗ℎ , 𝑗𝑗 = 0,1,2, … , 𝑛𝑛. be a set of values of the function f(x) The shift operator E is defined by: 44 | P a g e 𝐸𝐸𝑓𝑓𝑗𝑗 ≡ 𝑓𝑓𝑗𝑗+1. Consequently, 𝐸𝐸 2 𝑓𝑓𝑗𝑗 = 𝐸𝐸 𝐸𝐸𝑓𝑓𝑗𝑗 = 𝐸𝐸𝑓𝑓𝑗𝑗 +1 = 𝑓𝑓𝑗𝑗 +2. and so on, i.e., 𝐸𝐸 𝑘𝑘 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗+𝑘𝑘 , where k is any positive integer. Moreover, the last formula can be extended to negative integers, and indeed to all real values of j and k, so that, for example, 𝐸𝐸 −1 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗−1, And 1 1 1 𝐸𝐸 2 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗+1 = 𝑓𝑓 𝑥𝑥𝑗𝑗 + 2 ℎ = 𝑓𝑓 𝑥𝑥0 + 𝑗𝑗 + 2 ℎ. 2 2. The forward difference operator ∆ If we define the forward difference operator ∆ by ∆≡ 𝐸𝐸 − 1 then ∆𝑓𝑓𝑗𝑗 = (𝐸𝐸 − 1)𝑓𝑓𝑗𝑗 = 𝐸𝐸𝑓𝑓𝑗𝑗 − 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗+1 − 𝑓𝑓𝑗𝑗 , which is the first-order forward difference at xj. Similarly, we find that ∆2 𝑓𝑓𝑗𝑗 = ∆ ∆𝑓𝑓𝑗𝑗 = ∆𝑓𝑓𝑗𝑗+1 − ∆𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗+2 − 2𝑓𝑓𝑗𝑗+1 + 𝑓𝑓𝑗𝑗 is the second-order forward difference at xj, and so on. The forward difference of order k is 45 | P a g e ∆𝑘𝑘 𝑓𝑓𝑗𝑗 = ∆𝑘𝑘−1 ∆𝑓𝑓𝑗𝑗 = ∆𝑘𝑘−1 𝑓𝑓𝑗𝑗+1 − 𝑓𝑓𝑗𝑗 = ∆𝑘𝑘−1 𝑓𝑓𝑗𝑗+1 − ∆𝑘𝑘−1 𝑓𝑓𝑗𝑗 where k is any integer. 3. The backward difference operator 𝛁𝛁 If we define the backward difference operator ∇ by ∇≡ 1 − 𝐸𝐸 −1 , then ∇𝑓𝑓𝑗𝑗 = (1 − 𝐸𝐸 −1 )𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗 − 𝐸𝐸 −1 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗 − 𝑓𝑓𝑗𝑗−1 which is the first-order backward difference at xj. Similarly, ∇2 𝑓𝑓𝑗𝑗 = ∇ ∇𝑓𝑓𝑗𝑗 = ∇𝑓𝑓𝑗𝑗 − ∇𝑓𝑓𝑗𝑗−1 = 𝑓𝑓𝑗𝑗 − 2𝑓𝑓𝑗𝑗−1 + 𝑓𝑓𝑗𝑗−2 is the second-order backward difference at xj, etc. The backward difference of order k is ∇𝑘𝑘 𝑓𝑓𝑗𝑗 = ∇𝑘𝑘−1 ∇𝑓𝑓𝑗𝑗 = ∇𝑘𝑘−1 𝑓𝑓𝑗𝑗 − 𝑓𝑓𝑗𝑗−1 = ∇𝑘𝑘−1 𝑓𝑓𝑗𝑗 − ∇𝑘𝑘−1 𝑓𝑓𝑗𝑗−1 where k is any integer. Note that ∇𝑓𝑓𝑗𝑗 = ∆𝑓𝑓𝑗𝑗−1 and ∇𝑘𝑘 𝑓𝑓𝑗𝑗 = ∆𝑘𝑘 𝑓𝑓𝑗𝑗−𝑘𝑘. 4. The central difference operator 𝜹𝜹 If we define the central difference operator 𝜹𝜹 by 1 1 𝛿𝛿 ≡ 𝐸𝐸 2 − 𝐸𝐸 −2 then 1 1 1 1 𝛿𝛿𝑓𝑓𝑗𝑗 = 𝐸𝐸2 − 𝐸𝐸−2 𝑓𝑓𝑗𝑗 = 𝐸𝐸 2 𝑓𝑓𝑗𝑗 − 𝐸𝐸 −2 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗+1 − 𝑓𝑓𝑗𝑗−1 2 2 46 | P a g e which is the first-order central difference at xj. Similarly, 𝛿𝛿 2 𝑓𝑓𝑗𝑗 = 𝛿𝛿 𝛿𝛿𝑓𝑓𝑗𝑗 = 𝛿𝛿 𝑓𝑓 1 − 𝑓𝑓 1 = 𝑓𝑓𝑗𝑗+1 − 2𝑓𝑓𝑗𝑗 + 𝑓𝑓𝑗𝑗−1 𝑗𝑗+ 𝑗𝑗− 2 2 is the second-order central difference at xj, etc. The central difference of order k is 𝑘𝑘−1 𝛿𝛿 𝑘𝑘 𝑓𝑓𝑗𝑗 = 𝛿𝛿 𝑘𝑘−1 𝛿𝛿𝑓𝑓𝑗𝑗 = 𝛿𝛿 𝑘𝑘−1 𝑓𝑓 1 − 𝑓𝑓 1 = 𝛿𝛿 𝑘𝑘−1 𝑓𝑓 1 − 𝛿𝛿 𝑓𝑓 1 𝑗𝑗+ 𝑗𝑗− 𝑗𝑗+ 𝑗𝑗− 2 2 2 2 where k is any integer. Note that 𝛿𝛿 𝑘𝑘 𝑓𝑓𝑗𝑗+1 = ∆𝑓𝑓𝑗𝑗 = ∇𝑓𝑓𝑗𝑗+1. 2 5. Differences display The role of the forward, central, and backward differences is displayed by the difference table: Differences 𝑥𝑥 𝑓𝑓(𝑥𝑥) 1st 2nd 3rd 4th 𝑥𝑥0 𝑓𝑓0 ∆𝑓𝑓0 𝑥𝑥1 𝑓𝑓1 ∆2 𝑓𝑓0 ∆𝑓𝑓1 ∆3 𝑓𝑓0 𝑥𝑥2 𝑓𝑓2 ∆2 𝑓𝑓1 ∆4 𝑓𝑓0 ∆𝑓𝑓2 𝑥𝑥3 𝑓𝑓3 ∆3 𝑓𝑓1 ∆2 𝑓𝑓2 𝑥𝑥4 𝑓𝑓4 ∆𝑓𝑓3 47 | P a g e ⋮ ⋮ 𝑥𝑥𝑗𝑗−2 𝑓𝑓𝑗𝑗−2 𝛿𝛿𝑓𝑓𝑗𝑗−3 2 𝑥𝑥𝑗𝑗−1 𝑓𝑓𝑗𝑗−1 𝛿𝛿 2 𝑓𝑓𝑗𝑗−1 1 𝛿𝛿𝑓𝑓𝑗𝑗−1 𝛿𝛿 3 𝑓𝑓𝑗𝑗− 2 2 𝑥𝑥𝑗𝑗 𝑓𝑓𝑗𝑗 𝛿𝛿 2 𝑓𝑓𝑗𝑗 𝛿𝛿 4 𝑓𝑓𝑗𝑗 𝛿𝛿𝑓𝑓𝑗𝑗+1 𝛿𝛿 3 𝑓𝑓𝑗𝑗+1 𝑥𝑥𝑗𝑗+1 𝑓𝑓𝑗𝑗+1 2 2 𝛿𝛿 2 𝑓𝑓𝑗𝑗+1 𝑥𝑥𝑗𝑗+2 𝑓𝑓𝑗𝑗+2 𝛿𝛿𝑓𝑓𝑗𝑗+3 2 ⋮ ⋮ 𝑥𝑥𝑛𝑛−4 𝑓𝑓𝑛𝑛−4 𝑥𝑥𝑛𝑛−3 𝑓𝑓𝑛𝑛−3 𝑥𝑥𝑛𝑛−2 𝑓𝑓𝑛𝑛−2 ∇𝑓𝑓𝑛𝑛−3 ∇2 𝑓𝑓𝑛𝑛−2 𝑥𝑥𝑛𝑛−1 𝑓𝑓𝑛𝑛−1 ∇𝑓𝑓𝑛𝑛−2 ∇3 𝑓𝑓𝑛𝑛−1 ∇2 𝑓𝑓𝑛𝑛−1 ∇4 𝑓𝑓𝑛𝑛 𝑥𝑥𝑛𝑛 𝑓𝑓𝑛𝑛 ∇𝑓𝑓𝑛𝑛−1 ∇3 𝑓𝑓𝑛𝑛 2 ∇ 𝑓𝑓𝑛𝑛 48 | P a g e ∇𝑓𝑓𝑛𝑛 Although forward, central, and backward differences represent precisely the same data: 1. Forward differences are useful near the start of a table, since they only involve tabulated function values below xj ; 2. Central differences are useful away from the ends of a table, where there are available tabulated function values above and below xj; 3. Backward differences are useful near the end of a table, since they only involve tabulated function values above xj. EXERCISES 1. Construct a table of differences for the polynomial 𝑓𝑓(𝑥𝑥) = 3𝑥𝑥 3 − 2𝑥𝑥 2 + 𝑥𝑥 + 5; for x = 0(1)4. Use the table to obtain the values of : 𝑎𝑎) ∆𝑓𝑓1, ∆2 𝑓𝑓1 , ∆3 𝑓𝑓1 , ∆3 𝑓𝑓0 , ∆2 𝑓𝑓2.; b) ∇𝑓𝑓1 , ∇𝑓𝑓2 , ∇2 𝑓𝑓2 , ∇2 𝑓𝑓3 , ∇3 𝑓𝑓4.; 𝑐𝑐) 𝛿𝛿𝑓𝑓1 , 𝛿𝛿 2 𝑓𝑓1 , 𝛿𝛿 3 𝑓𝑓3 , 𝛿𝛿 3 𝑓𝑓5 , 𝛿𝛿 2 𝑓𝑓2. 2 2 2 2.For the difference table of f (x) = ex for x = 0.1(0.05)0.5 determine to six significant digits the quantities (taking x0 = 0.1 ): 𝑎𝑎) ∆𝑓𝑓2, ∆2 𝑓𝑓2 , ∆3 𝑓𝑓2 , ∆4 𝑓𝑓2.; b) ∇𝑓𝑓6 , ∇2 𝑓𝑓6 , ∇3 𝑓𝑓6 , ∇4 𝑓𝑓6.; 49 | P a g e 𝑐𝑐) 𝛿𝛿 2 𝑓𝑓4 , 𝛿𝛿 4 𝑓𝑓4.; 𝑑𝑑) ∆2 𝑓𝑓1 , 𝛿𝛿 2 𝑓𝑓2 , ∇2 𝑓𝑓3.; 𝑒𝑒) ∆3 𝑓𝑓3 , ∇3 𝑓𝑓6 , 𝛿𝛿 3 𝑓𝑓9. 2 3.Prove the statements: 𝑎𝑎) 𝐸𝐸𝑥𝑥𝑗𝑗 = 𝑥𝑥𝑗𝑗+1.; 𝑏𝑏) ∆3 𝑓𝑓𝑗𝑗 = 𝑓𝑓𝑗𝑗+3 − 3𝑓𝑓𝑗𝑗+2 + 3𝑓𝑓𝑗?