Calculus Concepts and Methods PDF
Document Details
Year: 2012
Authors: Ken Binmore and Joan Davies
Summary
This textbook, Calculus Concepts and Methods, provides a comprehensive treatment of calculus, extending beyond introductory material to functions of several variables, optimisation, and more advanced topics. It is written for university-level students who are already familiar with fundamental calculus principles.
Full Transcript
Calculus

KEN BINMORE held the Chair of Mathematics for many years at the London School of Economics, where this book was born. In recent years, he has used his mathematical skills in developing the theory of games in a number of posts as a professor of economics, both in Britain and the USA. His most recent exploit was to head the team that used mathematical ideas to design the UK auction of telecom frequencies that made 35 billion dollars for the British taxpayer.

JOAN DAVIES has a doctorate in mathematics from the University of Oxford. She is committed to bringing an understanding of mathematics to a wider audience. She currently lectures at the London School of Economics.

Dedications
To Jim Clunie
Ken Binmore
To my family Roy, Elizabeth, Sarah and Marion
Joan Davies

Calculus
Ken Binmore, University College, London
Joan Davies, London School of Economics

CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521775410
© Cambridge University Press 2001
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2001; 7th printing 2012
Printed and bound in the United Kingdom by the MPG Books Group
A catalogue record of this publication is available from the British Library
ISBN 978-0-521-77541-0 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface
Acknowledgements

1 Matrices and vectors
1.1 Matrices
1.2 Exercises
1.3 Vectors in ℝ²
1.4 Exercises
1.5 Vectors in ℝ³
1.6 Lines
1.7 Planes
1.8 Exercises
1.9 Vectors in ℝⁿ
1.10 Flats
1.11 Exercises
1.12 Applications (optional)
1.12.1 Commodity bundles
1.12.2 Linear production models
1.12.3 Price vectors
1.12.4 Linear programming
1.12.5 Dual problem
1.12.6 Game theory

2 Functions of one variable
2.1 Intervals
2.2 Real valued functions of one real variable
2.3 Some elementary functions
2.3.1 Power functions
2.3.2 Exponential functions
2.3.3 Trigonometric functions
2.4 Combinations of functions
2.5 Inverse functions
2.6 Inverses of the elementary functions
2.6.1 Root functions
2.6.2 Exponential and logarithmic functions
2.7 Derivatives
2.8 Existence of derivatives
2.9 Derivatives of inverse functions
2.10 Calculation of derivatives
2.10.1 Derivatives of elementary functions and their inverses
2.10.2 Derivatives of combinations of functions
2.11 Exercises
2.12 Higher order derivatives
2.13 Taylor series for functions of one variable
2.14 Conic sections
2.15 Exercises

3 Functions of several variables
3.1 Real valued functions of two variables
3.1.1 Linear and affine functions
3.1.2 Quadric surfaces
3.2 Partial derivatives
3.3 Tangent plane
3.4 Gradient
3.5 Derivative
3.6 Directional derivatives
3.7 Exercises
3.8 Functions of more than two variables
3.8.1 Tangent hyperplanes
3.8.2 Directional derivatives
3.9 Exercises
3.10 Applications (optional)
3.10.1 Indifference curves
3.10.2 Profit maximisation
3.10.3 Contract curve

4 Stationary points
4.1 Stationary points for functions of one variable
4.2 Optimisation
4.3 Constrained optimisation
4.4 The use of computer systems
4.5 Exercises
4.6 Stationary points for functions of two variables
4.7 Gradient and stationary points
4.8 Stationary points for functions of more than two variables
4.9 Exercises

5 Vector functions
5.1 Vector valued functions
5.2 Affine functions and flats
5.3 Derivatives of vector functions
5.4 Manipulation of vector derivatives
5.5 Chain rule
5.6 Second derivatives
5.7 Taylor series for scalar valued functions of n variables
5.8 Exercises

6 Optimisation of scalar valued functions
6.1 Change of basis in quadratic forms
6.2 Positive and negative definite
6.3 Maxima and minima
6.4 Convex and concave functions
6.5 Exercises
6.6 Constrained optimisation
6.7 Constraints and gradients
6.8 Lagrange's method – optimisation with one constraint
6.9 Lagrange's method – general case
6.10 Constrained optimisation – analytic criteria
6.11 Exercises
6.12 Applications (optional)
6.12.1 The Nash bargaining problem
6.12.2 Inventory control
6.12.3 Least squares analysis
6.12.4 Kuhn–Tucker conditions
6.12.5 Linear programming
6.12.6 Saddle points

7 Inverse functions
7.1 Local inverses of scalar valued functions
7.1.1 Differentiability of local inverse functions
7.1.2 Inverse trigonometric functions
7.2 Local inverses of vector valued functions
7.3 Coordinate systems
7.4 Polar coordinates
7.5 Differential operators
7.6 Exercises
7.7 Application (optional): contract curve

8 Implicit functions
8.1 Implicit differentiation
8.2 Implicit functions
8.3 Implicit function theorem
8.4 Exercises
8.5 Application (optional): shadow prices

9 Differentials
9.1 Matrix algebra and linear systems
9.2 Differentials
9.3 Stationary points
9.4 Small changes
9.5 Exercises
9.6 Application (optional): Slutsky equations

10 Sums and integrals
10.1 Sums
10.2 Integrals
10.3 Fundamental theorem of calculus
10.4 Notation
10.5 Standard integrals
10.6 Partial fractions
10.7 Completing the square
10.8 Change of variable
10.9 Integration by parts
10.10 Exercises
10.11 Infinite sums and integrals
10.12 Dominated convergence
10.13 Differentiating integrals
10.14 Power series
10.15 Exercises
10.16 Applications (optional)
10.16.1 Probability
10.16.2 Probability density functions
10.16.3 Binomial distribution
10.16.4 Poisson distribution
10.16.5 Mean
10.16.6 Variance
10.16.7 Standardised random variables
10.16.8 Normal distribution
10.16.9 Sums of random variables
10.16.10 Cauchy distribution
10.16.11 Auctions

11 Multiple integrals
11.1 Introduction
11.2 Repeated integrals
11.3 Change of variable in multiple integrals
11.4 Unbounded regions of integration
11.5 Multiple sums and series
11.6 Exercises
11.7 Applications (optional)
11.7.1 Joint probability distributions
11.7.2 Marginal probability distributions
11.7.3 Expectation, variance and covariance
11.7.4 Independent random variables
11.7.5 Generating functions
11.7.6 Multivariate normal distributions

12 Differential equations of order one
12.1 Differential equations
12.2 General solutions of ordinary equations
12.3 Boundary conditions
12.4 Separable equations
12.5 Exact equations
12.6 Linear equations of order one
12.7 Homogeneous equations
12.8 Change of variable
12.9 Identifying the type of first order equation
12.10 Partial differential equations
12.11 Exact equations and partial differential equations
12.12 Change of variable in partial differential equations
12.13 Exercises

13 Complex numbers
13.1 Quadratic equations
13.2 Complex numbers
13.3 Modulus and argument
13.4 Exercises
13.5 Complex roots
13.6 Polynomials
13.7 Elementary functions
13.8 Exercises
13.9 Applications (optional)
13.9.1 Characteristic functions
13.9.2 Central limit theorem

14 Linear differential and difference equations
14.1 The operator P(D)
14.2 Difference equations and the shift operator E
14.3 Linear operators
14.4 Homogeneous, linear, differential equations
14.5 Complex roots of the auxiliary equation
14.6 Homogeneous, linear, difference equations
14.7 Nonhomogeneous equations
14.7.1 Nonhomogeneous differential equations
14.7.2 Nonhomogeneous difference equations
14.8 Convergence and divergence
14.9 Systems of linear equations
14.10 Change of variable
14.11 Exercises
14.12 The difference operator (optional)
14.13 Exercises
14.14 Applications (optional)
14.14.1 Cobweb models
14.14.2 Gambler's ruin

Answers to starred exercises with some hints and solutions
Appendix
Index
Revision material
More challenging material

Preface

Ancient accountants laid pebbles in columns on a sand tray to help them do their sums. It is thought that the impression left in the sand when a pebble is moved to another location is the origin of our symbol for zero. The word calculus has the same source, since it means a pebble in Latin. Nowadays it means any systematic way of working out something mathematical. We still speak of a calculator when referring to the modern electronic equivalent of an ancient sand tray and pebbles. However, since Isaac Newton invented the differential and integral calculus, the word is seldom applied to anything else. Although there are pebbles on its cover, this book is therefore about differentiating and integrating. Students who don't already know what derivatives and integrals are would be wise to start with another book. Our aim is to go beyond the first steps to discuss how calculus works when it is necessary to cope with several variables all at once.
We appreciate that some readers will be rusty on the basics, and others will be doubtful that they ever really understood what they can remember. We therefore go over the material on the calculus of one variable in a manner that we hope will offer some new insights even to those rare souls who feel confident of their mathematical prowess. However, we strongly recommend against using this material as a substitute for a first course in calculus. It goes too fast and offers too much detail to be useful for this purpose. It should be emphasised that this is not a cook book containing a menu of formulas that students are expected to learn by rote in order to establish their erudition at examination time. We see no point in turning out students who can write down the formal derivation of the Slutsky equations, but have no idea what the mathematical manipulations they have learned to reproduce actually mean. When one teaches how things are done without explaining why, one does worse than fill the heads of the weaker students with mumbo-jumbo, one teaches the stronger students something very wrong – that mathematics is a list of theorems and proofs that have no practical relevance to anything real. The attitude that mathematics is a menu of formulas that ordinary mortals can only admire from afar is very common among those who know no mathematics at all. Research mathematicians are often greeted with incredulity when they say what they do for a living. People think that inventing a new piece of mathematics would be like inventing a new commandment to be added to the ten that Moses brought down from the mountain. Such awe of mathematics creates a form of hysterical paralysis that must be overcome before a student can join the community of those of us who see mathematics as an ever changing box of tools that educated people can use to make sense of the world around them. Within this community, a model is not expressed in mathematical form to invite the applause of those who are easily impressed, or to obfuscate the issues in order to immunise the model from criticisms by uninitiated outsiders. Instead the community we represent is always anxious to find the simplest possible model that captures how a particular aspect of some physical or social process works. For us, mathematical sophistication is pointless unless it serves to demystify things that we would not otherwise be able to understand. We do not see mathematical modelling as some grandiose activity that can only be carried out by professors at the blackboard. Mathematical modelling is what everybody should do when seeking to make sense of a problem. Of course, beginners will only be able to construct very simple models – but a good teacher will only ask them to solve very simple problems. Such an attitude to solving problems is not possible with students whose intellectual processes freeze over at the mention of an equation. The remedy for this species of mathematical paralysis lies in teaching that mathematics is something one does – not something that one just appreciates. Rather than offering them a cook book, one needs to teach students to put together simple recipes of their own. We need to build their confidence in their own ability to think coherent mathematical thoughts all by themselves. Such confidence comes from involving students in the mathematics as it is developed, using the traditional method of demanding weekly answers to carefully chosen sets of problems. The problems must not be too hard – but nor must they be too easy. 
Nobody gets their confidence boosted by being asked to jump through hoops held too low. On the contrary, if we only ask students to solve problems that they can see are trivial, we merely confirm to them that their own low opinion of their mathematical ability is shared by their teacher. Some hoops have to be held high enough that students get to feel they have achieved something by jumping through them. We feel particularly strongly on this latter point, having watched the confidence of our student intake gradually diminish over the years as cook book teaching has taken over our schools under the pretence that old fashioned rote learning is being replaced by progressive methods that emphasise the underlying concepts. As this successor to the first version of Calculus shows, mathematicians don’t mind adjusting the content of their courses as school syllabuses develop over time. We are even willing to welcome less mathematics being taught in school if this means that more children become numerate. But the kind of cook book teaching that leaves students helpless if a problem does not fit one of a small number of narrow categories seems to us inexcusable. Our hostility to cook book teaching should not be taken to imply that this book is a rigorous work of mathematical analysis. It is a how to do it book, which contains no formal theorems at all. But we always explain why the methods work, because there is no way anybody can know how to use a method to tackle a new kind of problem unless they know why the method works. Our approach to explaining why a method works is largely geometrical. To this end, the book contains an unusually large number of diagrams – even more than in the first version. The availability of colour and computer programs like Mathematica means that the diagrams are also better. In addition, the already large number of examples and exercises has been augmented, with a view to increasing understanding by illustrating some of the things that can go wrong in cases that are usually passed over without comment. Finally, we include examples of how the mathematics we are teaching gets used in practice. The mathematics is the same wherever it is applied, but the applications to economics on which we concentrate are often particularly instructive because of the need to be especially careful about what is being kept constant during a differentiation. Students whose prime interest is in the hard sciences may find that these applications are a lot more fun than examples from physics that they will have seen in some form before. With our attitudes to the teaching of mathematics, it will come as no surprise that we view the exercises as an integral part of the text. There is no point in trying to read this or any other mathematics book without making a serious commitment to tackle as many of the exercises as time allows. Indeed, being able to solve a fair number of the exercises without assistance is the basic test of whether you understand the material. You may think you understand the concepts, but if you can’t do any of the exercises, then you don’t. You may think you don’t understand the concepts, but if you can do most of the exercises, then you do. Either way, you need to attempt the exercises to find out where you stand. A lot of work went into tailoring this successor to Calculus to the needs of today’s students. Our readers will have to work equally hard to enjoy its benefits. We hope that they will also share our feeling of having done something genuinely worthwhile. 
Ken Binmore Joan Davies Acknowledgements Joan Davies is grateful to the following: firstly to Michele Harvey for her support and formative suggestions during the early stages of the project, to Gillian Colkin and especially to David Cartwright for their invaluable comments on drafts of the manuscript and for their help in testing exercises, to Mark Baltovic for a student’s view of the book and to Dean Ives and James Ward for useful comments, to Roy Davies for help in coping with the idiosyncrasies of various software packages. Ken Binmore gratefully acknowledges the support of the Leverhulme Foundation and the Economic and Social Research Council. One Matrices and vectors This book takes for granted that readers have some previous knowledge of the calculus of real functions of one real variable. It would be helpful to also have some knowledge of linear algebra. However, for those whose knowledge may be rusty from long disuse or raw with recent acquisition, sections on the necessary material from these subjects have been included where appropriate. Although these revision sections (marked with the symbol ) are as self-contained as possible, they are not suitable for those who have no acquaintance with the topics covered. The material in the revision sections is surveyed rather than explained. It is suggested that readers who feel fairly confident of their mastery of this surveyed material scan through the revision sections quickly to check that the notation and techniques are all familiar before going on. Probably, however, there will be few readers who do not find something here and there in the revision sections which merits their close attention. The current chapter is concerned with the fundamental techniques from linear algebra which we shall be using. This will be particularly useful for those who may be studying linear algebra concurrently with the present text. Algebraists are sometimes neglectful of the geometric implications of their results. Since we shall be making much use of geometrical arguments, particular attention should therefore be paid to §1.3 onwards, in which the geometric relevance of various vector notions is explained. This material will be required in Chapter 3. Those who are not very confident of their linear algebra may prefer leaving §1.10 until Chapter 5. 1.1 Matrices A matrix is a rectangular array of numbers – a notation which enables calculations to be carried out in a systematic manner. We enclose the array in brackets as in the examples below: A matrix with m rows and n columns is called an m × n matrix. Thus A is a 3 × 2 matrix and B is a 2 × 3 matrix. A general m × n matrix may be expressed as where the first number in each subscript is the row and the second number in the subscript is the column. For example, c21 is the entry in the second row and the first column of the matrix C. Similarly, the entry in the third row and the first column of the preceding matrix A can be denoted by a31: We call the entries of a matrix scalars. Sometimes it is useful to allow the scalars to be complex numbers† but our scalars will always be real numbers. We denote the set of real numbers by. Scalar multiplication One can do a certain amount of algebra with matrices and under this and the next few headings we shall describe the mechanics of some of the operations which are possible. The first operation we shall consider is called scalar multiplication. If A is an m × n matrix and c is a scalar, then c A is the m × n matrix obtained by multiplying each entry of A by c. 
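The shape and indexing conventions are easy to check numerically. The short numpy sketch below uses illustrative entries (the particular matrices printed in the book are not reproduced here) to show an m × n shape, the row-then-column indexing of an entry such as a31, and scalar multiplication.

```python
import numpy as np

# Illustrative entries only; the book's matrices A and B are not reproduced here.
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # 3 rows, 2 columns: a 3 x 2 matrix
B = np.array([[1, 0, 2],
              [3, 1, 4]])     # 2 rows, 3 columns: a 2 x 3 matrix

print(A.shape, B.shape)       # (3, 2) (2, 3)

# The entry a31 sits in the third row and first column.
# numpy indexing starts at 0, so it is A[2, 0].
print(A[2, 0])                # 5

# Scalar multiplication: every entry of A is multiplied by the scalar c.
c = 2
print(c * A)                  # [[ 2  4] [ 6  8] [10 12]]
```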
For example, Similarly, Matrix addition and subtraction If C and D are two m × n matrices, then C + D is the m × n matrix obtained by adding corresponding entries of C and D. Similarly, C − D is the m × n matrix obtained by subtracting corresponding entries. For example, if then and Note that and that The final matrix is called the 3 × 3 zero matrix. We usually denote any zero matrix by 0. This is a little naughty because of the possibility of confusion with other zero matrices or with the scalar 0. However, it has the advantage that we can then write for any matrix C. Note that it makes no sense to try to add or subtract two matrices which are not of the same shape. Thus, for example, is an entirely meaningless expression. Matrix multiplication If A is an m × n matrix and B is an n × p matrix, then A and B can be multiplied to give an m × p matrix AB: To work out the entry c jk of AB which appears in its jth row and kth column, we require the jth row of A and the kth column of B as illustrated below. The entry c jk is then given by Example 1 We calculate the product AB of the matrices Since A is a 2 × 3 matrix and B is a 3 × 2 matrix, their product AB is a 2 × 2 matrix: To calculate c, we require the second row of A and the first column of B. These are indicated in the matrices below: We obtain Similarly, Thus Note again that it makes no sense to try to calculate AB unless the number of columns in A is the same as the number of rows in B. Thus, for example, it makes no sense to write Identity matrices An n × n matrix is called a square matrix for obvious reasons. Thus, for example, is a square matrix. The main diagonal of a square matrix is indicated by the shaded entries in the matrix below: The n × n identity matrix† is the n × n matrix whose main diagonal entries are all 1 and whose other entries are all 0. For convenience, we usually denote an identity matrix of any order by I. The 3 × 3 identity matrix is Generally, if I is the n × n identity matrix and A is an m × n matrix, B is an n × p matrix, then Example 2 Determinants With each square matrix there is associated a scalar called the determinant of the matrix. We shall denote the determinant of the square matrix A by det(A) or by |A|.‡ A 1 × 1 matrix A = (a) is just a scalar, and det(A) = a. The determinant of the 2 × 2 matrix is given by To calculate the determinant for larger matrices we need the concepts of a minor and a cofactor. The minor M corresponding to an entry a in a square matrix A is the determinant of the matrix obtained from A by deleting the row and the column containing a. In the case of a 3 × 3 matrix we calculate the minor M23 corresponding to the entry a23 by deleting the second row and the third column of A as below: The minor M23 is then the determinant of what remains – i.e. If we alter the sign of the minor M of a general n × n matrix according to its associated position in the checkerboard pattern illustrated below, the result is called the cofactor corresponding to the entry a. In the case of a 3 × 3 matrix A, the cofactor corresponding to the entry a23 is equal to −M23, since there is a minus sign, ‘−’, in the second row and the third column of the checkerboard pattern: The determinant of an n × n matrix A is calculated by multiplying each entry of one row (or one column) by its corresponding cofactor and adding the results. The value of det(A) is the same whichever row or column is used. Example 3 The determinant of the 1 × 1 matrix A = (3) is simply det(A) = 3. 
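Both the multiplication rule and the cofactor expansion lend themselves to a quick numerical check before Example 3 continues with larger matrices. The matrices in this sketch are illustrative choices rather than the ones printed in the text, and the recursive function simply mirrors the expansion along the first row; numpy's built-in determinant is used only as a cross-check.

```python
import numpy as np

# Illustrative 2 x 3 and 3 x 2 matrices; their product is 2 x 2.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])

C = A @ B
# Entry c21: second row of A combined with first column of B.
c21 = A[1, 0] * B[0, 0] + A[1, 1] * B[1, 0] + A[1, 2] * B[2, 0]
print(C, c21 == C[1, 0])      # ... True

def det_by_cofactors(M):
    """Determinant by cofactor expansion along the first row."""
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    total = 0.0
    for k in range(n):
        minor = np.delete(np.delete(M, 0, axis=0), k, axis=1)   # delete row 1 and column k+1
        total += (-1) ** k * M[0, k] * det_by_cofactors(minor)   # checkerboard sign
    return total

M = np.array([[2.0, 1.0, 3.0],
              [0.0, 4.0, 1.0],
              [5.0, 2.0, 2.0]])
print(det_by_cofactors(M))    # -43.0
print(np.linalg.det(M))       # the same value, up to rounding
```

Expanding along any other row or column of M gives the same value, which is the point of the rule stated above.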
The determinant of the 2 × 2 matrix is We find the determinant of the 3 × 3 matrix in two ways: first using row 1 and then using column 2. The entries of row 1 are We first find their corresponding minors, and then alter the signs according to the checkerboard pattern to obtain the corresponding cofactors, Then We calculate the determinant again, this time using column 2: The corresponding cofactors are Thus Inverse matrices We have dealt with matrix addition, subtraction and multiplication and found that these operations only make sense in certain restricted circumstances. The circumstances under which it is possible to divide by a matrix are even more restricted. We say that a square matrix A is invertible † if there is another matrix B such that In fact, if A is invertible there is precisely one such matrix B which we call the inverse matrix to A and write B = A−1. Thus an invertible matrix A has an inverse matrix A−1 which satisfies If A is an n × n matrix, then A−1 is an n × n matrix as well (otherwise the equation would make no sense). It can be shown that If A is not square or if A is square but its determinant is zero (i.e. A is not invertible), then A does not have an inverse in the above sense. Transpose matrices In describing how to calculate the inverse of an invertible matrix, we shall need the idea of a transpose matrix. This is also useful in other connections. If A is an m × n matrix, then its transpose AT is the n × m matrix whose first row is the first column of A, whose second row is the second column of A, whose third row is the third column of A and so on.‡ An important special case occurs when A is a square matrix for which A = AT. Such a matrix is called symmetric (about the main diagonal). Example 4 (i) Note that (ii) Thus A = AT and so A is symmetric. The cofactor method for A−1 The inverse of an invertible matrix A can be calculated from its determinant and its cofactors. In the case of 1×1 and 2×2 matrices, one might as well learn the resulting inverse matrix by heart. A 1 × 1 invertible matrix A = (a) is just a nonzero scalar and A 2 × 2 invertible matrix is one for which det(A) = ad − bc ≠ 0. Its inverse is given by The formulas can easily be confirmed by checking that AA−1 and A−1A actually are equal to I. For example in the 2 × 2 case The cofactor method gives an expression for the inverse of an invertible matrix A in terms of its cofactors. In the case of an invertible 3 × 3 matrix for which det(A) ≠ 0 the inverse is given by For a general invertible n × n matrix we have Thus, each entry of A is replaced by the corresponding minor and the sign is then altered according to the checkerboard pattern to obtain the corresponding cofactor. The inverse matrix A−1 is then obtained by multiplying the transpose of the result by the scalar (det(A))−1. Example 5 We calculate the inverses of the invertible matrices of Example 3. We begin by replacing each entry of A by the corresponding minor and obtain Next, the signs are altered according to the checkerboard pattern to yield the result We now take the transpose of this matrix – i.e. Finally, the inverse is obtained by multiplying by the scalar (det(A))−1 = Thus 1.2 Exercises 1. The matrices A, B, C and D are given by Decide which of the following expressions make sense, giving reasons. Evaluate those expressions which make sense. 2. The matrices A, B, C and D are given by Which of these square matrices are invertible? 
Find the inverse matrices of those which are invertible and check your answers by multiplying the inverses and the matrices from which they were obtained. 3. Evaluate the following determinants using the cofactor expansion along an appropriate row or column. * For what value(s) of t is the determinant (iii) equal to zero? 4 For what value(s) of t is the matrix not invertible? 5. Even if the expressions AB and BA both make sense, it is not necessarily true that AB = BA. Check this fact using the matrices A and B of Exercise 2. 6. It is always true that A(BC) = (AB)C, provided that each side makes sense. Check this result in the case when 7. It is always true that provided that A and B are n × n invertible matrices. Provided that AB makes sense, it is also always true that Check all these results in the case when The matrices need not be square for the result (AB)T = BT AT to hold. Verify this result for the matrices 8 If A and B are both n ×1 column matrices, verify that the matrix products AT B and BT A are both defined and find their orders. What is the relationship between AT B and BT A? Verify that the matrix products ABT and B AT are both defined and find their orders. What is the relationship between ABT and B AT? 9 If A is a 3 × 3 matrix with det(A) = 7, evaluate 10. Let A, B and C be n × n matrices with the property that AB = I and C A = I. Prove that B = C. 11 Show that the matrix has the property that A2 = 0. * Find the form of all 2 × 2 matrices with this property. 12. Let Show that there is no nonzero matrix C such that AC = BC. 13. Let Verify that AB = AC although B ≠ C. Verify also that the matrices B − C and A are both not invertible. 14 If A, B, C are nonzero square matrices with prove that the matrices B − C and A are both not invertible. 15 If A is an n × n matrix and X is an n × 1 nonzero column matrix with show, by assuming the contrary, that det(A) = 0. This is an important result that is used extensively. 16 Let A be an n × n matrix, X an n × 1 nonzero column matrix and λ a scalar. Show that the scalar equation which gives the values of λ that satisfy the matrix equation AX = λX is Write λX = λI X and use Exercise 15. 1.3 Vectors in 2 The set of real numbers can be represented as the points along a horizontal line. Alternatively, we can think of the real numbers as representing displacements along such a line. For example, the point 2 can be displaced to the point 5 by the operation of adding 3. Thus, 3 can represent a displacement of three units to the right. Similarly, −2 can represent a displacement of two units to the left. The set 2 consists of all 2 × 1 column matrices or vectors. If a vector v 2, it is therefore an object of the form where the two components 1 and 2 are real numbers, which we call scalars when vectors are also being discussed. To save space it is often convenient to use the transpose notation of §1.1, and write v = ( 1, 2)T. Just as a real number can be used to represent either the position of a point or a displacement along a line, so a two dimensional vector can be used to represent either the position of a point,† or a displacement in a plane. For this purpose, as well as a horizontal x axis pointing to the right, we need a vertical y axis pointing upwards. The cartesian plane is simply an ordinary plane equipped with such axes. For example, (3, 2)T can represent either the point obtained by displacing the origin three units horizontally to the right and two units vertically upwards, or it can represent the displacement itself. 
When thinking of a vector as a displacement, the arrow we use to represent the vector need not have its initial point at the origin. All arrows with the same length and direction, wherever they are located, represent the same displacement vector. To make sure that we do not confuse a vector with a scalar, it is common to print vectors in boldface type, and to underline them when writing by hand. The definitions of matrix addition and scalar multiplication given in §1.1 apply to vectors. Thus, as illustrated in the diagram, we define the vector addition of Observe that w is the displacement required to shift the point v to v + w. For obvious reasons, the rule for adding two vectors is called the parallelogram law, but it is often more useful to draw triangles than parallelograms, as when several vectors are added together. Two n × 1 column vectors cannot be multiplied as matrices (unless n = 1). But we can always multiply a matrix by a scalar. In particular, scalar multiplication of a vector by a scalar α is defined by By Pythagoras’ theorem, the length (or norm) of v is the scalar quantity Vectors u with u = 1 are called unit vectors. With the exception of the zero vector 0 = (0, 0)T, all vectors determine a direction. When using a nonzero vector v to specify only a direction, it is convenient to replace v by the unit vector which points in the direction of v. For example, the vector v = (3, 4)T is not a unit vector because Its direction can be given by the vector a unit vector which points in the same direction as v. Vectors w and v are parallel if w = αv for some scalar α with α ≠ 0. This implies that they have the same or opposite directions, but not necessarily the same length. The vector w − v is the vector from the point v to the point w. The distance between two points v and w is defined by E.g. if v = (3, 4)T and w = (4, 2)T then the distance between v and w is Although the matrix product of two vectors v and w makes no sense, we have This scalar quantity is called the scalar product† of v and w and written as e.g. It is easy to check that scalar products have the following properties:‡ (i) (ii) (iii) The angle between vectors The geometric interpretation of the scalar product gives rise to a formula for the angle between vectors. The cosine rule from trigonometry which states that is a generalisation of Pythagoras’ theorem, to which it reduces when C = π/2. Rewriting the cosine rule in terms of vectors, we obtain (1) where θ is the angle between v and w, chosen so that 0 ≤ θ ≤ π to avoid ambiguity. Now Therefore (2) Comparing (1) and (2) So θ can be calculated from the equation E.g. if θ is the angle between and Therefore θ = Nonzero vectors v and w are defined to be orthogonal (perpendicular or normal) if the angle between them is a right angle. Recall that cos θ = 0 for 0 ≤ θ ≤ π if and only if θ = π/2, so that the angle between v and w is a right angle. It follows that v and w are orthogonal if and only if v, w = 0. E.g. the vectors (1, 1)T and (−1, 1)T are orthogonal. To prove this, calculate the scalar product of the two vectors: It is often useful to split a vector v into two vectors, one of which is in the direction of a vector w and the other in a direction orthogonal to w. The vector v is then the sum of a ‘vector component along w’ and a ‘vector component orthogonal to w’. If θ is the angle between v and w, the (scalar) component of v in the direction of w is Observe that the component is negative when π/2 < θ ≤ π, because cos θ is negative in this range. 
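The length, distance, scalar product and angle formulas above translate directly into a few lines of numpy. This sketch reuses values that appear in the text: v = (3, 4)T and w = (4, 2)T for the distance, and (1, 1)T with (−1, 1)T for the orthogonality check.

```python
import numpy as np

v = np.array([3.0, 4.0])
w = np.array([4.0, 2.0])

length_v = np.linalg.norm(v)                  # ||v|| = 5
unit_v = v / length_v                         # (3/5, 4/5), a unit vector in the direction of v
dist_vw = np.linalg.norm(w - v)               # distance between the points v and w, sqrt(5)

# Angle from cos(theta) = <v, w> / (||v|| ||w||), with 0 <= theta <= pi.
cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)

# Orthogonality test: the scalar product is zero.
print(np.dot(np.array([1.0, 1.0]), np.array([-1.0, 1.0])))   # 0.0

print(length_v, unit_v, dist_vw, np.degrees(theta))
```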
Hence the vector component of v along w is Then the vector component of v orthogonal to w is To check that this vector is orthogonal to w, calculate their scalar product: E.g. the component of (1, 4)T along (1, 1)T is The vector component of (1, 4)T along (1, 1)T is The vector component of (1, 4)T orthogonal to (1, 1)T is 1.4 Exercises 1. Express the vectors in the diagram in the margin in component form. (i) Find their sum and verify your answer by drawing them on graph paper along the sides of a polygon. * Is the vector you obtain related to any of the given vectors? (ii) Find the end point of (7, 4)T if its initial point is (−2, 1)T. (iii)* Find the initial point of (7, 4)T if its end point is (3, −2)T. (iv) Find a vector of length one in the direction of the vector (−5, 12)T. (v) Find a vector of length 7 in the direction opposite to (−3, 4)T. 2. Let u = (2, 1)T, v = (8, −6)T and w = (−3, −4)T (i)* Draw the vectors u, v and w on graph paper. Find the angle between u and v and between v and w and use your drawing to verify your answers. (ii) Draw the vectors u + v, v − u, u + v + w and 2w − u − v. (iii)* Find the vectors 4(u + 5v) and w − 3(v − u) in component form. (iv)* Find u + v , u + v , 5 u + −2v and 5u − 2v. (v) Find the vector x that satisfies the equation u − 3x = v + 2w + x. (vi)* Find the scalar component of u along v. Express u as the sum of a vector along v and a vector perpendicular to v. (vii) Find the distance between the points v and w. Find also, the scalar product of v and w and the angle between them. 3 Show that the distance between two points v and w increases as the angle θ between the vectors v and w increases. Remember that 0 ≤ θ ≤ π. 4 Prove the three properties of the scalar product. 5 If a and b are given real numbers which satisfy the equation interpret the equation as the scalar product of two vectors, and write down at least two different pairs of vectors which are orthogonal. 6. Use vectors to prove the following elementary geometrical results: (i)* The line segment joining the midpoints of two sides of a triangle is parallel to the third side and equal to half of it. (ii) If one pair of opposite sides of a quadrilateral is parallel and equal, then so also is the other pair. (iii) The midpoints of the sides of any quadrilateral are the vertices of a parallelogram. 7. (i) If C is the midpoint of AB, find c in terms of a and b in the above figure. (ii)* If C divides AB in the ratio of m : n find c. 1.5 Vectors in 3 The set 3 consists of all 3 × 1 column matrices or vectors of the form where the components 1, 2 and 3 are scalars or real numbers. Three dimensional vectors can be used to represent either positions or displacements in space. For this purpose we need three mutually orthogonal axes. The way that these are usually drawn depends on the arms and body of a person when these point in orthogonal directions, as shown in the figure. The right arm then points in the direction of the first, or x axis. The left arm points in the direction of the second, or y axis. The head points in the direction of the third, or z axis. It is also conventional to use the right hand rule. If the index finger of the right hand points in the direction of the x axis and the middle finger in the direction of the y axis, the thumb then points in the direction of the z axis. As with vectors in 2, a three dimensional vector can be used to represent either a point or a displacement in 3. Vector addition and scalar multiplication are defined, just as in 2, by E.g. 
let v = (1, 2, 3)T and w = (2, 0, 5)T. Then Applying Pythagoras’ theorem twice, the length of v is and the distance between v and w is E.g. let v = (1, 2, 3)T and w = (2, 0, 5)T. Then the length of v is and the distance between v and w is The scalar product of v and w is defined by This has the same properties as the scalar product for two dimensional vectors. The angle between vectors is given by the same formula and the condition for vectors v and w to be orthogonal is similarly v, w = 0. E.g. find the cosine of the angle θ between the vectors v = (1, 2, 3)T and w = (2, 0, 5)T. We have Hence The vector v = (1, 1, 2)T is not a unit vector because But the vector is a unit vector which points in the same direction as v. The vectors v = (1, 2, 3)T and w = (−3, 3, −1)T are orthogonal because 1.6 Lines Let v 3, v ≠ 0. If t , tv is a vector in the direction of v if t > 0 and in the direction of −v if t < 0. As t varies over all real values, tv moves along the line drawn through the origin in the direction of v. So the equation represents all points on the line through the origin in the direction of v. If ξ and v 3 are constant, v ≠ 0, the equation Note that ξ and v are constant, while x and t are variable. represents all points on the line through ξ in the direction of v. This is called the parametric equation of the line. Each value of the parameter t yields a unique point on the line and vice versa. If v is a unit vector, then and so the distance between x and ξ is |t|. Writing the equation in terms of components, we obtain The corresponding scalar equations are If none of the i are zero, solving for t in each equation yields equations involving the vector components, known as cartesian equations: E.g. the equations of a line through (6, −2, 0)T in the direction (4, 3, −1)T are We can rewrite the equations as or in many other equivalent ways. A vector equation of the line is To find if the point (2, −5, 1)T lies on the line, see if there exists a t such that These three equations have a solution t = −1, so the point lies on the line. If the equations were inconsistent so that no solution could be found, we would conclude that (2, −5, 1)T does not lie on the line. Two lines that lie in the same plane are coplanar. Such lines either intersect or are parallel. Two lines that are not coplanar are skew. The angle between two intersecting lines x = ξ + tv and x = η + tu, is the angle between the vectors u and v. The lines intersect if and only if they have a point in common. This occurs when there exist scalars t1, t2 such that As t is a variable, the values it takes at the point of intersection will, in general, differ for the two lines. E.g. to find whether the lines x = (1, 0, 1)T+t (−1, −1, −1)T and x = t (2, 0, 1) T intersect, see if there exist scalars t1, t2 such that The equations are inconsistent, so there are no scalars t1 and t2 satisfying the above equations. Hence the lines do not intersect. A line through points ξ and η has direction η − ξ and so has equation E.g. the line passing through (1, 2, 3)T and (3, 2, 1)T has the form which can be expressed in the alternative form Do not try to write the equation y − 2 = t × 0 as (y − 2)/0 = t. It is not valid to divide by 0. It should be obvious that all of the above theory holds in 2, though of course, any two lines in 2 are always coplanar and there can be no skew lines in 2. 1.7 Planes Consider the equation where v is a fixed nonzero vector. In 2 the equation represents a line through the origin. 
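Both tests described above, whether a point lies on a line and whether two lines intersect, reduce to solving small linear systems for the parameter values. The sketch below uses the same data as the examples: the line through (6, −2, 0)T in the direction (4, 3, −1)T with candidate point (2, −5, 1)T, and the two lines that turn out not to intersect.

```python
import numpy as np

# Does (2, -5, 1)^T lie on the line through (6, -2, 0)^T in the direction (4, 3, -1)^T?
xi = np.array([6.0, -2.0, 0.0])
v = np.array([4.0, 3.0, -1.0])      # no component is zero, so componentwise division is safe
p = np.array([2.0, -5.0, 1.0])

t_candidates = (p - xi) / v         # each scalar equation gives a candidate value of t
print(t_candidates)                 # [-1. -1. -1.]: consistent, so the point lies on the line
print(np.allclose(t_candidates, t_candidates[0]))    # True

# Do x = (1,0,1)^T + t1(-1,-1,-1)^T and x = t2(2,0,1)^T intersect?
# Rearranged: t1*(-1,-1,-1) - t2*(2,0,1) = -(1,0,1), three equations in two unknowns.
M = np.column_stack([[-1.0, -1.0, -1.0], [-2.0, 0.0, -1.0]])
rhs = -np.array([1.0, 0.0, 1.0])
(t1, t2), residual, *_ = np.linalg.lstsq(M, rhs, rcond=None)
print(residual)                     # nonzero, so the equations are inconsistent: the lines do not meet
```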
In 3 it represents a plane through the origin. In both cases, the line or plane is orthogonal to v. We say that v is a normal to the line or plane. If ξ is a fixed vector, consider the equation In 2 it is the equation of a line through ξ orthogonal to v. In 3 it is the equation of a plane through ξ with normal v. The equation may be written as If θ is the angle between ξ and v, and v is a unit vector so that v = 1, we have Hence the equation becomes When v points towards the plane, the distance from the origin to the plane is c = ξ cos θ. When v points away from the plane, the distance from the origin to the plane is −c = − ξ cos θ. Hence the distance from the origin to the plane is |c|. The equation v, x = c written in terms of the components is known as the cartesian equation: Summarising, a single linear equation of the above form defines a plane in 3, with normal vector v = ( 1, 2, 3)T. If v is a unit vector, |c| is the distance of the plane from the origin. Example 6 In 2 the equation is the equation of a line in two dimensions. The vector (3, 4)T is normal to this line. A unit normal to the line is the vector. If the equation is rewritten in the form we therefore conclude that the distance of the line from (0, 0)T is 1. Example 7 The equation is the equation of a plane in 3. A normal to this plane is the vector (1, 2, 2) T. A unit normal is the vector If the equation is rewritten in the form we conclude that the distance from (0, 0, 0)T to the plane is. Example 8 A plane passes through the points (1, 2, 3)T, (3, 1, 2)T and (2, 3, 1)T. Suppose that we wish to find its distance from the origin. Let the plane have equation ux + y +wz = c. Then if (u, , w)T is a unit vector, the distance of the plane from the origin is |c|. Substituting the components of the points in the equation i.e. Hence We next impose the further condition that (u, , w)T be a unit vector – i.e. u2 + 2+ w2 = 1. Then Hence the required distance is The angle between two planes is the angle between their normals. So the angle between the planes is the angle θ between the vectors (2, −1, 1)T and (1, 5, 3)T. We can find θ by calculating the scalar product of these vectors: Hence θ = and the planes are orthogonal. If a vector is orthogonal to the normal to a plane, we say that it is parallel to the plane. If two planes have the same normal, they are parallel. Parametric equation of a plane This is a natural extension of the parametric equation of a line. Let u and v be nonzero, nonparallel vectors. As s and t vary over all real values, the figure on the left below indicates that the vector su + tv varies over all points of the plane containing the directions of both u and v. If ξ is a fixed point of the plane and if x represents a general point of the plane, then the figure on the right below shows that the vector x − ξ has the form su + tv. Hence the equation represents all points on the plane through the point ξ containing both of the vectors u and v. 3 The vector product in Moving all terms of the above equation to the left hand side and writing in terms of the components, we obtain This may be written as the matrix equation As the column matrix is nonzero, by Exercise 1.2.15, we have This gives the equation of the plane as It follows that the vector normal to the plane, which is orthogonal to both u and v is This vector, called u × v is defined to be the vector product of the vectors u and v 3. Equation (1) of the plane then takes the form The vector product is also known as the cross product. 
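Example 8's three points give a convenient numerical check of the ideas in this section: two differences of the points lie in the plane, their cross product is a normal, and rescaling to a unit normal gives the distance from the origin. The numpy sketch below is an independent verification of that example, not the book's own working.

```python
import numpy as np

# The three points of Example 8.
p1 = np.array([1.0, 2.0, 3.0])
p2 = np.array([3.0, 1.0, 2.0])
p3 = np.array([2.0, 3.0, 1.0])

# Two vectors lying in the plane and a normal obtained from their cross product.
u = p2 - p1
v = p3 - p1
n = np.cross(u, v)                  # (3, 3, 3), i.e. the direction (1, 1, 1)

# Cartesian equation <n, x> = c, with c fixed by any point on the plane.
c = np.dot(n, p1)
print(n, c)                         # 3x + 3y + 3z = 18, equivalently x + y + z = 6

# Distance from the origin is |c| once the normal is rescaled to unit length.
print(abs(c) / np.linalg.norm(n))   # 2*sqrt(3), approximately 3.4641
```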
A simpler formula for the vector product can be given in terms of the vectors Then in the notation of determinants Since the entries of a determinant should be scalars, not vectors, the expression is merely a mnemonic to simplify the calculation of the vector product. It is easily checked that Both u×v and v×u are orthogonal to u and v, but it can be checked that the direction of u × v is in accordance with the right hand rule, while that of v × u is the opposite direction. We next prove that This establishes the result. The figure on the right shows that the area of the parallelogram for which the vectors u and v are adjacent sides is u v sin θ where θ is the angle between u and v. Hence The area of the parallelogram whose adjacent sides are the vectors u and v is u × v. If we write u = (u1, u2, 0)T and v = ( 1, 2, 0)T then Hence the length of u × v is the modulus of the determinant: This gives us a useful geometric interpretation of a determinant: The area of the parallelogram whose adjacent sides are the column vectors of a 2 × 2 matrix P is equal to | det P|. E.g. the area of the parallelogram on the right is As a consequence of the above results, we can observe the relation between the areas A and B in the diagram below: A = cd and B = cu × dv = u × v cd = | det P|cd. Hence The result also extends to the 3 × 3 case – the volume of a parallelepiped whose adjacent sides are the column vectors of a 3 × 3 matrix P is | det P|. Example 9 The vector product can be used to derive the cartesian or vector equations of a plane from the parametric form. In fact, when the equation of a plane is known in any form, the other forms of the equation can easily be derived from it, as illustrated by this example. Suppose we wish to find a cartesian equation of the plane with parametric equation The normal to the plane is (2, 1, −2)T × (1, −2, 2)T, which can be calculated to be (2, 6, 5)T. As the point (1, 1, 0)T lies on the plane, a vector equation for it is As (2, 6, 5)T, ( 1, 1, 0)T = 8, the cartesian equation is On the other hand, to find a parametric equation of the plane with cartesian equation let x = s, y = t so that z = 2s + 3t − 4. We can write the parametric form The vector (2, 3, −1)T is normal to the plane and (0, 0, −4)T lies on it and so a vector equation is 1.8 Exercises 1 Draw the line in 2 and write down any two other vector equations of the line. Note that there is an unlimited number of vector equations of the line. 2. Find a vector equation of the line through the points (2, 2)T and (−1, 5)T. Prove that the point (6, −2)T lies on the line by finding a value of t that gives the point. Draw the line and use your drawing to verify that your value of t gives the point (6, −2)T. 3 Find the unit vector u in the direction of the vector (−3, 4)T. Verify that the points (4, −2) T and lie on the line For each of these points, find its distance from the point (1, 2)T and verify that this is equal to the modulus of the value of the parameter t which gives this point in the above equation. 4 Find a vector in 2 which is orthogonal to the vector (1, 2)T. Hence write down the equation of a general line in 2 which is orthogonal to the vector (1, 2)T. 5. Find a vector in 2 in the direction of the line 2x + 7y + 3 = 0. (Consider its slope.) * Find two unit vectors parallel to the line and also, two unit vectors perpendicular to it. 6. (i) Find the equations of a line through the points (−2, 4)T and (0, 2)T, and the line through (0, 2)T and the point (5, −3)T. What can you conclude from these equations? 
(ii) Find the equation of the line through the point (3, 2)T in the direction of the vector (2, −2)T. How is it related to the line through (−2, 4)T and (0, 2)T? (iii) Find the equation of the line through (5, −3)T in the direction of the vector (1, 1)T. How is it related to the line through (−2, 4)T and (0, 2)T? 7. Which of the following vectors in 3 are unit vectors? Are any pair of these vectors orthogonal? (i) (ii) (iii) 8 Using the components of the vectors u = (1, 0, −1)T and v = (2, −1, 3) T write the scalar product u, v as the matrix product uTv. State the order of this matrix product and deduce the relationship between uTv and vTu. Find the matrix product uvT. 9. Find nonzero vectors u, v, w in 3 such that 10 Find nonzero vectors u, v, w in 3 such that 11. Write down equations for the line through the points (1, 2, 1)T and (2, 1, 2)T. 12. A line in three dimensional space is defined by the equations Find two unit vectors parallel to this line. 13. Which of the following sets of points in 3 are collinear, that is, lie on a line? (i)* (2, 1, 4)T, (4, 4, −1)T, (6, 7, −6)T (ii) (1, 2, 3)T, (−4, 2, 1)T, (1, 1, 2)T 14 Let l1 be the line with equation (x, y, z)T = (1, −1, 2)T +t (1, 2, 3)T, l2 the line through (5, 7, 4)T and (8, 13, 3)T and l3 the line through (1, 17, 6)T parallel to the vector (7, −4, −3)T. Show that (i) l1 and l3 are skew, (ii) l1 and l2 intersect, (iii) l2 and l3 intersect. Find the points of intersection. Determine whether each pair of intersecting lines is orthogonal. If not, find the angle between them. 15. For each of the following lines in 3 write down its direction and a point through which it passes: (i) (ii) 16. Write down a vector which is normal to the plane x + 2y + 3z = 6. Write down any point which lies on this plane. 17 Which of the planes x + 4y − 3z = 5 and 3x − y + 4z = 6 lies further from the origin? 18. Find the equation of the plane which (i) passes through (3, 1, 2)T and is parallel to the vectors (1, 1, 1)T and (1, −1, 1)T, (ii) passes through (3, 1, 2)T and (1, 2, 3)T and is parallel to the vector (1, 1, 1)T. 19 Find if the following planes are parallel, perpendicular or neither. If neither, find the angle between them. (i) 4x + 3y − z = 0 2z = 8x + 6y + 1 (ii) x+y=1x+z=1 (iii) 5x − y + 7z = 0 −3x + 13y + 4z = 0 20. Show that the line x = (0, 1, −1)T + t (1, −1, 6)T is (i) parallel to the plane −5x + 7y + 2z = 3 and (ii) perpendicular to the plane 3x + 18z = 3y − 2. Find its point of intersection with this second plane. 21. Find the point of intersection of the line and the plane 22 Find the cartesian equation of the plane through (1, 0, 6)T and (−2, 3, −1) T which is orthogonal to the plane 23. Write down the equation of a plane which passes through (1, 2, 1)T and which is normal to (2, 1, 2)T. What is the distance of this plane from (i) the origin, and (ii) the point (1, 2, 3)T? 24 Write down the cartesian equations for the three coordinate planes: the x y plane, the yz plane and the zx plane. Write down the general form for (i) a horizontal vector and (ii) a vertical vector in 3. Write down the general form for the cartesian equation of (iii) a horizontal plane and (iv) a vertical plane. 25. Find vector and parametric equations for the plane with cartesian equation 26. Find cartesian and parametric equations for the plane with vector equation 27. A plane passes through (1, 1, 2)T, (1, 2, 1)T, (2, 1, 1)T. What is its distance from (0, 0, 0)T? 
28 (i) The scalar product of vectors u and v × w is known as the scalar triple product of the vectors u, v and w in 3. Show that it may be written in terms of their components as Hence show that the vectors u, v and w are coplanar, that is, they lie in the same plane, if and only if this determinant is zero. (ii) Find the constant if the vectors (3, −1, 2)T, (t, 5, 1)T and (−2, 3, 1)T are coplanar. 29 (i) Use the vector product to find the cartesian equation of the plane which contains the three points (0, 4, −1)T, (5, −1, 3)T and (2, −2, 1)T. You will first need to find two vectors which lie along the plane. (ii) Alternatively, let the equation of the plane be Substitute the values of x, y and z for each of the three points lying on the plane. Now write the system of equations that you have obtained in the matrix form Ab = 0 where A is a 4 × 4 matrix and b and 0 are four dimensional vectors. Use Exercise 1.2.15 to show that the equation of the plane can be given in determinant form by Do not evaluate the determinant. 1.9 Vectors in n It is not possible to visualise vectors of dimension n when n > 3, but most of the theory of vectors can be extended to n dimensions. An n dimensional vector v with components 1, 2,..., n n is an n × 1 column matrix As usual, the definitions of matrix addition and scalar multiplication given in §1.1 apply. Thus The length (or norm) of an n dimensional vector v is defined by and the distance between two points v and w is defined by As before, we define the scalar vTw to be the scalar product of v and w, written The usual properties of the scalar product hold. (i) v, v = v 2 (ii) v, w = w, v (iii) αu + βv, w = α u, w + β v, w. Two vectors v and w are orthogonal (or normal) if and only if Lines Similarly, the equation in which ξ and v are n dimensional vectors with v ≠ 0 and t is a scalar represents a line in n, with x and ξ determining points and v determining a direction. Writing the equation in full or which is an alternative form for the equations of a line through (ξ1, ξ2,..., ξn)T in the direction ( 1, T 2,..., n ) when none of the components of v are zero. A line passing through two distinct points ξ and η has the parametric form E.g. the line passing through points (7, 0, −2, 1)T and (3, 1, 0, −1)T 4 has the form or Hyperplanes If ξ and v are n dimensional vectors and v ≠ 0, the equation is the equation of a hyperplane through the point ξ with normal v. If we write c = v, ξ , the equation takes the form Writing the equation in terms of components, it becomes Thus a hyperplane is defined by one ‘linear’ equation, in which v = ( 1, T 2,..., n) is the normal to the hyperplane and, if v is a unit vector, |c| is the distance from 0 to the hyperplane. 1.10 Flats Let A denote an m × n matrix and let c be an m dimensional vector. Then the set of n dimensional vectors x which satisfy the equation is called a flat. If the equation Ax = c is written out in full, it becomes which is the same as the system of equations A flat is therefore the set of points satisfying m ‘linear’ equations. In vector notation, if ak = (ak1, ak2,... , akn)T ≠ 0, the kth linear equation above is then the equation of the hyperplane ak , x = ck. Our system of linear equations becomes This shows that a flat consists of the points x which are common to m hyperplanes. We illustrate some possibilities below: (i) m = 1, n = 3. In this case A is a 1 × 3 matrix and the flat reduces to the single equation The flat is therefore, in general, a plane. (ii) m = 2, n = 3. 
In this case A is a 2 × 3 matrix and the flat reduces to the pair of equations The flat is therefore, in general, the intersection of two planes and hence is a line. E.g. the flat which is pictured above, is a line. To find the equation of the line, let z = t. Solve the equations and obtain y = 1 − t, x = 2. Hence the equation of the line may be written as showing that it passes through the point (2, 1, 0)T in the direction (0, −1, 1)T. (iii) m = 3, n = 3. In this case, A is a 3 × 3 matrix and the flat Ax = c reduces to the system of equations The flat is therefore, in general, the intersection of three planes and hence is a point. E.g. the flat which is pictured above can be written in the matrix form As the matrix is invertible the system has the unique solution That is, the intersection of the three planes is the point. Observe that we have been careful to insert the words ‘in general’ in (i), (ii) and (iii) in order to signal that the result may fail in degenerate cases. In (ii), for example, things degenerate if the two planes are parallel and hence the flat will contain no points at all unless the two planes happen to be identical, in which case the flat will be this plane. It is worth noting that when m = n, the matrix A is invertible if and only if the flat Ax = c consists of the single point A−1c. 1.11 Exercises 1. Prove that the point (−1, −1, 5, 0)T in 4 lies on the line where u is a unit vector in the direction of the vector (2, −1, 0, −2)T. Find the distance of this point from the point (−3, 0, 5, 2)T and check that it is equal to the modulus of the parameter t which gives the point in the equation of the line. 2. Find the equation of the line in 5 through the points (3, 0, −1, 2, 1)T and (−1, 0, 0, 2, 0)T. 3 In 4 let l1 be the line through the points (0, 1, 1, 1)T and (2, 0, −1, 0)T, l2 the line through the point (3, 1, 0, 1)T in the direction of the vector (−4, −1, 0, −1)T and l3 the line through the origin parallel to the vector (0, 1, 0, −1)T. For each pair of lines, find if the following are true: The lines (i) intersect, (ii) are parallel and (iii) have orthogonal directions. 4. Show that the line in 4 with equation (i) lies in the hyperplane with equation (ii) is orthogonal to the hyperplane with equation Find its point of intersection with the second hyperplane. 5. Which of the hyperplanes in 4, 2x1 − x2 − 2x4 = 3 and 3x3 = 5, is further from the origin? 6. Find vectors u, v, w in 4 for which but v ≠ w. 7. Write the scalar product in n as the matrix product uTv. How are uTv and vTu related? Find the matrix product uvT. 8. The flat defined by is a line. Find the parametric equation of this line. 9. Show that the flat defined by is a point. Find the components of this point. 10 Show that the flats are, respectively, a plane and the empty set. 11. A flat is defined by Prove that the matrix is not invertible and show that the flat is a line. Find the equation of this line. 12. Write the following systems of equations in the matrix form Ax = c, where A is a 3 × 3 matrix and x and c are 3 × 1 column vectors. Find if A is invertible, in which case find the unique point which is the flat. If A is not invertible, solve the equations by eliminating one of the variables. If a consistent solution can be found, give a formula for the flat stating its form. If the equations are inconsistent, the flat is the empty set. 13. Find if each of the following flats consists of a single point, in which case, find the point. 
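Since a flat is just the solution set of Ax = c, the cases of §1.10 can be explored directly with numpy. The coefficient matrices below are illustrative rather than the book's own systems: the first is invertible, so its flat is the single point A⁻¹c; the second consists of two plane equations whose intersection is, in general, a line.

```python
import numpy as np

# m = n = 3 with A invertible: the flat Ax = c is the single point A^(-1)c.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])     # illustrative matrix with det(A) = 2
c = np.array([3.0, 5.0, 4.0])
print(np.linalg.det(A))             # 2.0: nonzero, so A is invertible
print(np.linalg.solve(A, c))        # [1. 2. 3.], the unique point of the flat

# m = 2, n = 3: two plane equations, in general a line.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
d = np.array([3.0, 5.0])
point_on_line = np.linalg.lstsq(B, d, rcond=None)[0]   # one particular solution of Bx = d
direction = np.cross(B[0], B[1])    # orthogonal to both normals, so it points along the line
print(point_on_line, direction)
```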
1.12 Applications (optional) 1.12.1 Commodity bundles An important use for vectors is to represent commodity bundles. The owner of a small bakery who visits a supermarket and purchases the items on the shopping list can be described by saying that he has acquired the commodity bundle Of course, one needs to know in considering y what kind of good each component represents and in what units this good is measured. 1.12.2 Linear production models Suppose that x is an n × 1 column vector, y is an m × 1 column vector and A is an m × n matrix. We can then use the equation as a model for a simple production process. In this process, y represents the commodity bundle of raw materials (the input) required to produce the commodity bundle x of finished goods (the output). Such a production process is said to be linear. For example, suppose that the baker of §1.12.1 is interested in producing x 1 kg of bread and x2 kg of cake. His output vector is therefore to be x = (x1, x2)T. In deciding how much of each type of ingredient will be required, he will need to consider his production process. If this is linear, his input vector y = (y1, y2, y3, y4)T of required ingredients might, for example, be given by 1.12.3 Price vectors The components of a price vector list the prices at which the corresponding commodities can be bought or sold. Given the price vector p, the value of a commodity bundle x is This is the amount for which the commodity bundle may be bought or sold. If p is a fixed price vector and c is a given quantity of money, the commodity vectors x which lie on the hyperplane are those whose purchase costs precisely c. Alternatively, it may be that x is a fixed commodity bundle and that c is a given amount of money. In this case, the points p which lie on the hyperplane are the price vectors which ensure that the sale of x will realise an amount c. These two ways of looking at things are said to be dual. A buyer will be more interested in the former and a seller in the latter. 1.12.4 Linear programming Suppose that, having acquired the commodity bundle b of ingredients, the baker of §1.12.1 decides to sell the result x of his baking. If the relevant price vector is p, then the revenue he will acquire from the sale of x is His problem is to choose x so as to maximise this revenue. But he cannot choose x freely. In particular, he cannot bake more than his stock b of ingredients allows. To bake x, he requires y = Ax. His choice of x is therefore restricted by the constraint Note that c ≤ d means that c1 ≤ d1, c2 ≤ d2, c3 ≤ d3, etc. Also, he cannot choose to bake negative quantities of bread or cakes. We therefore have the additional constraint The baker’s problem is therefore to find where x is subject to the constraints Such a problem is called a linear programming problem. Observe how the use of matrix notation allows us to state the problem neatly and concisely. Written out in full, the problem takes the form: find where x1, x2, x3,... , xn are subject to the constraints and x1 ≥ 0, x2 ≥ 0,... , xn ≥ 0. 1.12.5 Dual problem A factory producing baked goods cannot compete with small bakers in respect of the quality of its products. Instead, it proposes to buy up our baker’s stock of ingredients. What price vector n should it offer him? If the baker bakes and sells x, he will receive pTx. If instead he sells the ingredients y = Ax at q he will receive qTy = qT Ax = (ATq)Tx (see Exercise 1.2.7). Selling the ingredients at q is therefore the same as selling the results of his baking at prices ATq. 
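The identity qT Ax = (ATq)Tx used above is easy to confirm numerically, and the baker's problem of §1.12.4 is exactly the kind of linear programme that standard software solves directly. The sketch below is only an illustration: the prices p, recipe matrix A and stock b are invented rather than taken from the text's example, and because the scipy routine linprog minimises, pTx is maximised by minimising −pTx.

import numpy as np
from scipy.optimize import linprog

# Invented data: two outputs (bread, cake) and four ingredients.
p = np.array([3.0, 5.0])                # selling prices of the outputs
A = np.array([[0.6, 0.3],               # assumed recipe: kg of each
              [0.1, 0.3],               # ingredient needed per kg of
              [0.0, 0.2],               # each output
              [0.2, 0.2]])
b = np.array([60.0, 30.0, 12.0, 24.0])  # stock of ingredients

# Maximise p^T x subject to Ax <= b and x >= 0.
result = linprog(c=-p, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
x = result.x
print("optimal output x:", x)
print("maximum revenue pTx:", p @ x)

# The identity qT(Ax) = (ATq)Tx holds for any ingredient price vector q.
q = np.array([1.0, 2.0, 0.5, 1.5])
print(np.isclose(q @ (A @ x), (A.T @ q) @ x))   # True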
For these prices to be more attractive than the market prices, we require that We also, of course, require that Assuming that the factory wishes to acquire the baker’s stock b at minimum cost, it therefore has to find where q is subject to the constraints This linear programming problem is said to be dual to that of §1.12.4. It is more or less obvious that the minimum cost at which the factory can buy up the baker’s ingredients is equal to the maximum revenue he can acquire by baking the ingredients and selling the results. This fact is the important duality theorem of linear programming. The prices q which solve the dual problem are called the shadow prices of the ingredients. These are the prices at which it is sensible to value the stock b given that an amount x = Ay of finished goods can be obtained costlessly from an amount y of stock and sold at prices p. 1.12.6 Game theory In modern times, the economic theory of imperfect competition has become a branch of game theory. The competing firms are modelled as players who choose strategies in a game. The simplest kind of game is a two person, zero sum game. It is called zero sum because whatever one player wins, the other loses. Although zero sum games are seldom realistic models of real economic situations, it is instructive to look at how von Neumann proposed that they should be solved. An m × n matrix A can be used as the payoff matrix in a zero sum game. We interpret the m rows of A as possible strategies for the first player and the n columns of A as possible strategies for the second player. The choice of a row and column by the two players determines an entry of the matrix. The first player then wins a payoff equal to this entry and the second player loses an equal amount. The game described above is called zero sum because the sum of the payoffs to the two players is always zero. For such games we are interested in what happens when the players simultaneously act in such a way that each player’s action is a ‘best reply’ to the action of the other. Neither player will then have cause to regret his choice of action. It turns out that the players should consider using ‘mixed strategies’ – i.e. instead of simply choosing a strategy, they should assign probabilities to each of their strategies and leave chance to make the final decision. This has the advantage that the opponent cannot then possibly predict what the final choice will be. We use an m ×1 vector p = ( p1, p2,... , pm)T to represent a mixed strategy for the first player. The kth component pk represents the probability with which the kth row is to be chosen. Similarly, an n × 1 vector q = (q1, q2,... , qn)T represents a mixed strategy for the second player. If the two players independently choose the mixed strategies p and q, then the expected gain to the first player (and hence the expected loss to the second player) is We illustrate this with the 2 × 2 payoff matrix considered above. The expected gain to the first player is calculated by multiplying each payoff by the probability with which it occurs and then adding the results. The expected gain to the first player is therefore We assume that the first player chooses p in an attempt to maximise this quantity and the second player chooses q in an attempt to minimise it. If each of the mixed strategies and is a best reply to the other, then and A pair of mixed strategies and with these properties is called a Nash equilibrium of the game. 
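Whether a proposed pair of mixed strategies is a Nash equilibrium can be checked mechanically: p is a best reply to q if no pure row strategy does better against q than p does, and q is a best reply to p if no pure column strategy concedes less against p than q does. The Python sketch below implements this test; the 2 × 2 payoff matrix and the candidate strategies are invented for illustration and are not those of the example discussed in this section.

import numpy as np

def is_nash(A, p, q, tol=1e-9):
    # Expected gain to the first (row) player when the mixed strategies
    # p and q are chosen independently.
    value = p @ A @ q
    # p is a best reply if no pure row strategy does better against q.
    row_best = np.max(A @ q) <= value + tol
    # q is a best reply if no pure column strategy concedes less against p.
    col_best = np.min(A.T @ p) >= value - tol
    return value, row_best and col_best

# Invented payoff matrix and candidate equilibrium strategies.
A = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])
p = np.array([0.4, 0.6])   # first player's mixed strategy
q = np.array([0.4, 0.6])   # second player's mixed strategy

print(is_nash(A, p, q))    # (0.2, True): the value is 0.2 and (p, q) is
                           # a Nash equilibrium of this particular game

The same test can be applied to the 2 × 2 example discussed in this section.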
Von Neumann proved that every matrix A has such a Nash equilibrium and that it is sensible to regard this as the solution of the game. In the case of the 2 × 2 matrix given above, the first player should choose the mixed strategy (0.3, 0.7)T and the second player should choose the mixed strategy (0.6, 0.4)T. It is instructive to check that each of these is a best reply to the other. The resulting expected gain to the first player is −0.2 – i.e. this is a good game for the second player! † See Chapter 13. † Note that an identity matrix must be square. Just as a zero matrix behaves like the number 0, so an identity matrix behaves like the number 1. ‡There is some risk of confusing this notation with the modulus or absolute value of a real number. In fact, the determinant of a square matrix may be positive or negative. † An invertible matrix is also called nonsingular. ‡ Alternative notations for the transpose are A′ and At. † Hence it makes sense to write the point as ( 1, 2)T rather than the more usual ( 1, 2). † The scalar product is also known as the inner product or the dot product, which is written as v · w. ‡ See Exercise 1.4.4. Two Functions of one variable This chapter introduces the notion of a function and the techniques of differentiation required for the analysis of functions of one variable. Standard functions are built up from elementary functions and their properties explored. This material is vital for nearly everything which follows in this book. Although some of it is revision, there is much that may be new, like the account of inverse functions, the use of Taylor’s theorem and conic sections. Throughout the chapter, foundations are laid in the one variable case for the investigation of functions of two or more variables that follows in subsequent chapters. 2.1 Intervals The set of real numbers is denoted by. A subset I ⊆ is called an interval if, whenever it contains two real numbers, it also contains all the real numbers between them. An interval can be represented geometrically by a line segment. The interval consisting of all nonnegative real numbers is denoted by +. The geometrical representation and the notation used to describe typical intervals, are illustrated below: In these examples, a and b are end points, whether or not they belong to the interval they help to determine. Points of an interval that are not end points are called interior points. Remember that ∞ and −∞ can be neither end points nor interior points, because they are not real numbers. They are simply a notational convenience. 2.2 Real valued functions of one real variable A function f : X → Y is a rule which assigns to each x in the set X a unique y in the set Y. We denote the element in Y assigned to the element x in X by For obvious reasons x is known as the independent variable and y as the dependent variable. The set X is called the domain of the function f. The set Y is called its codomain. When the domain and codomain of a function are both sets of real numbers, the function is said to be a real valued (or scalar valued) function of one (real) variable. It is vital to appreciate that the independent variable x in the formula f (x) is a ‘dummy variable’ in the sense that any variable symbol can be used in place of it. A function can be illustrated by a graph in 2 as shown in the margin. In the figures below, (a) shows the case when X = Y =. The graph in (b) shows a function f : (−∞, b] → , whose domain is the set of all real numbers x satisfying x ≤ b. 
Its domain is not because f (x) is undefined when x > b. The graph in (c) does not represent a function since it does not give a unique y for each x. 2.3 Some elementary functions We start by examining three elementary types of function which, between them, give rise to the functions commonly studied in calculus. 2.3.1 Power functions Let be the set of positive integers, namely, 1, 2, 3, 4,.... The function f : → defined by has a continuous graph that looks like the following figures for n odd and even: 2.3.2 Exponential functions Let be the set of rational numbers or fractions, e.g. 1/2, 3, −7/5. Given a real number a > 0, we can plot a graph of y = ar for each r. For irrational numbers (e.g. π, e), we assign values to ax in such a way as to ‘fill up the holes’ in the graph of y = ar. The function f : → (0, ∞) then defined by It is worth recognizing that ax has a constant base and a variable exponent whereas xn has a variable base and a constant exponent. has a graph which is continuous for all x and is always increasing. 2.3.3 Trigonometric functions In what follows, the angle x is always to be understood as measured in radians, where one radian is indicated by the figure on the right. Because the total circumference of a circle of radius 1 is 2π, The sine and cosine functions are shown for x satisfying 0 ≤ x ≤ 2π, in the following circles of radius one. The top right diagram shows cos x and sin x for As usual, The other diagrams show how one can always refer back to this diagram when faced with values of cos x and sin x with x outside the interval For example, Since there are 2π radians in a full circle, the sine and cosine functions are periodic with period 2π (i.e. sin(x +2π) = sin x and cos(x +2π) = cos x) with the following graphs: Observe from the graphs that sin x = cos(x − ), and from Pythagoras’ theorem that cos2 x + sin2 x = 1. 2.4 Combinations of functions Functions may be combined in a variety of ways to generate other functions. The sum of functions is a function. So are the product of a function and a scalar and the product of two functions. Again, the quotient of functions is a function which is defined where the denominator is nonzero. Functions can also be composed as ‘functions of functions’ to produce new functions – e.g. f (g(x)) = sin(x2). The simplest type of combined or nonelementary function is a polynomial which is a combination of power functions of the form where the ai are real constants. If an ≠ 0, the polynomial is said to be of degree n and has n roots ξ1, ξ2,..., ξn, which means that it may be written as Some of the roots may be repeated, that is, they may not all be different. Also, some of the roots may be complex numbers, in which case they occur in ‘conjugate’ pairs. † Alternatively, Pn(x) may be expressed as the product of real linear and quadratic factors in x. E.g. has degree 4 and roots 1 (repeated), i and −i. Linear and affine functions are polynomial functions of degree one, which are of special importance. If m , the function L: → defined by is said to be linear. Its graph y = mx is a nonvertical line passing through the origin. If m and c , the function A: → defined by is said to be affine. Its graph y = mx +c is a nonvertical line. A common mistake is to confuse an affine function with a linear function because both have straight line graphs. But the graph of a linear function must pass through the origin. A rational function R is the quotient of two polynomials – i.e. where P and Q are polynomials. 
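As a brief computational aside, the roots of a polynomial, and hence its factorisation into linear and quadratic factors, are easily found with a computer; the same routine locates the roots of a denominator Q, which matter in what follows. The degree 4 example above is stated to have roots 1 (repeated), i and −i; the coefficients used in the sketch below come from the factorisation (x − 1)²(x² + 1) = x⁴ − 2x³ + 2x² − 2x + 1, which is one polynomial consistent with those roots (the printed formula itself is not reproduced here).

import numpy as np

# Coefficients of x^4 - 2x^3 + 2x^2 - 2x + 1, highest power first.
# This is (x - 1)^2 (x^2 + 1): a repeated real root and a conjugate pair.
coefficients = [1, -2, 2, -2, 1]

roots = np.roots(coefficients)
print(roots)                   # approximately 1, 1, i and -i

# Rebuild the coefficients from the roots to confirm the factorisation.
print(np.poly(roots).real)     # ~[1, -2, 2, -2, 1]; any imaginary residue
                               # is rounding noise and has been dropped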
If P and Q have no common factor, the function R is not defined at the roots of Q, which must be excluded from the domain of R. Note that in such a case R(x) tends to ∞ or −∞ as x tends to a root ξ of Q. We then say that the line x = ξ is a vertical asymptote of the graph of R(x). For instance, the function Note that the domains of the functions defined exclude points at which the denominators on the right are zero. has the domain , excluding the points 0 and 1. The lines x = 0 and 1 are vertical asymptotes of its graph. L e t denote the set of integers, namely, 0, ±1, ±2, ±3,.... The definitions for other trigonometric functions may be expressed in terms of the sine and cosine functions as below: The tangent function is the most important of these further functions. Its graph, which has vertical asymptotes is illustrated next: 2.5 Inverse functions Let X and Y be sets of real numbers, and suppose that f : X → Y is a function. This means that for each x X, there is a unique y Y such that y = f (x). If g: Y → X is another function and has the property that then we call g the inverse function to f. The notation used for g is f −1, but we shall see that there is not always a unique inverse function. Writing g = f −1 we have Observe that x = f −1(y) is what we obtain by solving the equation y = f (x) for x in terms of y. It must not be confused with the reciprocal function For example, if then Interchanging x and y, The reciprocal function In the general case, the equation y = f (x) may have no solutions at all, or else may have many solutions, as shown in the following diagram of a function. The equation y1 = f (x) has no solutions, while the equation y2 = f (x) has five solutions (namely x1, x2, x3, x4 and x5). But in order that there exists a function f −1: Y → X which is an inverse function of f : X → Y it is necessary that there is a unique x X for each y Y, such that x = f −1(y) or y = f (x). For such a function, we deduce that, for each x X and for each y Y , Let x1 and x2 be points of an interval I on which f is defined. (1) If f (x1) < f (x2) whenever x1 < x2, we say that f is increasing on I. (2) If f (x1) > f (x2) whenever x1 < x2, we say that f is decreasing on I. The following diagrams illustrate two functions, an increasing and a decreasing function f : I → J which admit inverse functions. Example 1 The function f : → defined by y = f (x) = x2 has no inverse function f −1: →. Observe that the equation has no (real) solution, while the equation has two solutions. In this example the equation y = f (x) fails to have a solution for x in terms of y or else has too many solutions. One can usually get round this problem by restricting the sets of values of x and y which one takes into account. This leads to the notion of a ‘local inverse’, which is discussed in detail in Chapter 7. 2.6 Inverses of the elementary functions 2.6.1 Root functions Consider the function f : [0, ∞) → [0, ∞) defined by Note that we are here deliberately excluding consideration of negative real numbers and restricting attention to nonnegative values of x and y. For each y ≥ 0, the equation y = xn has a unique solution x ≥ 0 and hence the function f : [0, ∞) → [0, ∞) has an inverse function f −1: [0, ∞) → [0, ∞) at any x > 0. We use the notation Thus, if x ≥ 0 and y ≥ 0, It is instructive to observe that the graph of x = y1/n is just the same as the graph of y = xn but looked at from a different viewpoint. 
The graph of x = y1/n is what you would see if the graph of y = xn were drawn on a piece of glass and you viewed it sideways from behind. Let us now make x the independent variable in the inverse function so that y = x1/n. Then the graphs of the function and its inverse can both be drawn on the same diagram. Now, an interchange of x and y is equivalent to a reflection in the line y = x. It is easy to see this from simple geometry, since (y, x)T is the reflection of (x, y)T in this line, as shown in the figure on the left below. Hence the curve y = x1/n can be obtained by reflecting the curve y = xn in the line y = x, as shown in the figure on the right below. The same principles, of course, apply in respect of the graph of any inverse function. If m and x ≥ 0, we define xm/n by In particular, we obtain 2.6.2 Exponential and logarithmic functions For the function f : → (0, ∞) defined by we may deduce the existence of an inverse function f −1: (0, ∞) →. This inverse function is used to define the logarithm to base a. We have Note that loga y is defined only for positive values of y because ax takes only positive values. Also loga ax = x and aloga x = x. Let y1 = ax1 and y2 = ax2. Then The number e 2.7183 has the special property that the slope of the tangent to y = ex when x = 0 is equal to one. The function f : → (0, ∞) defined by y = f (x) = ex is called the exponential function and often written as ‘exp(x)’. Its inverse function f −1: (0, ∞) → is the logarithm to base e. We call this function the natural logarithm and write loge y = ln y. Finally, since ex ranges over all positive values, all powers of positive numbers can be expressed in terms of the exponential and natural logarithm functions. We have 2.7 Derivatives The concept of the derivative is one of the most fundamental in calculus. The derivative is concerned with the responses of dependent variables to changes in independent variables. We begin by revising the idea of the derivative of a real valued function of one real variable. Denote the derivative of the function f evaluated at the point ξ by f′(ξ). We have when this limit exists as a finite real number. In this case we say that f has a derivative or is differentiable at ξ. We will also use the notation D f (x) for the derivative of f at a general point x. Note that D f is itself a function defined by Geometrically, f′(ξ) is the slope or gradient of the tangent line to the graph y = f (x) at the point (ξ, f (ξ))T. Since the tangent line has slope f′(ξ) and passes through the point (ξ, f (ξ))T, it has equation A function is differentiable at ξ if it has a nonvertical tangent line at ξ. It is differentiable in an interval if it is differentiable at every point of the interval. It is easily seen that (1) f is increasing on I if f′(x) > 0. (2) f is decreasing on I if f′(x) < 0. As x increases from ξ to ξ + h the expression is the average rate of increase of f (x) with respect to x. As h tends to 0, we obtain f′(ξ) as the instantaneous rate of increase of f (x) with respect to x at the point ξ. Economists call f′(ξ) the marginal rate of increase. 2.8 Existence of derivatives Commonly occurring instances when the derivative fails to exist are shown below. (1) Write h → 0+ for h tends to 0 from the right (through positive values) and h → 0− for h tends to 0 from the left (through negative values). 
The above limit exists only if the derivative from the right is equal to the derivative from the left: For example, this is not the case for the function f (x) = |x| when x = 0, since Hence f (x) = |x| is not differentiable when x = 0. (2) The limit does not exist for functions which are discontinuous at ξ, like the function In this case, the derivative from the right does not exist since as h → 0+ (3) Lastly, if the function is continuous and the limits from the left and the right do not exist at ξ, the function has a vertical tangent line and is not differentiable at ξ. For example, the function has a vertical tangent line and is not differentiable when x = 0. 2.9 Derivatives of inverse functions The diagrams below illustrate a differentiable function f : I → J which has a differentiable inverse function f −1: J → I. We know that the second diagram is really the same as the first but looked at from a different viewpoint. In particular, the lines y − η = l(x − ξ) and x − ξ = m(y − η) are really the same lines. Thus m = l−1. But l = f′(ξ) and m = f −1 ′(η). This gives the formula This clearly indicates that special thought has to be given to what happens when the derivative of f is zero. 2.10 Calculation of derivatives 2.10.1 Derivatives of elementary functions and their inverses Derivatives of power functions We start by calculating the derivative of the function f (x) = x2. By the definition of the derivative Hence, if y = x2, then = 2x. Using expansions by the binomial theorem for integral powers, the derivatives of xn where n may be calculated by the same method, yielding the result Derivatives of root functions If x = y1/n, then y = xn. Hence Derivatives of exponential and logarithmic functions Using the definition of the derivative But the second limit is the slope of the exponential function at x = 0 and e was chosen to make this equal to one. So ex is its own derivative, i.e. Recall that y = ex if and only if x = ln y. It follows that That is Derivatives of the sine and cosine functions A justification for these formulas may be based on the diagram below: From the larger right angled triangle, we have x = cos θ and y = sin θ. For small incremental angle δθ the arc of length δθ approximates to the hypotenuse of the smaller right angled ‘triangle’, which is tangential to the radius in the limit as δθ → 0. Then, the vertical angle is θ in the smaller ‘triangle’, and so we have Taking limits as δθ → 0 we obtain 2.10.2 Derivatives of combinations of functions There are rules for finding the derivatives of combinations of functions which are listed on the following page and are worth memorising. Let f and g be functions of x and α and β be constants. We can use the chain rule to obtain the derivative of any rational power of x: If z = ym/n and t = ym where m. Then z = t1/n and so This justifies the formula Using the formula for differentiating a quotient, we obtain Example 2 (i) To find let y = x2 + 2x + 1 and z = cos y. Then Hence by the chain rule (ii) Let y = f (x) and z = ln y. Then Hence by the chain rule Example 3 An economics application The rate of growth of a function y = f (t) in economics is defined by It tells us how fast a function increases in percentage terms. See Example 2(ii). For example, suppose that national investment I (t) increases by 0.5% per year and population P(t) by 1.25% per year. 
Then the investment per head of population (the per capita investment) is On taking logarithms, Hence, the rate of growth of per capita investment PCI(t) is We can conclude that per capita investment falls by 0.75% per year. Example 4 Another economics application A revenue function R(x) of output x = x(l), itself a function of labour input l, is indirectly a function of l. If an employer in a small cottage industry, producing a small number of goods, wishes to know if it is worth increasing labour from 10 to 11 (measured by the number of working hours per day), he can estimate the increase in revenue measured in dollars by calculating dR/dl. The chain rule states that This expresses, in mathematical terms, the statement in economics: If the production function has a Cobb–Douglas form and the revenue function R(x) = 22x − x2, then We evaluate the marginal revenue when l = 10. Hence, increasing the number of working hours from 10 to 11 increases revenue by only $6.† This is not worth it for the employer since wages would be more than $6 an hour. Often, as in the above instance, the derivative may be calculated directly from the composite function, R(x(l)), but this is by no means always the case. 2.11 Exercises 1. Calculate f′(1) using the definition of the derivative when 2. Differentiate the following expressions: 3. (i) Find in the following cases: Explain the relationship between the results for (c) and (d). (ii) Find the equation of the tangent line at the point (1, 0)T to the curves given by (a)*, (b) and (c). 4 Obtain the following results where a and b are constants: 5 Money obtained ($y) is related to amount sold (x kg) by What is the marginal price per kg when the amount sold is x = 2? 6 Differentiate the following expressions: 7 Differentiate the following expressions: 8 Differentiate the following expressions: 9. Find the equations of the tangent lines to the curve which are parallel to the line 4x − y = 7. 10 Show that, for each real number t the interval (0, 1], the curve given in Exercise 3(i)(a) by has a tangent line with slope t. Find the points on the curve at which the tangent line has slope 2/3. 11 Given that there are distinct points on the curve which have a common tangent, find two such tangents, their equations and points of contact. Are there any more tangents with this property? 12 Find the number of tangent lines to the curve which pass through the point (−1, 9)T. Find also the points of contact of these tangent lines with the curve. 13 Given a function f (x) in economics, its average function, A f (x), is defined Prove the following results relating the average and marginal functions, A f (x) and M f (x) respectively: 14 The price of a type of computer scanner is dropping by 15% each year. The quantity of scanners sold is increasing by 25% each year. Find the annual rate of growth of the revenue derived from this type of scanner.† 15. For each of the following cost functions, find the largest possible domain and the marginal cost function. Compare the marginal cost at the production level of 100 units with the extra cost of producing one more unit. 2.12 Higher order derivatives The second derivative of a function f : → at the point ξ is simply the derivative of the function f′ evaluated at the point ξ. It is denoted by f′′(ξ) or by D2 f (ξ). If y = f (x) we also use the notation The second derivative can be interpreted geometrically as the slope of the tangent to the curve y = f′(x) at the point x = ξ. 
For instance, the second derivative is the slope of a marginal function in economics. Higher order derivatives can be defined in a similar way. Like the first derivative, the derivative of any order is a function of x. We denote the nth order derivative of f : → at the point ξ by f (n)(ξ) or by Dn f (ξ). Alternatively, we may write indicating that is the derivative or rate of change of with respect to x. Example 5 Let y = ln(1 + x). Then and, in general, We will restrict our study to ‘well behaved’ functions, whose derivatives of all orders exist, and begin by approximating a general function of this type by one of the simplest kind – a polynomial function. 2.13 Taylor series for functions of one variable As polynomials are simple functions, which are easily known, in order to study the behaviour of a function near a point, it is useful to find a polynomial approximation of it near the point. If we are investigating the function at points x near a point x = ξ, we consider a polynomial in powers of x − ξ. A useful approximation is the Taylor polynomial for a function f : → near a point ξ given by The polynomial closely resembles f (x) near ξ, since not only does it have the same value as f (x) at ξ, but also its first derivative and all higher order derivatives up to order n match those of f (x) at ξ. To see this differentiate both sides of the equation: At x = ξ both sides of the equation are equal to f′(ξ). Differentiating a second time At x = ξ both sides of the equation are equal to f′′(ξ). Similarly When the function has derivatives of all orders at x = ξ we define the infinite Taylor series about the point x = ξ: This formula is valid for some set of values of x including the point ξ. When the series does not converge to f (x), this set contains only the point ξ (in which case the formula is pretty useless). In exceptional cases, it is even possible for the series to converge to a function other than f (x). On other occasions the formula is valid for all values of x. Usually, however, the formula is valid for values of x close to ξ and invalid for values of x not close to ξ. The question of convergence will be briefly discussed in §10.11. Example 6 We shall find the Taylor polynomials for the function sin x about the point x = 0. Let f (x) = sin x. Then As we have arrived back at the function sin x, the pattern of values for the derivatives will repeat itself. Since the derivatives of even order are zero at the point x = 0, only derivatives of odd order, and so only odd powers, will appear in the Taylor polynomials. Hence the polynomials of even power degenerate to the polynomial of the previous odd power. In general, the Taylor polynomial for sin x about x = 0 is The figure below shows the graph of sin x together with its approximations P1(x), P3(x) and P5(x) at x = 0. The polynomial P1(x) is the tangent at the point. The graphs show that the approximations deteriorate as we move away from the point. Also, the approximation improves as the degree of the polynomial increases. The Taylor series for sin x is Example 7 Suppose that f (x) = ln(1 + x). As we know from Example 5 The Taylor series for ln(1 + x) about the point 0 is therefore The range of validity of the formula is −1 < x ≤ 1. You may find it instructive to calculate the sum of the first n terms of the expansion, using a calculator or a computer, when x = −1, x = 1 and x = 1.01. 
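This calculation takes only a few lines of Python. The sketch below, offered purely as an illustration, sums the first n terms of x − x²/2 + x³/3 − · · · at the three suggested values: at x = 1 the partial sums approach ln 2, though slowly; at x = −1 they drift off towards −∞; and at x = 1.01 the early partial sums look plausible but eventually fail to settle down, illustrating the range of validity −1 < x ≤ 1.

import math

def partial_sum(x, n):
    # Sum of the first n terms of x - x^2/2 + x^3/3 - ... ,
    # the Taylor series of ln(1 + x) about 0.
    return sum((-1) ** (k + 1) * x ** k / k for k in range(1, n + 1))

for x in (1.0, -1.0, 1.01):
    for n in (10, 100, 1000, 10000):
        print(f"x = {x:5.2f}, n = {n:5d}, partial sum = {partial_sum(x, n):.4f}")
    if -1 < x <= 1:
        print(f"          ln(1 + x) = {math.log(1 + x):.4f}")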
The Taylor series about ξ = 0 is also known as the Maclaurin series and takes the simpler form The following series can be verified as demonstrated in Examples 6 and 7: Example 8 For many purposes one only needs terms up to the second order in the Taylor expansion of a function. When the terms of third order and higher have been discarded, the polynomial P2(x) which remains is known as the quadratic Taylor approximation to f about ξ. The next figure illustrates how a quadratic approximation about a point indicates the shape of the graph near the point. It shows the graph of the function y = sin x together with the graphs of quadratic Taylor approximations P2(x) to the function about the points x = and also, their common tangents at these points. If f (x) = sin x then f′(x) = cos x and f′′(x) = − sin x. Hence near π/2, Similarly, near 3π/2 The negative quadratic term of the Taylor polynomial at π/2 shows that the graph is below the tangent line there. The positive quadratic term at 3π/2 shows that it is above the tangent line there. Again, the graphs indicate that the approximations deteriorate as we move away from the points. The quadratic Taylor approximation degenerates at π to the tangent approximation. There is no quadratic term near π since the graph crosses the tangent line there, and so cannot be approximated by a quadratic. Example 6 shows that this is also the case at x = 0. 2.14 Conic sections We have defined functions which are expressed explicitly in terms of an independent variable. Functions may also be defined implicitly by a relation between x and y, which may be solved to give y as one or more explicit functions of x. This will be studied in Chapter 8. For the moment, we conclude this chapter with a study of the simplest type of implicit relation, which is a second degree equation in x and y This always represents a conic section or conic, which is the intersection of a doublenapped cone and a plane, as the following illustrations show. We shall see that conics play a central role in our study of the behaviour of functions of one and two variables. This is due to the crucial part played by the quadratic term of the Taylor approximation to a function at a point, in determining the shape of the function near the point. The figures below show that when the plane does not pass through the vertex of the cone there are three basic nondegenerate conics: the parabola, the ellipse and the hyperbola together with the circle, which can be regarded as a special type of ellipse. When the plane passes through the vertex of the cone, the conic assumes one of the degenerate forms below: a point, a line or a pair of intersecting lines. The equation can be transformed into one of three ‘standard forms’, considered below, in the following way. First, a rotation of the coordinate axes will eliminate the ‘x y’ term. This procedure will be described in §6.1 using the methods of linear algebra. When neither coefficient of x2 or y2 is zero, a translation of the axes can then eliminate the linear terms in x and y to give one of the forms When one of these coefficients, say that of y2, is zero, the linear term in the other variable x can be eliminated by a translation to give the form These procedures will be described in Exercise 6.5.14. Each of the three basic conics may be characterised geometrically by the path of a point P which moves so that where F is a fixed point called the focus and d is a fixed line called the directrix, with Pd denoting the distance of P from d. 
The constant e in the defining relation PF = e Pd is known as the eccentricity of the conic. By choosing the positions of the focus and the directrix conveniently, we can obtain the conics in standard position.

(i) e = 1. Choose d to be the line y = −a and F to be the point (0, a). Then PF = Pd requires

x² + (y − a)² = (y + a)²

and therefore

x² = 4ay.

This is the equation of a parabola.

(ii) 0 < e < 1. Choose d to be the line x = a/e and F to be the point (ae, 0). Then PF = e Pd requires

(x − ae)² + y² = (a − ex)²

and therefore

x²(1 − e²) + y² = a²(1 − e²).

Since e < 1, b² = a²(1 − e²) > 0 so the equation may be written as

x²/a² + y²/b² = 1.

This is the equation of an ellipse. By symmetry there is an alternative focus–directrix pair F′, d′, as shown in the figure. As e → 0 the ellipse tends to the circle

x² + y² = a².

Hence the circle may be thought of as a conic with ‘e = 0’.

(iii) e > 1. Choosing d and F as in the previous case we o