Mathematics for Engineers and Scientists
Alan Jeffrey
University of Newcastle-upon-Tyne

Applications of Mathematics Series
Series Editor: Alan Jeffrey, Professor of Engineering Mathematics, University of Newcastle-upon-Tyne

William F. Ames: Numerical Methods for Partial Differential Equations
T. J. M. Boyd and J. J. Sanderson: Plasma Dynamics
C. D. Green: Integral Equation Methods
I. H. Hall: Deformation of Solids
Jeremy Hirschhorn: Dynamics of Machinery
Alan Jeffrey: Mathematics for Engineers and Scientists
Brian Porter: Synthesis of Dynamical Systems

NELSON
Thomas Nelson and Sons Ltd, 36 Park Street, London W1Y 4DE
Nelson (Africa) Ltd, PO Box 18123, Nairobi, Kenya
Thomas Nelson (Australia) Ltd, 171-175 Bank Street, South Melbourne, Victoria 3205
Thomas Nelson and Sons (Canada) Ltd, 81 Curlew Drive, Don Mills, Ontario
Thomas Nelson (Nigeria) Ltd, PO Box 336, Apapa, Lagos

First published in Great Britain by Thomas Nelson and Sons Ltd, 1969
Reprinted with amendments 1971
Reprinted 1973
Copyright © Alan Jeffrey 1969, 1971
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publishers.
761605 9 (Boards)
17 771604 5 (Paper)
Reproduced and printed by photolithography and bound in Great Britain at The Pitman Press, Bath

Preface

This book has evolved from an introductory course in mathematics given to engineering students at the University of Newcastle-upon-Tyne during the last few years. It represents the author's attempt to offer the engineering student, and the science student who is not majoring in a mathematical aspect of his subject, a broad and modern account of those parts of mathematics that are finding increasingly important application in the everyday development of his subject.
Although this book does not seek to teach any of the many physical disciplines to which its results and methods may be applied, it nevertheless makes free use of them for purposes of illustration whenever this seems to be helpful. Every effort has been made to integrate the various chapters into a description of mathematics as a single subject, and not as a collection of seemingly unrelated topics. Thus, for example, matrices are not only introduced in an algebraic context, but they are also related in other chapters to change of variables in partial differentiation and to the study of simultaneous differential equations. Modern notation and terminology have been used freely but, it is hoped, never to the point of becoming pedantic when a simple word or phrase seems more natural.

Of necessity, much of the material in this book is standard, though the emphasis and manner of introduction and presentation frequently differ from those found elsewhere. This is deliberate, and is a reflection of the changing importance of mathematical topics in engineering and science to-day.

In many introductory mathematics texts for engineering and science students no serious attempt is made to offer reasonable proofs of main results and, instead, attention is largely confined to their manipulation. Important though this aspect undoubtedly is, it is the author's belief that knowledge of the proof of a result is often as essential as its subsequent application, and that the modern student needs and merits both. With this thought in mind proofs of results have always been included, and, though they have been kept as simple as possible, no attempt has been made to conceal difficulty where it exists. Only very occasionally, when the proof of a result is lengthy, and its details are largely irrelevant to the subsequent development of the argument, has the treatment been shortened to a summary of the logical steps involved.
Even then the interested reader can often find more relevant information amongst the specially selected problems at the end of each chapter.

As implied by the previous remark, the many problems not only comprise those offering manipulative exercise, but also those shedding further light on topics only touched upon in the main text. No serious student can progress in his knowledge of this subject without a proper investment of time and effort spent working at a selection of these problems. The main text is provided with numerous illustrative examples designed to be helpful both when working through the text and when attempting the classified problems. It is hoped that their inclusion also makes the book suitable for private study.

The wide range of material covered in this book represents rather more than would normally be contained in an introductory course of lectures. Whilst allowing for changing approaches in teaching, this fact also permits some flexibility in use of the material and at the same time offers further relevant reading to the ambitious student.

In addition to the author's own experience of the application of mathematics in engineering and science, the choice and style of presentation of material has been influenced by two recently published documents: the Council of Engineering Institutions syllabuses in mathematics in Britain and the CUPM recommendations made by the Mathematical Association of America. It is the author's hope that this book complies fully with the former document and with the spirit of the latter insofar as its recommendations are applicable to engineering and science students.

The material has all been class-tested and, as a result, has undergone considerable modification from its first appearance as lecture notes to the form of presentation adopted here.
It is a pleasure to acknowledge the help of the publishers who have given me continued encouragement and every possible form of assistance throughout the entire period of preparation of the book.

A. J.

As a direct result of requests by users of the first printing of this book it was decided that a short chapter on Fourier Series should be added. The present revised imprint contains this new material and also incorporates a number of small corrections drawn to the author's attention by various kind readers.

A. J.

Contents

1 Introduction to Sets and Numbers
1-1 Sets and algebra; 1-2 Set theory and probability; 1-3 Integers, rationals and arithmetic laws; 1-4 Absolute value of a real number; 1-5 Representation of numbers; 1-6 Mathematical induction; Problems

2 Variables, Functions, and Mappings
2-1 Variables and functions; 2-2 Inverse functions; 2-3 Some special functions; 2-4 Digression on mappings; 2-5 Curves and parameters; 2-6 Functions of several real variables; Problems

3 Sequences, Limits, and Continuity
3-1 Sequences; 3-2 Limits of sequences; 3-3 The number e; 3-4 Limits of functions - continuity; 3-5 Functions of several variables - limits, continuity; 3-6 A useful connecting theorem; Problems

4 Complex Numbers and Vectors
4-1 Introductory ideas; 4-2 Basic algebraic rules for complex numbers; 4-3 Complex numbers as vectors; 4-4 Modulus-argument form of complex numbers; 4-5 Roots of complex numbers; 4-6 Introduction to space vectors; 4-7 Scalar and vector products; 4-8 Geometrical applications; 4-9 Applications to mechanics; Problems

5 Differentiation of Functions of One or More Real Variables
5-1 The derivative; 5-2 Rules of differentiation; 5-3 Some important consequences of differentiability; 5-4 Higher derivatives - applications; 5-5 Partial differentiation; 5-6 Total differential; 5-7 Envelopes; 5-8 The chain rule and its consequences; 5-9 Change of variable; 5-10 Implicit functions; 5-11 Higher order partial derivatives; Problems

6 Exponential, Hyperbolic, and Logarithmic Functions
6-1 The exponential function; 6-2 Differentiation of functions involving the exponential function; 6-3 The logarithmic function; 6-4 Hyperbolic functions; 6-5 Exponential function with a complex argument; Problems

7 Fundamentals of Integration
7-1 Definite integrals and areas; 7-2 Integration of arbitrary continuous functions; 7-3 Integral inequalities; 7-4 The definite integral as a function of its upper limit - indefinite integral; 7-5 Differentiation of an integral containing a parameter; 7-6 Other geometrical applications of definite integrals; 7-7 Numerical integration; Problems

8 Systematic Integration
8-1 Integration of elementary functions; 8-2 Integration by substitution; 8-3 Integration by parts; 8-4 Reduction formulae; 8-5 Integration of rational functions - partial fractions; 8-6 Other special techniques of integration; Problems

9 Linear Transformations and Matrices
9-1 Introductory ideas; 9-2 Matrix algebra; 9-3 Determinants; 9-4 Linear dependence and linear independence; 9-5 Inverse and adjoint matrix; 9-6 Matrix functions of a single variable; 9-7 Solution of systems of linear equations; 9-8 Eigenvalues and eigenvectors; 9-9 Linear transformations; 9-10 Applications of matrices and linear transformations; Problems

10 Functions of a Complex Variable
10-1 Sequences of complex numbers and limits; 10-2 Curves and regions; 10-3 Function of a complex variable, limits and continuity; 10-4 Derivatives - Cauchy-Riemann equations; 10-5 Conformal mapping; 10-6 Applications of conformal mapping; Problems

11 Scalars, Vectors, and Fields
11-1 Curves in space; 11-2 Antiderivatives and integrals of vector functions; 11-3 Some applications; 11-4 Fields, gradient, and directional derivative; 11-5 An application to fluid mechanics; Problems

12 Series, Taylor's Theorem and its Uses
12-1 Series; 12-2 Power series; 12-3 Taylor's theorem; 12-4 Application of Taylor's theorem; 12-5 Applications of the generalized mean value theorem; Problems

13 Differential Equations and Geometry
13-1 Introductory ideas; 13-2 Possible physical origin of some equations; 13-3 Arbitrary constants and initial conditions; 13-4 Properties of solutions - isoclines; 13-5 Orthogonal trajectories; 13-6 Modified Euler method; 13-7 A simple predictor-corrector method; Problems

14 First Order Differential Equations
14-1 Equations with separable variables; 14-2 Homogeneous equations; 14-3 Exact equations; 14-4 The linear equation of first order; 14-5 Equations with implicit dependence on x; 14-6 Clairaut's and Lagrange's equations; 14-7 Picard's iterative method; 14-8 Direct deductions and comparison theorems; Problems

15 Higher Order Differential Equations
15-1 Linear equations with constant coefficients - homogeneous case; 15-2 Linear equations with constant coefficients - inhomogeneous case; 15-3 Variation of parameters; 15-4 Simultaneous linear differential equations; 15-5 Series solution of differential equations; 15-6 Runge-Kutta method; 15-7 Oscillatory solutions; 15-8 Coupled oscillations and normal modes; 15-9 The Laplace transform; Problems

16 Fourier Series
16-1 Introductory ideas; 16-2 Convergence of Fourier series; 16-3 Different forms of Fourier series; 16-4 Differentiation and Integration; Problems

Answers to selected problems
Index

Introduction to sets and numbers

1-1 Sets and algebra

In applications of mathematics to engineering and science, we often use the properties of real numbers.
Many of these properties are intuitively obvious, but others are more subtle and depend for their proper use on a simple understanding of the mathematical basis of the so-called real number system. This chapter describes the elements of the real number system in a straightforward manner for subsequent use throughout the book.

The reader will certainly know how to work with finite combinations of numbers, but what is less certain is whether he understands how to interpret and use limiting processes. For example, what is the meaning and what, if any, is the value to be associated with the limit

\[ \lim_{n \to \infty} [\,\cdots\,] \]

which is to be interpreted as the value approached by the expression in square brackets as $n$ increases without bound? It was questions such as these and, indeed, far simpler ones that first led to the study of real numbers.

Many properties of numbers, nowadays accepted by all as self-evident, were once regarded as questionable. This is still clearly apparent from much of the notation that is in current use. Thus, for example, the fact that $\sqrt{2}$ cannot be expressed as the ratio of two integers led to its being termed an irrational number. Even more extreme is the term imaginary number that is given to $\sqrt{-1}$. Although, as we shall see later, this number does not belong to the real number system and so merits special consideration, it is however no less real than the integer 2.

Experience suggests that in any systematic development of the properties of the real number system, the operations of addition and multiplication must play a fundamental role. These conjectures are of course true, but underlying the idea of real numbers and their algebraic manipulation are the even more fundamental concepts of sets and their associated algebra. Because these notions are sometimes unfamiliar, we shall start by considering some simple but important ideas concerning sets.
We must first define the term set, for which the alternative terms aggregate, class, and collection are also often used. Our approach will be direct and pragmatic and we shall agree that a set comprises a collection of objects or elements, each of which is chosen for membership of the set because it possesses some required property. Membership of the set is determined entirely by this property; an object only belongs to the set if it possesses the required property, otherwise it does not belong to the set. The properties of membership and non-membership of a set are mutually exclusive.

An important numerical set which we shall often have occasion to use is the set $N$ of natural numbers 1, 2, 3, ..., used in counting. In future the symbol $N$ will always be used to signify this natural set of positive integers. Notice that there can be no greatest member $m$ of this set, since however large $m$ may be, $m + 1$ is larger and yet is also a member of the set $N$. Accordingly, when we use a number $m$ that is allowed to increase without restriction, it will be convenient to imply this by saying that '$m$ tends to infinity', and to write the statement in the form $m \to \infty$. Notice that infinity is not a number in the usual sense, but just the outcome of the mathematical process of allowing $m$ to increase without bound. It is always necessary to relate the symbol $\infty$ to some mathematical expression, since by itself it has little or no meaning.

$N$ is only one type of set however, and from the wording of our definition it is apparent that the elements of a set need not be numerical. Thus in statistics one is concerned with sets of events which may or may not be numerical, whereas in the analysis of logical operations one is concerned with sets of decisions. The notation and simple algebra we now develop are applicable to all sets and, hence, to any situations such as those just enumerated which are capable of description in terms of sets.
To simplify the manipulation of these ideas we must introduce a notation for elements of a set, for sets themselves, and for the membership of an element to a set. It is customary to denote general elements of sets by lower case letters $a, b, \ldots, x, \ldots$, and sets themselves by capital letters $A, B, \ldots, S, \ldots$. If $a$ is a member of set $A$ we shall write $a \in A$. This is usually read '$a$ is an element of $A$'. Conversely, if $a$ is not an element of $A$ we shall write $a \notin A$. In this notation we have $3 \in N$, but $\pi \notin N$, where $\pi = 3.1415\ldots$, and $N$ is the set of natural numbers.

If a set only contains a small number of elements it is often simplest to define it by enumerating the elements. Hence, for a set $S$ comprising the four integer elements 3, 4, 5, and 6 we would write $S = \{3, 4, 5, 6\}$. This set is a finite set in the sense that it comprises a finite number of elements. Conversely, the set $N$ of natural numbers is an infinite set since it contains an infinite number of elements.

Often it is useful to have a notation which indicates the membership criterion that is to be used for the set. Thus, if we were interested in the set $B$ of positive integers $n$ whose squares lie between the positive numbers $m$ and $2m$, we would write

\[ B = \{\, n \mid n \in N,\ m < n^2 < 2m \,\}. \]

Here we have used the convention that the symbol $n$ to the left of the vertical rule signifies a general element of the set in question, whilst the expressions to the right of the rule express the membership criteria for the set. Here, of course, the symbol $<$ when used in conjunction with numbers $a$ and $b$ in the form $a < b$ is to be read '$a$ less than $b$'.

An important set that is frequently used is the set of ordered pairs. An element of this set will be written $(m, n)$, where $m$ and $n$ are not necessarily numbers and the element $(m, n)$ is different from the element $(n, m)$ unless $m$ and $n$ are identical.
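The set-builder notation used for $B$ translates almost directly into a comprehension in a programming language. The Python sketch below is offered only as an illustration: the function name `squares_between`, the restriction to integer values of $m$, and the sample values of $m$ are assumptions made here and are not part of the text. It also shows that an ordered pair, modelled as a tuple, distinguishes $(m, n)$ from $(n, m)$, whereas a set does not.

```python
from math import isqrt


def squares_between(m):
    """Return B = {n | n in N, m < n^2 < 2m} for a positive integer m.

    Hypothetical helper: the name and the restriction to integer m are
    assumptions made for this illustration.
    """
    # Any n with n^2 < 2m satisfies n <= isqrt(2m), so the search is finite.
    return {n for n in range(1, isqrt(2 * m) + 1) if m < n * n < 2 * m}


print(squares_between(10))   # {4}, since 10 < 16 < 20
print(squares_between(50))   # {8, 9}, since 50 < 64 < 81 < 100

# Ordered pairs: the order of the entries matters, unlike in a set.
print((1, 2) == (2, 1))      # False
print({1, 2} == {2, 1})      # True
```

The membership criterion to the right of the vertical rule becomes the `if` clause of the comprehension, and the general element to the left becomes the expression that is collected.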
An important use of this set is in the construction of tables, when the ordered pair becomes an ordered number pair, the first member of which is usually the argument and the second member the functional value. Hence the ordered number pair $(\tfrac{1}{6}\pi, 0.5)$ could refer to the sine of the angle $\tfrac{1}{6}\pi$ radians. In this example the relationship between the first and second numbers of the ordered pair is determinate since $\sin \tfrac{1}{6}\pi = 0.5$, but this is not always the case with ordered pairs. Thus if the ordered pair of integers $(m, n)$ were used to describe the throw of a die in a series of $N$ trials, as the statistician would call them, then $m$ could represent the number of the throw or the trial number, and $n$ the score resulting from that throw. Here $m$ would range from unity to $N$, the number of trials in the statistical experiment, and $n$ would be any integer between 1 and 6. There would then be no rule by which $n$ could be predicted for any given $m$.

Ordered number pairs are also encountered when constructing graphs of functions where the convention is usually that $(a, b)$ signifies the point with $x$-coordinate $a$ and $y$-coordinate $b$. Thus the graph of the function $y = f(x)$ for which $x$ is between $a$ and $b$ could be written in set notation

\[ S = \{\, (x, f(x)) \mid a < x < b \,\}. \]

The notation of an ordered pair as an element of a set readily extends to an ordered triple $(m, n, r)$, which again need not necessarily involve numerical quantities, nor need it be determinate. Again, two ordered triples will only be identical if their corresponding entries are identical. Ordered number triples of a determinate kind occur when considering the graph of a function of two independent variables as, for example, the equilibrium temperature at a given point of a cross-section of a very long metal bar. Statistical events provide the most common source of ordered triples of the indeterminate variety.
As a simple illustration we may consider the statistical experiment comprising $N$ trials, each of which involves tossing a coin twice and recording the results of each throw as a 'head' (H) or a 'tail' (T). Then the first quantity in the ordered triple could record the trial number with the second and third quantities recording an H or a T according as the first and second throws gave rise to a 'head' or a 'tail'. A typical ordered triple would then be (3, T, H) in which the second and third entries in the ordered triple cannot be predicted from a knowledge of the first entry.

It is often necessary to study relationships between sets and for this purpose an algebra of sets must be constructed. The simplest situation that can occur is that from a set $A$, a new set $B$ is formed, such that all elements of $B$ are also elements of $A$. Such a set $B$ will be called a subset of $A$. This result will be written $B \subseteq A$, which is to be read '$B$ is a subset of $A$'. If $x$ is an element of $A$, so that we may write $x \in A$, then either $x \in B$, or $x \notin B$. When there are some elements $x \in A$ which are not to be found in $B$, so that $x \notin B$, then $B$ is called a proper subset of $A$, the result being written $B \subset A$.

The definition of a subset $B$ of $A$ does not preclude the possibility that for every element $x \in A$ it is also true that $x \in B$. When this occurs sets $A$ and $B$ have the same elements and are said to be equal, the result being written $A = B$. It is clear from the definition of equality that when $A = B$ both the statements $A \subseteq B$ and $B \subseteq A$ must be true. These last two statements are often useful as an alternative definition of equality between sets.

With the above definitions it is clear that if $A = N$ and $B = \{1, 2, 3, 4, 5\}$, then $B \subset A$; whereas if $A = \{4, 7, 3, 5, 9\}$ and $B = \{7, 4, 5, 9, 3\}$, then $A \subseteq B$ and $B \subseteq A$ so that $A = B$.
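The definitions of subset, proper subset, and equality can be checked mechanically on small examples. The following Python sketch spells the three definitions out element by element; the function names are hypothetical, and because $N$ is infinite a finite initial segment of it (here 1 to 100) is used as a stand-in, which is an assumption of this illustration rather than anything in the text.

```python
def is_subset(b, a):
    """B is a subset of A: every element of b is also an element of a."""
    return all(x in a for x in b)


def is_proper_subset(b, a):
    """B is a proper subset of A: a subset, with some x in a missing from b."""
    return is_subset(b, a) and any(x not in b for x in a)


def sets_equal(a, b):
    """Alternative definition of equality: A subset of B and B subset of A."""
    return is_subset(a, b) and is_subset(b, a)


# Assumption: a finite initial segment stands in for the infinite set N.
N_SEGMENT = set(range(1, 101))
B = {1, 2, 3, 4, 5}

A2 = {4, 7, 3, 5, 9}
B2 = {7, 4, 5, 9, 3}

print(is_proper_subset(B, N_SEGMENT))   # True
print(sets_equal(A2, B2))               # True
print(is_proper_subset(A2, B2))         # False: equal sets are not proper subsets
```

Python's built-in `set` type offers the same tests through the operators `<=`, `<`, and `==`; the explicit functions are written out only to mirror the wording of the definitions.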
A more general situation arises when two sets $A$ and $B$ are involved, each of which possesses elements which are not common to the other so that neither statement $A \subset B$, nor $B \subset A$ is true. The set of elements $C$ that is common to these two sets $A$ and $B$ will be called the intersection of the sets $A$ and $B$ and is written

\[ C = A \cap B. \]

Sometimes this is read '$A$ cap $B$' with the understanding just defined. In the event that there are no elements common to the sets $A$ and $B$ we shall write $A \cap B = \phi$, with the understanding that $\phi$ is the null set, which we define to be the set containing no elements. Under these circumstances the sets $A$ and $B$ are said to be disjoint.

By way of example, if $A_1 = \{a, b, 1, 3, 5, 7\}$ and $B_1 = \{a, c, d, e, 3, 7, 9\}$, then $A_1 \cap B_1 = \{a, 3, 7\}$; whereas if $A_2 = \{1, 3, 7\}$ and $B_2 = \{0, 4, 9, 11\}$, $A_2 \cap B_2 = \phi$.

Fig. 1-1 Symbolic representation of set operations: (a) proper subset, $A \subset B$; (b) intersection, $A \cap B$; (c) union, $A \cup B$.

Another important set related to sets $A$ and $B$ is the set $C$ containing all the elements belonging to $A$, to $B$ or to both $A$ and $B$. This is called the union of sets $A$ and $B$ and is written

\[ C = A \cup B; \]

which reads '$A$ cup $B$'. With the sets defined above we obviously have $A_1 \cup B_1 = \{a, b, c, d, e, 1, 3, 5, 7, 9\}$ and $A_2 \cup B_2 = \{0, 1, 3, 4, 7, 9, 11\}$. Clearly, for any set $A$ we have $\phi \subseteq A$, $A \cup \phi = A$, and $A \cap \phi = \phi$.

These seemingly abstract ideas can be illustrated symbolically by means of a very convenient device. This is the so-called Venn diagram, which uses a pictorial representation for the sets in question. Sets are represented by the interior of closed curves, usually of arbitrary shape, and their relationship is then illustrated by the relationships that exist between these curves. Thus, when as in Fig. 1-1 (a) curve $A$ representing set $A$ lies within curve $B$ representing set $B$, we have the situation that $A$ is a proper subset of $B$, so that $A \subset B$.
Figs 1-1 (b), (c) illustrate, respectively, the intersection $A \cap B$ and the union $A \cup B$ of sets $A$ and $B$, which are shown as shaded areas on those figures.

Fig. 1-2 Sets in plane: (a) intersection, $A \cap B$; (b) union, $A \cup B$.

In general this representation is only symbolic, but in the event that elements of the sets $A$ and $B$ may be unambiguously represented by points in the plane, the Venn diagrams become true representations. Let set $A$ comprise all the points within and on a circle of unit radius, usually called a unit circle, and centred on the origin, and let $B$ comprise all the points within and on the circle of radius 2 centred on the point $x = 2.5$ on the $x$-axis. Then the relationships $A \cap B$ and $A \cup B$ are truly represented by the shaded areas in Figs 1-2 (a), (b). Similarly, if we consider the sets $A$ and $B$ defined by the interiors and boundaries of the two unit circles illustrated in Figs 1-3 (a), (b), we see that in (a), $A \cap B = \{1\}$, so that only the single point $x = 1$ on the $x$-axis is common to $A$ and $B$, whereas in (b), $A \cap B = \phi$.

Fig. 1-3 Intersection of sets in the plane: (a) single point contained in intersection, $A \cap B = \{1\}$; (b) disjoint sets, $A \cap B = \phi$.

A final idea we now introduce in connection with sets $A$ and $B$ is the complement of $B$ relative to $A$, which we shall write as $A \setminus B$. This is a generalization of the notion of subtraction and comprises the set of elements of $A$ that do not belong to $B$. The expression $A \setminus B$ is usually read '$A$ minus $B$' and if, for example, $A = \{a, 1, 3, 7\}$ and $B = \{a, 7, 9, 11\}$ then $A \setminus B = \{1, 3\}$. Appealing again to a Venn diagram, we illustrate this relationship by the shaded region in Fig. 1-4.

Fig. 1-4 Symbolic representation of complement of $B$ relative to $A$.

The following useful results are almost self-evident and are true for arbitrary sets $A$, $B$, and $C$.
They may be proved either from the basic definitions, or by appeal to Venn diagrams.

Basic set operations

\[ A \cup A = A \cap A = A, \tag{1-1} \]
\[ A \cap B = B \cap A, \tag{1-2} \]
\[ A \cup B = B \cup A, \tag{1-3} \]
\[ (A \cup B) \cup C = A \cup (B \cup C), \tag{1-4} \]
\[ (A \cap B) \cap C = A \cap (B \cap C), \tag{1-5} \]
\[ A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \tag{1-6} \]
\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C). \tag{1-7} \]

From these there follows an important theorem due to De Morgan:

Theorem 1-1 For any three arbitrary sets $A$, $B$, and $C$ it is true that

\[ A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C) \]

and

\[ A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C). \]

Proof An analytical proof of the first stated result involves the following two steps:
(a) the proof that if $x$ is an arbitrary element such that $x \in A \setminus (B \cup C)$, then $x \in (A \setminus B)$ and $x \in (A \setminus C)$, showing that $A \setminus (B \cup C) \subseteq (A \setminus B) \cap (A \setminus C)$; and
(b) the proof that if $x \in (A \setminus B)$ and $x \in (A \setminus C)$, then $x \in A \setminus (B \cup C)$, showing that $(A \setminus B) \cap (A \setminus C) \subseteq A \setminus (B \cup C)$.
Then by our alternative definition of the equality of two sets $P$ and $Q$, whereby $P = Q$ if $P \subseteq Q$ and $Q \subseteq P$, the result will follow. The details, which are not difficult, are left to the reader. The proof of the second stated result follows on similar lines.

The theorem may be illustrated in general terms, and proved for sets which may be represented by points in a plane, by the use of Venn diagrams. The three diagrams appropriate to the first stated result are shown in Figs 1-5 (a), (b), and (c), where the shaded regions represent the sets $A \setminus B$, $A \setminus C$, and $A \setminus (B \cup C)$, respectively.

Fig. 1-5 Representation of De Morgan's theorem: (a) $A \setminus B$; (b) $A \setminus C$; (c) $A \setminus (B \cup C)$.

The reader will have noticed that it is a feature of basic set operations that they essentially combine two sets to generate a third in an unambiguous manner. It is because of this simple property that operations such as union and intersection are called binary set operations, the term 'binary' referring to the two sets on which the set operation is performed to generate the third.
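The distributive results (1-6) and (1-7) and both parts of Theorem 1-1 can be tested exhaustively on a small universe. The Python sketch below does this for every triple of subsets of a three-element set. Such a check is not a proof, since it covers only one finite universe, and the function names `all_subsets` and `identities_hold` are hypothetical; Python's operators `|`, `&`, and `-` are used here for union, intersection, and relative complement.

```python
from itertools import combinations


def all_subsets(universe):
    """Every subset of a finite universe, returned as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def identities_hold(universe):
    """Check (1-6), (1-7) and Theorem 1-1 for all triples of subsets."""
    subsets = all_subsets(universe)
    for A in subsets:
        for B in subsets:
            for C in subsets:
                checks = (
                    A | (B & C) == (A | B) & (A | C),   # (1-6)
                    A & (B | C) == (A & B) | (A & C),   # (1-7)
                    A - (B | C) == (A - B) & (A - C),   # Theorem 1-1, first result
                    A - (B & C) == (A - B) | (A - C),   # Theorem 1-1, second result
                )
                if not all(checks):
                    return False
    return True


print(identities_hold({1, 2, 3}))   # True

# The relative complement example from the text.
A = {"a", 1, 3, 7}
B = {"a", 7, 9, 11}
print(A - B == {1, 3})              # True
```

A three-element universe has 8 subsets and hence 512 triples, so the loop is cheap; enlarging the universe increases confidence but never replaces the two-step inclusion argument outlined in the proof.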
Thus the operation $\cap$ acting on any two sets $A$ and $B$ generates a third set $C = A \cap B$ where, of course, $C$ will be the null set if $A$ and $B$ have no common elements.

Theorem 1-1 illustrates that operations on sets are not always as simple as the formation of the union or intersection of sets. Accordingly, it is necessary to appreciate clearly the implication of any statement that may be made in the derivation of a result. These statements may either be 'one way' implications or 'two way' implications in the following sense.

An implication will be said to be one way if it is a simple statement of the form 'result $A$ implies result $B$'. This statement is usually written symbolically in the concise form

\[ A \Rightarrow B. \]

A two way implication arises if from the above statement it also follows that 'result $B$ implies result $A$', so that in addition to the previous statement it is also permissible to write $B \Rightarrow A$. Rather than write for a two way implication the two results $A \Rightarrow B$ and $B \Rightarrow A$, the notation is contracted so that the two way implication may be written concisely in the form

\[ A \Leftrightarrow B. \]

The symbol $\Leftrightarrow$ is usually read 'implies and is implied by'.

Two simple illustrations using sets of integers should clarify these remarks. We can only write

\[ a = 1 \Rightarrow a \text{ is an integer}, \]

since the converse statement, $a$ is an integer, does not imply that $a = 1$. However, we may obviously write

\[ \text{integer } n \text{ contains a factor } 2 \Leftrightarrow n \text{ is an even integer}. \]

Formal development of these and similar ideas is essential if the logical structure of mathematics is to be fully appreciated, though these matters will not be pursued further in this introductory account.

1-2 Set theory and probability

One of the most direct applications of the elements of set theory is to be found in a formal introduction to probability theory.
Because the notion of a probability is fundamental to many branches of engineering and science we choose to introduce some basic ideas and definitions now, making full use of the notions of set theory. This will serve a dual purpose in that it will provide an excellent illustration of a specific application of set theory, whilst at the same time introducing an important concept at the very outset of our study.

In some situations the outcome of an experiment is not determinate, so one of several possible events may occur. Following statistical practice we shall refer to an individual event of this kind as the result or outcome of a trial, whereas an agreed number of trials, say $N$, will be said to constitute an experiment. If an experiment comprises throwing a die $N$ times, then a trial would involve throwing it once and the outcome of a trial would be the score that was recorded as a result of the throw. The experiment would involve recording the outcome of each of the $N$ trials.

In general, if a trial has $m$ outcomes we shall denote them by $E_1, E_2, \ldots, E_m$ and refer to each as a simple event. Hence a trial involving tossing a coin would have only two simple events as outcomes: namely 'heads', which could be labelled $E_1$, and 'tails', which would then be labelled $E_2$. In this instance an experiment would be a record of the outcomes from a given number of such trials. A typical record of an experiment involving tossing a coin eight times would be

\[ E_1, E_2, E_1, E_1, E_1, E_2, E_2, E_1. \]

With such a simple experiment the $E_1$, $E_2$ notation has no apparent advantage over writing H in place of $E_1$ and T in place of $E_2$ to obtain the equivalent record H, T, H, H, H, T, T, H. The advantage of the $E_i$ notation accrues from the fact that the subscript attached to the $E$ may be ordered numerically, thereby enabling easier manipulation of the outcomes during analysis.
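One way to see the practical benefit of numerically ordered subscripts is to store the record of an experiment by subscript and let each subscript index a table of counts. The Python sketch below does this for the eight-toss record given above; the function name `tally` and the `labels` lookup are hypothetical names introduced only for this illustration.

```python
# Record of the eight-toss experiment from the text, stored by subscript:
# 1 stands for E_1 ('heads') and 2 for E_2 ('tails').
record = [1, 2, 1, 1, 1, 2, 2, 1]
labels = {1: "H", 2: "T"}   # hypothetical lookup table, used for display only


def tally(outcomes, m):
    """Count occurrences of E_1, ..., E_m; counts[i - 1] belongs to E_i."""
    counts = [0] * m
    for i in outcomes:
        counts[i - 1] += 1
    return counts


print(", ".join(labels[i] for i in record))   # H, T, H, H, H, T, T, H
print(tally(record, 2))                       # [5, 3]
```

The same `tally` function would serve unchanged for a die, with `m = 6`, which is the kind of uniform manipulation the subscript notation makes possible.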
Events such as the result of tossing a coin or throwing a die are called chance or random events, since they are indeterminate and are supposedly the consequence of unbiased chance effects. Experience suggests that the relative frequency of occurrence of each such event averaged over a series of similar experiments tends to a definite value as the number of experiments increases. The relative frequency of occurrence of the simple event $E_i$ in a series of $N$ trials is thus given by the expression

$$\frac{\text{Number of occurrences of event } E_i}{N}.$$

By virtue of its definition, this ratio must lie between zero and unity, both values included. For any given $N$, this ratio provides an estimate of the theoretical ratio that would have been obtained were $N$ to have been made arbitrarily large. This theoretical ratio will be called the probability of occurrence of event $E_i$ and will be written $P(E_i)$. In many simple situations its value may be arrived at by making reasonable postulates concerning the mechanisms involved in a trial. Thus when fairly tossing an unbiased coin it would be reasonable to suppose that over a large number of trials the number of 'heads' would closely approximate the number of 'tails' so that $P(H) = P(T) = 1/2$. Here, of course, $P(H)$ signifies the probability of occurrence of a 'head' and $P(T)$ signifies the probability of occurrence of a 'tail'.

If there are $m$ outcomes $E_1, E_2, \ldots, E_m$ of a trial, and they occur with the respective frequencies $n_1, n_2, \ldots, n_m$ in a series of $N$ trials, then we have the obvious identity

$$\frac{n_1 + n_2 + \cdots + n_m}{N} = 1.$$

When $N$ becomes arbitrarily large we may interpret each of the relative frequency ratios $n_i/N$ ($i = 1, 2, \ldots, m$) occurring on the left-hand side as the probability of occurrence $P(E_i)$ of event $E_i$, thereby giving rise to the general result

$$P(E_1) + P(E_2) + \cdots + P(E_m) = 1. \tag{1-8}$$

By this time a careful reader will have noticed that the definition of probability adopted here has a logical difficulty associated with it, namely, the question whether a relative frequency ratio such as $n_i/N$ can be said to approach a definite number as $N$ becomes arbitrarily large. We shall not attempt to discuss this philosophical point more fully, but rather be content that our simple definition in terms of the relative frequency ratio is in accord with everyday experience.

An examination of Eqn (1-8) and its associated relative frequency ratios is instructive. It shows the obvious results that:

(a) if event $E_i$ never occurs, then $n_i = 0$ and $P(E_i) = 0$;
(b) if event $E_i$ is certain to occur, then $n_i = N$ and $P(E_i) = 1$;
(c) if event $E_i$ occurs less frequently than event $E_j$, then $n_i < n_j$ and $P(E_i) < P(E_j)$;
(d) if the $m$ possible events $E_1, E_2, \ldots, E_m$ occur with equal frequency, then $n_1 = n_2 = \cdots = n_m = N/m$ and $P(E_1) = P(E_2) = \cdots = P(E_m) = 1/m$.

The relationship between sets and probability begins to emerge once it is appreciated that a trial having $m$ different outcomes is simply a rule by which an event may be classified unambiguously as belonging to one of $m$ different sets. Often a geometrical analogy may be used to advantage when representing the different outcomes of a particular trial and such an approach then leads directly to a representation closely approximating the Venn diagrams of the previous section.

A convenient example is provided by the simple experiment which involves throwing two dice and recording their individual scores. There will be in all 36 possible outcomes which may be recorded as the ordered number pairs (1, 1), (1, 2), (1, 3), . . ., (2, 1), (2, 2), . . ., (6, 5), (6, 6). Here the first integer in the ordered number pair represents the score on die 1 and the second the score on die 2. These may be plotted as 36 points with integer coordinates as shown in Fig. 1-6 (a).
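The 36 ordered pairs can be listed mechanically. The following is a minimal sketch, assuming unbiased dice so that result (d) applies and each point carries probability 1/36; the names `sample_space` and `prob` are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (score on die 1, score on die 2).
sample_space = list(product(range(1, 7), repeat=2))

# Assumption: unbiased dice, so every point is equally probable.
prob = {point: Fraction(1, len(sample_space)) for point in sample_space}

print(len(sample_space))   # number of simple events
print(sum(prob.values()))  # the probabilities sum to unity, as in Eqn (1-8)
```

Exact fractions are used rather than floating-point numbers so that the sum in Eqn (1-8) comes out as exactly 1.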
Fig. 1-6 Sample space for two dice: (a) complete sample space; (b) sample space for specific outcome. (Axes: score on die 1, horizontal; score on die 2, vertical.)

Because each of the indicated points in Fig. 1-6 (a) lies in a two-dimensional geometrical space (that is, they are specific points in a plane), and in their totality they describe all possible outcomes, the representation is usually called the sample space of events. The probability of occurrence of an event characterized by a point in the sample space is, of course, the probability of occurrence of the simple event it represents. As a sample space will require a 'dimension' for each of its variables it is immediately apparent that only in simple cases can it be represented graphically. Nevertheless the idea is still useful, as was that of the Venn diagram even when it was only symbolic.

The points in the sample space may be regarded as defining points in a set $D$ so that specific requirements as to the outcome of a trial will define a subset $A$ of $D$, at each point of which the required event will occur. Typical of this situation would be the case in which a simple event is the throw of two dice, and the requirement defining the subset is that the combined score after throwing the two dice equals or exceeds 8. Here the set $D$ would be the 36 points within the square in Fig. 1-6 (b) and the set $A$ the 15 points within the triangle. Using set notation we may write $A \subset D$.

The sample space representation becomes particularly valuable when trials are considered whose outcome depends on the combination of events belonging to two different subsets $A$ and $B$ of the sample space. Thus, again using our previous example and taking for $A$ the points within the triangle in Fig. 1-6 (b), the points in $B$ might be determined by the requirement that the combined score be divisible by the integer 3.
The set of points $B$ is then those contained within the dotted curves of Fig. 1-6 (b). A new set $C$ may be derived from two sets $A$ and $B$ in two essentially different ways according as:

(a) $C$ contains points in $A$ or $B$ or both;
(b) $C$ contains points in $A$ and $B$.

If desired, these statements about sets may be rewritten as statements about events. This is so because there is an unambiguous relationship between an event and the set of points in the sample space at which that event occurs. Thus, for example, we may paraphrase the first statement by saying that the event corresponding to points in $C$ denotes the occurrence of the events corresponding to points in $A$ or $B$, or both. Because of this relationship it is often convenient to regard an event and the subset of points it defines in the sample space as being synonymous.

The statements provide yet another connection with set theory, since in (a) we may obviously write $C = A \cup B$, whereas in (b) we must write $C = A \cap B$. In terms of the sets $A$ and $B$ defined in connection with Fig. 1-6 (b), the set $C = A \cup B$ contains the points in the triangle together with those within the two dotted curves exterior to the triangle. The set $C = A \cap B$ contains only the five points within the two dotted curves lying inside the triangle.

Here it should be remarked that the statistician usually avoids the set theory symbols $\cup$ and $\cap$, preferring instead to denote the union of $A$ and $B$ by $A + B$ and their intersection by $AB$. This largely arises because of the duality we have already mentioned that exists between an event and the set of points it defines, the statistician naturally preferring to think in terms of events rather than sets. However, to emphasize the connection with set theory we shall preserve the set theory notation.
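The sets described in connection with Fig. 1-6 (b) can be built directly from their defining requirements. This is a sketch only; the variable names `union` and `intersection` are introduced here for illustration.

```python
from itertools import product

# D: the complete sample space for two dice.
D = set(product(range(1, 7), repeat=2))

# A: combined score equals or exceeds 8; B: combined score divisible by 3.
A = {(i, j) for (i, j) in D if i + j >= 8}
B = {(i, j) for (i, j) in D if (i + j) % 3 == 0}

union = A | B          # C = A ∪ B: points in A or B or both
intersection = A & B   # C = A ∩ B: points in both A and B

print(len(A), len(B), len(union), len(intersection))
print(sorted(intersection))
```

The intersection consists of the five points whose combined score is 9 or 12, in agreement with the count given in the text.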
Using this duality we now denote by $P(A)$ the probability that an event corresponding to a point in the sample space lies within subset $A$, and define its value to be as follows:

Definition 1-1 $P(A)$ is the sum of the probabilities associated with every point belonging to the subset $A$.

In Fig. 1-6 (b) the set $A$ contains the 15 points within the triangle and, since for unbiased dice each point in the sample space is equally probable, it follows at once that the probability 1/36 is to be associated with each of these points. Hence from our definition we see that in this case $P(A) = 15 \times (1/36) = 5/12$. Similarly, for the set $B$ comprising the 12 points contained within the dotted curves we have $P(B) = 12 \times (1/36) = 1/3$.

We can now introduce the idea of a conditional probability through the following definition.

Definition 1-2 $P(A \mid B)$ is the conditional probability that an event known to be associated with set $B$ is also associated with set $A$.

Clearly we are only interested in the relationship that exists between $A$ and $B$, with $B$ now playing the part of a sample space. Because in Definition 1-2 $B$ plays the part of a sample space, but is itself only a subset of the complete sample space, it is sometimes given the name of the reduced sample space. In terms of set theory Definition 1-2 is easily seen to be equivalent to

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \tag{1-9}$$

which immediately shows us how $P(A \mid B)$ may be computed. Namely, $P(A \mid B)$ is obtained by dividing the sum of the probabilities at points belonging to the intersection $A \cap B$ of sets $A$ and $B$ by the sum of the probabilities at points belonging to $B$. This ensures that $P(B \mid B) = 1$ as would be expected.

We can illustrate this by again appealing to the sets $A$ and $B$ defined in connection with Fig. 1-6 (b). It has already been established that $P(B) = 1/3$, and since there are only five points in $A \cap B$, each with a probability 1/36, it follows that $P(A \cap B) = 5/36$. Hence $P(A \mid B) = (5/36)/(1/3) = 5/12$.
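Definition 1-1 and Eqn (1-9) translate almost word for word into a short computation. The sketch below assumes unbiased dice; the function name `P` and the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

D = list(product(range(1, 7), repeat=2))
point_prob = Fraction(1, len(D))   # assumption: unbiased dice

def P(subset):
    """Definition 1-1: sum the probabilities at every point of the subset."""
    return sum((point_prob for _ in subset), Fraction(0))

A = [pt for pt in D if sum(pt) >= 8]       # score equals or exceeds 8
B = [pt for pt in D if sum(pt) % 3 == 0]   # score divisible by 3
A_and_B = [pt for pt in A if pt in B]

# Eqn (1-9): conditional probability with B as the reduced sample space.
P_A_given_B = P(A_and_B) / P(B)

print(P(A), P(B), P(A_and_B), P_A_given_B)
```

Dividing by $P(B)$ is what restricts attention to the reduced sample space; replacing `A_and_B` by `B` itself in the numerator gives $P(B \mid B) = 1$.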
This result expressed in words states that when two dice are thrown and their combined score is divisible by the integer 3, then the probability that it also equals or exceeds 8 is 5/12. A direct consequence of Eqn (1-9) is the so-called probability multiplication rule:

Theorem 1-2 If two events define subsets $A$ and $B$ of a sample space, then $P(A \cap B) = P(B)P(A \mid B)$.

Sometimes, when it is given that the event corresponding to points in subset $B$ occurs, it is also true that $P(A \mid B)$ depends only on $A$, so that $P(A \mid B) = P(A)$. The events giving rise to subsets $A$ and $B$ will then be said to be independent. The probability multiplication rule then simplifies in an obvious manner which we express as follows:

Corollary 1-2 If the events giving rise to subsets $A$ and $B$ of a sample space are independent, then $P(A \cap B) = P(A)P(B)$.

Consideration of the interpretation of $P(A \cup B)$ leads to another important result known as the probability addition rule:

Theorem 1-3 If two events define subsets $A$ and $B$ of a sample space, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

The proof of this theorem is self-evident once it is remarked that when computing $P(A)$ and $P(B)$ from subsets $A$ and $B$ and then forming the expression $P(A) + P(B)$, the sum of probabilities at points in the intersection $A \cap B$ is counted twice. Hence $P(A) + P(B)$ exceeds $P(A \cup B)$ by an amount $P(A \cap B)$.

The probability addition rule also has an important special case when sets $A$ and $B$ are disjoint so that $A \cap B = \emptyset$. When this occurs the events corresponding to sets $A$ and $B$ are said to be mutually exclusive and we express the result as follows:

Corollary 1-3 If the events giving rise to subsets $A$ and $B$ of a sample space are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.

As a simple illustration of Theorem 1-3 we again use the sets $A$ and $B$ defined in connection with Fig. 1-6 (b) to compute $P(A \cup B)$.
The result is immediate for we have already obtained the results $P(A) = 5/12$, $P(B) = 1/3$, and $P(A \cap B) = 5/36$, so from Theorem 1-3 follows the result $P(A \cup B) = 5/12 + 1/3 - 5/36 = 11/18$.

The applications of these theorems and their corollaries are well illustrated by the following simple examples.

Example 1-1 A bag contains a very large number of red and black balls in the ratio 1 red ball to 4 black. If 2 balls are drawn successively from the bag at random, what is the probability of selecting (a) 2 red balls, (b) 2 black balls, (c) 1 red and 1 black ball?

Let $A_1$ denote the selection of a red ball first (and either colour second), and $A_2$ the selection of a red ball second (and either colour first), with $B_1$ and $B_2$ defined in the same way for a black ball. Then $A_1 \cap A_2$ is the selection of 2 red balls and, similarly, $B_1 \cap B_2$ is the selection of 2 black balls. As the balls occur in the ratio 1 red : 4 black it follows that their relative frequency ratios are 1/5 for a red ball and 4/5 for a black ball, so $P(A_1) = 1/5$ and $P(B_1) = 4/5$. The fact that the bag contains a large number of balls implies that the drawing of one or more balls does not materially alter the relative frequency ratio that existed at the start, so $P(A_2) = 1/5$ and $P(B_2) = 4/5$. This, together with the fact that the balls are drawn at random, implies that the drawing of each ball is an independent event.

The independence of the events then allows the use of Corollary 1-2 to determine the required solutions to (a) and (b). We find that

(a) $P(A_1 \cap A_2) = (1/5)(1/5) = 1/25$,
(b) $P(B_1 \cap B_2) = (4/5)(4/5) = 16/25$.

Now to answer (c) we notice that there are two mutually exclusive orders in which a red and a black ball may be selected, namely as the event $C \cup D$ where $C = A_1 \cap B_2$ (red then black) and $D = B_1 \cap A_2$ (black then red). From Corollary 1-3 we then have that $P(C \cup D) = P(C) + P(D)$, where $P(C)$ and $P(D)$ are determined by Corollary 1-2.
This shows that $P(C) = P(A_1)P(B_2)$ and $P(D) = P(B_1)P(A_2)$, so that $P(C) = P(D) = (1/5)(4/5) = 4/25$. The solution to (c) becomes $P(C \cup D) = 4/25 + 4/25 = 8/25$. The three forms of selection (a), (b), and (c) are themselves mutually exclusive and exhaust all possibilities, so it must follow that $P(A_1 \cap A_2) + P(B_1 \cap B_2) + P(C \cup D) = 1$, as is readily checked. Indeed this result could have been used directly to calculate $P(C \cup D)$ from $P(A_1 \cap A_2)$ and $P(B_1 \cap B_2)$ in place of the above argument using Corollary 1-3.

The previous situation becomes slightly more complicated if only a limited number of balls are contained in the bag.

Example 1-2 A bag contains 50 balls of which 10 are red and the remainder black. If 2 balls are drawn successively from the bag at random, what is the probability of selecting (a) 2 red balls, (b) 2 black balls, (c) 1 red and 1 black ball?

This time the approach must be slightly different because, unlike Example 1-1, the removal of a ball from the bag now materially alters the probabilities involved when the next ball is drawn. In fact this is a problem involving conditional probabilities.

Here we shall define $A$ to be the event that the first ball selected is red, and $B$ to be the event that the second ball selected is red. Expressed in set notation we have to find $P(A \cap B)$, the probability of occurrence of the event associated with $A \cap B$. Its evaluation requires the probability of occurrence of event $B$ given that event $A$ has occurred. This is a conditional probability with the set associated with event $A$ playing the role of the reduced sample space. Utilizing this observation we now make use of Theorem 1-2 to write $P(A \cap B) = P(A)P(B \mid A)$.

Now the relative frequency of occurrence of a red ball at the first draw is $10/(10 + 40) = 1/5$, so that $P(A) = 1/5$. (Not till later will we use the fact that the relative frequency of occurrence of a black ball is $40/(10 + 40) = 4/5$.)
Given that a red ball has been drawn, 9 red balls and 40 black balls remain in the bag. If the next ball to be drawn is red then its probability of occurrence is the conditional probability $P(B \mid A) = 9/(9 + 40) = 9/49$. Hence it follows that the solution to (a) is $P(A \cap B) = (1/5)(9/49) = 9/245$. It is interesting to compare this with the value 1/25 that was obtained in Example 1-1 on the assumption that there was virtually an infinite number of balls in the bag.

If $C$ is defined to be the event that the first ball drawn is black and $D$ the event that the second ball drawn is black, then to answer (b) we must compute $P(C \cap D)$. Obviously, $P(C) = 4/5$, and by using an argument analogous to that above it follows that $P(D \mid C) = 39/(10 + 39) = 39/49$. Hence the solution to question (b) is $P(C \cap D) = (4/5)(39/49) = 156/245$. Again this should be compared with the value 16/25 obtained in Example 1-1.

The simplest way to answer (c) is to use the fact that events (a), (b), and (c) are mutually exclusive and describe the only possibilities. Hence the sum of the three probabilities must equal unity. Denoting the probability of event (c) by $P$ we have $P = 1 - P(A \cap B) - P(C \cap D)$, showing that $P = 1 - 9/245 - 156/245 = 16/49$.

It is sometimes helpful to bear in mind the following table in which equivalent statements are expressed using the alternative languages of sets and probability theory.

| Sets | Probability |
|---|---|
| $A \cup B = C$ | $A + B = C$; the event corresponding to $C$ is defined as the occurrence of the events corresponding to $A$ or $B$ or both. |
| $A \cap B = C$ | $AB = C$; the event corresponding to $C$ is defined as the occurrence of both of the events corresponding to $A$ and $B$. |
| $A \cap B = \emptyset$ | $AB = 0$; events corresponding to $A$ and $B$ are mutually exclusive. |
| $A = \emptyset$ | $A = 0$; the event corresponding to $A$ does not occur. |
| $B \subset A$ | $B \Rightarrow A$; the event corresponding to $B$ implies that corresponding to $A$. |
| $A \setminus B$ | the event corresponding to $A$ and not that corresponding to $B$. |

To close this section with a brief examination of repeated trials, the ideas of a permutation and a combination must be utilized. The student will already be familiar with these concepts from elementary combinatorial algebra and so we shall only record two definitions.

Definition 1-3 A permutation of a set of $n$ mutually distinguishable objects $r$ at a time is an arrangement, or an enumeration of the objects, in which their order of appearance counts.

Thus of the five letters a, b, c, d, e the arrangements a, b, c and a, c, b represent two different permutations of three of the five letters. These are described as permutations of five letters taken three at a time. Other permutations of this kind may be obtained by further re-arrangement of the letters a, b, c and by the replacement of any of them by either or both of the remaining two letters d and e. The total number of different permutations of $n$ objects $r$ at a time will be denoted by ${}^nP_r$ and it is left to the reader to prove as an exercise that

$${}^nP_r = \frac{n!}{(n - r)!}, \tag{1-10}$$

where $n!$ (factorial $n$) $= n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$, and we adopt the convention that $0! = 1$.

Definition 1-4 A combination of a set of $n$ mutually distinguishable objects $r$ at a time is a selection of $r$ objects from the $n$ without regard to their order of arrangement.

It follows from the definition of a permutation that a set of $r$ objects may be arranged in $r!$ different ways so that, denoting the number of different combinations of $n$ objects $r$ at a time by $\binom{n}{r}$, we must have ${}^nP_r = r!\binom{n}{r}$. This gives the important result

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}. \tag{1-11}$$

In many books it will often be found that the expression ${}^nC_r$ is written in place of $\binom{n}{r}$. The numbers $\binom{n}{r}$ are usually called binomial coefficients because of their occurrence in the binomial expansion

$$(p + q)^n = \sum_{r=0}^{n} \binom{n}{r} p^r q^{n-r}, \quad \text{with } n \text{ a positive integer.} \tag{1-12}$$

Now consider an experiment involving a series of independent trials in each of which only one of two events $A$ or $B$ may occur. Then if the probabilities of occurrence of events $A$ and $B$ are $p$ and $q$, respectively, we must obviously have $p + q = 1$. If $n$ such trials constitute an experiment, we might wish to know with what probability the experiment may be expected to yield $r$ events of type $A$. The statistician will call such a situation repeated independent trials.

An experiment will be deemed to be successful if $r$ events of type $A$ and $n - r$ events of type $B$ occur, irrespective of their order of occurrence. Clearly this can happen in $\binom{n}{r}$ different ways and by Corollary 1-2, since the trials are independent, the probability of occurrence of any one of these orderings will be $p^r(1 - p)^{n-r}$. Hence, as the different orderings are also mutually exclusive, it follows from Corollary 1-3 that the required probability $P(r)$ of occurrence of $r$ events of type $A$, each with probability of occurrence $p$, in $n$ independent trials is

$$P(r) = \binom{n}{r} p^r (1 - p)^{n-r}. \tag{1-13}$$

Identifying the $p$ and $q$ of Eqn (1-12) with the probabilities of occurrence of the events $A$ and $B$ just discussed, we see that $q = 1 - p$, so that Eqn (1-12) takes the form

$$1 = \sum_{r=0}^{n} \binom{n}{r} p^r (1 - p)^{n-r}. \tag{1-14}$$

Each term on the right-hand side of Eqn (1-14) then represents the probability of occurrence of an event of the form just discussed. For example, the first term

$$P(0) = \binom{n}{0} (1 - p)^n$$

is the probability that event $A$ will never occur in a series of $n$ independent trials, whilst the third term

$$P(2) = \binom{n}{2} p^2 (1 - p)^{n-2}$$

is the probability that event $A$ will occur exactly twice in a series of $n$ independent trials. The $n + 1$ numbers $P(r)$, $r = 0, 1, \ldots, n$ have, by Eqn (1-14), the property that

$$P(0) + P(1) + \cdots + P(n) = 1, \tag{1-15}$$

and they are said to define a discrete probability distribution.
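Eqn (1-13) and the normalization (1-15) are easily checked numerically. In this sketch the function name `binomial_probability` and the values $n = 4$, $p = 1/4$ are chosen here for illustration; `math.comb` supplies the binomial coefficient of Eqn (1-11).

```python
from fractions import Fraction
from math import comb

def binomial_probability(r, n, p):
    """Eqn (1-13): probability of exactly r events of type A in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Illustrative values only.
n, p = 4, Fraction(1, 4)
distribution = [binomial_probability(r, n, p) for r in range(n + 1)]

for r, value in enumerate(distribution):
    print(f"P({r}) = {value}")

print(sum(distribution))  # Eqn (1-15): the n + 1 values sum to unity
```

Exact fractions are printed in lowest terms, so that, for example, 54/256 appears as 27/128.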
It is conventional to plot them in histogram fashion when they illustrate the probabilities to be associated with the $n + 1$ possible outcomes of an experiment involving $n$ trials. Fig. 1-7 (a) illustrates the case in which $n = 4$ and $p = 1/4$ so that

$$P(0) = \binom{4}{0}\left(\frac{1}{4}\right)^0\left(\frac{3}{4}\right)^4 = \frac{81}{256}, \qquad P(1) = \binom{4}{1}\left(\frac{1}{4}\right)\left(\frac{3}{4}\right)^3 = \frac{27}{64},$$

and, similarly, $P(2) = 54/256$, $P(3) = 3/64$, and $P(4) = 1/256$.

Fig. 1-7 Binomial distribution ($n = 4$, $p = 1/4$): (a) binomial probability density function, $P(r)$ plotted against $r$; (b) binomial cumulative distribution function, $U(r)$ plotted against $r$.

Because of the origin of this distribution, Eqn (1-13) is said to define the binomial distribution. This distribution is historically associated with Jacob Bernoulli (1654-1705) and experiments of the type just examined are sometimes referred to as Bernoullian trials. When the cumulative total

$$U(r) = \sum_{s=0}^{r} P(s) \tag{1-16}$$

is plotted in histogram fashion against $r$ the result is called the cumulative distribution function. The cumulative distribution function corresponding to Fig. 1-7 (a) is shown in Fig. 1-7 (b). It is conventional to refer to the $P(r)$ as the probability density function or the frequency function since it describes the proportion of observations appropriate to the value of $r$.

Example 1-3 If an unbiased coin is tossed six times, what is the probability that only two 'heads' will occur in the sequence of results?
As the coin is unbiased $p = q = 1/2$ and so

$$P(2) = \binom{6}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^4 = \frac{15}{64}.$$

It is an immediate consequence of Eqn (1-13) that

(a) if $A$ occurs with probability $p$ in independent trials then the probability that it will occur at least $r$ times in $n$ trials is
$$\sum_{s=r}^{n} \binom{n}{s} p^s (1 - p)^{n-s};$$

(b) if $A$ occurs with probability $p$ in independent trials then the probability that it will occur at most $r$ times in $n$ trials is
$$\sum_{s=0}^{r} \binom{n}{s} p^s (1 - p)^{n-s};$$

and to this we may add Eqn (1-13) in this form:

(c) if $A$ occurs with probability $p$ in independent trials then the probability that it will occur exactly $r$ times in $n$ trials is
$$\binom{n}{r} p^r (1 - p)^{n-r}.$$

Example 1-4 What is the probability of hitting a target when three shells are fired, assuming each to have a probability 1/2 of making a hit?

Obviously here $p = 1/2$ and we will have satisfied the conditions of the question if at least one shell finds the target. Accordingly, using (a) above, the result is

$$\sum_{s=1}^{3} \binom{3}{s}\left(\frac{1}{2}\right)^s\left(\frac{1}{2}\right)^{3-s}.$$

Hence the required probability is $3/8 + 3/8 + 1/8 = 7/8$.

So far the sample spaces we have used have involved discrete points, and it is for this reason that the term discrete has been used in conjunction with the definition of the binomial distribution. In other words, in discrete distributions, no meaning is to be attributed to points that are intermediate between the discrete sample space points. In particular, referring to Fig. 1-6 (a), there is no score to be attributed to a point with horizontal coordinate 2.5 and vertical coordinate 4.2 any more than there is to the point with horizontal coordinate 11 and vertical coordinate 9.

However, situations occur in which perfectly satisfactory sample spaces can be defined which associate an event with every point of the sample space, and not just certain discrete points. The definition of a distribution function appropriate to this case requires ideas from the calculus and will not be discussed here. In statistics such distributions are called continuous distributions.
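Results (a), (b), and (c) for repeated independent trials may be sketched as three small functions. The function names are hypothetical, and the value $p = 1/2$ used in the calls is taken purely for illustration.

```python
from fractions import Fraction
from math import comb

def exactly(r, n, p):
    """Result (c): A occurs exactly r times in n trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def at_least(r, n, p):
    """Result (a): A occurs at least r times in n trials."""
    return sum(exactly(s, n, p) for s in range(r, n + 1))

def at_most(r, n, p):
    """Result (b): A occurs at most r times in n trials."""
    return sum(exactly(s, n, p) for s in range(0, r + 1))

half = Fraction(1, 2)          # illustrative probability
print(exactly(2, 6, half))     # exactly two successes in six trials
print(at_least(1, 3, half))    # at least one success in three trials
print(at_most(2, 6, half))     # at most two successes in six trials
```

Since "at least one" is the complement of "none", `at_least(1, n, p)` must agree with `1 - exactly(0, n, p)`, which offers a quick check on the summation.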
1-3 Integers, rationals and arithmetic laws

The reader will already be familiar with the fact that if the arithmetic operation of addition is performed on the natural numbers, or the positive integers as they are often called, the result will also be a positive integer. Written symbolically this statement becomes $a, b \in N \Rightarrow (a + b) \in N$. However the arithmetic operation of subtraction is less simple, since we know from direct experience that even when $a, b \in N$, this does not necessarily imply that $a - b$ is a positive integer. Indeed, in general $a - b$ may be equal to some positive or negative integer or to zero. Thus an attempt always to express the result of subtraction of natural numbers in terms of the natural numbers themselves must fail. This is usually expressed by saying that the system of natural numbers $N$ is not closed with respect to subtraction.

The difficulty is of course resolved by supplementing the set of natural numbers $N$ by the set $N^* = \{\ldots, -3, -2, -1, 0\}$ of negative integers and zero. If now in place of $N$ we use the complete set of integers $I = N^* \cup N$ already encountered in Problem 1-1, the assertions $a, b \in I \Rightarrow (a + b) \in I$ and $a, b \in I \Rightarrow (a - b) \in I$ become unconditionally true.

The need to generalize the notion of the natural numbers $N$ to the complete set of integers $I$ is thus seen to arise as a natural result of seeking a number system in which the binary arithmetic operation inverse to addition, namely the operation of subtraction, is always possible. However, the set of numbers $I$ is still far from adequate to enable everyday practical arithmetic to be performed. To see this it is only necessary to comment that although the product of two integers belonging to $I$ itself lies in $I$, the quotient of two integers belonging to $I$ does not necessarily lie in $I$. Thus the complete set of integers $I$ is not closed with respect to division. Symbolically we can write this as $a, b \in I \Rightarrow ab \in I$, but $a, b \in I \Rightarrow a/b \in I$ only if $b \neq 0$ and $a = kb$ with $k \in I$.
The symbol $\neq$ used here is to be read 'not equal to' and the condition involving $k$ simply ensures that the quotient $a/b$ is integral.

Here again the operations of multiplication and division are inverse binary arithmetic operations. To remove the artificial restriction on division, so that the quotient of any two non-zero integers becomes a number in some number system, we must still further extend the system $I$ of integers. This is achieved by introducing the familiar system $R^*$ of rational numbers, which is defined as the set of all numbers of the form $a/b$, where $b \neq 0$ and $a, b \in I$. Obviously, since integers are just a special case of rational numbers and, for example, 2 is represented by any of the rationals 2/1, 4/2, 10/5, . . ., the set $R^*$ also contains all the integers and so we may write $I \subset R^*$.

Numerous though the rational numbers obviously are, we now show how they may be arranged in a definite order and counted. One way in which this may be achieved is indicated in the following array which recognizes as different all rational representations in which cancelling of common factors has not been performed. Thus, for example, in this scheme 4/2, 6/3, 8/4, . . ., are counted as different rational numbers, despite the fact that they all represent 2. If desired these repetitions may be omitted from the resulting sequence of rational numbers, though the matter is not important. The counting or enumeration of the rationals proceeds in the order indicated by the arrows:

(Array of rational numbers: successive rows have numerators $\pm 1$, $\pm 2$, $\pm 3$, . . ., set over the denominators 1, 2, 3, 4, . . ., with arrows marking the order of enumeration.)

If this form of enumeration is adopted then the first few rationals to be specified are 0, 1, 1/2, 1, 2, -2, -1, -1/2, -1, -3/2, -3, . . .. As already mentioned, if desired the repetitions may be deleted, so that the start of the sequence would then become 0, 1, 1/2, 2, -2, -1, -1/2, -3/2, -3, . . ..
Clearly all rationals are included somewhere in this scheme, so that as each one may be put into correspondence with an integer, the mathematician is entitled to say that the rationals are countable, despite the fact that they are infinite in number. What this construction has established is the rather remarkable result that the rationals are no more numerous than the set of positive integers themselves.

It might, at first sight, seem that the rationals $R^*$ must contain all possible numbers. In fact this is far from the truth since it is possible to show that numbers exist which are not expressible as a rational fraction and yet which lie between two rationals, however close they may be. For obvious reasons they are called irrational numbers, and to substantiate our assertion we now prove the existence of one such number.

We will show that $\sqrt{2}$ is irrational or, to phrase the statement more precisely, that there is no fraction of which the square is 2. The argument starts from a given assumption and then produces a contradiction, thereby showing that the original assumption must be false. It is called an argument by contradiction and is a device frequently used in higher mathematics.

Suppose that $m/n$ is such that $m$ and $n$ are integers having no common factor and $(m/n)^2 = 2$. Then $m^2 = 2n^2$ so that $m^2$ must be even and hence $m$ itself is even. Because $m$ is even we may set $m = 2r$, where $r$ is some integer. (Why?) Then $4r^2 = 2n^2$, or $2r^2 = n^2$, which now shows that $n^2$ and hence $n$ must be even. The fact that $n$ is even now allows us to set $n = 2s$ and thus the numbers $m$ and $n$ have a common factor 2, contradicting the initial assumption. Hence the original assumption that $\sqrt{2}$ is capable of representation in the rational form $m/n$ is false. We have thus proved that $\sqrt{2}$ is an irrational number.
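A numerical search can accompany the argument, though it is in no sense a substitute for the proof: it merely confirms, for denominators up to a chosen limit, that no fraction $m/n$ has square 2. The function name `fractions_with_square_two` and the limit used are assumptions made here for illustration.

```python
from math import isqrt

def fractions_with_square_two(limit):
    """Return every pair (m, n), 1 <= n <= limit, with m*m == 2*n*n."""
    hits = []
    for n in range(1, limit + 1):
        # isqrt gives the largest integer m with m*m <= 2*n*n, so it is
        # the only candidate numerator for this denominator.
        m = isqrt(2 * n * n)
        if m * m == 2 * n * n:
            hits.append((m, n))
    return hits

print(fractions_with_square_two(10_000))  # an empty list: no such fraction
```

Integer arithmetic is used throughout so that the test $m^2 = 2n^2$ is exact; a floating-point comparison could be misled by rounding.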
It is established in higher mathematics courses that the irrational numbers are so much more numerous than the rationals that they cannot be enumerated. We make no attempt to justify this claim here. Instead we refer the interested reader to Problems 1-32 to 1-35 if he wishes to gain a little more insight into the relationship that exists between the rationals and the irrationals.

A final important result arising from a deeper study of these matters, and to which we make only a passing reference, is the fact that between them, the rational and the irrational numbers exhaust all the possible types of numbers. In effect this is saying that if we work with real rational and irrational numbers, then there are no gaps left in the number system that can only be filled by the introduction of yet another kind of number. This is important because it means that however we may arrive at a number, as the result of a finite or an infinite sequence of operations, it will either be a rational or an irrational number.

If the set $R^*$ of rational numbers is supplemented by the inclusion of the irrational numbers, the resulting set $R$ is called the real number system or the field of real numbers. The fact that $R$ contains all possible types of real numbers is expressed by saying that the set of real numbers $R$ is complete. Consequently, until we have occasion to consider entities such as $\sqrt{-1}$ there will be no need for us to work outside the real number system $R$.

Numbers called transcendental numbers form an important subset of the irrational numbers. These are numbers like $e$ and $\pi$ which are not the root of any polynomial with rational coefficients (cf. § 2-3).

For future reference it will be useful to summarize the basic properties of the field of real numbers already known to the reader. We now do this making full use of the mathematical shorthand so far introduced.
Additive properties

A-1 $a, b \in R \Rightarrow (a + b) \in R$; $R$ is closed with respect to addition.
A-2 $a, b \in R \Rightarrow a + b = b + a$; addition is commutative.
A-3 $a, b, c \in R \Rightarrow (a + b) + c = a + (b + c)$; addition is associative.
A-4 There exists a number $0 \in R$ such that $0 + a = a$ for all $a \in R$; there is a zero element in $R$.
A-5 If $a \in R$ then there exists a number $-a \in R$ such that $-a + a = 0$; each number has a negative.

Multiplicative properties

M-1 $a, b \in R \Rightarrow ab \in R$; $R$ is closed with respect to multiplication.
M-2 $a, b \in R \Rightarrow ab = ba$; multiplication is commutative.
M-3 $a, b, c \in R \Rightarrow (ab)c = a(bc)$; multiplication is associative.
M-4 There exists a number $1 \in R$ such that $1 \cdot a = a$ for all $a \in R$; there is a unit element in $R$.
M-5 Let $a$ be a non-zero number in $R$, then there exists a number $a^{-1} \in R$ such that $a^{-1}a = 1$; each non-zero number has an inverse.

Usually we shall write $1/a$ in place of $a^{-1}$, so that the two expressions are to be taken as being synonymous.

Distributive property

D-1 $a, b, c \in R \Rightarrow a(b + c) = ab + ac$; multiplication is distributive.

The above results are self-evident for real numbers and are usually called the real number axioms. They are used by mathematicians as the logical basis for our number system. Later we shall encounter other systems of objects which, though sharing many of the properties of real numbers, are not themselves numbers. For future reference we mention matrices, for which the properties M-1 to M-5 do not all hold in general, and vectors, for which two forms of multiplication exist and for which M-5 has no meaning.

It is an immediate consequence of these axioms that commonplace arithmetic operations may be performed without question. For example, it is fundamental to arguments that $a - b = 0 \Leftrightarrow a = b$, and $a\xi = a\eta \Rightarrow \xi = \eta$ if $a \neq 0$. These, and other elementary results of similar form, follow directly as a result of simple applications of the axioms.
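The axioms can be spot-checked on particular numbers, and the remark about matrices can be illustrated at the same time. The sketch below is only a check on sample values, not a proof: exact rationals stand in for real numbers, and the helper `matmul` together with the matrices `P` and `Q` are hypothetical names introduced here.

```python
from fractions import Fraction

# Rational stand-ins for real numbers (an assumption made so that the
# arithmetic is exact); the particular values are arbitrary.
a, b, c = Fraction(3, 4), Fraction(-5, 2), Fraction(7, 3)

axioms_hold = all([
    a + b == b + a,                  # A-2
    (a + b) + c == a + (b + c),      # A-3
    0 + a == a,                      # A-4
    -a + a == 0,                     # A-5
    a * b == b * a,                  # M-2
    (a * b) * c == a * (b * c),      # M-3
    1 * a == a,                      # M-4
    (1 / a) * a == 1,                # M-5
    a * (b + c) == a * b + a * c,    # D-1
])

def matmul(X, Y):
    """Product of two 2 x 2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

P = ((0, 1), (0, 0))
Q = ((0, 0), (1, 0))

print(axioms_hold)
print(matmul(P, Q), matmul(Q, P))  # the two products differ: M-2 fails
```

A single pair of matrices with $PQ \neq QP$ is enough to show that commutativity of multiplication cannot be taken for granted outside the real number system.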
As it would be out of place to develop these ideas here we shall indicate the proof of just one such result, stating the others in the form of Problem 1-37 which is left to be attempted by the reader who wishes to question further the basis of the real number system.

We prove that there is a unique zero in R. The argument is again by contradiction, for we first suppose that two different zero elements exist and denote them by $0$ and $0'$. Then by A-4 it follows that $0 + 0' = 0'$ and $0' + 0 = 0$, whence by A-2 we must have $0 = 0'$, thereby establishing the uniqueness of the zero.

So far our list of properties of real numbers has been concerned only with equalities. The valuable property of real numbers that they can be arranged according to size, or ordered, has so far been overlooked. It is of course this property that allows us to represent real numbers by points on a line and thereby to construct graphs and other valuable geometrical representations. Ordering is achieved by utilizing the concept 'greater than' which, when used in the form 'a greater than b', is denoted by $a > b$. Hence to the other real number axioms must be added:

Order properties
O-1 If $a \in R$ then exactly one of the following is true; either $a > 0$ or $a = 0$ or $-a > 0$.
O-2 $a, b \in R$, $a > 0$, $b > 0 \Rightarrow a + b > 0$, and $ab > 0$.

We now define $a > b$ and $a < b$, the latter being read 'a less than b', by $a > b \Leftrightarrow a - b > 0$ and $a < b \Leftrightarrow b - a > 0$.

The following results are obvious consequences of the real number system and are called inequalities. In places they also involve the symbol $\ge$ which is to be read 'greater than or equal to'.

Elementary inequalities in R
I-1 $a > b$ and $c \ge d \Rightarrow a + c > b + d$.
I-2 $a > b \ge 0$ and $c \ge d > 0 \Rightarrow ac > bd$.
I-3 $k > 0$ and $a > b \Rightarrow ka > kb$.
I-4 $a > b \Rightarrow -a < -b$.
I-5 $a < 0$, $b > 0 \Rightarrow ab < 0$; $a < 0$, $b < 0 \Rightarrow ab > 0$.
I-6 $a > 0 \Rightarrow a^{-1} > 0$; $a < 0 \Rightarrow a^{-1} < 0$.
I-7 $a > b > 0 \Rightarrow b^{-1} > a^{-1} > 0$; $a < b < 0 \Rightarrow b^{-1} < a^{-1} < 0$.
An important use of inequalities is in defining intervals on a line and regions in a plane. Using the order property of numbers to associate numbers with points on a line, an interval on a line may be considered to be a segment of the line between two given points or numbers, a and b, say. Three cases arise according as to whether (a) both end points are included in the interval, (b) both end points are excluded from the interval, or (c) one is included and one is excluded. These are called, respectively, (a) a closed interval, (b) an open interval, (c) a semi-open interval. Namely, an interval is closed at an end which contains the end point, otherwise it is open at that end. In terms of the points a and b and the variable x representing an arbitrary point on the line these are written:
(a) $a \le x \le b$; closed interval;
(b) $a < x < b$; open interval;
(c) $a \le x < b$ or $a < x \le b$; semi-open interval.

Thus $1 \le x < 2$ defines the semi-open interval containing the point $x = 1$ and the points up to, but not including, $x = 2$. These are represented in Fig. 1-8 in which a solid line represents points in the interval, a circle represents an excluded point, and a dot an included point.

Fig. 1-8 Intervals on a line: (a) closed interval $a \le x \le b$; (b) open interval $a < x < b$; (c) semi-open interval $a \le x < b$.

Special cases occur when one or both of the end points of the interval are at infinity. The intervals $-\infty < x < a$ and $b < x < \infty$ are called semi-infinite intervals and $-\infty < x < \infty$ is an unbounded interval or, more simply, the complete real line.

We illustrate the corresponding definition of a region in the (x, y)-plane by considering the three inequalities $x^2 + y^2 < a^2$, $y < x$, $x \ge 0$.
The first defines the interior of a circle of radius a centred on the origin, the second defines points below, but not on, the straight line $y = x$, and the third defines points in the right half of the (x, y)-plane including the points on the y-axis itself. These curves represent boundaries of the regions in question and the boundary points are only to be included in the region when possible equality is indicated by use of the signs $\ge$ or $\le$. The three regions are indicated in Fig. 1-9 (a) in which a full line indicates that points on it are to be included, a dotted line indicates that points on it are to be excluded, and shading indicates the side of the line on which the region in question must lie. Fig. 1-9 (b) indicates the region in which all the inequalities are satisfied.

Fig. 1-9 Regions in plane: (a) region boundaries $x^2 + y^2 = a^2$, $y = x$, and $x = 0$; (b) region $x^2 + y^2 < a^2$, $y < x$, $x \ge 0$.

Simple inequalities of the form $(x + 1)(x + 3) > (x - 1)(x - 2)$ also define intervals. For, clearing the brackets, we see that $x^2 + 4x + 3 > x^2 - 3x + 2$ which, by simple application of the elementary inequalities just listed, reduces to $7x > -1$, that is $x > -1/7$, defining a semi-infinite interval, open at the end $x = -1/7$.

The elementary inequalities may often be used to advantage to simplify complicated algebraic expressions by yielding helpful qualitative information as the following example indicates.

Example 1-5 Prove that if $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ are positive real numbers, then
$$\min_{1 \le r \le n}\left(\frac{a_r}{b_r}\right) \le \frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n} \le \max_{1 \le r \le n}\left(\frac{a_r}{b_r}\right).$$
Here the left-hand side of the inequality is to be interpreted as meaning the minimum value of the expression $(a_r/b_r)$, with r assuming any of the integral values between 1 and n, and the right-hand side is to be similarly interpreted reading maximum in place of minimum.
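Before turning to the proof, the statement of Example 1-5 can be checked numerically. The following is only a minimal sketch: the function name `ratio_bounds` and the sample sequences are hypothetical and chosen purely for illustration, not taken from the text.

```python
def ratio_bounds(a, b):
    """Return (min of a_r/b_r, (sum of a)/(sum of b), max of a_r/b_r)
    for two equal-length sequences of positive numbers."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    if any(x <= 0 for x in a) or any(y <= 0 for y in b):
        raise ValueError("all terms must be positive")
    ratios = [x / y for x, y in zip(a, b)]
    return min(ratios), sum(a) / sum(b), max(ratios)


# Hypothetical sample data (an assumption, used only to illustrate the inequality).
a = [1.0, 4.0, 2.0]
b = [2.0, 5.0, 1.0]
low, middle, high = ratio_bounds(a, b)
print(low, middle, high)
```

For these sample values the three quantities satisfy low <= middle <= high, as the example asserts; a numerical check of this kind illustrates the result but is no substitute for the proof.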
The result follows by noticing that
$$\frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n} = \frac{1}{\sum_{r=1}^{n} b_r}\left[b_1\left(\frac{a_1}{b_1}\right) + b_2\left(\frac{a_2}{b_2}\right) + \cdots + b_n\left(\frac{a_n}{b_n}\right)\right],$$
where $\sum_{r=1}^{n} b_r = b_1 + b_2 + \cdots + b_n$. For if each of the expressions $(a_1/b_1), (a_2/b_2), \ldots, (a_n/b_n)$ is replaced by the smallest of these ratios, which could be the value taken by all the expressions if $a_1 = a_2 = \cdots = a_n > 0$ and $b_1 = b_2 = \cdots = b_n > 0$, then
$$\frac{a_1 + a_2 + \cdots + a_n}{b_1 + b_2 + \cdots + b_n} \ge \min_{1 \le r \le n}\left(\frac{a_r}{b_r}\right)\frac{(b_1 + b_2 + \cdots + b_n)}{\sum_{r=1}^{n} b_r} = \min_{1 \le r \le n}\left(\frac{a_r}{b_r}\right),$$
which is the left half of the inequality. The right half follows by identical reasoning if maximum is written in place of minimum.

1-4 Absolute value of a real number

Definition 1-5 The absolute value $|a|$ of the real number a provides a measure of its size without regard to sign, and is defined as follows:
$$|a| = \begin{cases} a & \text{when } a \ge 0, \\ -a & \text{when } a < 0. \end{cases}$$
Thus if $a = 3$, then $|a| = 3$ and if $a = -5.6$ then $|a| = 5.6$. There are three immediate consequences of this definition which we now enumerate as

Theorem 1-4 If $a, b \in R$ then
(a) $|ab| = |a|\,|b|$,
(b) $|a + b| \le |a| + |b|$,
(c) $|a - b| \ge \bigl||a| - |b|\bigr|$.

The proof is simply a matter of enumerating the possible combinations of positive and negative a and b, and then making a direct application of the definition of the absolute value. We shall only illustrate the proof of (a). There are four cases to be considered; firstly $a \ge 0$, $b \ge 0$, secondly $a \ge 0$, $b < 0$, thirdly $a < 0$, $b \ge 0$, and fourthly $a < 0$, $b < 0$. If $a \ge 0$, $b \ge 0$ then $ab \ge 0$ and so $|ab| = ab = |a|\,|b|$. The second and third situations are essentially similar so we shall discuss only the second. As $a \ge 0$, $b < 0$ we have $ab \le 0$, whence $|ab| = -ab = a(-b) = |a|\,|b|$. Finally, if $a < 0$, $b < 0$ then $ab > 0$ and $|ab| = ab = (-a)(-b) = |a|\,|b|$, establishing (a). For reasons we give later, result (b) is usually called the triangle inequality.

The absolute value may also be used to define intervals since an expression of the form $|a - x| \ge 2$ implies two inequalities according as $a - x$ is positive or negative.
If $a - x \ge 0$ then $|a - x| = a - x$ and we have $a - x \ge 2$ or $x \le a - 2$. However if $a - x < 0$, then by the definition of the absolute value of $a - x$ we must have $|a - x| = -(a - x)$ showing that $-(a - x) \ge 2$, or, $x \ge a + 2$. Taken together the results require that x may be equal to or greater than $2 + a$ or equal to or less than $a - 2$. It follows that x may not lie in the intervening interval of length 4 between $x = a - 2$ and $x = a + 2$. This is illustrated in Fig. 1-10 (a) where a solid line is again used to indicate points in the interval satisfied by $|a - x| \ge 2$ and the dots are to be included in the appropriate intervals.

By exactly similar reasoning we see that if we consider the inequality $1 \le |x + 1| < 2$, then if $x + 1 \ge 0$, $|x + 1| = x + 1$ and the inequality becomes $1 \le x + 1 < 2$. Hence the interval is $0 \le x < 1$. However, if $x + 1 < 0$, then $|x + 1| = -x - 1$ and so the inequality becomes $1 \le -x - 1 < 2$ giving rise to the interval $-3 < x \le -2$. These intervals are shown in Fig. 1-10 (b) with circles indicating points excluded from the end of the solid line intervals and dots indicating points to be included.

Fig. 1-10 Intervals on a line: (a) $|a - x| \ge 2$; (b) $1 \le |x + 1| < 2$.

1-5 Representation of numbers

The decimal representation of real numbers is usual in all ordinary arithmetic work and involves expressing a real number as the sum of an integral part and a decimal fraction. Each of the parts is represented as the sum of multiples of powers of 10, with the powers being positive integers or zero when representing the integral part and negative integers when representing the decimal part. The number 10 that forms the basis of the decimal system is called the base of the number system.
The integral part r of a finite real number a is thus expressible as
$$r = a_n(10^n) + a_{n-1}(10^{n-1}) + \cdots + a_1(10^1) + a_0(10^0),$$
where n is suitably chosen, and the coefficients $a_i$ are either zero or an integer between 1 and 9. Hence, in reality, the number 2049 is a convenient representation of
$$2(10^3) + 0(10^2) + 4(10^1) + 9(10^0),$$
with the positions of the digits indicating the positive powers of 10 by which they are to be multiplied before addition.

Similarly, if the decimal fraction part d of a real number a terminates after n decimal places, then it is expressible in the form
$$d = \frac{b_1}{10} + \frac{b_2}{10^2} + \cdots + \frac{b_n}{10^n},$$
with the coefficients $b_i$ again being either zero or an integer between 1 and 9. Hence the decimal number 0.3012 is, in reality, the representation of
$$\frac{3}{10} + \frac{0}{10^2} + \frac{1}{10^3} + \frac{2}{10^4},$$
with the positions of the digits indicating the negative powers of 10 by which they are to be multiplied before addition.

In general then, the decimal number that is written
$$a_m a_{m-1} \cdots a_1 a_0 \,.\, b_1 b_2 \cdots b_n,$$
with $(m + 1)$ digits before the decimal point and n digits after it, and which terminates after n decimal places, is the representation of
$$a_m(10^m) + a_{m-1}(10^{m-1}) + \cdots + a_1(10^1) + a_0(10^0) + \frac{b_1}{10} + \frac{b_2}{10^2} + \cdots + \frac{b_n}{10^n}.$$

Consideration of the representation of non-terminating decimal fractions and irrational numbers will be postponed until we discuss sequences and limits, since the approximation of real numbers by rationals has not yet been discussed.

There is no reason why the base of the number system should not be any integer $N > 1$ and, indeed, in digital computing extensive use is made of the binary system. This is the system of representation using the base 2. Hence a binary number will contain only the digits 1 and 0 with their position indicating the power of 2 involved. Thus we may write
$$11 = 1(2^3) + 0(2^2) + 1(2^1) + 1(2^0),$$
so that the binary representation of 11 is 1011.
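The positional representation of an integer in an arbitrary base can be sketched in a few lines of code. The sketch below is an illustration only: the function names `to_base` and `from_digits` are hypothetical, and repeated division by the base is one standard way of obtaining the digits, not a method stated in the text.

```python
def to_base(n, base=2):
    """Digits of the non-negative integer n in the given base,
    most significant digit first."""
    if n < 0 or base < 2:
        raise ValueError("need n >= 0 and base >= 2")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)  # the remainder is the next lower-order digit
        digits.append(remainder)
    return digits[::-1]


def from_digits(digits, base=2):
    """Rebuild the integer from its digits (most significant first)."""
    value = 0
    for d in digits:
        value = value * base + d
    return value


print(to_base(11, 2))      # digits of 11 in base 2
print(to_base(2049, 10))   # digits of 2049 in base 10
```

Applied to 11 with base 2 this yields the digits 1, 0, 1, 1, in agreement with the representation 1011 above, and `from_digits` reverses the process by summing the digits multiplied by the appropriate powers of the base.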
Similarly, the rational number 9/16 may be written
$$\frac{9}{16} = \frac{1}{2} + \frac{0}{2^2} + \frac{0}{2^3} + \frac{1}{2^4},$$
showing that its binary fraction form is 0.1001. Hence the binary form of the number $11\tfrac{9}{16}$ becomes 1011.1001 and, as in the case of decimals, the position of a digit relative to the binary point indicates the power of two by which it is to be multiplied before addition.

It is easily verified that the addition and multiplication tables for binary numbers are as illustrated in the following two tables:

Binary addition          Binary multiplication
  + | 0  1                 × | 0  1
  0 | 0  1                 0 | 0  0
  1 | 1  0                 1 | 0  1

Both tables are entered by selecting one digit in the first column and one in the first row, when the result of the operation appropriate to the table, namely addition or multiplication, is shown in the body of the table. For example, using the addition table and taking the digits 1 in the first column and 0 in the first row we see that 1 + 0 = 1. Similarly, taking the digits 1 in the first column and 1 in the first row we see that 1 + 1 = 0. The interpretation of this latter result is, of course, that a digit 1 must be transferred to the next higher power of 2, corresponding to the transference of multiples of powers of 10 when performing ordinary addition. The multiplication table is straightforward and needs no further comment.

The examples that now follow illustrate the addition, multiplication, and subtraction of simple binary numbers. We shall let a = 12, b = 11 and form a + b, ab, and a - b using binary notation. The binary representations of a and b are a = 1100, b = 1011 and so we have:

Addition
  1                  (carried digit)
    1 1 0 0
    1 0 1 1  +
  ---------
  1 0 1 1 1

Multiplication
          1 1 0 0
          1 0 1 1  ×
          -------
          1 1 0 0
        1 1 0 0
      0 0 0 0
    1 1 0 0
  1 1 1 1            (carried digits)
  ---------------
  1 0 0 0 0 1 0 0

Here the rows of carried digits 1 indicate the transference of a digit 1 corresponding to the result 1 + 1 = 0.

The subtraction a - b is equally straightforward provided it is recalled that when the subtraction of digits 0 - 1 is encountered, it is necessary to 'borrow' a digit 1 from the next higher position in the number b.
Thus the result would be to write 1 in place of 0 - 1 and to add 1 to the next higher position in b.

Subtraction
  1 1 0 0
  1 0 1 1  -
  -------
  0 0 0 1

The expressions a + b, ab, and a - b for a = 12, b = 11 are thus
$$a + b = 1(2^4) + 0(2^3) + 1(2^2) + 1(2^1) + 1(2^0) = 23,$$
$$ab = 1(2^7) + 0(2^6) + 0(2^5) + 0(2^4) + 0(2^3) + 1(2^2) + 0(2^1) + 0(2^0) = 132,$$
$$a - b = 0(2^3) + 0(2^2) + 0(2^1) + 1(2^0) = 1.$$

1-6 Mathematical induction

Mathematical propositions often involve some fixed integer n, say, in a special role and it is desirable to infer the form taken by the proposition for arbitrary integral n from the form taken by it for the specific value $n = n_1$. The logical method by which the proof of the general proposition, if true, may be established, is based on the properties of natural numbers and is called mathematical induction.

In brief, it depends for its success on the obvious fact that if A is some set of natural numbers and $1 \in A$, then the statement that whenever integer $n \in A$, so also does its successor, implies that A = N, the set of natural numbers.

The formal statement of the process of mathematical induction is expressed by the following theorem where, for simplicity, the mathematical proposition corresponding to integer n is denoted by S(n).

Theorem 1-5 (mathematical induction) If it can be shown that,
(a) when $n = n_1$, the proposition $S(n_1)$ is true, and
(b) if for $n \ge n_1$, when S(n) is true then so also is S(n + 1),
then the proposition S(n) is true for all natural numbers $n \ge n_1$.

A simple illustrative example will help here and we now prove inductively that the sum $\sum_{r=1}^{n} r$ of the first n natural numbers is given by $n(1 + n)/2$. In other words, in this example the proposition denoted by S(n) is that the following result is true:
$$1 + 2 + \cdots + n = n(1 + n)/2.$$

Proof, step (a) First the proposition must be shown to be true for some specific value $n = n_1$.
Any integral value $n_1$ will suffice but if we set $n_1 = 1$ the proposition corresponding to S(1) is immediately obvious. If, instead, we had chosen $n_1 = 3$, then it is easily verified that proposition S(3) is true, namely that $1 + 2 + 3 = 3(1 + 3)/2$.

Proof, step (b) We must now assume that proposition S(n) is true and attempt to show that this implies that the proposition S(n + 1) is true. If S(n) is true then
$$1 + 2 + \cdots + n = n(1 + n)/2$$
and, adding (n + 1) to both sides, we obtain
$$1 + 2 + \cdots + n + (n + 1) = n(1 + n)/2 + (n + 1) = (n + 1)(2 + n)/2.$$
However, this is simply a statement of proposition S(n + 1) obtained by replacing n by n + 1 in proposition S(n). Hence S(1) is true and $S(n) \Rightarrow S(n + 1)$ so, by the conditions of Theorem 1-5, we have established that S(n) is valid for all n.

Later we shall use this form of proof in cases less trivial than the above example which simply involved establishing the sum of an arithmetic progression.

As another illustration of an inductive argument we now consider the determination of the nth term in the sequence of numbers $u_0, u_1, u_2, \ldots$, defined sequentially by the equation
$$u_n = 2u_{n-1} + 1. \tag{1-17}$$
Equations of this form which define a sequence of discrete numbers $u_n$ are called first-order difference equations. It is clear that this difference equation provides us with the algebraic rule by which the nth term of the sequence may be computed once the first term $u_0$ has been specified. Generally speaking, any rule which specifies the form of computation to be pursued in order to arrive at the solution of a given problem is called an algorithm.

A few moments' experiment will suffice to convince the reader that the solution to Eqn (1-17) may be expressed in terms of $u_0$ by the equation
$$u_n = 2^n u_0 + (2^n - 1). \tag{1-18}$$
The initial term $u_0$ of the sequence is arbitrary and on account of this fact such a solution is called a general solution of the first-order difference equation (1-17).
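The 'few moments' experiment' mentioned above is easily carried out by machine. The following minimal sketch, in which the function names `iterate_first_order` and `closed_form` are hypothetical, applies the algorithm of Eqn (1-17) step by step and compares the outcome with the expression in Eqn (1-18) for a range of starting values.

```python
def iterate_first_order(u0, n):
    """Apply the rule u_k = 2*u_(k-1) + 1 of Eqn (1-17) n times, starting from u0."""
    u = u0
    for _ in range(n):
        u = 2 * u + 1
    return u


def closed_form(u0, n):
    """The expression of Eqn (1-18): u_n = 2**n * u0 + (2**n - 1)."""
    return 2**n * u0 + (2**n - 1)


# Compare the two for several starting values and several n.
agree = all(
    iterate_first_order(u0, n) == closed_form(u0, n)
    for u0 in range(-3, 4)
    for n in range(12)
)
print(agree)
```

Agreement for finitely many cases only makes the formula plausible; that it holds for every n is precisely what an inductive proof establishes.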
Once $u_0$ is specified by requiring that $u_0 = C$, say, then the solution is said to be a particular solution.

The proof of Eqn (1-18) by induction again proceeds in two parts, with the proposition S(n) being that Eqn (1-18) is the solution of Eqn (1-17).

Proof, step (a) If n = 1, then $u_1 = 2u_0 + (2 - 1) = 2u_0 + 1$, showing that the proposition S(1) is true.

Proof, step (b) Assuming the proposition S(n) is true, then
$$u_{n+1} = 2u_n + 1 = 2[2^n u_0 + (2^n - 1)] + 1 = 2^{n+1} u_0 + (2^{n+1} - 1),$$
showing that $S(n) \Rightarrow S(n + 1)$. The result is thus true for all n.

To conclude this section, having introduced the notion of a difference equation let us take the concept a little further so that it can be used in more general circumstances. A homogeneous linear difference equation of order 2 is a relationship of the form
$$u_n + a u_{n-1} + b u_{n-2} = 0, \tag{1-19}$$
where a and b are real constants and $u_{n-2}, u_{n-1}, u_n$ are three consecutive members of a sequence of numbers. Given any two consecutive members in the sequence, say $u_0$ and $u_1$, then Eqn (1-19) provides an algorithm by which any other member of the sequence may be computed.

If we seek a solution $u_n$ of the form
$$u_n = A\lambda^n, \tag{1-20}$$
where A and $\lambda$ are real constants, then substitution into Eqn (1-19) shows that
$$\lambda^2 + a\lambda + b = 0. \tag{1-21}$$
This is called the characteristic equation associated with the difference equation (1-19) and shows that solutions of the form of Eqn (1-20) are only possible when $\lambda$ is equal to one of the two roots $\lambda_1$ and $\lambda_2$ of Eqn (1-21), which we assume to be real numbers.

If $\lambda_1 \neq \lambda_2$, then $A\lambda_1^n$ and $B\lambda_2^n$ are both solutions of Eqn (1-19) and it is easy to show that
$$u_n = A\lambda_1^n + B\lambda_2^n \tag{1-22}$$
is also a solution, where A and B are arbitrary real constants. This result is the general solution of Eqn (1-19). Given specific values for $u_0$ and $u_1$, A and B can be deduced by substituting into Eqn (1-22) and hence a particular solution found.
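The procedure just outlined (form the characteristic equation, find its roots, then fit A and B to the two starting values) may be sketched as follows. This is an illustration under stated assumptions only: the function names are hypothetical, the sketch handles only the case of distinct real roots, and the coefficients a = -5, b = 6 with starting values 1 and 4 are invented for the demonstration.

```python
import math


def particular_solution(a, b, u0, u1):
    """Return a function n -> u_n solving u_n + a*u_(n-1) + b*u_(n-2) = 0
    with the given u0, u1, assuming distinct real characteristic roots."""
    disc = a * a - 4 * b
    if disc <= 0:
        raise ValueError("this sketch handles only distinct real roots")
    lam1 = (-a + math.sqrt(disc)) / 2
    lam2 = (-a - math.sqrt(disc)) / 2
    # Solve A + B = u0 and A*lam1 + B*lam2 = u1 for A and B.
    A = (u1 - lam2 * u0) / (lam1 - lam2)
    B = (lam1 * u0 - u1) / (lam1 - lam2)
    return lambda n: A * lam1**n + B * lam2**n


def iterate_second_order(a, b, u0, u1, n):
    """Compute u_n directly from the recurrence u_k = -a*u_(k-1) - b*u_(k-2)."""
    if n == 0:
        return u0
    prev, cur = u0, u1
    for _ in range(n - 1):
        prev, cur = cur, -a * cur - b * prev
    return cur


# Hypothetical example: u_n - 5u_(n-1) + 6u_(n-2) = 0 with u_0 = 1, u_1 = 4.
u = particular_solution(-5, 6, 1, 4)
print([round(u(n)) for n in range(6)])
print([iterate_second_order(-5, 6, 1, 4, n) for n in range(6)])
```

In this invented example the characteristic roots are 3 and 2, and the two printed lists should coincide, the first being obtained from the form of Eqn (1-22) and the second from the algorithm of Eqn (1-19) itself.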
Suppose, for example, that the difference equation was
$$u_n - u_{n-1} - u_{n-2} = 0,$$
and that $u_0 = u_1 = 1$. Then the characteristic equation is
$$\lambda^2 - \lambda - 1 = 0,$$
with the two roots $\lambda_1 = (1 + \sqrt{5})/2$ and $\lambda_2 = (1 - \sqrt{5})/2$. Hence the general solution has the form
$$u_n = A\left(\frac{1 + \sqrt{5}}{2}\right)^n + B\left(\frac{1 - \sqrt{5}}{2}\right)^n. \tag{1-23}$$
To deduce the values of A and B particular to our problem we use the initial conditions $u_0 = 1$ and $u_1 = 1$ to deduce from Eqn (1-23) that
$$1 = A + B \quad (\text{case } n = 0,\ u_0 = 1),$$
$$1 = A\left(\frac{1 + \sqrt{5}}{2}\right) + B\left(\frac{1 - \sqrt{5}}{2}\right) \quad (\text{case } n = 1,\ u_1 = 1).$$
Solving these equations for A and B we find
$$A = \frac{\sqrt{5} + 1}{2\sqrt{5}}, \qquad B = \frac{\sqrt{5} - 1}{2\sqrt{5}},$$
whence the particular solution is
$$u_n = \frac{\sqrt{5} + 1}{2\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^n + \frac{\sqrt{5} - 1}{2\sqrt{5}}\left(\frac{1 - \sqrt{5}}{2}\right)^n.$$
The first few numbers $u_0, u_1, u_2, \ldots$, of the sequence generated by this algorithm are
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . .,
and comprise the well-known Fibonacci sequence of numbers. This sequence of numbers occurs naturally in the study of regular solids and in numerous other parts of mathematics. Naturally if only the first few members of the sequence are required then they are most easily found by use of the algorithm itself, which in the form
$$u_n = u_{n-1} + u_{n-2}$$
states that each member of the sequence is the sum of its two predecessors.

It is not difficult to see that if the roots of the characteristic equation (1-21) are equal so that $\lambda_1 = \lambda_2 = \mu$, say, then $A\mu^n$ is a solution of Eqn (1-19). In terms of Eqn (1-19) this is equivalent to saying that $a^2 = 4b$ and $\mu = -a/2$. However $A\mu^n$ cannot be the general solution since it only involves one arbitrary constant A, and it is necessary to have two such constants in the general solution to allow the specification of the initial conditions $u_0$ and $u_1$. The difficulty is easily resolved once we notice that $nB\mu^n$, with B an arbitrary real constant, is also a solution of Eqn (1-19). This is easily verified by direct substitution. Thus we have for the general solution in the case of equal roots in the characteristic equation,
$u_n = (A + nB)\mu^n$.
(1-24)

To illustrate this situation, suppose that we are required to solve the difference equation
$$u_n = 6u_{n-1} - 9u_{n-2}$$
subject to the initial conditions $u_0 = 1$, $u_1 = 2$. Then the characteristic equation becomes
$$\lambda^2 - 6\lambda + 9 = 0,$$
with the double root $\lambda = 3$. From Eqn (1-24) the general solution must thus be
$$u_n = (A + nB)\,3^n.$$
Using the initial conditions $u_0 = 1$, $u_1 = 2$, then, shows that $1 = A$ and $2 = 3(A + B)$, so that $B = -\tfrac{1}{3}$ and the particular solution to the problem in question is
$$u_n = (1 - \tfrac{1}{3}n)\,3^n.$$

PROBLEMS

Section 1-1

1-1 Enumerate the elements in the following sets in which I signifies the set of natural positive and negative integers including zero:
(a) $S = \{n \mid n \in I,\ 5 < n^2 < 47\}$;
(b) $S = \{n^3 \mid n \in N,\ 15 < n^2 < 40\}$;
(c) $S = \{(m, n) \mid m, n \in I,\ 12 < m^2 + n^2 < 18\}$;
(d) $S = \{(m, n, m + n) \mid m, n \in N,\ 45 < m^2 + n^2,\ 3 < m + n < 9\}$;
(e) $S = \{x \mid x \in N,\ x^2 + 10x - 11 = 0\}$.

1-2 Express the following sets in the notation of the previous question:
(a) the set of positive integers whose cubes lie between 7 and 126;
(b) the set of integers which are the squares of the integers lying between M and N (0 < N < M);
(c) the points in the plane that lie between circles of radii 1 and 3 drawn about the origin and which have x-coordinates greater than 0.5.

1-3 Give an example of (a) a finite set having numerical elements, (b) a finite set having non-numerical elements, and in each case give an example of a proper subset.

1-4 Give an example of (a) a set of ordered triples involving numerical quantities, (b) a set of ordered triples involving non-numerical quantities, and in each case give an example of a proper subset.

1-5 State the relationships between the sets A and B if:
(a) $A = N$, $B = \{2n \mid n \in N\}$;
(b) $A = \{\sin x \mid x = (1 + 12n)\tfrac{1}{6}\pi,\ n \in N\}$, $B = \{\tfrac{1}{2}\}$;
(c) $A = \{1, 2, 3, 4\}$, $B = \{5, 7, 9, 11\}$.
1-6 Form the union, intersection, and the complement of B relative to A of the sets A and B if:
(a) $A = N$, $B = \{2n \mid n \in N\}$;
(b) $A = \{a, b, c, 0, 2, 4\}$, $B = \{d, e, f, 1, 3, 6, 7\}$;
(c) $A = \{1, \sqrt{2}, 2, 3, \sqrt{5}, 6\}$, $B = \{0, \sqrt{2}, \sqrt{5}\}$.

1-7 Construct Venn diagrams for the union and intersection of the sets A and B if:
(a) A is the set of points interior to the unit square (that is, square having side of unit length) with one corner at the origin and lying entirely in the first quadrant, and B is the set of points exterior to the unit circle centred on the origin;
(b) A is the set of points interior to the isosceles triangle of unit side with its centre of gravity at the origin and a side parallel to the x-axis, and B is the unit square having its centre at the origin and a side parallel to the x-axis.

1-8 Represent by points on a graph the 36 possible outcomes of throwing two dice, each with faces numbered 1 to 6. Identify the set of points at which the sum of the scores on the two dice is greater than or equal to 7.

1-9 By using Venn diagrams, prove Eqns (1-6) and (1-7) for sets which may be represented by points in the plane.

1-10 Complete the details of the analytical proof of the first stated result of Theorem 1-1.

1-11 Illustrate by means of a Venn diagram the result $A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)$ of Theorem 1-1.

1-12 The expression $(A \setminus B) \cup (B \setminus A)$ is called the symmetric difference of sets A and B. Illustrate the result by means of a Venn diagram and show that
$$(A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B).$$

1-13 Prove analytically that $A \cap B = A \setminus (A \setminus B)$ and illustrate your result by means of a Venn diagram.

1-14 In the following expressions, replace the symbol * by $\Leftarrow$, by $\Rightarrow$ or by $\Leftrightarrow$ to make them valid logical statements concerning the sets A, B and an element x:
(a) $x \in A$ * $x \in A \cup B$;
(b) $x \in B$ * $x \in A \cup B$;
(c) $x \in A$ * $x \in A \cap B$;
(d) $x \in A$ or $x \in B$ or $x \in A \cap B$ * $x \in A \cup B$;
(e) $x \in A$ or $x \in B$, $x \notin A \cap B$ * $x \in (A \cup B) \setminus (A \cap B)$.
Give one example each of the use of $\Rightarrow$ and $\Leftrightarrow$.
1-15 If * is a set operation and it is true that (A * B) * C = A * (B * C), then the operation * is said to be associative. Use a Venn diagram to prove that
(a) $(A \cup B) \cup C = A \cup (B \cup C) \equiv A \cup B \cup C$;
(b) $(A \cap B) \cap C = A \cap (B \cap C) \equiv A \cap B \cap C$.

Section 1-2

1-16 Toss a coin 50 times and plot the relative frequency of 'heads'.

1-17 Suggest a graphical representation for the sample space in which the outcome of tossing three coins might be recorded.

1-18 Suggest a graphical representation for the sample space characterizing the score recorded in a trial involving the tossing of a die together with a coin which has faces numbered 1 and 2. Give examples of:
(a) two disjoint subsets of the sample space;
(b) two intersecting subsets of the sample space, indicating the points in their intersection.

1-19 By using Eqn (1-9) explain why
$$P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A).$$
Verify your result by computing $P(A)$, $P(B)$, $P(A \mid B)$, $P(B \mid A)$, and $P(A \cap B)$ using the sets defined in connection with Fig. 1-6 (b).

1-20 Use a Venn diagram to prove the generalized probability addition rule
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$

1-21 Use Theorem 1-2 to prove the generalized probability multiplication rule
$$P(A \cap B \cap C) = P(A)\,P(B \mid A)\,P(C \mid A \cap B).$$

1-22 Complete the argument in Example 1-2 (a).

1-23 A bag contains 30 balls of which 5 are red and the remainder are black. A trial comprises drawing a ball from the bag at random, recording the result and then replacing the ball and shaking the bag. This process is called sampling with replacement. If this process is repeated twice, what is the probability of selecting
(a) 2 red balls;
(b) 2 black balls;
(c) 1 red and 1 black ball?

1-24 By considering arrangements of the five letters A, B, C, D, E verify that ${}^5P_2 = 20$ and $\binom{5}{2} = 10$.

1-25 How many blends of coffee comprising equal quantities of 4 different types of coffee bean are possible if 9 different types of coffee bean are available?
1-26 A game involves a team of 5 persons who play sequentially. How many different teams may be drawn up if 10 players are available. 1-27 On the assumption that a participant in a raffle will buy either 2 or 4 numbered tickets, how many different sets of tickets may he choose from a book of 20 tickets. 1-28 A coin is biased so that the probability of 'heads' is 0-52. What is the prob- ability that: (a) 3 heads will occur in 6 throws; (b) 3 or more heads will occur in 6 throws? 1-29 Shells fired from a gun have a probability \ of hitting the target. What is the probability of missing the target if 4 shells are fired? 1-30 Draw the probability density function for the binomial distribution in which p = J and n = 6. Use your result to draw the corresponding cumulative distribution function. 1-31 By considering Fig. 1-6 (a) deduce and draw the probability density function describing the sum of the scores on the two dice. Section 1-3 1-32 Describe two different ways of defining N rational numbers between 1 and 2. Generalize one of these methods to interpolate N rationals between any two rationals a and b. 1-33 Working from the array of rational numbers given in Section 1-3, use arrows to suggest two alternative schemes to the one already described by which all the rational numbers may be enumerated. Is this array the only possible one that may be used ? If not, give an alternative. 1-34 Use the fact that y'2 is irrational to prove that if a is a rational number, then a + v'2, ay 2 and V2/a are also irrational. Would the results still be true if \ 2 were replaced by any other irrational number, and would your proof still suffice? 1-35 Prove that \/3 is irrational. (Hint: first assume that \/3 is rational and equal to p/q, and then obtain a contradiction by considering even and odd values of q separately.) 1-36 The operation of division is defined in terms of multiplication as indicated in the following problem. 
The reader is required to provide the justification for some familiar arithmetic operations using only the operation of multiplication and the definition provided. Given that a and b are real numbers and that b /- 0, we define ajb by k = ajb if, and only if, kb = a. Does this define alb uniquely? Why is it necessary that 6^0? Show that ajb = cajcb whenever c / and that ajb + c/d = (ad + bc)/bd; (a/b^c/d) = ac/bd; l/Ca/b) = bja (a =£ 0). PROBLEMS / 39 1-37 Prove the following statements concerning real numbers by directly applying the real number axioms: (a) There is just one zero element and one unit element; (b) a + £ = a + // => £ = 1/ ; (c) O.a = a.O = 0; (d) ai = a>i and a + => £ = >/; (e) (-a)A = a(-Z>) = -(a/>); (-a)(-ft) = ab; (f) a ft = => a = or ft = 0; (g) 0(6 — r) = ab — ac. 1-38 The expression {a r }" 1 denotes the sequence of numbers ai, 02, . . ., a„. Given that {a r )l i = 0-2, 3, 1-8, 2-2, 1, 3, 2 and {MJ-i = 0-3, 2, 1-8, 1-1, 2, 4, 1 verify the inequality of Example 1-5. 1-39 Prove that if a > b > and k > then b b + k , a + k a - < < 1 < < — a a + k ft + k b 1-40 Indicate by means of a diagram the intervals defined by the following expres- sions, using a dot to signify an end point belonging to an interval and a circle to indicate an end point excluded from the interval : (a) (x + 2)(x +3)<(x- \){x - 2); (b) < I x - 3 I < 1 ; (c) |*| < 2; (d) < I 2x + 1 I < 1 ; (e) I 3jc + 1 I >2; (f) £±J> < x 2^+2 2(x - 1) 1-41 Identify the regions in the (x, vO-p'ane determined by the following inequalities. Mark a boundary that belongs to the region by a full line; a boundary that does not by a dotted line; an end point that is included in an interval by a dot; an end point excluded from an interval by a circle: (a) x 2 + y 2 < 1 ; x < 0; y < —x; (b) y < sin x; x 2 + v 2 > "- 2 ; y < I ; (c) ix 2 + v 2 > l; \y\<i; (d) y >x 2 ; \x - 1 | < 1; y <A. 1-42 Give numerical examples to illustrate Theorem 1-4. 
1-43 Prove Theorem 14(b) by considering separately the cases a 0, b 0; a < 0, b < 0; a > 0, ft < 0; a < 0, b > 0. 1-44 Express these numbers in binary notation: (a) 27; (b) lyV; (c) 2-Jf ; (d) i», 1-45 Express the following numbers in binary notation, and then use your results to form the expressions a + ft, a — ft, and ab. Check by interpreting the results in terms of the base 10: (a) a = 12, ft = 11 ; (Jo) a = 3iV, * = J; (O a = j», ft = ,'„. 1-46 Give numerical examples to illustrate Theorem 1-4 using binary notation. 1-47 Using the number system to base 3 and the digits 0, 1, 2 represent these numbers: 40 / INTRODUCTION TO SETS AND NUMBERS CH 1 (a) 27; (b) 2|; (c) 2J,; (d) 11 1-48 Using the number system to base 3 and the digits 0, 1, 2 write out the addition and multiplication table for three digits analogous to those of Section 1-5. 1-49 Express the following numbers in terms of the base 3 and use the tables of the previous problem to evaluate a + b, a — b, and ab. Check by re-interpreting your results in terms of the base 10. (a) a = 4, b = 2J; (b) a = 3, b = £; (c) a = •£, b = - 2 V- 1-50 Give an inductive proof that (a) ]£ (a + «0 = - [2a + (n r = L Dd]; (b) 2 r* = r = l 1-51 The expansion «(/! + l)(2n + 1) (a + A)» = a n + na n ~ x b + ^-— — - a"~ 2 b 2 + (Arithmetic Progression) (Sum of Squares) • • + nab"' 1 + b", and the equivalent result ( a + b)» = J I" j a r b»-<; are called the binomial expansion. Prove the result inductively for the case when n is a natural number. 1-52 Give an inductive proof of the results n-l j _ r n (a) 2 r» = (b) 2 * 3 = 1 '«(«+!)" (Geometric Progression) (Sum of Cubes) 1-53 Find the general solution to the difference equation Un + Un-i — 6u n -2 = 0. Determine the particular solution corresponding to wi = 1, ui = 1. 1-54 Find the general solution to the difference equation U n — 3w n -l + 2u n -2 = 0. Determine the particular solution corresponding to wi = 3, wi = 7. 
1-55 Find the solution to the difference equation $u_n - 2u_{n-1} + u_{n-2} = 0$ given that $u_1 = 2$, $u_2 = 3$.

1-56 Find the general solution to the difference equation $u_n - 6u_{n-1} + 9u_{n-2} = 0$. Determine the particular solution corresponding to $u_1 = 1$, $u_2 = -3$.

Variables, functions, and mappings

2-1 Variables and functions

In the physical world the idea of one quantity depending on another is very familiar, a typical example being provided by the observed fact that the pressure of a fixed volume of gas depends on its temperature. This situation is reflected in mathematics by the notion of a function, which we shall now discuss in some detail.

The modern definition of a function in the context of real numbers is that it is a relationship, usually a formula, by which a correspondence is established between two sets $A$ and $B$ of real numbers in such a manner that to each number in set $A$ there corresponds only one number in set $B$. The set $A$ of numbers is the domain of the function and the set $B$ of numbers is the range of the function.

If the function or rule by which the correspondence between numbers in sets $A$ and $B$ is established is denoted by $f$, and $x$ denotes a typical number in the domain $A$ of $f$, then the number in the range $B$ to be associated with $x$ by the function $f$ is written $f(x)$ and is read '$f$ of $x$'. The numbers $x$ and $f(x)$ are variables, with $x$ being given the specific name independent variable and $f(x)$ the name dependent variable. The independent variable is also often called the argument of the function $f$.

It is often helpful to construct the graph of $f$, which mathematically is the set of ordered number pairs $(x, f(x))$, where $x$ belongs to the domain of $f$. Geometrically the graph of $f$ is usually represented by a plane curve, drawn relative to an origin defined by the intersection of two perpendicular straight lines called axes. The process of construction is as follows. A distance proportional to $x$ is measured along one axis and a distance proportional to $f(x)$ along the other axis.
Through each resulting point on an axis is then drawn a line parallel to the other axis and these two perpendicular lines intersect at a unique point in the plane of the axes. This point of intersection is the point $(x, f(x))$ and the graph of $f$ is defined to be the locus or curve formed by joining up all such points corresponding to the domain of $f$, as illustrated by Fig. 2-1.

However, it is not necessary to use axes of this type, called rectangular Cartesian axes, and any other geometrical representation which gives unique representation of the points $(x, f(x))$ would serve equally well. Thus the axes could be inclined at an angle $\alpha \neq \frac{1}{2}\pi$ and the scale of measurement along them need not be uniform. For example, it is often useful to plot the logarithm of $x$ along the $x$-axis, rather than $x$ itself. This compresses the $x$ scale so that large values of $x$ may be conveniently displayed on the graph together with small values. Another possible representation involves the use of curved reference axes and leads to curvilinear coordinates. This will be taken up again later in connection with conformal mapping.

Not every function can be represented in the form of an unbroken curve, and the function
$$f(x) = \begin{cases} 0 & \text{when } x \text{ is rational,} \\ 1 & \text{when } x \text{ is irrational,} \end{cases}$$
provides an extreme example of this situation. Here, although the graph would look like a line parallel to the $x$-axis on which all points have the value unity, in reality the infinity of points with rational $x$-coordinates would be missing since they lie on the $x$-axis itself. The domain is all the real numbers $\mathbf{R}$ and the range is just the two numbers zero and unity.

Because $f$ transforms one set of real numbers into another set of real numbers, a function is sometimes spoken of as a transformation between sets of real numbers. On account of the restriction to real numbers or, more explicitly, to real variables, the function $f(x)$ is called a function of one real variable.
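As a small illustration of a function regarded as a transformation of one set of numbers into another, the following sketch tabulates the graph, that is the set of ordered pairs $(x, f(x))$, together with the range, for a finite sample domain. The helper names `graph` and `range_of`, and the choice of $f(x) = x^2$ on the integers $-3, \ldots, 3$, are assumptions made purely for illustration; a finite sample can only suggest the behaviour of a function defined on an interval of real numbers.

```python
def graph(f, domain):
    """Set of ordered pairs (x, f(x)) for every x in the (finite) domain."""
    return {(x, f(x)) for x in domain}


def range_of(f, domain):
    """Set of image values f(x) taken as x runs over the domain."""
    return {f(x) for x in domain}


def square(x):
    """Illustrative choice of function: f(x) = x^2."""
    return x * x


sample_domain = range(-3, 4)            # the integers -3, -2, ..., 3
pairs = graph(square, sample_domain)    # one pair for each x in the domain
values = range_of(square, sample_domain)

print(sorted(pairs))
print(sorted(values))
```

Each element of the sample domain appears in exactly one ordered pair, which is the defining property of a function, although several elements may share the same image.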
Another name that is often used for a function is a mapping of some set of real numbers into some other set of real numbers. This name is of course suggested by the geometrical illustration of the graph of a function and we shall return more than once to the notion of a mapping. In this terminology, $f(x)$ is referred to as the image of $x$ under the mapping $f$.

Since the domain and range of $f$ occur as intervals on the $x$- and $y$-axes, it is convenient to use a simplified notation to identify the form of the interval that is involved. We now adopt the almost standard notation summarized below, in which a round bracket indicates an open end of an interval and a square bracket indicates a closed end of an interval:

$(a, b) \Leftrightarrow a < x < b$,
$[a, b] \Leftrightarrow a \leq x \leq b$,
$(a, b] \Leftrightarrow a < x \leq b$,
$[a, b) \Leftrightarrow a \leq x < b$,
$(-\infty, a] \Leftrightarrow x \leq a$,
$[a, \infty) \Leftrightarrow a \leq x$,
$(-\infty, \infty) \Leftrightarrow$ all $x \in \mathbf{R}$.

As the definition of open and closed intervals is only a matter of considering the behaviour of the end points, we shall define the length of all the intervals $(a, b)$, $[a, b)$, $(a, b]$, and $[a, b]$ to be the number $b - a$. This is consistent with the obvious result that the length of an 'interval' comprising only one point is zero.

Fig. 2-1 Domain, range, and graph of $f(x)$.

It may happen that when $x$ lies within some interval, as for example the interval $(b, c]$ in Fig. 2-1, each point $x$ is associated with a unique image point $f(x)$ and, conversely, each image point $f(x)$ is associated with a unique point $x$. Such a mapping or function $f$ is then said to be one-one in the domain in question. However, there is another possibility that can arise, namely that in some interval of the $x$-axis more than one point $x$ may correspond to the same image point $f(x)$. This is again well illustrated by Fig. 2-1 if now we consider the interval $[a, b]$ and the points $x_2$ and $x_3$, both of which have the same image point since $f(x_2) = f(x_3)$.
In situations such as these the mapping or function $f$ is said to be many-one in the domain in question.

A specific example might help here and we choose for $f$ the function $f(x) = x^2$ and the two different domains $[0, 3]$ and $[-1, 3]$. A glance at Fig. 2-2 shows that $f$ maps the domain $[0, 3]$ onto the range $[0, 9]$ one-one, but that it maps the domain $[-1, 3]$ onto the same range $[0, 9]$ many-one. Expressed another way, the range $[0, 1]$ shown as a solid line in the figure is mapped twice by points in the domain $[-1, 3]$; once by points in the sub-domain $-1 \leq x < 0$ and once by points in the sub-domain $0 < x \leq 1$. Again considering the domain $[-1, 3]$, the function $f(x) = x^2$ maps the sub-domain $1 < x \leq 3$ onto the range $(1, 9]$ one-one.

Fig. 2-2 Example of many-one mapping in shaded range and a one-one mapping in the hatched range.

In many older books the term function is used ambiguously in that it is sometimes applied to relationships which do not comply with our definition of a function. The most familiar example of this is the 'function' $y = \sqrt{x}$, which fails to comply with our definition because to every positive $x$ there correspond two values for $y$, namely the positive and negative square roots of $x$, which are equal in magnitude but opposite in sign. A mapping of this kind is one-many in the sense that to one value of $x$ there corresponds more than one image point $f(x)$, and although it is permissible to describe this relationship as a mapping, it is incorrect to term it a function. Nevertheless, the square root operation is fundamental to mathematics and we must find some way to make it and similar ones legitimate.

The difficulty is easily resolved if we consider how the square root is used in applications. In point of fact two different relationships are always considered which together are equivalent to $y = \sqrt{x}$.
These are $y_1 = +\sqrt{x}$ and $y_2 = -\sqrt{x}$, where the square root is always to be understood to denote the positive square root and the sign identifies the relationship being considered. Each of the mappings $y_1(x)$ and $y_2(x)$ of the domain $(0, \infty)$ is one-one, as Fig. 2-3 shows, so that each may correctly be termed a function, the particular one to be used in any application being determined by other considerations, such as that the result must be positive or negative. These ideas will arise again later in connection with inverse functions.

Fig. 2-3 The square root function.

In general, if the domain of a function $f$ is not specified then it is understood to be the largest interval on the $x$-axis for which the function is defined. So if $f(x) = x^2 + 4$, then as this is defined for all $x$, the largest possible domain must be $(-\infty, \infty)$. Alternatively, the function $f(x) = +\sqrt{4 - x^2}$ is only defined in terms of real numbers when $-2 \leq x \leq 2$, showing that the largest possible domain is $[-2, 2]$. Similarly, the function $f(x) = 1/(1 - x)$ is defined for all $x$ with the sole exception of $x = 1$, so that the largest possible domain is the entire $x$-axis with the single point $x = 1$ deleted from it.

A function need not necessarily be defined for all real numbers on some interval and, as in probability theory, it is quite possible for the dependent and independent variables to assume only discrete values. Thus the rule which assigns to any positive integer $n$ the number of positive integers whose squares are less than $n$ defines a perfectly good function. Denoting this function by $f$ we have for its first few values $f(1) = 0$, $f(2) = 1$, $f(3) = 1$, $f(4) = 1$, $f(5) = 2$, $f(6) = 2$, $f(7) = 2$, $f(8) = 2$, $f(9) = 2$, $f(10) = 3$, . . .. Clearly, both its domain and its range are the set $\mathbf{N}$ of natural numbers and the mapping is obviously many-one.

Before examining some special functions let us formulate our definition of a function in rather more general terms.
This will be useful later since, although in the above context the relationships discussed have always been between numbers, in future we shall establish relationships between quantities that are not simply real numbers. When we do so, it will be valuable if we can still utilize the notion of a function. This will occur, for example, when we establish correspondence between quantities called vectors which, although obeying algebraic laws, are not themselves real numbers.

The idea of a relationship between arbitrary quantities is one which we have already started to examine in the previous chapter in connection with sets. As might be expected, set theory provides the natural language for the formulation and expression of general ideas associated with functions, and indeed we have already used the word 'set' quite naturally when thinking of a set of numbers. A more general definition follows.

Definition 2-1 A function $f$ is a correspondence, often a formula, by which each element of set $A$, which is called the domain of $f$, is associated with only one element of set $B$, called the range of $f$.

To close this section we now provide a few examples illustrating some of the ideas just mentioned.

Example 2-1 The function $y = f(x)$ defined by the rule
$$f(x) = \frac{1}{(x - 1)(x - 2)}$$
is defined for all real $x$ with the exception of the two points $x = 1$ and $x = 2$. The domain of $f$ is thus the set of real numbers $\mathbf{R}$ with the two numbers 1 and 2 deleted. In set notation the domain is $\mathbf{R} \setminus \{1, 2\}$. The two lines $x = 1$ and $x = 2$ shown dotted in Fig. 2-4 are called asymptotes to the graph of $f$, and although the graph approaches arbitrarily close to the asymptotes it never coincides with them.

Fig. 2-4 Graph of $y = 1/(x - 1)(x - 2)$ showing the asymptotes $x = 1$ and $x = 2$.

Example 2-2 A discrete valued function may be defined by a table which is simply an arrangement of ordered number pairs in a sequence.
Table 2-1

  x       0     1     3     7
  f(x)    2.1   4.2   10    6.3

Example 2-3 One possible system of curvilinear coordinates in the first quadrant may be defined as follows. Using Cartesian coordinates, construct the set of curves $y = a/x$ and the set of straight lines $y = mx$, each with domain $(0, \infty)$ and with $a > 0$, $m > 0$. Representative examples of these curves are shown in Fig. 2-5 (a) for the stated values of $a$ and $m$.

Fig. 2-5 (a) Families of curves $y = a/x$ and $y = mx$; (b) curvilinear coordinates.

In general, any set of curves such as either of these which is derivable from the same equation by a suitable choice of constant is called a family of curves, and the constant, which is fixed for any one curve but which varies from curve to curve, is called a parameter. This term parameter will often be used in contexts which do not involve families of curves, but in every case it will be used as here in the sense that it implies a 'variable constant'.

Next we disregard the Cartesian axes and the manner of construction of the two families of curves and regard the two families of curves themselves as defining new coordinate lines, as in Fig. 2-5 (b). Each member of the family of rectangular hyperbolas will then define a line along which $a$ is constant, no two members of the family either intersecting or having the same value of $a$. Similarly, each member of the family of straight lines through the origin $O$, collectively called a pencil of lines, is characterized by a different value of $m$. Apart from the single point through which all the straight lines pass, appropriately called a singular point, there is no ambiguity as to the values of $a$ and $m$ to be associated with any point in the region of the plane defined by the two families. We shall use the quantities $a$ and $m$ as our new coordinates for a point. Graphs may now be constructed using the two families of curves as curvilinear coordinates.
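The passage between Cartesian coordinates and the curvilinear coordinates $(a, m)$ can be made explicit. A point lying on both $y = a/x$ and $y = mx$ satisfies $a = xy$ and $m = y/x$, and conversely $x = \sqrt{a/m}$, $y = \sqrt{am}$. The following is a minimal sketch of this conversion; the function names `to_curvilinear` and `to_cartesian` are hypothetical, chosen here only for illustration, and the restriction to positive arguments reflects the first-quadrant assumption of the example.

```python
import math


def to_curvilinear(x, y):
    """Map a first-quadrant point (x, y) to (a, m), where y = a/x and y = m*x."""
    if x <= 0 or y <= 0:
        raise ValueError("coordinates are defined only for x > 0 and y > 0")
    return x * y, y / x


def to_cartesian(a, m):
    """Map (a, m), with a > 0 and m > 0, to the intersection of y = a/x and y = m*x."""
    if a <= 0 or m <= 0:
        raise ValueError("a and m must both be positive")
    return math.sqrt(a / m), math.sqrt(a * m)


# Round trip for the curvilinear coordinates (2, 4).
x, y = to_cartesian(2, 4)
print(x, y)
print(to_curvilinear(x, y))
```

The origin is excluded by the checks, in keeping with its role as the singular point through which every line of the pencil passes.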
The intersection of a hyperbola and straight line will define a point in the plane with coordinates given by the ordered number pair $(a, m)$. Thus the points $A$, $B$, and $C$ in Fig. 2-5 (b) have curvilinear coordinates ([illegible], 1), (1, [illegible]), and (2, 4), respectively. Naturally the graph of the function $y = x^2$ with domain $(0, \infty)$ would look different when plotted first in Cartesian coordinates and then in these curvilinear coordinates by setting $a \equiv x$ and $m \equiv y$. They would however be two different geometrical representations of the same function. Here we have used the symbol $\equiv$, which is read 'identically equal to'.

Example 2-4 This example is a final illustration of our more general definition of a function. Take as the domain of the function $f$ the set $A$ of all people, and as the range $B$ of the function $f$ the set of all towns in the world. Then for the function $f$ we propose the rule that assigns to every person his place of birth. Clearly this function defines a many-one mapping of set $A$ onto set $B$, since although a person can only be born in one place, many other people may have the same place of birth. This example also serves to distinguish clearly between the concept of a 'function', which is the rule of assignment, and the concept of the 'variables' associated with the function, which here are people and places.

2-2 Inverse functions

In the previous section we remarked that a typical example of a correspondence between physical quantities was the observed fact that the pressure of a fixed volume of gas depends on its temperature. Expressed in this form we are implying that the dependent variable is the pressure $p$ and the independent variable is the temperature $T$, so that the law relating pressure to temperature has the general form
$$p = \phi(T), \qquad \text{(A)}$$
where $\phi$ is some function that is determined by experiment.
However, we know from experience that in thermodynamics it is often necessary to interchange these roles of dependent and independent variables and sometimes to regard the temperature $T$ as the dependent variable and the pressure $p$ as the independent variable, when the temperature-pressure law then has the form
$$T = \psi(p), \qquad \text{(B)}$$
where, naturally, the function $\psi$ is dependent on the form of the function $\phi$. Indeed, formally, $\phi$ and $\psi$ must obviously satisfy the identity $\phi[\psi(p)] = p$ for all pressures $p$ in the domain of $\psi$.

The relationships (A) and (B) are particular cases of the notion of a function and its inverse, and the idea is successful in this context because the correspondence between temperature and pressure is known to be one-one.

Consider a general case of a function
$$y = f(x) \qquad \text{(2-1)}$$
that is one-one and defined on the domain $[a, b]$, together with its inverse
$$x = g(y) \qquad \text{(2-2)}$$
which has for its domain the interval $[c, d]$ on the $y$-axis.

Fig. 2-6 (a) Inversion through the graph of $f(x)$; (b) inversion by reflection in $y = x$.

Graphically the process of inversion may be accomplished point by point as indicated in Fig. 2-6 (a). This amounts to selecting a point $y$ in $[c, d]$ and then finding the corresponding point $x$ in $[a, b]$ by projecting horizontally from $y$ until the graph of $f$ is intercepted, after which a projection is made vertically downwards from this intercept to identify the required point on the $x$-axis.

The relationship between a function and its inverse is represented in Fig. 2-6 (b). In this diagram we have used the fact that when a function is represented as an ordered number pair, interchange of dependent and independent variables corresponds to interchange of numbers in the ordered number pair. The lower curve represents the function $y = f(x)$ and the upper curve represents the function $y = g(x)$, with the function $g$ inverse to $f$; both graphs being plotted using the same axes.
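The remark that inversion amounts to interchanging the numbers in each ordered pair can be tried out on a sampled function. The sketch below is illustrative only: the name `inverse_graph` is hypothetical, and the sample is taken from $y = x^2$ on $[0, 3]$, which the previous section showed to be one-one. The routine refuses to invert a sample in which two different arguments share an image, since the interchanged pairs would then no longer define a function.

```python
def inverse_graph(pairs):
    """Swap each ordered pair (x, y) into (y, x); fail if the sample is not one-one."""
    inverse = {}
    for x, y in pairs:
        if y in inverse and inverse[y] != x:
            raise ValueError(f"not one-one: {inverse[y]} and {x} both map to {y}")
        inverse[y] = x
    return inverse


xs = [0, 1, 2, 3]
f_pairs = [(x, x * x) for x in xs]   # sample of y = x^2 on [0, 3]
g = inverse_graph(f_pairs)           # x = g(y) for the sampled values

print(g)
```

Applying the same routine to a sample drawn from $[-1, 3]$ raises an error, because $x = -1$ and $x = 1$ have the same image.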
The line $y = x$ is also shown on the graph to emphasize that geometrically the relationship between a one-one function and its inverse is obtained by reflecting the graph of either function in a mirror held along the line $y = x$. Henceforth such a process will simply be termed reflection in a line. Notice that when using this reflection property to construct the graph of an inverse function from the graph of the function itself, both functions are represented with $y$ plotted vertically and $x$ plotted horizontally. This follows because the range of $f$ is the domain of $g$, and vice versa.

No difficulty can arise in connection with a function and its inverse because of the one-one nature of the mapping. Expressed more precisely, we have used the obvious property illustrated by Fig. 2-6 (a) that a one-one function $f$ with domain $[a, b]$ is such that $f(x_1) = f(x_2) \Rightarrow x_1 = x_2$ for all $x_1$ and $x_2$ in $[a, b]$.

In graphical terms this result can only be true if the graph of $f$ either increases or decreases steadily as $x$ increases from $a$ to $b$. When either of these properties is true of a function then it is said to be strictly monotonic. In particular, if a function $f$ increases steadily as $x$ increases from $a$ to $b$, as in Fig. 2-6 (a), then it is said to be strictly monotonic increasing and, conversely, if it decreases steadily then it is said to be strictly monotonic decreasing.

Slightly less stringent than the condition of strict monotonicity is the condition that a function $f$ be just monotonic. This is the requirement that $f$ be either non-decreasing or non-increasing, so that it is permissible for a function that is only monotonic to remain constant throughout some part of its domain of definition. The adjectives increasing and decreasing are again used to qualify the noun monotonic in the obvious manner. Representative examples of monotonic and strictly monotonic functions, all with domain of definition $[a, b]$, are shown in Fig. 2-7.

Fig.
2-7 Monotonic and strictly monotonic functions: (a) monotonic; (b) strictly monotonic.

The example of a strictly monotonic decreasing function shown in Fig. 2-7 (b) has also been used to emphasize that a function need not be represented by an unbroken curve. The curve has a break at the single point $x = \alpha$ where it is defined to have the value $y = \beta$. However, as the value $\beta$ lies between the functional values on adjacent sides of $x = \alpha$, the function is still strictly monotonic decreasing. Had we set $\beta = 0$, say, then the function would be neither strictly monotonic nor even monotonic on account of this one point!

It is sometimes useful to relate a function and its inverse by essentially the same symbol and this is usually accomplished by adding the superscript minus one to the function. Thus the function inverse to $f$ is often denoted by $f^{-1}$, which is not, of course, to be misinterpreted to mean $1/f$.

Before examining some important special cases of inverse functions when many-one mappings are involved, let us formalize our previous arguments.

Definition 2-2 Let the set onto which the one-one function $f$ with domain $[a, b]$ maps the set $S$ of points be denoted by $f(S)$. Then we define the inverse mapping $f^{-1}$ of $f(S)$ onto $S$ by the requirement that $f^{-1}(y) = x$ if and only if $y = f(x)$ for all $x$ in $[a, b]$.

It now only remains for us to consider how some important special functions such as $y = x^2$, $y = \sin x$, and $y = \cos x$, together with other simple trigonometric functions which are all many-one mappings, may have unambiguous inverses defined.

Firstly, as we have already seen, the function $y = x^2$ gives a many-one mapping of $[-a, a]$ onto $[0, a^2]$. Here the difficulty of defining an inverse is resolved by always taking the positive square root and defining two different inverse functions $x = +\sqrt{y}$ and $x = -\sqrt{y}$, which are then both one-one mappings of $(0, a^2]$.
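A minimal sketch of these two inverse functions follows. The function name `inverse_square` and its `branch` argument are hypothetical, introduced only for illustration; the point being shown is that once the sign is fixed in advance, each call returns exactly one value and so defines a genuine function.

```python
import math


def inverse_square(y, branch="+"):
    """Return the x on the chosen branch with x*x == y.

    branch "+" gives x = +sqrt(y); branch "-" gives x = -sqrt(y).
    """
    if y < 0:
        raise ValueError("y = x^2 takes no negative values")
    root = math.sqrt(y)          # math.sqrt returns the non-negative root
    if branch == "+":
        return root
    if branch == "-":
        return -root
    raise ValueError("branch must be '+' or '-'")


print(inverse_square(9))         # 3.0
print(inverse_square(9, "-"))    # -3.0
```

Which branch is wanted in an application must be decided from other considerations, exactly as remarked in connection with Fig. 2-3.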
The inversion must thus be regarded as having given rise to two different functions; the one to be selected depending on other factors as mentioned in connection with Fig. 2-3.

If we recall that the domain of definition of a function forms an intrinsic part of the definition of that function, then $y = x^2$ may be regarded as two one-one mappings in accordance with the two inverses just introduced. This is achieved by defining the many-one function $y = x^2$ on the domain $[-a, a]$ as the result of the two different one-one mappings $y = x^2$ on $-a \leq x < 0$ and $y = x^2$ on $0 < x \leq a$, the difference here being only in the domains of definition. The point $0$ is excluded from both domains since that single point maps one-one. By means of this device we may, in general, reduce many-one mappings to a set of one-one mappings so that the inversion problem is always straightforward.

It will suffice to discuss in detail only the inversion of the sine function, after which a summary of the results for the other elementary trigonometric functions will be presented in the form of a table. In general, as shown in Fig. 2-8 (a), the function $y = \sin x$ maps an argument $x$ in the set $\mathbf{R}$ of real numbers onto $[-1, 1]$ many-one, but it maps any of the restricted domains $[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$ corresponding to integral $n$ onto $[-1, 1]$ one-one.

Fig. 2-8 Principal branch of sine function: (a) principal branch of $\sin x$ giving one-one mapping in $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$; (b) inversion of $\sin x$ by reflection in $y = x$.

Now in line with our approach to the inverse of the square root function, the ambiguity as regards the function inverse to sine may be completely resolved if we consider the many-one function $y = \sin x$ with $x \in \mathbf{R}$ as being replaced by an infinity of one-one functions $y = \sin x$, with domains $[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$.
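To see these restricted domains at work, the sketch below computes, for a given $y$ in $[-1, 1]$ and a given integer $n$, the single $x$ in $[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$ for which $\sin x = y$. It relies on the standard identity $x = n\pi + (-1)^n t$, where $t$ is the value in $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$ returned by the library routine `math.asin`; that formula is not stated in the text and is brought in here as an assumption of the example, as is the function name `inverse_sine_on_domain`.

```python
import math


def inverse_sine_on_domain(y, n=0):
    """Return the x in [(2n-1)*pi/2, (2n+1)*pi/2] with sin(x) == y."""
    if not -1.0 <= y <= 1.0:
        raise ValueError("y must lie in [-1, 1]")
    sign = 1 if n % 2 == 0 else -1      # (-1)**n, valid for negative n as well
    return n * math.pi + sign * math.asin(y)


for n in (-1, 0, 1):
    x = inverse_sine_on_domain(0.5, n)
    print(n, x, math.sin(x))
```

For every $n$ the printed sine is 0.5 to within rounding error, while the argument $x$ changes from one restricted domain to the next.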
For then in each domain corresponding to some integral value of $n$, because the mapping there is one-one, an appropriate inverse function may be defined without difficulty. The intervals are all of length $\pi$ and are often said to define different branches of the inverse sine function. In general, when no specific interval is named we shall write $x = \operatorname{Arcsin} y$, whenever $y = \sin x$. The function Arcsine thus denotes an arbitrary branch of the inverse sine function.

Because of the periodicity of the sine function, when considering the inverse function it is only necessary to study the behaviour of one branch of Arcsine. As is customary, we arbitrarily choose to work with the branch of the inverse sine function associated with the domain $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$, calling this the principal branch and denoting the inverse function associated with this branch by arcsine. Hence for the inverse we shall always write $x = \arcsin y$ when $y = \sin x$ and $-\frac{1}{2}\pi \leq x \leq \frac{1}{2}\pi$.

Fig. 2-8 (b) shows, in relation to the line $y = x$, the function $y = \sin x$ with domain of definition $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$ and the associated function $y = \arcsin x$ with domain of definition $[-1, 1]$. The reflection property of inverse functions utilized in connection with Fig. 2-6 (b) is again apparent here. It should perhaps again be emphasized that when an inverse function is obtained by reflection in the line $y = x$, then in both the curves representing the function and its inverse, the variable $y$ is plotted as ordinate (i.e. vertically) and the variable $x$ as abscissa (i.e. horizontally).

Table 2-2 summarizes information concerning the most important inverse trigonometric functions and should be studied in conjunction with Fig. 2-9.

Fig. 2-9 Principal branches of inverse cosine and tangent functions: (a) principal branch of $\cos x$; (b) inversion of $\cos x$ by reflection in $y = x$; (c) principal branch of $\tan x$; (d) inversion of $\tan x$ by reflection in $y = x$.

In general the notation for a function inverse to a named trigonometric function is obtained by adding the prefix arc when referring to the principal branch and Arc otherwise. In other books the convention is often to add the superscript minus one after the named function, distinguishing the principal branch by use of an initial capital letter when writing the function. Thus, for example, some authors will write $\operatorname{Sin}^{-1}$ in place of arcsine and $\sin^{-1}$ in place of Arcsine. Unfortunately notations are not uniform here and so when using other books the reader would be well advised to check the notation in use.

Table 2-2 Trigonometric functions and their inverse functions

  Function      Domain of function                                        Inverse function              Branch      Domain of inverse
  y = sin x     $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$                       y = arcsin x                  Principal   $[-1, 1]$
  y = sin x     $[(2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi]$        y = Arcsin x                  Any         $[-1, 1]$
  y = cos x     $[0, \pi]$                                                y = arccos x                  Principal   $[-1, 1]$
  y = cos x     $[n\pi, (n + 1)\pi]$                                      y = Arccos x                  Any         $[-1, 1]$
  y = tan x     $(-\frac{1}{2}\pi, \frac{1}{2}\pi)$                       y = arctan x                  Principal   $(-\infty, \infty)$
  y = tan x     $((2n - 1)\frac{1}{2}\pi, (2n + 1)\frac{1}{2}\pi)$        y = Arctan x                  Any         $(-\infty, \infty)$

2-3 Some special functions

A number of special types of function occur often enough to merit some comment. As the ideas involved in their definition are simple, a very brief description will suffice in all but a few cases. To clarify these descriptions, the functions are illustrated in Fig. 2-10.

(a) Constant function
The constant function is a function $y = f(x)$ for which $f(x)$ is identically equal to some constant value for all $x$ in the domain of definition $[a, b]$. Thus a constant function has the equation $y = \text{constant}$, for $x \in [a, b]$.

(b) Step function
Consider some set of $n$ sub-intervals or partitions $[a_0, a_1), [a_1, a_2), [a_2, a_3), \ldots, [a_{n-1}, a_n]$ of the interval $[a_0, a_n]$. Associate $n$ constants $C_1, C_2, \ldots, C_n$ with these $n$ sub-intervals.
Then a step function defined on $[a_0, a_n]$ is the function $y = f(x)$ for which $f(x) = C_r$, for all $x$ in the $r$th sub-interval. The function will be properly defined provided a functional value is assigned to all points $x$ in $[a_0, a_n]$ including end points of the intervals. Usually it is immaterial to which of two adjacent sub-intervals an end point is assigned and one possible assignment is indicated in Fig. 2-10 (b), where a deleted end point is shown as a circle and an included end point as a dot.

Fig. 2-10 Some special functions: (a) constant function; (b) step function; (c) $y = |x|$; (d) even function; (e) odd function; (f) bounded function on $[a, b]$.

(c) The function $|x|$
From the definition of the absolute value of $x$ it is easily seen that the graph of $y = |x|$ has the form shown in Fig. 2-10 (c). It is composed of the line $y = x$ for $x \geq 0$ and the line $y = -x$ for $x < 0$.

(d) Even function
An even function $y = f(x)$ is a function for which $f(-x) = f(x)$. The geometrical implication of this definition is that the graph of an even function is symmetrical about the $y$-axis, so that the graph for negative $x$ is the reflection in the $y$-axis of the graph for positive $x$. Typical examples of even functions are $y = \cos x$, $y = 1/(1 + x^2)$ and the function $y = |x|$ just defined.

(e) Odd function
An odd function $y = f(x)$ is a function for which $f(-x) = -f(x)$. The geometrical implication of this definition is that the graph of an odd function is obtained from its graph for positive $x$ by first reflecting the graph in the $y$-axis and then reflecting the result in the $x$-axis. In Fig.
2-10 (e) the result of the first reflection is shown as a dotted curve and its reflection in the $x$-axis gives a second curve shown as a full line in the third quadrant which, together with the original curve in the first quadrant, defines the odd function. By virtue of the definition we must have $f(0) = -f(0)$, showing that the graph of an odd function must pass through the origin. Typical odd functions are $y = \sin x$ and $y = x^3 - 3x$.

Most functions are neither even nor odd. For example, $y = x^3 - 3x + 1$ is not even, since $y(-x) = (-x)^3 - 3(-x) + 1 = -x^3 + 3x + 1 \neq y(x)$, nor, by the same argument, is it odd, for $y(-x) \neq -y(x)$.

(f) Bounded function
A function $y = f(x)$ is said to be bounded on an interval if it is never larger than some value $M$ and never smaller than some value $m$ for all values of $x$ in the interval. The numbers $M$ and $m$ are called, respectively, upper and lower bounds for the function $f(x)$ on the interval in question. It may of course happen that only one of these conditions is true, and if it never exceeds $M$ then it is said to be bounded above, whereas if it is never less than $m$ it is said to be bounded below. A bounded function is thus a function that is bounded both above and below. The bounds $M$ and $m$ need not be strict in the sense that the function ever actually attains them. Sometimes when the bounds are strict they are only attained at an end point of the domain of definition of the function.

Of all the possible upper bounds $M$ that may be assigned to a function that is bounded above on some interval, there will be a smallest one $M'$, say. Such a number $M'$ is called the least upper bound or the supremum of the function on the interval and the name is usually abbreviated to l.u.b. or to sup. Similarly, of all the possible lower bounds $m$ that may be assigned to a function that is bounded below on some interval, there will be a largest one $m'$, say.
Such a number $m'$ is called the greatest lower bound or the infimum of the function on the interval and the name is usually abbreviated to g.l.b. or to inf.

Not all functions are bounded either above or below, as evidenced by the function $y = \tan x$ on $(-\frac{1}{2}\pi, \frac{1}{2}\pi)$, though it is bounded on any closed sub-interval not containing either end point. Typical examples of bounded functions on the interval $(-\infty, \infty)$ are $y = \sin x$ and $y = \cos x/(1 + x^2)$. The function $y = 1/(x - 1)$ is bounded below by zero on the interval $(1, \infty)$ but is unbounded above, whereas the function $y = 2 - x^2$ is strictly bounded above by 2 but is unbounded below on the interval $(-\infty, \infty)$.

(g) Convex and concave functions
A convex function is one which has the property that a chord joining any two points $A$ and $B$ on its graph always lies above the graph of the function contained between those two points. Similarly, a concave function is one which has the property that a chord joining any two points $A$ and $B$ on its graph always lies below the graph of the function contained between those two points. Thus the function $y = |x|$ shown in Fig. 2-10 (c) is convex on the interval $(-\infty, \infty)$, whereas the function shown in Fig. 2-10 (d) is only concave on the closed interval $[-a, a]$.

(h) Polynomial and rational functions
A polynomial of degree $n$ is an algebraic expression of the form
$$y = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0,$$
where $n$ is a positive integer, and it is defined for all $x$. A rational function is a function which is capable of expression as the quotient of two polynomials and so has the form
$$y = \frac{b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0}{a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0}$$
and is defined for all values of $x$ for which the denominator does not vanish.
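The restriction on the domain of a rational function is easily made concrete. In the sketch below a polynomial is stored as its list of coefficients, highest power first, and evaluated by nested multiplication (Horner's rule, a standard technique that is not discussed in the text). The names `horner` and `rational`, and the sample function $y = (x + 1)/(x^2 - 4)$, are hypothetical choices made only for illustration.

```python
def horner(coeffs, x):
    """Evaluate a_n*x^n + ... + a_1*x + a_0, with coeffs = [a_n, ..., a_1, a_0]."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def rational(numerator, denominator, x):
    """Evaluate the quotient of two polynomials; undefined where the denominator vanishes."""
    q = horner(denominator, x)
    if q == 0:
        raise ZeroDivisionError("x is a zero of the denominator")
    return horner(numerator, x) / q


# Hypothetical example: y = (x + 1) / (x^2 - 4), undefined at x = 2 and x = -2.
print(rational([1, 1], [1, 0, -4], 3))   # 4/5
```

An exact test for a vanishing denominator is adequate for integer arguments such as these; with floating-point arguments a tolerance would be the more cautious choice.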
An example of a polynomial of degree 2 is the quadratic function $y = x^2 - 3x + 4$; a typical rational function is

$$y = \frac{3x^2 - 2x - 1}{4x^3 + 11x^2 + 5x - 2},$$

which is defined for all values of $x$ apart from $x = -2$, $x = -1$, and $x = \tfrac{1}{4}$, at which points the denominator vanishes. For this reason these values are called the zeros of the polynomial forming the denominator, and they arise directly from its factorization into the form $4x^3 + 11x^2 + 5x - 2 \equiv (4x - 1)(x + 2)(x + 1)$.

(i) Algebraic function

An algebraic function arises when attempting to form the inverse of a rational function. The function $f = +\sqrt{x}$ for $x > 0$ provides a typical example here. More complicated examples are the functions

$$y = x^{2/3}, \qquad y = x^2 + 2\sqrt{x} - 1, \qquad y = x\sqrt{x/(2 - x)}.$$

More precisely, we shall call the function $y = f(x)$ algebraic if it may be transformed into a polynomial involving the two variables $x$ and $y$, the highest powers of $x$ and $y$ both being greater than unity. This criterion may easily be applied to any of the above examples. In the case of the last example, squaring gives $y^2 = x^3/(2 - x)$, so that it is equivalent to the polynomial $2y^2 - xy^2 - x^3 = 0$, which is of degree 2 in $y$ and 3 in $x$.

(j) Transcendental function

A function is said to be transcendental if it is not algebraic. A simple example is $y = x + \sin x$, which is defined for all $x$ but is obviously not algebraic.

(k) The function $[x]$

On occasions when working with quantities that may only assume integral values it is useful to write $y = [x]$, with the meaning that we assign to every real number $x$ the greatest integer $y$ that is less than or equal to it. Thus, for example, we have $[-3] = -3$, $[-1.3] = -2$, $[0] = 0$, $[0.92] = 0$, $[\pi] = 3$, and $[17] = 17$.

2-4 Digression on mappings

Having now examined in some detail specific examples of functions providing one-one and many-one mappings, it will be helpful to take a slightly more general look at the notion of a mapping.
We again appeal to the Venn diagram, but this time supplement it by the addition of arrows to suggest the form of mapping that is involved. In Fig. 2-11 pairs of closed curves have been used to represent the sets $A$ and $B$ postulated in the formulation of the more general definition of a function $f$ given in Definition 2-1. Once again points inside a curve represent elements in the set, with set $A$ representing the domain of the function $f$ and set $B$ the range of $f$. The arrows relating sets $A$ and $B$ in the three pairs of diagrams are then self-explanatory when taken in conjunction with the captions.

Fig. 2-11 Mappings: (a) $B = f(A)$, a one-many mapping; (b) $B = f(A)$, a many-one mapping; (c) $B = f(A)$, a one-one mapping.

The mappings illustrated in Fig. 2-11 are often said to be onto mappings, in the sense that the set $A$ is mapped by the function $f$ onto the entirety of set $B$. Thus, in each case, every element in $B$ is associated with at least one element in $A$. Naturally, if some set $C$ containing $B$ is considered in place of $B$, then there will be elements of $C$ that are not associated with any element in $A$. The mapping of $A$ into $C$ by $f$ is then said to be an into mapping. For example, if the function concerned is $y = x^2$, then it maps the set $A$ comprising the interval $[1, 2]$ into the set $C$ comprising the interval $[1, 9]$, but onto the set $B$ comprising the interval $[1, 4]$.

These ideas are of real importance when a double mapping is involved, for then it is necessary to examine the relationship that exists between the range of the first function and the domain of definition of the second. If the first mapping is by a function $f$ and the second mapping is by a function $g$, then the result of the successive mappings is called the composition of $f$ and $g$ and is usually denoted by $f \circ g$. The order implies that $f$ is the first mapping, which is then followed by $g$.
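The interplay between the range of the first mapping and the domain of the second can be sketched in a few lines of Python. This is only an illustrative sketch: the helper `compose` and the predicate names are assumptions, and the particular pair $f(x) = 3x + 1$ on $(-\infty, 4/3]$, $g(x) = +\sqrt{(9 - x)}$ on $[1, 9]$ is borrowed from the worked example of this section. The code follows the convention used here, in which $f \circ g$ means $f$ first and then $g$.

```python
import math

def compose(f, dom_f, g, dom_g):
    """Return h(x) = g(f(x)) (f first, then g) together with a test for its domain."""
    def in_domain(x):
        # x must be admissible for f, and f(x) must be admissible for g
        return dom_f(x) and dom_g(f(x))

    def h(x):
        if not in_domain(x):
            raise ValueError("x is outside the domain of the composition")
        return g(f(x))

    return h, in_domain

f = lambda x: 3 * x + 1
dom_f = lambda x: x <= 4 / 3          # domain (-inf, 4/3]
g = lambda x: math.sqrt(9 - x)
dom_g = lambda x: 1 <= x <= 9         # domain [1, 9]

h, in_dom = compose(f, dom_f, g, dom_g)
value_at_0 = h(0)                     # g(f(0)) = g(1) = sqrt(8)
```

Points such as $x = -1$ are admissible for $f$ but are rejected by the composition, because $f(-1) = -2$ lies outside the domain of $g$.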
Using perhaps more familiar terminology and notation, we are speaking here of the 'function of a function' $g\{f(x)\}$. The general ideas involved here are illustrated in Fig. 2-12. There (a) and (b) indicate the respective domains and ranges of $f$ and $g$, whilst (c) indicates how, in general, the function $f \circ g$ has for its domain only part of $A$ and for its range only part of $D$.

Fig. 2-12 (a) Mapping by $f$ of $A$ onto $B$; (b) mapping by $g$ of $C$ onto $D$; (c) composition $f \circ g$.

The symbolic representation suggested in Fig. 2-12 can be made more meaningful by considering the following. Let $f(x) = 3x + 1$ with domain $(-\infty, 4/3]$ and $g(x) = +\sqrt{(9 - x)}$ with domain $[1, 9]$. Then the range of $f$ is $(-\infty, 5]$ and the range of $g$ is $[0, 2\sqrt{2}]$. The range of $f$ thus only coincides with the domain of $g$ in the interval $[1, 5]$. This common interval $[1, 5]$ is the image, under the one-one mapping $f$, of the interval $[0, 4/3]$, which must then be the domain of $f \circ g$. Next, the function $g$ maps $[1, 5]$ onto the interval $[2, 2\sqrt{2}]$, which must be the range of $f \circ g$. Thus we have obtained the following:

Domain of $f$: $(-\infty, 4/3]$
Range of $f$: $(-\infty, 5]$
Domain of $g$: $[1, 9]$
Range of $g$: $[0, 2\sqrt{2}]$
Domain of $f \circ g$: $[0, 4/3]$
Range of $f \circ g$: $[2, 2\sqrt{2}]$

Using direct algebraic substitution we see that in fact if $f(x) = 3x + 1$ and $g(x) = +\sqrt{(9 - x)}$, then $f \circ g = g\{f(x)\} = +\sqrt{[9 - (3x + 1)]} = +\sqrt{(8 - 3x)}$. This confirms directly that $f \circ g$ maps $[0, 4/3]$ onto $[2, 2\sqrt{2}]$, but does not take explicit account of the effect of the domain of $g$ on the mapping.

2-5 Curves and parameters

A parameter $\alpha$ may be associated with a curve in two quite different ways. In the first situation we shall discuss, the parameter $\alpha$ occurs as a constant in the equation describing the curve.
Thus changing the value of $\alpha$ will change the curve that is described. This simple idea underlies the geometrical concept of an envelope, which will be taken up again later in connection with differentiation and with differential equations. In the second situation, $\alpha$ will appear as a variable associated with two functions $s(\alpha)$ and $t(\alpha)$, which will describe separately the $x$ and $y$ coordinates of points on any unbroken curve. This use of a parameter is called the parameterization of a curve and is an alternative method of representing the equation of the curve.

(a) Envelopes

This situation is best explained by means of an example. Consider the equation

$$(x - \alpha)^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2},$$

which in this form is easily seen to describe a circle of radius $|\alpha|/\sqrt{(1 + \alpha^2)}$ with its centre on the $x$-axis at the point $x = \alpha$. Obviously, changing $\alpha$ will both move the centre of the circle and alter its radius, as shown in Fig. 2-13.

Fig. 2-13 Envelope shown as dotted line.

If $\alpha$ is allowed to vary in some interval, then the single equation will describe a set of circles, each one corresponding to a different value assumed by $\alpha$ in that interval. Collectively these circles are a family of circles with parameter $\alpha$. If a curve exists that is tangent to every member of a family of curves, but is not itself a member of the family, then it is an envelope of the family. An envelope can be a curve of infinite length or, on occasions, it may reduce either to a curve of finite length or, in degenerate cases, to a single point. In Fig. 2-13 the envelope is shown as a dotted curve and, as would be expected in this case, the envelope is symmetrical about both the $x$- and $y$-axes.

If the family of circles that led to this envelope is written in the form

$$(x - \alpha)^2 + y^2 - \frac{\alpha^2}{1 + \alpha^2} = 0,$$

then it is seen to be a special case of an equation in three variables having the general form $f(x, y, \alpha) = 0$.
(2-3)

This is the standard form for an equation defining a family of curves with parameter $\alpha$, and it will be used later to determine the equation of the envelope when it exists. However, it is easy to see that a family of curves does not always have an envelope associated with it, since the concentric circles $x^2 + y^2 = \alpha^2$ form a perfectly good family with parameter $\alpha$, but clearly there is no curve that is tangent to each circle in the family.

Expression (2-3) is an implicit representation of a function in the sense that it is not directly obvious how and when it is possible to re-express it in the more familiar explicit form

$$y = F(x, \alpha). \tag{2-4}$$

(b) Parameterization of a curve

We have seen that when a curve is represented by an explicit equation of the form $y = f(x)$, then for inversion reasons the mapping must be one-one. In other words, either $f$ must be strictly monotonic in its domain of definition or, if not, it must be expressible piecewise as a set of new functions which are strictly monotonic on suitably chosen domains. A more general representation of a curve that overcomes the necessity for sub-division of the domain, and even allows curves with loops, may be achieved by the introduction of the notion of parametric representation of a curve. The idea here is simple: instead of considering $x$ and $y$ to be directly related by some function $f$, we consider $x$ and $y$ separately to be functions of the variable parameter $\alpha$. Thus we arrive at the pair of equations

$$x = s(\alpha), \qquad y = t(\alpha), \tag{2-5}$$

with $a \le \alpha \le b$, say, which together define a curve. For any value of $\alpha$ in $[a, b]$ we can use these equations to determine unique values of $x$ and $y$, and hence to plot a single point on the curve represented parametrically by Eqn (2-5). The set of all points described by Eqn (2-5) then defines a curve.

As a simple example of a curve without loops we may consider the parametric equations $x = \alpha$, $y = \alpha^2$ for $-\infty < \alpha < \infty$.
These obviously define a parabola that lies in the upper half plane and is symmetrical about the $y$-axis with its vertex at the origin. Elimination of $\alpha$ is easy here and results in the explicit representation $y = x^2$. In more complicated cases the parameter cannot usually be eliminated and, indeed, this should not be expected since parametric representation is more general than explicit representation.

An important consequence of the parametric representation of a curve is that increasing the value of the parameter defines a sense of direction along the curve, which is often very useful in more advanced applications of these ideas. An example of a curve containing a loop is provided by the parametric equations

$$x = \alpha^3 - \alpha, \qquad y = 4 - \alpha^2 \qquad \text{for } -2 \le \alpha \le 2,$$

which is shown in Fig. 2-14 together with the sense of direction defined by increasing $\alpha$.

Fig. 2-14 Parameterization of a curve defining sense of direction.

It is implicit in the concept of the parametric representation of a curve that a given curve may be parameterized in more than one way. Hence changing the variable in a parameterization will give a different parametric representation of the same curve. Thus if in the example above we replace the parameter $\alpha$ by the parameter $\beta$ using the relationship $\alpha = \beta + 1$, then it is readily seen that

$$x = \beta^3 + 3\beta^2 + 2\beta, \qquad y = 3 - 2\beta - \beta^2 \qquad \text{for } -3 \le \beta \le 1.$$

This is an alternative parameterization of the same curve shown in Fig. 2-14.

2-6 Functions of several real variables

In physical situations, to say that a quantity depends only on one other quantity is usually a gross oversimplification. Indeed, this was so in the thermodynamic illustration used to introduce the notion of a function of one real variable, because we insisted on maintaining a constant volume of gas. In general the pressure $p$ of a given gas will depend on both its temperature $T$ and its volume $v$.
Here we would say that there was a functional relationship between $p$, $T$, and $v$ which, in an implicit form, may be expressed by the equation

$$f(p, T, v) = 0. \tag{2-6}$$

The function $f$ occurring here is a function of three real variables and obviously depends for its form on the particular gas involved. Usually one of the three quantities, say $p$, is regarded as a dependent variable, with the others, namely $T$ and $v$, being regarded as independent variables. Solving Eqn (2-6) for $p$ then gives rise to an explicit expression of the form

$$p = g(T, v), \tag{2-7}$$

with $g$ then being called a function of two real variables.

Just as with a function of a single real variable, in addition to specifying the functional form it is also necessary to stipulate the domain of definition of the function. Thus Eqn (2-7), which in thermodynamic terms would be called the equation of state of the gas, would only be valid for some range of temperature and volume. In this case the reason for the restriction on the temperature and volume is a physical one, whereas in other situations it is likely to be a purely mathematical one.

Extending the ideas already introduced, we shall now let $R^2$ denote the set of all ordered pairs $(x, y)$ of real numbers and let $S$ be some subset of $R^2$.

Definition 2-3 We say $f$ is a real valued function of the real variables $x$ and $y$ defined in set $S$ if, for every $(x, y) \in S$, there is defined a real number denoted by $f(x, y)$.

As is the case with a function of one variable, when the domain of definition of a real valued function of two or more real variables is not specified, it is to be understood to be the largest possible domain of definition that can be defined. Thus, for example, the largest subset $S \subset R^2$ in which the function $f(x, y) = \sqrt{(1 - x^2 - y^2)}$ is defined is given by $S = \{(x, y) \in R^2 \mid x^2 + y^2 \le 1\}$. This concept of a function immediately extends to include functions of more than two variables.
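The role played by the set $S$ in the example $f(x, y) = \sqrt{(1 - x^2 - y^2)}$ can be made concrete with a short Python sketch. The function names `in_S` and `f` are chosen for this illustration only; the sketch simply refuses to evaluate the function at points lying outside the unit disc.

```python
import math

def in_S(x, y):
    """Membership test for S = {(x, y) in R^2 : x^2 + y^2 <= 1}."""
    return x * x + y * y <= 1

def f(x, y):
    """f(x, y) = sqrt(1 - x^2 - y^2), defined only for (x, y) in S."""
    if not in_S(x, y):
        raise ValueError("(x, y) lies outside the domain S")
    return math.sqrt(1 - x * x - y * y)

centre_value = f(0, 0)      # the largest value of f, attained at the origin
boundary_value = f(1, 0)    # f vanishes on the boundary circle
```

Keeping the membership test separate from the formula mirrors the definition above, in which the rule for $f(x, y)$ and the set $S$ on which it is defined are specified together.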
Using $R^n$ to denote the set of all ordered $n$-tuples $(x_1, x_2, \ldots, x_n)$ of real numbers, of which $S$ is some subset, this definition can be formulated.

Definition 2-4 We shall say that $f$ is a real valued function of the real variables $x_1, x_2, \ldots, x_n$ defined in set $S$ if, for every $(x_1, x_2, \ldots, x_n) \in S$, there is defined a real number denoted by $f(x_1, x_2, \ldots, x_n)$.

A typical example of a function of the three variables $x$, $y$, $z$ is provided by

$$f(x, y, z) = \sqrt{(2 - x)} + \sqrt{(9 - y^2)} + \sqrt{(16 - z^4)}.$$

The largest subset $S \subset R^3$ for which this function may be defined is obviously

$$S = \{(x, y, z) \in R^3 \mid x \le 2;\ -3 \le y \le 3;\ -2 \le z \le 2\}.$$

The geometrical idea underlying the graph of a function of a single variable also extends to real functions $f$ of two real variables $x$, $y$. Denote the value of the function $f$ at $(x, y)$ by $z$, so that we may write $z = f(x, y)$. Then with each point of the $(x, y)$-plane at which $f$ is defined we have associated a third number $z = f(x, y)$. Taking three mutually perpendicular straight lines with a common origin as axes, we may then identify two of the axes with the independent variables $x$ and $y$ and the third with the dependent variable $z$. The ordered number triples $(x, y, z) = (x, y, f(x, y))$ may then be plotted as points in a three-dimensional geometrical space. The set of points $(x, y, z)$ corresponding to the domain of definition of the function $f(x, y)$ then defines a surface which, in practice, usually turns out to be smooth. It is conventional to plot $z$ vertically.

On account of the geometrical representation just described, even in $R^n$ it is customary to speak of the ordered $n$-tuple of numbers $(x_1, x_2, \ldots, x_n)$ as defining a 'point' in the 'space' $R^n$.

By way of illustration of a graph of a function of two variables we now consider

$$f(x, y) = \frac{x^2}{4} + \frac{y^2}{9} \qquad \text{with} \qquad \frac{x^2}{4} + \frac{y^2}{9} \le 2,$$

where the inequality serves to define a domain of definition for the function.
The surface described by this function has the equation $z = x^2/4 + y^2/9$, and the domain of definition is the interior and boundary of the curve $x^2/4 + y^2/9 = 2$. If this latter expression is rewritten in the form $x^2/8 + y^2/18 = 1$, then it can be seen that the domain of definition of $f$ is in fact the interior and boundary of an ellipse in the $(x, y)$-plane having semi-minor axis $2\sqrt{2}$ and semi-major axis $3\sqrt{2}$, and being centred on the origin. As $f(x, y)$ is an essentially non-negative quantity, it follows directly that $0 \le z \le 2$ in the domain of $f$.

Fig. 2-15 Surfaces and level curves: (a) representation of surface; (b) level curves.

To deduce the form of the surface, two further geometrical concepts are helpful. The first is the notion of the curve defined by taking a cross-section of the surface parallel to the $z$-axis. The second is the notion of a contour line or level curve, defined by taking a cross-section of the surface perpendicular to the $z$-axis.

To examine a cross-section of the surface by the plane $y = a$, say, we need only set $y = a$ in $f(x, y)$ to obtain $z = x^2/4 + a^2/9$, showing that the curve so defined is a parabola with vertex at a height $z = a^2/9$ above the $y$-axis. A similar cross-section by the plane $x = b$ shows that the curve so defined is $z = b^2/4 + y^2/9$, which is also a parabola, but this time with its vertex at a height $z = b^2/4$ above the $x$-axis. (See Fig. 2-15 (a).) If desired, sections by other planes parallel to the $z$-axis may also be used to assist visualization of the surface.

The curve defined by a section of the surface resulting from a cross-section taken perpendicular to the $z$-axis is called a contour line or level curve by direct analogy with cartography, where such lines are drawn on a map to show contours of constant altitude.
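Both kinds of section can be explored numerically for the surface $z = x^2/4 + y^2/9$. The Python sketch below is an illustration under stated assumptions: the helper names are invented here, and the semi-axes formula comes from rewriting a level curve $x^2/4 + y^2/9 = c$ with $c > 0$ as $x^2/(4c) + y^2/(9c) = 1$.

```python
import math

def f(x, y):
    """z = x^2/4 + y^2/9 on the domain x^2/4 + y^2/9 <= 2."""
    z = x * x / 4 + y * y / 9
    if z > 2:
        raise ValueError("(x, y) lies outside the domain of f")
    return z

def section_y(a):
    """Cross-section by the plane y = a, returned as a function of x alone."""
    return lambda x: f(x, a)

def level_semi_axes(c):
    """Semi-axes (along x, along y) of the level curve x^2/4 + y^2/9 = c, c > 0."""
    return 2 * math.sqrt(c), 3 * math.sqrt(c)

vertex_height = section_y(3)(0)       # a^2/9 with a = 3
boundary_axes = level_semi_axes(2)    # the boundary ellipse, z = 2
```

The value returned for $c = 2$ reproduces the semi-axes $2\sqrt{2}$ and $3\sqrt{2}$ of the boundary of the domain, while smaller values of $c$ give the nested ellipses that make up the family of level curves.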
Level curves are obtained by determining the curves in the $(x, y)$-plane for which $z = \text{constant}$, and it is customary to draw them all on one graph in the $(x, y)$-plane with the appropriate value of $z$ shown against each curve. (See Fig. 2-15 (b).) Let us determine the level curve in our example corresponding to $z = \tfrac{1}{2}$, which is representative of $z$ in the range $0 < z < 2$. We must thus find the curve with the equation $x^2/4 + y^2/9 = \tfrac{1}{2}$, which we choose to rewrite in the standard form $x^2/2 + y^2/(9/2) = 1$. This shows that it describes an ellipse centred on the origin with semi-minor axis $\sqrt{2}$ and semi-major axis $3/\sqrt{2}$. It is not difficult to see that all the level curves are ellipses, the one corresponding to $z = 2$ being the boundary of the domain of $f$ and the one corresponding to $z = 0$ degenerating to the single point at the origin.

PROBLEMS

Section 2-1

2-1 Sketch the graphs of these functions: (a) $f(x) = x^2 - 3x + 2$ $(-1 < x < 3)$; (b) $f(x) = x + \sin x$ $(-\pi/2 < x < \pi/2)$; (c) $f(x) = x^3$ $(-2 < x < 2)$; (d) $f(x) = x^2 + 1/x$ $(0.2 < x < 2)$; (e) $f(x) = x + 1/x^2$ $(0.5 < x < 5)$.

2-2 Determine the domain and the range of each of functions (a) to (e) defined above.

2-3 Determine the range of the function $f(x) = x^2 + 1$ corresponding to each of the following domains, and state when the mapping is one-one and when it is many-one: (a) $[-1, 1]$; (b) $(2, 4)$; (c) $[-2, 4]$; (d) $[-3, 1]$.

2-4 Find the largest domain of definition for each of the following functions: (a) $f(x) = x^3 + 3$; (b) $f(x) = x^2 + \sqrt{(1 - x^2)}$; (c) $f(x) = x^2 + \sqrt{(1 - x^3)}$; (d) $f(x) = 1/(x^2 - 1)$; (e) $f(x) = x + 1/x$; (f) $f(x) = x^2/(1 + x^2)$.

2-5 Let $f(n)$ denote the function that assigns to any positive integer $n$ the number of positive integers whose square is less than or equal to $n + 2$. By enumerating the first few values of $f(n)$, deduce the values of $n$ for which $f(n) = 3$.

2-6 An integer $m$ is said to be a prime number if its only factors are 1 and $m$.
Given that $f(n)$ is the function that associates with $n$ the number of primes less than or equal to $2n + 1$, enumerate the first ten values of $f(n)$.

2-7 Give two examples of functions which are defined only for discrete values of the dependent and independent variables.

2-8 Sketch representative members of the two pencils of lines described by $y = \alpha(x - 1) + 2$ and $y = \beta(x - 2) + 3$, where $\alpha$ and $\beta$ are parameters. Locate the two singular points and suggest how $\alpha$ and $\beta$ may be used as coordinates for points in the plane of the two pencils. When will the coordinates $\alpha$ and $\beta$ fail to identify points?

2-9 Suppose that $f$ is the function that assigns to every qualified driver the name of the driving examiner who issued his licence. Identify the domain $A$ and the range $B$ of $f$, stating the nature of the mapping involved.

2-10 Give two examples of functions relating non-numerical quantities.

Section 2-2

2-11 Sketch the graphs of the following functions in their stated domains of definition and in each case use the process of reflection in the line $y = x$ to construct the graph of the inverse function: (a) $f(x) = x^3$ with $x \in [-2, 2]$; (b) $f(x) = x + \sin x$ with $x \in [0, \pi/2]$; (c) $f(x) = x/(1 + x^2)$ with $x \in [-1, 2]$.

2-12 Where appropriate, classify the following functions as either monotonic or strictly monotonic increasing or decreasing on the stated domains of definition:
(a) $f(x) = x^2$ for $x \in [-1, 2]$;
(b) $f(x) = x^2$ for $x \in [-1, 0)$;
(c) $f(x) = \sin x$ for $x \in [-3\pi/4, \pi/4]$;
(d) $f(x) = \cos x$ for $x \in [0, \pi]$;
(e) $f(x) = \tan x$ for $x \in [-\pi/4, \pi/4]$;
(f) $f(x) = x$ for $x \in [0, 1]$, $f(x) = 1$ for $x \in (1, 2]$, $f(x) = x^2/4$ for $x \in (2, 6]$;
(g) $f(x) = x$ for $x \in [1, 2)$, $f(x) = x^2$ for $x \in [2, 4]$.

2-13 Complete the entries in this table:

$f(x)$              Domain          Is mapping one-one    $f^{-1}$ when it exists
$x$                 $[-3, 1]$
$x^3$
$1/(1 + x)$         $[1, 3]$
$\sin x$            [-K W
cos (a: + Jn)       $[0, \pi]$
tan [x — \tt\       [0, \n\
                    $(2, 4]$

Section 2-3

2-14 Sketch these functions in their associated domains of definition:
(a) $f(x) = |2x|$ for $x \in [-2, 2]$;
(b) $f(x) = x + |x|$ for $x \in [-2, 2]$;
(c) the step function assuming the values 1, 2, $-3$, 2, 4 on the $x$ intervals $[0, 1)$, $[1, 2]$, $(2, 3.5)$, $[3.5, 4]$, and $(4, 5]$, respectively. Identify end points belonging to a line by a dot and end points deleted from a line by a circle;
(d) $f(x) = |x|$ for $x \in [0, 1)$, $f(x) = |x - 1|$ for $x \in [1, 2)$, $f(x) = |x - 2|$ for $x \in [2, 3]$.

2-15 Where appropriate, classify the following functions as even or odd: (a) $f(x) = x + |x|$; (b) $f(x) = x + \sin 2x$; (c) $f(x) = x^2 + \sin x$; (d) $f(x) = 1/x$; (e) $f(x) = x^2/(1 + x^2)^2$; (f) $f(x) = x^5 - x^3 + x$; (g) $f(x) = 2\cos x + \sin x$.

It is obvious that any arbitrary function $f(x)$ which is defined in an interval $\mathcal{J}$ containing the origin may be written in the form

$$f(x) = \tfrac{1}{2}\bigl(f(x) + f(-x)\bigr) + \tfrac{1}{2}\bigl(f(x) - f(-x)\bigr)$$

in any interval $\mathcal{I} \subset \mathcal{J}$ that is symmetric about the origin. Such an interval $\mathcal{I}$ is said to be interior to $\mathcal{J}$. This shows that any such $f(x)$ is expressible as the sum of an even function $\tfrac{1}{2}(f(x) + f(-x))$ and an odd function $\tfrac{1}{2}(f(x) - f(-x))$ within $\mathcal{I}$. Apply this result to display the following functions as the sum of even and odd parts, in each case stating the largest interval $\mathcal{I}$ for which the result is true: (h) $f(x) = 1 + x^3 + x \sin x$ for $-2\pi < x < 3\pi$; (i) $f(x) = 1 + x + |x| \sin x$ for $-3\pi < x < 3\pi$; (j) $f(x) = 1 - x + 2x^2 + 4x^3$ for $-4 < x < 3$.

2-16 Determine if upper and lower bounds exist for the following functions and, when appropriate, state their values and where they occur on the respective domains of definition: (a) $f(x) = 1/x$ for $x \in [1, 4]$; (b) $f(x) = 1/x$ for $x \in (0, 3]$; (c) $f(x) = 1 + x^2$ for $x \in [-2, 1]$; (d) $f(x) = \sin x$ for $x \in [0, 3\pi/2]$; (e) $f(x) = \tan x$ for $x \in (-\pi/2, \pi/2)$.
2-17 The pairs of numbers enclosed by the curly brackets following each problem are upper and lower bounds for the associated function in its stated domain of definition. State whether or not each of these bounds is strict: (a) $f(x) = x^3 + x + 1$ with $x \in [1, 2]$, $\{0, 11\}$; (b) $f(x) = \sin x$ with $x \in [0, \pi/2]$, $\{0, 2\}$; (c) $f(x) = 1/(1 + x^2)$ with $x \in [0, 2]$, $\{1/6, 2\}$; (d) $f(x) = \sin(1/x)$ with $x \in [2/\pi, 30]$, $\{0, 1\}$.

2-18 Determine by sketching whether the following functions are convex, concave or neither on their stated domains of definition: (a) $f(x) = x^3$ for $x \in [1, 3]$; (b) $f(x) = x^3$ for $x \in [-1, 1]$; (c) $f(x) = a^2 - x^2$ for $x \in [-a/2, a]$; (d) $f(x) = x + \sin x$ for $x \in [0, \pi/2]$; (e) $f(x) = \sin x$ for $x \in [0, \pi]$.

2-19 Give examples of polynomials of degrees 3, 4, and 5 and of a rational function having a numerator of degree 2 and a denominator of degree 5.

2-20 Classify the following functions as polynomial, rational, or algebraic. When the function is algebraic, state the degrees of $x$ and $y$ in the polynomial that is involved after the surds and fractions have been cleared: (a) $y = x^3 - x^2 + 1$; (b) $y = x\sqrt{(3 - x)}$; (c) $y = (x - 1)/(x^4 + 3x^3 - x^2 + x + 1)$; (d) $y = x + 3\sqrt{(x^2 - 2)}$; (e) $y = (x^3 - 3x + 2)/(x - 1)$.

Section 2-4

2-21 Complete the entries in the following table by determining whether the functions $f$ map the stated domains $A$ 'into' or 'onto' the domains $B$.
$f$                 $A$                     $B$                               Into or onto mapping
$x^3$               $[1, 3]$                $[0, 30]$
$x + \sin x$        $[0, \tfrac{1}{2}\pi]$  $[0, \tfrac{1}{2}(2 + \pi)]$
$x^2$               $[1, 4]$                $[1, 16]$
$x^4$               $[-1, 2]$               $[0, 16]$

2-22 Given that $f(x) = 2x - 7$ with domain $(-\infty, 20]$ and $g(x) = 10 - x$ with domain $[-6, \infty)$, determine the domain and the range of the composition $f \circ g$.

2-23 Given that $f(x) = x^2 + 1$ with domain $(-\infty, \infty)$ and $g(x) = 2 + \sqrt{(4 - x)}$ with domain $[-5, 4]$, determine the domain and the range of the composition $f \circ g$.

Section 2-5

2-24 Draw the circles corresponding to $\alpha$ = |, J, \, 1, and 2 in the equation $(x - 1)^2 + (y - \alpha)^2 = \alpha^2/(1 + \alpha^2)$ and sketch the envelope, indicating its asymptotes for large positive and negative $\alpha$.

2-25 Draw the circles corresponding to $\alpha$ = J, J, 1, 2, and 3 in the equation $(x - \alpha)^2 + y^2 =$ I $\alpha^2$ and draw the envelope.

2-26 Deduce the envelope of the family of circles $(x - \alpha)^2 + y^2 = \alpha^2$, with parameter $\alpha$.

2-27 Sketch representative ellipses belonging to the family $x^2/\alpha^2 + y^2/(4 - \alpha)^2 = 1$, with parameter $\alpha$, and deduce the shape of the envelope.

2-28 Draw representative members of the family of straight lines $y = \alpha x + 2/\alpha$, with parameter $\alpha$, and deduce the shape of the envelope.

2-29 Sketch the curve represented by the parametric equations $x = 2\cos\alpha$, $y = \sin\alpha$ for $-\pi/2 < \alpha < \pi/2$.

2-30 Sketch the curve represented by the parametric equations $x = \alpha^2 + 1$, $y = \alpha^3$ for $-2 < \alpha < 2$.

2-31 Sketch the curve represented by the parametric equations $x = \alpha^3 + \alpha^2 - 2\alpha$, $y = 5 - \alpha^2$ for $-3 < \alpha < 2$. Indicate by arrows on the curve the sense of direction corresponding to increasing $\alpha$.

2-32 Sketch the curve represented by the parametric equations $x = \cos\alpha + 4\cos(\alpha/3)$, $y = \sin\alpha + 4\sin(\alpha/3)$ for $0 < \alpha < 3\pi/2$. Use arguments involving even and odd functions to deduce the form taken by the curve for $0 < \alpha < 6\pi$.

2-33 Suggest two different parametric representations for the curve $y = x^2 + x + 1$ for $0 < x < 2$.
Section 2-6

2-34 What are the largest domains of definition for the following functions of several variables: (a) $f(x, y) = 1 + x^2 + y^2$; (b) $f(x, y) = (x^2 + y^2)\sqrt{(1 - x^2 - y^2)}$; (c) $f(x, y) = \sin xy/(x^2 + y^2 + 1)$; (d) $f(x, y) = 3x^2 + y^2 + \sqrt{(2 - y)} + \sqrt{(4 - x^2)}$; (e) $f(x, y, z) = \sqrt{(3 - x)} + x\sqrt{(9 - y)} + y\sqrt{(1 - z^2)}$; (f) $f(x, y, z) = \sqrt{(x^2 + y^2 - 1)} + \sqrt{(4 - x^2 - y^2 - z^2)}$.

2-35 The function $f(x, y) = x^2 y$ has for its domain of definition the rectangle in the $(x, y)$-plane defined by $|x| < 3$, $|y| < 2$. Deduce the shape of the curves defined by cross-sections of the surface $z = f(x, y)$ taken by the three planes $x = -2$, $x = 0$, and $x = 2$ that are parallel to the $(y, z)$-plane and by the three planes $y = -2$, $y = 0$, and $y = 2$ that are parallel to the $(x, z)$-plane, using your results to sketch the surface. Sketch on one diagram the level curves corresponding to $z = -4$, $z = -2$, $z = 0$, and $z = 6$.

2-36 Sketch the surface $z = f(x, y)$ defined by the function $f(x, y) = 1/(1 + x^2 + y^2)$ in the domain $|x| < 4$, $|y| < 4$. Draw the level curves corresponding to $z = 1/9$, $z = 1/3$, $z = 2/3$, and $z = 1$.

2-37 The surface $z = f(x, y)$ is defined by the function $f(x, y) = 1/[(x - 1)^2 + (y - 2)^2 - 1]$ with $2 < (x - 1)^2 + (y - 2)^2 < 9$. Deduce the domain of definition of the function and then sketch the level curves corresponding to z = i, z = i, and z = f on the same diagram. Use your result to sketch the surface. [Hint: Use the fact that the circle of radius $\rho$ with centre at $(a, b)$ has the equation $(x - a)^2 + (y - b)^2 = \rho^2$.]

Sequences, limits, and continuity

3-1 Sequences

The notion of a 'sequence' is a constantly recurring one in everyday life, where it usually implies the ordering of some set of events with respect to time. The sets of events that are so ordered, or arranged, are very varied and may be either numerical or non-numerical in nature.
Typical examples of commonplace sequences in these categories are these:
(a) the sequence of months in a year;
(b) the sequence of digits identifying a telephone subscriber;
(c) the sequence of machining operations required to make a certain component.

However, sequences are not necessarily decided by the chronological order of events, and they are often determined instead by some attribute possessed by the members of the set to be ordered. Thus, for example, two commonly occurring sequences to be found in any library are the entries in the alphabetic catalogues of authors and titles, neither of which is in the chronological order of acquisition of the books. Although these general ideas could be discussed at greater length, such an examination is inappropriate here, and it must suffice that these few examples show that sequences are commonplace in the world around us, and that they need not necessarily involve numbers.

These ideas find an immediate parallel in mathematics, where the natural order existing in $R$, combined with the arithmetic properties discussed in Chapter 1, enables us to deal very successfully and in great detail with questions relating to mathematical sequences. Our main preoccupation in this book will be with sequences of numbers and sequences of functions, so we must first make the mathematical notion of a sequence more precise.

Before doing this, however, we must issue a word of warning concerning the colloquial usage of the words sequence and series, and their mathematical usage, which is quite different. Colloquially the words sequence and series are often used interchangeably, but in mathematics they have two quite different meanings which must never be confused. In brief, in mathematical terms a sequence is a set of quantities that is enumerated in a definite order, whereas a series involves the sum of a set of quantities. Thus $1, 3, 5, 7, 9, \ldots$ is a sequence, but $1 + \tfrac{1}{2} + \tfrac{1}{4} + \tfrac{1}{8} + \tfrac{1}{16} + \cdots$ is a series.
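The distinction can be seen in a small Python sketch. It lists the first five odd numbers as a sequence, and then, for a geometric series of the kind just quoted (terms $1/2^k$, taken here as an assumption for illustration), forms the partial sums; the variable names are chosen for this sketch only.

```python
from fractions import Fraction
from itertools import accumulate

# A sequence: quantities enumerated in a definite order.
odd_numbers = [2 * n - 1 for n in range(1, 6)]           # 1, 3, 5, 7, 9

# The individual terms 1, 1/2, 1/4, 1/8, 1/16 also form a sequence ...
terms = [Fraction(1, 2 ** k) for k in range(5)]

# ... whereas the series involves their sum; the partial sums
# 1, 1 + 1/2, 1 + 1/2 + 1/4, ... form a new sequence derived from it.
partial_sums = list(accumulate(terms))
```

The list `terms` and the list `partial_sums` are both sequences in the mathematical sense; it is only the operation of summation that makes the second one relevant to the series.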
If a sequence is composed of elements or terms $u$ belonging to some set $S$, then it is conventional to indicate their order by adding a numerical suffix to each term. Consecutive terms in the sequence are usually numbered sequentially, starting from unity, so that the first few terms of a sequence involving $u$ would be denoted by $u_1, u_2, u_3, \ldots$. Rather than write out a number of terms in this manner, this sequence is often represented by $\{u_n\}$, where $u_n$ is the $n$th term of the sequence. The sequence depends on the set chosen for $S$ and the way suffixes are allocated to elements of $S$. A sequence will be said to be infinite or finite according as the number of terms it contains is infinite or finite and, unless explicitly stated, all sequences will be assumed to be infinite. The notation for a sequence is often modified to $\{u_n\}_{n=1}^{N}$ when only a finite number $N$ of terms is involved.

As an example of an infinite numerical sequence, let $S$ be the set of real numbers and the rule by which suffixes are allocated be that to each integer suffix $n$ we allocate the number $1/2^n$, which belongs to $R$. We thus arrive at the infinite sequence $u_1 = 1/2$, $u_2 = 1/2^2$, $u_3 = 1/2^3, \ldots$, which could either be written in the form

$$\frac{1}{2}, \frac{1}{2^2}, \frac{1}{2^3}, \ldots, \frac{1}{2^n}, \frac{1}{2^{n+1}}, \ldots$$

or, more concisely, in the form $\{1/2^n\}$.

Had the set $S$ still been the set $R$ of real numbers, but the rule of allocation of suffixes been changed, so that to each integer suffix $n$ chosen from the first $N$ natural numbers we allocated the number $1/(2n + 1)$, then the finite sequence

$$\frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \ldots, \frac{1}{2N + 1}$$

would have resulted.

If we use the notion of a function $f(x)$ which is defined only for integral values of the argument $x$, the following concise definition can be formulated.

Definition 3-1 In mathematical terms a sequence is a function $f$ defined only for integer values of its argument and having for its range an arbitrary set $S$.
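Definition 3-1 translates almost directly into code. In the Python sketch below, which is offered only as an illustration, the two sequences displayed above are written as functions of an integer suffix; the function names `u` and `finite_sequence` are assumptions of this sketch, and exact fractions are used so that the terms appear exactly as in the text.

```python
from fractions import Fraction

def u(n):
    """nth term of the infinite sequence {1/2^n}, for n = 1, 2, 3, ..."""
    if not (isinstance(n, int) and n >= 1):
        raise ValueError("the sequence is defined only for positive integer n")
    return Fraction(1, 2 ** n)

def finite_sequence(N):
    """The finite sequence {1/(2n + 1)} for n = 1, ..., N."""
    return [Fraction(1, 2 * n + 1) for n in range(1, N + 1)]

first_terms = [u(n) for n in range(1, 4)]   # 1/2, 1/4, 1/8
```

The explicit check on `n` reflects the fact that the argument of the function is restricted to integer suffixes, whereas the values it returns lie in the set $S$, here the real (in fact rational) numbers.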
Hence the first sequence that was displayed could be regarded as resulting from the function $f(x) = 1/2^x$ with $u_n = f(n)$, where $n$ is always a positive integer. By exactly similar reasoning, the second sequence can be derived from the function $f(x) = 1/(2x + 1)$ by setting $u_n = f(n)$.

The connection between functions and sequences that is established in this definition makes it appropriate to describe numerical sequences in the same terms as would be used to describe the function giving rise to them. Thus if the terms of a sequence $\{u_n\}$ are such that $m < u_n < M$ for all values of $n$ then the sequence is said to be bounded, whilst if $u_{n+1} > u_n$ for all $n$ then the sequence is said to be strictly monotonic increasing. The terms bounded above, bounded below, unbounded, strictly monotonic decreasing, monotonic, and oscillating, etc., can also be used in the obvious manner as shown below.

Example 3-1
(a) $\{1/n\}_{1}^{\infty}$ is a bounded, strictly monotonic decreasing sequence. The upper bound 1 is strict but the lower bound 0 is never actually attained.
(b) $\left\{\dfrac{1}{\sin(1/n)}\right\}_{1}^{\infty}$ is a strictly monotonic increasing sequence, strictly bounded below by $(\sin 1)^{-1}$ but unbounded above.
(c) $\left\{\dfrac{(-1)^n}{n}\right\}_{1}^{\infty}$ is a bounded sequence with strict upper bound $\frac{1}{2}$ and strict lower bound $-1$.
(d) $\{u_n\}_{1}^{\infty}$ where $u_{2m-1} = m/(m + 1)$ and $u_{2m} = u_{2m-1}$. The first six terms of this sequence are $\frac{1}{2}, \frac{1}{2}, \frac{2}{3}, \frac{2}{3}, \frac{3}{4}, \frac{3}{4}$, corresponding pairwise, respectively, to $m = 1$, 2, and 3. The sequence is thus both bounded and monotonic increasing. It is not strictly monotonic increasing because pairs of terms are equal. The lower bound $\frac{1}{2}$ is strict, but the upper bound 1 is never actually attained.
(e) $\{(-1)^n\}$ is an oscillating but bounded sequence with strict upper bound 1 and strict lower bound $-1$.
(f) $\{(-2)^n\}$ is an oscillating but unbounded sequence.

Just as a graph proved to be useful when representing functions, so also may it be used to represent sequences.
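The properties claimed in Example 3-1 can be spot-checked on a finite number of terms. The following is only a minimal Python sketch: the helper names (`terms`, `is_strictly_decreasing`, and so on) are illustrative and do not come from the text, and a check on the first 50 terms is evidence, not a proof, of behaviour for all $n$.

```python
from fractions import Fraction


def terms(rule, count):
    """Return the first `count` terms u_1, ..., u_count of the sequence n -> rule(n)."""
    return [rule(n) for n in range(1, count + 1)]


def is_strictly_increasing(values):
    """True if every term is strictly greater than its predecessor."""
    return all(a < b for a, b in zip(values, values[1:]))


def is_strictly_decreasing(values):
    """True if every term is strictly less than its predecessor."""
    return all(a > b for a, b in zip(values, values[1:]))


def is_increasing(values):
    """True if no term is smaller than its predecessor (equal terms allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Example 3-1 (a): u_n = 1/n
seq_a = terms(lambda n: Fraction(1, n), 50)
# Example 3-1 (c): u_n = (-1)^n / n
seq_c = terms(lambda n: Fraction((-1) ** n, n), 50)
# Example 3-1 (d): u_{2m-1} = u_{2m} = m/(m + 1), with m = (n + 1) // 2
seq_d = terms(lambda n: Fraction((n + 1) // 2, (n + 1) // 2 + 1), 50)

print("(a) strictly decreasing:", is_strictly_decreasing(seq_a))
print("(c) largest and smallest terms:", max(seq_c), min(seq_c))
print("(d) first six terms:", [str(t) for t in seq_d[:6]])
print("(d) increasing but not strictly:",
      is_increasing(seq_d) and not is_strictly_increasing(seq_d))
```

Exact rational arithmetic (`Fraction`) is used here so that equal terms in (d) compare as exactly equal, which floating-point values would not guarantee.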
Exactly the same method of representation can be adopted, but this time, since the domain of the function defining the sequence is the set of natural numbers, the graph of a sequence will be a set of isolated points. A typical example is the graph of the first few terms of the sequence $\{u_n\}$ with $u_n = [n + (-1)^n]/n$, which are shown as dots in Fig. 3-1 (a).

An obvious deficiency of this representation is that the horizontal axis must be made unreasonably long if a large number of terms are to be represented. This can be overcome by the following simple device, which is sometimes of use since it compresses the representation of the numbers 1 to infinity onto a line of finite length. The idea is illustrated in Fig. 3-1 (b) where, on the horizontal axis, the integer $n$ is associated with a point distant $1/n$ to the left of a fixed point $P$. The left end point of the line segment is then associated with the value 1, the mid-point with the value 2, and so on, with the point $P$ itself corresponding to an infinite value of $n$.

Fig. 3-1 Two alternative graphs of sequence $\{1 + (-1)^n/n\}$: (a) normal graph; (b) compressed horizontal axis.

An even simpler graphical representation than either of these is often used, in which the values of successive terms in the sequence are plotted one-dimensionally as points on a straight line relative to some fixed origin. Because of the identification of the numerical value of a term of the sequence with a point on a line, the behaviour of a sequence is often spoken of in terms of the behaviour of the points in this representation (that is, there is a one-one mapping of $\{u_n\}$ onto the straight line). In terms of this representation, the same sequence that gave rise to Fig. 3-1 (a) and (b) will appear as in Fig. 3-2. This could also have been obtained from Fig. 3-1 (a) and (b) by projecting the points of the graphs horizontally across to meet the vertical axis.

Fig. 3-2 Sequence $\{1 + (-1)^n/n\}$ plotted on line. (Annotation on the figure: all points $u_n$ for $n > 5$ lie in the marked neighbourhood.)

In each of these three representations, the tendency for the points of the sequence $\{1 + (-1)^n/n\}$ to cluster around the value unity as $n$ increases is obvious and clearly expresses an important property possessed by the sequence. We shall now explore this more fully.

In the sequence just discussed it is obvious that as $n$ increases, so the points of the sequence cluster ever closer to the unit point in Fig. 3-2. If we adopt the convention of calling an open interval $(a, b)$ containing some fixed point a neighbourhood of that point, then it is not difficult to see that any neighbourhood of the point unity will contain an infinite number of points of the sequence $\{u_n\}$. In fact in this case we can assert that no matter how small the length $b - a$ of the neighbourhood, there will always be an infinite number of points in $(a, b)$ and there will always be a finite number of points outside $(a, b)$. This is even true when $b - a$ shrinks virtually to zero!

The fact that any neighbourhood of the value unity has the property that an infinite number of points of the sequence are contained within it, whereas only a finite number of points lie without it, is recognized by saying that the limit of the sequence is unity. On account of this name the point corresponding to the value unity in Fig. 3-2 is called a limit point of the sequence. We shall examine the idea of a limit in the next section, and so for the moment will confine discussion to limit points. For this we shall require the notion of a sub-sequence.

Henceforth, by a sub-sequence we shall mean a sequence $u_{n_1}, u_{n_2}, \ldots, u_{n_m}, \ldots$ of terms belonging to the sequence $\{u_n\}$, where $n_1, n_2, \ldots, n_m, \ldots$
is some numerically ordered set of integers selected from the complete set of natural numbers. Thus $u_1, u_9, u_{27}, u_{31}, \ldots$ is a sub-sequence of $u_1, u_2, u_3, \ldots$ and obviously $\{u_1, u_9, u_{27}, u_{31}, \ldots\} \subset \{u_n\}$. In terms of this we now give the following formal definition of a limit point of a sequence $\{u_n\}$.

Definition 3-2 A point $u^*$ is said to be a limit point of the sequence $\{u_n\}$ if every neighbourhood of $u^*$ contains an infinite number of elements of the sequence $\{u_n\}$.

Since we have not insisted that there be a finite number of points outside any neighbourhood of a limit point, it follows that a sequence may have more than one limit point. We shall show by example that a limit point may or may not be a member of the sequence that defines it. This result, when applied to sequences with only one limit point, will later be seen to be very important, since it provides the justification for the approximation to irrational numbers in calculations by rational numbers.

In sequences involving only one limit point the sequence will be said to converge to the value associated with the limit point. This value will be called the limit of the sequence. Not all sequences have limit points, and the following examples exhibit sequences having three, one, and no limit points, respectively.

Example 3-2
(a) $\left\{\sin\left[\dfrac{\pi(n^2 + 1)}{2n}\right]\right\}$ has the three limit points $-1$, 0, and 1, of which 0 is a member of the sequence and the other two are not. The sequence does not converge.
(b) $\left\{\dfrac{1}{n}\sin\left(\dfrac{n\pi}{2}\right)\right\}$ has only one limit point at zero, which is a member of the sequence. The sequence converges to zero.
(c) $\{n^2\}$ has no limit point and so the sequence does not converge.

One of the most important applications of the notion of a sequence is to the study of series. The difficulty here is to give a meaning to the sum of an infinite number of terms.
What, for example, is the meaning of

$$\sum_{n=1}^{\infty} \frac{1}{n!} \qquad \text{(A)}$$

The solution is to be found in the behaviour of the sequence $\{s_m\}$ defined by

$$s_m = \sum_{n=1}^{m} \frac{1}{n!}.$$

The first few terms of the sequence $\{s_m\}$ are

$$s_1 = 1, \quad s_2 = 1 + \frac{1}{2!}, \quad s_3 = 1 + \frac{1}{2!} + \frac{1}{3!}, \quad s_4 = 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!},$$

and obviously all such terms $s_m$ will only involve the sum of a finite number of numbers. For obvious reasons $s_m$ is called the $m$th partial sum of the series (A). The interpretation of the infinite sum (A) is to be found in the behaviour of the $N$th term of $\{s_m\}$, namely the $N$th partial sum $s_N$, as $N$ tends to infinity. If $\{s_m\}$ has only one limit point at which $s_m$ tends to some number $S$, then this will be called the sum of the series. If $S$ is infinite the series will be said to diverge. A moment's reflection will show the reader that this is the practical approach to the problem, since the term $s_N$ is the sum of the first $N$ terms of the infinite series (A), and it seems reasonable to assume that when the value of (A) is finite, it must be close to the value $s_N$ when $N$ is suitably large.

These preliminary ideas on series must suffice for now, but we shall take them up again later and devise tests to determine whether series are convergent or divergent.

3-2 Limits of sequences

The term limit was first introduced intuitively in the previous section in connection with a sequence $\{u_n\}$ which had only one limit point. As $n$ increases so the points representing the terms $u_n$ cluster ever closer to the limit point whose value $L$, say, is the limit of the sequence. This idea of a limit is correct in spirit but it is not very satisfactory from the mathematical manipulative point of view, since the phrase 'cluster ever closer to' is far too vague. The difficulty of making the expression 'limit' precise is connected with the exact meaning we give to this phrase.
Our difficulty can be resolved if we recall that any neighbourhood of a limit point will contain an infinite number of points of the sequence and, if there is only one limit point, will exclude only a finite number of points! Thinking in terms of numbers rather than points, a neighbourhood of a limit point is simply an open interval of the line on which the numbers $u_n$ are plotted, and we already have a notation for representing such an interval. Suppose, for convenience, that the neighbourhood is symmetrical about the number $L$ and of width $2\varepsilon$, where $\varepsilon$ is some arbitrarily small positive number. Then a variable $u$ will be inside this neighbourhood if $L - \varepsilon < u < L + \varepsilon$. Recalling the definition of 'absolute value', this inequality can be rewritten concisely as $|u - L| < \varepsilon$.

Different values of $\varepsilon > 0$ determine different neighbourhoods, and if $u$ is identified with the term $u_n$ of the sequence, then $L$ is the limit of the sequence if, no matter how small $\varepsilon$ may become, only a finite number of terms $u_n$ lie outside the neighbourhood and an infinite number lie within it. We can now give a proper definition of a limit.

Definition 3-3 The sequence $\{u_n\}$ will be said to tend to the limit $L$ if, and only if, for any arbitrarily small positive number $\varepsilon$, there exists an integer $N$ such that

$$n > N \Rightarrow |u_n - L| < \varepsilon.$$

Let us test our definition on the sequence $\{u_n\}$ with $u_n = 1 + (-1)^n/n$. We already know that this sequence has only one limit point at the value unity, and consequently our definition should show that the limit is unity. Suppose, for the sake of argument, that we check to see that the definition is satisfied if $\varepsilon = 1/100$. To do this we must find a number $N$ such that when $n > N$ we have

$$\left| \left(1 + \frac{(-1)^n}{n}\right) - 1 \right| < \frac{1}{100}.$$

This result is obviously equivalent to the requirement that $1/n < 1/100$, which will be true for any value of $n$ greater than 100. Hence if we take $N = 100$ the conditions of the definition are satisfied.
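The test of Definition 3-3 just carried out can also be performed by direct enumeration. The sketch below is illustrative only: the names `u` and `terms_outside` are assumptions made for this example, and the search is necessarily cut off at a finite index (`upto`), so it can confirm the choice of $N$ only over the range inspected.

```python
from fractions import Fraction


def u(n):
    """General term u_n = 1 + (-1)^n / n, kept exact with Fraction."""
    return 1 + Fraction((-1) ** n, n)


def terms_outside(eps, limit, upto):
    """Indices n <= upto whose terms fail the test |u_n - limit| < eps."""
    return [n for n in range(1, upto + 1) if not abs(u(n) - limit) < eps]


# Neighbourhood of half-width eps = 1/100 about the proposed limit L = 1.
outside = terms_outside(Fraction(1, 100), 1, 2000)
print("number of terms outside:", len(outside))
print("largest index outside:", max(outside))
```

The largest index found outside the neighbourhood is a value of $N$ that satisfies the definition for this $\varepsilon$ over the range searched.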
There are thus 100 terms outside the neighbourhood and an infinite number within it. Had we demanded a much smaller value of $\varepsilon$, say $\varepsilon = 10^{-6}$, the identical argument would have shown that the definition is satisfied if $N = 10^6$. There would now be a very large number of terms outside the neighbourhood $0.999999 < u_n < 1.000001$, in fact $10^6$ in all, but this is still a finite number whereas the number of terms within the neighbourhood is still infinite. Clearly, however small the value of $\varepsilon$, the conditions of the definition will still apply, showing that it is in accord with our earlier intuitive ideas.

In general, when the sequence $\{u_n\}$ has a limit $L$, so that we say it converges to $L$, we shall write

$$\lim_{n \to \infty} u_n = L.$$

Whenever using this notation for a limit the reader must always keep in mind the underlying formal definition just given.

The definition and the illustrative example just given show that when a sequence has only one limit point, then it must converge to the value associated with that limit point. Any sequence such as $\{u_n\}$ with $u_n = \sin\{\pi(n^2 + 1)/2n\}$ cannot have a limit, for it has three limit points at $-1$, 0, and 1, and any small neighbourhood taken about any one must, of necessity, exclude the infinitely many terms associated with the other two. Such a sequence does not converge.

Frequently the limit of a sequence is of more importance than its individual terms, and in such circumstances the notation $\lim u_n$ is advantageous in that it focuses attention on the general term $u_n$ of the sequence. The result of the limiting operation is often readily deduced from the general term, as these examples indicate.

Example 3-3 Determine the limits in each of the following:

(a) $\displaystyle \lim_{n \to \infty} \left[\frac{(2n - 1)(n + 4)(n - 2)}{n^3}\right]$;

(b) $\displaystyle \lim_{n \to \infty} \left[\frac{1}{n^2} + \frac{2}{n^2} + \cdots + \frac{n - 1}{n^2}\right]$;

(c) $\displaystyle \lim_{n \to \infty} \left[\frac{5^{n+1} + 7^{n+1}}{5^n - 7^n}\right]$;

(d) $\displaystyle \lim_{n \to \infty} \left[\frac{1^2 + 2^2 + 3^2 + \cdots + n^2}{n^2}\right]$.

Solution (a) The general term is $u_n = [(2n - 1)(n + 4)(n - 2)]/n^3$, so that expanding the numerator and dividing by $n^3$ gives

$$u_n = 2 + \frac{3}{n} - \frac{18}{n^2} + \frac{8}{n^3}.$$

Obviously, as $n$ increases, the last three terms comprising $u_n$ approach zero, and in the limit we have

$$\lim_{n \to \infty} \left[\frac{(2n - 1)(n + 4)(n - 2)}{n^3}\right] = 2.$$

Solution (b) The general term is $u_n = [1 + 2 + \cdots + (n - 1)]/n^2$, in which the numerator is the sum of an arithmetic progression. Now it is readily verified that $1 + 2 + \cdots + (n - 1) = n(n - 1)/2$, so that

$$u_n = \frac{1}{2}\left(1 - \frac{1}{n}\right).$$

Using the same argument as in (a) above we see at once that as $n$ increases so $u_n$ approaches the value $\frac{1}{2}$, whence

$$\lim_{n \to \infty} \left[\frac{1}{n^2} + \frac{2}{n^2} + \cdots + \frac{n - 1}{n^2}\right] = \frac{1}{2}.$$

Solution (c) The general term here is $u_n = (5^{n+1} + 7^{n+1})/(5^n - 7^n)$, and by dividing numerator and denominator by $7^n$ it may be written

$$u_n = \frac{5(5/7)^n + 7}{(5/7)^n - 1}.$$

Now $5/7 < 1$, so that $(5/7)^n$ will tend to zero as $n$ increases. Thus $u_n$ will approach the value $-7$. In this case we may write

$$\lim_{n \to \infty} \left[\frac{5^{n+1} + 7^{n+1}}{5^n - 7^n}\right] = -7.$$

Solution (d) The general term is $u_n = [1^2 + 2^2 + \cdots + n^2]/n^2$, in which the numerator is the sum of the squares of the first $n$ natural numbers. Using the familiar result

$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}$$

enables us to write

$$u_n = \frac{(n + 1)(2n + 1)}{6n}.$$

It is obvious that the numerator is quadratic in $n$ whereas the denominator is first degree, or linear, in $n$. Hence as $n$ increases without bound, so will $u_n$. This sequence diverges and we write

$$\lim_{n \to \infty} \left[\frac{1^2 + 2^2 + \cdots + n^2}{n^2}\right] \to \infty.$$

Notice that we do not use the equality sign in connection with the symbol $\infty$, in accordance with the idea that infinity is not an actual number but essentially a limiting process.

Before continuing our discussion of limits, let us introduce a useful notation.
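As a numerical cross-check on Example 3-3, the four general terms can be evaluated exactly for moderately large $n$. This is a sketch under stated assumptions: the function names `u_a` to `u_d` are hypothetical labels for parts (a) to (d), and evaluating a term at one large $n$ illustrates, but does not prove, the limiting behaviour.

```python
from fractions import Fraction


def u_a(n):
    """Part (a): (2n - 1)(n + 4)(n - 2) / n^3."""
    return Fraction((2 * n - 1) * (n + 4) * (n - 2), n ** 3)


def u_b(n):
    """Part (b): [1 + 2 + ... + (n - 1)] / n^2."""
    return Fraction(sum(range(1, n)), n ** 2)


def u_c(n):
    """Part (c): (5^(n+1) + 7^(n+1)) / (5^n - 7^n)."""
    return Fraction(5 ** (n + 1) + 7 ** (n + 1), 5 ** n - 7 ** n)


def u_d(n):
    """Part (d): (1^2 + 2^2 + ... + n^2) / n^2."""
    return Fraction(sum(k * k for k in range(1, n + 1)), n ** 2)


for name, term, n in (("a", u_a, 10_000), ("b", u_b, 10_000),
                      ("c", u_c, 50), ("d", u_d, 10_000)):
    print(f"({name}) n = {n}: u_n is approximately {float(term(n)):.6f}")
```

Parts (a) to (c) settle near 2, 1/2 and $-7$ respectively, while the value in (d) keeps growing with $n$, in line with the solutions above.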
In the examples above it is apparent that the value of the limit of a sequence involving the ratio of two expressions, as $n$ increases, is entirely determined by the ratio of the most significant terms in the numerator and denominator. In the case of a polynomial involving $n$, the most significant term as $n$ increases is obviously the highest degree term in which it appears. Thus in (a), an inspection of the brackets in the numerator shows the most significant term to be $2n^3$, and as the denominator only involves $n^3$, it is at once obvious that for large $n$ the ratio will approach $(2n^3/n^3) = 2$.

To streamline limiting arguments of this type, and yet to preserve something of the effect of the less significant terms, we now introduce the so-called 'big oh' notation appropriate to functions.

Definition 3-4 We say that the function $f(x)$ is of the order of the function $g(x)$, written $f(x) = O(g(x))$ if, for some set of values of $x$,
(a) $g(x) > 0$ and
(b) $|f(x)| < M g(x)$,
where $M$ is some constant.

The value of the constant $M$ is usually unimportant, as for most arguments it suffices that such an $M$ should exist. We have these obvious results:

$$2x^3 + 2x + 1 = O(x^3), \quad 3x + \sin x = O(x), \quad \sin x = O(1),$$

where the symbol $O(1)$ has been used to denote a quantity bounded by a constant.

In terms of this notation we may write the general term $u_n$ in Example 3-3 (a) in the simplified form

$$u_n = \frac{2n^3 + O(n^2)}{n^3}, \quad \text{whence} \quad u_n = 2 + \frac{O(n^2)}{n^3}. \qquad \text{(A)}$$

By virtue of the definition of the symbol 'big oh', $O(n^2)$ implies an expression that is bounded in absolute value by $Mn^2$, so that $|O(n^2)/n^3| < (Mn^2)/n^3 = M/n$. However, $M/n \to 0$ as $n$ increases without bound, so that

$$\lim_{n \to \infty} u_n = 2. \qquad \text{(B)}$$

Normally the argument just outlined would be omitted, so that result (B) would be written down immediately after (A).

Implicit in the examples just examined are results which we now combine.

Theorem 3-1 If it can be shown that $u_1, u_2, u_3, \ldots$ and $v_1, v_2, v_3, \ldots$ are two sequences such that $\lim_{n \to \infty} u_n = L$ and $\lim_{n \to \infty} v_n = M$, then
(a) $u_1 + v_1, u_2 + v_2, u_3 + v_3, \ldots$ is a sequence such that $\lim_{n \to \infty} (u_n + v_n) = L + M$;
(b) $u_1 v_1, u_2 v_2, u_3 v_3, \ldots$ is a sequence such that $\lim_{n \to \infty} u_n v_n = LM$;
(c) provided $M \neq 0$, $u_1/v_1, u_2/v_2, u_3/v_3, \ldots$ is a sequence such that $\lim_{n \to \infty} (u_n/v_n) = L/M$.

These assertions are virtually self-evident and so we prove only the first result, making full use of our definition of a limit and of the triangle inequality of Theorem 1-4.

Suppose $\varepsilon$ is given. Then because $\{u_n\}$ converges to the limit $L$, there exists a number $N_1$ such that $n > N_1 \Rightarrow |u_n - L| < \frac{1}{2}\varepsilon$. By the same argument there exists another number $N_2$ such that $n > N_2 \Rightarrow |v_n - M| < \frac{1}{2}\varepsilon$. Now

$$|(u_n + v_n) - (L + M)| = |(u_n - L) + (v_n - M)| \le |u_n - L| + |v_n - M|,$$

and so

$$n > \max(N_1, N_2) \Rightarrow |(u_n + v_n) - (L + M)| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon.$$

Thus, taking $N = \max(N_1, N_2)$, and given an arbitrarily small positive number $\varepsilon$, we have

$$n > N \Rightarrow |(u_n + v_n) - (L + M)| < \varepsilon, \quad \text{or} \quad \lim_{n \to \infty} (u_n + v_n) = L + M.$$

In effect, this theorem justifies any argument in which it is asserted that, if $a$ is close to $A$ and $b$ is close to $B$, then $a + b$ is close to $A + B$, $ab$ is close to $AB$, and, provided $b$ and $B \neq 0$, $a/b$ is close to $A/B$.

Theorem 3-2 Let $\{u_n\}$ and $\{v_n\}$ be two sequences which both converge to the same limit $L$, and suppose $\{w_n\}$ to be a third sequence. Then if for all $n$ greater than some fixed value $N$ it is true that $u_n \le w_n \le v_n$, the sequence $\{w_n\}$ converges. Furthermore, the limit of the sequence $\{w_n\}$ is also $L$.

The proof of this theorem is not difficult and so is left to the reader as an exercise. In essence it involves two stages. The first is to establish that $\{u_n - w_n\}$ and $\{w_n - v_n\}$ are both null sequences in the sense that they converge to the limit zero. The second involves the use of Theorem 3-1 (a) to establish that these two null sequences imply $\lim w_n = L$.
In applications, use of this theorem is often confined to proving that a given sequence $\{w_n\}$ converges, so that the sequences $\{u_n\}$ and $\{v_n\}$ then need to be devised to satisfy the conditions of the theorem.

Example 3-4 Given that

$$w_n = 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{3 \cdot 2^n},$$

use Theorem 3-2 to prove that the sequence $\{w_n\}$ converges and to find the limit.

Now, obviously

$$1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} < w_n < 1 + \frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} + \frac{1}{2^n},$$

and so using the expression for the sum of a geometric progression we may write

$$2\left[1 - \left(\tfrac{1}{2}\right)^n\right] < w_n < 2\left[1 - \left(\tfrac{1}{2}\right)^{n+1}\right].$$

Thus for the sequence $\{u_n\}$ we take $u_n = 2[1 - (\frac{1}{2})^n]$ and for the sequence $\{v_n\}$ we take $v_n = 2[1 - (\frac{1}{2})^{n+1}]$. The conditions of the theorem are then satisfied, since $\lim u_n = \lim v_n = 2$. Hence the sequence $\{w_n\}$ converges and has for its limit the value 2.

At this stage in our discussion of sequences the following result should be self-evident, and we state it in the form of a postulate rather than prove it.

Postulate Every increasing sequence which is bounded above tends to a limit.

The proof of this postulate is outlined in Problem 3.20 at the end of the chapter. The details are left to the reader, together with the task of showing the consequence that every decreasing sequence which is bounded below must also tend to a limit.

It is this postulate that validates the usual arithmetic procedure for finding a square root. In the procedure an additional digit is added to the approximation at each stage, thereby giving rise to an increasing sequence that is bounded above. With a number such as $\sqrt{2}$, which we know to be irrational, this same postulate also justifies its successive approximation by the increasing sequence $\{u_n\}$ of rational numbers $1, 1.4, 1.41, 1.414, 1.4142, \ldots, u_n, \ldots$. In this case an irrational number $\sqrt{2}$ is determined as the limit of a sequence of rationals.
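The increasing sequence of decimal approximations to $\sqrt{2}$ can be generated with integer arithmetic alone, which keeps every term an exact rational number. The following is a minimal sketch; the function name `sqrt2_truncated` is hypothetical, and the digit-by-digit hand procedure mentioned in the text is replaced here by the integer square root `math.isqrt`, which yields the same truncated decimals.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncated(digits):
    """Largest decimal with `digits` decimal places whose square does not exceed 2."""
    scale = 10 ** digits
    # isqrt(2 * scale^2) is the largest integer k with (k / scale)^2 <= 2.
    return Fraction(isqrt(2 * scale * scale), scale)


approximations = [sqrt2_truncated(k) for k in range(8)]
for k, value in enumerate(approximations):
    print(k, float(value), "square below 2:", value * value < 2)
```

Each term is rational, no term is smaller than its predecessor, and every square stays below 2, so the sequence is increasing and bounded above as the postulate requires.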
The implications are important, since although irrational numbers are of frequent occurrence, in the world in which we live we can only undertake practical calculations using rationals!

Not all sequences are defined explicitly by giving an expression for the general term $u_n$. Often a sequence is defined recursively by giving a formula relating the term $u_n$ to its predecessor $u_{n-1}$, and then specifying the value of $u_1$. This is, of course, a difference equation, but in this context it is customary to call any rule of this kind a recurrence relation, and one of considerable computational importance is

$$u_n = \frac{1}{m}\left[(m - 1)u_{n-1} + \frac{a}{(u_{n-1})^{m-1}}\right],$$

where $m$ is an integer greater than unity.

The particular significance of this recurrence relation stems from the fact that by using Theorem 3-2 it is not difficult to prove the rather surprising result that $\{u_n\}$ always converges to the limit $\sqrt[m]{a}$, irrespective of the choice of $u_1$ provided only that it is positive. The value of the limit is obvious once convergence has been established, for denoting it by $L$ and setting $u_{n-1} = u_n = L$, it follows directly from the recurrence relation that $L^m = a$.

Table 3-1 shows the effectiveness of this method as a computational procedure, or algorithm, for computing $\sqrt{2}$ to five decimal places, using three different starting values for $u_1$. To use the relation to compute $\sqrt{2}$ we must first set $m = 2$ and $a = 2$, when it becomes

$$u_n = \frac{1}{2}\left[u_{n-1} + \frac{2}{u_{n-1}}\right].$$

Taking as representative the three starting values $u_1 = 1$, 1.4, and 5, we obtain Table 3-1, in which a dash signifies that no further change occurs in the last digit.

Table 3-1 Values of $u_n$

  n     u_1 = 1     u_1 = 1.4    u_1 = 5
  1     1           1.4          5
  2     1.5         1.41429      2.7
  3     1.41667     1.41421      1.72037
  4     1.41422     -            1.44146
  5     1.41421     -            1.41447
  6     -           -            1.41421

Obviously convergence is most rapid when the value assumed for $u_1$ is a good approximation to the answer, and much effort may be spared by taking a sensible starting approximation.
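The recurrence relation lends itself directly to computation. The sketch below iterates it in floating-point arithmetic; the function name `root_iterates` and its parameters are illustrative choices, not notation from the text, and the rounding to five decimal places is only for comparison with Table 3-1.

```python
def root_iterates(a, m, u1, steps):
    """Return [u_1, ..., u_steps] for u_n = ((m - 1) u_{n-1} + a / u_{n-1}^(m - 1)) / m."""
    values = [u1]
    for _ in range(steps - 1):
        prev = values[-1]
        values.append(((m - 1) * prev + a / prev ** (m - 1)) / m)
    return values


# m = 2 and a = 2 give the square-root case u_n = (u_{n-1} + 2 / u_{n-1}) / 2.
for start in (1.0, 1.4, 5.0):
    column = root_iterates(2.0, 2, start, 6)
    print(f"u_1 = {start}:", [round(v, 5) for v in column])
```

Changing `m` and `a` gives other roots in the same way; for instance `root_iterates(27.0, 3, 1.0, 60)` approaches the cube root of 27.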
3-3 The number e

Later we shall use an important mathematical constant that is always denoted by the symbol $e$. This number is both irrational and transcendental, and for reference purposes its value to ten decimal places is $e = 2.7182818284$. There are numerous different ways of defining this constant, but although these are interesting, our real concern later in this book will be with the mathematical use of the constant $e$. We shall, for example, see how it is of fundamental importance in the study of differential equations and in the definition of important mathematical functions like the natural logarithm and the hyperbolic functions $\sinh x$, $\cosh x$, and $\tanh x$.

However, the real purpose of this section will not be to study these applications, but to examine one interesting definition of $e$ as the limit of a particular sequence. This problem provides both a first encounter with $e$, and also a useful illustration of how approximate information may be extracted from the properties of a difficult sequence. We shall prove that if

$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \qquad (3\text{-}1)$$

then $2 < e < 3$. The problem of determining $e$ correctly to any given number of figures will be deferred until we are better equipped for the task.

Consider the sequence $\{u_n\}$ with the general term

$$u_n = \left(1 + \frac{1}{n}\right)^n.$$

We will first establish that $\{u_n\}$ is a strictly increasing sequence, so that $u_{n+1} > u_n$, and then show that the sequence $\{u_n\}$ is bounded above by the number 3. The postulate of the previous section then establishes that the limit $e$ exists and is such that $e < 3$. Finally, the lower bound 2 will be added as a trivial consequence of the proof used to establish the upper bound.

First let us expand $u_n$ by the binomial theorem:

$$\left(1 + \frac{1}{n}\right)^n = 1 + n\left(\frac{1}{n}\right) + \frac{n(n - 1)}{2!}\left(\frac{1}{n}\right)^2 + \frac{n(n - 1)(n - 2)}{3!}\left(\frac{1}{n}\right)^3 + \cdots + \frac{n(n - 1) \cdots [n - (n - 1)]}{n!}\left(\frac{1}{n}\right)^n.$$

Now rewrite this:

$$u_n = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{n - 1}{n}\right). \qquad (3\text{-}2)$$

An exactly similar argument applied to $u_{n+1}$ then gives

$$u_{n+1} = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n + 1}\right) + \frac{1}{3!}\left(1 - \frac{1}{n + 1}\right)\left(1 - \frac{2}{n + 1}\right) + \cdots + \frac{1}{(n + 1)!}\left(1 - \frac{1}{n + 1}\right)\left(1 - \frac{2}{n + 1}\right) \cdots \left(1 - \frac{n}{n + 1}\right).$$

Now all the terms in $u_n$ and $u_{n+1}$ are positive and $u_{n+1}$ has one more term than $u_n$.
In addition, terms in $u_{n+1}$ that are associated with factorials are larger than the corresponding terms in $u_n$ because of the obvious inequalities

$$\left(1 - \frac{r}{n + 1}\right) > \left(1 - \frac{r}{n}\right), \quad r = 1, 2, \ldots, n - 1.$$

Hence $u_{n+1} > u_n$, showing that $\{u_n\}$ is a strictly increasing sequence.

To show that $\{u_n\}$ is bounded above we must try to sum the finite series for $u_n$ and then examine the behaviour of the sum as $n$ increases. As the finite series (3-2) stands we can make no progress, but an overestimate of this sum can easily be obtained if the terms of the series are simplified. This approach will suffice for our purposes, since to prove that the limit $e$ exists, we only need to prove that $\{u_n\}$ is strictly increasing and bounded above; a strict upper bound is not necessary here. It is only needed when the exact value of the limit is to be determined.

If we use the obvious inequalities

$$1 > \left(1 - \frac{1}{n}\right) > \left(1 - \frac{2}{n}\right) > \left(1 - \frac{3}{n}\right) > \cdots$$

it follows at once from Eqn (3-2) that

$$u_n < 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}. \qquad (3\text{-}3)$$

This is still too difficult to sum explicitly, so using the observation

$$\frac{1}{3!} < \frac{1}{2^2}, \quad \frac{1}{4!} < \frac{1}{2^3}, \quad \ldots, \quad \frac{1}{n!} < \frac{1}{2^{n-1}},$$

we further simplify Eqn (3-3) to the form

$$u_n < 1 + 1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots + \frac{1}{2^{n-1}}. \qquad (3\text{-}4)$$

This can now be summed, since after the first term the remaining terms form a geometric progression. We arrive at the result

$$u_n < 1 + \frac{1 - (\frac{1}{2})^n}{1 - \frac{1}{2}},$$

whence $\lim_{n \to \infty} u_n \le 3$. Indeed, for $n \ge 3$ the replacement of $1/3!$ by $1/2^2$ alone overestimates the sum by $\frac{1}{12}$, so the limit is strictly less than 3.

The conditions of our postulate are satisfied, so we may conclude that $\{u_n\}$ has a finite limit $e$ and, furthermore, that $e < 3$. Examination of Eqn (3-2) shows that $u_n > 2$ for all $n > 1$, and as the sequence is strictly increasing its limit must also exceed 2, so that finally we have established our claim that $2 < e < 3$.

The form of argument used to overestimate series (3-2) is often useful and the final inequality (3-4) is usually called a majorizing series.
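The chain of inequalities used above can be checked term by term in exact arithmetic. This is a sketch only: the names `u`, `factorial_bound` and `geometric_bound` are assumptions introduced here to stand for the general term and the right-hand sides of (3-3) and (3-4), and the check covers a finite range of $n$.

```python
from fractions import Fraction
from math import factorial


def u(n):
    """General term u_n = (1 + 1/n)^n as an exact rational."""
    return (1 + Fraction(1, n)) ** n


def factorial_bound(n):
    """Right-hand side of (3-3): 1 + 1 + 1/2! + ... + 1/n!."""
    return 1 + sum(Fraction(1, factorial(k)) for k in range(1, n + 1))


def geometric_bound(n):
    """Right-hand side of (3-4): 1 + 1 + 1/2 + ... + 1/2^(n - 1)."""
    return 1 + sum(Fraction(1, 2 ** k) for k in range(n))


for n in (2, 5, 10, 20):
    print(n, float(u(n)), float(factorial_bound(n)), float(geometric_bound(n)))
```

For each $n$ shown, the three printed values increase from left to right and all lie between 2 and 3, mirroring the argument that bounds $u_n$ by the majorizing series.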
Closely related to limit (3-1) is the sequence $\{v_n(x)\}$ with general term

$$v_n(x) = \left(1 + \frac{x}{n}\right)^n. \qquad (3\text{-}5)$$

To establish the relationship that exists between $e$ and the limit of $\{v_n(x)\}$ let us first denote the limit by $E(x)$, so that

$$E(x) = \lim_{n \to \infty} \left[\left(1 + \frac{x}{n}\right)^n\right]. \qquad (3\text{-}6)$$

Suppose $x > 0$ to be any rational number and define an increasing sequence $\{n_k\}$ of natural numbers by the requirement that the numbers $n_k/x$ are integral. Henceforth we shall set $N_k = n_k/x$. Then by restricting $n$ to be a member of $\{n_k\}$ we may define a sub-sequence $\{v_{n_k}(x)\}$ of $\{v_n(x)\}$ for which Eqn (3-5) may be written in the form

$$v_{n_k}(x) = \left(1 + \frac{1}{N_k}\right)^{N_k x} = \left[\left(1 + \frac{1}{N_k}\right)^{N_k}\right]^x. \qquad (3\text{-}7)$$

Using the definition of $u_n$ we see that $v_{n_k}(x) = (u_{N_k})^x$, so that taking the limit as $n_k \to \infty$ we have

$$E(x) = \lim_{n_k \to \infty} v_{n_k}(x) = \left[\lim_{N_k \to \infty} u_{N_k}\right]^x = e^x.$$

Whence the important result

$$E(x) = e^x. \qquad (3\text{-}8)$$

With a more subtle argument it can be established that Eqn (3-8) is generally true without the restriction of $n$ to the sequence $\{n_k\}$. This implies that the result is true for all real $x$.

Fig. 3-3 Graph of the functions $e^x$ and $e^{-x}$.

The function $e^x$ is one of the most important functions in mathematics and it is called the exponential function. Fig. 3-3 shows its behaviour with $x$. Notice that it is an essentially positive function which is strictly monotonic increasing with $x$. Also shown on the figure is the associated function $e^{-x}$.

3-4 Limits of functions – continuity

The notion of the limit of a function $f(x)$ as $x$ tends towards some value $a$ is intuitively obvious in the case of functions whose graph is an unbroken curve. A typical function of this kind is illustrated in Fig. 3-4, from which it is easily seen that if $x$ is considered to be a moving point, then $f(x)$ will approach the value $f(a)$ as $x$ approaches $a$ from either the left or the right.

Fig. 3-4 Function $f(x)$ with unbroken graph.
In this case $f(x)$ actually attains the value $f(a)$, and we shall speak of $f(a)$ as the 'limit of $f(x)$ as $x$ tends to $a$' and write

$$\lim_{x \to a} f(x) = f(a).$$

Thus, if $f(x) = x^3 - 2x^2 + x + 3$, then clearly in this case $\lim_{x \to 2} f(x) = 5 = f(2)$. A slightly less obvious example involves finding $\lim_{x \to 1} f(x)$ when

$$f(x) = \frac{\sqrt{x} - 1}{x - 1},$$

since the formal substitution of $x = 1$ in $f(x)$ seems to yield $0/0$, which is meaningless as it stands. The difficulty here is easily resolved by cancelling a factor $(\sqrt{x} - 1)$ in the numerator and denominator to give

$$f(x) = \frac{1}{\sqrt{x} + 1},$$

from which it is apparent that $\lim_{x \to 1} f(x) = \frac{1}{2}$.

In effect, the intuitive notion involved in the limit of a function is essentially the same as that for the limit of a sequence. Namely, we say that $L$ is the limit of $f(x)$ as $x$ tends to $a$ if, for all $x$ sufficiently close to $a$, $f(x)$ is close to $L$. In fact, the determination of the value of the limit $L$ involves the behaviour of $f(x)$ near to $x = a$, but does not consider the actual value of $f(x)$ at $x = a$. Whether or not $f(a)$ is actually equal to $L$, as was the case above, is immaterial.

Fig. 3-5 Function $f(x)$ has a smooth graph and attains the limit $L$ at $x = a$. (Marked on the horizontal axis: the domains $0 < |x - a| < \delta$ and $0 < |x - b| < \delta'$.)

By only slightly modifying our definition of the limit of a sequence, we arrive at the following definition of the limit of a function, which is illustrated in Fig. 3-5, and will be used for our subsequent discussion of continuity.

Definition 3-5 The function $f(x)$ will be said to tend to the limit $L$ as $x$ tends to $a$ if, and only if, for any arbitrarily small positive number $\varepsilon$, there exists a small positive number $\delta$ such that

$$0 < |x - a| < \delta \Rightarrow |f(x) - L| < \varepsilon.$$

The significance of the condition $0 < |x - a| < \delta$ is that the value $f(a)$ is specifically excluded from consideration as being irrelevant to the determination of the limit. Thus, if

$$f(x) = \begin{cases} 1 + x^2 & \text{for } x \neq 1, \\ 5 & \text{for } x = 1, \end{cases}$$

then $\lim_{x \to 1} f(x) = 2$, despite the fact that $f(1) = 5$.
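The behaviour of $(\sqrt{x} - 1)/(x - 1)$ near $x = 1$ can be tabulated directly, keeping in mind that the limit concerns values of $x$ close to, but different from, 1. The short sketch below uses hypothetical function names `f` and `g` for the original and the simplified forms; floating-point evaluation is an approximation and only illustrates the limit.

```python
from math import sqrt


def f(x):
    """Original form (sqrt(x) - 1) / (x - 1); not defined at x = 1 itself."""
    return (sqrt(x) - 1) / (x - 1)


def g(x):
    """Simplified form 1 / (sqrt(x) + 1), equal to f(x) for every x != 1."""
    return 1 / (sqrt(x) + 1)


for h in (1e-1, 1e-3, 1e-5):
    print(f"h = {h}: f(1 - h) = {f(1 - h):.8f}, f(1 + h) = {f(1 + h):.8f}, g(1 + h) = {g(1 + h):.8f}")
```

Both one-sided columns settle towards 0.5 as `h` shrinks, while calling `f(1.0)` would raise `ZeroDivisionError`, which reflects the point that the value at $x = 1$ plays no part in the limit.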
If the graph of a function $f(x)$ is not unbroken then more care must be exercised when discussing the notion of a limit. The reason can be seen after examination of Fig. 3-6, in which the graph has a break at $x = c$, at which point the functional value $f(c)$ has been allocated arbitrarily. This graph defines a perfectly satisfactory function, but as $x$ approaches $c$ from either the left or the right, so $f(x)$ approaches either the value $L_-$ or $L_+$, which are obviously limits in some sense. Furthermore $L_- \neq L_+$ and neither is equal to $f(c)$.

Fig. 3-6 Function $f(x)$ has broken graph.

To take account of this, we introduce the concepts of a limit from the left and a limit from the right. To simplify the explanation we shall write $x \to a-$ in place of '$x$ tends to $a$ from the left' and $x \to a+$ in place of '$x$ tends to $a$ from the right'. In terms of this notation the function $f(x)$ in Fig. 3-6 has the property that

$$\lim_{x \to c-} f(x) = L_- \quad \text{and} \quad \lim_{x \to c+} f(x) = L_+,$$

which is indicated in the diagram by means of arrows. Once again, in arriving at the limits from the left and right of a point, the functional value itself at that point is not involved. It may or may not equal one of the two limits so defined.

These ideas may be expressed formally as a definition.

Definition 3-6 The function $f(x)$ will be said to have the left-hand limit, or limit from the left, $L_-$ as $x \to a-$ if, and only if, for any arbitrarily small positive number $\varepsilon$, there exists a small positive number $\delta$ such that

$$0 < a - x < \delta \Rightarrow |f(x) - L_-| < \varepsilon.$$

A corresponding definition exists for the right-hand limit, or limit from the right, as $x \to a+$, in which $L_-$ is replaced by $L_+$ and the condition $0 < a - x < \delta$ by $0 < x - a < \delta$.

Notice that the function $f(x)$ in Fig. 3-6 only has one-sided limits at $x = a$ and $x = d$ and, even though $f(x)$ has a cusp at $x = b$, and so is not smooth there, it nevertheless still has a limit in the ordinary sense at that point. This is because of the following obvious result.
Theorem 3-3 If $f(x)$ has identical left- and right-hand limits at a point $x = a$ so that $L_- = L_+ = L$, say, then $\lim_{x \to a} f(x)$ exists and is also equal to $L$.

We shall usually resolve simple limit problems of the type just discussed either intuitively or, perhaps, by appeal to a graph. However, for completeness, we now apply the formal definition of a left-hand limit to a specific function to show, in principle, how it may be used as an analytical tool in less obvious situations.

For our example we apply the formal definition of a left-hand limit at the point $x = 1$ to the function

$$f(x) = \begin{cases} x^2 & \text{for } x < 1, \\ \text{[expression illegible in source]} & \text{for } x > 1. \end{cases}$$

Clearly the left-hand limit at $x = 1$ is determined only by the behaviour of $f(x)$ to the left of that point. The behaviour of $f(x)$ for $x > 1$ is irrelevant to the determination of $\lim_{x \to 1-} f(x)$. Obviously, as $x \to 1-$ so $x^2 \to 1$, and thus, intuitively, $\lim_{x \to 1-} f(x) = 1$.

If our intuitive argument is correct and this limit is in agreement with our definition, we must show that for any $\varepsilon > 0$ we can find a positive $\delta$, which will probably depend on $\varepsilon$, such that $|x^2 - 1| < \varepsilon$ whenever $0 < 1 - x < \delta$ or, equivalently, $1 - \delta < x < 1$. We have

$$|f(x) - L_-| = |x^2 - 1| = |(x - 1)(x + 1)| = |x - 1| \cdot |x + 1|,$$

but since $|x - 1| < \delta$ this becomes

$$|x^2 - 1| < \delta\,|x + 1|. \qquad \text{(A)}$$

Since $x < 1$ (and $x > 0$ as soon as $\delta \le 1$), we overestimate $|x + 1|$ in (A) if we replace $x$ by the value unity, so that we have

$$|x^2 - 1| < 2\delta. \qquad \text{(B)}$$

Finally, to make this expression less than any small positive number $\varepsilon$, we need only make $2\delta \le \varepsilon$. This proves that $\lim_{x \to 1-} f(x) = 1$.

Some numbers might help here. Suppose, for example, we wish to find the condition that $f(x)$ should be within 0.001 of the left-hand limit at $x = 1$. This amounts to asking that $|x^2 - 1| < 0.001$, which is equivalent to setting $\varepsilon = 0.001$. Hence, taking $\delta = \tfrac{1}{2}\varepsilon = 0.0005$, our $x$-inequality $1 - \delta < x < 1$ tells us that the required condition on $f(x)$ will be satisfied provided $0.9995 < x < 1$.
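The choice $\delta = \tfrac{1}{2}\varepsilon$ obtained from inequality (B) can be spot-checked by direct computation. The sketch below is only illustrative; the helper names `delta_for` and `check` are assumptions introduced here, and a finite sample of points can support the inequality but cannot replace the proof.

```python
def f(x):
    """The example function on the side that matters: f(x) = x**2 for x < 1."""
    return x * x


def delta_for(eps):
    """A delta that suffices by inequality (B): |x**2 - 1| < 2*delta <= eps."""
    return eps / 2.0


def check(eps, samples=1000):
    """Test |f(x) - 1| < eps at evenly spaced x with 1 - delta < x < 1."""
    delta = delta_for(eps)
    for i in range(1, samples + 1):
        # x lies strictly inside the open interval (1 - delta, 1)
        x = 1.0 - delta * i / (samples + 1)
        if not abs(f(x) - 1.0) < eps:
            return False
    return True


if __name__ == "__main__":
    for eps in (0.5, 0.001, 1e-6):
        print(eps, delta_for(eps), check(eps))
```

For $\varepsilon = 0.001$ the function `delta_for` returns 0.0005, which reproduces the interval $0.9995 < x < 1$ found above.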
In higher mathematics this analytical approach is indispensable but, as already remarked, for our purposes a graphical approach to the limit of a function must suffice in most cases. An exception is the discussion of indeterminate forms which involve finding the limit of a quotient as $x$ approaches some value at which both numerator and denominator vanish. This will be taken up again later as an application of calculus, though the reader should notice that we have already resolved one such simple problem involving a limit of the form $0/0$.

Although a function such as $f_1(x)$, which takes one value when $x$ is an integer and is given by a different expression for all other $x$ [the two expressions are illegible in the source], is a perfectly satisfactory function from the mathematical point of view, it is not likely to occur in connection with physical problems. We make this assertion because in the physical world functional relationships are usually smoothly changing in the sense that a small change in the independent variable usually produces only a small change in the dependent variable. This is not always the case, however, and, for example, in gas flows involving a gas shock wave the gas pressure experiences a sudden jump across a geometrical surface in space called the shock front. Hence a graph of the gas pressure $p$ across a plane shock at $x = a$, as a function of the distance $x$ measured normal to the shock front, could appear as in Fig. 3-7.

[Fig. 3-7 Gas pressure $p$ as a function of distance normal to shock front at $x = a$. The pressure jump across the shock front is $p_1 - p_2$.]

Nevertheless, despite the existence of common physical situations of this type, a function as erratic as $f_1(x)$ is not likely to be encountered in the real world. Aside from points at which a jump occurs, the 'reasonable' functions that occur in physics and engineering must be expected to have the smoothness-of-change property we described earlier.
This smoothness-of-change property is given the mathematical name continuity and plays an important part throughout all mathematical analysis. If the reader pauses to think for a moment he will see that the following definition describes continuity in terms of the left- and right-hand limits.

Definition 3-7 The function $f(x)$ is said to be continuous at $x = x_0$ if:
(a) $\lim_{x \to x_0-} f(x) = \lim_{x \to x_0+} f(x) = L$
and
(b) $f(x_0) = L$.

In this definition, (a) demands the equality of the left- and right-hand limits and (b) ensures that there is no 'gap' in the graph of $f(x)$ at $x = x_0$. That is to say that the point $(x_0, f(x_0))$ lies on an unbroken curve and so coincides with the limits (a). An alternative, but equivalent, definition of continuity that is often used replaces (a) by the requirement that $\lim_{x \to x_0} f(x) = L$ but still retains (b). Either form of definition is equally good but we have chosen to emphasize the ideas of left- and right-hand limits since they find important applications in engineering and physics.

Continuity essentially describes a property of a function in the neighbourhood of a point of interest and not just at the point itself. Accordingly, a function will be said to be continuous in the interval $(a, b)$ if it is continuous at all points $x$ within $(a, b)$. Notice that the effect of condition (b) of our definition on a function such as

$$f(x) = \begin{cases} x^2 + 1 & \text{for } x \neq 1, \\ 6 & \text{for } x = 1 \end{cases}$$

is to show that $f(x)$ is continuous everywhere except at $x = 1$.

Let us paraphrase the notion of continuity. In effect, by requiring that a function $f(x)$ be continuous at $x = a$, we are insisting that if the variation of the function about the value $L = f(a)$ is not to exceed $\pm\varepsilon$, where $\varepsilon > 0$ is arbitrary, then we can find an $x$-interval of width $2\delta$ centred on $x = a$ within which this property is always true. This is illustrated by Fig. 3-5, which also indicates that in general the number $\delta$ depends on both $\varepsilon$ and the value of $x$ at which $f(x)$ is continuous.
Thus for the same value of $\varepsilon$, the interval about $x = a$ is of width $2\delta$, whereas the interval about $x = b$ is of width $2\delta'$, with $\delta' \neq \delta$.

If the function $f(x)$ is continuous in a closed interval $[x_1, x_2]$ and $\varepsilon$ is given, consider the point $x = b$ at which the function changes most rapidly, and find the appropriate interval of width $2\delta'$ centred on $x = b$ in which the functional variation from $f(b)$ does not exceed $\pm\varepsilon$. Because the functional variation at $x = b$ was the greatest of any point in $[x_1, x_2]$, it is obvious that if this same interval of length $2\delta'$ is associated with any other point $x'$ in $[x_1, x_2]$, then the functional variation within that interval will certainly differ by less than $\pm\varepsilon$ from the value $f(x')$. Hence we can assert that for a function $f(x)$ which is continuous in a closed interval, when given an $\varepsilon$ it is possible to find a number $\delta$ for the definition of continuity which depends only on $\varepsilon$ and in no way on the value of $x$ at which continuity is being discussed. Because of this continuity property, which applies uniformly to points throughout the closed interval $[x_1, x_2]$, we speak of such functions as being uniformly continuous. This concept proves to be of extreme importance when these ideas are pursued further.

The requirement of continuity in a closed interval cannot be relaxed, for then the result is no longer true. For example, the function $f(x) = 1/x$ defined in the semi-open interval $(0, 2]$ is continuous, but not uniformly continuous. This is because for any given $\varepsilon$, the closer we take our point $x'$ to the origin, the smaller must we take the value of $\delta$ in order to satisfy $|f(x) - f(x')| < \varepsilon$ for $|x - x'| < \delta$. There is obviously no single positive value of $\delta$ that will apply to the entire interval.
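The failure of uniform continuity for $f(x) = 1/x$ on $(0, 2]$ can be made quantitative. The sketch below rests on a short calculation that is not in the text and is offered as an assumption to be checked: for a point $x' > 0$ the binding constraint lies to the left of $x'$, where $1/x - 1/x' < \varepsilon$ is equivalent to $x > x'/(1 + \varepsilon x')$, so the largest admissible $\delta$ is $\varepsilon x'^2/(1 + \varepsilon x')$. The name `largest_delta` is illustrative.

```python
def largest_delta(x0, eps):
    """Largest delta with |1/x - 1/x0| < eps for all x > 0 with |x - x0| < delta.

    Derived from the left-hand constraint 1/x - 1/x0 < eps, which holds
    exactly when x > x0 / (1 + eps * x0).
    """
    return eps * x0 * x0 / (1.0 + eps * x0)


def table(eps, points):
    """Pair each point x0 with the largest delta that works there."""
    return [(x0, largest_delta(x0, eps)) for x0 in points]


if __name__ == "__main__":
    for x0, delta in table(0.1, (2.0, 1.0, 0.1, 0.01, 0.001)):
        print(f"x' = {x0:<6}  largest delta = {delta:.3e}")
```

For a fixed $\varepsilon$ the admissible $\delta$ behaves like $\varepsilon x'^2$ as $x' \to 0$, so it can be made smaller than any proposed positive value by moving $x'$ towards the origin.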
There are a number of immediate consequences of the definition of a limit of a function and of the definition of continuity which we now state as two important theorems.

Theorem 3-4 (limits) Suppose that $\lim_{x \to x_0} f(x) = L$ and $\lim_{x \to x_0} g(x) = M$, then
(a) $\lim_{x \to x_0} [f(x) + g(x)] = L + M$;
(b) $\lim_{x \to x_0} f(x)g(x) = LM$;
(c) provided $M \neq 0$, $\lim_{x \to x_0} [f(x)/g(x)] = L/M$.

The proof of these results is similar in all respects to the proof of Theorem 3-1 and since a representative example was presented there we shall not repeat the argument.

Theorem 3-5 (continuity) If $f(x)$ and $g(x)$ are continuous at $x = x_0$, then so also are the functions
(a) $f(x) + g(x)$;
(b) $f(x)g(x)$;
(c) $f(x)/g(x)$, provided $g(x_0) \neq 0$.
If, furthermore, $f(x)$ is continuous at $x = x_0$ and $g(u)$ is continuous at $u = f(x_0)$, then the continuous function of a continuous function $g[f(x)]$ is continuous at $x = x_0$.

Once again the proof of this theorem is similar in all respects to the proof of Theorem 3-1. However, for the curious reader we shall prove result 3-5 (a), using the alternative definition of continuity that we mentioned.

To prove $f(x) + g(x)$ is continuous at $x = x_0$ we must establish that $\lim_{x \to x_0} (f(x) + g(x)) = L$ exists and that $f(x_0) + g(x_0) = L$. Now as $f(x)$ and $g(x)$ are continuous at $x = x_0$ by supposition, then $\lim_{x \to x_0} f(x) = f(x_0)$ and $\lim_{x \to x_0} g(x) = g(x_0)$, and so for any positive $\varepsilon$ there must exist positive numbers $\delta_1$ and $\delta_2$ such that

$$|x - x_0| < \delta_1 \Rightarrow |f(x) - f(x_0)| < \tfrac{1}{2}\varepsilon \quad\text{and}\quad |x - x_0| < \delta_2 \Rightarrow |g(x) - g(x_0)| < \tfrac{1}{2}\varepsilon.$$

Now,

$$|(f(x) + g(x)) - (f(x_0) + g(x_0))| = |(f(x) - f(x_0)) + (g(x) - g(x_0))| \le |f(x) - f(x_0)| + |g(x) - g(x_0)|$$

and

$$|x - x_0| < \text{smaller of } (\delta_1, \delta_2) \;\Rightarrow\; |f(x) - f(x_0)| + |g(x) - g(x_0)| < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon.$$

Thus, given any positive $\varepsilon$, we have established that by taking $\delta$ no greater than either $\delta_1$ or $\delta_2$ we ensure that $|(f(x) + g(x)) - (f(x_0) + g(x_0))| < \varepsilon$. This formally proves our assertion. The proofs of results (b) and (c) are similar.
Arguments involving continuity usually rely for their success on the knowledge that certain familiar functions are continuous. Once a small list of such functions has been established it can then be considerably enlarged by repeated applications of Theorem 3-5. Accordingly, we present below a table of functions, in each case stating the intervals in which they are continuous. No proof will be given for most entries since the results are obvious from the graphs, but for the sake of completeness we shall formally prove the first three entries.

Example 3-5
(a) Given that $C$ = constant, the function $f(x) = C$ is continuous everywhere.
The proof is trivial, since for any $x = x_0$, $f(x_0) = C$, showing that the definition is always satisfied.

(b) The function $f(x) = x$ is continuous everywhere.
The proof is again trivial, but let us indicate how the alternative definition of continuity may be used. We must prove that for all $x_0$, $\lim_{x \to x_0} f(x)$ exists and is equal to $f(x_0)$. Now it is obvious from the definition of $f(x)$ that $f(x_0) = x_0$. Also, for any $x_0$ and given $\varepsilon > 0$, $|f(x) - f(x_0)| = |x - x_0|$, so that $|x - x_0| < \varepsilon \Rightarrow |f(x) - f(x_0)| < \varepsilon$ and in this case we may take the quantity $\delta = \varepsilon$. The function is thus continuous at $x = x_0$ and, as $x_0$ was arbitrary, it finally follows that $f(x) = x$ is continuous everywhere.

(c) The function $f(x) = x^n$ with $n$ a positive integer is continuous everywhere.
We give a proof by induction. Suppose the result is true for some $n$ so that $x^n$ is continuous at $x = x_0$ for all $x_0$. Now $x^{n+1} = x \cdot x^n$, and we have just proved that $x$ is continuous at $x_0$. Hence, using Theorem 3-4 (b), $x^{n+1}$ is continuous. The result is true for $n = 1$ and so by the principle of induction it is true for all $n$. With a little more care this result can be shown to be true for any real positive $n$ and not just for $n$ a natural number.

The information contained in this table is likely to be useful on many occasions and so should be memorized.
Its application, together with Theorem 3-5, to questions of continuity is usually immediate. Thus, for example, the function $f(x) = 1/x + \sin x$ is continuous everywhere except at the point $x = 0$, and $f(x) = (x^m + a_1 x^{m-1} + \cdots + a_m)/\sin x$, with $m > 0$, is continuous everywhere except at the points $x = n\pi$ for which $n$ is an integer.

Table 3-2 Short list of continuous functions

Function $f(x)$    Interval over which $f(x)$ is continuous
$C$ (constant)    $(-\infty, \infty)$
$x$    $(-\infty, \infty)$
$x^n$ $(n > 0)$    $(-\infty, \infty)$
$x^{-n}$ $(n > 0)$    $(-\infty, \infty)$ excluding point $x = 0$
$|x|$    $(-\infty, \infty)$
$x^n + a_1 x^{n-1} + \cdots + a_n$ $(n > 0)$    $(-\infty, \infty)$
$(x^n + a_1 x^{n-1} + \cdots + a_n)/(x^m + b_1 x^{m-1} + \cdots + b_m)$    $(-\infty, \infty)$ excluding the zeros of the denominator
$\sin x$    $(-\infty, \infty)$
$\cos x$    $(-\infty, \infty)$
$\tan x$    $(2n - 1)\tfrac{\pi}{2} < x < (2n + 1)\tfrac{\pi}{2}$, integral $n$
$\sec x$    $(2n - 1)\tfrac{\pi}{2} < x < (2n + 1)\tfrac{\pi}{2}$, integral $n$
$\operatorname{cosec} x$    $n\pi < x < (n + 1)\pi$, integral $n$
$\cot x$    $n\pi < x < (n + 1)\pi$, integral $n$

Finally, in preparation for our use of limits in connection with the techniques of differentiation, we extend the O-notation to include functions of smaller order. Henceforth, we shall write

$$f(x) = o(g(x)) \quad\text{as } x \to x_0$$

with the meaning that

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 0.$$

The symbol $o$ is read 'little oh' and in words the statement asserts that the function $f(x)$ is of smaller order than $g(x)$ as $x \to x_0$. For example, we may write $(1 + x^2)^3 = 1 + 3x^2 + o(x^3)$ as $x \to 0$, since $(1 + x^2)^3 - 1 - 3x^2 = 3x^4 + x^6 = o(x^3)$ as $x \to 0$.

3-5 Functions of several variables - limits, continuity

The related concepts of a limit and the continuity of a function extend without difficulty to functions of more than one independent variable, provided only that the notion of the proximity of two points is suitably extended. The ideas involved here can best be appreciated if we confine attention to functions $f(x, y)$ of the two independent variables $x$ and $y$.
Let us suppose that $f(x, y)$ has for its domain of definition some region $D$ in the $(x, y)$-plane and that $(x_0, y_0)$ is some point interior to $D$. Then, before considering $f(x, y)$, we must first make clear what is to be meant by $x \to x_0$, $y \to y_0$ in $D$.

[Fig. 3-8 Paths for which the point $(x, y) \to (x_0, y_0)$.]

An inspection of Fig. 3-8 shows that starting from the points $P$ and $Q$ in $D$, both the full curve and the dotted curve describe possible paths by which $x$ and $y$ may tend to $x_0$ and $y_0$. In general, we shall write $x \to x_0$, $y \to y_0$, or say that the point $(x, y)$ tends to the point $(x_0, y_0)$, if $\rho \to 0$, where

$$\rho = \sqrt{(x - x_0)^2 + (y - y_0)^2}$$

is the distance between the moving point $(x, y)$ and the fixed point $(x_0, y_0)$. This simple device then allows us to interpret a statement about the two variables $x$ and $y$ in terms of a statement about the single variable $\rho$.

By confining attention to a circular region of radius $\delta$ centred on $(x_0, y_0)$ we may conveniently define a neighbourhood of the point $(x_0, y_0)$. Any rectangle or other simple closed geometrical curve containing $(x_0, y_0)$ would, of course, serve equally well to define a neighbourhood of $(x_0, y_0)$. When using such a neighbourhood it may or may not be necessary to exclude the boundary and the point $(x_0, y_0)$ itself from the definition of the neighbourhood. Thus, for example, the square $x = 0$, $y = 0$, $x = 1$, and $y = 1$ defines a neighbourhood of the point $(\tfrac{1}{2}, \tfrac{1}{2})$. The function

$$f(x, y) = 1/\{xy(x - 1)(y - 1)(x - \tfrac{1}{2})(y - \tfrac{1}{2})\}$$

is defined in this neighbourhood, but not at $(\tfrac{1}{2}, \tfrac{1}{2})$, on the boundary or on $x = \tfrac{1}{2}$, $y = \tfrac{1}{2}$.

Definition 3-8 is now proposed, with this interpretation of $x \to x_0$, $y \to y_0$ firmly in mind.

Definition 3-8 The function $f(x, y)$ will be said to tend to the limit $L$ as $x \to x_0$ and $y \to y_0$, and we shall write

$$\lim_{\substack{x \to x_0 \\ y \to y_0}} f(x, y) = L,$$

if, and only if, the limit $L$ is independent of the path followed by the point $(x, y)$ as $x \to x_0$ and $y \to y_0$.
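The requirement in Definition 3-8 that the limit be independent of the path can be explored numerically. The sketch below uses the function $g(x, y) = xy/(x^2 + y^2)$, which is chosen here purely for illustration and does not appear in the text, and approaches the origin along straight lines $y = mx$ of different slope. The names `g` and `along_line` are likewise illustrative.

```python
def g(x, y):
    """Illustrative function (not from the text): x*y / (x**2 + y**2).

    It is undefined at the origin, which is the limit point examined here.
    """
    return x * y / (x * x + y * y)


def along_line(slope, steps=8):
    """Values of g at the points (t, slope * t) for t = 10**-1, ..., 10**-steps.

    The distance rho from the origin is t * sqrt(1 + slope**2), so rho -> 0.
    """
    return [g(10.0 ** (-k), slope * 10.0 ** (-k)) for k in range(1, steps + 1)]


if __name__ == "__main__":
    for m in (1.0, -1.0, 2.0):
        print(f"slope {m:+.1f}: {along_line(m)[-1]:+.6f}")
```

Along $y = mx$ the function takes the constant value $m/(1 + m^2)$, so different slopes give different results as $\rho \to 0$ and, under Definition 3-8, this particular function has no limit at the origin.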
As before, we do not necessarily require that $f(x_0, y_0) = L$, as the functional value actually at the limit point $(x_0, y_0)$ is not involved in the limit process. If it can be established that the result of the limiting operation depends on the path taken then, demonstrably, the function has no limit. The following examples make these ideas clear and, on account of their simplicity, are offered without proof.

Example 3-6
(a) If $f(x, y)$ = [expression illegible in source], then the limit [limit point and value illegible in source];
(c) if $f(x, y) = \dfrac{\sin xy}{x^2 + y^2 + 1}$, then $\lim_{x \to \pi/2,\; y \to 1} \dfrac{\sin xy}{x^2 + y^2 + 1} = \dfrac{4}{8 + \pi^2}$;
(d) if $f(x, y)$ = [numerator illegible in source]$/\{y(x - 1)\}$, then $\lim_{x \to 1,\; y \to 1} f(x, y)$ does not exist, since $\lim_{x \to 1,\; y \to 1} f(x, y) = 1$ if taken along the line $y = x$, but $\lim_{x \to 1,\; y \to 1} f(x, y) = -1$ if taken along the line $y = 2 - x$.

As might be expected, the concept of continuity of a function $f(x, y)$ of two variables then follows as a direct extension of the definition of a limit.

Definition 3-9 The function $f(x, y)$ will be said to be continuous at the point $(x_0, y_0)$ if:
(a) $\lim_{x \to x_0,\; y \to y_0} f(x, y) = L$ exists
and
(b) $f(x_0, y_0) = L$.

We shall say that $f(x, y)$ is continuous in a region if it is continuous at all points $(x, y)$ belonging to that region. Notice that condition (a) demands that $f(x, y)$ has a unique limit as $x \to x_0$ and $y \to y_0$, and condition (b) then ensures that there is no 'hole' in the surface $z = f(x, y)$ at the point $(x_0, y_0)$.

The continuity of a function $f(x, y)$ is illustrated in Fig. 3-9 where a circular neighbourhood of the point $(x_0, y_0)$ is shown in relation to the surface. In effect, continuity of $f(x, y)$ is simply requiring that a small change in location of the point $(x, y)$ will cause only a small change in $z = f(x, y)$.

[Fig. 3-9 Continuity of $f(x, y)$ at $(x_0, y_0)$ and discontinuity at $(a, b)$.]

In Fig.
3-9 the point $(a, b)$ has been deliberately detached from the otherwise unbroken surface $z = f(x, y)$, so that the function $f(x, y)$ does not satisfy the definition there and hence is not continuous at that single point. In general, a function of one or more variables which is not continuous at a point will be said to have a discontinuity at that point or, alternatively, to be discontinuous there. Thus the function of one variable shown in Fig. 3-6 has a discontinuity at $x = c$ and the function of two variables shown in Fig. 3-9 is discontinuous at $x = a$, $y = b$.

These ideas also extend to functions of several real variables in an obvious manner once the 'distance' between two points has been defined satisfactorily. For functions $f(x, y, z)$ of the three independent variables $x$, $y$, $z$ a suitable distance function between points $(x_1, y_1, z_1)$ and $(x_0, y_0, z_0)$ is the linear distance between them when plotted as points relative to three mutually perpendicular Cartesian axes. The distance $\rho$ is then given by the Pythagoras rule as

$$\rho = \{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2\}^{1/2}.$$

The interpretation of distance in the so-called finite dimensional spaces of $n$ dimensions generated by functions of $n$ independent variables is of considerable importance in mathematics. Essentially, of any function $\rho(P, Q)$ measuring the distance between points $P$ and $Q$ in the space we require that for any points $P$, $Q$, and $R$:
(a) $\rho(P, Q) \ge 0$,
(b) $\rho(P, Q) = 0$ if, and only if, $P = Q$,
(c) $\rho(P, Q) = \rho(Q, P)$,
(d) $\rho(P, R) \le \rho(P, Q) + \rho(Q, R)$.

It is easy to check that the two distance functions already defined satisfy the above conditions, but this will be left as an exercise for the reader.

Again the determination of the regions in which any given function is continuous will usually be done either on an intuitive or on a graphical basis. Thus, in Example 3-6 it is easily seen that:
(a) $f(x, y)$ = [expression illegible in source] is continuous everywhere;
(b) $f(x, y)$ = [numerator illegible in source]$/(x^2 + y^2)$ is continuous everywhere except at $x = 0$, $y = 0$;
(c) $f(x, y) = \dfrac{\sin xy}{x^2 + y^2 + 1}$ is continuous everywhere;
(d) $f(x, y)$ = [expression illegible in source] is continuous everywhere except at $(0, 0)$ and $(1, 1)$ and along $x = 1$ and $y = 0$.

3-6 A useful connecting theorem

By now it will have become apparent that there is a strong connection between theorems concerning limits of sequences and the corresponding theorems concerning limits of functions. In fact, with only trivial modification, most limit theorems that are true for sequences are also true for functions. Naturally this is no coincidence and the reason is explained by this connecting theorem.

Theorem 3-6 Let $f(x)$ be a function defined for all $x$ in some interval $a < x < b$. Further, let $\{x_n\}$ be a sequence defined in the same interval which converges to a limit $\alpha$ that is not a member of the sequence. Then if, and only if, $\lim_{n \to \infty} f(x_n) = L$ for each such sequence $\{x_n\}$, it follows that $\lim_{x \to \alpha} f(x) = L$.

The proof of this connecting theorem comprises two distinct parts. First it must be established that if $\lim_{x \to \alpha} f(x) = L$, then sequences $\{x_n\}$ exist having the required property. Second, the converse result must be proved: that if the required sequences $\{x_n\}$ exist, then $\lim_{x \to \alpha} f(x) = L$. Together, these two results will ensure that the theorem works in both directions, so that corresponding function and sequence limit theorems satisfying the necessary conditions may be freely interchanged without further question.

The first part of the proof is a direct consequence of Definitions 3-3 and 3-5. It follows from Definition 3-5 that when $x$ is confined to some neighbourhood $N_\alpha$ of $\alpha$, then $f(x)$ is confined to a neighbourhood $N_L$ of $L$. From Definition 3-3, since $\{x_n\}$ has the limit $\alpha$, there must be some number $n_0$ such that for $n > n_0$ the terms $x_n$ lie in $N_\alpha$, and so it follows that $f(x_n)$ will also be confined to the same neighbourhood $N_L$ of $L$.
The second step is a little harder, since it involves an indirect proof by contradiction. It involves showing that if we assume that $\lim_{x \to \alpha} f(x) \neq L$, then a sequence $\{z_n\}$ can be found satisfying all the requirements of the theorem, for which $\lim_{n \to \infty} f(z_n) \neq L$. Hence the contradiction, showing that the assumption $\lim_{x \to \alpha} f(x) \neq L$ was false. We leave the details of this to any interested reader as an exercise.

To close this chapter, we shall use this theorem together with geometrical arguments to establish the three useful limits:

$$\lim_{\theta \to 0} \left(\frac{\sin \alpha\theta}{\theta}\right) = \alpha; \qquad (3\text{-}9)$$

$$\lim_{\theta \to 0} \left(\frac{1 - \cos \alpha\theta}{\theta}\right) = 0; \qquad (3\text{-}10)$$

$$\lim_{\theta \to 0} \left(\frac{1 - \cos \alpha\theta}{\theta^2}\right) = \frac{\alpha^2}{2}. \qquad (3\text{-}11)$$

These limits are all of the indeterminate variety mentioned earlier and, although this topic will receive special mention in a subsequent chapter, it is important for the development of our work that they be examined now. We shall establish that they are all related to the single limit

$$\lim_{\theta \to 0} \left(\frac{\sin \theta}{\theta}\right) = 1,$$

which we prove first.

Consider Fig. 3-10 which represents a circular arc of unit radius with its centre at $O$, inscribed in the right-angled triangle $OAB$. Then it is obvious that

Area of triangle $OAC$ < Area of sector $OAC$ < Area of triangle $OAB$.

Expressed in terms of the angle $\theta$ measured in radians this becomes

$$\tfrac{1}{2} \sin \theta < \tfrac{1}{2}\theta < \tfrac{1}{2} \tan \theta,$$

from which we see that

$$\cos \theta < \frac{\sin \theta}{\theta} < 1. \qquad \text{(A)}$$

[Fig. 3-10 Area inequalities.]

This result must be true for all acute angles and, in particular, for the values of the sequence $\{\theta_n\}$ defined by $\theta_n = 1/n$. Thus (A) takes the form

$$\cos \theta_n < \frac{\sin \theta_n}{\theta_n} < 1 \qquad \text{(B)}$$

and, since $\lim_{n \to \infty} \theta_n = 0$ where the limit is not a member of the sequence, we may combine Theorems 3-2 and 3-6 to deduce that

$$\lim_{\theta \to 0} \left(\frac{\sin \theta}{\theta}\right) = 1. \qquad (3\text{-}12)$$

To establish limit (3-9) it is only necessary to replace $\theta$ in Eqn (3-12) by $\alpha\theta$, giving rise to

$$\lim_{\theta \to 0} \left(\frac{\sin \alpha\theta}{\alpha\theta}\right) = 1$$

or, equivalently,

$$\frac{1}{\alpha} \lim_{\theta \to 0} \left(\frac{\sin \alpha\theta}{\theta}\right) = 1.$$

The limits (3-10) and (3-11) then follow by using the identity $1 - \cos \alpha\theta = 2 \sin^2 \tfrac{1}{2}\alpha\theta$ to form the expressions
$$\frac{1 - \cos \alpha\theta}{\theta} = 2 \sin \tfrac{1}{2}\alpha\theta \left(\frac{\sin \tfrac{1}{2}\alpha\theta}{\theta}\right) \quad\text{and}\quad \frac{1 - \cos \alpha\theta}{\theta^2} = 2 \left(\frac{\sin \tfrac{1}{2}\alpha\theta}{\theta}\right)^2.$$

Applying result (3-9) to these we finally arrive at the required results

$$\lim_{\theta \to 0} \left(\frac{1 - \cos \alpha\theta}{\theta}\right) = 0 \quad\text{and}\quad \lim_{\theta \to 0} \left(\frac{1 - \cos \alpha\theta}{\theta^2}\right) = 2\left(\frac{\alpha}{2}\right)^2 = \frac{\alpha^2}{2}.$$

The following general result is sometimes useful and, as we shall show by example, may be combined with Eqns (3-9) to (3-11) to give a number of interesting results.

Suppose $f(x)$ and $g(x)$ are two functions such that $\lim_{x \to a} f(x) = \alpha$ and $\lim_{x \to a} g(x) = \beta$, where $\alpha$ and $\beta$ are both finite and $\alpha > 0$, so that the power is defined. Then, clearly,

$$\lim_{x \to a} [f(x)]^{g(x)} = \left[\lim_{x \to a} f(x)\right]^{\lim_{x \to a} g(x)} = \alpha^\beta.$$

This result, which is true in general, is of course also true when one or more of the limits involved is of the form Eqns (3-9) to (3-11).

Example 3-7
(a) $\lim_{x \to 1} \left(\dfrac{x^3 + 2x^2 + x + 1}{x^2 + 2x + 3}\right)^{[1 - \cos 2(x - 1)]/(x - 1)^2}$;
(b) $\lim_{x \to 0} \left(\dfrac{1 - \cos 3x}{x^2}\right)^{(\sin 2x)/x}$.

Solution to (a) Here $f(x) = (x^3 + 2x^2 + x + 1)/(x^2 + 2x + 3)$, so that $\lim_{x \to 1} f(x) = 5/6$, and as $g(x) = [1 - \cos 2(x - 1)]/(x - 1)^2$, it follows from Eqn (3-11) that $\lim_{x \to 1} g(x) = 2$. Hence, $\lim_{x \to 1} [f(x)]^{g(x)} = (5/6)^2 = 25/36$.

Solution to (b) In this case $f(x) = (1 - \cos 3x)/x^2$ and $g(x) = (\sin 2x)/x$. A direct application of Eqns (3-9) and (3-11) then shows that $\lim_{x \to 0} f(x) = 9/2$ and $\lim_{x \to 0} g(x) = 2$ and thus $\lim_{x \to 0} [f(x)]^{g(x)} = (9/2)^2 = 81/4$.

PROBLEMS

Section 3-1

3-1 Give an example of a numerical sequence and of a non-numerical sequence.

3-2 Use the terms bounded, unbounded, strictly monotonic increasing, and strictly monotonic decreasing to classify the sequences $\{u_n\}$ which have the following general terms:
(a) $u_n = (-n)^{n+1}$;
(b) $u_n$ = [expression illegible in source];
(c) $u_n = \sin(1/n)$;
(d) $u_n = 2 + (-1)^n$;
(e) $u_n = \dfrac{n + 1}{2n + 3}$;
(f) $u_n = \dfrac{2n + 3}{n + 1}$.

3-3 Give an example of each of the following types of sequence:
(a) bounded;
(b) strictly monotonic decreasing;
(c) monotonic decreasing;
(d) strictly monotonic increasing;
(e) bounded above;
(f) bounded below.
3-4 Use an ordinary graph to plot the first ten terms of the sequence $\{u_n\}$ for which $u_n = (-c)^n (n + 2)/n$, where the numerical constant $c$ is illegible in the source.

3-5 Using the device described in connection with Fig. 3-1 (b) to compress the horizontal axis, plot the first five terms of the sequences $\{u_n\}$ which have the general terms:
(a) $u_n$ = [expression illegible in source];
(b) $u_n = 1 + \sum_{r=1}^{n}$ [summand illegible in source].

Section 3-2

3-6 Find a neighbourhood $(a, b)$ of the sequence $\{1 + (-1)^n/n\}$ such that (a) there are 100 terms outside it; (b) there are 10,000 terms outside it. Deduce that there are infinitely many terms inside any such neighbourhood.

3-7 Find a neighbourhood $(a, b)$ of the sequence $\{(2n + 1)/n\}$ such that (a) there are 10 terms outside it; (b) there are 1,000 terms outside it.

3-8 Name the limit points of the sequence $\{u_n\}$ which has the general term $u_n = \sin [(n + 1)/2]\pi$. Identify the sub-sequences that determine these limit points.

3-9 Name the limit points of the sequence $\{u_n\}$ with the general term $u_n = \sin [(n^2 + n + 1)/2n]\pi$. Identify the sub-sequences that converge to these limit points.

3-10 Give examples of sequences having (a) no limit point, (b) one limit point, (c) two limit points.

3-11 Name the limit points of the sequence $\{u_n\}$ which has the general term $u_n$ = [expression illegible in source] for $n$ even, and $u_n$ = [expression illegible in source] for $n$ odd. State whether or not the limit points belong to the sequence.

3-12 Determine the following limits:
(a) $\lim_{n \to \infty}$ [expression illegible in source];
(b) $\lim_{n \to \infty}$ [expression illegible in source];
(c) $\lim_{n \to \infty}$ [expression illegible in source];
(d) $\lim_{n \to \infty}$ [expression illegible in source];
(e) $\lim_{n \to \infty} (1^2 + 2^2 + 3^2 + \cdots + n^2)/$[denominator illegible in source].

3-13 Give an expression for the $n$th term of the sequence $\sqrt{2}$, $\sqrt{(2\sqrt{2})}$, $\sqrt{[2\sqrt{(2\sqrt{2})}]}$, .... Use your result to deduce the limit of the sequence.

3-14 Determine the limits:
(a) $\lim_{n \to \infty} (\sqrt{(n + a)} - \sqrt{n})$, where $a > 0$ is any real number;
(b) $\lim_{n \to \infty}$ [expression illegible in source];
(c) $\lim_{n \to \infty}$ [expression illegible in source];
(d) $\lim_{n \to \infty} \sqrt[n]{(1 + a^n)}$ $(a \ge 0)$.

3-15 Use the O notation to express the behaviour of the following expressions for large $x$:
(a) $2x^2 + x + \sin(1/x)$;
(b) $3 +$ [term illegible in source];
(c) $\dfrac{3x^3 + 2x + 1}{x^2 + 1}$;
(d) $\dfrac{x^3 \sin x + 1}{x^3 + 3}$;
(e) $\dfrac{x^2}{\sqrt{(x^3 + x + 1)}}$.

3-16 Suppose that the sequences $\{u_n\}$, $\{v_n\}$, and $\{w_n\}$ are such that $u_n < w_n < v_n$ for all $n$ greater than some fixed number $n_0$, and that $\{u_n\}$ converges to the limit $L$ and $\{v_n\}$ converges to the limit $M$. Show by example that the sequence $\{w_n\}$ need not converge to a limit.

3-17 Outline the details of the proof of Theorem 3-2. (Hint: Consider the limits of the sequences $\{u_n - w_n\}$ and $\{w_n - v_n\}$.)

3-18 Give two different proofs of the convergence of the sequence $\{u_n\}$ in which $u_n$ = [sum illegible in source], appealing first to Theorem 3-1 (a) and then to Theorem 3-2.

3-19 Use Theorem 3-2 to prove the convergence of the sequence $\{u_n\}$ in which $u_n$ = [sum of sine terms illegible in source].

3-20 Let $\{u_n\}$ be an increasing sequence bounded above by $m$. Let this bound $m$, together with the members $u_n$ of the sequence, be represented by points on a line. Then, either the mid-point $m_1 = \tfrac{1}{2}(u_1 + m)$ of the line segment between $u_1$ and $m$ is an upper bound of $\{u_n\}$, or it is not. According as $m_1$ is, or is not, an upper bound of $\{u_n\}$, take for the next point $m_2$ the mid-point of the half line segment to the left or right of $m_1$, respectively. Next, according as $m_2$ is, or is not, an upper bound of $\{u_n\}$, take for the next point $m_3$ the mid-point of the quarter line segment to the left or right of $m_2$, respectively. Repeat this process indefinitely to generate an infinite sequence of points $\{m_r\}$ as indicated in the diagram.

[Diagram: the points $m_r$ on a line, with the limit $L$ of $\{u_n\}$ marked.]

Give reasons why (a) $\{m_r\}$ has a single limit point $L$; (b) the fact that $\{u_n\}$ is an increasing sequence implies that $\lim_{n \to \infty} u_n = L$.
3-21 Let $u_n = \tfrac{1}{2}(u_{n-1} + (a/u_{n-1}))$ and $v_n = (u_n - \sqrt{a})/(u_n + \sqrt{a})$, where $u_1$ and $a$ are any positive numbers. By showing that

$$v_n = v_{n-1}^2 = v_{n-2}^4 = v_{n-3}^8 = \cdots = v_1^{2^{n-1}},$$

deduce the result $0 \le v_n \le |v_1|^n$. Then, using Theorem 3-2, prove that $\lim_{n \to \infty} v_n = 0$, thereby establishing that $\lim_{n \to \infty} u_n = \sqrt{a}$.

3-22 Using the algorithm $u_n = \tfrac{1}{2}(u_{n-1} + 3/u_{n-1})$, compute to four figures the first five terms in the sequence $\{u_n\}$ corresponding to the starting values (a) $u_1 = 1$, (b) $u_1 = 2$. Compare your results with the limiting value $\sqrt{3}$.

3-23 Using the algorithm $u_n$ = [expression illegible in source], compute to four figures the first five terms in the sequence $\{u_n\}$ corresponding to the starting values (a) $u_1 = 1$, (b) $u_1 = 2$. Compare your results with the limiting value $\sqrt[3]{5}$.

Section 3-3

The following two related problems show how the approximate behaviour of $e^x$ in the interval $-2 < x < 2$ may be inferred directly from the sequence $\{v_n(x)\}$.

3-24 Define $v_n(x)$ by the expression

$$v_n(x) = \left(1 + \frac{x}{n}\right)^n.$$

Use essentially the same arguments as those leading to Eqn (3-4) to prove that $\{v_n(x)\}$ is a strictly increasing sequence for any fixed positive $x$ and then show that

$$v_n(x) < 1 + x + \frac{x^2}{2} + \frac{x^3}{2^2} + \cdots + \frac{x^n}{2^{n-1}}.$$

By summing this expression and taking the limit as $n \to \infty$ deduce that

$$1 < e^x < \frac{2 + x}{2 - x} \quad\text{for } 0 < x < 2.$$

Compare this result with Fig. 3-3.

3-25 Using the same definition of $v_n(x)$ as above, form the sub-sequences $\{v_{2m}(x)\}$ of even terms and $\{v_{2m+1}(x)\}$ of odd terms. Modify slightly the arguments used in the previous example to prove that both sub-sequences are strictly monotonic decreasing for negative $x$. Show that $\{v_{2m+1}(x) - v_{2m}(x)\}$ is a null sequence and hence deduce that both the even and odd sequences tend to the same limit. Modify $v_{2m}(x)$ to establish a lower bound of the form $v_{2m}(x) >$ [series illegible in source]. By summing this expression and taking the limit as $n \to \infty$ deduce the corresponding two-sided bound on the exponential function [inequality and range of $x$ illegible in source]. Compare this result with Fig. 3-3.
Section 3-4

3-26 Determine the following limits of functions (the limit points are illegible in the source except where shown):
(a) $\lim (x^3 - x^2 + x + 1)$;
(b) $\lim$ [numerator illegible in source]$/(x^3 - 1)$;
(c) $\lim$ [expression illegible in source];
(d) $\lim_{x \to -2} \dfrac{x^3 + x^2 - x - 2}{(x + 1)(x + 2)}$;
(e) $\lim \dfrac{(x + h)^3 - x^3}{h}$;
(f) $\lim \{\sqrt{(x^2 + 1000)} - \sqrt{(x^2 - 1000)}\}$;
(g) $\lim x[\sqrt{(x^2 + 3)} - x]$.

3-27 Determine these limits when they exist:
(a) $\lim_{x \to 1} f(x)$ where $f(x) = x^3 + x - 1$ for $x < 1$ and $f(x) = 1 + \sin(x - 1)$ for $x > 1$;
(b) $\lim$ [expression illegible in source];
(c) $\lim_{x \to 3} f(x)$ where $f(x)$ = [expression illegible in source] for $x < 3$ and $f(x) = 4 + x^2$ for $x > 3$;
(d) $\lim |x^2 - 1|$ [limit point illegible in source];
(e) $\lim$ [expression illegible in source].

3-28 Determine the left- and right-hand limits of these functions at the stated points:
(a) $\lim_{x \to 2\pm} \dfrac{3^{x+1} + 5^{x+1}}{3^x + 5^x}$;
(b) $\lim f(x)$ where $f(x) = 1 + 2 \sin x$ on the left of the stated point and $f(x) = \operatorname{cosec} x$ on the right of it [the point is illegible in the source];
(c) $\lim_{x \to 2\pm} |x^2 + x - 1|$;
(d) $\lim_{x \to 0\pm} f(x)$ where $f(x) = -2$ for $x < 0$ and $f(x) = 1 + |x|$ for $x > 0$;
(e) $\lim_{x \to 3\pm}$ [numerator illegible in source]$/(3 - x)$.

3-29 Determine the domains of definition for which these functions are continuous:
(a) $f(x) = x + |x|$;
(b) $f(x) = 1/(x^2 - 1)$;
(c) $f(x) = \dfrac{x^5 + x^2 - 1}{4 + \sin x - 2 \cos x}$;
(d) $f(x) = \dfrac{x^3 + 4x^2 + x - 6}{(x - 1)(x + 4)}$;
(e) $f(x)$ = [piecewise definition illegible in source].

3-30 Give examples of functions of the following type:
(a) continuous everywhere except at $x = 1$ and $x = 2$;
(b) discontinuous at the points $x = n\pi$ with $n$ an integer;
(c) continuous everywhere but neither purely algebraic nor purely trigonometric;
(d) continuous everywhere except at $x = 1$, where the left-hand limit is $-1$ and the right-hand limit is 3;
(e) continuous everywhere except at $x = 1$, where the left-hand and right-hand limits both equal 2.

3-31 Suppose it is known that a function $f(x)$ is continuous over the interval $x_0 \le x \le x_2$, and that $f(x_0) = y_0$, $f(x_1) = y_1$ and $f(x_2) = y_2$.
Explain why it is reasonable to assume that when the functional values $y_0$, $y_1$, and $y_2$ are reasonably close together, $f(x)$ may in some sense be represented by the expression

$$f(x) \approx \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}\,y_0 + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}\,y_1 + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}\,y_2.$$

Any formula such as this, from which the behaviour of a function over an interval is inferred from its behaviour at specific points in that interval, is called an interpolation formula. This particular one is called the three point Lagrangian interpolation formula and we shall see later that it gives exact results when applied to any linear or quadratic function $f(x)$. Considering $y = \sin x$ for $0 \le x \le 3\pi$, explain how this formula might give misleading results.

3-32 Apply the expression given in Problem 3-31 to the function $y = \sin x$, taking as the points $x_0$, $x_1$, and $x_2$ the respective radian arguments 0.6, 0.9, and 1.2 and so find the appropriate three point Lagrangian interpolation formula over the interval $0.6 \le x \le 1.2$. Use your result to deduce approximate values for $\sin 0.8$ and $\sin 1.1$ and compare these with the exact tabulated values.

3-33 Repeat the previous problem, but this time take $x_0 = 0.4$, $x_1 = 1.2$, and $x_2 = 1.7$ and deduce approximate values for $\sin 0.9$ and $\sin 1.5$. Compare your results with the exact tabulated values.

3-34 Consider the continuous function $f(x)$ defined on the interval $[0, 2]$ by the rule $f(x) = x$ for $0 \le x \le 1$ and $f(x) = 2 - x$ for $1 < x \le 2$. Taking $x_0 = 0.2$, $x_1 = 0.8$, $x_2 = 1.3$, apply the expression given in Problem 3-31 in order to find an interpolation formula over the interval $[0.2, 1.3]$. Compare the approximate and exact values at $x = 0.5$, $0.7$, and $1.0$.

3-35 The density of the material of a rod of length $L$ is a function $f(x)$ of the distance $x$ measured from one end.
Describe in physical terms, rods that are char- acterized by the following functions f(x) : (a) /(x) = constant for < x < L ; I pi for < x < §L (b) /(*)=" (c) fix) = P (l + kx) < x < L. 3-36 If the function f(x) has the same meaning as above, specify the functional forms it must take in order that it describes : (a) a rod of length L having constant density pi over half its length and a density that changes steadily (that is, linearly) with distance from pi to P2 over the remaining half of the rod; (b) a rod of length L comprising three sections of equal length with constant densities pi, P2, and P3 in each section ; (c) a rod of length L having a density that increases quadratically with x (that is, like the square of x) from pi at x = to P2 at x = L. Section 3-5 3-37 Let/(x, y) denote the density of the material at the point (x, y) of a thin flat plate in the (x, j)-plane. Give the functional forms of/(x, y) in order that it should describe: (a) a circular plate of radius R centred at the origin, with the material to the left of the j-axis having a density pi and the material to the right a density Pi', (b) a circular disc of inner radius R and outer radius 3R in which the density is constant and equal to p out to a circle of radius 2R, after which it decreases linearly to the value Jp at the outer edge of the disc; (c) an isosceles triangle with its apex at the origin and sides of length L lying to the right of the y-axis and inclined at angles \-* and —\t, respec- tively, to the x-axis, with the material above the x-axis having a density Pi and the material below the x-axis having a density P2. 3-38 Let point P have the Cartesian coordinates (1, 1), and let N\ denote the unjt circle drawn with P as its centre. Define N r to be a circle, concentric with Ni, and let us agree to write N r +i <= JV r if the circle AV+i is contained within the circle N r . 
Then N r +i <= N r , for all r, describes a family of neighbourhoods of 112 / SEQUENCES, LIMITS, AND CONTINUITY CH 3 the point P. Give examples of families {N r } of neighbourhoods of P that : (a) have the property that lim (radius of Nr) ->- J; r— ►co (b) have the property that lim (radius of N r ) ->- 0; r— *-co (c) have the property that; area N r +i = J area N r and lim (area of N r ) -> \. r->co 3-39 State the largest neighbourhood about the stated points P in which the following functions are defined. Also state if they are defined at P and on the boundary of the neighbourhood : (a) f(x,y) = \l{xy(lx - l)(y + 2)(x + l)(y - 2)} taking point P as (- 1, 2); (b) /(x, v) = _ a _ 2 taking point P as (0, 0); 1 + x 2 + y 3-40 Determine these limits when they exist : (c) f(x,y) = 1 ^ taking point P as (2, 3). / ^ ,. ^ 2 y 2x 2 + xv + 1 (a) ^ 2, 2 + 2v 2 +i ; (b) ," m K , 2 + 2,; + , 2 ; !/— 2 J/--.2 (c) ,im fr-')™* . (d) lim :r-2 X i — 4 j^o W— 1 V— Iff 1+2 cos xv + sin xy 2 + xy 3-41 Give examples of functions f{x,y) having these properties: (a) Km f{x, y) = 2; (b) Mm. fix, y) = 0; W—3 )/-> Jtt (c) lim fix, y) does not exist. V—-3 3-42 Find the points or lines of discontinuity of these functions: (0 for x 2 + y 2 = 1 (a) fix, v) = x sin xv [ t _ X 2 _ y2 elsewhere; (3 for x — 1 , y = 2 1+2 , 2+/ elsewhere ; ■ y — 1 ,,, ,, , x 2 sin v + y 2 sin x + 2 (d)/(*.,y)= 3c4 / 2jcV+J>4+1 - 3-43 Let P and Q be any two points in the (x, j)-plane. Prove that if the distance function p(P, Q) is taken to be the length of the straight line joining P to Q then: (a) P (P, Q) > 0; (b) p(P, Q) = if, and only if, P. = Q; PROBLEMS / 113 (c) p(p,Q) = p(Q, P); (d) p(P, R) < p(P, Q) + p(Q, R), where R is another point distinct from P and Q. 3-44 Repeat the proof of the previous problem, but this time let P, Q, and R be points in space. Section 3-6 3-45 Apply the results of Section 3-6 to determine these limits: x r , „ ,. 
1 — x'2 cos x (a) lim — -; (b) lim , r _ , A *-o V(l - cos x)' ,^.„ y'[2 sin (x - i^)]' sin (x + h) - sin x 2 sin 3 (x/4) (c) hm ; (d) lim ; A-m h , r ^o x 3 , . ,. I sin x | (e) lim x—o± x

3-46 Apply the results of Section 3-6 to determine these limits: ^ r i 2 _l u , ^ / cos(x + h) - cosx \ (a) lim (x 2 + hx + 1) ; ii^o \ n ] „ . ,. sin x — sin a , . ,. / sin « (b) hm ; (c) lim — x \ sin xjta' (1 \ /x 2 •— x + 4\( si " -■ f '/- r a: sin- ; (e) lim — — - ; x) .r^o\x 2 - x + 1/ ■v_2\ t si " 3(-c - 2)]/(.r - 2) (f) lim ' X ,_ 2 \x* - 4)

3-47 If $h(x)$ is a function for which $\lim_{x \to a} h(x) = 0$, use Theorem 3-6 to justify writing

$$\lim_{x \to a}\,(1 + h(x))^{1/h(x)} = e.$$

3-48 Let functions $f(x)$ and $g(x)$ be such that $\lim_{x \to a} f(x) = 1$ and $\lim_{x \to a} g(x) \to \infty$, so that we may write $f(x) = 1 + h(x)$ where $\lim_{x \to a} h(x) = 0$. Then, considering the function $[f(x)]^{g(x)}$, use the result of Problem 3-47 to show that

$$\lim_{x \to a}\,[f(x)]^{g(x)} = e^{\lim_{x \to a} h(x) g(x)}.$$

3-49 Use the result of Problem 3-48 to determine these limits: (a) lim (l--Y; (b) lim (^JV; x^oz \ Xj x ^„ \X + I J (c) lim ( — ^-r) ; (d) lim (1 + sin 2x) 1/x . x^cc \X + 1/ x—

3-50 Determine the following limits which do not necessarily require the result of Problem 3-48: (a) lim (l + -)l X ; (b) Hm(l + ^ ) ; I i\(4* + 3)/(x + 2) (c) lim - ; (d) lim (cos x) : *-«, \X 2 J x^O (e) lim (cos x) 1 /"; (0 lim x^O ' r—0 \ 4 * ,2/**.

Complex numbers and vectors

4-1 Introductory ideas

A number of important properties of the real number system have already been considered, and we shall now examine to what extent quantities representable as displacements in space may be incorporated into a number system. The name vector quantity is reserved for all quantities that are representable as a displacement in space or, more exactly, as a directed line element.
Familiar vector quantities are force, magnetic field and velocity, which are all representable by a line whose length is proportional to their magnitude and whose direction is parallel to the direction of the original quantity. In addition, the line of action of a vector has a sense associated with it, which means that we must specify a direction along the line to indicate the way in which the vector acts. Thus to represent a velocity of 3 ft/s in an easterly direction we would first adopt a convenient length scale, say 1 in to represent 1 ft/s and then, after marking the points of the compass on our paper, we would draw a line 3 in long in an east-west direction. Finally we would add an arrow to the line pointing eastwards to indicate the sense of the velocity. This line could be located anywhere on our paper since it does not represent a velocity that is associated with any particular point. Reversal of the arrow would correspond to a reversal of the direction of the velocity, so that the line would then represent a velocity of 3 ft/s in a westerly direction.

Not all quantities are vectors, and another important group are called scalars. The word scalar describes any quantity that has magnitude but no direction. Typical scalar quantities which have units are temperature, mass and pressure. The real numbers are themselves scalars, and are used to describe the numerical magnitudes of both scalar and vector quantities, irrespective of whether units may be involved. The terms scalar and vector describe collectively two important groups of quantities in the real world. It should, however, be added that they do not jointly give a complete description of all possible physical quantities. Others exist that are neither scalar nor vector, though this need not be elaborated here.

In giving meaning to the square root operation when applied to negative numbers, we shall see that a special kind of two-dimensional vector arises.
Its value in mathematics has proved to be so great that although such vectors are restricted to describing vector quantities in a plane, they have been given a special name, complex numbers. Because of this restriction, in addition to studying complex numbers, we shall need a more general theory of vectors so that we can describe the cited examples of vector quantities, and any others that may arise, in all possible situations and not just in a plane.

Despite this limitation of complex numbers, their vector properties are still important enough in special situations for them to be included in this chapter. Their value elsewhere in mathematics however is even greater, and makes them a discipline in their own right. The main reason for this is to be found in their relationship to real numbers and in the consequences of their introduction into functional relationships in the roles of independent and dependent variables. This latter aspect will be pursued later when we discuss another valuable geometrical idea, a conformal transformation.

In the meantime we shall develop the vector properties and algebra of complex numbers to the point of general usefulness in mathematics, postponing until the end of this chapter the alternative approach that is necessary for study of general three-dimensional vector quantities. As already mentioned, each is valuable as a separate discipline, though, as would be expected, each has a separate notation and, generally, a quite different field of application. The following introduction to complex numbers is based only on a knowledge of elementary trigonometric identities, and not until after more study of the exponential and trigonometric functions will we unify our treatment of these two topics.

The origin of complex numbers was the desire of eighteenth-century mathematicians always to be able to compute the roots of polynomials, even when they are of the form

$$x^2 = -1. \tag{4-1}$$

It was Leonhard Euler (1707-83) who first recognized that the real number system was deficient in respect of admitting solutions to all possible polynomials and, in connection with Eqn (4-1), he proposed that a new number $i$ be introduced to extend the number system. In keeping with the mathematical beliefs of that period, he called $i$ the unit imaginary number and related it to real numbers by requiring that

$$i^2 = -1. \tag{4-2}$$

If we allow the use of this new symbol, then $i = \sqrt{-1}$ is the positive square root of minus one, whence Eqn (4-1) may be seen to have the two roots $x = i$ and $x = -i$. That $x = i$ is a root follows from the definition of $i$, whilst $x = -i$ is also a root since $(-i)^2 = (-1)^2 \cdot i^2 = 1 \cdot i^2 = -1$.

With the introduction of $i$, equations such as

$$x^2 = -k,$$

with $k > 0$, which are slightly more general than Eqn (4-1), can also be solved. The equation may be re-expressed in the form $x^2 = k \cdot (-1)$, showing that its roots are $x = i\sqrt{k}$ and $x = -i\sqrt{k}$, where the positive square root is always taken. For example, if $x^2 = -9$, then the roots are $x = 3i$ and $x = -3i$.

The success of Euler's idea lies in the fact that only this one new number need be introduced to enable solutions to be found to all polynomials, irrespective of their degree. As a first step towards seeing this, consider the quadratic equation

$$ax^2 + bx + c = 0, \tag{4-3}$$

and suppose that $b^2 - 4ac < 0$. Then, setting $4ac - b^2 = m^2$, and formally applying the usual formula for the roots of a quadratic, we obtain

$$x = \frac{-b \pm \sqrt{-m^2}}{2a} \quad\text{or}\quad x = -\left(\frac{b}{2a}\right) \pm i\left(\frac{m}{2a}\right).$$

Hence, denoting the two roots by $x_1$ and $x_2$, they take the form

$$x_1 = -\left(\frac{b}{2a}\right) + i\left(\frac{m}{2a}\right) \quad\text{and}\quad x_2 = -\left(\frac{b}{2a}\right) - i\left(\frac{m}{2a}\right). \tag{4-4}$$

The numbers $x_1$ and $x_2$ are not ordinary numbers since each comprises the sum of a real number and a multiple of the unit imaginary number $i$.
On this basis it is reasonable to conjecture that each root of any arbitrary polynomial will be of the same form and, should the multiplier of $i$ be zero, that root will reduce to a real number. This conjecture is correct, but before we may verify it, we must see how to perform arithmetic on numbers of this special type. These are the complex numbers already mentioned and, henceforth, we shall always refer to them by this name.

Unless the exact form of a complex number is needed, it is useful to denote it by a single symbol, usually $z$, so that an arbitrary complex number $z$ is of the form

$$z = x + iy, \tag{4-5}$$

where $x$ and $y$ are real numbers. We call Eqn (4-5) the real-imaginary form of a complex number, and refer to $x$ as the real part of $z$, and to $y$ as the imaginary part of $z$. In symbolic form we write

$$x = \operatorname{Re} z, \quad y = \operatorname{Im} z. \tag{4-6}$$

Hence if $z = 4 - 7i$, then $\operatorname{Re} z = 4$ and $\operatorname{Im} z = -7$. We stress that $\operatorname{Re} z$ and $\operatorname{Im} z$ are real numbers. The zero complex number is denoted by $0$ and represents the number $z = 0 + i \cdot 0$.

Already, and without proper justification, we have attributed some reasonable arithmetic properties to $i$. We have, for example, assumed results such as $\alpha i = i\alpha$ for all real $\alpha$, and $\sqrt{-\alpha} = \sqrt{-1} \cdot \sqrt{\alpha} = i\sqrt{\alpha}$. To proceed logically and rigorously it would be necessary to define addition, subtraction, multiplication, and division for complex numbers and then to examine the applicability of the real number axioms of Chapter 1 in the case of complex numbers. This is necessary since whatever the arithmetic laws we now propose for complex numbers, they must obviously be in agreement with the real number axioms of Chapter 1, whenever the imaginary parts of complex numbers are zero. We shall not in fact justify the complex number axioms we now formulate, since this is a straightforward matter and provides good exercise for the student (see the problems at the end of the chapter).
Instead, we simply summarize the results, pausing only to discuss in detail the most basic operations necessary for the manipulation of complex numbers.

4-2 Basic algebraic rules for complex numbers

First we shall agree to denote addition and subtraction of the complex numbers $z_1$ and $z_2$ in the usual manner by writing $z_1 + z_2$ and $z_1 - z_2$, respectively. Multiplication of the complex numbers $z_1$ and $z_2$ will be denoted by juxtaposition thus, $z_1 z_2$. Before going on, and in order to work with equations, we must define the meaning of equality between two complex numbers, and then we can define the operations of addition, subtraction, and multiplication. The following definitions are all phrased in terms of the arbitrary complex numbers $z_1 = a + ib$ and $z_2 = c + id$.

Definition 4-1 We shall say that the two complex numbers $z_1$ and $z_2$ are equal, and will write $z_1 = z_2$ if, and only if, $a = c$ and $b = d$. That is if, and only if, their real parts and their imaginary parts are separately equal.

Example 4-1 Of the complex numbers $z_1$, $z_2$, and $z_3$ defined by $z_1 = 3 - 2i$, $z_2 = 1 + 3i$, and $z_3 = 3 - 2i$, it is obvious that $z_1 = z_3$ but that $z_1 \ne z_2$ and $z_2 \ne z_3$.

Definition 4-2 By the sum $z_1 + z_2$ will be understood the single complex number which written in real-imaginary form has a real part that is the sum of the real parts of $z_1$ and $z_2$, and an imaginary part that is the sum of the imaginary parts of $z_1$ and $z_2$. Thus for the stated numbers $z_1$ and $z_2$ we have

$$z_1 + z_2 = (a + c) + i(b + d).$$

Example 4-2 If $z_1 = 2 + i$ and $z_2 = 1 - 3i$, then $z_1 + z_2 = 3 - 2i$.

Definition 4-3 By the difference $z_1 - z_2$ will be understood the single complex number which written in real-imaginary form has a real part that is the difference of the real parts of $z_1$ and $z_2$ and an imaginary part that is the difference between the imaginary parts of $z_1$ and $z_2$. Thus for the stated numbers $z_1$ and $z_2$ we have

$$z_1 - z_2 = (a - c) + i(b - d).$$

Example 4-3 If $z_1 = 5 + 6i$ and $z_2 = 4 - 2i$, then $z_1 - z_2 = 1 + 8i$.
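Definitions 4-1 to 4-3 translate directly into code. The sketch below is only an illustration: it assumes a complex number $a + ib$ is stored as the pair `(a, b)`, and the helper names `equal`, `add`, and `sub` are hypothetical. The last line cross-checks one result against Python's built-in `complex` type.

```python
def equal(z1, z2):
    """Definition 4-1: real parts and imaginary parts separately equal."""
    (a, b), (c, d) = z1, z2
    return a == c and b == d

def add(z1, z2):
    """Definition 4-2: (a + c) + i(b + d)."""
    (a, b), (c, d) = z1, z2
    return (a + c, b + d)

def sub(z1, z2):
    """Definition 4-3: (a - c) + i(b - d)."""
    (a, b), (c, d) = z1, z2
    return (a - c, b - d)

# Example 4-1: z1 = 3 - 2i, z2 = 1 + 3i, z3 = 3 - 2i
z1, z2, z3 = (3, -2), (1, 3), (3, -2)
print(equal(z1, z3), equal(z1, z2), equal(z2, z3))  # True False False

# Example 4-2: (2 + i) + (1 - 3i)
print(add((2, 1), (1, -3)))  # (3, -2)

# Example 4-3: (5 + 6i) - (4 - 2i)
print(sub((5, 6), (4, -2)))  # (1, 8)

# Cross-check against the built-in complex type.
assert complex(*add((2, 1), (1, -3))) == (2 + 1j) + (1 - 3j)
```

Working with explicit pairs rather than the built-in type keeps the separate treatment of real and imaginary parts visible, which is the point of the definitions.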
Using these definitions it is easily verified that axioms A-1 to A-5 of Chapter 1 also apply to complex numbers. To proceed to an examination of the other axioms we must define the operation of multiplication.

Definition 4-4 The product $z_1 z_2$, in which $z_1 = a + ib$ and $z_2 = c + id$, is a single complex number which may be written in real-imaginary form. The product is carried out algebraically as would be the ordinary product $(\alpha + \beta)(\gamma + \delta)$, and the final result is obtained by making the identifications $\alpha = a$, $\beta = ib$, $\gamma = c$, $\delta = id$ and using the result $i^2 = -1$ to combine the four terms that result into a real part and an imaginary part. Thus we have

$$z_1 z_2 = (a + ib)(c + id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(ad + bc).$$

Example 4-4 If $z_1 = 2 + 3i$ and $z_2 = 1 - i$, then $z_1 z_2 = 5 + i$. As a more difficult example let us express $(1 + i)^4 + (1 - i)^4$ in real-imaginary form. Now $(1 + i)^4 = (1 + 4i + 6i^2 + 4i^3 + i^4)$ and $(1 - i)^4 = (1 - 4i + 6i^2 - 4i^3 + i^4)$, but as $i^2 = -1$, $i^3 = -i$, and $i^4 = 1$, these expressions become $(1 + i)^4 = -4$ and $(1 - i)^4 = -4$. Hence $(1 + i)^4 + (1 - i)^4 = -8$.

The definitions of addition, subtraction, and multiplication of complex numbers are used in the obvious manner for the solution of simple equations. Thus, if $2z - (2 + i) = 4 - 3i$, then adding $(2 + i)$ to both sides of the equation gives $2z + 0 = (4 - 3i) + (2 + i)$ or $2z = 6 - 2i$ whence $z = 3 - i$. In all cases, the reader should memorize the method employed in the definitions, and not the quoted formulae.

With this definition of multiplication it is a simple matter to verify that axioms M-1 to M-4 and also axiom D-1 apply to complex numbers. When one of the numbers $z_1$ or $z_2$ reduces to a real number, then the real and imaginary parts of the other are both scaled by the same factor. If the scale factor is $-1$ the sign of the complex number is reversed.

To discuss axiom M-5 and division we need to proceed more carefully.
As it stands, an expression such as $(a + ib)/c$ is well defined as a complex number, for we may regard $(1/c)$ as a multiplier of $(a + ib)$ and, provided $c \ne 0$, Definition 4-4 will give the result. In this case $a$ and $b$ are both scaled by the factor $(1/c)$. However, it is not clear that the more general expression

$$z_3 = \frac{z_1}{z_2} = \frac{a + ib}{c + id}, \tag{4-7}$$

is reducible to a complex number expressible in real-imaginary form. The key to this problem is to be found in M-5 itself when we recall that division is really defined as the operation inverse to multiplication. Hence, we must rewrite Eqn (4-7) in the equivalent form

$$z_3(c + id) = a + ib, \tag{4-8}$$

and then try to determine $z_3$. Now it is easily verified that any complex number $\alpha + i\beta$ when multiplied by the associated complex number $\alpha - i\beta$ gives the real number $\alpha^2 + \beta^2$. Hence, if both sides of Eqn (4-8) are multiplied by $(c - id)$, the multiplier of $z_3$ will simply become the real number $c^2 + d^2$. Carrying out this operation, Eqn (4-8) takes the form

$$z_3(c^2 + d^2) = (a + ib)(c - id) \tag{4-9}$$

whence, dividing by the real number $(c^2 + d^2)$, we find that

$$z_3 = \frac{(ac + bd) + i(bc - ad)}{c^2 + d^2}. \tag{4-10}$$

Equation (4-10) is now in the real-imaginary form of a complex number and is the result of the quotient (4-7). Many books take expression (4-10) as the formal definition of the quotient (4-7). The definition we shall propose shortly is equivalent to Eqn (4-10) in all respects, but its form is much easier to memorize. The simplification is achieved by the introduction of a new and useful operation called forming the complex conjugate of a complex number.

Definition 4-5 If $z = a + ib$ is an arbitrary complex number, then the complex number $\bar{z} = a - ib$ is the complex conjugate of $z$. The symbol $\bar{z}$ is read 'z bar'.

Equivalently, we may state that the complex conjugate of a number is always obtained by changing the sign of the imaginary part of that number.
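The route from Eqn (4-8) to Eqn (4-10) can be checked with exact arithmetic. The following is a minimal sketch, assuming complex numbers are stored as pairs `(real, imaginary)` with integer or `Fraction` entries; the helper names `mul`, `conj`, and `div` are illustrative and not taken from the text.

```python
from fractions import Fraction

def mul(z1, z2):
    """Definition 4-4: (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z1, z2
    return (a * c - b * d, a * d + b * c)

def conj(z):
    """Definition 4-5: change the sign of the imaginary part."""
    x, y = z
    return (x, -y)

def div(z1, z2):
    """Quotient in the form of Eqn (4-10)."""
    c, d = z2
    denom = c * c + d * d            # the real number c^2 + d^2
    if denom == 0:
        raise ZeroDivisionError("division by the zero complex number")
    re, im = mul(z1, conj(z2))       # (ac + bd) and (bc - ad)
    return (Fraction(re, denom), Fraction(im, denom))

# Product of 2 + 3i and 1 - i.
print(mul((2, 3), (1, -1)))          # (5, 1)

# Quotient of 2 + i by 3 - 2i, then the check z3 * z2 = z1 of Eqn (4-8).
z3 = div((2, 1), (3, -2))
print(z3)                            # (Fraction(4, 13), Fraction(7, 13))
assert mul(z3, (3, -2)) == (2, 1)
```

Using `Fraction` rather than floating point means the final check is an exact equality, mirroring the algebraic argument instead of approximating it.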
With this definition in mind it is easy to show that the following definition of the quotient $z_1/z_2$ is equivalent to Eqn (4-10).

Definition 4-6 (division) The quotient $z_1/z_2$ of the two complex numbers $z_1$ and $z_2$ is the complex number $(z_1\bar{z}_2)/(z_2\bar{z}_2)$.

Using this definition it is a straightforward matter to verify axiom M-5 for complex numbers, provided only that $z_2 \ne 0$.

Example 4-5 We illustrate division by setting $z_1 = 2 + i$ and $z_2 = 3 - 2i$. Now $\bar{z}_2 = 3 + 2i$ and $z_1/z_2 = (z_1\bar{z}_2)/(z_2\bar{z}_2) = (2 + i)(3 + 2i)/(3 - 2i)(3 + 2i)$, whence $z_1/z_2 = (4 + 7i)/13$. By this same method, an equation of the form $2z(2 + i) = 1 + i$ is seen to have the solution $z = (1 + i)/(4 + 2i) = (3 + i)/10$.

On account of the fact that $\bar{z}$ is an ordinary complex number, its general properties are exactly the same as those of any other complex number. Hence the number axioms that apply to $z$, apply equally well to $\bar{z}$. The following specially useful results are easily proved, and are related to the arbitrary complex number $z = x + iy$, to its complex conjugate $\bar{z} = x - iy$ and to the real number $|z|$ associated with $z$ and defined to be $|z| = (x^2 + y^2)^{1/2}$. (See Definition 4-7.)

$$z + \bar{z} = 2\operatorname{Re} z = 2x; \qquad z - \bar{z} = 2i\operatorname{Im} z = 2iy;$$

$$|z| = |\bar{z}|; \qquad \overline{\left(\frac{1}{z}\right)} = \frac{1}{\bar{z}}; \qquad \overline{(z^n)} = (\bar{z})^n; \qquad \left|\frac{z_1}{z_2}\right| = \frac{|z_1|}{|z_2|};$$

$$\overline{(z_1 + z_2 + \cdots + z_n)} = \bar{z}_1 + \bar{z}_2 + \cdots + \bar{z}_n;$$

$$\overline{z_1 z_2 \cdots z_n} = \bar{z}_1 \bar{z}_2 \cdots \bar{z}_n.$$

We now utilize some of these simple properties of the complex conjugate operation to prove an important theorem concerning the roots of a polynomial, and shall then deduce three very useful corollaries. In the process of doing so, we shall take as self-evident the fact that a polynomial $P(z)$ of degree $n$ has $n$ factors of the form $(z - \zeta)$. These are called linear factors because they are of degree 1. The numbers $\zeta$ may, or may not, be complex.

Theorem 4-1 If the $n$th degree polynomial

$$P(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n$$

has its coefficients $a_0, a_1, \ldots, a_n$ real, then if $z = \zeta$ is a zero of $P(z)$, so also is $z = \bar{\zeta}$ a zero of $P(z)$.

Proof Suppose that $z = \zeta$ is a zero of $P(z)$. Then by definition

$$a_0\zeta^n + a_1\zeta^{n-1} + \cdots + a_n = 0.$$

Hence, taking the complex conjugate of this equation we may write

$$\overline{(a_0\zeta^n + a_1\zeta^{n-1} + \cdots + a_n)} = 0.$$

However, the complex conjugate of a sum is the sum of the complex conjugates of the individual terms comprising the sum so that

$$\overline{(a_0\zeta^n + a_1\zeta^{n-1} + \cdots + a_n)} = \overline{a_0\zeta^n} + \overline{a_1\zeta^{n-1}} + \cdots + \bar{a}_n.$$

Now as the $a_r$, $r = 0, 1, \ldots, n$ are real, it follows that $\bar{a}_r = a_r$ and so

$$\overline{a_r\zeta^{n-r}} = a_r\overline{\zeta^{n-r}} = a_r(\bar{\zeta})^{n-r}, \quad\text{for } r = 0, 1, \ldots, n.$$

Hence,

$$a_0\bar{\zeta}^n + a_1\bar{\zeta}^{n-1} + \cdots + a_n = 0;$$

showing that $P(\bar{\zeta}) = 0$. Thus $z = \bar{\zeta}$ is also a zero of $P(z)$.

Paraphrased, Theorem 4-1 asserts that if a polynomial with real coefficients has complex zeros, then they must occur in complex conjugate pairs. As any zero which is not complex must be real, it follows that we may formulate a Corollary to Theorem 4-1.

Corollary 4-1 (a) If a polynomial has real coefficients, then those of its zeros that are not real, occur in complex conjugate pairs.

If $z = \zeta$ and $z = \bar{\zeta}$ represent any pair of complex conjugate zeros in Theorem 4-1, then $(z - \zeta)$ and $(z - \bar{\zeta})$ must both be factors of $P(z)$. Hence their product $(z - \zeta)(z - \bar{\zeta})$ must also be a factor. Now

$$(z - \zeta)(z - \bar{\zeta}) = z^2 - (\zeta + \bar{\zeta})z + \zeta\bar{\zeta},$$

and as $\zeta + \bar{\zeta} = 2\operatorname{Re}\zeta$ is a real number and $\zeta\bar{\zeta} = |\zeta|^2$ is also a real number, it follows that the pair of complex conjugate zeros correspond to a single quadratic factor with real coefficients. Hence Corollary 4-1 (a) may be re-phrased thus:

Corollary 4-1 (b) Any polynomial with real coefficients may always be factorized into a set of factors which are linear or at most quadratic, each of which has real coefficients.
Specifically, if the polynomial is of degree $n$ and there are $m$ pairs of complex conjugate zeros, then there will be $(n - 2m)$ linear factors with real coefficients and $m$ quadratic factors with real coefficients.

Finally, as an obvious consequence of this last corollary:

Corollary 4-1 (c) An odd degree polynomial with real coefficients must have at least one real zero.

The significance of these results is best illustrated by an example which shows how they may often be used to simplify a difficult problem to the point at which the solution may be determined by familiar methods.

Example 4-6 A polynomial $P(z)$ of degree 5 is defined by the relationship

$$P(z) = z^5 + 5z^4 + 10z^3 + 10z^2 + 9z + 5.$$

Given that $z = i$ is a zero, deduce the remaining four zeros and use the result to express $P(z)$ as the simplest possible product of factors having real coefficients.

Solution First, as the coefficients of $P(z)$ are all real, Theorem 4-1 is applicable. Hence if $z = i$ is a zero, then so also is $z = -i$. Thus $(z - i)$ and $(z + i)$ are factors, as is their product $(z - i)(z + i) = z^2 + 1$. Using ordinary long division to divide $P(z)$ by $(z^2 + 1)$ we find that

$$P(z)/(z^2 + 1) = z^3 + 5z^2 + 9z + 5.$$

Hence to find the remaining factors we must now factorize this cubic polynomial. As the degree is odd, and the coefficients are real, Corollary 4-1 (c) applies showing that it must have at least one real zero. At this point we have recourse to trial and error to find the real zero which for the purposes of this example has been made an integer. Thus, setting $Q(z) = z^3 + 5z^2 + 9z + 5$, we must find a value $z = z_1$ such that $Q(z_1) = 0$. By inspection we see that $Q(-1) = 0$ showing that the real zero is $z = -1$. This corresponds to the linear factor with real coefficients $(z + 1)$. Removing the factor $(z + 1)$ from the cubic by long division, we then find that

$$\frac{P(z)}{(z^2 + 1)(z + 1)} = \frac{Q(z)}{(z + 1)} = z^2 + 4z + 5.$$

Finally we apply the standard formula for the roots of a quadratic to this expression to obtain the remaining two zeros. Completing the calculation, these are found to be $z = -2 - i$ and $z = -2 + i$. Thus the five zeros are $z = i$, $z = -i$, $z = -1$, $z = -2 - i$, and $z = -2 + i$. The required factorization is

$$P(z) = (z + 1)(z^2 + 1)(z^2 + 4z + 5).$$

4-3 Complex numbers as vectors

So far we have discussed the basic arithmetic of complex numbers but have not mentioned their vector properties. To do this, and to give a geometrical representation of complex numbers, we plot them as points in a plane called the complex plane or, sometimes, the $z$-plane. Specifically, we shall use the real part of the complex number as its horizontal or $x$-coordinate and the imaginary part of the complex number as its vertical or $y$-coordinate. Thus to each complex number there corresponds just one point in the complex plane and, conversely, to each point in the complex plane there corresponds just one complex number. The relationship between points and complex numbers is one-one. In the complex plane, the $x$-axis is the real axis and the $y$-axis is the imaginary axis. Other accounts of this subject often refer to this geometrical representation of complex numbers as the Argand diagram, in honour of its inventor.

Fig. 4-1 Representation of complex numbers: (a) point representation; (b) vector representation.

In the complex plane, a complex number may either be considered as a point in the plane or, equivalently, as the directed straight line element from the origin to the point in question. We shall remember this dual relationship between points and vectors but, for simplicity, will usually speak only of points in the complex plane. This duality between points and vectors is indicated in Fig.
4-1 where the complex numbers $z = 1$, $z = i$, $z = 2 + i$, $z = 2 - i$, and $z = -1 - \tfrac{1}{2}i$ have been represented as points (Fig. 4-1 (a)) and as vectors (Fig. 4-1 (b)). In the case of the vector representation, arrows have been added to show that the vector is drawn from the origin to the point in question. Notice that if a number, together with its complex conjugate, are plotted in the complex plane, as for example $2 + i$ and $2 - i$ in Fig. 4-1 (a) and (b), then geometrically, in both the point and the vector representations, one is obtainable from the other by reflection in the $x$-axis as though it were a mirror.

Instead of adding and subtracting vectors analytically by use of Definitions 4-2 and 4-3, the same result may be achieved entirely geometrically as we now indicate. Consider the sum of the vectors $z_1 = 2 + i$ and $z_2 = 1 + 2i$. Analytically $z_1 + z_2 = 3 + 3i$, and Fig. 4-2 (a) shows this result. The same result may be obtained geometrically by the following construction. If we wish to add vector $z_2$ to $z_1$, then for the purposes of addition we shall imagine vector $z_2$ to be freed from the origin, so that it is capable of translation anywhere in the complex plane, but we shall assume that wherever we re-locate it in the complex plane it will always be kept parallel to its original position, and its length and sense will be preserved.

Fig. 4-2 Algebraic operations with complex numbers: (a) vector addition: $z_1 + z_2$; (b) vector subtraction: $z_1 - z_2$.

The result of adding $z_2$ to $z_1$ is then achieved by translating $z_2$ in the manner described until its origin is located at the tip of vector $z_1$. The two arrows of vectors $z_1$ and $z_2$ then point in the same direction, and the vector $z_1 + z_2$ is the line element directed from the origin to the tip of the vector $z_2$ in its new position. In Fig. 4-2 (a) this construction is represented by the lower triangle comprising the parallelogram. Such triangles are vector triangles.
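The translation construction can be mimicked numerically. The sketch below is only an illustration: it assumes a vector is stored as a hypothetical `(tail, tip)` pair of Python `complex` values, and `translate` is an invented helper that moves a vector without altering its length, direction, or sense.

```python
def translate(vector, new_tail):
    """Re-locate a vector, given as (tail, tip), so that it starts at new_tail."""
    tail, tip = vector
    return (new_tail, new_tail + (tip - tail))

z1 = 2 + 1j
z2 = 1 + 2j

z1_at_origin = (0j, z1)   # z1 drawn from the origin
z2_at_origin = (0j, z2)   # z2 before it is moved

# Move z2 until its origin sits at the tip of z1.
moved = translate(z2_at_origin, new_tail=z1_at_origin[1])

# The tip of the moved vector is the analytic sum 3 + 3i.
assert moved[1] == z1 + z2
print(moved[1])  # (3+3j)
```

The displacement `tip - tail` is what stays fixed under translation, which is why the tip of the moved vector coincides with the sum given by Definition 4-2.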
A vector not attached to a specific origin or one which, for the purposes of combination with another vector, is freed from its origin to be re-located in some other part of the complex plane will be called a free vector. This is in contrast to a vector that is attached to a definite origin which we shall call a bound vector. In the addition of $z_2$ to $z_1$ that we have just performed, $z_1$ was regarded as a bound vector and $z_2$ as a free vector.

Notice that by the same argument, $z_1$ may be freed and its origin translated to the tip of the bound vector $z_2$ to form the vector $z_2 + z_1$, which is the line element directed from the origin to the tip of vector $z_1$ in its new position. In Fig. 4-2 (a) this construction is represented by the upper triangle comprising the parallelogram. The fact that both constructions give rise to the same line representing on the one hand $z_1 + z_2$, and on the other $z_2 + z_1$, proves that vector addition is commutative, since $z_1 + z_2 = z_2 + z_1$.

Before proceeding with the discussion of subtraction, we first observe that Definition 4-4 implies that multiplication of the bound vector $z$ by $-1$ reverses its direction. That is to say its origin remains fixed, but the line element representing the vector is rotated about the origin through the angle $\pi$. With this remark in mind we see that subtraction of vector $z_2$ from $z_1$ (Fig. 4-2 (b)), is just a special case of addition in which the vector to be added is $-z_2$. The vector $-z_2$ is obtained from $z_2$ by reversing the direction of $z_2$, as is indicated in Fig. 4-2 (b) by the dotted line directed into the fourth quadrant. The vector $z_1 - z_2$ is then the line element directed from the origin to the tip of the reversed vector $z_2$ in its new position. In Fig. 4-2 (b) this construction is shown in the right-hand half of the plane.
The same construction, with the roles of $z_1$ and $z_2$ interchanged, is shown in the left-hand half of the plane and when compared with the first result proves that $z_1 - z_2 = -(z_2 - z_1)$. (Why?)

Thus far, complex numbers have been seen to obey the addition, multiplication, and distributive axioms of real numbers, and the reader might be forgiven for wondering if there is any significant difference between them and the real numbers. The answer is yes. Whereas real numbers can be given a natural order according to their size, complex numbers cannot. A glance at Fig. 4-1 (b) makes it clear that no natural order exists in the field of complex numbers, comprising all numbers in real-imaginary form, since even vectors of the same length may be differently directed, for instance the pairs of vectors $1$ and $i$, and $2 + i$ and $2 - i$. Whereas it makes sense to order the lengths of vectors, since these are scalar quantities and may be so ordered, the vectors themselves have no natural order. To further our argument we now name the length of a vector and introduce a notation whereby it may be manipulated in equations.

Fig. 4-3 Modulus and argument representation.

Definition 4-7 (modulus of a vector) The quantity
$$|z| = (x^2 + y^2)^{1/2}$$
is called the modulus of the vector $z = x + iy$. It is the length of the line element drawn from the origin to the point $(x, y)$ in the complex plane (see Fig. 4-3).

Example 4-7 If $z = 3 + 4i$, then $|z| = (3^2 + 4^2)^{1/2} = 5$.

Notice that in the special case $\operatorname{Im} z = 0$, $|z|$ reduces to the absolute value of a real number since, as always, the positive square root is involved in the definition. The following useful results are easily verified:
$$z\bar{z} = |z|^2; \qquad |z_1 z_2| = |z_1| \cdot |z_2|.$$
If either the upper or lower triangles comprising the parallelogram in Fig.
4-2 (a) are considered, then clearly, when expressed in terms of the modulus, the Euclidean theorem 'the sum of the lengths of any two sides of a triangle exceeds the length of the third side' becomes the following inequality relating moduli:
$$|z_1| + |z_2| \ge |z_1 + z_2|. \tag{4-11}$$
Equality will occur only when $z_1$ and $z_2$ are collinear. For obvious reasons Eqn (4-11) is called the triangle inequality, and it has already been encountered in simple form when we discussed the absolute value of the sum of two real numbers. An analytic proof of result (4-11) is set as a problem at the end of the chapter.

Another useful inequality relating the moduli of the complex numbers $z_1$ and $z_2$ is
$$|z_1 + z_2| \ge \bigl||z_1| - |z_2|\bigr|, \tag{4-12}$$
where again equality occurs only when $z_1$ and $z_2$ are collinear. The proof of this is also left to the reader as a problem.

Example 4-8 If $z_1 = 3 + 4i$ and $z_2 = 4 + 3i$, then $z_1 + z_2 = 7 + 7i$. Hence $|z_1| = (3^2 + 4^2)^{1/2} = 5$, $|z_2| = (4^2 + 3^2)^{1/2} = 5$, and $|z_1 + z_2| = (7^2 + 7^2)^{1/2} = \sqrt{98}$, so that $|z_1| + |z_2| = 10$ and $\bigl||z_1| - |z_2|\bigr| = 0$. We have thus verified inequalities (4-11) and (4-12) in this special case, for they demand that for any $z_1$ and $z_2$
$$\bigl||z_1| - |z_2|\bigr| \le |z_1 + z_2| \le |z_1| + |z_2|,$$
which in this case corresponds to the valid inequality $0 < \sqrt{98} < 10$.

4-4 Modulus-argument form of complex numbers

Referring again to Fig. 4-3, we see that the complex number $z$ need not be specified in the standard form for it may equally well be specified by giving both the value of $|z|$ and the angle $\theta$ which, by convention, is always measured positively in an anti-clockwise direction from the $x$-axis to the line of the vector $z$. The angle $\theta$ is the argument of $z$ and we shall write $\theta = \arg z$. The argument of $z$ is indeterminate with respect to multiples of $2\pi$, because angles $\theta$ and $\theta + 2k\pi$, where $k$ is any integer, will give rise to the same line on Fig. 4-3.
Later we shall see that this indeterminacy in $\theta$ plays an important role in the determination of the roots of complex numbers. When $\theta = \arg z$ is restricted to the interval $-\pi < \theta \le \pi$, it will be termed the principal value of the argument.

If we define the real number $r$ by the equation $r = |z|$, and still set $\theta = \arg z$, then the ordered number pair $(r, \theta)$ describes the polar coordinates of the point $z$ in Fig. 4-3. That is, the radial distance of a point from the origin together with its bearing measured from a fixed line through the origin. The relationship between the Cartesian coordinates $(x, y)$ and the polar coordinates $(r, \theta)$ of the same complex number $z$ is immediate, since from Fig. 4-3 we have
$$x = r\cos\theta, \qquad y = r\sin\theta \tag{4-13}$$
or, equivalently,
$$r = (x^2 + y^2)^{1/2}, \qquad \cos\theta = \frac{x}{r}, \qquad \sin\theta = \frac{y}{r}. \tag{4-14}$$
Thus the complex number, or vector, $z = x + iy$ may also be written in the modulus-argument form
$$z = r(\cos\theta + i\sin\theta). \tag{4-15}$$
Because $\arg z$ is indeterminate up to an angle $2k\pi$, we must phrase our definition of equality between two complex numbers carefully when it is to refer to complex numbers expressed in modulus-argument form.

Definition 4-8 The two numbers $z_1 = r(\cos\theta + i\sin\theta)$ and $z_2 = \rho(\cos\phi + i\sin\phi)$ expressed in modulus-argument form will be said to be equal if, and only if, $r = \rho$ and $\theta = \phi + 2k\pi$ for some integer $k$.

Equations (4-13) and (4-14) enable immediate interchange between the modulus-argument and the real-imaginary forms of $z$, as the following examples indicate.

Example 4-9
(a) Express $z = -4\sqrt{3} + 4i$ in modulus-argument form;
(b) Express $z = 2 + 5i$ in modulus-argument form;
(c) If $|z| = 3$ and $\arg z = -\pi/10$, express $z$ in real-imaginary form.

Solution (a) From Eqn (4-14), $r = |z| = [(-4\sqrt{3})^2 + 4^2]^{1/2} = 8$, whilst $\cos\theta = -(4\sqrt{3})/8 = -\sqrt{3}/2$ and $\sin\theta = 4/8 = \tfrac{1}{2}$, from which we deduce that the principal value of $\theta$ must lie in the second quadrant with $\theta = \arg z = 5\pi/6$. Hence, in modulus-argument form
$$z = 8\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right).$$
Notice that although we could have written $\theta = \arg z = \arctan(-1/\sqrt{3})$, it would not then have been clear in which quadrant $\theta$ must lie, and, consequently, we shall always specify $\sin\theta$ and $\cos\theta$ separately.

Solution (b) Again from Eqn (4-14), $r = |z| = (2^2 + 5^2)^{1/2} = \sqrt{29}$, whilst this time $\cos\theta = 2/\sqrt{29}$ and $\sin\theta = 5/\sqrt{29}$, from which we deduce that the principal value of $\theta$ must lie in the first quadrant with $\theta = \arg z = 1.1903$ rad. Hence, in modulus-argument form
$$z = \sqrt{29}\,(\cos 1.1903 + i\sin 1.1903).$$

Solution (c) The result is immediate, since Eqn (4-15) gives
$$z = 3\left\{\cos\left(-\frac{\pi}{10}\right) + i\sin\left(-\frac{\pi}{10}\right)\right\} = 2.8532 - 0.9271i.$$

We now examine the consequences of multiplication and division for complex numbers expressed in modulus-argument form. Let $z_1$ and $z_2$ be the two complex numbers:
$$z_1 = r_1(\cos\theta_1 + i\sin\theta_1) \quad\text{and}\quad z_2 = r_2(\cos\theta_2 + i\sin\theta_2). \tag{4-16}$$
Then by direct multiplication we find that
$$z_1 z_2 = r_1 r_2[(\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)],$$
and using the trigonometric identities for $\cos(\theta_1 + \theta_2)$ and $\sin(\theta_1 + \theta_2)$ this may be written as
$$z_1 z_2 = r_1 r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)]. \tag{4-17}$$
We have thus proved that the result of the product $z_1 z_2$ is a complex number with modulus $|z_1 z_2| = r_1 r_2$ and argument $\arg(z_1 z_2) = \theta_1 + \theta_2 = \arg z_1 + \arg z_2$. Thus the result of multiplying two complex numbers is to produce a complex number whose modulus is the product of the two separate moduli and whose argument is the sum of the two separate arguments (see Fig. 4-4).

A special case results if we write
$$i = \cos\tfrac{1}{2}\pi + i\sin\tfrac{1}{2}\pi. \tag{4-18}$$
It follows that in the $z$-plane, multiplication by $i$ corresponds geometrically to an anti-clockwise rotation through $\tfrac{1}{2}\pi$ without any change of size. To illustrate this, the vectors $iz_1$ and $iz_2$ have been added to Fig. 4-4.

Fig. 4-4 Multiplication and division; $z_1 z_2$, $z_1/z_2$.
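The product rule of Eqn (4-17) and the rotation produced by multiplication by $i$ can be confirmed numerically. The sketch below uses Python's standard `cmath` and `math` modules; the sample values of $z_1$ and $z_2$ and the helper `same_angle` are illustrative assumptions, not part of the text.

```python
import cmath
import math

def same_angle(a: float, b: float) -> bool:
    """True when the angles a and b differ by a whole multiple of 2*pi
    (arguments are only determined up to such multiples)."""
    return math.isclose(math.remainder(a - b, 2 * math.pi), 0.0, abs_tol=1e-12)

# Illustrative sample values.
z1 = 2 + 1j
z2 = 1 + 2j
r1, theta1 = cmath.polar(z1)  # modulus and principal argument of z1
r2, theta2 = cmath.polar(z2)

product = z1 * z2
# Eqn (4-17): moduli multiply and arguments add.
product_ok = (math.isclose(abs(product), r1 * r2)
              and same_angle(cmath.phase(product), theta1 + theta2))

rotated = 1j * z1
# Eqn (4-18): multiplying by i rotates through pi/2 without change of size.
rotation_ok = (math.isclose(abs(rotated), r1)
               and same_angle(cmath.phase(rotated), theta1 + math.pi / 2))

print(product_ok, rotation_ok)
```

Comparing angles modulo $2\pi$ rather than directly reflects the indeterminacy of the argument; `cmath.phase` itself always returns a principal value.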
By repeated application of Eqn (4-17) it is easily proved that if $z_m = r_m(\cos\theta_m + i\sin\theta_m)$ for $m = 1, 2, \ldots, n$, then
$$z_1 z_2 \cdots z_n = r_1 r_2 \cdots r_n[\cos(\theta_1 + \theta_2 + \cdots + \theta_n) + i\sin(\theta_1 + \theta_2 + \cdots + \theta_n)]. \tag{4-19}$$
An argument essentially similar to that which gave rise to Eqn (4-17), but this time using the trigonometric identities for $\cos(\theta_1 - \theta_2)$ and $\sin(\theta_1 - \theta_2)$, establishes that whenever $z_2 \ne 0$, then with the same notation we have
$$\frac{z_1}{z_2} = \frac{r_1}{r_2}[\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)]. \tag{4-20}$$
Obviously $|z_1/z_2| = r_1/r_2 = |z_1|/|z_2|$ and $\arg(z_1/z_2) = \theta_1 - \theta_2 = \arg z_1 - \arg z_2$. Expressed in words, this says that the result of dividing two complex numbers is to produce a complex number whose modulus is the quotient of the separate moduli and whose argument is the difference of the two separate arguments.

A most important special case of Eqn (4-19) occurs when all the $z_1, z_2, \ldots, z_n$ are equal to the same complex number $z = r(\cos\theta + i\sin\theta)$, say. The result then becomes
$$z^n = r^n(\cos n\theta + i\sin n\theta).$$
Substituting for $z$ and cancelling a real factor $r^n$, we obtain the following important theorem.

Theorem 4-2 (de Moivre's Theorem)
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta.$$

A more subtle argument would have yielded the fact that this remarkable result is true for all real values of $n$, and not just for the integral values utilized in our proof. This will be undertaken later when the complex exponential function has been discussed.

Theorem 4-2 provides a simple method by which certain forms of trigonometric identity may be established. One typical example is enough to illustrate this.

Example 4-10 Let us relate $\sin 4\theta$ and $\cos 4\theta$ to sums of powers of $\sin\theta$ and $\cos\theta$. Set $n = 4$ in Theorem 4-2 and expand the left-hand side by the binomial theorem, using the fact that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, etc., to obtain
$$\cos^4\theta + 4i\cos^3\theta\sin\theta - 6\cos^2\theta\sin^2\theta - 4i\cos\theta\sin^3\theta + \sin^4\theta = \cos 4\theta + i\sin 4\theta.$$
Then, recalling that equality of complex numbers means equality of their real and imaginary parts considered separately, we have the two results: equality of real parts
$$\cos^4\theta - 6\cos^2\theta\sin^2\theta + \sin^4\theta = \cos 4\theta,$$
and equality of imaginary parts
$$4(\cos^3\theta\sin\theta - \cos\theta\sin^3\theta) = \sin 4\theta.$$
These are the desired results. It is characteristic of complex numbers that any single complex equality implies two real equalities, and even if only one is sought the other will be generated automatically. The same method works for any positive integral value of $n$ when it will connect $\sin n\theta$ and $\cos n\theta$ with sums of powers of $\sin\theta$ and $\cos\theta$. We shall return to this idea in connection with the exponential function, and show that it is possible to use de Moivre's theorem to express $\sin^n\theta$ and $\cos^n\theta$ in terms of sums involving $\sin r\theta$ and $\cos r\theta$.

Sometimes Theorem 4-2 can be used to reduce the labour of computation as now shown.

Example 4-11 We shall evaluate $z^{10}$ where $z = 1 + i$. Rather than making repeated multiplications, or applying the binomial theorem, we write $z$ in modulus-argument form as $z = \sqrt{2}(\cos \pi/4 + i\sin \pi/4)$, when we have
$$z^{10} = (\sqrt{2})^{10}(\cos \pi/4 + i\sin \pi/4)^{10}.$$
By de Moivre's theorem this becomes
$$z^{10} = 2^5\left(\cos\frac{10\pi}{4} + i\sin\frac{10\pi}{4}\right) = 32i.$$

4-5 Roots of complex numbers

When performing algebra on real numbers the idea of the root of a number plays a fundamental part. The same is true when manipulating complex numbers, and we now discuss the general ideas involved in determining their roots.

Let $p/q$ be any rational number, where $p$ and $q$ are integers with $q$ supposed positive. We shall assume that $p$ and $q$ have no common factor.

Definition 4-9 We define $z^{p/q}$ by saying that:
$$w = z^{p/q} \iff w^q = z^p.$$

Let
$$w = \rho(\cos\phi + i\sin\phi) \quad\text{and}\quad z = r(\cos\theta + i\sin\theta). \tag{4-21}$$
Then from Definition 4-9 and de Moivre's theorem we have $\rho^q(\cos q\phi + i\sin q\phi) = r^p(\cos p\theta + i\sin p\theta)$.
(4-22)
Now from Definition 4-8 it follows that
$$\rho^q = r^p \quad\text{and}\quad q\phi = p\theta + 2k\pi, \tag{4-23}$$
and so
$$\rho = r^{p/q} \quad\text{and}\quad \phi = \frac{p\theta + 2k\pi}{q}. \tag{4-24}$$
The expressions $w = z^{p/q}$ thus have the general form
$$w = z^{p/q} = r^{p/q}\left[\cos\left(\frac{p\theta + 2k\pi}{q}\right) + i\sin\left(\frac{p\theta + 2k\pi}{q}\right)\right], \tag{4-25}$$
with $k$ an integer.

It is easily seen that only $q$ different values $w_0, w_1, w_2, \ldots, w_{q-1}$ of $w$ will result from Eqn (4-25) as the integer $k$ increases through successive integral values. It is usual to give $k$ the $q$ successive values $k = 0, 1, 2, \ldots, q - 1$. If $k$ is allowed to increase beyond the value $q - 1$, then the numbers $w_0, w_1, \ldots, w_{q-1}$ will simply be generated again because of the periodicity properties of the sine and cosine functions.

Example 4-12 We illustrate the use of Eqn (4-25) by determining the $n$ numbers $w$ satisfying the equation $w = (1)^{1/n}$. For obvious reasons these are called the $n$th roots of unity. Comparing this equation with the general expression $w = z^{p/q}$ that has just been discussed we see that we must make the identifications $z = 1$, $p = 1$, and $q = n$. To proceed further we must write the number unity in its modulus-argument form
$$1 = 1 \cdot (\cos 0 + i\sin 0),$$
so that comparing this with $z$ in Eqn (4-21) we see that the further identifications $r = 1$ and $\theta = 0$ must be made. Substitution of these quantities into Eqn (4-25) then gives the result
$$w_k = \cos\frac{2k\pi}{n} + i\sin\frac{2k\pi}{n},$$
with $k = 0, 1, 2, \ldots, n - 1$.

The result of this calculation with $n = 5$, for example, is to generate the fifth roots of unity. In Fig. 4-5 these roots are plotted as the numbers $w_0, w_1, \ldots, w_4$ in the complex plane. They are uniformly distributed around the unit circle centred on the origin. By making use of the vector properties of complex numbers we shall usually represent this circle by the convenient notation $|z| = 1$. (Why?)

Fig. 4-5 Fifth roots of unity.

Fig. 4-6 Roots of $w = (1 + i)^{2/3}$.

Example 4-13 As a slightly more general example we now determine $z^{2/3}$, when $z = 1 + i$. In this case $p$
$= 2$, $q = 3$, and in modulus-argument form, $z = \sqrt{2}(\cos \pi/4 + i\sin \pi/4)$ showing that $r = \sqrt{2}$ and $\theta = \pi/4$. Substitution into Eqn (4-25) gives
$$w = 2^{1/3}\left[\cos\left(\frac{1 + 4k}{6}\right)\pi + i\sin\left(\frac{1 + 4k}{6}\right)\pi\right],$$
with $k = 0, 1, 2$. The three roots $w_0$, $w_1$, and $w_2$ are thus:

$(k = 0)$: $w_0 = 2^{1/3}\left(\cos\dfrac{\pi}{6} + i\sin\dfrac{\pi}{6}\right) = 2^{1/3}\left(\dfrac{\sqrt{3}}{2} + \dfrac{i}{2}\right)$,
$(k = 1)$: $w_1 = 2^{1/3}\left(\cos\dfrac{5\pi}{6} + i\sin\dfrac{5\pi}{6}\right) = 2^{1/3}\left(-\dfrac{\sqrt{3}}{2} + \dfrac{i}{2}\right)$,
$(k = 2)$: $w_2 = 2^{1/3}\left(\cos\dfrac{3\pi}{2} + i\sin\dfrac{3\pi}{2}\right) = -2^{1/3}i$.

These are plotted in the complex plane in Fig. 4-6, where they are seen to be uniformly distributed around the circle $|z| = 2^{1/3}$.

Example 4-14 As a final example let us find the roots of the equation $w = i^{-1/3}$. In terms of the notation of Eqns (4-21) and (4-25), and recalling that we have agreed always to take $q$ as positive, we have $p = -1$, $q = 3$, and $z = i$. Now in modulus-argument form
$$i = 1 \cdot \left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right),$$
so that $r = 1$ and $\theta = \pi/2$. Hence, substituting into Eqn (4-25), we find that
$$w = \cos\left[\frac{(-\pi/2) + 2k\pi}{3}\right] + i\sin\left[\frac{(-\pi/2) + 2k\pi}{3}\right],$$
with $k = 0, 1, 2$. Hence the three roots $w_0$, $w_1$, and $w_2$ are:

$(k = 0)$: $w_0 = (\cos \pi/6 - i\sin \pi/6) = \tfrac{1}{2}(\sqrt{3} - i)$,
$(k = 1)$: $w_1 = (\cos \pi/2 + i\sin \pi/2) = i$,
$(k = 2)$: $w_2 = (\cos 7\pi/6 + i\sin 7\pi/6) = -\tfrac{1}{2}(\sqrt{3} + i)$.

This completes our preliminary encounter with complex numbers, and our study will be resumed later in connection with the complex exponential function and with functions of a complex variable. The remainder of this chapter is devoted to developing the foundations of our study of general vectors.

4-6 Introduction to space vectors

It is clear that any set of vector quantities that do not all lie in a plane cannot be represented vectorially in the form of complex numbers. For example, even the vectors describing the velocity of a vehicle as it is driven at constant speed past fixed points on a winding hill could not be so represented.
Pairwise these velocity vectors define planes, and so could be represented by complex numbers in those planes, though different pairs of vectors would define different planes, thereby making any general representation impossible in terms of complex numbers.

The trouble here is not hard to find. It is that complex numbers just happen to be capable of representation as planar vectors with their own appropriate descriptive language, and they were not developed with general vector representation in mind. In short, they are complex numbers first and vectors second; not the other way around.

To overcome this limitation and to be able to describe arbitrary vector quantities we must preserve the idea of a vector as a directed length, but re-think its description. This is best achieved using a diagram, so consider Fig. 4-7 which depicts the mutually perpendicular Cartesian axes $O\{x, y, z\}$ with origin $O$. In more mathematical terms we describe these axes as being mutually orthogonal. This is a technical term that in a geometrical context has the same meaning as perpendicular, though it is often used in a wider sense, when the word perpendicular would be inappropriate. Henceforth we shall almost always use the term orthogonal.

Fig. 4-7 Right-handed Cartesian axes.

The manner of identification of the $x$, $y$, and $z$ coordinate axes is not arbitrary, but is made in such a manner that they form a right-handed system of axes. By this we mean that having assigned axes for the variables $x$ and $y$, together with the directions in which they increase positively, the direction of positive $z$ is then chosen to be that in which a right-handed screw would advance were it aligned with the third axis and rotated in the sense $x$ to $y$. This sense of rotation is indicated in Fig. 4-7 by means of a directed spiral about the $z$-axis.
In the diagram the $y$- and $z$-axes are supposed to lie in the plane of the paper with the $x$-axis pointing out of the paper towards the viewer. Later we shall refer to this right-handed property in connection with axes which are not orthogonal, when right-handedness is still to be interpreted in exactly the same sense as above.

This right-handed property of the system of axes is shared by each pair of axes in turn, provided the senses of rotation are appropriately defined. The following table describes the convention that is always adopted.

Table 4-1 Right-handed axes

    Rotate from    Rotate to    R-H screw advances in direction of positive
    x              y            z
    y              z            x
    z              x            y

The table can easily be remembered in the concise form

    x y z
    y z x
    z x y

where the entry in any row is obtained from the entry in the row above by transferring the first letter of that entry to the last position. These entries are called cyclic permutations of the letters $x$, $y$, and $z$, and further cyclic permutations will simply regenerate the table. These rules describe the right-handed symmetry of the $O\{x, y, z\}$ axes. If any two letters in an entry are interchanged, then by the same rule, the negative direction of the third axis is defined. Hence the set of letters $y\ x\ z$ are to be interpreted 'rotate from $y$ to $x$ to make a right-handed screw aligned with the $z$-axis advance in the direction of negative $z$'.

If in the above argument a right-handed screw motion had been replaced by a left-handed screw motion, then a left-handed system of axes would have resulted. Although a left-handed system of axes is in all respects equivalent to a right-handed system for the purposes of vector representation, it is customary to work with right-handed systems.

Let $P$ be the point with coordinates $x = a_1$, $y = a_2$, and $z = a_3$ illustrated in Fig. 4-7.
We shall denote it by the more concise notation $(a_1, a_2, a_3)$ where the first, second, and third entries in this ordered number triple represent the $x$, $y$, and $z$ coordinates, respectively. Then from the point of view of coordinate geometry it is the point $P$ that is of interest, whereas from the point of view of vectors it is the directed line element from $O$ to $P$ that is of interest. To signify that it is the vector quantity that interests us here we shall write $\mathbf{OP}$. Notice that by this convention the vector $\mathbf{PO}$ is the directed line from $P$ to $O$ and is opposite in sense to $\mathbf{OP}$. In future we will denote the length of the vector $\mathbf{OP}$ by $|\mathbf{OP}|$, which is a scalar, and by definition this length will always be positive.

In Fig. 4-7 the lengths $OA = a_1$, $OB = a_2$, and $OC = a_3$ are called the orthogonal projections of $\mathbf{OP}$ onto the $x$-, $y$-, and $z$-axes, and a simple application of Pythagoras' theorem gives the result
$$|\mathbf{OP}|^2 = (OA)^2 + (OB)^2 + (OC)^2$$
or,
$$|\mathbf{OP}|^2 = a_1^2 + a_2^2 + a_3^2.$$
Dividing by $|\mathbf{OP}|^2$ this becomes
$$1 = \left(\frac{a_1}{|\mathbf{OP}|}\right)^2 + \left(\frac{a_2}{|\mathbf{OP}|}\right)^2 + \left(\frac{a_3}{|\mathbf{OP}|}\right)^2,$$
which, if $\theta_1$, $\theta_2$, and $\theta_3$ denote the angles that $\mathbf{OP}$ makes with the $x$-, $y$-, and $z$-axes, so that $\cos\theta_1 = a_1/|\mathbf{OP}|$ and so on, can then be rewritten as
$$1 = \cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3. \tag{4-26}$$
If the numbers $l$, $m$, and $n$ are defined by the relations
$$l = \cos\theta_1, \qquad m = \cos\theta_2, \qquad n = \cos\theta_3, \tag{4-27}$$
then Eqn (4-26) becomes
$$1 = l^2 + m^2 + n^2. \tag{4-28}$$
For obvious reasons $l$, $m$, and $n$ are called the direction cosines of $\mathbf{OP}$ with respect to the axes $O\{x, y, z\}$ and it is often convenient to write them in the form of an ordered number triple as $\{l, m, n\}$. The angles $\theta_1$, $\theta_2$, and $\theta_3$ are indeterminate to within a multiple of $2\pi$ and, by convention, they will always be taken to lie in the interval $[0, \pi]$.

Consider the direction cosines $l$, $m$, $n$ as defining a point $P'$ in space with coordinates $x = l$, $y = m$, and $z = n$, then, by Pythagoras' theorem and Eqn (4-28), the vector $\mathbf{OP'}$ must have unit length. The direction and sense of $\mathbf{OP'}$ are the same as those of $\mathbf{OP}$; only the lengths are different.
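Equations (4-26) to (4-28) translate directly into a short computation. The following is a minimal sketch in Python; the function name `direction_cosines` and the sample point $(2, 3, 6)$ are assumptions introduced purely for illustration.

```python
import math

def direction_cosines(a1: float, a2: float, a3: float):
    """Return (l, m, n) for the vector from the origin to (a1, a2, a3),
    obtained by dividing each coordinate by the length |OP|."""
    length = math.sqrt(a1 ** 2 + a2 ** 2 + a3 ** 2)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return a1 / length, a2 / length, a3 / length

# Hypothetical sample point; its length is 7, so l, m, n are 2/7, 3/7, 6/7.
l, m, n = direction_cosines(2.0, 3.0, 6.0)

# math.acos returns values in [0, pi], matching the convention for the angles.
theta1, theta2, theta3 = (math.acos(c) for c in (l, m, n))

# Eqn (4-28): the squares of the direction cosines sum to 1.
sum_of_squares = l ** 2 + m ** 2 + n ** 2
print(l, m, n, sum_of_squares)
```

The triple `(l, m, n)` is itself the tip of a vector of unit length with the same direction and sense as the original.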
Vectors of unit length in given directions prove to be extremely useful in vector analysis so they are appropriately called unit vectors.

Now by definition, the direction cosines $l$, $m$, $n$ are proportional to the coordinates $a_1$, $a_2$, $a_3$ of the point $P$ and consequently the numbers $a_1$, $a_2$, and $a_3$ are often called the direction ratios of $\mathbf{OP}$. To convert direction ratios to direction cosines it is necessary to normalize them by dividing by the square root of the sum of the squares of the direction ratios. This is, of course, equivalent to division by the quantity we have agreed to denote by $|\mathbf{OP}|$.

Example 4-15 Find the direction ratios, the direction cosines and the angles $\theta_1$, $\theta_2$, and $\theta_3$ of the vector $\mathbf{OP}$, where $P$ is the point $(1, -2, 4)$.

Solution The direction ratios are $1$, $-2$, $4$, and $|\mathbf{OP}|$, which is the square root of the sum of the squares of the direction ratios, is $|\mathbf{OP}| = (1^2 + (-2)^2 + 4^2)^{1/2} = \sqrt{21}$. Hence the direction cosines of $\mathbf{OP}$ are $l = 1/\sqrt{21}$, $m = -2/\sqrt{21}$, and $n = 4/\sqrt{21}$, from which the angles $\theta_1$, $\theta_2$, and $\theta_3$ are seen to be 1.351, 2.022, and 0.510 radians, respectively. Unless otherwise stated we shall always express angles in terms of radians, as here.

Example 4-16 Determine the angles of inclination $\theta_1$, $\theta_2$, and $\theta_3$ of a vector to the $x$-, $y$-, and $z$-axes, respectively, given that its direction cosines are: (a) $\{\tfrac{1}{2}, -\sqrt{3}/2, 0\}$, (b) $\{\tfrac{1}{2}, \tfrac{1}{4}, \sqrt{11}/4\}$.

Solution (a) Here $l = \cos\theta_1 = 1/2$, $m = \cos\theta_2 = -\sqrt{3}/2$, $n = \cos\theta_3 = 0$, so that $\theta_1 = \pi/3$, $\theta_2 = 5\pi/6$, and $\theta_3 = \pi/2$. Hence in this case the vector lies entirely in the $(x, y)$-plane.

Solution (b) In this case, $l = \cos\theta_1 = 1/2$, $m = \cos\theta_2 = 1/4$, $n = \cos\theta_3 = \sqrt{11}/4$, so that $\theta_1 = \pi/3$, $\theta_2 = 1.318$, and $\theta_3 = 0.593$.

Example 4-17 If a vector has direction cosines $\{\tfrac{1}{2}, m, \tfrac{1}{2}\}$ deduce the possible values of $m$. If, in addition, it is stated that the vector makes an obtuse angle $\theta_2$ with the $y$-axis determine the value of $\theta_2$.
Solution We use Eqn (4-28), setting $l = \tfrac{1}{2}$ and $n = \tfrac{1}{2}$ to obtain
$$(\tfrac{1}{2})^2 + m^2 + (\tfrac{1}{2})^2 = 1.$$
Whence, $m^2 = 1/2$ or $m = \pm 1/\sqrt{2}$. These values of $m$ correspond to $\theta_2 = \pi/4$ for $m = 1/\sqrt{2}$, and to $\theta_2 = 3\pi/4$ for $m = -1/\sqrt{2}$. As the angle $\theta_2$ is required to be obtuse we must select $\theta_2 = 3\pi/4$.

The idea of a fixed origin is fundamental to coordinate geometry though it proves to be rather too restrictive in vector analysis. This is because it is only the magnitude, direction, and sense of a vector that usually matter, and not the choice of origin and coordinate system in which the vector is represented. For example, when specifying a wind velocity it is normally sufficient to say 20 ft/s due East, without identifying the particular points in space at which the air has this velocity.

In vector work this ambiguity as to the location of a vector in space is allowed by considering as equivalent any two vectors that may be represented by directed line elements of equal length which are parallel, and have the same sense.

Fig. 4-8 Translation of axes without rotation.

In Fig. 4-8 we have depicted two vectors $\mathbf{OP}$ and $\mathbf{O'P'}$ that are equivalent in the sense just defined. Another way to define this equivalence is to require that when the axes $O\{x, y, z\}$ are translated, without rotation, to the position $O'\{x', y', z'\}$, the coordinates of $P'$ with respect to the axes through $O'$ are the same as those of $P$ with respect to the axes through $O$. That is, if $P$ is the point $(a_1, a_2, a_3)$ in the system of axes $O\{x, y, z\}$, then $P'$ is the point $(a_1, a_2, a_3)$ in the system of axes $O'\{x', y', z'\}$. Do not get confused by this. If $O'$ is the point $(\alpha_1, \alpha_2, \alpha_3)$ with respect to $O\{x, y, z\}$, then coordinates in the unprimed system are related to those in the primed system by the equations $x = \alpha_1 + x'$, $y = \alpha_2 + y'$, and $z = \alpha_3 + z'$.
This freedom to translate vectors now enables us to give direction cosines to any vector in space and not just to those having their base at $O$. Suppose, for example, that we require the length and direction cosines of the vector $\mathbf{AB}$, where $A$ is the point $(a_1, a_2, a_3)$ and $B$ is the point $(b_1, b_2, b_3)$ when expressed relative to some set of axes $O\{x, y, z\}$. Then we see at once that the lengths of the projections of $\mathbf{AB}$ on the $x$, $y$, and $z$ axes are $(b_1 - a_1)$, $(b_2 - a_2)$, and $(b_3 - a_3)$, respectively. Accordingly, by translating the vector $\mathbf{AB}$ until $A$ in its new position $A'$ coincides with $O$, we see that the tip $B$ in its new position $B'$ must be the point $((b_1 - a_1), (b_2 - a_2), (b_3 - a_3))$ (see Fig. 4-9).

Fig. 4-9 Translation of a vector.

Hence $|\mathbf{AB}|$, that is the length of $\mathbf{AB}$, is
$$|\mathbf{AB}| = [(b_1 - a_1)^2 + (b_2 - a_2)^2 + (b_3 - a_3)^2]^{1/2}. \tag{4-29}$$
The direction cosines of $\mathbf{AB}$ then follow as before and are
$$l = \frac{b_1 - a_1}{|\mathbf{AB}|}, \qquad m = \frac{b_2 - a_2}{|\mathbf{AB}|}, \qquad n = \frac{b_3 - a_3}{|\mathbf{AB}|}. \tag{4-30}$$

Example 4-18 Find $|\mathbf{AB}|$ and the direction cosines of the vector $\mathbf{AB}$, if $A$ has coordinates $(1, 2, 3)$ and $B$ the coordinates $(4, 3, 6)$.

Solution From Eqn (4-29) we see that $|\mathbf{AB}| = [(4 - 1)^2 + (3 - 2)^2 + (6 - 3)^2]^{1/2} = \sqrt{19}$, whilst from Eqn (4-30) it follows that $l = 3/\sqrt{19}$, $m = 1/\sqrt{19}$, and $n = 3/\sqrt{19}$.

It is now convenient to introduce a triad of unit vectors, denoted by $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$, that are parallel to and are directed in the positive senses of the $x$-, $y$-, and $z$-axes, respectively. Here we remind the reader that these are called unit vectors because they are each of unit length on the $x$-, $y$-, and $z$-length scales. Notice that the term right-handed that was applied to the system of axes $O\{x, y, z\}$ also applies to the triad of vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ when taken in this order. We shall use this idea again later.
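Equations (4-29) and (4-30) can be sketched as a small routine and applied to the points of Example 4-18. The function names below are illustrative assumptions; only the formulas themselves come from the text.

```python
import math

def vector_between(a, b):
    """Components (b1 - a1, b2 - a2, b3 - a3) of the vector from A to B."""
    return tuple(bi - ai for ai, bi in zip(a, b))

def length_and_direction_cosines(a, b):
    """Length |AB| by Eqn (4-29) and direction cosines (l, m, n) by Eqn (4-30)."""
    ab = vector_between(a, b)
    length = math.sqrt(sum(c * c for c in ab))
    if length == 0:
        raise ValueError("A and B coincide, so AB has no direction")
    return length, tuple(c / length for c in ab)

# Points of Example 4-18.
A = (1, 2, 3)
B = (4, 3, 6)
length, (l, m, n) = length_and_direction_cosines(A, B)

print(length)   # equals the square root of 19
print(l, m, n)  # equal to 3, 1 and 3, each divided by the square root of 19
```

Because only the differences of coordinates enter, translating both points by the same amount leaves the length and the direction cosines unchanged.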
An arbitrary vector in any one of the $\mathbf{i}$, $\mathbf{j}$, or $\mathbf{k}$ directions may then be obtained by scaling the length of the appropriate unit vector by a multiplication factor $\mu$. Thus a vector three times the size of the unit vector $\mathbf{i}$ will be written $3\mathbf{i}$, whilst a vector twice the size of the unit vector $\mathbf{k}$, but oppositely directed, will be written $-2\mathbf{k}$.

Returning to Fig. 4-7 we see that in terms of $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$, the vectors $\mathbf{OA}$, $\mathbf{OB}$, and $\mathbf{OC}$ may be written as $\mathbf{OA} = a_1\mathbf{i}$, $\mathbf{OB} = a_2\mathbf{j}$, $\mathbf{OC} = a_3\mathbf{k}$. From our ideas of vector addition in a plane the vector $\mathbf{OQ}$ lying in the $(x, y)$-plane is $\mathbf{OQ} = \mathbf{OA} + \mathbf{AQ}$ or, because vectors may be translated, $\mathbf{OQ} = \mathbf{OA} + \mathbf{OB}$. Now in terms of our unit vector notation this may be written $\mathbf{OQ} = a_1\mathbf{i} + a_2\mathbf{j}$.

Turning attention to the plane containing points $O$, $Q$, and $P$, we see that by the same argument $\mathbf{OP} = \mathbf{OQ} + \mathbf{QP}$. Again, because vectors may be translated, $\mathbf{QP} = \mathbf{OC}$ so that finally, on substituting for $\mathbf{OQ}$ and $\mathbf{QP}$ in the equation $\mathbf{OP} = \mathbf{OQ} + \mathbf{QP}$, we obtain
$$\mathbf{OP} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}. \tag{4-31}$$
For ease of notation, arbitrary vectors, like unit vectors, will usually be denoted by a single symbol such as $\mathbf{a}$, $\boldsymbol{\alpha}$, or $\mathbf{r}$. Thus a general point $P$ in space with coordinates $(x, y, z)$ will often be written
$$\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}. \tag{4-32}$$
The almost universally accepted convention which we adopt here is to denote vector quantities by bold face type and scalar quantities by italic type. Because a vector such as that in Eqn (4-32) identifies a point $P$ in space it is called a position vector.

In the vector representation Eqn (4-31) the numbers $a_1$, $a_2$, and $a_3$ are called the components of $\mathbf{OP}$. Two vectors will only be said to be equal if, when written in the form of Eqn (4-31), their corresponding components are equal. The vector $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ will be said to be a scalar multiple $\lambda$ of vector $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$, and we will write $\mathbf{a} = \lambda\mathbf{b}$ if, and only if, $a_1 = \lambda b_1$, $a_2 = \lambda b_2$, and $a_3 = \lambda b_3$.
In the special case $\lambda = -1$ we have $\mathbf{a} = -\mathbf{b}$, showing that $|\mathbf{a}| = |\mathbf{b}|$, but that the senses of $\mathbf{a}$ and $\mathbf{b}$ are opposite. Thus in Fig. 4-7 we have $\mathbf{OP} = -\mathbf{PO}$. The zero or null vector is the vector whose three components are each identically zero. It is often denoted by the ordinary symbol $0$ instead of the bold-face $\mathbf{0}$, since confusion is unlikely to arise on account of this simplification of the notation.

Following on from our first ideas of vectors, and in accordance with the derivation of Eqn (4-31), we now define the operations of addition and subtraction of vectors.

Definition 4-10 Let $\mathbf{a}$ and $\mathbf{b}$ be arbitrary vectors with components $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$, respectively, so that they may be written $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. Then we define the sum $\mathbf{a} + \mathbf{b}$ of the two vectors $\mathbf{a}$ and $\mathbf{b}$ to be the vector
$$(a_1 + b_1)\mathbf{i} + (a_2 + b_2)\mathbf{j} + (a_3 + b_3)\mathbf{k}.$$
The difference $\mathbf{a} - \mathbf{b}$ of the two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined to be the vector
$$(a_1 - b_1)\mathbf{i} + (a_2 - b_2)\mathbf{j} + (a_3 - b_3)\mathbf{k}.$$

Because real numbers are commutative with respect to addition, it follows directly from this definition that the operation of vector addition is commutative. That is we have $\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}$ for all vectors $\mathbf{a}$ and $\mathbf{b}$. When the subtraction operation is considered the properties of real numbers imply the result $\mathbf{a} - \mathbf{b} = -(\mathbf{b} - \mathbf{a})$ for all vectors $\mathbf{a}$ and $\mathbf{b}$.

Example 4-19 If $\mathbf{a} = \mathbf{i} + \mathbf{j} + 2\mathbf{k}$ and $\mathbf{b} = 3\mathbf{i} - 3\mathbf{j} + \mathbf{k}$, then $\mathbf{a} + \mathbf{b} = (1 + 3)\mathbf{i} + (1 - 3)\mathbf{j} + (2 + 1)\mathbf{k}$, showing that $\mathbf{a} + \mathbf{b} = 4\mathbf{i} - 2\mathbf{j} + 3\mathbf{k}$. Reversal of the order of the sum followed by the same argument proves the commutative property $\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}$ for these particular vectors. In the case of subtraction we have $\mathbf{a} - \mathbf{b} = (1 - 3)\mathbf{i} + (1 - (-3))\mathbf{j} + (2 - 1)\mathbf{k}$, showing that $\mathbf{a} - \mathbf{b} = -2\mathbf{i} + 4\mathbf{j} + \mathbf{k}$. It is easily established that $\mathbf{a} - \mathbf{b} = -(\mathbf{b} - \mathbf{a})$.
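The component-wise rules of Definition 4-10 can be sketched in a few lines, here applied to the vectors of Example 4-19. Representing a vector by a tuple of its three components, and the helper names `add`, `subtract`, and `negate`, are illustrative assumptions rather than notation from the text.

```python
def add(a, b):
    """Sum a + b formed component by component (Definition 4-10)."""
    return tuple(x + y for x, y in zip(a, b))

def subtract(a, b):
    """Difference a - b formed component by component (Definition 4-10)."""
    return tuple(x - y for x, y in zip(a, b))

def negate(a):
    """Scalar multiple of a by -1: same length, opposite sense."""
    return tuple(-x for x in a)

# Vectors of Example 4-19, written as (i, j, k) components.
a = (1, 1, 2)
b = (3, -3, 1)

total = add(a, b)            # components of a + b
difference = subtract(a, b)  # components of a - b

print(total, difference)
print(add(b, a) == total)                        # commutativity of addition
print(negate(subtract(b, a)) == difference)      # a - b = -(b - a)
```

Both identities hold here simply because the corresponding identities hold for the real components.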
Although these particular results could be illustrated diagrammatically, the vector triangles involved would look essentially the same as those used earlier in connection with addition and subtraction of complex numbers and would be arrived at by the same reasoning. Rather than illustrate this specific case, we present in Fig. 4-10 the results of addition and subtraction of arbitrary vectors $\mathbf{a}$ and $\mathbf{b}$. Because a geometrical projection method is necessary to illustrate three-dimensional problems on a sheet of paper, such diagrams are much less useful as a tool than was the case in a plane. Accordingly, we shall usually concentrate on an analytical approach to vectors, using diagrams only when they seem likely to be helpful.

Fig. 4-10 Addition and subtraction of vectors.

Two terms worthy of note that are applied to vectors are the names parallel and anti-parallel. Two vectors will be said to be parallel when their lines of action are parallel and their senses are the same. Conversely, two vectors will be said to be anti-parallel when their lines of action are parallel but their senses are opposite. Thus if $\mathbf{a}$ is a vector and $\mu$ is a scalar, the vectors $\mathbf{a}$ and $\mu\mathbf{a}$ are parallel if $\mu > 0$ and are anti-parallel if $\mu < 0$. It follows that two vectors will be parallel if their corresponding direction cosines are equal and they will be anti-parallel if their corresponding direction cosines are equal in magnitude but opposite in sign.

Fig. 4-11 Position vectors defining the vector $\mathbf{AB}$.

Example 4-20 The vectors $\mathbf{a} = \mathbf{i} + 2\mathbf{j} - 4\mathbf{k}$ and $\mathbf{b} = 3\mathbf{i} + 6\mathbf{j} - 12\mathbf{k}$ are such that we may write $3\mathbf{a} = \mathbf{b}$. Since the scalar $3 > 0$ it follows that $\mathbf{a}$ and $\mathbf{b}$ are parallel. However the vectors $\mathbf{c} = \mathbf{i} - 3\mathbf{j} + \mathbf{k}$ and $\mathbf{d} = -2\mathbf{i} + 6\mathbf{j} - 2\mathbf{k}$ are such that we may write $-2\mathbf{c} = \mathbf{d}$ and, as the scalar $-2 < 0$, it follows that $\mathbf{c}$ and $\mathbf{d}$ are anti-parallel.
By the same argument, the two vectors $\mathbf{p} = 3\mathbf{i} - \mathbf{j} + 2\mathbf{k}$ and $\mathbf{q} = 6\mathbf{i} + 2\mathbf{j} + 4\mathbf{k}$ are neither parallel nor anti-parallel, since for no scalar $\mu$ is it true that $\mu\mathbf{p} = \mathbf{q}$.

The length of the vector $\mathbf{AB}$, which we have already denoted by $|\mathbf{AB}|$, is a useful quantity and, as with complex numbers, is called the modulus of the vector $\mathbf{AB}$. Its formal definition follows.

Definition 4-11 The modulus $|\mathbf{a}|$ of the vector $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ is the positive square root

$$|\mathbf{a}| = (a_1^2 + a_2^2 + a_3^2)^{1/2}.$$

It is an immediate consequence of this definition that any vector $\mathbf{r}$ with direction cosines $\{l, m, n\}$ may be written in the form

$$\mathbf{r} = |\mathbf{r}|(l\mathbf{i} + m\mathbf{j} + n\mathbf{k}). \tag{4-33}$$

The proof of this is obvious for, by definition, $l|\mathbf{r}|$ is the x-component of $\mathbf{r}$, $m|\mathbf{r}|$ is the y-component, and $n|\mathbf{r}|$ is the z-component. The form of Eqn (4-33) shows that any vector may be expressed as the product of a scalar (its modulus) and a unit vector defining its direction and sense.

When it is necessary to define an arbitrary vector $\mathbf{AB}$ in space, this may easily be accomplished by using position vectors $\mathbf{a}$ and $\mathbf{b}$ to identify its end points A and B. This is illustrated in Fig. 4-11 from which, by the rules of vector addition, we may write $\mathbf{OA} + \mathbf{AB} = \mathbf{OB}$ or, $\mathbf{AB} = \mathbf{OB} - \mathbf{OA} = \mathbf{b} - \mathbf{a}$. Examination of this simple but useful result suggests that an accurate name for the vector $\mathbf{AB}$ would be the 'position vector of B relative to A', since in this role it is A that plays the part of the origin. This more exact name is seldom used since the symbol $\mathbf{AB}$ is sufficiently clear as it stands.

Example 4-21 Let points A and B be identified by the position vectors $\mathbf{a} = -2\mathbf{i} - 3\mathbf{j} + \mathbf{k}$ and $\mathbf{b} = 3\mathbf{i} - \mathbf{j} + 4\mathbf{k}$, respectively. Find the vector $\mathbf{AB}$ together with its modulus and direction cosines.

Solution The diagram in Fig. 4-11 can be taken to represent this situation, showing that vector $\mathbf{AB} = \mathbf{b} - \mathbf{a}$. Substituting for the values of $\mathbf{a}$ and $\mathbf{b}$, we find $\mathbf{AB} = (3\mathbf{i} - \mathbf{j} + 4\mathbf{k}) - (-2\mathbf{i} - 3\mathbf{j} + \mathbf{k})$, whence $\mathbf{AB} = 5\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$.
Then $|\mathbf{AB}| = (5^2 + 2^2 + 3^2)^{1/2} = \sqrt{38}$, after which the usual argument establishes that $l = 5/\sqrt{38}$, $m = 2/\sqrt{38}$, and $n = 3/\sqrt{38}$.

By considering the plane containing the vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{b} - \mathbf{a}$ in Fig. 4-11, the arguments that established the triangle inequalities for complex numbers also establish them for arbitrary space vectors. Hence for arbitrary vectors $\mathbf{a}$ and $\mathbf{b}$ we have

$$\bigl||\mathbf{a}| - |\mathbf{b}|\bigr| \le |\mathbf{a} + \mathbf{b}| \le |\mathbf{a}| + |\mathbf{b}|. \tag{4-34}$$

Finally, to close this section, let us find the angle between two vectors $\mathbf{a}$ and $\mathbf{b}$ with the direction cosines $\{l_1, m_1, n_1\}$ and $\{l_2, m_2, n_2\}$, respectively. When the lines of action of the vectors intersect, the angle $\theta$ is well defined and, by convention, is always chosen to lie in the interval $[0, \pi]$. If the lines of action of two vectors do not intersect then they are merely translated until they do, when the angle $\theta$ is defined as above.

It will suffice to consider the angle between two unit vectors directed along $\mathbf{a}$ and $\mathbf{b}$, since the length of the vectors will obviously not influence the angle between them. From Eqn (4-33), these unit vectors are seen to be $(l_1\mathbf{i} + m_1\mathbf{j} + n_1\mathbf{k})$ and $(l_2\mathbf{i} + m_2\mathbf{j} + n_2\mathbf{k})$. These are shown in Fig. 4-12. They have their tips P and Q at the respective points $(l_1, m_1, n_1)$ and $(l_2, m_2, n_2)$.

Fig. 4-12 Angle between two lines, showing the points $P(l_1, m_1, n_1)$ and $Q(l_2, m_2, n_2)$.

Now, by the cosine rule

$$|\mathbf{PQ}|^2 = |\mathbf{OP}|^2 + |\mathbf{OQ}|^2 - 2|\mathbf{OP}|\,|\mathbf{OQ}| \cos\theta, \tag{4-35}$$

but $|\mathbf{OP}| = |\mathbf{OQ}| = 1$, and by Eqn (4-29),

$$|\mathbf{PQ}|^2 = (l_2 - l_1)^2 + (m_2 - m_1)^2 + (n_2 - n_1)^2,$$

whilst by Eqn (4-28),

$$l_1^2 + m_1^2 + n_1^2 = l_2^2 + m_2^2 + n_2^2 = 1.$$

Consequently, substituting into Eqn (4-35) and simplifying, we find the desired result

$$\cos\theta = l_1 l_2 + m_1 m_2 + n_1 n_2. \tag{4-36}$$

The angle of inclination $\theta$ follows directly from this equation. The restriction of the angle between the vectors to the interval $[0, \pi]$ means that in Fig. 4-12, it is the angle $\theta$ that is selected, and not the angle $\theta'$.
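The modulus of Definition 4-11, the direction cosines of Eqn (4-33), and the angle formula (4-36) can be sketched as a short computation. The tuple representation of a vector and the function names below are assumptions made for illustration only; the sketch also assumes that neither vector is the null vector, for which direction cosines are not defined.

```python
import math

def modulus(a):
    """Modulus |a| = (a1^2 + a2^2 + a3^2)^(1/2), as in Definition 4-11."""
    return math.sqrt(sum(x * x for x in a))

def direction_cosines(a):
    """Direction cosines (l, m, n) of a non-zero vector a, from Eqn (4-33)."""
    size = modulus(a)
    return tuple(x / size for x in a)

def angle_between(a, b):
    """Angle in [0, pi] between non-zero vectors a and b, from Eqn (4-36)."""
    l1, m1, n1 = direction_cosines(a)
    l2, m2, n2 = direction_cosines(b)
    cos_theta = l1 * l2 + m1 * m2 + n1 * n2
    # Clamp to [-1, 1] so that rounding error cannot upset math.acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

AB = (5, 2, 3)                      # the vector AB = 5i + 2j + 3k of Example 4-21
l, m, n = direction_cosines(AB)
print(modulus(AB) ** 2)             # close to 38
print(l * l + m * m + n * n)        # close to 1
print(angle_between((1, 0, 0), (0, 1, 0)))  # close to pi/2
```

Because `math.acos` returns values in $[0, \pi]$, the convention adopted for $\theta$ in the text is respected automatically.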
As a particular case, if

$$l_1 l_2 + m_1 m_2 + n_1 n_2 = 0, \tag{4-37}$$

then the two vectors $\mathbf{a}$ and $\mathbf{b}$ must be orthogonal.

Example 4-22 Find the angle of inclination $\theta$ between the vectors $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$ and $\mathbf{b} = 2\mathbf{i} - \mathbf{j} - \mathbf{k}$.

Solution Here $|\mathbf{a}| = \sqrt{14}$, $|\mathbf{b}| = \sqrt{6}$, so that the direction cosines $\{l_1, m_1, n_1\}$ of $\mathbf{a}$ are $l_1 = 1/\sqrt{14}$, $m_1 = 2/\sqrt{14}$, $n_1 = 3/\sqrt{14}$, whilst the direction cosines $\{l_2, m_2, n_2\}$ of $\mathbf{b}$ are $l_2 = 2/\sqrt{6}$, $m_2 = -1/\sqrt{6}$, $n_2 = -1/\sqrt{6}$. Hence by Eqn (4-36), the angle $\theta$ is the solution of the equation

$$\cos\theta = \left(\frac{1}{\sqrt{14}}\right)\left(\frac{2}{\sqrt{6}}\right) + \left(\frac{2}{\sqrt{14}}\right)\left(\frac{-1}{\sqrt{6}}\right) + \left(\frac{3}{\sqrt{14}}\right)\left(\frac{-1}{\sqrt{6}}\right),$$

or

$$\theta = \arccos\left(-\frac{3}{2\sqrt{21}}\right).$$

On account of the restriction of $\theta$ to the interval $[0, \pi]$ it finally follows that $\theta \approx 1.904$ rad.

4-7 Scalar and vector products

If $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ is an arbitrary vector and $\lambda$ is a scalar, then we have already defined the product $\lambda\mathbf{a}$ to be the vector $\lambda\mathbf{a} = \lambda a_1\mathbf{i} + \lambda a_2\mathbf{j} + \lambda a_3\mathbf{k}$. Hence the effect of multiplying a vector by a scalar is to magnify the vector without changing its line of action, the sense being reversed when $\lambda < 0$. The result of this product is to generate a vector.

We must now discuss the multiplication of two vectors. Here three-dimensional vector algebra differs radically from the vector algebra of complex numbers. With complex numbers there is only one multiplication operation defined, and the product of two complex numbers is always a complex number. In the case of vectors we shall see that two multiplication operations are defined for a pair of vectors. One operation, called a scalar product, generates a scalar, whereas the other operation, called a vector product, generates a vector. The operation of division is not defined for vectors.

The scalar product of two vectors is a generalization of the notion of the orthogonal projection of a line element onto another line and is suggested by Eqn (4-36). Its definition follows.

Definition 4-12 The scalar product of the two vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ is written $\mathbf{a} \cdot \mathbf{b}$ and is defined to be the scalar quantity

$$\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3.$$
Because of the notation used, a scalar product is often colloquially called the dot product. Some books favour the notation $(\mathbf{a}, \mathbf{b})$ for the scalar product, when it is then usually called the inner product of vectors $\mathbf{a}$ and $\mathbf{b}$.

To exhibit the relation of $\mathbf{a} \cdot \mathbf{b}$ to Eqn (4-36) we first divide $\mathbf{a} \cdot \mathbf{b}$ by the product of the moduli $|\mathbf{a}||\mathbf{b}|$ to get

$$\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|} = \left(\frac{a_1}{|\mathbf{a}|}\right)\left(\frac{b_1}{|\mathbf{b}|}\right) + \left(\frac{a_2}{|\mathbf{a}|}\right)\left(\frac{b_2}{|\mathbf{b}|}\right) + \left(\frac{a_3}{|\mathbf{a}|}\right)\left(\frac{b_3}{|\mathbf{b}|}\right).$$

Then, from the definition of direction cosines, we recognize that this may be written

$$\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|} = l_1 l_2 + m_1 m_2 + n_1 n_2, \tag{4-38}$$

where $\{l_1, m_1, n_1\}$ are the direction cosines of $\mathbf{a}$ and $\{l_2, m_2, n_2\}$ are the direction cosines of $\mathbf{b}$. If $\theta$ is the angle of inclination between $\mathbf{a}$ and $\mathbf{b}$ then, by virtue of Eqn (4-36), expression (4-38) becomes

$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos\theta. \tag{4-39}$$

This may be taken as an alternative definition of the scalar product $\mathbf{a} \cdot \mathbf{b}$.

Alternative Definition 4-13 The scalar product of the two vectors $\mathbf{a}$ and $\mathbf{b}$ is written $\mathbf{a} \cdot \mathbf{b}$ and is defined to be the scalar quantity

$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos\theta,$$

where $\theta$ is the angle between the vectors.

Notice that it is a direct consequence of the definition that the scalar product of two vectors is commutative. That is, we have $\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a}$ for any two vectors $\mathbf{a}$ and $\mathbf{b}$. Because of this property we shall sometimes, and without confusion, write $\mathbf{a}^2$ with the understanding that $\mathbf{a}^2 = \mathbf{a} \cdot \mathbf{a}$.

In practice Definition 4-12 is most used to find the scalar product, since it relates the scalar product directly to the components of the vectors involved. The alternative form set out in Definition 4-13 is used to find the angle between the two vectors once the scalar product is known.

Example 4-23 Find the scalar product of the vectors $\mathbf{a} = -2\mathbf{i} - 3\mathbf{j} + \mathbf{k}$ and $\mathbf{b} = -\mathbf{i} + \mathbf{j} + 3\mathbf{k}$ and use the result to find the angle between $\mathbf{a}$ and $\mathbf{b}$.

Solution From Definition 4-12 we have $\mathbf{a} \cdot \mathbf{b} = (-2)(-1) + (-3)(1) + (1)(3) = 2$. Now $|\mathbf{a}| = \sqrt{14}$ and $|\mathbf{b}| = \sqrt{11}$, so that substituting in Definition 4-13 we have $2 = \sqrt{14}\,\sqrt{11} \cos\theta$ and hence $\cos\theta = 2/\sqrt{154}$, or $\theta = \arccos(2/\sqrt{154})$.
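The two definitions of the scalar product work together in the way just described: Definition 4-12 gives the number, and Definition 4-13 then gives the angle. A minimal sketch of this procedure follows, again assuming that vectors are stored as tuples of components and that neither vector is null; the names `dot`, `modulus`, and `angle` are illustrative choices.

```python
import math

def dot(a, b):
    """Scalar product a . b = a1*b1 + a2*b2 + a3*b3 (Definition 4-12)."""
    return sum(x * y for x, y in zip(a, b))

def modulus(a):
    """Modulus |a| of Definition 4-11."""
    return math.sqrt(sum(x * x for x in a))

def angle(a, b):
    """Angle in [0, pi] obtained from a . b = |a||b| cos(theta) (Definition 4-13)."""
    cos_theta = dot(a, b) / (modulus(a) * modulus(b))
    # Clamp to [-1, 1] so that rounding error cannot upset math.acos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

a = (-2, -3, 1)   # a = -2i - 3j + k
b = (-1, 1, 3)    # b = -i + j + 3k

print(dot(a, b))                      # 2
print(math.cos(angle(a, b)))          # close to 2/sqrt(154)
```

The value of `dot(a, b)` agrees with the hand calculation of Example 4-23, and `dot(a, b) == dot(b, a)` illustrates the commutative property.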
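Consider the scalar products of the unit vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$. Since these are mutually orthogonal the angle between any two is $\pi/2$. It follows from Definition 4-13 that the scalar product of any two different unit vectors from this triad is zero. As each of the vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ is parallel to itself, when forming the scalar product of one of these vectors with itself we must set $\theta = 0$. Thus as $|\mathbf{i}| = |\mathbf{j}| = |\mathbf{k}| = 1$, it follows from Definition 4-13 that $\mathbf{i} \cdot \mathbf{i} = \mathbf{j} \cdot \mathbf{j} = \mathbf{k} \cdot \mathbf{k} = 1$.

In summary we have these important results, which should be memorized since they are fundamental to everything that follows:

$$\mathbf{i} \cdot \mathbf{i} = \mathbf{j} \cdot \mathbf{j} = \mathbf{k} \cdot \mathbf{k} = 1,$$
$$\mathbf{i} \cdot \mathbf{j} = \mathbf{j} \cdot \mathbf{i} = 0, \quad \mathbf{i} \cdot \mathbf{k} = \mathbf{k} \cdot \mathbf{i} = 0, \quad \mathbf{j} \cdot \mathbf{k} = \mathbf{k} \cdot \mathbf{j} = 0.$$

These results are conveniently combined in Table 4-2. Each entry is to be interpreted as the scalar product of the vector at the left of the row of the entry, with the vector at the top of the column of the entry.

Table 4-2 Table of scalar products of i, j, and k

                  Second member
First member      i     j     k
     i            1     0     0
     j            0     1     0
     k            0     0     1

The scalar product of two vectors may be deduced using Table 4-2 by simple algebraic manipulation without the use of Definition 4-12. To see this consider the vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. First form their scalar product

$$\mathbf{a} \cdot \mathbf{b} = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) \cdot (b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}),$$

and then expand the right-hand side as though ordinary algebraic quantities were involved to obtain

$$\mathbf{a} \cdot \mathbf{b} = (a_1\mathbf{i}) \cdot (b_1\mathbf{i}) + (a_1\mathbf{i}) \cdot (b_2\mathbf{j}) + (a_1\mathbf{i}) \cdot (b_3\mathbf{k}) + (a_2\mathbf{j}) \cdot (b_1\mathbf{i}) + (a_2\mathbf{j}) \cdot (b_2\mathbf{j}) + (a_2\mathbf{j}) \cdot (b_3\mathbf{k}) + (a_3\mathbf{k}) \cdot (b_1\mathbf{i}) + (a_3\mathbf{k}) \cdot (b_2\mathbf{j}) + (a_3\mathbf{k}) \cdot (b_3\mathbf{k}).$$

Next, recognizing that the scalars $a_i$, $b_j$ may be taken to the front of each scalar product involved, rewrite the result thus:

$$\mathbf{a} \cdot \mathbf{b} = a_1 b_1\,\mathbf{i} \cdot \mathbf{i} + a_1 b_2\,\mathbf{i} \cdot \mathbf{j} + a_1 b_3\,\mathbf{i} \cdot \mathbf{k} + a_2 b_1\,\mathbf{j} \cdot \mathbf{i} + a_2 b_2\,\mathbf{j} \cdot \mathbf{j} + a_2 b_3\,\mathbf{j} \cdot \mathbf{k} + a_3 b_1\,\mathbf{k} \cdot \mathbf{i} + a_3 b_2\,\mathbf{k} \cdot \mathbf{j} + a_3 b_3\,\mathbf{k} \cdot \mathbf{k}.$$

Finally, using Table 4-2, this reduces to the desired result $\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3$.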
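In practice the intermediate working is always omitted and the result of a scalar product is written on sight by retaining only the products involving $\mathbf{i} \cdot \mathbf{i}$, $\mathbf{j} \cdot \mathbf{j}$, and $\mathbf{k} \cdot \mathbf{k}$.

Example 4-24 Determine the scalar products of these pairs of vectors:
(a) $\mathbf{a} = \mathbf{i} - 3\mathbf{j} + \mathbf{k}$, $\mathbf{b} = -\mathbf{i} + \mathbf{j} - 3\mathbf{k}$;
(b) $\mathbf{a} = 2\mathbf{i} + \mathbf{j} - \mathbf{k}$, $\mathbf{b} = -\mathbf{i} + \mathbf{j} - \mathbf{k}$;
(c) $\mathbf{a} = 2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$, $\mathbf{b} = -2\mathbf{i} + \mathbf{j} - 3\mathbf{k}$;
(d) $\mathbf{a} = \mathbf{i} + 2\mathbf{j} - \mathbf{k}$, $\mathbf{b} = \mathbf{i} + 2\mathbf{j} - \mathbf{k}$.

Solutions To show the application of scalar products of unit vectors we shall retain the notation $\mathbf{i} \cdot \mathbf{i}$, $\mathbf{j} \cdot \mathbf{j}$, and $\mathbf{k} \cdot \mathbf{k}$ in the first part of each calculation to indicate the origin of the terms involved. The terms involving products such as $\mathbf{i} \cdot \mathbf{j}$, $\mathbf{i} \cdot \mathbf{k}$, . . ., will be omitted as these scalar products are zero. The result will usually be written down on sight without any intermediate working.

(a) $\mathbf{a} \cdot \mathbf{b} = (\mathbf{i} - 3\mathbf{j} + \mathbf{k}) \cdot (-\mathbf{i} + \mathbf{j} - 3\mathbf{k}) = (1)(-1)\,\mathbf{i} \cdot \mathbf{i} + (-3)(1)\,\mathbf{j} \cdot \mathbf{j} + (1)(-3)\,\mathbf{k} \cdot \mathbf{k} = -1 - 3 - 3 = -7$.

(b) $\mathbf{a} \cdot \mathbf{b} = (2\mathbf{i} + \mathbf{j} - \mathbf{k}) \cdot (-\mathbf{i} + \mathbf{j} - \mathbf{k}) = (2)(-1)\,\mathbf{i} \cdot \mathbf{i} + (1)(1)\,\mathbf{j} \cdot \mathbf{j} + (-1)(-1)\,\mathbf{k} \cdot \mathbf{k} = -2 + 1 + 1 = 0$. Thus $\mathbf{a}$ and $\mathbf{b}$ are orthogonal.

(c) $\mathbf{a} \cdot \mathbf{b} = (2\mathbf{i} - \mathbf{j} + 3\mathbf{k}) \cdot (-2\mathbf{i} + \mathbf{j} - 3\mathbf{k}) = (2)(-2)\,\mathbf{i} \cdot \mathbf{i} + (-1)(1)\,\mathbf{j} \cdot \mathbf{j} + (3)(-3)\,\mathbf{k} \cdot \mathbf{k} = -4 - 1 - 9 = -14$.

(d) $\mathbf{a} \cdot \mathbf{b} = (\mathbf{i} + 2\mathbf{j} - \mathbf{k}) \cdot (\mathbf{i} + 2\mathbf{j} - \mathbf{k}) = (1)(1)\,\mathbf{i} \cdot \mathbf{i} + (2)(2)\,\mathbf{j} \cdot \mathbf{j} + (-1)(-1)\,\mathbf{k} \cdot \mathbf{k} = 1 + 4 + 1 = 6$.

Example (d) above is a special case of the scalar product of a vector with itself, and either from Definition 4-12 or 4-13 we see that for an arbitrary vector $\mathbf{a}$,

$$\mathbf{a} \cdot \mathbf{a} = |\mathbf{a}|^2. \tag{4-40}$$

In words, 'the scalar product of a vector with itself is equal to the square of the modulus of that vector'. This simple result is often valuable when finding a unit vector parallel to a given arbitrary vector $\mathbf{a}$. To see how this comes about, if we divide $\mathbf{a}$ by its modulus $|\mathbf{a}|$ to form the vector $\hat{\mathbf{a}} = \mathbf{a}/|\mathbf{a}|$, then result (4-40) shows that $\hat{\mathbf{a}} \cdot \hat{\mathbf{a}} = 1$ and so $\hat{\mathbf{a}}$ is a unit vector.

Example 4-25 Find a unit vector $\hat{\mathbf{a}}$ parallel to the vector $\mathbf{a} = 3\mathbf{i} - \mathbf{j} - 2\mathbf{k}$. Use the result to determine the projection of the vector $\mathbf{b} = 2\mathbf{i} + 3\mathbf{j} + \mathbf{k}$ in the direction of $\mathbf{a}$.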
Solution Here $|\mathbf{a}| = \sqrt{14}$ so that the desired unit vector is $\hat{\mathbf{a}} = \mathbf{a}/\sqrt{14} = (3/\sqrt{14})\mathbf{i} - (1/\sqrt{14})\mathbf{j} - (2/\sqrt{14})\mathbf{k}$. Now the projection of vector $\mathbf{b}$ along $\mathbf{a}$ is by definition the length $l$ of vector $\mathbf{b}$ when projected normally onto the line determined by $\mathbf{a}$. Thus it is $l = |\mathbf{b}| \cos\theta$, where $\theta$ is the angle between $\mathbf{b}$ and $\mathbf{a}$. Since $|\hat{\mathbf{a}}| = 1$ we may write this as $l = |\mathbf{b}||\hat{\mathbf{a}}| \cos\theta$ or, by Definition 4-13, as $l = \mathbf{b} \cdot \hat{\mathbf{a}}$. Hence in this problem $l = (2\mathbf{i} + 3\mathbf{j} + \mathbf{k}) \cdot \hat{\mathbf{a}} = 1/\sqrt{14}$.

It follows from the definition of a scalar product of two vectors and from the properties of real numbers, that if $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are three arbitrary vectors, then

$$\mathbf{a} \cdot (\mathbf{b} + \mathbf{c}) = \mathbf{a} \cdot \mathbf{b} + \mathbf{a} \cdot \mathbf{c}.$$

This is the distributive law for the scalar product of vectors. Expressions of the form $\mathbf{a} \cdot \mathbf{b} \cdot \mathbf{c}$, $\mathbf{a} \cdot \mathbf{b} \cdot \mathbf{c} \cdot \mathbf{d}$, . . ., are meaningless since the scalar product is only defined between a pair of vectors. Note also that division by vectors is not defined, since although we may write $\mathbf{a} \cdot \mathbf{b} = n$, it makes no sense to attempt to solve this for $\mathbf{a}$ by dividing $n$ by $\mathbf{b}$.

The other form of product of two vectors is the vector product. We shall denote the vector product of vectors $\mathbf{a}$ and $\mathbf{b}$ by $\mathbf{a} \times \mathbf{b}$. Again because of the notation this is often colloquially called the cross product of two vectors. Other notations in use for the vector product are $[\mathbf{a}, \mathbf{b}]$ and $\mathbf{a} \wedge \mathbf{b}$. In preparation for the definition of $\mathbf{a} \times \mathbf{b}$ we now introduce a unit vector $\hat{\mathbf{n}}$ that is normal (i.e. orthogonal) to the plane defined by the vectors $\mathbf{a}$ and $\mathbf{b}$, and whose sense is such that $\mathbf{a}$, $\mathbf{b}$, and $\hat{\mathbf{n}}$, in this order, form a right-handed set of vectors. Here, although $\mathbf{a}$, $\mathbf{b}$, and $\hat{\mathbf{n}}$ are not necessarily mutually orthogonal, we use right-handedness exactly as was defined at the start of Section 4-6.
Definition 4-14 The vector product of vectors $\mathbf{a}$ and $\mathbf{b}$ will be written $\mathbf{a} \times \mathbf{b}$ and is defined to be the vector quantity

$$\mathbf{a} \times \mathbf{b} = |\mathbf{a}||\mathbf{b}| \sin\theta\,\hat{\mathbf{n}},$$

where $\theta$ is the angle between vectors $\mathbf{a}$ and $\mathbf{b}$ with $\sin\theta \ge 0$, and $\hat{\mathbf{n}}$ is a unit vector normal to the plane of $\mathbf{a}$ and $\mathbf{b}$ such that $\mathbf{a}$, $\mathbf{b}$, and $\hat{\mathbf{n}}$, in this order, form a right-handed set of vectors.

This shows that the vector $\mathbf{a} \times \mathbf{b}$ is normal to both $\mathbf{a}$ and $\mathbf{b}$ and has magnitude $|\mathbf{a}||\mathbf{b}| \sin\theta$. The first interesting and unusual feature of this form of product is that it is not commutative. If $\mathbf{a}$, $\mathbf{b}$, $\hat{\mathbf{n}}$, in this order, form a right-handed set for the definition of $\mathbf{a} \times \mathbf{b}$, then for the definition of $\mathbf{b} \times \mathbf{a}$ it is necessary to take for the right-handed set the vectors $\mathbf{b}$, $\mathbf{a}$, $-\hat{\mathbf{n}}$, in the stated order. The immediate consequence is the important general result that if $\mathbf{a}$ and $\mathbf{b}$ are arbitrary vectors, then

$$\mathbf{a} \times \mathbf{b} = -(\mathbf{b} \times \mathbf{a}). \tag{4-41}$$

In contrast with the scalar product, it is easily seen that the vector product of parallel vectors is identically zero, whereas the vector product of non-zero orthogonal vectors is non-zero.

A simple calculation gives Table 4-3 of vector products of the unit vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$. The left-hand column identifies the first member of the vector product and the top row identifies the second member of the vector product. The corresponding entry in the table gives the result of the vector product. The entries along the diagonal are all seen to be the zero or null vector.

Table 4-3 Table of vector products of i, j, and k

                  Second member
First member      i     j     k
     i            0     k    -j
     j           -k     0     i
     k            j    -i     0

If we take, for example, the first element in the left-hand column and the last element in the top row, we see that $\mathbf{i} \times \mathbf{k} = -\mathbf{j}$. In many respects it is easier to memorize these three results:

$$\mathbf{i} \times \mathbf{j} = \mathbf{k}, \quad \mathbf{j} \times \mathbf{k} = \mathbf{i}, \quad \mathbf{k} \times \mathbf{i} = \mathbf{j}, \tag{4-42}$$

and then to use property (4-41), than to remember Table 4-3 complete.
The order of the vectors occurring in these key relations can be remembered by making the cyclic permutations

    i j k
    j k i
    k i j

As with scalar products, this table of vector products may be used to calculate the vector product of any two vectors expressed in component form. Consider the vector product $\mathbf{a} \times \mathbf{b}$ where $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$. Proceeding as though ordinary algebraic quantities were involved we write

$$\mathbf{a} \times \mathbf{b} = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) \times (b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k})$$
$$= (a_1\mathbf{i}) \times (b_1\mathbf{i}) + (a_1\mathbf{i}) \times (b_2\mathbf{j}) + (a_1\mathbf{i}) \times (b_3\mathbf{k}) + (a_2\mathbf{j}) \times (b_1\mathbf{i}) + (a_2\mathbf{j}) \times (b_2\mathbf{j}) + (a_2\mathbf{j}) \times (b_3\mathbf{k}) + (a_3\mathbf{k}) \times (b_1\mathbf{i}) + (a_3\mathbf{k}) \times (b_2\mathbf{j}) + (a_3\mathbf{k}) \times (b_3\mathbf{k}),$$

working on the assumption that vector multiplication is distributive over addition. Next we recognize that the scalars $a_i$, $b_j$ may be taken out in front of each vector product that is involved, so that the expression becomes

$$\mathbf{a} \times \mathbf{b} = a_1 b_1\,\mathbf{i} \times \mathbf{i} + a_1 b_2\,\mathbf{i} \times \mathbf{j} + a_1 b_3\,\mathbf{i} \times \mathbf{k} + a_2 b_1\,\mathbf{j} \times \mathbf{i} + a_2 b_2\,\mathbf{j} \times \mathbf{j} + a_2 b_3\,\mathbf{j} \times \mathbf{k} + a_3 b_1\,\mathbf{k} \times \mathbf{i} + a_3 b_2\,\mathbf{k} \times \mathbf{j} + a_3 b_3\,\mathbf{k} \times \mathbf{k}.$$

Finally, using Table 4-3 and collecting together the $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ terms, we obtain

$$\mathbf{a} \times \mathbf{b} = (a_2 b_3 - a_3 b_2)\mathbf{i} + (a_3 b_1 - a_1 b_3)\mathbf{j} + (a_1 b_2 - a_2 b_1)\mathbf{k}. \tag{4-43}$$

This is often taken as the definition of the vector product $\mathbf{a} \times \mathbf{b}$ in place of our Definition 4-14.

Expression (4-43) may be considerably simplified if the concept of a determinant is used. Before showing this we must digress slightly to define this term.

Definition 4-15 Let $a$, $b$, $c$, and $d$ be any four real numbers. Consider the two-row by two-column array of these numbers

$$\begin{matrix} a & b \\ c & d \end{matrix} \tag{A}$$

Define the expression

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} \tag{B}$$

that is associated with this array by the identity

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = (ad - cb). \tag{C}$$

We define the second-order determinant associated with the array (A) to be the number represented in symbols by (B) and having the value defined by (C). The process of expressing the left-hand side of (C) in the form of the right-hand side is called expanding the determinant.
Example 4-26 Evaluate the second-order determinants

$$\text{(a)}\ \begin{vmatrix} 1 & 7 \\ 3 & 9 \end{vmatrix}, \qquad \text{(b)}\ \begin{vmatrix} 0 & -1 \\ 4 & 2 \end{vmatrix}, \qquad \text{(c)}\ \begin{vmatrix} 2 & 6 \\ 1 & 3 \end{vmatrix}.$$

Solution The values of the determinants follow directly from the definition:

(a) $\begin{vmatrix} 1 & 7 \\ 3 & 9 \end{vmatrix} = (1)(9) - (3)(7) = 9 - 21 = -12$;

(b) $\begin{vmatrix} 0 & -1 \\ 4 & 2 \end{vmatrix} = (0)(2) - (4)(-1) = 0 + 4 = 4$;

(c) $\begin{vmatrix} 2 & 6 \\ 1 & 3 \end{vmatrix} = (2)(3) - (1)(6) = 6 - 6 = 0$.

Definition 4-16 Let $a_i$, $b_i$, and $c_i$ with $i = 1, 2, 3$ be any set of nine real numbers. Consider the three-row by three-column array of these numbers

$$\begin{matrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{matrix} \tag{A}$$

Define the expression

$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \tag{B}$$

that is associated with this array to be the single number that is determined by the identity

$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}. \tag{C}$$

We define the third-order determinant associated with the array (A) to be the number represented in symbols by (B) and having the value defined by (C).

Example 4-27 Evaluate the third-order determinant

$$\Delta = \begin{vmatrix} 3 & -2 & -7 \\ 2 & 1 & 2 \\ 2 & 1 & 1 \end{vmatrix}.$$

Solution From the definition,

$$\begin{vmatrix} 3 & -2 & -7 \\ 2 & 1 & 2 \\ 2 & 1 & 1 \end{vmatrix} = (3) \begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} - (-2) \begin{vmatrix} 2 & 2 \\ 2 & 1 \end{vmatrix} + (-7) \begin{vmatrix} 2 & 1 \\ 2 & 1 \end{vmatrix}.$$

Expanding the three second-order determinants and adding, we obtain the desired result

$$\Delta = 3(1 - 2) + 2(2 - 4) - 7(2 - 2) = -7.$$

It is helpful to classify determinants in some simple way, which the next definition achieves.

Definition 4-17 We define the order of a determinant to be the number of terms that lie on a diagonal drawn from the top left-hand corner to the bottom right-hand corner. The values of these terms are immaterial.

Thus in Example 4-26 the determinants are second-order, whereas in Example 4-27 the determinant is third-order, and is evaluated in terms of three second-order determinants.

We are now able to give the promised alternative definition of a vector product.
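The expansion rules (C) of Definitions 4-15 and 4-16 are easily sketched as two short functions. The names `det2` and `det3`, and the convention of passing the rows of the third-order array as three tuples, are assumptions made here for illustration.

```python
def det2(a, b, c, d):
    """Second-order determinant | a b ; c d | = ad - cb (Definition 4-15)."""
    return a * d - c * b

def det3(row_a, row_b, row_c):
    """Third-order determinant expanded by its first row (Definition 4-16)."""
    a1, a2, a3 = row_a
    b1, b2, b3 = row_b
    c1, c2, c3 = row_c
    return (a1 * det2(b2, b3, c2, c3)
            - a2 * det2(b1, b3, c1, c3)
            + a3 * det2(b1, b2, c1, c2))

# The three determinants of Example 4-26.
print(det2(1, 7, 3, 9))     # -12
print(det2(0, -1, 4, 2))    # 4
print(det2(2, 6, 1, 3))     # 0

# The determinant of Example 4-27.
print(det3((3, -2, -7), (2, 1, 2), (2, 1, 1)))   # -7
```

Note the alternating signs attached to $a_1$, $a_2$, and $a_3$ in `det3`; they are exactly those appearing in identity (C) of Definition 4-16.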
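Alternative Definition 4-18 We define the vector product $\mathbf{a} \times \mathbf{b}$ of the two vectors $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ to be the formal expansion of the determinant

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}.$$

In this definition we have used the word 'formal' because, although the $a_i$ and $b_i$ are real numbers, the $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are unit vectors. Aside from this the expansion of the third-order determinant is performed exactly as in Example 4-27.

Example 4-28 Determine the vector product $\mathbf{a} \times \mathbf{b}$ where $\mathbf{a} = \mathbf{i} + \mathbf{j} - 2\mathbf{k}$ and $\mathbf{b} = -2\mathbf{i} + 3\mathbf{j} + \mathbf{k}$.

Solution To apply Definition 4-18 we first notice that the components $a_1$, $a_2$, and $a_3$ of $\mathbf{a}$ are 1, 1, and $-2$ whilst the components $b_1$, $b_2$, and $b_3$ of $\mathbf{b}$ are $-2$, 3, and 1. Hence

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 1 & -2 \\ -2 & 3 & 1 \end{vmatrix}$$

and so

$$\mathbf{a} \times \mathbf{b} = \mathbf{i} \begin{vmatrix} 1 & -2 \\ 3 & 1 \end{vmatrix} - \mathbf{j} \begin{vmatrix} 1 & -2 \\ -2 & 1 \end{vmatrix} + \mathbf{k} \begin{vmatrix} 1 & 1 \\ -2 & 3 \end{vmatrix} = 7\mathbf{i} + 3\mathbf{j} + 5\mathbf{k}.$$

This effectively demonstrates that for most practical purposes Definition 4-18 involves the least manipulation.

It is easily proved that the vector product is distributive, so that for any three vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ we always have

$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}.$$

Indeed this is implied by the way in which Eqn (4-43) was derived.

With the introduction of the vector product, mixed products of the form $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ become possible. This type of product is known as a triple scalar product and, as it involves the scalar product of $\mathbf{a}$ with $(\mathbf{b} \times \mathbf{c})$, it is seen to be a scalar. If $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$, $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$, and $\mathbf{c} = c_1\mathbf{i} + c_2\mathbf{j} + c_3\mathbf{k}$ then by combination of Definitions 4-12 and 4-18 we have

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = (a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) \cdot \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$

or,

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = a_1(b_2 c_3 - c_2 b_3) - a_2(b_1 c_3 - c_1 b_3) + a_3(b_1 c_2 - c_1 b_2).$$

The terms on the right-hand side of this expression are the result of expanding (C) in Definition 4-16, so that they may be re-combined into a determinant to give the general result

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}. \tag{4-44}$$

By interchanging rows of the determinant it is readily shown that the dot $\cdot$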
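and the cross $\times$ in a triple scalar product may be interchanged, so that

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}. \tag{4-45}$$

Example 4-29 Evaluate the triple scalar product $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ given that $\mathbf{a} = 2\mathbf{i} + \mathbf{k}$, $\mathbf{b} = \mathbf{i} + \mathbf{j} + 2\mathbf{k}$, and $\mathbf{c} = -\mathbf{i} + \mathbf{j}$.

Solution The components of $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are, respectively, $(2, 0, 1)$, $(1, 1, 2)$, and $(-1, 1, 0)$. Hence

$$\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \begin{vmatrix} 2 & 0 & 1 \\ 1 & 1 & 2 \\ -1 & 1 & 0 \end{vmatrix} = 2(-2) - 0(2) + 1(2) = -2.$$

As our next generalization, we notice that vector products of more than two vectors are defined provided the order in which these products are to be carried out is specified by bracketing. As a special case we have the triple vector product $\mathbf{a} \times (\mathbf{b} \times \mathbf{c})$ of the three vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, which differs from the triple vector product $(\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$. The first expression signifies the vector product of $\mathbf{a}$ and $(\mathbf{b} \times \mathbf{c})$, whilst the second signifies the vector product of $(\mathbf{a} \times \mathbf{b})$ and $\mathbf{c}$, and in general these are different vectors. A straightforward application of Definition 4-18 establishes the following useful identity, from which some interesting results may be derived:

$$\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \cdot \mathbf{c})\mathbf{b} - (\mathbf{a} \cdot \mathbf{b})\mathbf{c}. \tag{4-46}$$

The details of the proof are left to the reader.

Example 4-30 Demonstrate the difference between the triple vector products $\mathbf{a} \times (\mathbf{b} \times \mathbf{c})$ and $(\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$ by making the identifications $\mathbf{a} = \mathbf{i}$, $\mathbf{b} = \mathbf{i} + \mathbf{j}$, $\mathbf{c} = \mathbf{k}$.

Solution By direct substitution we find that $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = \mathbf{i} \times [(\mathbf{i} + \mathbf{j}) \times \mathbf{k}]$ and so expanding this result by using Eqn (4-42) gives $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = \mathbf{i} \times [-\mathbf{j} + \mathbf{i}] = -\mathbf{k}$. Similarly, in the second case, $(\mathbf{a} \times \mathbf{b}) \times \mathbf{c} = [\mathbf{i} \times (\mathbf{i} + \mathbf{j})] \times \mathbf{k} = \mathbf{k} \times \mathbf{k} = \mathbf{0}$.

4-8 Geometrical applications

This section illustrates something of the application of vectors to elementary geometry, and gives some simple but useful results. First we consider the representation of a straight line in vector form, and then show how the single vector equation may be reduced to the more familiar set of three Cartesian equations.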
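Before the straight line is taken up, the product results of Section 4-7 may be checked numerically. The sketch below computes the vector product from the component form (4-43) and uses it to confirm Examples 4-28 to 4-30 together with results (4-41), (4-45), and (4-46) for particular vectors. The tuple representation of a vector and the function names are illustrative assumptions, and a numerical check of this kind is of course not a proof of the identities.

```python
def dot(a, b):
    """Scalar product of Definition 4-12."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product in the component form of Eqn (4-43)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

def triple_scalar(a, b, c):
    """Triple scalar product a . (b x c)."""
    return dot(a, cross(b, c))

# Example 4-28
print(cross((1, 1, -2), (-2, 3, 1)))              # (7, 3, 5)

# Example 4-29, evaluated both ways round as in Eqn (4-45)
a, b, c = (2, 0, 1), (1, 1, 2), (-1, 1, 0)
print(triple_scalar(a, b, c))                     # -2
print(dot(cross(a, b), c))                        # -2

# Example 4-30: the two triple vector products differ
i, k = (1, 0, 0), (0, 0, 1)
i_plus_j = (1, 1, 0)
print(cross(i, cross(i_plus_j, k)))               # (0, 0, -1)
print(cross(cross(i, i_plus_j), k))               # (0, 0, 0)
```

In the last two lines the first result is $-\mathbf{k}$ and the second is the null vector, in agreement with Example 4-30.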
The straight line

Consider the problem of determining the equation of a straight line given that it passes through the point A with position vector $\mathbf{a}$ relative to O, and is parallel to vector $\mathbf{b}$. We shall denote the position vector of a general point P on the line by $\mathbf{r}$ as shown in Fig. 4-13.

Fig. 4-13 Straight line through A parallel to b.

By the rules of vector addition we have $\mathbf{OP} = \mathbf{OA} + \mathbf{AP}$ or, $\mathbf{r} = \mathbf{a} + \mathbf{AP}$. However, as the straight line through A is parallel to the free vector $\mathbf{b}$, it follows that for any point P on the line there is a scalar $\lambda$ such that we can write $\mathbf{AP} = \lambda\mathbf{b}$. Applying this result to the equation above we see that the vector equation for the straight line becomes

$$\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}. \tag{4-47}$$

The scalar $\lambda$ in this equation is simply a parameter, and different values of $\lambda$ will determine different points on the line.

To express this result in Cartesian form, set $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$, when Eqn (4-47) reduces to

$$x\mathbf{i} + y\mathbf{j} + z\mathbf{k} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k} + \lambda(b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}).$$

This vector equation implies three scalar equations by virtue of the equality of its $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ components. Hence we arrive at the three scalar equations

$$x = a_1 + \lambda b_1 \quad (\mathbf{i}\text{-component})$$
$$y = a_2 + \lambda b_2 \quad (\mathbf{j}\text{-component})$$
$$z = a_3 + \lambda b_3 \quad (\mathbf{k}\text{-component}).$$

If these are each solved for $\lambda$ and equated, we obtain the more familiar result

$$\frac{x - a_1}{b_1} = \frac{y - a_2}{b_2} = \frac{z - a_3}{b_3}. \tag{4-48}$$

Equations (4-48) are the standard Cartesian form for the equations of a straight line. Notice that the coefficients of $x$, $y$, and $z$ in Eqn (4-48) are all unity; that $b_1$, $b_2$, and $b_3$ are then the direction ratios of $\mathbf{b}$; and that $a_1$, $a_2$, and $a_3$ define a point on the line. Equations (4-48) are sometimes expressed in the form of three simultaneous equations relating $x$ and $y$, $x$ and $z$, and $y$ and $z$. This follows by cross-multiplying different pairs of expressions in Eqn (4-48).
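The passage from the vector equation (4-47) to the Cartesian form (4-48), and back again, can be sketched as follows. The numerical values of $\mathbf{a}$ and $\mathbf{b}$ used here are hypothetical and chosen only for illustration, as are the function names; the sketch also assumes integer components so that exact fractions may be used, and a direction vector $\mathbf{b}$ that is not null.

```python
from fractions import Fraction

def point_on_line(a, b, lam):
    """Point r = a + lam*b of Eqn (4-47)."""
    return tuple(ai + lam * bi for ai, bi in zip(a, b))

def parameter_of(a, b, r):
    """Return the value of lam for which r = a + lam*b, or None if the
    point r does not lie on the line. Each component is solved for lam
    as in Eqn (4-48). A zero component of b simply requires the matching
    components of r and a to agree. Integer components are assumed."""
    lam = None
    for ai, bi, ri in zip(a, b, r):
        if bi == 0:
            if ri != ai:
                return None
            continue
        value = Fraction(ri - ai, bi)
        if lam is None:
            lam = value
        elif value != lam:
            return None
    return lam

a = (1, 0, 2)     # hypothetical point on the line
b = (2, -1, 3)    # hypothetical direction ratios

r = point_on_line(a, b, 3)
print(r)                               # (7, -3, 11)
print(parameter_of(a, b, r))           # 3
print(parameter_of(a, b, (7, -3, 12))) # None
```

The third call returns `None` because the three ratios in Eqn (4-48) are no longer equal, so the point does not lie on the line.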
Example 4-31 Find the vector equation of the line through the point with position vector $\mathbf{i} + 3\mathbf{j} - \mathbf{k}$ which is parallel to the vector $2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}$. Determine the point on the line corresponding to $\lambda = 2$ in the resulting equation. Also express the vector equation of the line in standard Cartesian form.

Solution From Eqn (4-47) we have $\mathbf{r} = (\mathbf{i} + 3\mathbf{j} - \mathbf{k}) + \lambda(2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k})$ or, $\mathbf{r} = (1 + 2\lambda)\mathbf{i} + 3(1 + \lambda)\mathbf{j} + (4\lambda - 1)\mathbf{k}$. This is the vector equation of the line, and setting $\lambda = 2$ determines the point $\mathbf{r} = 5\mathbf{i} + 9\mathbf{j} + 7\mathbf{k}$. To express the equation of the line in Cartesian form we appeal to Eqns (4-48) and use the fact that $\mathbf{a} = \mathbf{i} + 3\mathbf{j} - \mathbf{k}$ and $\mathbf{b} = 2\mathbf{i} + 3\mathbf{j} + 4\mathbf{k}$. Hence $a_1 = 1$, $a_2 = 3$, $a_3 = -1$, and $b_1 = 2$, $b_2 = 3$, and $b_3 = 4$, so that the desired Cartesian equations are

$$\frac{x - 1}{2} = \frac{y - 3}{3} = \frac{z + 1}{4}.$$

As a check we can also use these equations to determine the point corresponding to $\lambda = 2$. We must solve the three equations

$$\frac{x - 1}{2} = 2, \quad \frac{y - 3}{3} = 2, \quad \frac{z + 1}{4} = 2,$$

which give $x = 5$, $y = 9$, and $z = 7$. These are of course the coordinates of the tip of the position vector $\mathbf{r} = 5\mathbf{i} + 9\mathbf{j} + 7\mathbf{k}$, which confirms our previous result.

The same approach may be used if the line is required to pass through the two points A and B with position vectors $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$, respectively. For then the line passes through $\boldsymbol{\alpha}$ and is parallel to the vector $\boldsymbol{\beta} - \boldsymbol{\alpha}$, which is just a segment of the line itself. Hence we identify $\mathbf{a}$ with $\boldsymbol{\alpha}$ and $\mathbf{b}$ with $\boldsymbol{\beta} - \boldsymbol{\alpha}$, after which the argument proceeds as before.

In the next example we illustrate how the non-standard Cartesian equations of a straight line may be re-interpreted in vector form.

Example 4-32 The equations

$$\frac{2x - 1}{3} = \frac{y + 2}{3} = \frac{-z + 4}{2}$$

determine a straight line. Express them in vector form and find the direction ratios of the line.

Solution To express the equations in standard Cartesian form we must first make the coefficients of $x$, $y$, and $z$ each equal to unity.
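Hence we rewrite the equations:

$$\frac{x - \tfrac{1}{2}}{(3/2)} = \frac{y + 2}{3} = \frac{z - 4}{(-2)}.$$

The vector $\mathbf{a}$ then has components $a_1 = \tfrac{1}{2}$, $a_2 = -2$, $a_3 = 4$ and the vector $\mathbf{b}$ has the components $b_1 = 3/2$, $b_2 = 3$, $b_3 = -2$. These latter three numbers are the desired direction ratios. The vector equation of the straight line itself is $\mathbf{r} = \tfrac{1}{2}(1 + 3\lambda)\mathbf{i} + (3\lambda - 2)\mathbf{j} + 2(2 - \lambda)\mathbf{k}$. (Why?)

On occasion it is necessary to determine the perpendicular distance $p$ from a point C with position vector $\mathbf{c}$ to the line L with equation $\mathbf{r} = \mathbf{a} + \lambda\mathbf{b}$. This can be done by applying Pythagoras' theorem in Fig. 4-14.

Fig. 4-14 Perpendicular distance of point from line.

We have the obvious result $p^2 = (AC)^2 - (AB)^2$, but $\mathbf{AC} = \mathbf{c} - \mathbf{a}$ so that $(AC)^2 = |\mathbf{AC}|^2 = (\mathbf{c} - \mathbf{a}) \cdot (\mathbf{c} - \mathbf{a})$, whilst length AB is the projection of $\mathbf{AC}$ onto the line L. Now the unit vector along L is $\mathbf{b}/|\mathbf{b}|$ so that $AB = (\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}/|\mathbf{b}|$ and thus

$$(AB)^2 = \left(\frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}}{|\mathbf{b}|}\right)^2.$$

Combining these results gives

$$p^2 = (\mathbf{c} - \mathbf{a}) \cdot (\mathbf{c} - \mathbf{a}) - \left(\frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}}{|\mathbf{b}|}\right)^2, \tag{4-49}$$

from which $p$ may be deduced.

Example 4-33 Find the distance of the point with position vector $\mathbf{i} + \mathbf{j} + \mathbf{k}$ from the line $\mathbf{r} = (\mathbf{i} + 2\mathbf{j} + \mathbf{k}) + \lambda(\mathbf{i} - 2\mathbf{j} + \mathbf{k})$.

Solution In the notation leading to Eqn (4-49) we have $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$, $\mathbf{b} = \mathbf{i} - 2\mathbf{j} + \mathbf{k}$, and $\mathbf{c} = \mathbf{i} + \mathbf{j} + \mathbf{k}$. Hence $\mathbf{c} - \mathbf{a} = -\mathbf{j}$ and thus $(\mathbf{c} - \mathbf{a}) \cdot (\mathbf{c} - \mathbf{a}) = (-\mathbf{j}) \cdot (-\mathbf{j}) = 1$. Also $(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b} = (-\mathbf{j}) \cdot (\mathbf{i} - 2\mathbf{j} + \mathbf{k}) = 2$ so that $((\mathbf{c} - \mathbf{a}) \cdot \mathbf{b})^2 = 4$, whilst $|\mathbf{b}|^2 = 6$. Hence

$$\left(\frac{(\mathbf{c} - \mathbf{a}) \cdot \mathbf{b}}{|\mathbf{b}|}\right)^2 = \frac{4}{6}$$

and so from Eqn (4-49), $p^2 = 1 - \tfrac{2}{3} = \tfrac{1}{3}$, or $p = 1/\sqrt{3}$ as $p$ is essentially positive.

The plane

Fig. 4-15 Vector equation of a plane $\mathbf{n} \cdot \mathbf{r} = |\mathbf{n}|p$.

The equation of a plane is easily determined once it is recognized that a plane $\Pi$ is specified when one point on it is known, together with any vector perpendicular to it. Such a vector, when normalized, is a unit normal to the plane $\Pi$ and is unique except for its sign.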
The ambiguity as to the sign of the normal is, of course, because a plane has no preferred side. To derive its equation consider Fig. 4-15. Let $\mathbf{r}$ be the position vector relative to O of a point P on the plane $\Pi$, and $\mathbf{n}$ be a vector normal to the plane directed through the plane away from O, so that the corresponding unit normal is $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}|$. Further, let the perpendicular distance ON from the origin O to the plane be $p$. Then for all points P we have $(OP) \cos\theta = p$. In terms of vectors this is

$$\frac{\mathbf{r} \cdot \mathbf{n}}{|\mathbf{n}|} = p, \tag{4-50}$$

which is just the vector equation of a plane. If the number $p$ in Eqn (4-50) is positive then the plane lies on the side of the origin towards which $\mathbf{n}$ is directed, otherwise it lies on the opposite side.

To express result (4-50) in Cartesian form let $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ and the unit normal $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}| = l\mathbf{i} + m\mathbf{j} + n\mathbf{k}$, where of course $l^2 + m^2 + n^2 = 1$. Equation (4-50) becomes

$$lx + my + nz = p. \tag{4-51}$$

This is the standard Cartesian form of the equation of a plane. Any equation of this form represents a plane having for its unit normal the vector $l\mathbf{i} + m\mathbf{j} + n\mathbf{k}$ and lying at a perpendicular distance $p$ from the origin. If $p = 0$ the plane passes through the origin.

Example 4-34 Find the Cartesian equation of the plane containing the point $(1, 2, 3)$ which is normal to the vector $\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$.

Solution First we use Eqn (4-50) to determine $p$. Since the point $(1, 2, 3)$ lies in the plane, $\mathbf{r} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$ is the position vector of a point in the plane. The vector normal to the plane in this case is $\mathbf{n} = \mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$, so that $|\mathbf{n}| = 3$ and the unit normal $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}| = (\mathbf{i} + 2\mathbf{j} + 2\mathbf{k})/3$. This shows that $l = \tfrac{1}{3}$, $m = \tfrac{2}{3}$, $n = \tfrac{2}{3}$. Hence, substituting into Eqn (4-50),

$$\frac{(\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}) \cdot (\mathbf{i} + 2\mathbf{j} + 2\mathbf{k})}{3} = p,$$

or $p = 11/3$. As $p > 0$, the plane must lie on the side of the origin towards which $\mathbf{n}$ is directed.
Substituting in Eqn (4-51) we find the desired Cartesian form of the equation of the plane:

$$\tfrac{1}{3}x + \tfrac{2}{3}y + \tfrac{2}{3}z = \tfrac{11}{3}.$$

This equation could equally well be written in the non-standard Cartesian form $x + 2y + 2z = 11$, though then the constant on the right-hand side is no longer the perpendicular distance of the plane from the origin.

Simple geometrical considerations similar to those set out above, when coupled with the scalar and vector product, enable various useful results to be derived very quickly. For example, as the angle $\theta$ between two planes is defined to be the angle between their unit normals $\hat{\mathbf{n}}_1$ and $\hat{\mathbf{n}}_2$, it follows that $\theta$ may be obtained from the scalar product $\hat{\mathbf{n}}_1 \cdot \hat{\mathbf{n}}_2 = \cos\theta$. Also the line of intersection of these two planes is perpendicular to both normals $\hat{\mathbf{n}}_1$ and $\hat{\mathbf{n}}_2$ and so is parallel to the vector $\mathbf{t}$ determined by the vector product $\mathbf{t} = \hat{\mathbf{n}}_1 \times \hat{\mathbf{n}}_2$. Rather than elaborate on these ideas here, a number of problems are given at the end of the chapter.

The sphere

Consider a sphere of radius $R$ with its centre at the point A with the position vector $\mathbf{a}$. Then if $\mathbf{r}$ is the position vector of any point on the surface of the sphere, the modulus of the vector $\mathbf{r} - \mathbf{a}$ must equal $R$. In terms of vectors the equation of the sphere is $|\mathbf{r} - \mathbf{a}| = R$ or, alternatively,

$$(\mathbf{r} - \mathbf{a}) \cdot (\mathbf{r} - \mathbf{a}) = R^2. \tag{4-52}$$

If, now, we expand this equation to get $\mathbf{r} \cdot \mathbf{r} - 2\mathbf{r} \cdot \mathbf{a} = R^2 - \mathbf{a} \cdot \mathbf{a}$, and then set $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $R^2 - \mathbf{a} \cdot \mathbf{a} = q$, we obtain the standard Cartesian form of the equation of a sphere

$$x^2 + y^2 + z^2 - 2a_1 x - 2a_2 y - 2a_3 z = q. \tag{4-53}$$

Example 4-35 Find the Cartesian form of equation of the sphere of radius 2 having its centre at $\mathbf{a} = \mathbf{i} + \mathbf{j} + 2\mathbf{k}$.

Solution As $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ and $\mathbf{a} = \mathbf{i} + \mathbf{j} + 2\mathbf{k}$ we have $\mathbf{r} - \mathbf{a} = (x - 1)\mathbf{i} + (y - 1)\mathbf{j} + (z - 2)\mathbf{k}$, whilst $R = 2$. Hence Eqn (4-52) becomes

$$(x - 1)^2 + (y - 1)^2 + (z - 2)^2 = 4,$$

which is the desired Cartesian form of the equation.
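The distance formula (4-49), the plane equation (4-50), and the sphere equation (4-52) all reduce to a few scalar products, as the following sketch suggests. The function names, the tuple representation of a vector, and the tolerance `tol` used when testing a point against the sphere are assumptions introduced for illustration; the vectors $\mathbf{b}$ and $\mathbf{n}$ are assumed not to be null.

```python
import math

def dot(a, b):
    """Scalar product of Definition 4-12."""
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    """Difference a - b of Definition 4-10."""
    return tuple(x - y for x, y in zip(a, b))

def distance_point_line(c, a, b):
    """Perpendicular distance p from the point c to the line r = a + lam*b,
    computed from Eqn (4-49), with |b|^2 written as b . b."""
    ca = sub(c, a)
    p_squared = dot(ca, ca) - dot(ca, b) ** 2 / dot(b, b)
    return math.sqrt(max(p_squared, 0.0))   # guard against tiny negative rounding

def plane_through(point, normal):
    """Return (l, m, n, p) for the plane lx + my + nz = p of Eqn (4-51)
    that contains the given point and has the given normal vector."""
    size = math.sqrt(dot(normal, normal))
    l, m, n = (x / size for x in normal)
    p = dot(point, normal) / size            # Eqn (4-50)
    return l, m, n, p

def on_sphere(r, centre, radius, tol=1e-12):
    """True if r satisfies (r - a) . (r - a) = R^2, Eqn (4-52), within tol."""
    d = sub(r, centre)
    return abs(dot(d, d) - radius ** 2) <= tol

# Example 4-33: p should be close to 1/sqrt(3)
print(distance_point_line((1, 1, 1), (1, 2, 1), (1, -2, 1)))

# Example 4-34: l, m, n close to 1/3, 2/3, 2/3 and p close to 11/3
print(plane_through((1, 2, 3), (1, 2, 2)))

# Example 4-35: the point (3, 1, 2) lies on the sphere of radius 2 centred at (1, 1, 2)
print(on_sphere((3, 1, 2), (1, 1, 2), 2))    # True
```

A negative value of `p` returned by `plane_through` would indicate that the plane lies on the side of the origin opposite to that towards which the chosen normal is directed.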
4-9 Applications to mechanics

This section briefly introduces some of the many situations in mechanics that are best described vectorially. We begin with one of the simplest applications of vectors, which will already be familiar to the reader.

Polygon of forces: resultant

It is known from experiment that when forces $\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_n$ act on a rigid body through a single point O, their combined effect is equivalent to that of a single force $\mathbf{R}$, their resultant, which acts through the same point O and is equal to their vector sum. Such a system of forces acting through a single point is a concurrent system of forces. Thus we have

$$\mathbf{R} = \mathbf{F}_1 + \mathbf{F}_2 + \cdots + \mathbf{F}_n. \qquad (4\text{-}54)$$

These forces are often represented in the form of a vector polygon of forces as shown in Fig. 4-16, in which the senses of the forces $\mathbf{F}_i$ are all similarly directed and are opposite to the sense of $\mathbf{R}$. Conversely, the vector polygon shows that the vector $-\mathbf{R}$ is the additional force that is required to act through O in order to maintain the system of forces in equilibrium.

Fig. 4-16 Vector polygon.

Example 4-36 Forces $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$ have magnitudes $3\sqrt{3}$, $\sqrt{14}$, and $2\sqrt{6}$ lb and act concurrently through a point O along the lines of the vectors $\mathbf{i} + \mathbf{j} + \mathbf{k}$, $3\mathbf{i} - \mathbf{j} + 2\mathbf{k}$, and $-\mathbf{i} + 2\mathbf{j} + \mathbf{k}$, respectively. Find the force $\mathbf{Q}$ that must act through O for the system to remain in equilibrium.

Solution This is a direct application of the last remark about the vector polygon of forces, and the only problem is one of scaling. Let us agree that a vector of unit modulus represents a force of 1 lb. From the conditions of the question we see that $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$ are respectively directed along the unit vectors

$$\hat{\mathbf{f}}_1 = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k}), \quad \hat{\mathbf{f}}_2 = \frac{1}{\sqrt{14}}(3\mathbf{i} - \mathbf{j} + 2\mathbf{k}), \quad \hat{\mathbf{f}}_3 = \frac{1}{\sqrt{6}}(-\mathbf{i} + 2\mathbf{j} + \mathbf{k}).$$

Using the scale factor we can write

$$\mathbf{F}_1 = 3\sqrt{3}\,\hat{\mathbf{f}}_1 = 3\mathbf{i} + 3\mathbf{j} + 3\mathbf{k}, \quad \mathbf{F}_2 = \sqrt{14}\,\hat{\mathbf{f}}_2 = 3\mathbf{i} - \mathbf{j} + 2\mathbf{k}, \quad \mathbf{F}_3 = 2\sqrt{6}\,\hat{\mathbf{f}}_3 = -2\mathbf{i} + 4\mathbf{j} + 2\mathbf{k}.$$

Hence the resultant is $\mathbf{R} = \mathbf{F}_1 + \mathbf{F}_2 + \mathbf{F}_3 = 4\mathbf{i} + 6\mathbf{j} + 7\mathbf{k}$.
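As a numerical check of the scaling and summation used in this solution, the following minimal Python sketch (not part of the original text; the helper names `scaled_force` and `resultant` are illustrative) builds each force from its magnitude and line of action and then forms the vector sum of Eqn (4-54). Up to rounding error it reproduces the components 4, 6, 7.

```python
import math

def scaled_force(magnitude, direction):
    """Force of the given magnitude along `direction` (any non-zero vector)."""
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("direction must be non-zero")
    return tuple(magnitude * c / length for c in direction)

def resultant(forces):
    """Vector sum of a system of concurrent forces, Eqn (4-54)."""
    return tuple(sum(components) for components in zip(*forces))

# Data of Example 4-36 (magnitudes in lb, directions as given).
forces = [
    scaled_force(3 * math.sqrt(3), (1, 1, 1)),
    scaled_force(math.sqrt(14), (3, -1, 2)),
    scaled_force(2 * math.sqrt(6), (-1, 2, 1)),
]
R = resultant(forces)
print([round(c, 9) for c in R])
```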
The force necessary for equilibrium is $\mathbf{Q} = -\mathbf{R}$, showing that $\mathbf{Q} = -4\mathbf{i} - 6\mathbf{j} - 7\mathbf{k}$. As $|\mathbf{Q}| = \sqrt{101}$, it follows immediately that the desired force is $\sqrt{101}$ lb and acts in the direction of the unit vector $\hat{\mathbf{q}}$, where

$$\hat{\mathbf{q}} = \frac{-1}{\sqrt{101}}(4\mathbf{i} + 6\mathbf{j} + 7\mathbf{k}).$$

In many problems of statics the centroid or the centre of mass of a system of particles is of importance. We now define this concept in terms of vectors.

Definition 4-19 The centre of mass of the system of masses $m_1, m_2, \ldots, m_n$ whose position vectors are $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ is at the point G, where G has the position vector $\mathbf{g}$ determined by

$$\mathbf{g} = \frac{m_1\mathbf{a}_1 + m_2\mathbf{a}_2 + \cdots + m_n\mathbf{a}_n}{m_1 + m_2 + \cdots + m_n}.$$

Next we discuss simple problems about relative motion and relative velocity.

Relative velocity

Problems involving the motion of one point relative to another, which is itself moving, occur frequently in mechanics and easily lend themselves to vector treatment. They are best illustrated by example, but first we define relative velocity.

Definition 4-20 The relative velocity of a point P with velocity $\mathbf{u}$, relative to the point Q with velocity $\mathbf{v}$, is defined to be the velocity $\mathbf{u} - \mathbf{v}$.

Example 4-37 A man walks due east at 4 mile/h and his dog runs north-east at 12 mile/h. Find the velocity and speed of the man relative to his dog.

Solution Let a unit vector denote a velocity of magnitude 1 mile/h and take $\mathbf{j}$ pointing due north and $\mathbf{i}$ pointing due east. Unit vectors in the directions of motion of the man and dog are then $\mathbf{i}$ and $(\mathbf{i} + \mathbf{j})/\sqrt{2}$. The velocity $\mathbf{u}$ of the man is thus $\mathbf{u} = 4\mathbf{i}$ and the velocity $\mathbf{v}$ of the dog is $\mathbf{v} = 6\sqrt{2}(\mathbf{i} + \mathbf{j})$. Hence the velocity of the man relative to his dog is

$$\mathbf{u} - \mathbf{v} = 2(2 - 3\sqrt{2})\mathbf{i} - 6\sqrt{2}\,\mathbf{j}.$$

His relative speed is $|\mathbf{u} - \mathbf{v}| = (160 - 48\sqrt{2})^{1/2}$ mile/h.

Work done by a force

The scalar product can be used to give a convenient representation of the work $W$ done by a force $\mathbf{F}$ that produces a displacement $\mathbf{d}$ of the particle on which it acts.
The work done by a force of magnitude $|\mathbf{F}|$ when it displaces a particle through a distance $|\mathbf{d}|$ is defined as the product of the distance moved and the component of force in the direction of the displacement. Hence, as $W$ is positive, we have $W = |\mathbf{F}||\mathbf{d}||\cos\theta|$, where $\theta$ is the angle of inclination between $\mathbf{F}$ and $\mathbf{d}$. So the final result is

$$W = |\mathbf{F}\cdot\mathbf{d}|. \qquad (4\text{-}55)$$

Example 4-38 Calculate the work $W$ done by a force $\mathbf{F}$ of 12 lb whose line of action is parallel to $2\mathbf{i} + 3\mathbf{j} - 2\mathbf{k}$ when it moves its point of application through a displacement $\mathbf{d}$ of 4 ft in a direction parallel to $-2\mathbf{i} + \mathbf{j} - 3\mathbf{k}$.

Solution The unit vectors parallel to the force $\mathbf{F}$ and displacement $\mathbf{d}$ are $\hat{\mathbf{f}} = (2\mathbf{i} + 3\mathbf{j} - 2\mathbf{k})/\sqrt{17}$ and $\hat{\mathbf{d}} = (-2\mathbf{i} + \mathbf{j} - 3\mathbf{k})/\sqrt{14}$, respectively. Let $\hat{\mathbf{f}}$ denote a force of 1 lb and $\hat{\mathbf{d}}$ a displacement of 1 ft, so that $\mathbf{F} = 12\hat{\mathbf{f}} = (24\mathbf{i} + 36\mathbf{j} - 24\mathbf{k})/\sqrt{17}$ and $\mathbf{d} = 4\hat{\mathbf{d}} = (-8\mathbf{i} + 4\mathbf{j} - 12\mathbf{k})/\sqrt{14}$. Then the work $W$ that is done is

$$W = |\mathbf{F}\cdot\mathbf{d}| = \frac{(24)(-8) + (36)(4) + (-24)(-12)}{\sqrt{17}\,\sqrt{14}} = \frac{240}{\sqrt{238}} \approx 15.6 \text{ ft lb}.$$

We now turn to applications of the vector product. One of the easiest occurs in the determination of the angular velocity of a point rotating about a fixed axis.

Angular velocity

Consider a rigid body rotating with a constant spin $\Omega$ rad/s about a fixed axis L. Fig. 4-17 represents a point P in such a body, having the position vector $\mathbf{d}$ relative to a point O on the spin axis L. Point Q is the foot of the perpendicular from P to the line L. The vector $\boldsymbol{\Omega}$ parallel to L, with magnitude $\Omega$ and sense determined by a right-hand screw rule with respect to L and the direction of the spin, is called the angular velocity of the body. The instantaneous linear velocity $\mathbf{v}$ of point P with position vector $\mathbf{d}$ obviously has magnitude $\Omega\,(QP)$ and is directed along the tangent to the dotted circle in Fig. 4-17. Since $QP = |\mathbf{d}|\sin\theta$, where $\theta$ is the angle between $\boldsymbol{\Omega}$ and $\mathbf{d}$, it is easily seen that we may rewrite this as $|\mathbf{v}| = |\boldsymbol{\Omega}||\mathbf{d}|\sin\theta$ or as $\mathbf{v} = \boldsymbol{\Omega}\times\mathbf{d}$.
(4-56)

The final two applications of the vector product involve the concept of the moment of a vector, which is defined first, and they require the use of a bound vector.

Definition 4-21 We define $\mathbf{M} = \mathbf{d}\times\mathbf{Q}$ to be the moment of vector $\mathbf{Q}$ about the point O, where $\mathbf{d}$ is the position vector relative to O of any point on the line of action of the bound vector $\mathbf{Q}$.

This definition is illustrated in Fig. 4-18, in which the plane $\Pi$ contains the vectors $\mathbf{d}$ and $\mathbf{Q}$ and, by virtue of the definition of the moment, $\mathbf{M}$ is normal to $\Pi$.

Fig. 4-17 Angular velocity.
Fig. 4-18 Moment of a vector about O.

The natural mechanical applications of this definition are to the moment of a force and to the moment of momentum about a fixed point. In both situations the line of action of the vector whose moment is to be found is important, as is its point of application in some circumstances.

If $\mathbf{Q}$ is identified with a force $\mathbf{F}$, then the expression

$$\mathbf{M} = \mathbf{d}\times\mathbf{F} \qquad (4\text{-}57)$$

is the moment or torque of the force $\mathbf{F}$ about O. If the force is expressed in lb and the displacement vector in ft, the units of torque are lb-ft. Similarly, if $\mathbf{Q}$ is identified with the momentum $m\mathbf{v}$ of a particle of mass $m$ moving with velocity $\mathbf{v}$, then the vector

$$\mathbf{M} = \mathbf{d}\times(m\mathbf{v}) = m\,\mathbf{d}\times\mathbf{v} \qquad (4\text{-}58)$$

is the moment of momentum or the angular momentum of the particle about O.

PROBLEMS

Section 4-1

4-1 Give a graphical representation of each of the following velocities by drawing directed line elements. In each case indicate the sense of the vector with an arrow: (a) 4 ft/s in a north-east direction; (b) 2.5 ft/s in a south-west direction; (c) 5 ft/s due west. What velocities would these same directed line elements represent if the arrows were reversed?

4-2 Classify each of these quantities as scalar or vector: (a) volume; (b) length; (c) momentum = mass × velocity; (d) electric field; (e) speed; (f) acceleration; (g) density; (h) chemical concentration; (i) electrostatic capacity; (j) moment of a force.
4-3 Find the roots of each equation: (a) $x^2 = -36$; (b) $x^2 = -27$; (c) $x^2 = 25$; (d) $x^2 = -2$.

4-4 Find the roots of these quadratic equations: (a) $x^2 + 3x + 3 = 0$; (b) $x^2 - 3x + 2 = 0$; (c) $x^2 + 4x + 5 = 0$.

4-5 By setting $x^2 = w$, reduce the following quartic equations to quadratic equations, and hence obtain their roots: (a) $x^4 + x^2 - 2 = 0$; (b) $x^4 + 5x^2 + 6 = 0$; (c) $x^4 - 5x^2 + 6 = 0$.

4-6 Find the real and imaginary parts of each of these complex numbers: (a) $z = 9 - 6i$; (b) $z = 32$; (c) $z = 14 + 2i$; (d) $z = 17i$; (e) $z = -3 + i$.

4-7 Write the following numbers in real-imaginary form given that their real and imaginary parts are: (a) Re $z = -11$, Im $z = 1$; (b) Re $z = 0$, Im $z = -3$; (c) Re $z = 0$, Im $z = 0$; (d) Re $z = 4$, Im $z = 17$.

Section 4-2

4-8 Which of these complex numbers are equal? $z_1 = 2 - i$, $z_2 = 1 - i$, $z_3 = 4 + i$, $z_4 = 1 - i$, $z_5 = 2 + i$, $z_6 = 2 - i$, $z_7 = 1 - i$.

4-9 Given that the following complex numbers are equal, deduce the values of $a$ and $b$: (a) $2 - 3i = 2 + ib$; (b) $a + 4i = 1 + ib$; (c) $3 + 7i = a + ib$; (d) $5 + ia = b + 6i$.

4-10 Use Definitions 4-2 and 4-3 together with the real number axioms to prove that (a) $z_1 + z_2 = z_2 + z_1$, thereby showing that complex addition is commutative, and (b) $z_1 - z_2 = -(z_2 - z_1)$.

4-11 Form the sums $z_1 + z_2$ given that: (a) $z_1 = 3 - i$, z2 = 4 + li; (b) $z_1 = -2 - 4i$, $z_2 = 2 + 3i$; (c) $z_1 = 5 + 6i$, $z_2 = -5 - 6i$; (d) $z_1 = 4 - 3i$, $z_2 = 2 + 3i$.

4-12 Form the differences $z_1 - z_2$ given that: (a) $z_1 = 2 + 6i$, $z_2 = 4 + 2i$; (b) $z_1 = -2 + i$, $z_2 = -2 + 2i$; (c) z1 = 4 + li, $z_2 = 2 + 7i$; (d) $z_1 = 3i$, $z_2 = 1 + 3i$.

4-13 Form the products $z_1 z_2$ given that: (a) $z_1 = 1 + i$, $z_2 = 2 + 3i$; (b) $z_1 = 3 - 5i$, $z_2 = 3 + 5i$; (c) $z_1 = i$, $z_2 = 4 - 3i$; (d) $z_1 = 2$, $z_2 = 9 - i$.
4-14 Evaluate (1 + 5 - (1 - if- 1 1 4-15 Evaluate (1 + i)« T (1 - ;) 4 PROBLEMS / 169 4- 16 Solve these equations for z: (a) 3z + (9 + 6/) = 7 + 3/; (b) 2z + (3 - 2i) = 3 - 2»; (c) 4z - (4 + 6/) = -3 + i; (d) 3z + (2 + /) - 3(1 + 2/) = 1 + /. 4-17 Form the quotients zi/z2 given that: (a) zi = 3 + 2/, z 2 = 1 - /; (b) zi = 9 + 3/, z 2 = 3 + /; (c) zi = 8 + 4/', z 2 = 2 - 4;'. 4-18 Solve these equations for z: (a) 2z(3 + = 2 + 3/; (b) 3z(l - 2/) = 1 + 4/; (c) 4z(l -0 = l + j; (d) 2z(4 + 0=1+ 4». 419 Use Definition 4-4 and the real number axioms to prove that zizo = z«z\ thereby showing that complex multiplication is commutative. 4-20 Use Definition 4-5 to prove that: <a)*-<7>; (b) (i)=# (c) £5) = (z)3; (d) (~ j = z lf2 . 4-21 Use the real number axioms together with Definition 4-5 to prove that: (Zl + Z2 + • • ■ + Zn) = Zl + Z 2 + • • • + Z„. 4-22 State which of the following polynomials have at least one real root and which, if they have complex roots, will have them occur in complex conjugate pairs. If no deductions can be made about the nature of the roots, then say so. (a) P(z) = z 5 + 16z 4 + z 2 + 3z + 1 ; (b) P(z) = z 4 + 3z 3 + 2z 2 + 1 ; (c) P(z) = z 7 + 5z 5 - 2z 2 + z + /; (d) P(z) = z 3 - 6z 2 + 2z + 4. 4-23 Given that z = 2 + 3/ is a root of the polynomial P(z) = z 4 - 4z 3 + 12z 2 + 4z - 13, deduce the values of the other three roots. Factorize P{z) into linear and quadratic factors with real coefficients. 4-24 Given that z = i is a root of the polynomial P(z) s- z 5 - 2z 4 + 10z 3 - 20z 2 + 9z - 18, deduce the values of the other four roots. Factorize P(z) into linear and quadratic factors with real coefficients. Section 4-3 4-25 Plot the following vectors z x and z 2 in the complex-plane and use geometrical methods to form their sum zi + z 2 and their difference zi - z 2 : (a) zi = 2 + 3/, z 2 = -1 + 2/; (b) zi = 3, z 2 = 4 - /• (c) zi = 4/, z 2 = 3 - 4/; (d) zi = -1 - 2/, z 2 = -1 + 2/. 
4-26 Find the modulus of each of these vectors: (a) 4 - 3/; (b) -2 + 3/; (c) 2 - 3»; (d) 3 + 4/; (e) 5i. 170 / COMPLEX NUMBERS AND VECTORS CH 4 4-27 Use Definitions 4-5 and 4-7 to prove that: (a) zz = | z I 2 ; (b) ( zi z 2 1 = | zi | . | z 2 1 ; and give an inductive proof that | Zl Z2 ■ ■ ■ Z„ | = | Zl | . | ZZ | • • • | Z„ | . 4-28 Given that zi = 3 + 4/, Z2 = 4 — 3i, Z3 = 2 + /, and Z4 = V3 + /, use the results of the previous problem to compute | z\ z% |, | zi Z2 Z3 |, and | zi Z2 zz Zi | . Check your results by direct computation and compare the relative labour of computation. 4-29 Use the properties of the complex conjugate operation to prove that for any two complex numbers zi and Z2, zi Z2 + zi Z2 = 2 Re zi £2. Then, using this result together with the obvious inequality | Re zi Z2 | < | zi z 2 | and the identity | Zl + Z2 | 2 = (Zl + Z2)(Z1 + Zz), prove the triangle inequality, | Zl + Z2 | < | Zl | + | Z2 |. 4-30 Use the same form of argument as in Problem 4-29 together with the obvious inequality Re zi Z2 > — | zi Z2 | to prove ||zi| - |z 2 || <|zi + z 2 |. 4-31 Give two examples in which the triangle inequality is strict (that is, the sign < is replaced by <). Give two further examples in which it reduces to an equality. 4-32 Give two examples in which the inequality 1 1 zi | — | Z2 1 1 < | zi + Z2 | is strict. Give two further examples in which it reduces to an equality. Section 4-4 4-33 Express these numbers in modulus-argument form: (a) z = -3 + 4/; (b) z = -3 - 4;; (c) z = -3 + 3/; (d) z = 2V3 - 2i. 4-34 Express the following numbers z in real-imaginary form given that : (a) | z | = 4, arg z = |; (b) | z | = 2, arg z = ^; (c) | z | = 6, arg z = y ; (d) | z | = 3, arg z = ~ 4-35 Use the modulus-argument representation of complex numbers to prove : (a) Zl Z2 • • • Zn = Zl . Z2 • • • Zn\ (b) (z") = (f)"; (c) and arg |-| = —arg; PROBLEMS / 171 4-36 Given the following numbers z in real-imaginary form, compute the products iz. 
Plot the results in the complex-plane and verify that the effect of multi- plication by / is to rotate a vector anti-clockwise through an angle I * without change of size: z = 3 — 2/; z = —2 + i; z = /; z = — 1 — /. 4-37 Form the products z\ zo and the quotients z\jzo of the following numbers expressed in modulus-argument form: (a) zi = 3(cos ,'jjr + /sin \tt)\ z 2 = |(cos }» + /sin \tt); (b) z\ — 4(cos \-n — /sin \-t)\ zi = 2(cos i^ + /sin Jtt); (c) zi = iHcos iw — / sin iw); z-i = 6(cos 3w/2 — / sin 3-n-/2). 438 The second-order difference equation ailn + bun-l + CUn-2 = has for its general solution the expression u ,, = Ah" + Bh" whenever the characteristic equation aX l + bl + c = has the distinct real roots h and A2. If b 2 — Aac < 0, so that the character- istic equation has the complex conjugate roots A and A, show that if u n is to be real, then the constants A and B must also be. complex conjugates. Hence show that if/) 2 — Aac < and | I \ — r , arg I = 0, then the general solution is expressible in the form Un = r"(Ccos nO + D sin nO), where C and D are real arbitrary constants. Find the general solution of the following difference equation, and hence determine the particular solution appropriate to the stated initial conditions: ti„ — 3\ 2u„-\ + 9«jj-2 = with un = 1, wi = 3. Section 4-5 4-39 Use de Moivre's theorem to express sin 16 and cos 16 in terms of powers of sin and cos 0. 4-40 Use de Moivre"s theorem to express sin 1 1 and cos 1 1 in terms of powers of sin 6 and cos 6. 4-41 Evaluate z 20 when z = -\ '3 + /. 4-42 Evaluate z u when z = I — i \ 3. 4-43 Calculate the seventh roots of unity. 444 Find the roots of the equation iv = (— /) 2 < 3 . 4-45 Find the roots of the equation w = (1 + i\ 3) ,/4 . Section 4-6 4-46 Construct the set of cyclic permutations of the four letters a, b, c, and d. 4-47 Construct a table analogous to Table 41 for a left-handed system of axes. 
4-48 Determine the lengths j OP | of the vectors OP given that O is the origin and the points P are: 172 / COMPLEX NUMBERS AND VECTORS CH 4 (a) (1,1,1); (b) (-2,1,3); (c) (-1, -1, -1); (d) (3, -2, -4). 449 Find the lengths | OP |, the direction cosines and the angles 0i, 02, 03 of the vectors OP, where the points P are: (a) (2, - 1, - 1); (b) (4, 0, 2); (c) (-1,2, 1). 4-50 Find the direction ratios, the direction cosines and the angles 0i, 2 , 63 of the vectors OP, where the points P are: (a) (1, 1, l)T(b) (-1,1,1); (c) (2,1, -1). 4-51 Determine the angles 0i, 02, 03 for the vectors with the direction cosines: 452 Given that a vector makes acute angles with each of the coordinate axes and that its direction cosines are Im, m, —pi\, deduce the value of m and hence find the angles. \ ^ 1 4-53 Use the fact that a vector makes an acute angle with each of the coordinate axes and that its direction ratios are 1, 2, 2 to determine the angles 0i, 02, and 03 that it makes with the coordinate axes. 454 Determine the lengths | AB | of the vectors AB, given that the end points A and B are: (a) A = (1, 1, 1), B = (2,0, 1); (b) A = (2, -1,1), B = (-2, 2,2); (c) A = (-1,3, 1), B = (-2, -1,0). Use your results to determine the direction cosines for each of these vectors. 4-55 Write down the position vectors OP in terms of the unit vectors i, j, k given that O is the origin and the points P are: (a) (1,1,1); (b) (-2,3,7); (c) (3, -1, 1 1); (d) (0,1,0). 4-56 Write down the x, y, and z-components of these vectors: (a)3i-2j + k; (b) -i + 3j + Ilk; (c) i - k; (d) j + 3k. 4-57 Form the vector a = ai + /?j + 7k, given that (1 - a)i + 2/5j + (2y - l)k = 2i + j + 3k. 4-58 Determine the values of ot, ft, and y in order that: (1 - a)i + W ~ * 2 )j + (3' ~ 2)k = li + 3j + 2k. 459 Form the sum a + b and difference a — b of the vectors: (a) a = 3i - 2j + k, b = -i - 2j + 3k; (b) a = -i + 2j - k, b = 2i - 4j + 2k; (c) a = 2j - 3k, b = 2i - j + k. 
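4-60 Prove from the definitions of addition and subtraction of vectors that for any vectors $\mathbf{a}$ and $\mathbf{b}$: (a) $\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}$ and (b) $\mathbf{a} - \mathbf{b} = -(\mathbf{b} - \mathbf{a})$.

4-61 Find $|AB|$ and the direction cosines of the vectors AB given that A and B are the points: (a) A = (1, 1, 1), B = (2, -1, 1); (b) A = (2, 0, -1), B = (1, 2, 1); (c) A = (1, 1, 1), B = (-1, -1, -1).

4-62 State which of the following pairs of vectors $\mathbf{a}$ and $\mathbf{b}$ are parallel and which are anti-parallel: (a) $\mathbf{a} = \mathbf{i} - 3\mathbf{j} + \mathbf{k}$, $\mathbf{b} = -4\mathbf{i} + 12\mathbf{j} - 4\mathbf{k}$; (b) $\mathbf{a} = -2\mathbf{i} + 3\mathbf{j} - \mathbf{k}$, $\mathbf{b} = 2\mathbf{i} - 3\mathbf{j} + \mathbf{k}$; (c) $\mathbf{a} = 4\mathbf{i} - \mathbf{j} - 3\mathbf{k}$, $\mathbf{b} = 8\mathbf{i} - 2\mathbf{j} - 6\mathbf{k}$; (d) $\mathbf{a} = \mathbf{i} + 7\mathbf{j} + \mathbf{k}$, $\mathbf{b} = 3\mathbf{i} + 21\mathbf{j} + 3\mathbf{k}$.

Section 4-7

4-63 Express the following vectors $\mathbf{a}$ as the product of a scalar and a unit vector: (a) $\mathbf{a} = 2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$; (b) $\mathbf{a} = 3\mathbf{i} - 3\mathbf{j} + \mathbf{k}$; (c) a = -^-i + -j - ^k.

4-64 Find the vectors AB, and their direction cosines, given that A and B have position vectors $\mathbf{a}$ and $\mathbf{b}$, respectively, where (a) $\mathbf{a} = 3\mathbf{i} - 3\mathbf{j} + 5\mathbf{k}$, $\mathbf{b} = \mathbf{i} + 2\mathbf{j} - \mathbf{k}$; (b) $\mathbf{a} = 2\mathbf{i} + 2\mathbf{j} + \mathbf{k}$, $\mathbf{b} = \mathbf{i} + 3\mathbf{j} + 2\mathbf{k}$.

4-65 Verify the inequalities $\big||\mathbf{a}| - |\mathbf{b}|\big| \le |\mathbf{a} + \mathbf{b}| \le |\mathbf{a}| + |\mathbf{b}|$ for the pairs of vectors: (a) $\mathbf{a} = \mathbf{i} - 2\mathbf{j} - \mathbf{k}$, $\mathbf{b} = 2\mathbf{i} - 3\mathbf{j} + \mathbf{k}$; (b) $\mathbf{a} = 3\mathbf{i} - 4\mathbf{j} + \mathbf{k}$, $\mathbf{b} = 6\mathbf{i} - 8\mathbf{j} + 3\mathbf{k}$; (c) $\mathbf{a} = 2\mathbf{i} + 3\mathbf{j} - \mathbf{k}$, $\mathbf{b} = -6\mathbf{i} - 9\mathbf{j} + 3\mathbf{k}$.

4-66 Find the angle between the vectors $\mathbf{a}$ and $\mathbf{b}$ where: (a) $\mathbf{a} = \mathbf{i} + \mathbf{j} + \mathbf{k}$, $\mathbf{b} = 2\mathbf{i} + \mathbf{j} - \mathbf{k}$; (b) $\mathbf{a} = -\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$, $\mathbf{b} = 2\mathbf{i} - \mathbf{j} - 2\mathbf{k}$.

4-67 Give two examples of pairs of vectors that are orthogonal but are not parallel to the vectors $\mathbf{i}$, $\mathbf{j}$, or $\mathbf{k}$.

4-68 Give two different proofs of the fact that scalar multiplication of vectors is commutative by using the two alternative definitions of the scalar product.

4-69 Find the scalar products $\mathbf{a}\cdot\mathbf{b}$ and hence find the angle between the vectors $\mathbf{a}$ and $\mathbf{b}$ given that: (a) $\mathbf{a} = 7\mathbf{i} - 3\mathbf{j} + \mathbf{k}$, $\mathbf{b} = -\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$; (b) $\mathbf{a} = 2\mathbf{i} - 2\mathbf{j} + \mathbf{k}$, $\mathbf{b} = -3\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$; (c) $\mathbf{a} = \mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$, $\mathbf{b} = -2\mathbf{i} - 4\mathbf{j} - 6\mathbf{k}$.

4-70 Find unit vectors parallel to the vectors $\mathbf{a}$ where: (a) $\mathbf{a} = 2\mathbf{i} - 2\mathbf{j} + \mathbf{k}$; (b) $\mathbf{a} = -3\mathbf{i} + \mathbf{j} + 2\mathbf{k}$; (c) $\mathbf{a} = 7\mathbf{i} - 2\mathbf{j} - 3\mathbf{k}$.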
4-71 Prove the distributive law for the scalar product by using either definition of the scalar product. 4'72 Form the vector products a x b if: (a) a = i - 2j - 4k, b = 2i - 2j + 3k; (b) a = -i + 4j - k, b = 3i + 2j + 4k; (c) a = -2i + 4k, b = 3j - 2k. 174 / COMPLEX NUMBERS AND VECTORS CH 4 4-73 Evaluate the determinants: (a) 2 1 (b) 4 16 (c) 2 (d) 3 9 4 6 ' -2 6 ' 16 * 1 3 4-74 For what values of A, if any, do these determinants vanish: (a) 4-75 Evaluate the determinants: A 2 (b) A 2 (0 3 A (d) 3 1 3 2A ' 2 ' (a) 2 1 1 1 2 1 1 1 1 (b) 3 4 5 (c) 2 2 1 » 1 2 3 4 5 3 1 2 6 5 7 3A 4 2 -A 4-76 For what values of A do the following determinants vanish : (a) A 1 2 (b) 1 A 1 ; 2 2 1 A 1 ic) 2A 1 ; 1 3 1 2 A 1 1 A 1 4-77 Use Definition 418 to prove that a x b = — (b x a) for arbitrary vectors a and b. 4-78 Evaluate the vector products b x a given that : (a) a = 2i - j + 2k, b = -3i + 2j + k; (b) a = -i + j + k, b = 4i + 2j + 3k; (c) a = -i - j - k, b = 2i + 2j + 2k. 479 Determine unit vectors that are normal to both vectors a and b when : (a) a = 3i + 5j - 2k, b = i + j + k; (b) a = -4i + 2k, b = j - 3k. State whether the results are unique and, if not, in what way are they in- determinate. 4-80 Use the definition of a vector product to prove that it is distributive and so ax(b + c) = axb + axc. 4-81 Use Definition 418 of a vector product to prove that when a and b are non- zero vectors, then a x b = if, and only if, a and b are parallel. 4-82 Use Definition 418 to evaluate the vector products a x b given that: (a) a = -i + 4j - 2k, b = 2i + 3j + k; (b) a = -2i - 3j + k, b = 6i + 9j - 3k; (c) a = 3i - k, b = 2j. Evaluate these same vector products using Table 4-3 and compare the effort involved. 4-83 Verify the distributive property of the vector product: ax(b + c) = axb+axc, given that a = 2i + j — k, b = i — z) + k and c = 3i — 2j + 3k. PROBLEMS / 175 4-84 Evaluate the triple scalar products a . (b x c) and (b x a) . 
c given that: (a) a = 2i - j - 3k, b = 3k, c = i + 2j + 2k; (b) a = i + 2j + k, b = 2i + j + k, c = 4i + 2j + 2k. 4-85 Prove that if a, b, and c form three edges of a parallelepiped all meeting at a common point, then the volume of this solid figure is given by | a (b x c) I Deduce that the vanishing of the triple scalar product implies that the vectors a, b, and c are co-planar (that is, all lie in a common plane). 4-86 Determine the vector products a x (b x c) given that : (a) a = 2i - j - 3k, b = 3i + j + k, c = -i + j + k; (b) a = -i + j - k, b = 2i - 2j + 2k, c = i + k. 4-87 Prove that (a x b) x c = (a . c)b - (b . c)a. Section 4-8 4-88 Find the vector equation of the line through the point with position vector 2i- j- 3k wh.ch , s parallel to the vector i + j + k. Determine the points corresponding to X = - 3, 0, 2 in the resulting equation. P 489 v^ctortT-ll ? i at 'T ° f ^k 116 throUgh the P° intS A and B with Potion cSnS of rnis'lmV " " "* " = "' + j + * ° et ™ the *«*<» 4-90 The equations 3 *+ 3 -2y + 1 _ 2z + 6 2 7 ~ SSefof rhe £? ^ **"" ta ' m ™** f ° m and find *e direction 4-91 If the points A and B have position vectors a and b, and point C divides the line AB in the rat.o X : M , show that C has the position vector ,ua + Ab A+ fl - provided A + /t ^ o. 
4M S£in he V6 f ° r eqUati ,° n ° f thC Hne that P asses throu g h th « point A with rhere b n : e r: r 2j T 7k 2, an"d J c + = k _Tti ^ " ^ ** ^ " "^ 4-93 Find ^he perpendicular^ distance of the point 2i + j + k from the „„e 4-94 Find the perpendicular distance of the point i + 3j + 2k from the line 2x - 1 y + 2 z- 1 2 3 -~r~ 4 ' 95 a^no'rmairf+jTk 11011 ° f "" *"" ^^ «* ^ * " J + 2k 4-96 Find the Cartesian equation of the plane containing the point 3i - k and alw contauung the two vectors a, £ where a = i S + 2j I k and B = -1 176 / COMPLEX NUMBERS AND VECTORS CH 4 4-97 Find the angle between the two planes jc— l_y + 2_z— 3 2 r~ ~ 3 and 2x-l _ y-l _ 2z + 1 2 ~ 3 ~ 3 4-98 Find the angle between the plane z = 2 and the plane x + 2 y - 1 z+ 1 4-99 Let H be the plane r . n = p, where fi is the unit normal to II and p is its perpendicular distance from the origin. By constructing a plane 11' parallel to 11 through point P with position vector a, show that the perpendicular distance of P from 1 1 is given by the expression | o . n — p \ . What form would this expression take if the plane was expressed in the form r . n = q, where | n | # 1. 4100 A line may be uniquely determined as the intersection of two planes r . in = pi and r . 112 = pz (A) where ni and no are not necessarily unit vectors. The direction of the line is normal to both ni and nz and so is parallel to ni x 112. Hence the line has the equation r = a + A(m x n2) where A is a parameter and a is some point common to the two planes in (A). Apply these arguments to obtain the vector equation of the line determined by the planes x + 2v — z = 3 and 2x + y + 2z = 1. 4-101 Find the Cartesian equation of the sphere of radius 3 about the centre a = 2i + 3j + k. 4-102 Construct the Cartesian equation of the sphere of radius 4 that lies on the side z > of the plane z = and is tangent to the point (3, 1, 0). 4103 The inward drawn normal to a sphere of radius 2 at the point (1, 1, 2) on its surface is 11 = 2i — j + k. 
Deduce its equation in Cartesian form.

Section 4-9

4-104 Forces $\mathbf{F}_1$, $\mathbf{F}_2$, $\mathbf{F}_3$, and $\mathbf{F}_4$ have magnitudes $2\sqrt{6}$, $3\sqrt{5}$, 3, and 15 lb and act concurrently through a point O along the lines of the vectors $-\mathbf{i} + 2\mathbf{j} - \mathbf{k}$, $2\mathbf{i} + \mathbf{k}$, $2\mathbf{j}$, and $4\mathbf{i} + 3\mathbf{j}$, respectively. Find the resultant of these forces and determine its magnitude in lb.

4-105 Forces 1, 2, and 3 act at one corner of a cube along the diagonals of the faces meeting at that corner. Find the magnitude of their resultant, and its inclination to the edges of the cube.

4-106 A sphere of 10-in radius and mass 20 lb has one end of a string 18 in long attached to its surface and hangs at rest against a smooth vertical wall to which the other end of the string is attached. The string has a tension $T$ and the wall exerts a normal reaction $R$ at its point of contact with the sphere. Use a vector triangle of forces to determine $T$ and $R$.

4-107 Deduce that for three concurrent forces $\mathbf{F}_1$, $\mathbf{F}_2$, and $\mathbf{F}_3$ to be in equilibrium they must form a closed vector triangle of forces, and hence be coplanar. Use your result to prove Lami's theorem, which asserts that when three concurrent forces are in equilibrium, the magnitude of each force is proportional to the sine of the angle opposite to it in the vector triangle of forces.

4-108 Find the centre of mass of the masses 1, 3, 4, and 2 lb situated at points with the respective position vectors $3\mathbf{i} - \mathbf{j} + \mathbf{k}$, $2\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$, $-\mathbf{i} + 7\mathbf{j} - \mathbf{k}$, and $4\mathbf{i} - 10\mathbf{k}$.

4-109 Prove that the centre of mass of a system of masses is independent of the choice of origin. (Hint: Choose a new origin O' with position vector $\mathbf{b}$ relative to the original origin O and apply the definition of centre of mass.)

4-110 The velocity of a boat relative to the water is represented by $4\mathbf{i} + 3\mathbf{j}$, and that of the water relative to the earth by $2\mathbf{i} - \mathbf{j}$. What is the velocity of the boat relative to the earth if $\mathbf{i}$ and $\mathbf{j}$ represent velocities of 1 mile/h to the east and north, respectively?
4-111 The point of application of the force $9\mathbf{i} + 6\mathbf{j} + 7\mathbf{k}$ moves a distance 5 ft in the direction of the vector $3\mathbf{i} + \mathbf{j} + 4\mathbf{k}$. If the modulus of the force vector is equal to the magnitude of the force in lb, find the work done.

4-112 A body spins about a line through the origin parallel to the vector $2\mathbf{i} - \mathbf{j} + \mathbf{k}$ at 15 rad/s. Find the angular velocity vector $\boldsymbol{\Omega}$ for the body and find the instantaneous linear velocity of a point in the body with position vector $\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$.

4-113 Find the torque of a force represented by $3\mathbf{i} + 6\mathbf{j} + \mathbf{k}$ about point O given that it acts through the point with position vector $-\mathbf{i} + \mathbf{j} + 2\mathbf{k}$ relative to O.

4-114 Masses 1, 3, and 2 units at the points specified by the position vectors $3\mathbf{i} - \mathbf{k}$, $2\mathbf{i} - 3\mathbf{j} + \mathbf{k}$, and $\mathbf{i} + \mathbf{j} + \mathbf{k}$ relative to point O have velocities represented by $2\mathbf{j} + \mathbf{k}$, $3\mathbf{i} + \mathbf{j} + 2\mathbf{k}$, and $\mathbf{i} - \mathbf{j} + \mathbf{k}$, respectively. Determine the vector sum of the moments of momentum of each of these masses about O.

Differentiation of functions of one or more real variables

5-1 The derivative

The important branch of mathematics known as the calculus is concerned with two basic operations called differentiation and integration. These operations are related and both rely for their definition on the use of limits. The calculus was founded jointly, and independently, by Newton in England, and by his contemporary Leibnitz in Germany, to whom we owe the essentials of our present-day notation.

In introducing the ideas underlying a derivative we shall make use of a simple dynamical problem in very much the same way that Newton did when first formulating his early ideas on differentiation. However, we have the advantage of understanding the nature of a limit more clearly than was the case in his day, so that after presenting our heuristic argument, we shall quickly formalize it in terms of the ideas set down in Chapter 3.

We shall consider how to define and determine the instantaneous speed of a point P moving in a non-uniform manner along a straight line.
To be precise, we shall suppose that a fixed point O on the line has been selected, and that the distance $s$ of point P from O at time $t$ is determined by the equation

$$s = f(t),$$

where $f(t)$ is some suitable continuous function of $t$ defined on some interval $\mathcal{J}$. Thus we know the position of P at a general time $t$, and are required to use this information to define and find the speed of P at any given instant of time.

When the motion of P is uniform, so that its displacement is proportional to the elapsed time, the familiar definition of speed as distance per unit time can be used. However, if the motion is non-uniform we must consider the situation more carefully. We shall use intuition here and first consider the difference quotient

$$\frac{f(t_2) - f(t_1)}{t_2 - t_1}, \qquad (5\text{-}1)$$

in which $t_1$ and $t_2$ are two different times belonging to $\mathcal{J}$.

It seems reasonable to suppose that if $t_2$ were to be taken sufficiently close to $t_1$ then expression (5-1), which is the quotient of the finite distance travelled and the elapsed time, would in some sense provide a measure of the average speed of P in the small time interval $t_2 - t_1$. Even better would be the idea that we compute the difference quotient (5-1) not for one time $t_2$ close to $t_1$, but for a monotonic sequence $\{\tau_i\}$ of times having for its limit the time $t_1$, which is not a member of the sequence. This last condition is necessary because Eqn (5-1) is not defined if $t_2 = t_1$. Then if the sequence of difference quotients corresponding to Eqn (5-1) has a limit we propose to call the value of this limit the instantaneous speed $u(t_1)$ of P at time $t_1$. Expressed in the symbolic form of Chapter 3 we may write

$$u(t_1) = \lim_{\tau_i \to t_1}\left[\frac{f(\tau_i) - f(t_1)}{\tau_i - t_1}\right]. \qquad (5\text{-}2)$$

This definition is obviously consistent with the case of the uniform motion of P, for then every difference quotient involved in the determination of the limit (5-2) would give the same constant value $u$, say. We will call this value $u$ the constant speed of P.
As the function $f(t)$ is continuous it is clearly desirable that we define the speed not in terms of the discrete variable $\tau_i$ but in terms of a continuous variable $\tau$. Fortunately we can do this easily, for the conditions of the connecting Theorem 3-6 are satisfied and allow us to rewrite Eqn (5-2) thus:

$$u(t) = \lim_{\tau \to t}\left[\frac{f(\tau) - f(t)}{\tau - t}\right]. \qquad (5\text{-}3)$$

We have now dropped the suffix 1 since $t_1$ was not specific and represented any value of the time $t$ belonging to $\mathcal{J}$. It should be appreciated that the limit $u(t)$ in Eqn (5-3) is a number and not a ratio of quantities, as were the members of the sequence used to define the limit. The instantaneous speed $u(t)$ can be interpreted as the distance through which P would move in unit time if, during that time, it were to move at a constant speed equal to the value $u(t)$. Because Eqn (5-3) is consistent with the notion of a constant speed, it is customary to omit the adjective 'instantaneous' and to speak only of the speed of P.

The limit involved in Eqn (5-3) is of the indeterminate type and it will be our object to devise techniques for evaluating such limits for a wide class of functions $f(t)$. In trivial cases these may be determined by simple algebraic considerations, as this example shows.

Example 5-1 Suppose that the distance of a point P from a fixed origin at time $t$ is determined by the equation $f(t) = kt^3$, where $k$ is a constant with dimensions (Length)(Time)$^{-3}$. Find the functional form of the speed $u(t)$ at time $t$, and determine its value when $t = 4$.

Solution We are here required to evaluate the limit

$$u(t) = \lim_{\tau \to t}\left[\frac{k(\tau^3 - t^3)}{\tau - t}\right],$$

which is the form assumed by Eqn (5-3) when $f(t) = kt^3$. Using the identity $\tau^3 - t^3 = (\tau - t)(\tau^2 + \tau t + t^2)$ we may write

$$u(t) = \lim_{\tau \to t}\left[\frac{k(\tau - t)(\tau^2 + \tau t + t^2)}{\tau - t}\right] = \lim_{\tau \to t} k(\tau^2 + \tau t + t^2) = 3kt^2.$$

Thus the functional form of the speed is $u(t) = 3kt^2$, so that at $t = 4$ the speed has the value $u(4) = 48k$.
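The limiting process of Eqn (5-3) can also be watched numerically. The following minimal Python sketch is not part of the original text: the value $k = 1$ and the particular sequence $\tau_i = t + 10^{-i}$ are assumptions made purely for illustration, and `difference_quotient` is an illustrative name. It evaluates the bracketed quotient of Eqn (5-3) for $f(t) = kt^3$ at $t = 4$ along a sequence of times approaching $t$; the printed values should settle towards $48k$.

```python
def difference_quotient(f, t, tau):
    """[f(tau) - f(t)] / (tau - t), the bracketed quantity in Eqn (5-3)."""
    if tau == t:
        raise ValueError("tau must differ from t; the quotient is undefined")
    return (f(tau) - f(t)) / (tau - t)

k = 1.0  # assumed value of the constant k, for illustration only

def f(t):
    """Distance function of Example 5-1."""
    return k * t ** 3

t = 4.0
for i in range(1, 7):
    tau = t + 10.0 ** (-i)  # assumed monotonic sequence with limit t
    print(tau, difference_quotient(f, t, tau))
```

In exact arithmetic each quotient equals $k(\tau^2 + \tau t + t^2)$, so the first value, for $\tau = 4.1$, is $49.21k$. For very small $\tau - t$ floating-point rounding eventually dominates, which is one reason the algebraic evaluation of the limit is preferable.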
It is often helpful to check the form of a result by means of dimensional analysis. This is achieved by representing the fundamental quantities of mass, length, and time occurring in expressions and equations by the symbols $M$, $L$, and $T$, and ignoring any purely numerical multipliers that may be involved. The equations then become identities between expressions of the form $L^p M^r T^s$, where $p$, $r$, and $s$ are real numbers. Quantities other than length, mass, and time are represented as suitable combinations of these fundamental quantities. Thus speed and acceleration would be written $LT^{-1}$ and $LT^{-2}$, respectively, with no account being taken of their magnitudes.

We illustrate this approach with Example 5-1. By supposition $k$ has dimensions $LT^{-3}$, so that from the form of the solution we see that $u(t)$ must have the dimensions $kT^2 = (LT^{-3})T^2 = LT^{-1}$, which are the dimensions of speed, as required.

[Fig. 5-1 Speed interpreted as a derivative. Axes: time $t$ (horizontal), distance (vertical).]

There is a valuable graphical interpretation of the limit (5-3) shown in Fig. 5-1, which is the graph of a function $f(t)$ together with the chord $PQ$, where $P$ is the point $(t, f(t))$ and $Q$ the point $(\tau, f(\tau))$. The difference quotient within the brackets of Eqn (5-3) before the limit is taken is the tangent of the angle $QPR$. In the limit as $\tau \to t$, so the point $Q$ approaches the point $P$ and the chord $PQ$ approaches the tangent $PS$ to the curve $y = f(t)$ at $P$. The value $u(t)$ arrived at by considering the limit of the difference quotient (5-3) is thus the tangent of the angle $SPR$ and so is equal to the gradient or slope of the curve $y = f(t)$ at $P$.

The number $u(t_1)$ evaluated at any specific time $t = t_1$ is the derivative of $f(t)$ with respect to $t$ at $t = t_1$. The limit $u(t)$ as a function of $t$ is simply called the derivative of $f(t)$ with respect to $t$, and the operation of computing the derivative of a function is called differentiation.
A function that possesses a derivative at each point of an interval is said to be differentiable in that interval. Hence in Example 5-1, the derivative of $kt^3$ with respect to $t$ at $t = 4$ is $48k$, whereas the derivative of $kt^3$ with respect to $t$ is the function $3kt^2$. The function $kt^3$ is obviously differentiable in any finite interval.

This heuristic approach has served to introduce the limiting arguments underlying the concept of a derivative, and we must now carefully reformulate these arguments and express them in general terms. We shall use the following key definitions.

Definition 5-1 A function $f(x)$ of the real variable $x$ will be said to be differentiable at $x_0$ if, and only if,

\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]

exists and is independent of the side from which $x$ approaches $x_0$. More generally, $f(x)$ will be said to be differentiable in an interval $\mathcal{J}$ if it is differentiable at each point of $\mathcal{J}$. At any points of $\mathcal{J}$ for which the limit is not defined the function $f(x)$ will be said to be non-differentiable.

Definition 5-2 If $f(x)$ is a differentiable function of the real variable $x$ at $x_0$, then the value of the expression

\[ \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \]

will be denoted by $f'(x_0)$ or $\left. \dfrac{df}{dx} \right|_{x = x_0}$, and we shall say that it is the derivative of $f(x)$ at $x = x_0$. If further we define $y$ by the equation $y = f(x)$, then we can also write the derivative of $f(x)$ at $x_0$ in the form $\left. \dfrac{dy}{dx} \right|_{x = x_0}$.

These definitions merely express in a more sophisticated way what is usually put as follows. Let $y = f(x)$. Then if $\delta y$ is the increment in $y$ occasioned by an increment $\delta x$ in $x$, we have $y + \delta y = f(x + \delta x)$ and hence

\[ \frac{\delta y}{\delta x} = \frac{f(x + \delta x) - f(x)}{\delta x}. \]

Thus at $x = x_0$,

\[ \frac{\delta y}{\delta x} = \frac{f(x_0 + \delta x) - f(x_0)}{\delta x}, \]

and so

\[ \left. \frac{dy}{dx} \right|_{x = x_0} = \lim_{\delta x \to 0} \frac{f(x_0 + \delta x) - f(x_0)}{\delta x}. \]

To obtain the formulation of Definition 5-2 above, first write $h$ in place of $\delta x$ to obtain

\[ \left. \frac{dy}{dx} \right|_{x = x_0} = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}, \]

and then write $x$ in place of $x_0 + h$, so that $h = x - x_0$.
What does the requirement, that $\lim_{x \to x_0} \{[f(x) - f(x_0)]/(x - x_0)\}$ should exist, actually mean? It is this. There is a number $f'(x_0)$ such that the left- and right-hand limits of the function $\varphi(x) = [f(x) - f(x_0)]/(x - x_0)$ as $x$ approaches $x_0$ exist and are both equal to $f'(x_0)$. The function $\varphi(x)$ itself is defined near but not at $x = x_0$, but has the property that $\lim_{x \to x_0} \varphi(x) = f'(x_0)$. We shall use this idea together with Theorem 3-4 when we discuss the general properties of derivatives of combinations of functions.

If in Definition 5-2 we write $x_0 + h$ in place of $x$, and replace $x_0$ by $x$ in the subsequent result, we may formulate this definition.

Definition 5-3 If $y = f(x)$ is a differentiable function of the real variable $x$ at all points of an interval $\mathcal{J}$, then the derivative of $f(x)$ in $\mathcal{J}$ is the function denoted either by $f'(x)$ or $dy/dx$ and defined by

\[ \frac{dy}{dx} = f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}. \]

The operation of computing the derivative of a function is differentiation.

Let us now apply exactly the same arguments to Fig. 5-2 as were used in connection with the speed at a point of the particle trajectory in Fig. 5-1. This time the graph represents any function $y = f(x)$ satisfying the conditions of Definition 5-3. Then if $P$ is any point in the interval within which $f(x)$ is differentiable, and $Q$ is an adjacent point, the chord $PQ$ is, in some sense, an approximation to the tangent line to the curve at $P$. The limiting position of the chord $PQ$ will lie along the tangent line to the curve at $P$ and in terms of angles we have $\lim_{h \to 0} \theta = \alpha$.

[Fig. 5-2 Derivative interpreted as a gradient.]

However,

\[ \frac{f(x + h) - f(x)}{h} = \tan \theta, \]

so that

\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \tan \theta, \]

whence, finally,

\[ f'(x) = \tan \alpha, \tag{5-4} \]

or, equivalently,

\[ \frac{dy}{dx} = \tan \alpha. \tag{5-5} \]

This result shows that we may interpret the derivative of a differentiable function at a point as the gradient of the tangent line drawn to the curve at that point.
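The requirement that the left- and right-hand limits of the quotient $\varphi(x)$ should agree can be explored numerically. The Python sketch below is an added illustration: the helper names, the step $h = 10^{-6}$, and the two trial functions ($x^2$ at $x_0 = 1$ and $|x|$ at $x_0 = 0$) are choices made for this example only. It evaluates $\varphi$ just to the left and just to the right of $x_0$.

```python
def phi(f, x0, x):
    """The quotient phi(x) = [f(x) - f(x0)] / (x - x0), defined for x != x0."""
    if x == x0:
        raise ValueError("phi is not defined at x = x0")
    return (f(x) - f(x0)) / (x - x0)


def one_sided_quotients(f, x0, h=1e-6):
    """Return the pair (phi(x0 - h), phi(x0 + h)) for a small positive h."""
    if h <= 0:
        raise ValueError("h must be positive")
    return phi(f, x0, x0 - h), phi(f, x0, x0 + h)


def square(x):
    return x * x


if __name__ == "__main__":
    left, right = one_sided_quotients(square, 1.0)
    print("x^2 at x0 = 1:", left, right)   # both values lie close to 2
    left, right = one_sided_quotients(abs, 0.0)
    print("|x| at x0 = 0:", left, right)   # the two sides disagree in sign
```

For $x^2$ the two one-sided values approach a common number, whereas for $|x|$ they remain at $-1$ and $+1$ however small $h$ is taken, so no single gradient can be assigned at the origin. A single small $h$ is, of course, only suggestive of the behaviour of the limit.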
It is implicit in the definition that the tangent line so defined should be independent of whether $Q$ approaches $P$ from the left or right.

The geometrical interpretation of a derivative allows us to see quite clearly that in addition to the function needing to be continuous in the neighbourhood of a point at which it is required to be differentiable, it also needs a special kind of smoothness. Specifically, the left- and right-hand tangents to the curve at the point in question must be one and the same. Indeed, we could re-phrase our definition of differentiability in terms of the equality of the left- and right-hand tangents at a point on the curve, just as we did when dealing with continuity.

[Fig. 5-3 Non-differentiable function at $x = x_1$ and $x = x_2$.]

Consider the function $y = f(x)$ shown in Fig. 5-3 and defined on the interval $[x_0, x_3]$, but only continuous in the semi-open intervals $[x_0, x_2)$ and $(x_2, x_3]$. Then, despite the fact that the function $f(x)$ is continuous in $[x_0, x_2)$ and $(x_2, x_3]$, it is only possible to assert that tangent lines in the sense implied by Definition 5-3 can be constructed for points in the open intervals $(x_0, x_1)$, $(x_1, x_2)$, and $(x_2, x_3)$. No tangent line can be constructed at $x_2$ because of the discontinuity; two tangent lines $l_2$ and $l_3$ can be constructed at point $P$ according as $A$ and $B$ approach $P$ from the left and the right; whilst only right- and left-hand tangents $l_1$ and $l_4$ can be constructed at the end points $x_0$ and $x_3$ because the function $f(x)$ is not defined outside $[x_0, x_3]$.

We shall now show how Definition 5-3 may be used to determine the derivative of a function and also to prove its non-differentiability at a certain point. Our example is a continuous function whose behaviour is clear at all points other than the origin, at which the existence, or otherwise, of a tangent line to the curve cannot be deduced by inspection of its graph.
Example 5-2 Prove that the function $f$ defined by $f(x) = x \sin (1/x)$ for $x \neq 0$ and $f(0) = 0$ is continuous in $(-\infty, \infty)$ and sketch its graph. Find its derivative by use of Definition 5-3 and show that it is not differentiable at the origin.

[Fig. 5-4 The function $y = x \sin (1/x)$.]

Solution Only the behaviour of $f$ in the vicinity of the origin is in doubt here. When $x \neq 0$ we may write $f(x) = [\sin (1/x)]/(1/x)$, showing that for large $x$, $f(x)$ behaves like $\lim_{h \to 0} (\sin h)/h = 1$. Conversely, as the origin is approached, so $x \to 0$ and because $\sin (1/x)$ is bounded by $\pm 1$ it follows that $\lim_{x \to 0} f(x) = 0$. The limit of the function $f(x)$ at the origin is thus equal to the functional value itself and so $f(x)$ is continuous at the origin. It is clearly continuous elsewhere since it is the product of two continuous functions. Hence it is everywhere continuous and Fig. 5-4 shows its graph, which is symmetric about the $y$-axis because $f(x)$ is an even function.

We shall approach the differentiability question in two stages: first for $x \neq 0$, and then for $x = 0$. Assuming $x \neq 0$ and making a direct application of Definition 5-3 we obtain

\[ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \sin \left( \dfrac{1}{x + h} \right) - x \sin \dfrac{1}{x}}{h} \right], \]

which we re-express as

\[ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \sin \left\{ \dfrac{1}{x} \left( 1 + \dfrac{h}{x} \right)^{-1} \right\} - x \sin \dfrac{1}{x}}{h} \right]. \]

Now for $h$ close to zero we may use the binomial theorem together with our 'little oh' notation of Section 3-4, to write $[1 + (h/x)]^{-1} = 1 - (h/x) + o(h)$ as $h \to 0$, and hence

\[ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \sin \left\{ \dfrac{1}{x} \left( 1 - \dfrac{h}{x} + o(h) \right) \right\} - x \sin \dfrac{1}{x}}{h} \right]. \]

Next we write the argument of the sine function as $[(1/x) - (h/x^2) + o(h)/x]$ and use the trigonometric expansion for the difference of two angles to obtain

\[ f'(x) = \lim_{h \to 0} \left[ \frac{(x + h) \left\{ \sin \dfrac{1}{x} \cos \left( \dfrac{h}{x^2} - \dfrac{o(h)}{x} \right) - \cos \dfrac{1}{x} \sin \left( \dfrac{h}{x^2} - \dfrac{o(h)}{x} \right) \right\} - x \sin \dfrac{1}{x}}{h} \right]. \]

Consider the behaviour of the terms comprising this quotient. If the first and last terms are taken together then in the limit as $h \to 0$ they reduce to the single term $\sin (1/x)$.
The remaining term in the centre is

\[ -(x + h) \cos \frac{1}{x} \cdot \frac{\sin \left( \dfrac{h}{x^2} - \dfrac{o(h)}{x} \right)}{h}, \]

and since $x \neq 0$ is fixed, it follows from limit (3-9) that this reduces to

\[ -\frac{1}{x} \cos \frac{1}{x} \]

as $h \to 0$. Combining these two results we find that the derivative $f'(x)$ is

\[ f'(x) = \sin \frac{1}{x} - \frac{1}{x} \cos \frac{1}{x} \quad \text{for } x \neq 0. \]

Thus we have used Definition 5-3 to compute the derivative, and as this is defined for all $x \neq 0$ it follows that $y = x \sin (1/x)$ is differentiable for all such $x$.

Finally we must examine the behaviour of the derivative at the origin using Definition 5-3. Setting $x = 0$ we obtain

\[ f'(0) = \lim_{h \to 0} \frac{h \sin (1/h) - 0}{h} = \lim_{h \to 0} \sin (1/h). \]

As $\sin (1/h)$ oscillates boundedly with ever increasing frequency when $h \to 0$, it follows that $f'(0)$ is not defined. This establishes the non-differentiability of $f(x)$ at the origin as was required.

We close this section by deducing the derivatives of some important elementary functions, and stating them as theorems.

Theorem 5-1 The derivative of a constant function is zero.

Proof Let $k$ be any constant and consider the function $f(x)$ where $f(x) = k$ for all $x$. Then

\[ \frac{f(x + h) - f(x)}{h} = \frac{k - k}{h} = 0 \quad \text{for all } x. \]

Hence

\[ \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = 0 \quad \text{for all } x. \]

Theorem 5-2 If $n$ is any positive integer, then the real function $y = x^n$ is differentiable everywhere and has the derivative $dy/dx = nx^{n-1}$. If $m$ is any negative integer, then the function $y = x^m$ is differentiable everywhere except at the origin and has the derivative $dy/dx = mx^{m-1}$.

Proof We must first consider the limit of the difference quotient $[(x + h)^n - x^n]/h$. By the binomial theorem we have

\[ \frac{(x + h)^n - x^n}{h} = \frac{x^n + nx^{n-1}h + \dfrac{n(n-1)}{2!} x^{n-2}h^2 + \cdots + \dbinom{n}{r} x^{n-r}h^r + \cdots + h^n - x^n}{h} \]

\[ = nx^{n-1} + \frac{n(n-1)}{2!} x^{n-2}h + \cdots + \binom{n}{r} x^{n-r}h^{r-1} + \cdots + h^{n-1}. \]

Now $\lim_{h \to 0} h = 0$, so $\lim_{h \to 0} h^r = 0$ for $1 \le r \le n - 1$, and so

\[ \lim_{h \to 0} \binom{n}{r} x^{n-r}h^{r-1} = 0 \quad \text{for } 2 \le r \le n. \]

Consequently,

\[ \lim_{h \to 0} \frac{(x + h)^n - x^n}{h} = nx^{n-1}. \]
This is defined for all finite $x$ including $x = 0$ and so proves the first part of the theorem.

Next let $m = -n$. Then

\[ \frac{(x + h)^m - x^m}{h} = \frac{(x + h)^{-n} - x^{-n}}{h} = \left( \frac{x^n - (x + h)^n}{h} \right) \frac{1}{x^n (x + h)^n}. \]

Now from our result above

\[ \lim_{h \to 0} \frac{x^n - (x + h)^n}{h} = -nx^{n-1}, \]

whilst $\lim_{h \to 0} (x + h) = x$ and so $\lim_{h \to 0} (x + h)^n = x^n$. If $x \neq 0$,

\[ \lim_{h \to 0} \frac{1}{x^n (x + h)^n} = \frac{1}{\lim_{h \to 0} x^n \cdot \lim_{h \to 0} (x + h)^n} = \frac{1}{x^{2n}}. \]

Thus

\[ \lim_{h \to 0} \frac{(x + h)^m - x^m}{h} = -nx^{n-1} \cdot \frac{1}{x^{2n}} = -nx^{-n-1} = mx^{m-1}. \]

Hence we have proved that, for $y = x^n$,

\[ \left. \frac{dy}{dx} \right|_{x = x_0} = nx_0^{n-1} \]

for all $x_0$ if $n$ is a positive integer, and for all non-zero $x_0$ if $n$ is a negative integer.

Later we shall prove this result for all real $n$. Henceforth we shall use the result freely, irrespective of the value of $n$.

Theorem 5-3 The functions $\sin ax$ and $\cos ax$ of the real variable $x$, where $a$ is any real number, are differentiable everywhere and

\[ \frac{d}{dx} (\sin ax) = a \cos ax, \qquad \frac{d}{dx} (\cos ax) = -a \sin ax. \]

Proof These results follow by applying Definition 5-3 and then using limits (3-9) and (3-10). Thus we have

\[ \frac{d}{dx} (\sin ax) = \lim_{h \to 0} \frac{\sin a(x + h) - \sin ax}{h} = \lim_{h \to 0} \left[ \frac{\sin ax \cos ah + \cos ax \sin ah - \sin ax}{h} \right] \]

\[ = \sin ax \lim_{h \to 0} \left( \frac{\cos ah - 1}{h} \right) + \cos ax \lim_{h \to 0} \left( \frac{\sin ah}{h} \right) = 0 + a \cos ax. \]

As this function is defined for all finite $x$, the first part of the required result has been established. The remainder of the proof follows exactly similar lines, and so will be omitted.

Example 5-3 Find the derivatives of the following functions, stating any point at which they are not differentiable.

(a) $f(x) = 3$ for $-\infty < x < 1$ and $f(x) = 2$ for $1 < x < \infty$.
(b) $f(x) = x^5$ for all $x$.
(c) $f(x) = x^{-3}$ for $x \neq 0$ and $f(x) = 1$ for $x = 0$.
(d) $f(x) = \sin 4x$.
(e) $f(x) = \cos 7x$.

Solution (a) By virtue of Theorem 5-1, the function $f(x)$ has a zero derivative for all $x$ except at the point $x = 1$ where it is not defined.
(b) From Theorem 5-2 we have $dy/dx = 5x^4$ for all $x$.
(c) From Theorem 5-2 we have $dy/dx = -3x^{-4}$ for $x \neq 0$, and the derivative is not defined at $x = 0$.
(d) and (e) From Theorem 5-3 we have

\[ \frac{d}{dx} (\sin 4x) = 4 \cos 4x, \qquad \frac{d}{dx} (\cos 7x) = -7 \sin 7x \quad \text{for all } x. \]

By now it is obvious that Definition 5-3 is a working definition that can be used. However, some better method than its direct application is obviously needed to compute derivatives of complicated functions. This requirement will be systematically pursued in the next section.

5-2 Rules of differentiation

The complicated functions that occur in mathematical and physical studies are invariably the result of forming sums, products, and quotients of simple algebraic and trigonometric functions. This suggests that our next task should comprise a general study of the operation of differentiation when applied to sums, products, and quotients of arbitrary differentiable functions. We will present our results in the form of basic theorems which must become thoroughly familiar to the reader.

Theorem 5-4 (differentiation of a sum) If $f(x)$ and $g(x)$ are real valued functions of $x$, differentiable at $x_0$, and $k_1$ and $k_2$ are constants, then the linear combination $k_1 f(x) + k_2 g(x)$ is also differentiable at $x_0$. Furthermore,

\[ \left. \frac{d}{dx} \bigl( k_1 f(x) + k_2 g(x) \bigr) \right|_{x = x_0} = k_1 f'(x_0) + k_2 g'(x_0). \]

Proof Here we must apply Definition 5-3 to the linear combination $k_1 f(x) + k_2 g(x)$. We obtain

\[ \left. \frac{d}{dx} \bigl( k_1 f(x) + k_2 g(x) \bigr) \right|_{x = x_0} = \lim_{h \to 0} \frac{k_1 f(x_0 + h) + k_2 g(x_0 + h) - [k_1 f(x_0) + k_2 g(x_0)]}{h} \]

\[ = k_1 \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} + k_2 \lim_{h \to 0} \frac{g(x_0 + h) - g(x_0)}{h} = k_1 f'(x_0) + k_2 g'(x_0). \]

If $f$ and $g$ are both differentiable in some common interval $\mathcal{J}$, then the above argument when applied to each point of $\mathcal{J}$ yields the result

\[ \frac{d}{dx} [k_1 f(x) + k_2 g(x)] = k_1 f'(x) + k_2 g'(x), \]

where $x$ is any point of $\mathcal{J}$. The constants $k_1$ and $k_2$ are often absorbed into the functions $f$ and $g$, when the result could be expressed 'the derivative of a sum of functions is equal to the sum of their derivatives'.
The task of showing that this result is true for a linear combination of an arbitrary number of differentiable functions is left to the reader as an exercise involving proof by induction.

Example 5-4 Let us use Theorem 5-4 to compute the derivative of $f(x) = \sin^2 x$.

Solution As it stands we cannot differentiate $f(x)$. However, by a well known trigonometric identity we may transform $f(x)$ to the form $f(x) = \tfrac{1}{2}(1 - \cos 2x)$, when Theorem 5-4 becomes applicable. Then, using our earlier results concerning the differentiation of a constant and of $\cos ax$ we find that

\[ \frac{d}{dx} (\sin^2 x) = \frac{d}{dx} \left\{ \tfrac{1}{2}(1 - \cos 2x) \right\} = \frac{d}{dx} \left( \tfrac{1}{2} \right) - \frac{d}{dx} \left( \tfrac{1}{2} \cos 2x \right) = -\tfrac{1}{2} \frac{d}{dx} (\cos 2x) \]

\[ = -\tfrac{1}{2} \cdot (-2) \sin 2x = \sin 2x = 2 \sin x \cos x. \]

Theorem 5-5 (differentiation of a product) If $f(x)$ and $g(x)$ are differentiable real valued functions at $x_0$, then so also is the product function $f(x)g(x)$. Furthermore,

\[ \left. \frac{d}{dx} \bigl( f(x)g(x) \bigr) \right|_{x = x_0} = f'(x_0)g(x_0) + f(x_0)g'(x_0). \]

Proof Again we consider a difference quotient but this time, for economy of expression, use the form of limit given in Definition 5-2. We have the identity

\[ \frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} \equiv \left( \frac{f(x) - f(x_0)}{x - x_0} \right) g(x_0) + f(x) \left( \frac{g(x) - g(x_0)}{x - x_0} \right). \tag{I} \]

Now we wish to show that $\lim_{x \to x_0} f(x) = f(x_0)$. This would be true if $f(x)$ were continuous but we only know that it is differentiable and as yet do not know that this implies continuity. We shall prove that it does. As $f(x)$ is differentiable at $x = x_0$ we must have

\[ \frac{f(x) - f(x_0)}{x - x_0} = f'(x_0) + o(h) \quad \text{as } x \to x_0, \]

where $h = x - x_0$. Hence

\[ f(x) - f(x_0) = (x - x_0)[f'(x_0) + o(h)] \quad \text{as } x \to x_0. \]

This implies that if $x$ is taken sufficiently close to $x_0$ then the difference $f(x) - f(x_0)$ can be made arbitrarily small. This is just our definition of continuity and so we have proved that differentiability of $f(x)$ at $x_0$ implies its continuity at that point. Thus we are permitted to write

\[ \lim_{x \to x_0} f(x) = f(x_0) \]

and, similarly,

\[ \lim_{x \to x_0} g(x) = g(x_0). \]
Now

\[ \lim_{x \to x_0} \left( \frac{f(x) - f(x_0)}{x - x_0} \right) = f'(x_0) \quad \text{and} \quad \lim_{x \to x_0} \left( \frac{g(x) - g(x_0)}{x - x_0} \right) = g'(x_0), \]

so, finally, taking the limit of (I) as $x \to x_0$, we obtain the result

\[ \left. \frac{d}{dx} \bigl( f(x)g(x) \bigr) \right|_{x = x_0} = f'(x_0)g(x_0) + f(x_0)g'(x_0). \]

Again, if $f$ and $g$ are both differentiable in some common interval $\mathcal{J}$ then, as before, we obtain the more general result

\[ \frac{d}{dx} \bigl( f(x)g(x) \bigr) = f'(x)g(x) + f(x)g'(x) \quad \text{for } x \in \mathcal{J}. \]

As an incidental detail of this proof we have shown that differentiability at a point implies continuity. This result is worth stating formally.

Theorem 5-6 If a real valued function $f(x)$ is differentiable at the point $x_0$, then it is also continuous there. The converse result is not true.

Proof It only remains to prove that the converse result is not true: namely, that continuity does not imply differentiability. This has already been seen in connection with Fig. 5-3, but let us give a specific example. Our final assertion in Theorem 5-6 will be valid even if we can produce only one example of a function that is continuous at a point but is not differentiable there. Such an example used to prove the falsity of an assertion is a counter-example, and in this case we choose the function $f(x) = |x|$. This is known to be continuous at $x = 0$, but the derivative as defined in Definition 5-3 is not defined at the origin so the function is not differentiable at that point.

Example 5-5 Differentiate the function $f(x) = \sin^2 x$ and compute $f'(\tfrac{1}{4}\pi)$.

Solution We express the function as a product and use Theorem 5-5.

\[ \frac{d}{dx} (\sin^2 x) = \frac{d}{dx} (\sin x \cdot \sin x) = \left[ \frac{d}{dx} (\sin x) \right] \sin x + \sin x \left[ \frac{d}{dx} (\sin x) \right] = 2 \sin x \, \frac{d}{dx} (\sin x) = 2 \sin x \cos x. \]

As would be expected, this verifies the result of Example 5-4. Finally, using this expression we compute

\[ \left. \frac{d}{dx} (\sin^2 x) \right|_{x = \frac{1}{4}\pi} = 2 \sin \frac{\pi}{4} \cos \frac{\pi}{4} = 1. \]

Our next theorem is important and concerns the rule for differentiating a composite function or, more simply, the rule for the differentiation of a function of a function.
Theorem 5-7 (differentiation of composite functions) If $g(x)$ is a real valued differentiable function at $x = x_0$ and $f(u)$ is a real valued differentiable function at $u = g(x_0)$, then $f[g(x)]$ is differentiable at $x = x_0$. Furthermore,

\[ \left. \frac{d}{dx} \{ f[g(x)] \} \right|_{x = x_0} = f'[g(x_0)] \cdot g'(x_0). \]

Proof We have the obvious result

\[ \frac{f[g(x)] - f[g(x_0)]}{x - x_0} = \frac{f[g(x)] - f[g(x_0)]}{g(x) - g(x_0)} \cdot \frac{g(x) - g(x_0)}{x - x_0}. \tag{A} \]

Since $g(x)$ is differentiable at $x_0$ it is continuous there, and so $g(x) \to g(x_0)$ as $x \to x_0$. So, writing $g(x) = u$, $g(x_0) = a$ we have

\[ \frac{f[g(x)] - f[g(x_0)]}{x - x_0} = \frac{f(u) - f(a)}{u - a} \cdot \frac{g(x) - g(x_0)}{x - x_0}. \]

Now for ease of argument we shall assume the behaviour of $g(x)$ to be strictly monotonic in some neighbourhood of $x_0$, so that $g(x) = g(x_0)$ only when $x = x_0$. In these circumstances the difference quotients on the right-hand side of (A) are well defined as $x \to x_0$ so that we may take limits and obtain

\[ \left. \frac{d}{dx} \{ f[g(x)] \} \right|_{x = x_0} = \lim_{x \to x_0} \left[ \frac{f[g(x)] - f[g(x_0)]}{x - x_0} \right] = \lim_{u \to a} \left[ \frac{f(u) - f(a)}{u - a} \right] \cdot \lim_{x \to x_0} \left[ \frac{g(x) - g(x_0)}{x - x_0} \right] \]

\[ = f'(a) \cdot g'(x_0) = f'[g(x_0)] \cdot g'(x_0). \tag{B} \]

It is not difficult to show that the theorem is still true when $g(x)$ is not monotonic in some neighbourhood of $x_0$ and an infinite sequence of points $\{x_i\}$ exists with limit point $x_0$ at all of which $g(x_i) = g(x_0)$.

All that is necessary here is to observe that if $x \to x_0$ through the successive values $x_i$ of this sequence, then $g(x_i) - g(x_0) = 0$ and so

\[ \frac{g(x_i) - g(x_0)}{x_i - x_0} = 0 \quad \text{for every } i. \]

Hence, by Theorem 3-6, it follows that

\[ \left. \frac{dg}{dx} \right|_{x = x_0} = 0. \]

However, by the same argument,

\[ \frac{f[g(x_i)] - f[g(x_0)]}{x_i - x_0} = 0 \quad \text{for every } i, \]

showing that

\[ \left. \frac{d}{dx} \{ f[g(x)] \} \right|_{x = x_0} = 0, \]

and so result (B) is also valid in this case.

If (B) is true at each point of some interval $\mathcal{J}$, then we have the general result

\[ \frac{d}{dx} \{ f[g(x)] \} = f'[g(x)] \cdot g'(x). \]
When the substitution $u = g(x)$ is made, this result can be written:

\[ \frac{d}{dx} f(u) = \frac{df}{du} \frac{du}{dx}. \tag{5-6} \]

In this form the theorem is known as the chain rule for differentiation, and it is this result that is most often found in textbooks. By repeated application, the chain rule readily extends to enable the differentiation of more complicated composite functions such as the triple composite function $f\{g[h(x)]\}$, always provided the functions $f$, $g$, and $h$ have suitable differentiability properties. In this case, setting $v = h(x)$ and $u = g(v)$, result (5-6) takes the form

\[ \frac{d}{dx} f(u) = \frac{df}{du} \frac{du}{dv} \frac{dv}{dx}. \tag{5-7} \]

Further extensions of the same kind are obviously possible and are left to the reader.

Example 5-6 Differentiate the following functions and find the values of their derivatives at $x = 1$:
(a) $\sin (x^2 + 3)$; (b) $(x^3 + x + 1)^{1/3}$; (c) $\sin \sqrt{1 + x^2}$.

Solution (a) Set $u = x^2 + 3$ so that

\[ \frac{d}{dx} [\sin (x^2 + 3)] = \frac{d}{dx} (\sin u). \]

From the chain rule:

\[ \frac{d}{dx} [\sin (x^2 + 3)] = \frac{d}{du} (\sin u) \cdot \frac{du}{dx}. \]

Now $(d/du)(\sin u) = \cos u$ and $du/dx = 2x$, so that

\[ \frac{d}{dx} [\sin (x^2 + 3)] = (\cos u) \cdot 2x = 2x \cos (x^2 + 3). \]

Hence at $x = 1$,

\[ \left. \frac{d}{dx} [\sin (x^2 + 3)] \right|_{x = 1} = 2 \cos 4. \]

(b) This time set $u = x^3 + x + 1$, so that

\[ \frac{d}{dx} [(x^3 + x + 1)^{1/3}] = \frac{d}{dx} (u^{1/3}). \]

From the chain rule:

\[ \frac{d}{dx} [(x^3 + x + 1)^{1/3}] = \frac{d}{du} (u^{1/3}) \cdot \frac{du}{dx}. \]

Hence as $(d/du)(u^{1/3}) = \tfrac{1}{3} u^{-2/3}$ and $du/dx = 3x^2 + 1$ we obtain

\[ \frac{d}{dx} [(x^3 + x + 1)^{1/3}] = \left( \tfrac{1}{3} u^{-2/3} \right) \cdot (3x^2 + 1) = \frac{3x^2 + 1}{3 (x^3 + x + 1)^{2/3}}. \]

Thus when $x = 1$,

\[ \left. \frac{d}{dx} [(x^3 + x + 1)^{1/3}] \right|_{x = 1} = \frac{4}{3 \cdot 3^{2/3}}. \]

(c) We must use the extension of the chain rule given in Eqn (5-7). Set $v = 1 + x^2$ when $\sin \sqrt{1 + x^2} = \sin \sqrt{v}$, and $u = \sqrt{v}$ when $\sin \sqrt{1 + x^2} = \sin u$.
Then

\[ \frac{d}{dx} \left[ \sin \sqrt{1 + x^2} \right] = \frac{d}{dx} (\sin u) = \frac{d}{du} (\sin u) \cdot \frac{du}{dv} \cdot \frac{dv}{dx} = \cos u \cdot \frac{du}{dv} \cdot \frac{dv}{dx}. \]

However,

\[ \frac{dv}{dx} = 2x \quad \text{and} \quad \frac{du}{dv} = \frac{1}{2\sqrt{v}} = \frac{1}{2\sqrt{1 + x^2}}, \]

so that, combining all the results,

\[ \frac{d}{dx} \left[ \sin \sqrt{1 + x^2} \right] = \frac{x \cos \sqrt{1 + x^2}}{\sqrt{1 + x^2}}. \]

Whence at $x = 1$,

\[ \left. \frac{d}{dx} \left[ \sin \sqrt{1 + x^2} \right] \right|_{x = 1} = \frac{\cos \sqrt{2}}{\sqrt{2}}. \]

Theorem 5-8 (differentiation of a quotient) If $f(x)$ and $g(x)$ are real valued differentiable functions at $x_0$ and $g(x_0) \neq 0$, then the quotient $f(x)/g(x)$ is differentiable at $x_0$. Furthermore,

\[ \left. \frac{d}{dx} \left[ \frac{f(x)}{g(x)} \right] \right|_{x = x_0} = \frac{g(x_0)f'(x_0) - g'(x_0)f(x_0)}{[g(x_0)]^2}. \]

Proof If we consider the quotient $f(x)/g(x)$ to be the product of the two functions $f(x)$ and $1/g(x)$, we have by Theorem 5-5

\[ \left. \frac{d}{dx} \left[ \frac{f(x)}{g(x)} \right] \right|_{x = x_0} = \frac{1}{g(x_0)} f'(x_0) + f(x_0) \left. \frac{d}{dx} \left[ \frac{1}{g(x)} \right] \right|_{x = x_0}. \]

Now we must compute $(d/dx)(1/g)$. We set $g(x) = u$ when, from the chain rule,

\[ \left. \frac{d}{dx} \left[ \frac{1}{g(x)} \right] \right|_{x = x_0} = \left. \frac{d}{dx} \left[ \frac{1}{u} \right] \right|_{x = x_0} = \left. -\frac{1}{u^2} \frac{du}{dx} \right|_{x = x_0} = \frac{-g'(x_0)}{[g(x_0)]^2}. \]

Hence, combining our results, we obtain the desired result

\[ \left. \frac{d}{dx} \left[ \frac{f(x)}{g(x)} \right] \right|_{x = x_0} = \frac{g(x_0)f'(x_0) - g'(x_0)f(x_0)}{[g(x_0)]^2}. \]

As in the other cases the general result follows when the conditions of the theorem are satisfied throughout some interval $\mathcal{J}$. It has the obvious form

\[ \frac{d}{dx} \left[ \frac{f(x)}{g(x)} \right] = \frac{g(x)f'(x) - g'(x)f(x)}{[g(x)]^2}. \]

Example 5-7 Differentiate $(3x + 1)/(x^2 - 2)$ and determine the values of $x$ for which the derivative is not defined.

Solution Set $f(x) = 3x + 1$ and $g(x) = x^2 - 2$. Then $f'(x) = 3$ and $g'(x) = 2x$ for all $x$, whilst $g(x) = 0$ for $x = \pm\sqrt{2}$. Hence applying Theorem 5-8 we have

\[ \frac{d}{dx} \left[ \frac{3x + 1}{x^2 - 2} \right] = \frac{(x^2 - 2) \cdot 3 - (2x)(3x + 1)}{(x^2 - 2)^2} = -\left[ \frac{3x^2 + 2x + 6}{(x^2 - 2)^2} \right], \]

provided $x \neq \pm\sqrt{2}$.

To complete this section, Table 5-1 summarizes the results of differentiating the trigonometric functions. Unfamiliar results may be deduced by directly applying Theorem 5-8 to the definitions of the functions concerned.
Table 5-1 Derivatives of trigonometric functions

\[
\begin{array}{lll}
\dfrac{d}{dx} (\sin x) = \cos x & \dfrac{d}{dx} (\cos x) = -\sin x & \dfrac{d}{dx} (\tan x) = \sec^2 x \\[2ex]
\dfrac{d}{dx} (\operatorname{cosec} x) = -\operatorname{cosec} x \cot x & \dfrac{d}{dx} (\sec x) = \sec x \tan x & \dfrac{d}{dx} (\cot x) = -\operatorname{cosec}^2 x
\end{array}
\]

5-3 Some important consequences of differentiability

We preface this section by proving a result that belongs more properly to Chapter 3 since it depends for its validity only on the property of continuity. Our sole reason for discussing it here is to present it in the context in which it will first be used. It is usually known by the name of the intermediate value theorem and we shall now show that the idea underlying it is extremely simple.

Consider the situation in which a recording thermometer attached to some piece of equipment records its temperature at pre-assigned times. Suppose, for instance, that at times $t_1$ and $t_2$ the temperatures recorded were $T_1$ and $T_2$, respectively. Then although there is no record of the variation of the temperature $T(t)$ at times $t$ between $t_1$ and $t_2$, it may be safely inferred that the temperature will pass at least once through each intermediate value between $T_1$ and $T_2$. It is quite possible for the temperature to assume values that do not lie between $T_1$ and $T_2$, but no assertion can be made about such an event. The situation is illustrated in Fig. 5-5 where $T^*$ is a typical temperature intermediate between $T_1$ and $T_2$, and the dotted and solid lines represent two possible temperature variations with time.

This physical situation is an example of the operation of the intermediate value theorem in everyday life, and we are able to make our assertion because we know from experience that however rapidly a temperature may change, it can never undergo an abrupt jump. In mathematical terms we are saying that temperature change must be a continuous process. Expressed like this the result seems obvious, but how may we prove it?
Our simple proof relies on the postulate of Section 3-2, which asserts that every bounded monotonic sequence tends to a limit, but first we state the formal result.

Theorem 5-9 (intermediate value theorem) Let the real valued function $f(x)$ be continuous on the closed interval $[a, b]$ and such that $f(a) \neq f(b)$. Then if $y^*$ is any number intermediate between $f(a)$ and $f(b)$, there exists a number $x^*$ between $a$ and $b$ such that $y^* = f(x^*)$.

Proof Although a diagram is not essential for this proof, the representative situation shown in Fig. 5-6 will be of help.

First set $x_1 = \tfrac{1}{2}(a + b)$; then if $f(x_1) = y^*$ the result is proved. If not, consider the intervals $(a, x_1)$, $(x_1, b)$. Then in one of these two intervals, $y^*$ will lie between the functional values occurring at either end of the interval. Call this interval $I_1$ and let it be represented by the open interval $(a_1, b_1)$. Thus in Fig. 5-6, $I_1$ is the right-hand interval and so in that case $a_1 = \tfrac{1}{2}(a + b)$, $b_1 = b$.

Next set $x_2 = \tfrac{1}{2}(a_1 + b_1)$. If $f(x_2) = y^*$ the result is proved. If not, consider the intervals $(a_1, x_2)$, $(x_2, b_1)$. Then in one of these two intervals, $y^*$ will lie between the functional values occurring at either end of the interval. Call this interval $I_2$ and let it be represented by the open interval $(a_2, b_2)$. In Fig. 5-6 the interval $I_2$ is the left-hand sub-interval of $I_1$, so that $a_2 = a_1$, $b_2 = \tfrac{1}{2}(a_1 + b_1)$.

We either prove the result directly for some $x_n$ or we define an infinite sequence of open intervals $I_1 \supset I_2 \supset I_3 \supset \cdots$. Because each interval is contained by all its predecessors it then follows that the sequence of numbers $a_1, a_2, a_3, \ldots$ is monotonic increasing and bounded above whilst the sequence of numbers $b_1, b_2, b_3, \ldots$ is monotonic decreasing and bounded below. Hence by the postulate of Section 3-2, the sequences $\{a_i\}$ and $\{b_i\}$ both tend to a limit.
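The interval-halving construction used so far in this proof is also a practical way of locating $x^*$. The following Python sketch is an added illustration under stated assumptions: the function name, the number of halving steps, and the trial function $f(x) = x^2$ on $[0, 2]$ with $y^* = 2$ are all choices made for the example and do not come from the text. It records the successive intervals $(a_n, b_n)$.

```python
def halving_intervals(f, a, b, y_star, steps=40):
    """Return the intervals (a_n, b_n) produced by repeated halving.

    At each step the half whose end values straddle y_star is kept.
    If a midpoint gives f(mid) == y_star exactly, the degenerate
    interval (mid, mid) is appended and the process stops.
    """
    fa, fb = f(a), f(b)
    if not (min(fa, fb) < y_star < max(fa, fb)):
        raise ValueError("y_star must lie strictly between f(a) and f(b)")
    intervals = []
    for _ in range(steps):
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == y_star:
            intervals.append((mid, mid))
            break
        if min(fa, fm) < y_star < max(fa, fm):
            b, fb = mid, fm          # keep the left-hand half
        else:
            a, fa = mid, fm          # keep the right-hand half
        intervals.append((a, b))
    return intervals


if __name__ == "__main__":
    found = halving_intervals(lambda x: x * x, 0.0, 2.0, 2.0)
    for n, (a_n, b_n) in enumerate(found[:5], start=1):
        print(f"I_{n} = ({a_n}, {b_n})")
    print("last interval:", found[-1])
```

In this trial the left-hand end points never decrease and the right-hand end points never increase, which is the monotonic behaviour on which the proof relies.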
That they both tend to the same limit follows from the fact that the length of the $n$th interval $I_n$ is $(b - a)/2^n$, which tends to zero as $n \to \infty$. Letting the common value of these two limits be denoted by $x^*$ we have, by the continuity of $f$, $\lim_{n \to \infty} |f(a_n) - f(x^*)| = 0$ and, similarly, $\lim_{n \to \infty} |f(b_n) - f(x^*)| = 0$. As $y^*$ lies between $f(a_n)$ and $f(b_n)$ for every $n$, it follows that $f(x^*) = y^*$, thereby showing the existence of the required number $x^*$.

[Fig. 5-5 Physical illustration of intermediate value theorem. Axes: time $t$ (horizontal), temperature (vertical).]

The following is an obvious consequence of the intermediate value theorem:

Corollary 5-9 Every function that is continuous in a closed interval attains both its greatest and least values at points of that interval. These values may occur at the end points of the interval.

[Fig. 5-6 Intermediate value theorem.]

5-3 (a) Maxima and minima

One of the most familiar and useful applications of differentiation is to the problem of determining those points in some interval $[a, b]$ at which a function $f(x)$ assumes its maximum and minimum values. Collectively these values are known as the extrema of the function $f(x)$ on the interval $[a, b]$ and they are of various types as this definition indicates.

Definition 5-4 (extrema) Let $f(x)$ be a continuous function defined on the interval $[a, b]$ so that it attains its greatest and least values at points of that interval. Then we say that the point $x_0$ belonging to $[a, b]$ is:
(a) an absolute maximum if $f(x_0) > f(x)$ for all points $x$ in $[a, b]$;
(b) an absolute minimum if $f(x_0) < f(x)$ for all points $x$ in $[a, b]$;
(c) a relative maximum if $f(x_0 + h) - f(x_0) < 0$ for $|h|$ sufficiently small;
(d) a relative minimum if $f(x_0 + h) - f(x_0) > 0$ for $|h|$ sufficiently small.

No assumption of differentiability has been made when formulating this definition so that in Fig. 5-7, point $P$ is an absolute maximum and both points $R$ and $T$ are relative maxima.
Point $Q$ is an absolute minimum and point $S$ a relative minimum. Although the functional value at $U$ lies intermediate between those at $Q$ and $S$, it is not a relative minimum in the sense of the definition, because it lies at the end of the domain of definition $[a, b]$ so that only the one-sided behaviour of the function is known there with respect to $h$.

[Fig. 5-7 Extrema of a function on $[a, b]$.]

If now, in addition to continuity, we also require of $f(x)$ that it be differentiable at the point $x_0$ occurring in Definition 5-4, we can easily devise a simple test to identify the points where extrema must occur. Consider point $P$ in Fig. 5-7 as representative of a maximum at which the function is differentiable. The fact that $P$ happens to be an absolute maximum is immaterial for the subsequent argument. By supposition, if $f$ is differentiable at $P$, the expression

\[ f'(x_0) = \lim_{x \to x_0} \left[ \frac{f(x) - f(x_0)}{x - x_0} \right] \]

must be independent of the manner of approach of $x$ to $x_0$. Now for maxima of types (a) and (c) we have $f(x) - f(x_0) < 0$, and hence it follows that when $x < x_0$, $f'(x_0)$ is the limit of an essentially positive function; whereas when $x > x_0$, $f'(x_0)$ is the limit of an essentially negative function. Clearly this is only possible if $f'(x_0) = 0$. We have thus proved that if $f$ is differentiable at $x_0$, then a necessary condition that $f$ should have a maximum at $x_0$ is $f'(x_0) = 0$.

Similar reasoning establishes that the condition $f'(x_0) = 0$ is also a necessary condition for the differentiable function $f$ to have a minimum at $x_0$.

To show that the vanishing of the derivative $f'$ at a point is not a sufficient condition for that point to be an extremum, we appeal to a counter-example. The function $f = x^3$ has a continuous derivative $f' = 3x^2$ which vanishes at the origin.
Nevertheless, $f$ is negative for $x < 0$ and $f$ is positive for $x > 0$, thereby showing that despite the vanishing of the derivative, neither a maximum nor a minimum of the function can occur at the origin. Later we shall identify behaviour of this nature as typical of a point of inflection with a horizontal tangent. Generally speaking, a point of inflection is a demarcation point on the graph of a differentiable function separating a region of convexity from a region of concavity. Collectively the points at which the derivative vanishes, regardless of whether they are maxima, minima, or points of inflection, are called critical points or stationary points of the function.

Combining the previous results, and recalling that the condition that $f$ be differentiable at $x_0$ precludes behaviour of the type encountered at point T in Fig. 5-7, we are able to formulate the following general result.

Theorem 5-10 Let $f$ be a real valued differentiable function on some interval $[a, b]$. Then the stationary points of $f$ are the numbers $\xi$ for which $f'(\xi) = 0$.

Once the stationary points of a function have been determined it is necessary to examine the functional behaviour in the vicinity of each one in order to determine the nature of the point involved. An absolute maximum is identified from amongst the relative maxima by direct comparison of the functional values at the stationary points in question. A similar process identifies an absolute minimum.

Example 5-8 Without appealing to graphical ideas, find the location and nature of the extrema of the following two functions and determine if they are differentiable at these points:
(a) $f(x) = \tfrac{1}{3}x^3 + 2x^2 + 3x + 1$;
(b) $f(x) = (2x - 5)x^{2/3}$.

Solution (a) The stationary points are determined by finding those values $x = \xi$ for which the derivative $f'$ vanishes. Now $f' = x^2 + 4x + 3$ and so the desired stationary points are given by the roots of the equation $\xi^2 + 4\xi + 3 = 0$.
These roots are $\xi = -1$ and $\xi = -3$, and the functional values at the respective points are $f(-1) = -\tfrac{1}{3}$ and $f(-3) = 1$. As the derivative $f'$ is the sum of continuous functions it is everywhere continuous, so that no cusp-like behaviour with associated extrema as typified by point T in Fig. 5-7 can arise. So the two points $\xi = -1$ and $\xi = -3$ are the only ones at which stationary values can occur. An examination of the behaviour of the function near these points will determine if these stationary values correspond to maxima, minima, or points of inflection.

A sketch graph would quickly show that in fact $\xi = -3$ corresponds to a local maximum and $\xi = -1$ to a local minimum, but we are specifically required to establish these results by analytical means. How then can we do this? The solution lies in a direct application of Definition 5-4, and we illustrate the argument by considering the stationary point $\xi = -1$.

To find the behaviour of $f$ close to $\xi = -1$ we shall set $x = -1 + h$, where $h$ is small, and substitute in $f(x)$ to obtain

$$f(-1 + h) = \tfrac{1}{3}(-1 + h)^3 + 2(-1 + h)^2 + 3(-1 + h) + 1,$$

whence

$$f(-1 + h) = -\tfrac{1}{3} + h^2 + \tfrac{1}{3}h^3.$$

Now $f(-1) = -\tfrac{1}{3}$ so that we may also write this result in the form

$$f(-1 + h) - f(-1) = h^2\left(1 + \frac{h}{3}\right).$$

Clearly for $|h|$ small, the right-hand side is essentially positive, and so we have succeeded in showing that close to $\xi = -1$,

$$f(\xi + h) - f(\xi) > 0,$$

and so by Definition 5-4 (d) the stationary point $\xi = -1$, at which $f(\xi) = -\tfrac{1}{3}$, is seen to be a local minimum.

An exactly similar argument will establish that the stationary point $\xi = -3$, at which $f(\xi) = 1$, is a local maximum. These are only local extrema because it is possible to find values of $x$ for which $f > 1$ and $f < -\tfrac{1}{3}$.

Solution (b) This case is more complicated.
We have

$$\frac{df}{dx} = 2x^{2/3} + \frac{2(2x - 5)}{3x^{1/3}},$$

showing that the stationary points of $f$ are determined by the roots of the equation

$$2\xi^{2/3} + \frac{2(2\xi - 5)}{3\xi^{1/3}} = 0.$$

This has the single root $\xi = 1$ at which $f(1) = -3$, showing that the function has only one stationary point. To determine the nature of this point let us set $x = 1 + h$, where $|h|$ is small, and substitute into $f(x)$ to find

$$f(1 + h) = (2h - 3)(1 + h)^{2/3}.$$

Next we expand the factor $(1 + h)^{2/3}$ by the binomial theorem as far as terms involving $h^2$ to obtain

$$f(1 + h) = (2h - 3)\left(1 + \tfrac{2}{3}h - \tfrac{1}{9}h^2 + O(h^3)\right)$$

or,

$$f(1 + h) = -3 + \tfrac{5}{3}h^2 + O(h^3).$$

Using the fact that $f(1) = -3$ this becomes

$$f(1 + h) - f(1) = \tfrac{5}{3}h^2 + O(h^3),$$

showing that close to $\xi = 1$, $f(\xi + h) - f(\xi) > 0$. Hence by Definition 5-4 (d), the stationary point $\xi = 1$ is seen to correspond to a local minimum. Again, it is only a local minimum because for large negative $x$ we have $f < -3$.

We now observe that $f'$ is defined for all $x$ other than for $x = 0$, at which point $f(0) = 0$. The behaviour of the function in the vicinity of the origin needs examination since, as it is not differentiable there, Theorem 5-10 can provide no information about that point. Set $x = h$, where $h$ is small, and substitute in $f$ to get

$$f(h) = (2h - 5)h^{2/3}.$$

Now $f(0) = 0$, so that we may rewrite this as

$$f(h) - f(0) = (2h - 5)h^{2/3},$$

thereby showing that as the right-hand side is essentially negative for suitably small $h$, close to $\xi = 0$ we have $f(\xi + h) - f(\xi) < 0$. From Definition 5-4 (c) we now see that the origin is a local maximum, despite the fact that $f$ is not differentiable at that point. It is only a local maximum because for large positive $x$ we have $f > f(0)$. For reference purposes the function is shown in Fig. 5-8.

The method of classification of stationary points that we have just illustrated is always applicable, though it provides more information than is often required.
This is so because not only does it discriminate between maxima and minima, but it also provides the approximate behaviour of the function close to the point in question. We shall return to this problem later to provide much simpler criteria by which the nature of stationary points may be identified.

Fig. 5-8 $y = (2x - 5)x^{2/3}$.

5-3 (b) Rolle's theorem

One form of Rolle's theorem may be stated as follows.

Theorem 5-11 Let $f$ be a real valued function that is continuous on the closed interval $[a, b]$ and differentiable at all points of the open interval $(a, b)$. Then if $f(a) = f(b)$ there is at least one point $x = \xi$ interior to $(a, b)$ at which $f'(\xi) = 0$.

Proof We know from Corollary 5-9 that a continuous function $f(x)$ defined on the closed interval $[a, b]$ must attain its maximum value $M$ and its minimum value $m$ at points of $[a, b]$. Then if $m = M$ on $[a, b]$, the function $f(x) = \text{constant}$, and since the derivative of a constant is zero, the point $x = \xi$ at which $f'(\xi) = 0$ may be taken anywhere within the interval.

If $f(x)$ is not a constant function then $m \neq M$, and as $f(a) = f(b)$ it follows that at least one of the numbers $m$, $M$ must differ from the value $f(a)$. We shall suppose that $M \neq f(a)$. Then clearly the value $M$ must be attained at some point $x = \xi$ interior to $(a, b)$. As $f$ is assumed to be differentiable in $(a, b)$ it follows that Theorem 5-10 must be applicable, showing that $f'(\xi) = 0$. A similar argument applies if $m \neq f(a)$.

Geometrically this theorem simply asserts that the graph of any function satisfying the conditions of the theorem must have at least one point in the interval $[a, b]$ at which the tangent to the curve is horizontal.

If $f$ is not differentiable at even one interior point of $(a, b)$ then Rolle's theorem cannot be applied. Our counter-example in this instance is the simple function $f(x) = |x|$ with $-1 \le x \le 1$.
This function is everywhere continuous, and is differentiable at all points other than at the origin, but there is certainly no point $x = \xi$ on $[-1, 1]$ at which $f' = 0$. The graph of this function is shown in Fig. 5-9, with one of a function $g(x)$ not satisfying the conditions of the theorem but for which the result happens to be true.

Fig. 5-9 Counter-examples for Rolle's theorem: (a) Rolle's theorem does not apply, and there is no point $\xi$ for which $f'(\xi) = 0$; (b) $g'(\xi) = 0$, but Rolle's theorem does not apply.

5-3 (c) Mean value theorems for derivatives

Our most important application of Rolle's theorem will be in the proof of the mean value theorem for derivatives. In a first account of the subject it is difficult to indicate just how valuable and powerful this deceptively simple theorem really is as an analytical tool. However something of its utility will, perhaps, be appreciated after studying the remainder of this chapter.

First let us present an intuitive approach to the theorem. Consider Fig. 5-10 which represents a graph of a differentiable function $f(x)$ on the open interval $(a, b)$. Then as P and S are the points $(a, f(a))$ and $(b, f(b))$, the gradient $m$ of the line PS is

$$m = \frac{f(b) - f(a)}{b - a}.$$

Now we may identify points Q and R, with respective $x$-coordinates $\xi$ and $\eta$ interior to $(a, b)$, at which the tangent lines $l_1$ and $l_2$ to the graph are parallel to PS, and so must also have the same gradient $m$. Then because of the geometrical interpretation of the derivative $f'$ as the gradient of the tangent line, at either Q or R we may equate $m$ and $f'$. If we confine attention to point Q we have

$$\frac{f(b) - f(a)}{b - a} = f'(\xi),$$

where $a < \xi < b$. This is the form in which the mean value theorem for derivatives, also known as the law of the mean, is usually quoted.
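As a numerical illustration of this statement, the Python sketch below locates a point $\xi$ at which $f'(\xi)$ equals the gradient of the secant line. The test function $f(x) = x^3$ on $[0, 2]$, the helper name `mean_value_point`, and the use of bisection are all assumptions made for this illustration and do not come from the text; the sketch also assumes that $f'(x) - m$ changes sign between the end points.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    """Find xi in (a, b) with df(xi) = (f(b) - f(a)) / (b - a) by bisection.

    Hypothetical helper: it assumes df(x) - m changes sign on [a, b].
    """
    m = (f(b) - f(a)) / (b - a)      # gradient of the secant line PS

    def g(x):
        return df(x) - m             # zero exactly where the tangent is parallel to PS

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("df(x) - m does not change sign on [a, b]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid                 # root lies in the left half
        else:
            lo = mid                 # root lies in the right half
    return 0.5 * (lo + hi)


# Illustrative choice: f(x) = x**3 on [0, 2], so m = 4 and 3*xi**2 = 4.
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
xi = mean_value_point(f, df, 0.0, 2.0)
print(xi)
```

For this choice the secant gradient is 4, so the point found should agree with $\xi = 2/\sqrt{3}$, which lies strictly inside the interval.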
In geometrical terms the theorem asserts that there is always a point $(\xi, f(\xi))$ on the graph of the function, with $a < \xi < b$, at which the tangent to the curve is parallel to the secant line PS. The fact that the precise value of $\xi$ is not usually known is, generally speaking, unimportant in the application of this theorem. This is because it is often used with some limiting argument in which $b \to a$, so that $\xi \to a$ also.

Fig. 5-10 Illustration of the mean value theorem.

A formal statement of the theorem is as follows.

Theorem 5-12 (mean value theorem for derivatives) If $f(x)$ is a real valued function that is continuous in $[a, b]$ and differentiable in $(a, b)$, then there exists a point $\xi$ interior to $(a, b)$ such that

$$\frac{f(b) - f(a)}{b - a} = f'(\xi).$$

The existence of more than one point $\xi$ in $(a, b)$ at which this result is true is not precluded. This is so because it is only asserted that such a point exists, and not that there is necessarily only one such point. Such is the case, for example, in Fig. 5-10 since, as was remarked, $f'(\xi) = f'(\eta)$ with $\xi \neq \eta$, though both points $\xi$ and $\eta$ are interior to $(a, b)$.

Many people would regard the argument above as proof enough of the mean value theorem, but for the more critical reader we now offer the promised proof based on Rolle's theorem.

Proof As with the proofs of many mathematical theorems, our result is established more easily by a somewhat artificial approach than by a direct method. Here we shall utilize the intuitively obtained result above to suggest the form of a special function $F(x)$ to which Rolle's theorem can be applied, thereby yielding the desired result.

Specifically, since by implication the result depends on $f(x)$ and $x$, we shall try to find the simplest function $F(x)$ that depends on $f(x)$ and $x$, that is continuous in $[a, b]$ and is differentiable in $(a, b)$, and is such that $F(a) = F(b)$.
The value of $F(a)$ may be assigned arbitrarily and $F(x)$ will still satisfy Rolle's theorem, so to simplify the working slightly we shall assume that $F(a) = F(b) = 0$. We consider the obvious function

$$F(x) = A + Bx + f(x),$$

which clearly satisfies the continuity and differentiability conditions of Rolle's theorem. The constants $A$ and $B$ must be chosen in order that $F(a) = F(b) = 0$. Thus

$$0 = A + Ba + f(a) \quad \text{and} \quad 0 = A + Bb + f(b),$$

from which it follows that

$$A = \frac{a f(b) - b f(a)}{b - a}, \qquad B = -\frac{f(b) - f(a)}{b - a}.$$

Hence $F(x)$ has the form

$$F(x) = f(x) - f(a) + \left(\frac{f(b) - f(a)}{b - a}\right)(a - x).$$

Thus we have succeeded in finding a function $F(x)$ with the desired properties which satisfies Rolle's theorem. Differentiating $F(x)$ we obtain

$$F'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}.$$

Now by Rolle's theorem there exists a point $\xi$, with $a < \xi < b$, such that $F'(\xi) = 0$ and so we have our desired result

$$\frac{f(b) - f(a)}{b - a} = f'(\xi).$$

Since we may write $\xi = a + \theta(b - a)$, where $0 < \theta < 1$, this result is sometimes expressed in the following form attributable to Cauchy,

$$f(b) - f(a) = (b - a)f'[a + \theta(b - a)]$$

with $0 < \theta < 1$.

Corollary 5-12 If $g'(x) = h'(x)$ at all points of $[a, b]$, then $g(x) = h(x) + \text{constant}$ in $[a, b]$.

Proof Set $f = g - h$ in Theorem 5-12 applied to the interval $[a, x]$. Then $g(x) - h(x) = g(a) - h(a) = \text{constant}$ and the result follows.

By applying the same arguments to a suitably constructed function $\varphi(x)$, analogous to $F(x)$, it is a simple matter to prove the following extension of the mean value theorem due to Cauchy. (See Problem 5-37.)

Theorem 5-13 (Cauchy extended mean value theorem) If $f(x)$ and $g(x)$ are real valued functions that are continuous in $[a, b]$ and differentiable in $(a, b)$ and $g'(x) \neq 0$ in $(a, b)$, then there exists a point $\xi$ interior to $(a, b)$ such that

$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(\xi)}{g'(\xi)}.$$

5-3 (d) Indeterminate forms: L'Hospital's rule

Limits such as $\lim_{x \to 0} (\sin ax)/x$ which apparently tend to the form $0/0$ have already been encountered and given meaning in special cases.
A closely related problem is that of giving meaning to the limit of a quotient which apparently tends to $\infty/\infty$. These limit problems are both called indeterminate forms. One of the most obvious applications of the extended mean value theorem is to resolve the value of the limit in either of these situations, and we now prove the simplest statement of a useful result generally known as L'Hospital's rule.

Theorem 5-14 (first form of L'Hospital's rule) If $f(x)$ and $g(x)$ are real valued differentiable functions at $x = x_0$ and,
(a) $f(x_0) = g(x_0) = 0$,
(b) $\lim_{x \to x_0} \dfrac{f'(x)}{g'(x)} = \lambda$, where $\lambda$ is either a real number or infinity,
then

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)} = \lambda.$$

Proof Apply the extended mean value theorem to the functions $f(x)$ and $g(x)$ defined on the interval $[x, x_0]$ and use condition (a) to obtain

$$\frac{f(x)}{g(x)} = \frac{f'(\xi)}{g'(\xi)},$$

where $x < \xi < x_0$. Now $x \to x_0$ implies that $\xi \to x_0$, so that by condition (b) we have the desired result

$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{\xi \to x_0} \frac{f'(\xi)}{g'(\xi)} = \lambda.$$

The fact that the variable $\xi$ appears in the second limit in place of the $x$ stated in the theorem is unimportant. Its function is simply that of a variable and the symbol used to denote it is immaterial. In general, when the symbol used to denote a variable is unimportant because it only appears in some intermediate calculation, the details of which do not concern us, we shall call it a dummy variable.

A useful extension of L'Hospital's rule is contained in the following corollary which allows examination of limits which tend to the form $\infty/\infty$.

Corollary 5-14 If $\varphi(x)$ and $\psi(x)$ are real valued differentiable functions at $x = x_0$ and,
(a) $\varphi(x) \to \pm\infty$ and $\psi(x) \to \pm\infty$ as $x \to x_0$,
(b) $\lim_{x \to x_0} \dfrac{\varphi'(x)}{\psi'(x)} = \lambda$, where $\lambda$ is either a real number or infinity,
then

$$\lim_{x \to x_0} \frac{\varphi(x)}{\psi(x)} = \lim_{x \to x_0} \frac{\varphi'(x)}{\psi'(x)} = \lambda.$$

Proof Apply the extended mean value theorem to the quotient $\varphi(x)/\psi(x)$ in the open interval $(x, x_1)$ with $x_0 < x < x_1$, and write the result in the form

$$\frac{\varphi(x)}{\psi(x)} = \left[\frac{1 - \dfrac{\psi(x_1)}{\psi(x)}}{1 - \dfrac{\varphi(x_1)}{\varphi(x)}}\right] \frac{\varphi'(\xi)}{\psi'(\xi)},$$

where $x < \xi < x_1$. Then, taking $x_1$ fixed and arbitrarily close to $x_0$ so that $\xi \to x_0$, allow $x \to x_0$. The first factor on the right-hand side then approaches arbitrarily close to unity, thereby giving rise to the stated result. A modification of this argument shows that the result is also true if $x_0 \to \infty$.

Example 5-9 Determine the value of the following indeterminate forms using L'Hospital's rule and Corollary 5-14:
(a) $\lim_{x \to 0} \dfrac{\sin ax}{x}$;
(b) $\lim_{x \to 1} \dfrac{x^3 + 3x^2 - 2x - 2}{2x^2 - x - 1}$;
(c) $\lim_{x \to 0} \dfrac{\sin 3x}{x^3}$;
(d) $\lim_{x \to \frac{1}{2}\pi} \dfrac{\tan 3x}{\tan x}$;
(e) $\lim_{x \to 0} \dfrac{(a/x)}{\cot bx}$.

Solution (a) This is of the form $\lim f/g \to 0/0$ with $f(x) = \sin ax$ and $g(x) = x$. As $f'(x) = a \cos ax$ and $g'(x) = 1$ it follows that

$$\lim_{x \to 0} \frac{\sin ax}{x} = \lim_{x \to 0} \frac{a \cos ax}{1} = a.$$

This confirms the limit that was obtained by a different method in Chapter 3.

(b) This is also of the form $\lim f/g \to 0/0$ but this time with $f(x) = x^3 + 3x^2 - 2x - 2$ and $g(x) = 2x^2 - x - 1$. It follows that $f'(x) = 3x^2 + 6x - 2$ and $g'(x) = 4x - 1$ so that

$$\lim_{x \to 1} \frac{x^3 + 3x^2 - 2x - 2}{2x^2 - x - 1} = \lim_{x \to 1} \frac{3x^2 + 6x - 2}{4x - 1} = \frac{7}{3}.$$

(c) This is again of the form $\lim f/g \to 0/0$ with $f(x) = \sin 3x$ and $g(x) = x^3$. Hence $f'(x) = 3 \cos 3x$ and $g'(x) = 3x^2$ so that

$$\lim_{x \to 0} \frac{\sin 3x}{x^3} = \lim_{x \to 0} \frac{\cos 3x}{x^2} \to +\infty.$$

(d) This is of the form $\lim f/g \to \infty/\infty$ with $f(x) = \tan 3x$ and $g(x) = \tan x$. Hence $f'(x) = 3 \sec^2 3x$ and $g'(x) = \sec^2 x$ and by Corollary 5-14,

$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = \lim_{x \to \frac{1}{2}\pi} \frac{3 \sec^2 3x}{\sec^2 x} = 3 \lim_{x \to \frac{1}{2}\pi} \frac{\cos^2 x}{\cos^2 3x}.$$

This is again an indeterminate form, but now of the type $0/0$. Applying Theorem 5-14 we have

$$3 \lim_{x \to \frac{1}{2}\pi} \frac{\cos^2 x}{\cos^2 3x} = 3 \lim_{x \to \frac{1}{2}\pi} \frac{2 \sin x \cos x}{6 \sin 3x \cos 3x} = \lim_{x \to \frac{1}{2}\pi} \left(\frac{\sin x}{\sin 3x}\right) \cdot \lim_{x \to \frac{1}{2}\pi} \left(\frac{\cos x}{\cos 3x}\right)$$

and hence, since $\sin x / \sin 3x \to -1$ as $x \to \tfrac{1}{2}\pi$,

$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = -\lim_{x \to \frac{1}{2}\pi} \frac{\cos x}{\cos 3x}.$$

This last result is yet again an indeterminate form of the type $0/0$ so that a further application of Theorem 5-14 finally gives

$$\lim_{x \to \frac{1}{2}\pi} \frac{\tan 3x}{\tan x} = -\lim_{x \to \frac{1}{2}\pi} \frac{\sin x}{3 \sin 3x} = \frac{1}{3}.$$

(e) This is of the form $\lim f/g \to \infty/\infty$ but it is easily seen that an application of Corollary 5-14 will not simplify the limit to be evaluated. Instead, we rewrite the limit in the form

$$\lim_{x \to 0} \frac{(a/x)}{\cot bx} = \lim_{x \to 0} \frac{a \tan bx}{x},$$

when it is seen that the alternative form is of the type $\lim f/g \to 0/0$ with $f(x) = a \tan bx$ and $g(x) = x$. Now $f'(x) = ab \sec^2 bx$ and $g'(x) = 1$ so that by Theorem 5-14,

$$\lim_{x \to 0} \frac{(a/x)}{\cot bx} = \lim_{x \to 0} \frac{ab \sec^2 bx}{1} = ab.$$

5-3 (e) Identification of extrema

We return to the topic of extrema and, in particular, to the identification of functional behaviour at stationary values by means of the mean value theorem.

Suppose that a real valued function $f(x)$ is differentiable in the interval $(a, b)$ and has a maximum at an interior point $x_0$ of $(a, b)$. Then if $h$ is assumed to be positive and we consider the interval $[x_0 - h, x_0]$ to the left of $x_0$, by the mean value theorem

$$\frac{f(x_0) - f(x_0 - h)}{h} = f'(\xi),$$

where $x_0 - h < \xi < x_0$.

Now by supposition $h > 0$ and as $x_0$ is a maximum, the numerator of this expression will also be positive, showing that $f'(\xi) > 0$. Hence by allowing $h$ to tend to zero, it follows that $\xi \to x_0$ and we have shown that to the immediate left of the maximum we must have $f' > 0$. To the right of the maximum, and in the interval $[x_0, x_0 + h]$, the same argument shows that

$$\frac{f(x_0 + h) - f(x_0)}{h} = f'(\eta),$$

where $x_0 < \eta < x_0 + h$. This numerator is negative so that to the immediate right of the maximum we must have $f' < 0$.
Similar arguments applied to a minimum and a point of inflection with a horizontal tangent yield the following useful theorem, illustrated in Fig. 5-11.

Theorem 5-15 (identification of extrema using first derivative) If $f(x)$ is a real valued differentiable function in the neighbourhood of a point $x_0$ at which $f'(x_0) = 0$ then:
(a) the function has a maximum at $x_0$ if $f'(x) > 0$ to the left of $x_0$ and $f'(x) < 0$ to the right of $x_0$;
(b) the function has a minimum at $x_0$ if $f'(x) < 0$ to the left of $x_0$ and $f'(x) > 0$ to the right of $x_0$;
(c) the function has a point of inflection with zero gradient at $x_0$ if $f'(x)$ has the same sign to the left and right of $x_0$.

Fig. 5-11 Stationary values of $y = f(x)$: (a) local maximum; (b) local minimum; (c) point of inflection with zero gradient.

In many books these results are regarded as intuitively obvious deductions from the geometrical interpretation of a derivative in conjunction with the behaviour of the graph of the function. However we have discussed them formally here as an illustration of an important consequence of the mean value theorem.

Example 5-10 We again consider the functions of Example 5-8.

Case (a) $f(x) = \tfrac{1}{3}x^3 + 2x^2 + 3x + 1$ with stationary points $x = \xi$ at $\xi = -1$ and $\xi = -3$. As $f'(x) = x^2 + 4x + 3$ it follows that to the immediate left of $\xi = -1$ we have $f' < 0$, whilst to the immediate right $f' > 0$, showing that $\xi = -1$ corresponds to a minimum. A similar argument shows that $\xi = -3$ corresponds to a maximum.

Case (b) $f(x) = (2x - 5)x^{2/3}$ with the one stationary point $x = \xi$ at $\xi = 1$. As $f'(x) = 2x^{2/3} + 2(2x - 5)/3x^{1/3}$ it follows that $f' < 0$ to the immediate left of $\xi = 1$ and $f' > 0$ to the immediate right. Hence $\xi = 1$ corresponds to a minimum. As Theorem 5-15 stands, since $f$ is not differentiable at the origin, the maximum that occurs there must be identified as in Example 5-8.
However a trivial modification of the proof would show that results (a) and (b) of the theorem are still valid if $f$ is not differentiable at $x_0$.

5-3 (f) Differentials

In using the notation $dy/dx$ to represent the derivative of the dependent variable $y$ with respect to $x$ we have thus far been careful to emphasize that $dy/dx$ is simply a number defined by a limit. Although suggestive of increments, $dy$ and $dx$ taken separately have as yet no individual meaning.

In many applications, particularly in differential equations which we encounter later, it is convenient to work with actual quantities $dy$ and $dx$ which we will call differentials. However differentials must obviously be defined in a manner consistent with the notation $dy/dx$ when it is used to denote the derivative with respect to $x$ of the function $y$ defined by

$$y = f(x). \qquad (5\text{-}8)$$

We achieve this by defining $dy$, the first-order differential of $y$, by

$$dy = f'(x)\,\Delta x, \qquad (5\text{-}9)$$

where $\Delta x$ is an increment in $x$ of arbitrary size. However, if, for the moment, we regard the independent variable $x$ as a function of $x$ we can write $x = g(x)$ with $g(x) = x$. Then by the above argument $dx$, the first-order differential of $x$, is defined by

$$dx = 1 \cdot \Delta x, \qquad (5\text{-}10)$$

showing that we may with meaning write Eqn (5-9) in the form

$$dy = f'(x)\,dx. \qquad (5\text{-}11)$$

When needed, the actual increment in $y$ consequent upon an increment $\Delta x$ in $x$ will be denoted by $\Delta y$. In general the differential $dy$ and the increment $\Delta y$ are distinct quantities and the interrelationship between them is indicated in Fig. 5-12.

Fig. 5-12 Differentials $dx$ and $\Delta y$.

In more advanced treatments the use of differentials is strictly avoided on account of logical difficulties encountered with their definition. However they are so useful that we shall ignore these objections and use them freely whenever necessary.
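The distinction between the differential $dy$ of Eqn (5-9) and the actual increment $\Delta y$ can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: the function $y = x^2$, the base point $x = 1$, and the helper names `differential` and `increment` are hypothetical choices, not taken from the text.

```python
def differential(df, x, dx):
    """First-order differential dy = f'(x) * dx, as in Eqn (5-9)."""
    return df(x) * dx


def increment(f, x, dx):
    """Actual increment Delta y = f(x + dx) - f(x)."""
    return f(x + dx) - f(x)


# Illustrative choice (an assumption): y = x**2 at the point x = 1.
f = lambda x: x ** 2
df = lambda x: 2 * x

rows = []
for dx in (0.1, 0.01, 0.001):
    dy = differential(df, 1.0, dx)
    delta_y = increment(f, 1.0, dx)
    rows.append((dx, dy, delta_y, delta_y - dy))
    print(f"dx={dx:<6} dy={dy:.6f} Delta y={delta_y:.6f} gap={delta_y - dy:.2e}")
```

For this particular function the gap $\Delta y - dy$ works out algebraically to $(\Delta x)^2$, so it shrinks much faster than $\Delta x$ itself, although the two quantities remain distinct for any non-zero increment.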
It is an immediate consequence of this that if $y = k_1 f(x) + k_2 g(x)$ then by Theorem 5-4,

$$dy = k_1 f'(x)\,dx + k_2 g'(x)\,dx$$

or, equivalently, in symbolic notation

$$d(k_1 f + k_2 g) = k_1\,df + k_2\,dg. \qquad (5\text{-}12)$$

If we have $y = f(x)g(x)$ then by Theorem 5-5,

$$dy = g(x)f'(x)\,dx + f(x)g'(x)\,dx$$

or, equivalently, in symbolic notation

$$d(fg) = g\,df + f\,dg. \qquad (5\text{-}13)$$

Finally, if $y = f(x)/g(x)$ then by Theorem 5-8,

$$dy = \frac{g(x)f'(x)\,dx - f(x)g'(x)\,dx}{g^2(x)}$$

or, equivalently, in symbolic notation

$$d\left(\frac{f}{g}\right) = \frac{g\,df - f\,dg}{g^2}. \qquad (5\text{-}14)$$

Example 5-11 If $f(x) = \sin(x^2 + 4)$ and $g(x) = x^3$ find the differentials:
(a) $d(3f + g)$; (b) $d(fg)$; (c) $d\left(\dfrac{f}{g}\right)$.

Solution
(a) $d(3f + g) = d[3\sin(x^2 + 4) + x^3] = 3\cos(x^2 + 4)\,d(x^2 + 4) + 3x^2\,dx = 6x\cos(x^2 + 4)\,dx + 3x^2\,dx$.

(b) $d(fg) = d[x^3 \sin(x^2 + 4)] = 3x^2 \sin(x^2 + 4)\,dx + x^3 \cos(x^2 + 4)\,d(x^2 + 4) = 3x^2 \sin(x^2 + 4)\,dx + 2x^4 \cos(x^2 + 4)\,dx$.

(c) $$d\left(\frac{f}{g}\right) = d\left[\frac{\sin(x^2 + 4)}{x^3}\right] = \frac{x^3 \cos(x^2 + 4)\,d(x^2 + 4) - 3x^2 \sin(x^2 + 4)\,dx}{x^6} = \frac{2x^2 \cos(x^2 + 4)\,dx - 3\sin(x^2 + 4)\,dx}{x^4}.$$

For small values of $dx$, the differential $dy$ is obviously a reasonable approximation to the actual increment $\Delta y$. This simple observation is often utilized to relate small changes in dependent and independent variables as the next example shows.

Example 5-12 The pressure $p$ of a polytropic gas is related to the density $\rho$ by the expression

$$p = A\rho^{\gamma},$$

where $A$ is a constant. Deduce the relationship connecting the differentials $dp$ and $d\rho$. Given that $\gamma = 3/2$ and $\rho = 4$, and taking $dp$ as an approximation to the actual pressure change $\Delta p$, compute the approximate new pressure if $\rho$ is increased by 0.1. Compare the approximate and exact results.

Solution In this case $p = f(\rho)$ with $f(\rho) = A\rho^{\gamma}$. Hence $f'(\rho) = \gamma A \rho^{\gamma - 1}$ and thus the desired differential relation is

$$dp = \gamma A \rho^{\gamma - 1}\,d\rho.$$

When $\gamma = 3/2$ and $\rho = 4$ it follows from the stated pressure-density law that the initial pressure $p_0$ is $p_0 = 4^{3/2}A = 8A$.
Using the differential relation to compute the approximate pressure increase represented by the differential $dp$ we find

$$dp = (3/2) \cdot A \cdot 4^{1/2} \cdot (0.1) = 0.3A.$$

Hence the approximate new pressure is $p_0 + dp = 8.3A$.

The exact new pressure $p_0 + \Delta p$ may be computed from the pressure-density law by setting $\rho = 4.1$ to obtain

$$p_0 + \Delta p = (4.1)^{3/2}A = 8.302A.$$

This shows that in this case the differential relation gives a good approximation to the pressure increase.

5-4 Higher derivatives: applications

We have seen how differentiation applied to a suitable function $f(x)$ yields as a result another function $f'(x)$, the derivative of $f(x)$ with respect to $x$. If the function $f'(x)$ is itself differentiable then a repetition of differentiation will result in a further function that we shall denote by $f''(x)$ and will call the second derivative of $f(x)$ with respect to $x$.

We may usefully employ the dynamical problem that served to introduce the notion of a derivative to give meaning to the notion of a second derivative, for if $f'(x)$ represents a velocity, then $f''(x)$ represents an acceleration.

If the function $f''(x)$ is itself differentiable then it is customary to denote the third derivative of $f(x)$ by $f'''(x)$ after which, if necessary, further derivatives are conventionally denoted by the use of superscript roman numerals. Hence the sixth derivative of a suitably differentiable function $f(x)$ would be written $f^{\mathrm{vi}}(x)$.

A better notation than this is needed for general purposes and the two most often used because of their versatility are

$$\frac{d^n y}{dx^n} \quad \text{or} \quad D^n y.$$

These both represent the $n$th derivative with respect to $x$ of $y = f(x)$ and for their determination require the successive application of differentiation $n$ times. The number $n$ is the order of the derivative and the symbol $D$ symbolizes the operation of differentiation.
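The successive application of the operator $D$ can be sketched in Python for the special case of polynomials. The representation of a polynomial as a list of coefficients and the helper names `D` and `D_n` are assumptions made purely for this illustration; the sketch is not a general differentiation routine.

```python
def D(coeffs):
    """Differentiate once; coeffs[k] is the coefficient of x**k (an assumed encoding)."""
    # The term c * x**k becomes k * c * x**(k - 1); the constant term drops out.
    return [k * c for k, c in enumerate(coeffs)][1:]


def D_n(coeffs, n):
    """Apply D successively n times; n = 0 returns the polynomial unchanged."""
    result = list(coeffs)
    for _ in range(n):
        result = D(result)
    return result


# Illustrative choice: y = x**4, stored as [0, 0, 0, 0, 1].
y = [0, 0, 0, 0, 1]
for n in range(6):
    print(n, D_n(y, n))
```

With this encoding an empty list stands for the zero polynomial, so for $y = x^4$ the fourth derivative is the constant 24 and every later derivative vanishes.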
Computationally the definition of the $n$th derivative of $y$ with respect to $x$ is equivalent to using either of these two equivalent algorithms

$$\frac{d}{dx}\left(\frac{d^{n-1}y}{dx^{n-1}}\right) = \frac{d^n y}{dx^n} \quad \text{or} \quad D[D^{n-1}y] = D^n y. \qquad (5\text{-}15)$$

These expressions are, of course, only meaningful when $n$ is an integer and we shall agree to the convention $D^0 y = y$.

Geometrically, the function $d^n y/dx^n$ bears to the graph of $d^{n-1}y/dx^{n-1}$ the same relationship as does the function $dy/dx$ to the graph of $y$. Namely, $d^n y/dx^n$ at $x = x_0$ is the gradient of the graph of $d^{n-1}y/dx^{n-1}$ as a function of $x$ at the same point $x = x_0$.

Example 5-13 Determine $dy/dx$, $d^2y/dx^2$, and $d^3y/dx^3$ given that $y = f(x)$ with:
(a) $f(x) = \cos mx$;
(b) $f(x) = \tan x$;
(c) $f(x) = 1/(1 + x)$.
If possible make deductions about the $n$th derivative.

Solution
(a)
$$\frac{dy}{dx} = f'(x) = \frac{d}{dx}(\cos mx) = -m \sin mx,$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}[-m \sin mx] = -m^2 \cos mx,$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d}{dx}[-m^2 \cos mx] = m^3 \sin mx.$$

An inductive argument easily shows that the $n$th derivative is $(d^n/dx^n)(\cos mx) = m^n \cos[mx + (n\pi/2)]$. In respect of the function $y = \cos mx$, it is of importance to notice that the simple equation

$$\frac{d^2y}{dx^2} + m^2 y = 0$$

connects the function and its second derivative. Because this equation involves derivatives it is a differential equation. Such equations are very important in both mathematics and the mathematical sciences; the last three chapters of this book provide an introductory study of them.

(b)
$$\frac{dy}{dx} = f'(x) = \frac{d}{dx}(\tan x) = \sec^2 x,$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}(\sec^2 x) = 2\sec^2 x \tan x,$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d}{dx}(2\sec^2 x \tan x) = 2\sec^2 x\,(2\tan^2 x + \sec^2 x).$$

There is no simple rule by which $(d^n/dx^n)(\tan x)$ may be computed.

(c)
$$\frac{dy}{dx} = f'(x) = \frac{d}{dx}\left(\frac{1}{1 + x}\right) = \frac{-1}{(1 + x)^2},$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}\left[\frac{-1}{(1 + x)^2}\right] = \frac{2}{(1 + x)^3},$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}\left(\frac{d^2y}{dx^2}\right) = \frac{d}{dx}\left[\frac{2}{(1 + x)^3}\right] = \frac{-3!}{(1 + x)^4}.$$

It follows by induction that

$$\frac{d^n}{dx^n}\left(\frac{1}{1 + x}\right) = \frac{(-1)^n\, n!}{(1 + x)^{n+1}}.$$

In general, functions are not capable of differentiation an indefinite number of times, and at some stage they usually become non-differentiable. A simple example of a function whose sequence of non-trivial derivatives terminates, though for a different reason from the above, is $x^n$, with $n$ a positive integer. The $n$th derivative of $x^n$ is the constant number $n!$ so that the $(n + 1)$th and all subsequent derivatives are identically zero.

5-4 (a) Leibnitz's theorem

This useful theorem is a consequence of Theorem 5-5 and facilitates the computation of high-order derivatives of the product $f(x)g(x)$ of the two functions $f(x)$ and $g(x)$, in terms of the derivatives of the individual functions $f(x)$ and $g(x)$ themselves.

The result is, perhaps, best expressed in terms of the symbolic differentiation operator $D$, and for our starting point we now re-express the result of Theorem 5-5 in terms of the operator $D$:

$$D(fg) = f\,Dg + g\,Df.$$

Assuming functions $f(x)$ and $g(x)$ are suitably differentiable, a further application of the operator $D$ together with Theorem 5-5 yields

$$D^2(fg) = D(f\,Dg + g\,Df) = Df \cdot Dg + f\,D^2g + Dg \cdot Df + g\,D^2f.$$

However

$$Df \cdot Dg = \frac{df}{dx}\frac{dg}{dx} = Dg \cdot Df$$

so that

$$D^2(fg) = f\,D^2g + 2\,Df \cdot Dg + g\,D^2f. \qquad (5\text{-}16)$$

A repetition of the same argument shows that

$$D^3(fg) = f\,D^3g + 3\,Df \cdot D^2g + 3\,D^2f \cdot Dg + g\,D^3f. \qquad (5\text{-}17)$$

The coefficients involved in Eqns (5-16) and (5-17) are seen to belong to the general pattern of binomial coefficients in the expansion of $(a + b)^n$, namely to the rows of numbers

$(n = 0)$: $\binom{0}{0}$
$(n = 1)$: $\binom{1}{0}\ \binom{1}{1}$
$(n = 2)$: $\binom{2}{0}\ \binom{2}{1}\ \binom{2}{2}$
$(n = 3)$: $\binom{3}{0}\ \binom{3}{1}\ \binom{3}{2}\ \binom{3}{3}$
or, equivalently, to the rows

$(n = 0)$: 1
$(n = 1)$: 1 1
$(n = 2)$: 1 2 1
$(n = 3)$: 1 3 3 1

This suggests that in evaluating $D^n(fg)$, the coefficients arising should belong to the $(n + 1)$th row of either of these arrays, which are Pascal triangles. That this is so can be proved fairly easily, using an inductive argument similar to that used to prove the binomial theorem. We shall not give the details, preferring simply to state the theorem.

Theorem 5-16 (Leibnitz's theorem) If $f(x)$ and $g(x)$ are $n$ times differentiable real valued functions in the interval $(a, b)$, then

$$D^n(fg) = \sum_{k=0}^{n} \binom{n}{k} D^{n-k}f \cdot D^k g.$$

The value and power of this is best shown by an application.

Example 5-14 Use Leibnitz's theorem to evaluate $(d^3/dx^3)(x^6 \sin x)$.

Solution Setting $n = 3$ in the general result gives

$$D^3(fg) = g\,D^3f + 3\,D^2f \cdot Dg + 3\,Df \cdot D^2g + f\,D^3g.$$

This is, of course, result (5-17) differently expressed. Now we make the identifications $f(x) = x^6$ and $g(x) = \sin x$ when it follows that $Df = 6x^5$, $D^2f = 30x^4$, $D^3f = 120x^3$, and $Dg = \cos x$, $D^2g = -\sin x$, $D^3g = -\cos x$. Hence substitution into the above result gives

$$D^3(x^6 \sin x) = 120x^3 \sin x + 90x^4 \cos x - 18x^5 \sin x - x^6 \cos x.$$

5-4 (b) Identification of extrema by second derivatives

An important application of the second derivative of a function $f(x)$ is to the identification of the nature of its extrema. Let us suppose that $f(x)$ is twice differentiable and that $f'(x_0) = 0$ and $f''(x_0) = L < 0$. Then from Definition 5-2 and the notion of a second derivative we must have that

$$f''(x_0) = \lim_{x \to x_0} \frac{f'(x) - f'(x_0)}{x - x_0} = L < 0.$$

By supposition $f'(x_0) = 0$, so that

$$f''(x_0) = \lim_{x \to x_0} \frac{f'(x)}{x - x_0} = L < 0.$$

This limit must be independent of the manner in which $x$ approaches $x_0$ so that we must consider separately the cases that $x$ lies to the left or to the right of $x_0$. If $x$ lies to the left of $x_0$ then $x - x_0 < 0$.
Consequently, as the value $L$ of the limit is negative, the expression defining $f''(x_0)$ implies that to the immediate left of $x_0$ it must be true that $f'(x) > 0$. If $x$ lies to the right of $x_0$ then $x - x_0 > 0$. Consequently, as the value $L$ of the limit is negative, the expression defining $f''(x_0)$ implies that to the immediate right of $x_0$ it must be true that $f'(x) < 0$. These results, in conjunction with Theorem 5-15 (a), prove that at a stationary value $x_0$ for which $f''(x_0) < 0$, the function $f(x)$ attains a maximum value. An exactly similar argument proves that at a stationary value $x_0$ for which $f''(x_0) > 0$, the function $f(x)$ attains a minimum value.

To complete the argument, consider the situation in which $f''(x_0) = 0$. It might be conjectured that this corresponds to a point of inflection; and to establish the correctness of our intuition let us appeal to the geometrical interpretation of a derivative as a gradient. Suppose that $x_0$ corresponds to a point of inflection with zero gradient. Then as $x$ increases through the value $x_0$, either

(a) $f'(x)$ is initially positive and decreases to a minimum value $f'(x_0) = 0$, thereafter increasing again (cf. Fig. 5-11 (c));

or,

(b) $f'(x)$ is initially negative and increases to a maximum value $f'(x_0) = 0$, thereafter decreasing again.

In each case $x_0$ is a stationary value of the first derivative $f'(x)$, so that by an application of Theorem 5-10 to the function $f'(x)$ we find that $f''(x_0) = 0$ at a point of inflection. We have thus proved the following theorem.

Theorem 5-17 (identification of extrema using second derivatives) Let $f(x)$ be a real valued twice differentiable function in $(a, b)$ with a stationary point $x_0$ in $(a, b)$, so that $f'(x_0) = 0$.
Then, if

(a) $f''(x_0) < 0$ the function $f(x)$ has a maximum at $x_0$,

(b) $f''(x_0) > 0$ the function $f(x)$ has a minimum at $x_0$,

(c) $f''(x_0) = 0$ the function $f(x)$ has a point of inflection at $x_0$ with zero gradient provided that the sign of $f'(x)$ is the same to the immediate left and right of $x_0$.

The proof of this theorem shows clearly what was asserted earlier; namely that a point of inflection on the graph of a function separates a region of convexity from a region of concavity. There is, of course, no necessity that this point should have associated with it a zero gradient. Following this argument to its logical conclusion we see that the proof of (c) above need only involve the sign of $f'(x)$ to the left and right of $x_0$ when $f'(x_0) = 0$, for then such arguments are needed to distinguish between an extremum and a point of inflection. If $f'(x_0) \neq 0$ such problems do not arise and it is sufficient to look for those values $\xi$ for which $f''(\xi) = 0$. We have thus proved the following general result.

Theorem 5-18 (location of points of inflection) If $f(x)$ is a real valued twice differentiable function then its points of inflection, if any, occur at the numbers $\xi$ for which $f''(\xi) = 0$ provided that $f'(\xi) \neq 0$. If however this is not so, and $f'(\xi) = 0$, then $\xi$ corresponds to a point of inflection provided that the sign of $f'(x)$ is the same to the immediate left and right of $\xi$.

It is left to the reader as an exercise to prove that when $f'(x_0) = f''(x_0) = 0$, then provided $f'''(x_0)$ exists, our condition on $f'(x)$ may be replaced by the requirement $f'''(x_0) \neq 0$. The proof is essentially similar to that given for Theorem 5-17, though this time the starting point is the definition of $f'''(x_0)$ expressed as a limit. We give this result as a corollary.

Corollary 5-18 If $f(x)$ is a real valued thrice differentiable function and $f'(\xi) = f''(\xi) = 0$, then $f(x)$ has a point of inflection at $x = \xi$ if $f'''(\xi) \neq 0$.
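The tests of Theorem 5-17 and Corollary 5-18 can be tried numerically. The following is only a sketch under stated assumptions: the derivatives are estimated by central differences, and the step `h`, the tolerance `tol`, and the function names are illustrative choices that do not come from the text. A floating-point comparison with zero is a heuristic, not a proof.

```python
def derivative(f, x, order=1, h=1e-3):
    """Central-difference estimate of the 1st, 2nd or 3rd derivative of f at x."""
    if order == 1:
        return (f(x + h) - f(x - h)) / (2 * h)
    if order == 2:
        return (f(x + h) - 2 * f(x) + f(x - h)) / h**2
    if order == 3:
        return (f(x + 2 * h) - 2 * f(x + h) + 2 * f(x - h) - f(x - 2 * h)) / (2 * h**3)
    raise ValueError("order must be 1, 2 or 3")


def classify_stationary_point(f, x0, tol=1e-4):
    """Apply the second derivative test, falling back on the third derivative.

    tol is an assumed threshold below which an estimate is treated as zero.
    """
    if abs(derivative(f, x0, 1)) > tol:
        return "not stationary"
    d2 = derivative(f, x0, 2)
    if d2 < -tol:
        return "maximum"            # Theorem 5-17 (a)
    if d2 > tol:
        return "minimum"            # Theorem 5-17 (b)
    if abs(derivative(f, x0, 3)) > tol:
        return "point of inflection"  # Corollary 5-18
    return "undecided"              # sign of f'(x) must be examined directly


if __name__ == "__main__":
    cubic = lambda x: x**3 - 12 * x + 1
    print(classify_stationary_point(cubic, 2.0))
    print(classify_stationary_point(cubic, -2.0))
    print(classify_stationary_point(lambda x: (x - 1) ** 3, 1.0))
    print(classify_stationary_point(lambda x: x**4, 0.0))
```

The final case shows why the "undecided" branch is needed: for $x^4$ the first three derivatives all vanish at the origin, so neither Theorem 5-17 (a), (b) nor Corollary 5-18 applies, and the sign of $f'(x)$ on either side has to be inspected.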
Example 5-15 Locate and identify the stationary values of the following functions. Find any points of inflection they may have, together with the gradient of the tangent line at such points:

(a) $f(x) = x^3 - 12x + 1$ in $[-10, 10]$;

(b) $f(x) = \tan x$ in $[-\tfrac{1}{4}\pi, \tfrac{1}{4}\pi]$;

(c) $f(x) = (x - 1)^3$ in $(-\infty, \infty)$.

Solution (a) The stationary values are those numbers $\xi$ for which $f'(\xi) = 0$. Hence as $f'(x) = 3x^2 - 12$, the stationary values are determined by the equation $3\xi^2 - 12 = 0$. This has roots $\xi = 2$, $\xi = -2$ which both lie in $[-10, 10]$ and are the desired stationary values. As $f''(x) = 6x$, it follows that $f''(2) = 12 > 0$ and $f''(-2) = -12 < 0$. Hence by Theorem 5-17, the point $\xi = 2$ is a minimum and the point $\xi = -2$ is a maximum. Since the function has no other stationary value there can be no point of inflection at which the tangent line has zero gradient. However $f''(x) = 6x$ vanishes when $x = 0$, so that by Theorem 5-18 we see that $x = 0$ must correspond to a point of inflection. The gradient at $x = 0$ is $f'(0) = -12$, which is the gradient of the desired tangent line to the graph at the point of inflection.

(b) Here we have $f'(x) = \sec^2 x$ and clearly, since $\sec^2 x = 1 + \tan^2 x$, it follows that $f'(x) \neq 0$ in $[-\tfrac{1}{4}\pi, \tfrac{1}{4}\pi]$. The function $f(x) = \tan x$ thus has no stationary values in $[-\tfrac{1}{4}\pi, \tfrac{1}{4}\pi]$, though it assumes its greatest value at $\tfrac{1}{4}\pi$ and its least value at $-\tfrac{1}{4}\pi$. We have $f''(x) = 2\sec^2 x \tan x$, which vanishes for $x = 0$. Hence by Theorem 5-18, the function $\tan x$ has a point of inflection at the origin at which the gradient of the tangent to the graph has the value $f'(0) = 1$.

(c) We see that $f'(x) = 3(x - 1)^2$ and so the condition $f'(\xi) = 0$ yields $\xi = 1$ as the single stationary value. However, $f''(x) = 6(x - 1)$, which shows that we also have $f''(1) = 0$. Appealing to the last part of Theorem 5-18 we see that, as $f'(x) = 3(x - 1)^2 > 0$ to both the left and right of $x = 1$, it follows that $f(x) = (x - 1)^3$ has a point of inflection at that point.
The tangent line to the graph there has a zero gradient. Alternatively, as $f'''(x) = 6 \neq 0$, the result also follows from Corollary 5-18.

5-5 Partial differentiation

The notion of continuity has already been extended so that it is meaningful in the context of functions of several independent variables. It is now appropriate to extend the notion of a derivative in a similar fashion. For simplicity of argument we shall work with the function $f(x, y)$ of two independent variables, and in order to visualize its behaviour geometrically we will define a dependent variable by the equation
$$u = f(x, y). \qquad (5\text{-}18)$$
The function may then be represented as a surface in three dimensional space. A typical surface generated by a function of the form of Eqn (5-18) is shown in Fig. 5-13 and, unlike functions of one independent variable, it is necessary to define more than one first-order derivative. The idea involved is simple: by holding one of the independent variables in $f$ constant at some value of interest, the function $f$ then becomes a function of the single remaining independent variable. We may then differentiate $f$ as though it were a function only of that one variable. By holding first $x$ and then $y$ constant in this manner, two different derivatives may be defined which, because of their manner of computation, will be called partial derivatives to distinguish them from our earlier use of the term derivative. We shall now express these ideas formally as a definition and set down the standard notation to be used.

Fig. 5-13 Geometrical interpretation of partial derivatives.

Definition 5-5 (partial derivatives) Let $f(x, y)$ be a function defined near $(x_0, y_0)$. Suppose that
$$\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} \qquad (\mathrm{A})$$
exists and is independent of the direction of approach of $x$ to $x_0$. Then $f$ is differentiable partially with respect to $x$ at $(x_0, y_0)$.
The value of the limit is denoted by $f_x(x_0, y_0)$ or by $\partial f/\partial x\,|_{(x_0, y_0)}$ and called the first-order partial derivative of $f$ with respect to $x$ at $(x_0, y_0)$. Similarly, suppose that
$$\lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0} \qquad (\mathrm{B})$$
exists and is independent of the direction of approach of $y$ to $y_0$. Then $f$ is differentiable partially with respect to $y$ at $(x_0, y_0)$. The limit is denoted by $f_y(x_0, y_0)$ or by $\partial f/\partial y\,|_{(x_0, y_0)}$ and called the first-order partial derivative of $f$ with respect to $y$ at $(x_0, y_0)$.

By analogy with ordinary derivatives, if $f(x, y)$ is differentiable partially with respect to $x$ and $y$ at all points of some region in the $(x, y)$-plane and these derivatives are continuous, then we say $f$ is differentiable in that region. The operations of partial differentiation with respect to $x$ and $y$ are usually denoted by the differentiation operators $\partial/\partial x$ and $\partial/\partial y$, respectively.

Let us now interpret these definitions in terms of Fig. 5-13. The function $f(x, y_0)$ occurring in the numerator of limit (A) in Definition 5-5 is represented in that figure by the intersection of the surface $u = f(x, y)$ with the plane $y = y_0$, which has been labelled $\Pi_1$. It is the curve $L_1$. The number $f_x(x_0, y_0)$ defined by limit (A) is the gradient of the tangent line $l_1$ to this curve at point $P$. By requiring the limit to be independent of the direction of approach of $x$ to $x_0$, we have ensured that the tangent lines drawn to the curve at $P$, whether from the left or the right, will have the same gradient. In simpler terms this ensures that the curve $L_1$ is smooth and has no kink at $P$.

The function $f(x_0, y)$ occurring in the numerator of limit (B) in the definition is represented in Fig. 5-13 by the intersection of the surface $u = f(x, y)$ with the plane $x = x_0$, which has been labelled $\Pi_2$. It is the curve $L_2$. The number $f_y(x_0, y_0)$ defined by limit (B) is the gradient of the tangent line $l_2$ to this curve at point $P$.
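Limits (A) and (B) can be imitated numerically by forming the difference quotient from the right and from the left and checking that the two agree. The sketch below makes several assumptions that are not part of the text: the step `h`, the agreement tolerance `tol`, the helper names, and the two test functions are all illustrative. Agreement of the two quotients at one small step is only suggestive of the limit, not a proof that it exists.

```python
def quotient(f, x0, y0, var, step):
    """Difference quotient of f at (x0, y0), varying only x or only y."""
    if var == "x":
        return (f(x0 + step, y0) - f(x0, y0)) / step   # limit (A), y held at y0
    if var == "y":
        return (f(x0, y0 + step) - f(x0, y0)) / step   # limit (B), x held at x0
    raise ValueError("var must be 'x' or 'y'")


def partial(f, x0, y0, var, h=1e-6, tol=1e-4):
    """Estimate f_x or f_y, insisting that left and right approaches agree."""
    right = quotient(f, x0, y0, var, h)
    left = quotient(f, x0, y0, var, -h)
    if abs(right - left) > tol:
        # The curve cut from the surface has a kink at P, so no tangent line.
        raise ValueError("left and right difference quotients disagree")
    return 0.5 * (left + right)


if __name__ == "__main__":
    smooth = lambda x, y: x**3 + 2 * x * y + 2 * y**2   # illustrative polynomial
    print(partial(smooth, 1.0, 2.0, "x"))   # close to 3*1 + 2*2 = 7
    print(partial(smooth, 1.0, 2.0, "y"))   # close to 2*1 + 4*2 = 10

    kinked = lambda x, y: abs(x) + y        # kink along x = 0
    try:
        partial(kinked, 0.0, 0.0, "x")
    except ValueError as err:
        print("no partial derivative with respect to x:", err)
```

The kinked function illustrates the remark about smoothness: its section by the plane $y = 0$ has gradients $+1$ and $-1$ on the two sides of the origin, so limit (A) depends on the direction of approach.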
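Thus by differentiating partially we mean that, during the process of differentiation, the other independent variable is to be regarded as a constant. In consequence, all the rules of differentiation developed for functions of a single variable are also rules of partial differentiation, provided only that the functions involved are suitably differentiable. On account of this when, for example, the operator $\partial/\partial x$ acts on a function only of $y$, say $g(y)$, that function is to be regarded as a constant with respect to this operator and so $(\partial/\partial x)[g(y)] = 0$. Similarly $(\partial/\partial y)[h(x)] = 0$.

Example 5-16 In each of the following cases compute $f_x$ and $f_y$ as functions of $x$ and $y$. Use the result to determine the numerical value of these derivatives at the stated points:

(a) $f(x, y) = x^3 + 2xy + 2y^2$; $(1, 2)$;

(b) $f(x, y) = x \sin xy + 3$; $(1, \tfrac{1}{2}\pi)$;

(c) $f(x, y) = x/(x^2 + y^2)$; $(1, 0)$.

Solution (a)
$$f_x = \frac{\partial}{\partial x}\left[x^3 + 2xy + 2y^2\right] = \frac{\partial}{\partial x}[x^3] + 2y\,\frac{\partial}{\partial x}[x] + 2y^2\,\frac{\partial}{\partial x}[1],$$
whence
$$\frac{\partial f}{\partial x} = 3x^2 + 2y.$$
At the point $(1, 2)$ we find that $\partial f/\partial x\,|_{(1,2)} = 7$. Similarly,
$$f_y = \frac{\partial}{\partial y}\left[x^3 + 2xy + 2y^2\right] = x^3\,\frac{\partial}{\partial y}[1] + 2x\,\frac{\partial}{\partial y}[y] + 2\,\frac{\partial}{\partial y}[y^2],$$
whence
$$\frac{\partial f}{\partial y} = 2x + 4y.$$
At the point $(1, 2)$ we find that $\partial f/\partial y\,|_{(1,2)} = 10$.

(b)
$$f_x = \frac{\partial}{\partial x}[x \sin xy + 3] = x\,\frac{\partial}{\partial x}[\sin xy] + \sin xy\,\frac{\partial}{\partial x}[x] + \frac{\partial}{\partial x}[3],$$
whence
$$\frac{\partial f}{\partial x} = xy \cos xy + \sin xy.$$
At the point $(1, \tfrac{1}{2}\pi)$ we find that $\partial f/\partial x\,|_{(1, \pi/2)} = 1$. Similarly,
$$f_y = \frac{\partial}{\partial y}[x \sin xy + 3] = x\,\frac{\partial}{\partial y}[\sin xy] + \frac{\partial}{\partial y}[3],$$
whence
$$\frac{\partial f}{\partial y} = x^2 \cos xy \quad\text{and}\quad \left.\frac{\partial f}{\partial y}\right|_{(1, \pi/2)} = 0.$$

(c)
$$f_x = \frac{\partial}{\partial x}\left[\frac{x}{x^2 + y^2}\right] = \frac{1}{x^2 + y^2}\,\frac{\partial}{\partial x}[x] + x\,\frac{\partial}{\partial x}\left[(x^2 + y^2)^{-1}\right] = \frac{1}{x^2 + y^2} - \frac{x}{(x^2 + y^2)^2}\,\frac{\partial}{\partial x}[x^2 + y^2],$$
whence
$$\frac{\partial f}{\partial x} = \frac{1}{x^2 + y^2} - \frac{2x^2}{(x^2 + y^2)^2} = \frac{y^2 - x^2}{(x^2 + y^2)^2}.$$
At the point $(1, 0)$ we find that $\partial f/\partial x\,|_{(1,0)} = -1$. Similarly,
$$f_y = \frac{\partial}{\partial y}\left[\frac{x}{x^2 + y^2}\right] = x\,\frac{\partial}{\partial y}\left[(x^2 + y^2)^{-1}\right] = \frac{-x}{(x^2 + y^2)^2}\,\frac{\partial}{\partial y}[x^2 + y^2],$$
whence
$$\frac{\partial f}{\partial y} = \frac{-2xy}{(x^2 + y^2)^2}$$
and so $\partial f/\partial y\,|_{(1,0)} = 0$.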
The notion of partial differentiation extends to functions of more than two independent variables in an obvious manner. Suppose that the function $f(x, y, z)$ is defined near the point $(x_0, y_0, z_0)$; then, provided the limits exist, we define the three first-order partial derivatives $f_x$, $f_y$, and $f_z$ by the expressions
$$\left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0, z_0)} = \lim_{x \to x_0} \frac{f(x, y_0, z_0) - f(x_0, y_0, z_0)}{x - x_0},$$
$$\left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0, z_0)} = \lim_{y \to y_0} \frac{f(x_0, y, z_0) - f(x_0, y_0, z_0)}{y - y_0},$$
$$\left.\frac{\partial f}{\partial z}\right|_{(x_0, y_0, z_0)} = \lim_{z \to z_0} \frac{f(x_0, y_0, z) - f(x_0, y_0, z_0)}{z - z_0}.$$
Clearly a function of $n$ independent variables will have $n$ different first-order partial derivatives; one with respect to each of the independent variables. The actual computation of these partial derivatives is carried out exactly as before.

Example 5-17 Find the first-order partial derivatives of
$$f(x, y, z) = x^3y^2 + 3 \sin yz + 2.$$

Solution This function has three independent variables so we must obtain three first-order partial derivatives, namely $f_x$, $f_y$, and $f_z$. First we have
$$f_x = \frac{\partial}{\partial x}\left[x^3y^2 + 3 \sin yz + 2\right] = y^2\,\frac{\partial}{\partial x}[x^3] + 3 \sin yz\,\frac{\partial}{\partial x}[1] + \frac{\partial}{\partial x}[2],$$
so
$$\frac{\partial f}{\partial x} = 3x^2y^2.$$
Next,
$$f_y = \frac{\partial}{\partial y}\left[x^3y^2 + 3 \sin yz + 2\right] = x^3\,\frac{\partial}{\partial y}[y^2] + 3\,\frac{\partial}{\partial y}[\sin yz] + \frac{\partial}{\partial y}[2],$$
so
$$\frac{\partial f}{\partial y} = 2x^3y + 3z \cos yz.$$
Finally,
$$f_z = \frac{\partial}{\partial z}\left[x^3y^2 + 3 \sin yz + 2\right] = x^3y^2\,\frac{\partial}{\partial z}[1] + 3\,\frac{\partial}{\partial z}[\sin yz] + \frac{\partial}{\partial z}[2],$$
so
$$\frac{\partial f}{\partial z} = 3y \cos yz.$$

5-6 Total differential

The idea of a differential, that was useful in ordinary differentiation, may also be developed to advantage in connection with partial differentiation. We first approach this problem from the geometrical standpoint, and then indicate how an analytical counterpart of these arguments can be produced.

Let us consider Eqn (5-18) and its geometrical representation in Fig. 5-13. The conditions for differentiability at $P$ ensure that the surface has a tangent plane $\Pi$ at that point (why?), and it is to this plane that we now confine our attention.
An element of this tangent plane defined by the lines $l_1$ and $l_2$ through $P$ is depicted in Fig. 5-14. Obviously points on $\Pi$ close to $P$ must also be close to those points on the surface $u = f(x, y)$ that lie vertically below them. This suggests that for such points, the element of plane $\Pi$ neighbouring $P$ represents a good approximation to the element of the curved surface defining the function $u$ near to $P$. Thus variations of $u$ close to $P$ may, with propriety, be approximated by the variations of the corresponding points on $\Pi$.

Fig. 5-14 Tangent plane $\Pi$ to surface $u = f(x, y)$ at point $P$.

Since we are interested in variations of $u$ about the point $P$ at which $u = f(x_0, y_0)$, we shall start by translating our coordinate axes without rotation to the point $P$. In this position the new $x$, $y$, and $u$ coordinate axes will be denoted by $x'$, $y'$, and $u'$, respectively, as shown in Fig. 5-15. If, relative to $P$, the $x'$ and $y'$ coordinates of a point $P'$ are $\Delta x$ and $\Delta y$, then it is obvious from Fig. 5-15 that the increment $du$ must be
$$du = \Delta x \tan\alpha + \Delta y \tan\beta,$$
where $\alpha$ and $\beta$ are the angles between the lines $l_1$ and $l_2$ and the $x'$- and $y'$-axes, respectively. However, by the definition of $f_x$ and $f_y$, we have
$$f_x(x_0, y_0) = \tan\alpha, \qquad f_y(x_0, y_0) = \tan\beta,$$
so that
$$du = f_x(x_0, y_0)\,\Delta x + f_y(x_0, y_0)\,\Delta y. \qquad (5\text{-}19)$$
We now define differentials $dx$ and $dy$ in the independent variables $x$ and $y$ by setting $dx = \Delta x$ and $dy = \Delta y$. Expression (5-19) then becomes
$$du = f_x(x_0, y_0)\,dx + f_y(x_0, y_0)\,dy, \qquad (5\text{-}20)$$
which is the relationship by which we define the total differential $du$ of the function $u = f(x, y)$. This is so called because it takes account of the total effect, on $u$, of the changes $dx$ in $x$ and $dy$ in $y$. The additive effect of these changes is clearly apparent in Fig. 5-15 and results from using a tangent plane approximation to the surface near $P$. As before, when $dx$ and $dy$ are suitably small, $du$ is a reasonable approximation to the true change $\Delta u$ given by
$$\Delta u = f(x_0 + dx, y_0 + dy) - f(x_0, y_0). \qquad (5\text{-}21)$$

Fig. 5-15 Element of tangent plane.

An analytic rather than geometric justification of the tangent plane approximation used to define $du$ in Eqn (5-20) can be based on Theorem 5-12. Equation (5-21), which is exact, is taken to be the starting point and, by addition and subtraction of a term $f(x_0, y_0 + \Delta y)$, is written
$$\Delta u = \left[f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0, y_0 + \Delta y)\right] + \left[f(x_0, y_0 + \Delta y) - f(x_0, y_0)\right],$$
where only $x$ varies within the first bracket and only $y$ varies within the second bracket. Then Theorem 5-12 expressed in the Cauchy form may be applied to the first bracket with respect to $x$ and to the second bracket with respect to $y$ to yield
$$\Delta u = \Delta x\, f_x(x_0 + \xi\,\Delta x, y_0 + \Delta y) + \Delta y\, f_y(x_0, y_0 + \eta\,\Delta y), \qquad (5\text{-}22)$$
where $0 < \xi < 1$ and $0 < \eta < 1$. Partial derivatives have been used here because, although in the first bracket it is only $x$ that varies whilst in the second bracket it is only $y$ that varies, both brackets are nevertheless functions of $x$ and $y$.

Result (5-20) then follows by letting $\Delta x$ and $\Delta y$ become small. The continuity of $f_x(x_0 + \xi\,\Delta x, y_0 + \Delta y)$ allows it to be approximated by $f_x(x_0, y_0)$ with an error $\varepsilon_1$ and, similarly, the continuity of $f_y(x_0, y_0 + \eta\,\Delta y)$ allows it to be approximated by $f_y(x_0, y_0)$ with an error $\varepsilon_2$. Then, as $\Delta x, \Delta y \to 0$, so also do $\varepsilon_1$ and $\varepsilon_2$. It is left as an exercise for the reader to supply the details necessary to make this argument rigorous. If Eqn (5-20) is defined for all points $(x_0, y_0)$ of some region in the $(x, y)$-plane, then the suffix zero may be discarded and Eqn (5-20) can then be regarded as a functional relationship rather than a result that is true only near one point.

We have thus proved a special case of the following more general result whose proof differs in no significant detail.

Theorem 5-19 (total differential) Let $f(x_1, x_2, \ldots, x_n)$ be a real valued function of $n$ real variables and let its first-order partial derivatives exist and be continuous in some region $\mathcal{R}$.
Then the total differential $du$ of the function $u = f(x_1, x_2, \ldots, x_n)$ in the region $\mathcal{R}$ is given by
$$du = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n.$$

If we consider the surface generated by setting $u$ = constant, then on that surface $du = 0$. Theorem 5-19 then takes the form
$$0 = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n, \qquad (5\text{-}23)$$
showing that the differentials $dx_1, dx_2, \ldots, dx_n$ are no longer independent since this constraint condition has been imposed on them. This is of course to be expected, since we have imposed the single condition $f(x_1, x_2, \ldots, x_n)$ = constant on the independent variables $x_1, x_2, \ldots, x_n$ so that we are no longer free to change them arbitrarily. Indeed, if differentials $dx_1, dx_2, \ldots, dx_{n-1}$ are chosen arbitrarily, then the remaining differential $dx_n$ is uniquely determined by Eqn (5-23), provided $\partial f/\partial x_n \neq 0$. If we call the number of independent variables the number of degrees of freedom associated with the equation $u = f(x_1, x_2, \ldots, x_n)$, then Eqn (5-23) implies the loss of a single degree of freedom.

Example 5-18 In thermodynamics, the pressure $p$ of an ideal gas, its volume $V$, its absolute temperature $T$ and the gas constant $R$ are related by the ideal gas law $pV = RT$. Find the expression relating the total differential $dp$ and the differentials $dV$ and $dT$.

Solution We have $p = RT/V$, and so $p = f(T, V)$ with $f(T, V) = RT/V$. Hence $\partial f/\partial T = R/V$ and $\partial f/\partial V = -RT/V^2$. Now interpreting Theorem 5-19 in this case we find
$$dp = \left(\frac{\partial f}{\partial T}\right)dT + \left(\frac{\partial f}{\partial V}\right)dV, \qquad (*)$$
and so
$$dp = \frac{R}{V}\,dT - \frac{RT}{V^2}\,dV.$$
Notice that the use of the symbol $f$ in the total differential relation (*) to bring it into accord with the notation of Theorem 5-19 is not strictly necessary since $p = f$. We could equally well have written equation (*) as
$$dp = \left(\frac{\partial p}{\partial T}\right)dT + \left(\frac{\partial p}{\partial V}\right)dV,$$
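and used the immediately obvious result that
$$\frac{\partial p}{\partial T} = \frac{R}{V} \quad\text{and}\quad \frac{\partial p}{\partial V} = -\frac{RT}{V^2}.$$

Let us now consider the function $u = f(x, y)$ and, as a special case, set $u = 0$ so that the equation
$$f(x, y) = 0$$
defines $y$ implicitly in terms of $x$. How then may we compute the derivative $dy/dx$ without solving for $y$ in terms of $x$? The solution to this problem is provided by Eqn (5-23), which in this case takes the form
$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$
We saw in connection with the definition of the differentials $dy$ and $dx$ in Eqn (5-11), that the function $(dy/dx)$, called the derivative of $y$ with respect to $x$, is the ratio $dy : dx$ of the differentials. Hence dividing by the differential $dx$, assuming that $\partial f/\partial y \neq 0$, and rearranging gives the result
$$\frac{dy}{dx} = -\frac{(\partial f/\partial x)}{(\partial f/\partial y)}.$$
We state this as a corollary to Theorem 5-19.

Corollary 5-19 (a) If the real variables $x$ and $y$ are related implicitly by the equation $f(x, y) = 0$, and the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ exist and are continuous, then
$$\frac{dy}{dx} = -\left(\frac{\partial f}{\partial x}\right)\Big/\left(\frac{\partial f}{\partial y}\right)$$
whenever $\partial f/\partial y \neq 0$. Insistence on this latter condition may be avoided by writing the result in the alternative form
$$\left(\frac{\partial f}{\partial y}\right)\frac{dy}{dx} + \frac{\partial f}{\partial x} = 0.$$

The situation is slightly different if three variables $x$, $y$, $z$ are involved and $z$, say, is defined implicitly in terms of the independent variables $x$ and $y$ by the equation
$$f(x, y, z) = 0.$$
In these circumstances it is frequently necessary to compute $\partial z/\partial x$ and $\partial z/\partial y$ from this implicit relationship. To do so, notice that an obvious modification of Eqn (5-23) gives
$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz,$$
but if $z$ could be obtained explicitly, so that $z = z(x, y)$, it would also follow from Theorem 5-19 that
$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy.$$
Substitution of this result into the above expression gives
$$\left(\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x}\right)dx + \left(\frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y}\right)dy = 0,$$
and as $x$ and $y$ are independent variables, $dx$ and $dy$ are arbitrary so that this expression can only be true if
$$\frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial x} = 0 \quad\text{and}\quad \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial y} = 0.$$
Hence, we find that provided $\partial f/\partial z \neq 0$,
$$\frac{\partial z}{\partial x} = -\left(\frac{\partial f}{\partial x}\right)\Big/\left(\frac{\partial f}{\partial z}\right), \qquad \frac{\partial z}{\partial y} = -\left(\frac{\partial f}{\partial y}\right)\Big/\left(\frac{\partial f}{\partial z}\right).$$
We state this in the form of a further corollary.

Corollary 5-19 (b) If the real variables $x$, $y$, and $z$ are related by the implicit equation $f(x, y, z) = 0$ and the first-order derivatives of $f$ exist and are continuous, then
$$\frac{\partial z}{\partial x} = -\left(\frac{\partial f}{\partial x}\right)\Big/\left(\frac{\partial f}{\partial z}\right), \qquad \frac{\partial z}{\partial y} = -\left(\frac{\partial f}{\partial y}\right)\Big/\left(\frac{\partial f}{\partial z}\right)$$
when $\partial f/\partial z \neq 0$.

Example 5-19 (a) Find $dy/dx$ given that $x^2y + \sin xy = 0$.

(b) Prove that $(d/dx)(x^r) = rx^{r-1}$ when $r$ is rational.

(c) Find $\partial z/\partial x$ and $\partial z/\partial y$ given that $x$, $y$, and $z$ are related implicitly through the function $f(x, y, z) = x^2 + 2xyz + z^3$.

Solution (a) We must apply Corollary 5-19 (a). As, in this case, $f(x, y) = x^2y + \sin xy$ it follows that
$$\frac{\partial f}{\partial x} = 2xy + y \cos xy \quad\text{and}\quad \frac{\partial f}{\partial y} = x^2 + x \cos xy.$$
Hence, by Corollary 5-19 (a),
$$\frac{dy}{dx} = \frac{-(\partial f/\partial x)}{(\partial f/\partial y)} = -\left(\frac{2xy + y \cos xy}{x^2 + x \cos xy}\right)$$
whenever $x^2 + x \cos xy \neq 0$.

(b) We have already shown in Theorem 5-2 that if $y = x^n$, then $dy/dx = nx^{n-1}$ for $n$ a positive or negative integer. Now we must show this result is still true if the power involved is rational. Let $y = x^r$ with $r = p/q$, where $p$ and $q$ are integers without any common factor. Then $y = x^{p/q}$ implies, and is implied by, $y^q = x^p$. Let $f(x, y) = y^q - x^p$ so that our equation corresponds to $f(x, y) = 0$. Then there clearly exist pairs of real numbers $(x, y)$ for which $y^q = x^p$, and by Theorem 5-2, $\partial f/\partial y = qy^{q-1} \neq 0$ when $y \neq 0$ (that is, when $x \neq 0$), and both $\partial f/\partial y$ and $\partial f/\partial x = -px^{p-1}$ are continuous functions. Hence the conditions of Corollary 5-19 (a) are satisfied so that by the second form of its statement we may write
$$qy^{q-1}\,\frac{dy}{dx} - px^{p-1} = 0.$$
Thus
$$\frac{dy}{dx} = \frac{p}{q}\,\frac{x^{p-1}}{y^{q-1}} = \frac{p}{q}\,\frac{x^{p-1}}{(x^{p/q})^{q-1}} = \frac{p}{q}\,x^{(p/q)-1}$$
when $x \neq 0$.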
In the event that $x = 0$ we have
$$\left.\frac{d}{dx}\left(x^{p/q}\right)\right|_{x=0} = \lim_{x \to 0} \frac{x^{p/q}}{x} = \lim_{x \to 0} x^{(p/q)-1},$$
whenever this limit exists, which it does when $p/q > 1$, and is then equal to zero. This establishes our desired result for all $x$.

(c) Here, $f(x, y, z) = x^2 + 2xyz + z^3$ and so
$$\frac{\partial f}{\partial x} = 2x + 2yz, \qquad \frac{\partial f}{\partial y} = 2xz, \qquad \frac{\partial f}{\partial z} = 2xy + 3z^2.$$
Thus by Corollary 5-19 (b),
$$\frac{\partial z}{\partial x} = -\left(\frac{2x + 2yz}{2xy + 3z^2}\right) \quad\text{and}\quad \frac{\partial z}{\partial y} = \frac{-2xz}{2xy + 3z^2}.$$

5-7 Envelopes

A simple and useful application of the total differential is to the problem of the determination of envelopes already touched upon in Section 2-5. Before proceeding with this application we now formally define an envelope.

Definition 5-6 Let a family of curves $\Gamma$ in the $(x, y)$-plane with parameter $\alpha$ be defined by the implicit equation
$$f(x, y, \alpha) = 0.$$
Then the envelope of the family $\Gamma$, when it exists, is that curve $\mathcal{E}$ which is tangent to every member of the family.

Figure 5-16 (a) shows some representative members of the family $\Gamma$ corresponding to values $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_n$ of the parameter $\alpha$. Figure 5-16 (b) shows the same situation on closely neighbouring curves $C_1$ and $C_2$ when the parametric value for $C_2$ is $\alpha_0 + d\alpha$, which differs only by the differential $d\alpha$ from the parametric value $\alpha_0$ appropriate to $C_1$. We shall assume that the curves $C_1$ and $C_2$ intersect at the point $P$ with coordinates $(x_0, y_0)$.

Fig. 5-16 Construction of envelope: (a) envelope of family of curves; (b) neighbouring members of the family.

Setting $u = f(x, y, \alpha)$, and regarding $x$, $y$, and $\alpha$ as variables, it follows from Theorem 5-19 that
$$du = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial \alpha}\,d\alpha,$$
and as the family is defined by setting $u = 0$ (constant) it then follows, as in Eqn (5-23), that
$$0 = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial \alpha}\,d\alpha.$$
This equation, which relates the differentials $dx$, $dy$, and $d\alpha$ to the neighbouring curves $C_1$ and $C_2$, is, in particular, true at $P$.
We signify this by writing
$$0 = \left(\frac{\partial f}{\partial x}\right)_P dx + \left(\frac{\partial f}{\partial y}\right)_P dy + \left(\frac{\partial f}{\partial \alpha}\right)_P d\alpha, \qquad (5\text{-}24)$$
where $(\cdot)_P$ denotes that the associated quantity is to be evaluated at $P$. This equation is just the intersection condition for curves $C_1$ and $C_2$ at $P$.

As it is required of the envelope $\mathcal{E}$ that it be tangent to every member of the family $\Gamma$, it follows that as $d\alpha \to 0$, so curve $C_2$ must tend to $C_1$ and the gradient of the envelope $\mathcal{E}$ at $P$ must tend to the gradient of the tangent to $C_1$ at $P$. To compute this we use the fact that $\alpha = \alpha_0$ is constant for curve $C_1$, so that the argument that gave rise to Eqn (5-24), when applied to $f(x, y, \alpha_0) = 0$, gives the tangency condition
$$0 = \left(\frac{\partial f}{\partial x}\right)_P dx + \left(\frac{\partial f}{\partial y}\right)_P dy. \qquad (5\text{-}25)$$
Now both Eqns (5-24) and (5-25) must be simultaneously true for $\mathcal{E}$ and, consequently, we arrive at the condition
$$\left(\frac{\partial f}{\partial \alpha}\right)_P d\alpha = 0,$$
which, since in general $d\alpha$ is a non-zero differential, can only be true if
$$\left(\frac{\partial f}{\partial \alpha}\right)_P = 0. \qquad (5\text{-}26)$$
In addition to this result, the fact that $P$ is a point on $C_1$ implies that $f(x_0, y_0, \alpha_0) = 0$ or, equivalently, that
$$\left[f(x, y, \alpha)\right]_P = 0. \qquad (5\text{-}27)$$
Both conditions (5-26) and (5-27) must be satisfied if the envelope $\mathcal{E}$ is to pass through $P$ and be tangent to $C_1$ at that point, so that dropping the suffix $P$, we see that $\mathcal{E}$ is the locus of all points for which
$$f(x, y, \alpha) = 0 \quad\text{and}\quad \frac{\partial}{\partial \alpha} f(x, y, \alpha) = 0. \qquad (5\text{-}28)$$
Elimination of $\alpha$ between these two equations gives a relationship between $x$ and $y$ which is the desired equation of the envelope $\mathcal{E}$. We have thus proved the following result.

Theorem 5-20 (envelopes) When it exists, the equation of the envelope $\mathcal{E}$ of the family of curves $f(x, y, \alpha) = 0$ with parameter $\alpha$ is determined by the elimination of $\alpha$ between the equations
$$f(x, y, \alpha) = 0 \quad\text{and}\quad \frac{\partial}{\partial \alpha} f(x, y, \alpha) = 0.$$

Example 5-20 Determine the envelope $\mathcal{E}$ of the family of curves
$$(x - \alpha)^2 + (y + \alpha)^2 = 1,$$
with parameter $\alpha$.

Fig. 5-17 Envelope of circles.

Solution If we write the equation of this family of curves in the form $f(x, y, \alpha) = 0$, then we must set
$$f(x, y, \alpha) = (x - \alpha)^2 + (y + \alpha)^2 - 1.$$
Hence the equation $\partial f/\partial \alpha = 0$ corresponds to
$$-(x - \alpha) + (y + \alpha) = 0$$
or, equivalently, to $\alpha = \tfrac{1}{2}(x - y)$.

To determine the envelope, the conditions of Theorem 5-20 require that $f(x, y, \alpha) = 0$ simultaneously with $\partial f/\partial \alpha = 0$. Hence substituting for the parameter $\alpha$ arrived at above from the condition $\partial f/\partial \alpha = 0$ into the family of curves $f(x, y, \alpha) = 0$ gives
$$\tfrac{1}{4}(x + y)^2 + \tfrac{1}{4}(x + y)^2 = 1$$
or,
$$x + y = \pm\sqrt{2}.$$
The desired envelope $\mathcal{E}$ thus comprises the two straight lines $y = \sqrt{2} - x$ and $y = -\sqrt{2} - x$.

This result could also have been deduced by geometrical arguments as follows. The original family of curves comprises circles of unit radius, each with its centre at $x = \alpha$, $y = -\alpha$. Consequently, the tangents to these circles which form their envelope $\mathcal{E}$ must be straight lines parallel to the line of centres $y = -x$ and separated from it by a unit distance (Fig. 5-17).

Although in this case it was possible to eliminate $\alpha$ from the equations arising from Theorem 5-20, this is not generally possible. In the next example we illustrate how on occasions $\alpha$ may be retained in a form which allows the equation of the envelope to be expressed in parametric form.

Example 5-21 Find the envelope of the equation
$$(x - \alpha)^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2},$$
where $\alpha$ is a parameter.

Solution We again write the equation in the form $f(x, y, \alpha) = 0$, where this time
$$f(x, y, \alpha) = (x - \alpha)^2 + y^2 - \frac{\alpha^2}{1 + \alpha^2}.$$
Then
$$\frac{\partial f}{\partial \alpha} = -2(x - \alpha) - \frac{2\alpha}{1 + \alpha^2} + \frac{2\alpha^3}{(1 + \alpha^2)^2}$$
and hence the condition $\partial f/\partial \alpha = 0$ requires that
$$(x - \alpha) = \frac{\alpha^3}{(1 + \alpha^2)^2} - \frac{\alpha}{1 + \alpha^2}.$$
Now this is a specially simple situation because $y$ is absent from the equation $\partial f/\partial \alpha = 0$, which allows us to solve immediately for $x$ in terms of $\alpha$ to get
$$x = \alpha^3\left[\frac{2 + \alpha^2}{(1 + \alpha^2)^2}\right]. \qquad (\mathrm{A})$$
To find the envelope $\mathcal{E}$, Theorem 5-20 requires that in addition to satisfying the condition $\partial f/\partial \alpha = 0$ we must also require that $f(x, y, \alpha) = 0$.
Using the form of $(x - \alpha)$ given above this is easily seen to be equivalent to requiring that
$$\left[\frac{\alpha^3}{(1 + \alpha^2)^2} - \frac{\alpha}{1 + \alpha^2}\right]^2 + y^2 = \frac{\alpha^2}{1 + \alpha^2}.$$
This may now be solved for $y$ in terms of $\alpha$ to obtain
$$y = \frac{\pm\alpha^2(3 + 3\alpha^2 + \alpha^4)^{1/2}}{(1 + \alpha^2)^2}. \qquad (\mathrm{B})$$
The coordinates $(x, y)$ of points on envelope $\mathcal{E}$ are thus determined in terms of $\alpha$ by equations (A) and (B). Although it is not possible to eliminate $\alpha$ between these equations to obtain an explicit representation for the envelope $\mathcal{E}$ in terms of $x$ and $y$, this is of no real importance as we have obtained the equations of $\mathcal{E}$ in parametric form, which are equally satisfactory. Different values of $\alpha$ will determine different points $(x(\alpha), y(\alpha))$ on the envelope $\mathcal{E}$.

This example has in fact provided the detailed solution to the problem first studied in Section 2-5. Notice that for large values of $\alpha$ we have $x \to \alpha$ and $y \to \pm 1$, as was deduced from purely geometrical considerations when the problem was first examined.

5-8 The chain rule and its consequences

If, in Theorem 5-19, the variables $x_1, x_2, \ldots, x_n$ are specified in terms of a parameter $t$, say, then the result requires slight modification. Suppose that
$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t),$$
which are all differentiable functions of $t$. Then the variable $u$ becomes a function of the single real variable $t$, for we may write
$$u = F(t), \qquad (5\text{-}29)$$
where $F(t) = f(x_1(t), x_2(t), \ldots, x_n(t))$. Hence by an obvious adaptation of Eqn (5-11) defining differentials we may write
$$du = F'(t)\,dt, \qquad (5\text{-}30)$$
where, of course, $F'(t) = du/dt$, the derivative of $u$ with respect to $t$. However by a further application of Eqn (5-11) to each of the variables $x_1 = x_1(t)$, $x_2 = x_2(t)$, ..., $x_n = x_n(t)$ we have the result
$$dx_1 = \left(\frac{dx_1}{dt}\right)dt, \quad dx_2 = \left(\frac{dx_2}{dt}\right)dt, \quad \ldots, \quad dx_n = \left(\frac{dx_n}{dt}\right)dt. \qquad (5\text{-}31)$$
Substituting these expressions for the differentials $dx_i$ in terms of the differential $dt$ into the statement of Theorem 5-19 gives
$$du = \left(\frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}\right)dt. \qquad (5\text{-}32)$$
Finally, a comparison of Eqns (5-30) and (5-32) shows that
$$F'(t) = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$
As $F'(t) = du/dt$, this result facilitates the calculation of $du/dt$ without the need for formal substitution into $u = f(x_1, x_2, \ldots, x_n)$ of the values $x_1 = x_1(t)$, $x_2 = x_2(t)$, ..., $x_n = x_n(t)$. We have proved the following useful result.

Theorem 5-21 (chain rule for partial derivatives) Let $u = f(x_1, x_2, \ldots, x_n)$ be a real valued function of $n$ real variables and let its first-order partial derivatives exist and be continuous. Further, let each of the variables $x_1, x_2, \ldots, x_n$ be a differentiable function of the single real variable $t$ so that we may write
$$x_1 = x_1(t), \quad x_2 = x_2(t), \quad \ldots, \quad x_n = x_n(t).$$
Then the total derivative of $u$ with respect to $t$ is given by
$$\frac{du}{dt} = \frac{\partial f}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dx_2}{dt} + \cdots + \frac{\partial f}{\partial x_n}\frac{dx_n}{dt}.$$

Two special cases of this theorem are of sufficient importance to merit recording as corollaries. The first arises when $f$ is a function of only two variables between which an explicit relationship exists, and the parameter $t$ is identified with one of these variables. As only two variables are involved we shall avoid the use of numerical suffixes by agreeing to write $x_1 = x$ and $x_2 = y$ where, by supposition, $y = y(x)$ is some known explicit relation. The statement of Theorem 5-21 then becomes
$$\frac{du}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
If, now, we identify $t$ with $x$, then $t = x$ and $dx/dt = 1$, $dy/dt = dy/dx$ so that the above result becomes
$$\frac{du}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$
The expression on the right-hand side is the total derivative of $u$ with respect to $x$. The first term on the right takes account of the change directly due to $x$ whilst the second term takes account of the fact that $y$ is itself a function of $x$. This result enables $du/dx$ to be obtained without needing to substitute $y = y(x)$ in the relation $u = f(x, y)$.
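The statement of Theorem 5-21 can be checked numerically by comparing the chain rule sum with a direct difference quotient of $F(t)$. The sketch below is illustrative only: the function $u = x^2y$ with $x = \cos t$, $y = \sin t$, the step `h`, and all the helper names are assumptions chosen for the demonstration and do not come from the text.

```python
import math


def f(x, y):
    """Illustrative function u = f(x, y) = x^2 * y (an assumed example)."""
    return x * x * y


def x_of(t):
    return math.cos(t)


def y_of(t):
    return math.sin(t)


def du_dt_chain(t):
    """du/dt from Theorem 5-21: f_x * dx/dt + f_y * dy/dt."""
    x, y = x_of(t), y_of(t)
    f_x = 2 * x * y          # partial derivative with y held constant
    f_y = x * x              # partial derivative with x held constant
    dx_dt = -math.sin(t)
    dy_dt = math.cos(t)
    return f_x * dx_dt + f_y * dy_dt


def du_dt_direct(t, h=1e-5):
    """du/dt from a central difference of F(t) = f(x(t), y(t))."""
    F = lambda s: f(x_of(s), y_of(s))
    return (F(t + h) - F(t - h)) / (2 * h)


if __name__ == "__main__":
    for t in (0.0, 0.7, 2.0):
        print(t, du_dt_chain(t), du_dt_direct(t))
```

The two columns printed should agree to within the accuracy of the difference quotient, which is the point of the theorem: the derivative is obtained without first substituting $x(t)$ and $y(t)$ into $f$.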
Corollary 5-21 (a) If $u = f(x, y)$ is a real valued function of the real variables $x$ and $y$ with continuous first-order derivatives and $y$ is related to $x$ by the explicit equation $y = y(x)$, then

$$\frac{du}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\frac{dy}{dx}.$$

More generally, suppose that $u = f(x, y)$ whilst $x$ and $y$ are related implicitly by the equation $g(x, y) = 0$. How must we modify our previous argument in order that we may compute the total derivative $du/dx$? The result of Corollary 5-21 (a) is still true, but obviously $dy/dx$ now depends on the form of $g$. To find the form of $dy/dx$ we can use Corollary 5-19 (a), writing $f = g$, to see that

$$\frac{dy}{dx} = -\left(\frac{\partial g}{\partial x}\right)\bigg/\left(\frac{\partial g}{\partial y}\right),$$

showing that

$$\frac{du}{dx} = \frac{\partial f}{\partial x} - \left(\frac{\partial f}{\partial y}\right)\left(\frac{\partial g}{\partial x}\right)\bigg/\left(\frac{\partial g}{\partial y}\right),$$

provided $\partial g/\partial y \neq 0$. We state this as our next result.

Corollary 5-21 (b) If $u = f(x, y)$ is a real valued function of the real variables $x$ and $y$ with continuous first-order derivatives, and $y$ is related implicitly to $x$ by the equation $g(x, y) = 0$, then

$$\frac{du}{dx} = \frac{\partial f}{\partial x} - \left(\frac{\partial f}{\partial y}\right)\left(\frac{\partial g}{\partial x}\right)\bigg/\left(\frac{\partial g}{\partial y}\right),$$

provided $\partial g/\partial y \neq 0$.

Example 5-22 Determine the derivative $du/dt$ given that $u = \sin(x^2 + y^2)$ with $x = 3t$, $y = 1/(1 + t^2)$.

Solution We must apply Theorem 5-21, making the identifications $x_1 = x$, $x_2 = y$, and $f(x, y) = \sin(x^2 + y^2)$ with $x = 3t$ and $y = 1/(1 + t^2)$. Hence

$$\frac{\partial f}{\partial x} = 2x\cos(x^2 + y^2), \quad \frac{\partial f}{\partial y} = 2y\cos(x^2 + y^2),$$

whilst

$$\frac{dx}{dt} = 3, \quad \frac{dy}{dt} = \frac{-2t}{(1 + t^2)^2}.$$

Substituting in Theorem 5-21,

$$\frac{du}{dt} = 2x\cos(x^2 + y^2)\cdot(3) + 2y\cos(x^2 + y^2)\left[\frac{-2t}{(1 + t^2)^2}\right]$$

or

$$\frac{du}{dt} = 2\cos(x^2 + y^2)\left[3x - \frac{2yt}{(1 + t^2)^2}\right].$$

Using the known relationships between $x$, $y$, and $t$, the derivative $du/dt$ can thus be computed for any desired value of $t$. The details are left to the reader.

Example 5-23 Determine the total derivative $du/dx$ in each case:
(a) $u = x\cos y + y\cos x$ when $y = 1 + x + x^3$;
(b) $u = x^2 + 2xy - y^2$ when $x^2 + y^2 + \cos xy = 0$.

Solution (a) This requires an application of Corollary 5-21 (a).
We set $f(x, y) = x\cos y + y\cos x$ and $y = 1 + x + x^3$ so that

$$\frac{\partial f}{\partial x} = \cos y - y\sin x, \quad \frac{\partial f}{\partial y} = -x\sin y + \cos x$$

and

$$\frac{dy}{dx} = 1 + 3x^2.$$

Hence, substituting into Corollary 5-21 (a),

$$\frac{du}{dx} = \cos y - y\sin x + (\cos x - x\sin y)(1 + 3x^2).$$

(b) In this case we use Corollary 5-21 (b), with $f(x, y) = x^2 + 2xy - y^2$ and $g(x, y) = x^2 + y^2 + \cos xy$. Hence

$$\frac{\partial f}{\partial x} = 2x + 2y, \quad \frac{\partial f}{\partial y} = 2x - 2y, \quad \frac{\partial g}{\partial x} = 2x - y\sin xy, \quad \frac{\partial g}{\partial y} = 2y - x\sin xy.$$

Finally, applying Corollary 5-21 (b),

$$\frac{du}{dx} = 2(x + y) - \frac{2(x - y)(2x - y\sin xy)}{(2y - x\sin xy)}.$$

5-9 Change of variable

This section discusses a somewhat more complicated situation than that covered by Theorem 5-21, namely, the implications on partial differentiation of changing the independent variables in a function $u = f(x_1, x_2, \ldots, x_n)$ that is to be differentiated. This situation commonly occurs as a result of changing coordinate systems to suit physical problems, as the following example illustrates.

Suppose that $p = p(x, y, z)$ is the pressure in a fluid flowing parallel to the $z$-axis. Then $\partial p/\partial z$ is the pressure gradient along the direction of flow and $\partial p/\partial x$, $\partial p/\partial y$ are the transverse pressure gradients in the plane $z = \text{constant}$.

Fig. 5-18 Cylindrical polar coordinates.

Now, if the flow takes place in a rectangular duct with sides described by $x = \text{constant}$, $y = \text{constant}$, then the Cartesian coordinates $O\{x, y, z\}$ are obviously the natural ones to use. However, if the flow takes place in a cylindrical pipe, then the $z$-axis is still convenient as it can be aligned with the axis of the pipe, but the $x$-, $y$-axes are now less useful since the wall of the pipe becomes the curve $x^2 + y^2 = \text{constant}$. Clearly, a more sensible coordinate system would be the cylindrical polar coordinates $r$, $\theta$, $z'$ in which $r$ and $\theta$ define a point in the plane $z' = \text{constant}$. Figure 5-18 illustrates this idea.
The plane $z = z' = 0$ is common to both the $O\{x, y, z\}$ and $O\{r, \theta, z'\}$ systems of axes, and is denoted by $\Pi$. Relative to these two systems the point $P$ has the coordinates $O\{x, y, z\}$ and $O\{r, \theta, z'\}$, respectively, where

$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = z'. \tag{5-33}$$

How can the pressure gradients described by the partial derivatives $\partial p/\partial r$, $\partial p/\partial\theta$, and $\partial p/\partial z'$ be determined from Eqn (5-33) and the known functions $\partial p/\partial x$, $\partial p/\partial y$, and $\partial p/\partial z$? The rest of this section is devoted to solving this type of problem. Notice that from the definition of partial differentiation, $\partial p/\partial z$ and $\partial p/\partial z'$ have essentially the same meaning, whereas $\partial p/\partial r$ is the derivative of $p$ computed along a radius with $\theta$ and $z'$ held constant, whilst $\partial p/\partial\theta$ is the derivative of $p$ tangential to a circle $r = \text{constant}$ drawn on the plane $z' = \text{constant}$.

Although the replacement of coordinate variables in this manner involves replacing a set of $n$ independent variables by a new set also comprising $n$ in number ($n = 3$ above), we shall first prove a more general result. Specifically, consider the implication of the situation in which

$$u = f(x_1, x_2, \ldots, x_n), \tag{5-34}$$

when the independent variables $x_1, x_2, \ldots, x_n$ are themselves differentiable functions of another set of variables which we denote by $\alpha_1, \alpha_2, \ldots, \alpha_m$. It is not necessary that $m$ should equal $n$. Thus we have

$$\begin{aligned}
x_1 &= x_1(\alpha_1, \alpha_2, \ldots, \alpha_m),\\
x_2 &= x_2(\alpha_1, \alpha_2, \ldots, \alpha_m),\\
&\;\;\vdots\\
x_n &= x_n(\alpha_1, \alpha_2, \ldots, \alpha_m).
\end{aligned} \tag{5-35}$$

If the variables $x_i$ in Eqn (5-34) were to be replaced by the equivalent functions (5-35) involving the variables $\alpha_i$, then $f$ would become some function $F(\alpha_1, \alpha_2, \ldots, \alpha_m)$ of $\alpha_1, \alpha_2, \ldots, \alpha_m$, so that by Theorem 5-19 we could write

$$du = \frac{\partial F}{\partial\alpha_1}\,d\alpha_1 + \frac{\partial F}{\partial\alpha_2}\,d\alpha_2 + \cdots + \frac{\partial F}{\partial\alpha_m}\,d\alpha_m. \tag{5-36}$$

Next, observe that by applying this same theorem to the equation for $x_i$ in Eqn (5-35) we obtain

$$dx_i = \frac{\partial x_i}{\partial\alpha_1}\,d\alpha_1 + \frac{\partial x_i}{\partial\alpha_2}\,d\alpha_2 + \cdots + \frac{\partial x_i}{\partial\alpha_m}\,d\alpha_m, \tag{5-37}$$

for $i = 1, 2, \ldots, n$.
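Substituting these expressions into the statement of Theorem 5-19 then gives

$$du = \frac{\partial f}{\partial x_1}\left(\frac{\partial x_1}{\partial\alpha_1}\,d\alpha_1 + \cdots + \frac{\partial x_1}{\partial\alpha_m}\,d\alpha_m\right) + \cdots + \frac{\partial f}{\partial x_n}\left(\frac{\partial x_n}{\partial\alpha_1}\,d\alpha_1 + \cdots + \frac{\partial x_n}{\partial\alpha_m}\,d\alpha_m\right). \tag{5-38}$$

On re-arrangement this becomes

$$du = \left[\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_1} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial\alpha_1}\right]d\alpha_1 + \cdots + \left[\frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_m} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_m} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial\alpha_m}\right]d\alpha_m. \tag{5-39}$$

Since $f(x_1, x_2, \ldots, x_n) = F(\alpha_1, \alpha_2, \ldots, \alpha_m)$, it follows by a direct comparison of the $i$th terms of Eqns (5-36) and (5-39) that

$$\frac{\partial F}{\partial\alpha_i} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_i} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_i} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial\alpha_i} \tag{5-40}$$

for $i = 1, 2, \ldots, m$. We state this result in the form of a general theorem, in which $\partial f/\partial\alpha_i$ denotes the derivative of $f$ regarded as a function of the $\alpha_i$.

Theorem 5-22 (change of variable) Let $f(x_1, x_2, \ldots, x_n)$ be a real valued function of the real variables $x_1, x_2, \ldots, x_n$ whose first-order derivatives exist and are continuous. Further, let $x_1 = x_1(\alpha_1, \alpha_2, \ldots, \alpha_m)$, $x_2 = x_2(\alpha_1, \alpha_2, \ldots, \alpha_m)$, $\ldots$, $x_n = x_n(\alpha_1, \alpha_2, \ldots, \alpha_m)$ be differentiable functions of the real variables $\alpha_1, \alpha_2, \ldots, \alpha_m$. Then

$$\begin{aligned}
\frac{\partial f}{\partial\alpha_1} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_1} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial\alpha_1},\\
\frac{\partial f}{\partial\alpha_2} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_2} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_2} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial\alpha_2},\\
&\;\;\vdots\\
\frac{\partial f}{\partial\alpha_m} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_m} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_m} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial\alpha_m}.
\end{aligned}$$

Example 5-24 Express $\partial f/\partial r$, $\partial f/\partial\theta$, and $\partial f/\partial z'$ in terms of $\partial f/\partial x$, $\partial f/\partial y$, and $\partial f/\partial z$ given that $x = r\cos\theta$, $y = r\sin\theta$, $z = z'$. Find their values given that $f(x, y, z) = x^2 + 3xy + y^2 + z^2$.

Solution We must apply Theorem 5-22 with $m = n = 3$ by making the identifications $x_1 = x$, $x_2 = y$, $x_3 = z$ and $\alpha_1 = r$, $\alpha_2 = \theta$, $\alpha_3 = z'$. Our first result is

$$\begin{aligned}
\frac{\partial f}{\partial r} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial r},\\
\frac{\partial f}{\partial\theta} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial\theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial\theta} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial\theta},\\
\frac{\partial f}{\partial z'} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial z'} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial z'} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial z'}.
\end{aligned}$$

However,

$$\frac{\partial x}{\partial r} = \cos\theta, \quad \frac{\partial x}{\partial\theta} = -r\sin\theta, \quad \frac{\partial x}{\partial z'} = 0, \quad \frac{\partial y}{\partial r} = \sin\theta, \quad \frac{\partial y}{\partial\theta} = r\cos\theta, \quad \frac{\partial y}{\partial z'} = 0,$$

$$\frac{\partial z}{\partial r} = 0, \quad \frac{\partial z}{\partial\theta} = 0, \quad \frac{\partial z}{\partial z'} = 1.$$

Hence, substituting these values into the above transformation equations shows that

$$\frac{\partial f}{\partial r} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta, \quad \frac{\partial f}{\partial\theta} = -\frac{\partial f}{\partial x}\,r\sin\theta + \frac{\partial f}{\partial y}\,r\cos\theta, \quad \frac{\partial f}{\partial z'} = \frac{\partial f}{\partial z}.$$

Next, using the fact that $f(x, y, z) = x^2 + 3xy + y^2 + z^2$ we see that

$$\frac{\partial f}{\partial x} = 2x + 3y, \quad \frac{\partial f}{\partial y} = 3x + 2y, \quad \frac{\partial f}{\partial z} = 2z,$$

so that

$$\frac{\partial f}{\partial r} = (2x + 3y)\cos\theta + (3x + 2y)\sin\theta.$$

However, as $r^2 = x^2 + y^2$ and $\cos\theta = x/(x^2 + y^2)^{1/2}$, $\sin\theta = y/(x^2 + y^2)^{1/2}$, this result simplifies to

$$\frac{\partial f}{\partial r} = \frac{2x^2 + 6xy + 2y^2}{(x^2 + y^2)^{1/2}}.$$

A similar calculation shows that

$$\frac{\partial f}{\partial\theta} = 3(x^2 - y^2), \quad \frac{\partial f}{\partial z'} = 2z.$$

Consider the special case of Theorem 5-22 that results when $m = n = 2$, so that its statement becomes

$$\begin{aligned}
\frac{\partial f}{\partial\alpha_1} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_1} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_1},\\
\frac{\partial f}{\partial\alpha_2} &= \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial\alpha_2} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial\alpha_2}.
\end{aligned} \tag{5-41}$$

Now for any differentiable function $f(x_1, x_2)$, once the variable change has been decided, these equations express the partial derivatives $f_{\alpha_1}$, $f_{\alpha_2}$ in terms of $f_{x_1}$, $f_{x_2}$, which we suppose to be known. However, if $\partial f/\partial\alpha_1$ and $\partial f/\partial\alpha_2$ are supposed known, then Eqns (5-41) can be regarded as simultaneous equations for $f_{x_1}$, $f_{x_2}$. Thus, provided the simultaneous equations can be solved, Eqns (5-41) may be regarded as describing a one-one transformation, or mapping, between partial derivatives of $f$ with respect to $(x_1, x_2)$ and $(\alpha_1, \alpha_2)$. It is easily seen that provided $J(x_1, x_2) \neq 0$, we have

$$\begin{aligned}
\frac{\partial f}{\partial x_1} &= \left(\frac{\partial f}{\partial\alpha_1}\frac{\partial x_2}{\partial\alpha_2} - \frac{\partial f}{\partial\alpha_2}\frac{\partial x_2}{\partial\alpha_1}\right)\bigg/ J(x_1, x_2),\\
\frac{\partial f}{\partial x_2} &= \left(\frac{\partial f}{\partial\alpha_2}\frac{\partial x_1}{\partial\alpha_1} - \frac{\partial f}{\partial\alpha_1}\frac{\partial x_1}{\partial\alpha_2}\right)\bigg/ J(x_1, x_2),
\end{aligned} \tag{5-42}$$

where

$$J(x_1, x_2) = \frac{\partial x_1}{\partial\alpha_1}\frac{\partial x_2}{\partial\alpha_2} - \frac{\partial x_1}{\partial\alpha_2}\frac{\partial x_2}{\partial\alpha_1} = \begin{vmatrix} \dfrac{\partial x_1}{\partial\alpha_1} & \dfrac{\partial x_1}{\partial\alpha_2}\\[2ex] \dfrac{\partial x_2}{\partial\alpha_1} & \dfrac{\partial x_2}{\partial\alpha_2} \end{vmatrix}. \tag{5-43}$$

The expression $J(x_1, x_2)$ is the Jacobian of the transformation and is usually written in the form of the functional determinant shown in Eqn (5-43). If the Jacobian vanishes at any point in the $(\alpha_1, \alpha_2)$-space then at such points the transformation we are discussing obviously becomes invalid and is singular. This is because at such points there is no longer a one-one relationship between partial derivatives in the two coordinate systems. In more advanced discussions, the Jacobian is shown to play a fundamental role in all matters relating to changes of variable. Sometimes, to emphasize the variables involved, in place of $J(x_1, x_2)$ the alternative notation $\partial(x_1, x_2)/\partial(\alpha_1, \alpha_2)$ is used.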
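The role of the Jacobian in passing between the two sets of partial derivatives can be checked numerically. The Python sketch below is only an illustration under stated assumptions: it takes plane polar coordinates, $(x_1, x_2) = (x, y)$ and $(\alpha_1, \alpha_2) = (r, \theta)$, and uses the two-variable function $f = x^2 + 3xy + y^2$ (the $x$, $y$ part of the function in Example 5-24). It forms $f_r$, $f_\theta$ from Eqns (5-41) and then recovers $f_x$, $f_y$ from Eqns (5-42). All function names are hypothetical.

```python
import math

def grad_f(x, y):
    """f_x and f_y for the assumed test function f = x^2 + 3xy + y^2."""
    return 2.0 * x + 3.0 * y, 3.0 * x + 2.0 * y

def transformation_derivatives(r, theta):
    """Partial derivatives of x = r cos(theta), y = r sin(theta)."""
    x_r, x_t = math.cos(theta), -r * math.sin(theta)
    y_r, y_t = math.sin(theta), r * math.cos(theta)
    return x_r, x_t, y_r, y_t

def jacobian(r, theta):
    """Functional determinant of Eqn (5-43); equals r for polar coordinates."""
    x_r, x_t, y_r, y_t = transformation_derivatives(r, theta)
    return x_r * y_t - x_t * y_r

def polar_derivatives(r, theta):
    """f_r and f_theta from Eqns (5-41)."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    f_x, f_y = grad_f(x, y)
    x_r, x_t, y_r, y_t = transformation_derivatives(r, theta)
    return f_x * x_r + f_y * y_r, f_x * x_t + f_y * y_t

def cartesian_derivatives(f_r, f_t, r, theta):
    """f_x and f_y recovered from f_r, f_theta by Eqns (5-42)."""
    J = jacobian(r, theta)
    if J == 0.0:
        raise ZeroDivisionError("Jacobian vanishes: the transformation is singular here")
    x_r, x_t, y_r, y_t = transformation_derivatives(r, theta)
    f_x = (f_r * y_t - f_t * y_r) / J
    f_y = (f_t * x_r - f_r * x_t) / J
    return f_x, f_y

r0, theta0 = 2.0, 0.6
f_r, f_t = polar_derivatives(r0, theta0)
print(f_r, f_t)
print(cartesian_derivatives(f_r, f_t, r0, theta0))
print(grad_f(r0 * math.cos(theta0), r0 * math.sin(theta0)))
```

The last two printed pairs should coincide whenever $r \neq 0$; at $r = 0$ the Jacobian vanishes and the sketch raises an error rather than dividing by zero.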
This idea is readily extended to more than two independent variables, as would be appropriate in Example 5-24, where three variables are involved. The non-vanishing of the Jacobian is thus seen to provide an essential condition for the partial derivatives of any differentiable function $f$, with respect to $(x_1, x_2)$ and $(\alpha_1, \alpha_2)$, to be interchangeable by virtue of Eqns (5-42).

Example 5-25 Find the Jacobians of the following transformations and state where, if at all, they vanish:
(a) $x = r\cos\theta$, $y = r\sin\theta$ (polar coordinates);
(b) $x = u + v$, $y = u - v$;
(c) $x = 3u^2 + v^2$, $y = u + v$.

Solution (a)

$$J(x, y) = \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{vmatrix} = r(\cos^2\theta + \sin^2\theta) = r.$$

Hence in the case of polar coordinates the Jacobian vanishes at $r = 0$ (that is, at the origin), which is the only singular point of the transformation.

(b)

$$J(x, y) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} 1 & 1\\ 1 & -1 \end{vmatrix} = -2.$$

This Jacobian never vanishes, so that the transformation is always permissible.

(c)

$$J(x, y) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} 6u & 2v\\ 1 & 1 \end{vmatrix} = 6u - 2v.$$

The Jacobian vanishes when $3u = v$, so that the transformation is invalid, or singular, at all points on that line in the $(u, v)$-plane.

5-10 Implicit functions

We have already used implicit functions when discussing various consequences of total differentials, and will now examine these ideas more closely. Consider the equation $f(x, y) = 0$. Often the argument is used that from this implicit function of $x$ and $y$ we can, in principle, solve for $y$, and as $y$ depends on $x$, we are entitled to express $y$ in the explicit form $y = \varphi(x)$.

Suppose that $f(x, y) = x^2 + y^2 + 1$. Then no real values of $x$ and $y$ satisfy the implicit equation $f(x, y) = 0$, so certainly in this case one cannot solve for $y$. Thus a necessary condition that we may solve for $y$ near to some point $P$ with coordinates $(x_0, y_0)$ is that there are real numbers $x_0$, $y_0$ such that $f(x_0, y_0) = 0$.
Now let $u = f(x, y)$ be the graph of $f(x, y)$, and assume that $f_x$ and $f_y$ exist and are continuous so that the graph will be a smooth surface of the type shown in Fig. 5-19. Then $f(x, y) = 0$ is the curve of the section of this surface by the plane $u = 0$. In general the curve of the section will be similar to the smooth curve $L$ shown in the figure and can be described by an equation of the form $y = \varphi(x)$. This will obviously be the case provided firstly, that the surface $u = f(x, y)$ and the plane $u = 0$ intersect and secondly, that they are nowhere tangential. The curve $L$ will be smooth, and the function $\varphi(x)$ differentiable, because the assumed continuity of the derivatives $f_x$ and $f_y$ will ensure that the surface $u = f(x, y)$ is itself smooth, and so will generate a smooth curve of section. This is, of course, the assertion made in Corollary 5-19 (a).

Let $P$ be a representative point on $L$ with coordinates $(x_0, y_0)$ in the $u = 0$ plane, and let the line $l$ be drawn tangential to the surface $u = f(x, y)$ at $P$ in the plane $x = x_0$. Then by Definition 5-5, the angle $\alpha$ between line $l$ and the plane $u = 0$ is such that

$$\tan\alpha = \left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)}.$$

Fig. 5-19 The function $y = \varphi(x)$ defined by the intersection of $u = f(x, y)$ and the plane $u = 0$.

Hence the condition that the surface $u = f(x, y)$ and the plane $u = 0$ should not be tangential at $P$ is seen to be $f_y(x_0, y_0) \neq 0$. Collecting our results we now formulate them as the following theorem.

Theorem 5-23 (implicit function theorem) Let $f(x, y)$ be differentiable and have continuous first-order partial derivatives near to $(x_0, y_0)$ at which $f(x_0, y_0) = 0$ and $f_y(x_0, y_0) \neq 0$. Then, near $(x_0, y_0)$, it is possible to solve the implicit equation $f(x, y) = 0$ uniquely for $y$ in the explicit form $y = \varphi(x)$, where $\varphi(x)$ is differentiable. That is, near to $(x_0, y_0)$, $f(x, \varphi(x)) = 0$.
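The statement of Theorem 5-23 can be illustrated numerically. The Python sketch below makes several assumptions that are not in the text: the test function $f(x, y) = x^2 + y^2 - 1$, the point $(x_0, y_0) = (0.6, 0.8)$ at which $f_y \neq 0$, and the use of Newton's iteration in $y$ as one possible way of approximating $\varphi(x)$ close to that point. It then compares the slope of the computed $\varphi$ with $-f_x/f_y$, the expression for $dy/dx$ quoted from Corollary 5-19 (a).

```python
# Assumed test function (not from the text): f(x, y) = x^2 + y^2 - 1.
def f(x, y):
    return x * x + y * y - 1.0

def f_x(x, y):
    return 2.0 * x

def f_y(x, y):
    return 2.0 * y

def phi(x, y_start, tol=1e-12, max_iter=50):
    """Approximate y = phi(x) with f(x, phi(x)) = 0 by Newton's iteration in y,
    started from y_start (a point close to the branch through (x0, y0))."""
    y = y_start
    for _ in range(max_iter):
        slope = f_y(x, y)
        if slope == 0.0:
            raise ZeroDivisionError("f_y vanished, so the condition of the theorem fails")
        step = f(x, y) / slope
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

x0, y0 = 0.6, 0.8          # f(x0, y0) = 0 and f_y(x0, y0) = 1.6 != 0
h = 1e-5                   # arbitrary small step for the difference quotient

# Slope of the explicit branch, estimated by a central difference quotient.
slope_numeric = (phi(x0 + h, y0) - phi(x0 - h, y0)) / (2.0 * h)
# Slope predicted by dy/dx = -f_x / f_y.
slope_formula = -f_x(x0, y0) / f_y(x0, y0)

print(slope_numeric, slope_formula)
```

Starting the iteration near $y_0 = 0.8$ selects the upper branch of the circle; a start near $-0.8$ would pick out the other solution, which is why the theorem asserts uniqueness only near $(x_0, y_0)$.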
Notice that this theorem is only of the existence type, in that it ensures that an explicit representation $y = \varphi(x)$ exists, but gives no information on how such a representation may be found in any specific case.

As a corollary to this theorem, consider the relationship between the derivatives of a function and its inverse. Let $F(x, y) = y - f(x)$, so that $F(x, y) = 0$ implies the relationship $y = f(x)$. Suppose that at some point $(x_0, y_0)$ we have $f'(x_0) \neq 0$ and $y_0 = f(x_0)$. Then, noticing that $\partial F/\partial x = (\partial/\partial x)[-f(x)] = (d/dx)[-f(x)] = -f'(x)$ and $\partial F/\partial y = 1$, it follows from Theorem 5-23 that close to $(x_0, y_0)$ we may solve for $x$ as a function of $y$ to obtain an inverse function $x = \varphi(y)$. That is, $F(\varphi(y), y) = y - f[\varphi(y)] = 0$. Furthermore, applying Corollary 5-19 (a) to $F(x, y) = 0$ and regarding $y$ as the independent variable and $x$ as the dependent variable, we have

$$\frac{dx}{dy} = -\left(\frac{\partial F}{\partial y}\right)\bigg/\left(\frac{\partial F}{\partial x}\right),$$

so that provided $f'(x) \neq 0$, we have

$$\frac{dx}{dy} = 1/f'(x) \quad\text{or}\quad \varphi'(y) = 1/f'(x),$$

which is the desired result.

Corollary 5-23 Let $y = f(x)$ be a real valued differentiable function of $x$ close to some point $(x_0, y_0)$ at which $y_0 = f(x_0)$. Let $x = \varphi(y)$ be the function inverse to it close to the same point $(x_0, y_0)$ so that $x_0 = \varphi(y_0)$, and let $f'(x_0) \neq 0$. Then close to $(x_0, y_0)$, we have $\varphi'(y) = 1/f'(x)$ or, equivalently,

$$\frac{dx}{dy} = 1\bigg/\left(\frac{dy}{dx}\right).$$

This corollary has two important applications which we mention next. The first application of Corollary 5-23 is to the differentiation of inverse circular functions. In Section 2-2, we agreed to write $y = \arcsin x$ when $x = \sin y$ and $-\pi/2 \le y \le \pi/2$. Now,

$$\frac{d}{dy}(\sin y) = \cos y \neq 0 \quad\text{for } -\pi/2 < y < \pi/2;$$

that is, for $-1 < x < 1$ and so, by Corollary 5-23,

$$\frac{dy}{dx} = 1\bigg/\left(\frac{dx}{dy}\right) = \frac{1}{\cos y} = \frac{1}{\sqrt{(1 - \sin^2 y)}} = \frac{1}{\sqrt{(1 - x^2)}}.$$

The positive square root has been taken here because the principal branch of the function $y = \arcsin x$ is a monotonic increasing function of $x$ in its domain of definition $-1 \le x \le 1$.
By this same argument, the negative square root is taken when differentiating the principal branch of the function $y = \arccos x$, which is a monotonic decreasing function of $x$ in its domain of definition $-1 \le x \le 1$. Thus

$$\frac{d}{dx}(\arcsin x) = \frac{1}{\sqrt{(1 - x^2)}} \quad\text{for } -1 < x < 1.$$

Similar arguments establish Table 5-2. In the entries for the derivatives of arc cosec and arc sec, the term $|x|$ has been introduced to take account of the two separate cases that need consideration when deriving these results; namely, when $x > a$ and when $x < -a$. These same ideas will be encountered again in the next chapter in connection with Table 6-3, when they will be discussed in more detail.

Table 5-2 Derivatives of inverse circular functions

$$\begin{aligned}
\frac{d}{dx}\left(\arcsin\frac{x}{a}\right) &= \frac{1}{\sqrt{(a^2 - x^2)}} &&\text{for } -a < x < a,\\
\frac{d}{dx}\left(\arccos\frac{x}{a}\right) &= \frac{-1}{\sqrt{(a^2 - x^2)}} &&\text{for } -a < x < a,\\
\frac{d}{dx}\left(\arctan\frac{x}{a}\right) &= \frac{a}{a^2 + x^2} &&\text{for all } x,\\
\frac{d}{dx}\left(\operatorname{arccosec}\frac{x}{a}\right) &= \frac{-a}{|x|\sqrt{(x^2 - a^2)}} &&\text{for } |x| > a,\\
\frac{d}{dx}\left(\operatorname{arcsec}\frac{x}{a}\right) &= \frac{a}{|x|\sqrt{(x^2 - a^2)}} &&\text{for } |x| > a,\\
\frac{d}{dx}\left(\operatorname{arccot}\frac{x}{a}\right) &= \frac{-a}{a^2 + x^2} &&\text{for all } x.
\end{aligned}$$

In Chapter 2 we saw that curves may be described parametrically thus:

$$x = X(t), \quad y = Y(t), \tag{5-44}$$

where $t$ is a parameter defined in some interval $J$. The question that now arises is how we may find $dy/dx$ in terms of the functions $X(t)$ and $Y(t)$.

Let us suppose that $X(t)$ and $Y(t)$ are differentiable functions of $t$ with continuous derivatives and that $X'(t) \neq 0$. Then by Theorem 5-23, we may solve $x = X(t)$ in the form $t = f(x)$, say, so that then $y = Y[f(x)]$. From Theorem 5-7 on the differentiation of composite functions we have

$$\frac{dy}{dx} = \frac{dY}{df}\frac{df}{dx}$$

or, equivalently,

$$\frac{dy}{dx} = \frac{dy}{dt}\frac{dt}{dx}. \tag{5-45}$$

However, by Corollary 5-23, $dt/dx = 1/(dx/dt)$ so that

$$\frac{dy}{dx} = \frac{dy}{dt}\bigg/\frac{dx}{dt}. \tag{5-46}$$

Hence, like $x$ and $y$, the derivative $dy/dx$ is now also known parametrically in terms of $t$.
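Equation (5-46) lends itself to a simple numerical check. The following Python sketch is an illustration only: the curve $x = 2\cos t$, $y = \sin t$ (the upper half of an ellipse for $0 < t < \pi$) is an assumed example, and the derivatives with respect to $t$ are approximated by central difference quotients with an arbitrarily chosen step. The parametric value of $dy/dx$ is compared with the slope obtained from the explicit form $y = (1 - x^2/4)^{1/2}$.

```python
import math

# Assumed parametric curve (not from the text): x = 2 cos t, y = sin t.
def X(t):
    return 2.0 * math.cos(t)

def Y(t):
    return math.sin(t)

def derivative(func, t, h=1e-6):
    """Central difference approximation to d(func)/dt."""
    return (func(t + h) - func(t - h)) / (2.0 * h)

def dydx_parametric(t):
    """dy/dx = (dy/dt) / (dx/dt), as in Eqn (5-46); requires dx/dt != 0."""
    dxdt = derivative(X, t)
    if dxdt == 0.0:
        raise ZeroDivisionError("dx/dt vanishes, so Eqn (5-46) cannot be applied")
    return derivative(Y, t) / dxdt

def dydx_explicit(x):
    """Slope of the explicit upper branch y = sqrt(1 - x^2 / 4)."""
    return -x / (4.0 * math.sqrt(1.0 - x * x / 4.0))

t0 = 1.0
print(dydx_parametric(t0), dydx_explicit(X(t0)))
```

The explicit form is available here only because the example is simple; the point of Eqn (5-46) is that the parametric calculation does not need it.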
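This result is best remembered in symbolic operator form:

$$\frac{d}{dx} = \frac{1}{(dx/dt)}\frac{d}{dt}. \tag{5-47}$$

Higher order derivatives with respect to $x$ may be found either by a repetition of the argument leading to Eqn (5-46), or by successive applications of Eqn (5-47). Thus, using Eqn (5-47), we have

$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{1}{(dx/dt)}\left[\frac{d}{dt}\left(\frac{dy}{dt}\bigg/\frac{dx}{dt}\right)\right]$$

or, denoting differentiation with respect to $t$ by a dot,

$$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{1}{\dot{x}}\frac{d}{dt}\left(\frac{dy}{dx}\right).$$

Using the fact that $dy/dx = \dot{y}/\dot{x}$ and performing the indicated differentiations gives

$$\frac{d^2y}{dx^2} = \frac{\dot{x}\ddot{y} - \ddot{x}\dot{y}}{\dot{x}^3}. \tag{5-48}$$

It is recommended that the reader remembers the arguments leading to the operator rule (5-47) together with the rule itself, rather than remembering results of the form (5-48).

Example 5-26 If $x = t + 2\sin t$, $y = \cos t$, determine $dy/dx$ and $d^2y/dx^2$ and hence deduce their values when $t = 0$.

Solution We have

$$\frac{dx}{dt} = 1 + 2\cos t, \quad \frac{dy}{dt} = -\sin t,$$

so that by Eqn (5-46)

$$\frac{dy}{dx} = \frac{dy}{dt}\bigg/\frac{dx}{dt} = \frac{-\sin t}{1 + 2\cos t}.$$

When $t = 0$ we have $x = 0$, $y = 1$ and

$$\left.\frac{dy}{dx}\right|_{t=0} = \left[\frac{-\sin t}{1 + 2\cos t}\right]_{t=0} = 0.$$

Next, as

$$\frac{d^2y}{dx^2} = \frac{1}{(dx/dt)}\frac{d}{dt}\left(\frac{dy}{dx}\right),$$

we have

$$\frac{d^2y}{dx^2} = \frac{1}{1 + 2\cos t}\,\frac{d}{dt}\left[\frac{-\sin t}{1 + 2\cos t}\right].$$

Thus, performing the differentiation and simplifying,

$$\frac{d^2y}{dx^2} = -\left[\frac{2 + \cos t}{(1 + 2\cos t)^3}\right]$$

and so

$$\left.\frac{d^2y}{dx^2}\right|_{t=0} = -\left[\frac{2 + \cos t}{(1 + 2\cos t)^3}\right]_{t=0} = -\frac{1}{9}.$$

5-11 Higher order partial derivatives

If the function $f(x, y)$ is differentiable with continuous first-order derivatives $f_x$ and $f_y$, then it can also happen that these partial derivatives, which are functions of $x$ and $y$, are themselves differentiable. Thus we are led to consider the further partial derivatives

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right), \quad \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \quad \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right).$$

These functions, when they exist, are second-order partial derivatives of $f$ and are respectively denoted by

$$\frac{\partial^2 f}{\partial x^2}, \quad \frac{\partial^2 f}{\partial y\,\partial x}, \quad \frac{\partial^2 f}{\partial x\,\partial y} \quad\text{and}\quad \frac{\partial^2 f}{\partial y^2}.$$

Using an alternative notation we often write these same derivatives as $f_{xx}$, $f_{xy}$, $f_{yx}$, and $f_{yy}$.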
In this notation the first suffix signifies the partial derivative of $f$ that is to be differentiated partially with respect to the second suffix. The centre pair of derivatives are mixed second-order partial derivatives and it is conventional that the order of $x$ and $y$ in corresponding mixed derivatives in the two notations is interchanged. Thus we have

$$f_{xy} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x} \quad\text{but}\quad f_{yx} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y}.$$

It is important to notice that the double operations of partial differentiation that lead to the mixed derivatives $f_{xy}$ and $f_{yx}$ are performed in different orders. Consequently we have no right to expect that the derivatives that result will be equal to one another. To emphasize this point we now write out in full the limiting operations involved in arriving at $f_{xy}$ and $f_{yx}$:

$$f_{xy}(x_0, y_0) = \left.\frac{\partial}{\partial y}\left[f_x(x, y)\right]\right|_{(x_0, y_0)} = \lim_{k\to 0}\frac{1}{k}\left\{\lim_{h\to 0}\frac{f(x_0 + h, y_0 + k) - f(x_0, y_0 + k)}{h} - \lim_{h\to 0}\frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}\right\}$$

and so, writing

$$g(x_0, y_0, h, k) = f(x_0 + h, y_0 + k) - f(x_0, y_0 + k) - f(x_0 + h, y_0) + f(x_0, y_0),$$

we obtain the result

$$f_{xy}(x_0, y_0) = \lim_{k\to 0}\,\lim_{h\to 0}\frac{1}{hk}\,g(x_0, y_0, h, k), \tag{5-49}$$

where the inner limit with respect to $h$ is to be taken first. Exactly similar reasoning gives the corresponding result

$$f_{yx}(x_0, y_0) = \lim_{h\to 0}\,\lim_{k\to 0}\frac{1}{hk}\,g(x_0, y_0, h, k). \tag{5-50}$$

Here it is the inner limit with respect to $k$ that is to be taken first.

The double limits used in Eqns (5-49) and (5-50) are called iterated limits on account of the fact that they are taken sequentially so that their order is important. They are not to be confused with the simple double limit of Definition 3-8 into which questions of order do not enter.

Let us now explore the consequence of requiring one of the mixed derivatives, say $f_{xy}$, to be continuous. This is, of course, the usual situation.
Definitions 3-8 and 3-9 imply that if $f_{xy}$ is continuous at $(x_0, y_0)$, then a limit $L = f_{xy}(x_0, y_0)$ exists with the property that

$$L = \lim_{\substack{h\to 0\\ k\to 0}} f_{xy}(x_0 + h, y_0 + k), \tag{5-51}$$

where the question of the order in which the limits are to be taken does not occur. Hence, as $f_{xy}(x_0, y_0)$ is also defined by Eqn (5-49) in which an iterated limit is involved, the equating of these two results implies that if $f_{xy}$ is continuous, then the order of the iterated limits in Eqn (5-49) is immaterial. Thus, under the stated conditions, expressions (5-49) and (5-50) become identical and the continuity of $f_{xy}$ implies not only the existence of $f_{yx}$, but also that $f_{xy} = f_{yx}$. This establishes our next result.

Theorem 5-24 (equality of mixed derivatives) Let $f(x, y)$ be a real valued function of the real variables $x$, $y$, and let $f_x$, $f_y$, $f_{xy}$ exist and be continuous in the neighbourhood of the point $(x_0, y_0)$. Then $f_{yx}$ also exists at $(x_0, y_0)$ and

$$\left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{(x_0, y_0)} = \left.\frac{\partial^2 f}{\partial y\,\partial x}\right|_{(x_0, y_0)}.$$

Still higher-order derivatives can be defined by an obvious extension of the notation. Thus, for a suitably differentiable function $f$ we may define the third-order partial derivatives $f_{xxx}$, $f_{yyx}$, $f_{xyx}$, $f_{yyy}$, etc. If the higher-order derivatives involved are continuous then, by an obvious extension of Theorem 5-24, the order of performing differentiations may be disregarded. In the case of the mixed third-order partial derivative $f_{xyx}$ this would imply that

$$f_{xyx} = \frac{\partial}{\partial x}\left[\frac{\partial}{\partial y}(f_x)\right] = \frac{\partial}{\partial y}\left[\frac{\partial}{\partial x}(f_x)\right] = f_{xxy}.$$

Hence, under these conditions, it is proper to extend the $\partial$ notation by writing

$$\frac{\partial^3 f}{\partial x^3}, \quad \frac{\partial^3 f}{\partial x\,\partial y^2}, \quad \frac{\partial^3 f}{\partial x^2\,\partial y}, \quad \frac{\partial^3 f}{\partial y^3}.$$

Example 5-27 If $f(x, y) = x^4 + 2x^2y^2 + xy^4$, find the second- and third-order partial derivatives of $f$.

Solution First-order derivatives:

$$f_x = 4x^3 + 4xy^2 + y^4, \quad f_y = 4x^2y + 4xy^3.$$

Second-order derivatives:

$$f_{xx} = 12x^2 + 4y^2, \quad f_{yy} = 4x^2 + 12xy^2, \quad f_{xy} = \frac{\partial}{\partial y}(f_x) = 8xy + 4y^3.$$

This mixed derivative is continuous, and so $f_{xy} = f_{yx}$.
As a check in this case we compute $f_{yx}$ directly:

$$f_{yx} = \frac{\partial}{\partial x}(f_y) = 8xy + 4y^3.$$

Third-order derivatives:

$$f_{xxx} = 24x, \quad f_{yyy} = 24xy, \quad f_{xyy} = \frac{\partial}{\partial y}(f_{xy}) = 8x + 12y^2, \quad f_{xxy} = \frac{\partial}{\partial y}(f_{xx}) = 8y.$$

The continuity of the third-order derivatives we have computed ensures the existence and equality of the other corresponding third-order derivatives that may be defined. Thus, for example, as $f_{xxy} = 8y$ is continuous, there is no need to compute $f_{xyx}$, since it exists and is equal to $f_{xxy}$.

Example 5-28 Define the function $f$ by the requirement

$$f(x, y) = \begin{cases} \dfrac{xy(x^2 - y^2)}{x^2 + y^2} & \text{if either } x \neq 0 \text{ or } y \neq 0,\\[2ex] 0 & \text{if both } x = 0 \text{ and } y = 0. \end{cases}$$

Deduce the value of each of the mixed derivatives at the origin.

Solution We shall use definitions (5-49) and (5-50) for this purpose by setting $x_0 = 0$, $y_0 = 0$ so that

$$g(0, 0, h, k) = \frac{hk(h^2 - k^2)}{h^2 + k^2}.$$

Then, from Eqn (5-49),

$$f_{xy}(0, 0) = \lim_{k\to 0}\,\lim_{h\to 0}\frac{1}{hk}\left[\frac{hk(h^2 - k^2)}{h^2 + k^2}\right] = \lim_{k\to 0}\,\lim_{h\to 0}\frac{h^2 - k^2}{h^2 + k^2} = \lim_{k\to 0}\left(\frac{-k^2}{k^2}\right) = -1.$$

However, because the order of the iterated limits is reversed in Eqn (5-50), the same argument also shows that

$$f_{yx}(0, 0) = \lim_{h\to 0}\,\lim_{k\to 0}\frac{h^2 - k^2}{h^2 + k^2} = \lim_{h\to 0}\left(\frac{h^2}{h^2}\right) = 1.$$

Thus $f_{xy}(0, 0) = -1$ whereas $f_{yx}(0, 0) = 1$. This occurs because the functions $f_{xy}$ and $f_{yx}$ are not continuous at $(0, 0)$, as may be checked by direct calculation.

PROBLEMS

Section 5-1

5-1 Give examples of four physical quantities that are essentially defined in terms of a derivative.

5-2 Use Definitions 5-1 and 5-2 to prove that the following functions are differentiable in the stated intervals and to compute their derivatives. Evaluate these derivatives for the stated values:
(a) $f(x) = 3x^2$ in $[0, 3]$, find $f'(2)$;
(b) f(x) = 2x» + x + 1 in $[-1, 4]$, find $f'(3)$;
(c) $f(x) = |x|$ in $(0, \infty)$, find $f'(1)$;
(d) $f(x) = |x|$ in $(-\infty, 0)$, find $f'(-3)$;
(e) $f(x) = 1/x$ in $[1, 5]$, find $f'(4)$;
(f) $f(x) = x^{1/4}$ in $(0, \infty)$, find $f'(2)$.

5-3 Prove that $f(x) = |x|$ is not differentiable at the origin.

5-4 Consider the graph of $f(x) = x^3 + x + 1$.
Let xi and x 2 be two points on the x-axis with the property that the gradient dy/dx of the curve y = /(x) at x = X2 is four times the gradient at x = xi. Derive the algebraic equation connecting xi and X2 and deduce that | xi \ > 1. 5-5 Deduce the gradients of the functions /(x) to the immediate left and right of x = 1 given that: fx 3 + x + 1 for x > 1 (a)/W= 5-x-x2forx<l; {x 3 — x + 3 for x > 1 2x + l forx< 1. 5-6 Prove that the function / defined by /(x) = x 2 sin (1/x) for x =£ and /(0) = is differentiable at the origin and find the value of its derivative there. 5-7 Prove from first principles that d/dx(cos ax) = — a sin ax. 5-8 At which points in the stated intervals, if any, are the following functions /(x) non-differentiable: (a) fix) = x + sin 2x for < x < *; 258 / DIFFERENTIATION OF FUNCTIONS CH 5 ,^ rr s l X + !/* f0r X ^ °1 • (b) fix) = m the interval [-1,1]; (0 tor x = 0) , •> „ ^ (1 for x rational 1 (c) /(x) = . . in the interval [0, 1]. 10 for x irrationalj 5-9 The function /(>:) is defined on the interval < x < 1 by the expression {sin 2x for < x < Jn- ax + b for \v < x <> 1. Deduce the values of a and b in order that the function should be continuous and have a continuous derivative at x = Jw. Interpret these conditions geometrically. 5-10 Give an example of a continuous function / defined on the interval [0, 5], that is differentiable everywhere except at x = 1 at which point the left-hand derivative is 3 and the right-hand derivative is 5. That is to say, the tangent line to the graph drawn to the left of x = 1 has gradient 3 whilst the tangent line to the graph drawn to the right of x = 1 has gradient 5. 
Section 5-2 5-11 By assuming Theorem 5-2 is also valid for rational n where necessary, find the derivatives of the following functions, stating at which points in their domains of definition, if any, they are non-differentiable: t \ rt \ i xX ' 3 + cos 3x > for * * °\ ■ u ■ i , (a)/(x)= } in the interval —$* <, x < v ; 10, for x = 0J (b) f{x) = x sin 2x + x 5 ' 3 for -1 < x < 3; (c) f(x) = | cos x | for <, x <. it. 5-12 Use Theorem 5-4 to give an inductive proof that, if ki, k% k n are con- stants and/i(jc),/ 2 (x), . . .,fn(x) are differentiable functions in the interval a <, x < b, then d * " -5- 2 ktftix) = J kifi'ix) ina<,x<b. ax i=l i = l 5-13 Dififerentiate the following functions: (a) y — x 1 * 3 sin x; (b) y = (x 2 + 3x + 1)(1 + cos 2x); (c) j = sin 6x cos 2x; (d) y = (x 3 + 2x - 1) eos 3x. 5-14 Differentiate the following functions by making a repeated application of Theorem 5-5: (a) y = (1 + x 2 ) sin 7x cos 4x; (b) y = (1 + 2x2 + X 4)S. (c) _v = cos 3 2x; (d) y == (1 + x 3 )2 sin 2 3x. 5-15 Differentiate these composite functions: (a) y = (x 2 + 2x + 1) 3/2 ; PROBLEMS / 259 (b) y = (a + bx 3 ) 1 ' 3 ; (c) j = (2+ 3sin2x) 5 ; (d) y = sin (1 + 2x 3 ); (e) y = sin [sin (1 + x 2 )]; (f) y = cos (1 + x 4 ) 1 ' 2 . 516 Differentiate these quotients: (a) y = (x 2 + 3x+ 7)/(x 4 + 1); sin (1 + x 2 ) (b)y = (c)y = (d) y = (e)j- = x 4 + 2x 2 + 6' 1 1_ . 
3 cos 3 x cos x' tan(l - x 2 + x 4 ) sin (x 2 + 1) ' 1 + Vx 1 - V* 5-17 Differentiate these functions: 1 (1 -3cosx) 2 ' x tan (1 + x 2 + x 4 ) (b)y=* (C) y = ^~ sin (1 + x 2 ) (d) j = cosec 2 (l + 3x); sin x + 2 cos x (e) y = ^ 5 17 sin x — 2 cos x <S)y = /3x - 1\ 5-18 If the functions /i(x),/2(x),^i(x), and^Cx) are differentiable, show by direct expansion that this theorem is true: _d_ dx /i(x) / 2 (x) giix) g£x) £i(x) gz(x) + Apply this result to differentiate the determinants: (a) x 2 x sin x cosx 1 fl>) (1 + x 2 cos x) (2 - sin 2 x) (1 - x 2 cos x) (2 + sin 2 x) 5-19 Suppose that the functions /«(*)> with i,j = 1, 2, or 3, are differentiable func- tions of x. Prove, by means of Problem 5-18, that dx /ii(x) /i 2 (x) fis(x) /2l(x) /22(X) /23(X) /3l(x) /32(X) /33(x) fn'ix) fW{x) fw'ix) /2l(x) /22(X) /23(X) /3l(x) /32(X) / 33 (X) 260 / DIFFERENTIATION OF FUNCTIONS CH 5 + /uW /12O) fis(.x) /21'W ft&\x) fis'ix) f3l(x) foAx) foi(x) + fn(x) /12W fia(x) f2i(x) f 2 z(x) f 23 (x) fai'ix) fn'(x) f 33 '(x) Section 5-3 5-20 Use the intermediate value theorem to prove that if f(x) is continuous on [a, b], with f(a) and f(b) having opposite signs, then there must be at least one point x = I, with a < f < b, for which /(I) = 0. 5-21 Why is it not possible to conclude from the intermediate value theorem that if/(x) = 1/(1 - I x |) for I x I ^ 1 and /(| 1 |) = 0-5 then (a) there is no point x = £ in the interval [0, 6] for which /(I) = 0; (b) yet there is a point x = rj in the interval [— 11, —2] for which /(»)) = —0-5 ? Identify the point on the jc-axis giving rise to this functional value. 5-22 The function /(x) = \x 3 — x + 2 which is defined in the interval (—00, 00) has extrema at the points x = 1, x = — 1. Identify their nature by considering the behaviour of the function close to these points. Are they relative or absolute extrema ? 
5-23 By considering the behaviour of f(x) = sin £x cos \x in the neighbourhood of x — in, show that the function attains an absolute maximum at that point. 5-24 By considering the behaviour of y = x 2 — 2x + 3 in the neighbourhood of x = 1, prove that this point gives rise to an absolute minimum of the function. Find its value. 5-25 Find the critical points of the function f(x) = x 3 — x 2 — 4x + 4. Identify the nature of the extrema associated with them by considering the functional behaviour close to each of these points. 5-26 Find the critical point of the function f(x) = (x — i)x 213 and identify its nature. Do the points x = — 1, x = correspond to extrema of the function and, if so, of what type are they? 5-27 Find the critical points of the function f(x) = x 2 (3 — x) 2 . 5-28 Identify the critical points and extrema of the function lx 2 -3x + 2 for < x < 2-5 U 2 - 7x + 12 for 2-5 < x <. 5. 5-29 Apply Rolle's theorem to the following functions where it is applicable, and hence determine at how many points in the stated intervals [a, b] the following functions satisfy the result of that theorem: (a) /(*) = x* - 1 in [-2, 2]; (b) f(x) = 1 + sin x in [— 2w, 3*-]; (c) f{x) = 1/(1 + I x |) in [-1,1]; (d)/w = (* 2 +3* + 2for-l<;^0 U 2 - 3* + 2 for 0< x < 1. 5-30 Give an example of a simple continuous function g(x) of the type illustrated in Fig. 5-9 (b) in which^'(f) = for some point in an interval [a, b], but to which Rolle's theorem is inapplicable because g(x) is non-differentiable at one point of that interval. PROBLEMS / 261 5-31 Show that the conditions of the mean value theorem apply to /(x) = x + sin x for the interval [0, £w]. Find the value of I in the statement of the theorem. 5-32 In the proof of Theorem 5-12 a function F(x) was constructed on the interval [a, b] which had the property that F(a) = F(b) = and, in addition, satisfied the other conditions of Rolle's theorem. 
Repeat the proof of Theorem 5-12, but this time with the requirement that F(a) = F(b) = K, where K is an arbitrary non-zero constant. The following four problems illustrate how the mean value theorem may be used to estimate the behaviour of functions in closed intervals. 5-33 Let/(x) be a differentiable function having a monotonic increasing derivative in the interval [a, b]. Then by writing the mean value theorem in the form f(b) = /(a) + (b - a)/'(S)> with a < f < b, prove that /(a) + (x - a)f\a) < f{x) < f(a) + (x - a)f'(b), for a < x < b. We shall agree to say that these inequalities define upper and lower estimates of f(x) in [a, b]. Show also that if /'(■*) is monotonic decreasing, then the inequalities must be reversed in the above expression. 5-34 Apply the result of Problem 5-33 to the function /(x) = sin x in the interval [0, £tt] in order to prove that < sin xjx < 1 for < x < in. 5-35 Apply the result of Problem 5-33 to the function /(x) = (1 + * 2 ) 3/2 in the interval [1, 2], thereby obtaining upper and lower estimates for it in that interval. 5-36 If fix) = 1 + x + (1/5) sin 2 x, show that /'(*) is monotonic increasing in the interval [— Jw, &*]. Hence apply the result of Problem 5-33 to/0:) to obtain upper and lower estimates for /(x) in that interval. Evaluate the inequalities for x = and x = i^ and compare the estimates with the exact result. 5-37 Let the functions /(x) and^(x) be continuous in [a, b] and differentiable in (a, b), with^'fx) non-zero in (a, b). Show that under these conditions Rolle's theorem may be applied to the function F(x) defined by F(x) = f(a)g (a) - f{b) g{a) + [g{a) -g(b)]f(x) - [/(a) - fQ>)]g (x), for a < x < b. Hence estab- lish the Cauchy extended mean value theorem. 5-38 By repeatedly applying L'Hospital's rule where necessary, evaluate the following indeterminate forms of the type 0/0: . tan ax „, ,. 
xcosx-sinx (a) lim - - ; (b) lim -5 ; x->0 X x-»0 X tanx- sinx /JA ,.„ x 3 - 2x 2 - x + 2_ x - sin x x 2 - sin 2 x , s ,. tan x - sin x ... ,. (c) lim : ; (d) lim (e) lim 3_o x' sin-= x

5-39 Evaluate the following indeterminate forms which are of the type $\infty/\infty$:
(a) lim $(\pi/x)/\cot(\pi x/2)$;
(b) lim $\tan x/\tan 5x$;
(c) $\lim_{x\to\infty} \dfrac{3x^2 + x - 1}{x^2 + 2}$;
(d) Hm _£!*iL_ ^o*-cotx

5-40 Explain the fallacy in this argument. The limit x 2 + x sin x + sin x lim X-*-oo does not exist because, applying Corollary 5-14 to L'Hospital's rule gives ,. x 2 + xsin x + s'mx ,. 2x + sin x + xcosx + cos x lim - = hm = lim X-*- oo 1+1 cos x + z-*-co sin x + cos x 2x = 1 + i lim cos x. X->co What is the true value of this limit?

5-41 Indeterminate limits of the form $\infty - \infty$, $0 \cdot \infty$ can be reduced to the types 0/0 or $\infty/\infty$ by means of the following simple devices. If the limit is of the type $0 \cdot \infty$, set $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) \to \infty$; then

$$\lim_{x\to a}[f(x)g(x)] = \lim_{x\to a}\left[\frac{f(x)}{1/g(x)}\right] \ (\text{type } 0/0) = \lim_{x\to a}\left[\frac{g(x)}{1/f(x)}\right] \ (\text{type } \infty/\infty).$$

If the limit is of the type $\infty - \infty$, set $\lim_{x\to a} f(x) = 0$, $\lim_{x\to a} g(x) = 0$; then

$$\lim_{x\to a}\left[\frac{1}{f(x)} - \frac{1}{g(x)}\right] = \lim_{x\to a}\left[\frac{g(x) - f(x)}{f(x)g(x)}\right] \ (\text{type } 0/0) = \lim_{x\to a}\left[\frac{1/(f(x)g(x))}{1/(g(x) - f(x))}\right] \ (\text{type } \infty/\infty).$$

Apply these results to evaluate the following limits:
(a) $\lim_{x\to 0}\left(\dfrac{1}{\sin x} - \dfrac{1}{x}\right)$;
(b) $\lim_{x\to 3}\left(\dfrac{1}{x - 3} - \dfrac{5}{x^2 - x - 6}\right)$;
(c) $\lim_{x\to 0}(1 - \cos x)\cot x$;
(d) lim xsin-; X-*-oo X
(e) $\lim_{x\to 1}(1 - x)\tan\dfrac{\pi x}{2}$;
(f) i im /_iL__^y x _j„ \^cot x 2 cos xj

5-42 Verify the nature of the extrema in Problem 5-20 using the results of Theorem 5-15.

5-43 Verify the nature of the extrema in Problem 5-23 using the results of Theorem 5-15.

5-44 Apply to Problem 5-26 the modification to Theorem 5-15 indicated at the end of Example 5-10 (b) to identify the behaviour of the function at the origin.

5-45 Apply Theorem 5-15 to Problem 5-26 to identify the extrema occurring in the interval $(0, 5]$.
5-46 If $y = f(x)$, where $f$ is a differentiable function, find the differential $dy$ given that:
(a) $f(x) = x^6 + 3x^2 + x + 6$;
(b) $f(x) = x\sin(x^2 + 1)$;
(d) /(*) = (1 + X 2 ) 1 /*.

5-47 Metals A and B have coefficients of linear expansion $\alpha$ and $\beta$, respectively. That is to say, when the temperature changes by an amount $t$ from the ambient value $T_0$, the linear dimensions of metal A change by a factor $(1 + \alpha t)$, whilst those of metal B change by a factor $(1 + \beta t)$. Suppose that a block of metal A contains a cylindrical cavity of height $H_0$ and radius $R_0$ at temperature $T_0$ which is empty apart from a cylinder of metal B which has height $h_0$ and radius $r_0$ at that same temperature. Obtain an approximate expression for the small volume change $dV$ of the cavity between the cylinders consequent upon a small change of temperature $dt$.

Section 5-4

5-48 Compute the first and second derivatives of the functions $f(x)$ listed below:
(a) $f(x) = \tan x$;
(b) $f(x) = x^2\sin x$;
(c) $f(x) = (1 + x)(3\sin x + \cos 2x)$;
(d) fix) = (x» + I) 1 ' 2 ;
(e) $f(x) = \sin(1 + x^2)$;
(f) /foe) -tan-

5-49 Show that if $f(x) = \tfrac{1}{2}(3x^2 - 1)$, then

$$(1 - x^2)f''(x) - 2xf'(x) + 6f(x) = 0.$$

Equations of this type are called second order ordinary differential equations, and this one is a special case of Legendre's differential equation.

5-50 If $f(x) = \tfrac{1}{2}(5x^3 - 3x)$ and $g(x) = \tfrac{1}{2}(3x^2 - 1)$, find the algebraic equation connecting $f'(x)$, $g'(x)$, and $f(x)$.

5-51 Show that the function $f(x)$ defined below is continuous and has a continuous first derivative at $x = 1$, but that it has a discontinuous second derivative at that point:

$$f(x) = \begin{cases} x^4 + x^2 - x + 1 & \text{for } x \le 1 \\ 2x^3 - x^2 + x & \text{for } x > 1. \end{cases}$$

5-52 Use Leibnitz's theorem to evaluate the third derivatives of the following functions:
(a) f{x) = TTx''
(b) f(x) = ( * ? ~ 1} tan x '
(c) $f(x) = \sin^2 x$;
(d) $f(x) = x^3\sec 2x$.
5-53 Apply Theorems 5-17 and 5-18 to locate and identify the extrema and points of inflection of the following functions, using your results to determine the gradients at the points of inflection:
(a) $f(x) = 2x^3 + 3x^2 - 12x + 5$;
(c) fix) = x \x - 12) 2 .

5-54 Use the mean value theorem to prove that if $f(x)$ has a maximum at $x = x_0$, then near to $x_0$, $f''(x) < 0$. Show that if $f(x)$ has a minimum at $x = x_0$, then near to $x_0$, $f''(x) > 0$. Hence show that these tests may be used to identify maxima and minima, even when $f'(x_0)$ does not exist.

5-55 Apply the results of Problem 5-54 to prove that the function $f(x) = (3x - 1)x^{2/3}$ has a maximum at the origin.

5-56 Determine the values of $a$ and $b$ in order that $f(x) = x^3 + ax^2 + bx + 1$ should have a point of inflection at $x = 2$ at which the gradient of the tangent to the graph is $-3$.

Section 5-5

5-57 Compute the derivatives $f_x$ and $f_y$ given that:
(a) $f(x, y) = x^2/y$;
(b) $f(x, y) = 3x^2y + (x + y)^2x + 1$;
(c) f(x,y) = sin (.x*+y z );
(d) $f(x, y) = x\cos(1 + x^2y^2)$.

5-58 Given that $f(x, y) = x^3 + 3x^2y + 4xy^2 + 2y^3$, prove that $xf_x + yf_y = 3f$.

5-59 Compute the derivatives $f_x$, $f_y$, $f_z$ given that:
(a) f(x,y,z) = x 2 yz + -j
(b) $f(x, y, z) = x\cos yz + y\cos xz + z\cos xy$;
(c) $f(x, y, z) = \cos(x^2 + xy + yz)$.

5-60 Show that if fix, y, Z) = ( x2 + yS + z 2)3/2 then xfx + yfy + zf z = -If.

5-61 Show that if fix,y,z) = x + j-l then $f_x + f_y + f_z = 1$.

5-62 Show that if $f = (x - y)(y - z)(z - x)$ then $f_x + f_y + f_z = 0$.

Section 5-6

5-63 Find the total differential $du$ given that $u = f(x, y, z)$, where:
(a) fix,y,z) = ^- z +xyz;
(b) $f(x, y, z) = x\sin(y^2 + z^2)$;
(c) $f(x, y, z) = (1 - x^2 - y^2 - z^2)^{3/2}$.

5-64 The speed of wave propagation $u$ in a transmission line with inductance $L$ and capacitance $C$ is given by the equation $u = (LC)^{-1/2}$. Relate the differential $du$ to the differentials $dL$ and $dC$. How must $dL$ and $dC$ be related if $u$ is to remain constant?
5-65 Apply the triangle inequality $|a + b| \le |a| + |b|$ to establish that, if $u = f(x_1, x_2, \ldots, x_n)$ is differentiable with respect to each of its independent variables $x_1, x_2, \ldots, x_n$, then

$$|du| \le \left|\frac{\partial u}{\partial x_1}dx_1\right| + \left|\frac{\partial u}{\partial x_2}dx_2\right| + \cdots + \left|\frac{\partial u}{\partial x_n}dx_n\right|.$$

A triangle with sides of length $a$, $b$, $c$ has area $A = \sqrt{s(s - a)(s - b)(s - c)}$, where $2s = a + b + c$ is the perimeter. If $s$ is kept constant, find the largest possible value that may be assumed by $|dA|$, the absolute value of the area differential $dA$, consequent upon changes in the differentials $da$, $db$, and $dc$. Apply the result to an equilateral triangle in which $a = b = c = 4$, when changes $da = 0.01$, $db = 0.015$, and $dc = -0.025$ are made.

5-66 Compute $dy/dx$ from the following implicit relationships:
(a) $x^2 + y^2 = 4$;
(b) $x\sin xy = 1$;
(c) $x^2y + 2xy^2 + y^3 = 2$.

5-67 Compute $\partial z/\partial x$ and $\partial z/\partial y$ given that:
(a) $x^2 + y^2 + z^2 = 1$;
(b) $xyz + \sin xz^2 = 2$;
(c) $x^2 - 2y^2 + 3z^2 - yz + y = 0$;
(d) $x\cos y + y\cos z + z\cos x = 1$.

Section 5-7

5-68 Find the envelope of the family of curves with parameter $\alpha$

$$(x - \alpha)^2 + y^2 = \alpha^2/2.$$

5-69 Find the envelope of the family of curves with parameter $\alpha$

$$y = \alpha x + \frac{3}{2\alpha}.$$

5-70 When a particle is projected into the air with velocity $V$ at an angle $\theta$ to the horizontal then, neglecting air resistance, its height $y$ when distant $x$ from the point of projection is given by

$$y = x\tan\theta - \frac{gx^2}{2V^2\cos^2\theta}.$$

By regarding $\theta$ as a parameter, show that the envelope of the family of trajectories for $0 < \theta < \pi$ is a parabola, and find its equation. This is usually called the parabola of safety because no projectile can penetrate beyond it.

5-71 Find the envelope of the family of curves with parameter $\alpha$ specified by

$$(x - \alpha)^2 + (y + \alpha)^2 - \alpha^2 = 0.$$

5-72 Show that the envelope of the family of curves with parameter $\alpha$ defined by $x\cos\alpha + y\sin\alpha = 2$ is a circle. Find its centre and radius. Interpret this family geometrically.
Section 5-8

5-73 Find $du/dt$ given that:
(a) $u = xy + \sin(x^2 + y^2)$ with x = It, $y = (1 + t^2)^{1/2}$;
(b) $u = (1 + x^2 + y^2)^{3/2}$ with $x = t(1 + t)$, $y = t^3$;
(c) « = with $x = 3\cos t$, $y = 3\sin t$, $z = t^2$. (x 2 + j- 2 ) 1 ' 2

5-74 If $u = x^2 - xy + y^3$, compute $du/dt$ at points on the curve specified parametrically by the equations $x = 2t + 1$, $y = t^2 + t - 2$.

5-75 Prove that if $u = f(2x^2 + y^2)$, where $f$ is a differentiable function, then

$$y\frac{\partial u}{\partial x} - 2x\frac{\partial u}{\partial y} = 0.$$

[Hint: Set $t = 2x^2 + y^2$.]

5-76 If $u = f(x, y)$, compute $du/dx$ given that:
(a) $f(x, y) = (1 + xy + x^2)$ where y = tan (-) ;
(b) $f(x, y) = (1 + x^2 - y^2)^{3/2}$ where $y = \cos 3x$;
(c) $f(x, y) = x\cos y + y\cos x - 1$ where $y = 1 + \sin^2 x$.

5-77 If $u = f(x, y)$ and $g(x, y) = 0$ are differentiable functions, compute $du/dx$ given that:
(a) $f(x, y) = x^3 + 3xy + y^3$ and $g(x, y) = x\cos y + y\cos x - 2$;
(b) $f(x, y) = x^2y^2 + \sin xy$ and $g(x, y) = x^2 - 2y^2 - 3$.

5-78 If $u = x^2 - xy + y^2$, determine $du/dx$ at points on the ellipse $2x^2 + 3y^2 = 1$.

Section 5-9

5-79 In spherical polar coordinates a point $P$ is specified in space by giving the ordered number triple $(r, \varphi, \theta)$. Here $r$ is the radial distance of $P$ from the origin, $\varphi$ is the azimuthal angle of $P$ measured anti-clockwise from the $x$-axis in the $(x, y)$-plane, and $\theta$ is the acute angle between the radius vector drawn to $P$ from the origin and the $z$-axis. (See Figure.) It is easily seen that:

$$x = r\sin\theta\cos\varphi, \quad y = r\sin\theta\sin\varphi, \quad z = r\cos\theta.$$

If $f(x, y, z)$ is differentiable with respect to $x$, $y$, and $z$, express $\partial f/\partial r$, $\partial f/\partial\theta$, and $\partial f/\partial\varphi$ in terms of $\partial f/\partial x$, $\partial f/\partial y$, and $\partial f/\partial z$. Find their values given that $f(x, y, z) = x^2 + 2xy + yz + z^2$.

5-80 Given that $f(x, y, z) = x^2 + xy + \sin yz$, compute $\partial f/\partial r$, $\partial f/\partial\theta$, and $\partial f/\partial z'$, where $(r, \theta, z')$ are the cylindrical polar coordinates corresponding to the point $(x, y, z)$.
5-81 The notion of a Jacobian extends to transformations involving more than two variables. If, in Theorem 5-22, $m = n = 3$, the Jacobian or functional determinant is

$$\frac{\partial(x_1, x_2, x_3)}{\partial(\alpha_1, \alpha_2, \alpha_3)} = \begin{vmatrix} \dfrac{\partial x_1}{\partial\alpha_1} & \dfrac{\partial x_2}{\partial\alpha_1} & \dfrac{\partial x_3}{\partial\alpha_1} \\[2mm] \dfrac{\partial x_1}{\partial\alpha_2} & \dfrac{\partial x_2}{\partial\alpha_2} & \dfrac{\partial x_3}{\partial\alpha_2} \\[2mm] \dfrac{\partial x_1}{\partial\alpha_3} & \dfrac{\partial x_2}{\partial\alpha_3} & \dfrac{\partial x_3}{\partial\alpha_3} \end{vmatrix}.$$

Evaluate the Jacobian $\partial(x, y, z)/\partial(r, \theta, z')$ for the transformation from Cartesian to cylindrical polar coordinates.

5-82 Use the definition in Problem 5-81 to evaluate the Jacobian $\partial(x, y, z)/\partial(r, \varphi, \theta)$ for the transformation from Cartesian to spherical polar coordinates.

5-83 Find the Jacobians of the following transformations, stating where, if at all, they vanish:
(a) $x = 2u + 3v + 1$, $y = 3u - 2v - 1$;
(b) $x = u^2 - v^2$, $y = u^2 + v^2$;
(c) $x = u^2 + 2uv + v^2$, $y = u$.

5-84 Use Theorem 5-22 with $n = 2$, $m = 3$ to determine $\partial f/\partial u$, $\partial f/\partial v$, and $\partial f/\partial w$, given that: f = $x^2$ + i$y^2$ where $x = u^2 + v + w$ and $y = uvw$.

5-85 If $u$ and $v$ are functions of $x$ and $y$ which satisfy $u^2 - v^2 + 2x + 3y = 0$ and $uv + x - y = 0$, find $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, and $\partial v/\partial y$ in terms of $u$ and $v$.

5-86 Prove that if $z = f(u, v)$, where $u = x + 3t$, v = y - It, then - - -\-- 1 - 8t 8x 8y

5-87 Show that if $u = 1/r^n$, where $r^2 = x^2 + y^2 + z^2$, then

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} = \frac{n(n - 1)}{r^{n+2}}.$$

5-88 Prove that if $u = 2xy + xf(y/x)$, then

$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = u + 2xy.$$

Section 5-10

5-89 Which of the following implicit functions $f(x, y) = 0$ may be solved explicitly for $y$ in the neighbourhood of the stated points $(x_0, y_0)$:
(a) $f(x, y) = x^2 + y^3 + xy - 11$ at $(1, 2)$;
(b) f(x,y) = (l-x 2 - y 2 y> 2 at $(-1, 0)$;
(c) $f(x, y) = \sin xy - 1$ at $(1, \tfrac{1}{2}\pi)$;
(d) $f(x, y) = y + \sin xy - 2$ at $(\tfrac{1}{2}\pi, 1)$?

5-90 Compute $dx/dy$ for each of the following relationships:
(a) $y = 1 + x^2 + x\sin x$;
(b) $y = (1 - x + x^2)^{1/2}$;
(c) $y = x + \tan x$.

5-91 Differentiate these functions:
(a) $f(x) = x^2\operatorname{arc\,sec}(x/a)$;
(b) $f(x) = (x^2 + x + 1)/\operatorname{arc\,sin}(x^2 - 2)$;
(c) $f(x) = (1 + x + \operatorname{arc\,cos} 2x)^{3/2}$.
5-92 Compute $dy/dx$ and $d^2y/dx^2$ for each of the following parametrically defined curves:
(a) $x = t - 1$, $y = t^3$;
(b) $x = \cos^3 t$, $y = 2\sin^3 t$;
(c) , = arc cos _I_ , = arc sin ^^-^
(d) $x = 2(\cos t + t\sin t)$, $y = 2(\sin t - t\cos t)$.

5-93 Compute $dy/dx$ and $d^2y/dx^2$ at t = |π if $x = t - \sin t$ and $y = 2(1 - \cos t)$.

5-94 Compute $d^3y/dx^3$ when $t = 1$, given that $x = 2t + 1$, $y = t(1 + t^2)$.

5-95 In Example 5-21, an envelope is specified in terms of a parameter $\alpha$, and it comprises two curves corresponding to the $+$ and $-$ signs associated with $y$. Find the gradient of each of these curves at the origin (that is, corresponding to $\alpha = 0$).

Section 5-11

5-96 Compute $\partial^2 z/\partial x^2$, $\partial^2 z/\partial x\partial y$, $\partial^2 z/\partial y\partial x$, and $\partial^2 z/\partial y^2$ for each of the following functions and hence show that $\partial^2 z/\partial x\partial y = \partial^2 z/\partial y\partial x$:
(a) $z = (x^2 + y^2)^{1/2}$;
(b) $z = x\cos y + y\cos x$;
(c) $z = \operatorname{arc\,tan}(y/x)$.

5-97 Compute /c*(l, l),/ty(l, 1), and/^,(l, 1) given that fix,y) = U+x)Hl+y) 3 . Is $\partial^2 f/\partial x\partial y = \partial^2 f/\partial y\partial x$? Give reasons for your answer.

5-98 Given that f(x,y) = * 2 + j 2 { 1 for $x = 0$, $y = 0$ compute $\partial^2 f/\partial x\partial y$ stating, with reasons, when it is equal to $\partial^2 f/\partial y\partial x$. Is there any point at which this result is not true and, if so, what property of the function invalidates the result? [Hint: Consider limits taken along the line $y = mx$.]

5-99 Show that if $w = \operatorname{arc\,tan}(x/y)$, then $\partial^2 w/\partial x^2 + \partial^2 w/\partial y^2 + \partial^2 w/\partial z^2 = 0$.

5-100 Given that $V = \operatorname{arc\,tan}[2xy/(x^2 - y^2)]$, prove that 8V 8V 8 2 V 8 2 V

5-101 Compute $\partial^3 z/\partial x\partial y^2$ and $\partial^3 z/\partial x^2\partial y$ given that $z = x^4y^2 + \sin x^2y$.

Exponential, hyperbolic, and logarithmic functions

6-1 The exponential function

This chapter will be concerned primarily with the exponential function, first introduced in connection with limits in Section 3-3 and, thereafter, with a number of related functions.
This time our approach will be to utilize both geometrical ideas and the elementary calculus to produce a more useful form of definition than that contained in Eqn (3-6).

Let us seek a function $E(x)$ equal to its own derivative and such that $E(0) = 1$. Specifically, we must solve the equation

$$E'(x) = E(x) \tag{6-1}$$

which, because it involves the unknown function $E(x)$ together with its derivative, is called a differential equation. This differential equation has the following simple geometrical interpretation: if the graph of the function $E(x)$ is drawn, then the gradient of the graph at the point $(x, E(x))$ is equal to the functional value of $E(x)$ itself.

Perhaps it is worth remarking that Eqn (6-1), taken together with the condition $E(0) = 1$, immediately implies that $E(x)$ is a convex function for $x > 0$. No deduction can yet be made about its behaviour for $x < 0$ though, in fact, we shall shortly prove that $E(x)$ is a convex function for all $x$.

As on previous occasions, our desired result is soonest obtained by studying an artificial function. The reason for considering the precise form of function to be used will become apparent once the result has been obtained. Suppose, for the moment, that there is a unique function $E(x)$ defined by our requirements, and consider the new function $F(x)$, where

$$F(x) = E(x)E(a - x). \tag{6-2}$$

Then,

$$F'(x) = E(x)\frac{d}{dx}[E(a - x)] + E(a - x)\frac{d}{dx}[E(x)]$$

which, using the defining property (6-1), becomes

$$F'(x) = -E(x)E(a - x) + E(a - x)E(x) = 0.$$

Consequently, $F(x) = \text{constant}$ but, as $F(0) = E(0)E(a) = E(a)$, it follows at once that $F(x) = F(0) = E(a)$ for all $x$, and thus Eqn (6-2) takes the form

$$E(x)E(a - x) = E(a).$$

Alternatively, by replacing $a$ by $a + b$ and $x$ by $b$ this may be written

$$E(a + b) = E(a)E(b). \tag{6-3}$$

Hence, if $n$ is a positive integer,

$$E(n) = E(n - 1)E(1) = E(n - 2)(E(1))^2 = \cdots = (E(1))^n. \tag{6-4}$$

If, now, we denote $E(1)$ by the symbol $e$, then Eqn (6-4) is equivalent to

$$E(n) = e^n.$$
(6-5)

The fact that $E(0) = 1$, taken together with Eqn (6-1), implies $E(1) > 1$ and hence, via Eqn (6-5), that $\lim_{n\to\infty} e^n \to \infty$. Again, $E(-n)E(n) = E(0) = 1$, so that

$$E(-n) = \frac{1}{E(n)} = \frac{1}{e^n} = e^{-n}. \tag{6-6}$$

Now we must extend this notation to take account of rational and irrational $x$. Let us consider $E(x)$ for rational $x$, so that $x = p/q$ with $p$, $q$ integers. Then, using Eqns (6-3) and (6-5), we may write

$$\left[E\!\left(\frac{p}{q}\right)\right]^q = E(p) = e^p$$

and so

$$E\!\left(\frac{p}{q}\right) = e^{p/q}. \tag{6-7}$$

A similar argument using Eqn (6-6) shows that

$$E\!\left(-\frac{p}{q}\right) = e^{-p/q}. \tag{6-8}$$

Thus we have shown that for all rational $x$

$$E(x) = e^x. \tag{6-9}$$

To extend the definition of $E(x)$ to all the real numbers $x$ and not just to the rationals, it only remains to add that for any irrational number $\xi$, we define $E(\xi)$ by the equation $E(\xi) = e^{\xi}$.

Although the foregoing arguments have established the algebraic properties of $E(x)$, they have still not provided a method of attributing an actual number to $E(x)$ for any given value of $x$. Nor, indeed, are we certain that only one function $E(x)$ exists that satisfies Eqn (6-1) and is such that $E(0) = 1$; that is to say, is $E(x)$ unique? This question will be answered in the affirmative immediately following the next stage of our argument.

We now seek a series solution to our function $E(x)$ of the form

$$y = \sum_{r=0}^{\infty} a_r x^r \tag{6-10}$$

where, for simplicity, we have set $y = E(x)$ so that Eqn (6-1) now becomes

$$\frac{dy}{dx} = y \tag{6-11}$$

with $y(0) = 1$. Assuming that this infinite series may be differentiated termwise, we have

$$\frac{dy}{dx} = \sum_{r=0}^{\infty} r a_r x^{r-1},$$

so that substituting for $y$ and $dy/dx$ in Eqn (6-11) yields

$$\sum_{r=0}^{\infty} r a_r x^{r-1} = \sum_{r=0}^{\infty} a_r x^r$$

or, equivalently,

$$\sum_{r=0}^{\infty} (r + 1)a_{r+1} x^r = \sum_{r=0}^{\infty} a_r x^r. \tag{6-12}$$

For this result to be unconditionally true for all $x$, as it must be to satisfy our definition of $E(x)$, it follows that it must be an identity in $x$. This can only be possible if the coefficients of the corresponding powers of $x$ on each side of Eqn (6-12) are identical.
Hence, equating the coefficients of the general term involving $x^r$, we find that

$$(r + 1)a_{r+1} = a_r \tag{6-13}$$

for $r = 0, 1, 2, \ldots$. As we require that $y(0) = 1$, it follows by setting $x = 0$ in Eqn (6-10) that $a_0 = 1$. Using this result together with Eqn (6-13), which defines the coefficients $a_r$ recursively, it is easily seen that

$$a_0 = 1, \quad a_1 = 1, \quad a_2 = \frac{1}{2!}, \quad a_3 = \frac{1}{3!}, \quad \ldots, \quad a_r = \frac{1}{r!}.$$

Substitution of these coefficients into Eqn (6-10) then shows that

$$E(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^r}{r!} + \cdots \tag{6-14}$$

whatever this expression may mean. We have already remarked that the sum of an infinite series is to be interpreted as the limit of the partial sums of the series, so let us now consider the $n$th partial sum

$$S_n = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^{n-1}}{(n - 1)!} \tag{6-15}$$

of the function $E(x)$. If $x > 0$ then $S_{n+1} - S_n = x^n/n! > 0$, so that $\{S_n\}$ is increasing. Is $\{S_n\}$ bounded? Let $R$ be an integer greater than $2x$; then $x/r < \tfrac{1}{2}$ for $r \ge R$, and so

$$\frac{x^r}{r!} = \frac{x}{1}\cdot\frac{x}{2}\cdots\frac{x}{R - 1}\cdot\frac{x}{R}\cdots\frac{x}{r} < \frac{x^{R-1}}{(R - 1)!}\left(\tfrac{1}{2}\right)^{r-R+1}.$$

Thus, for $n > R$,

$$S_n = \sum_{r=0}^{R-1}\frac{x^r}{r!} + \sum_{r=R}^{n-1}\frac{x^r}{r!} < \sum_{r=0}^{R-1}\frac{x^r}{r!} + \frac{x^{R-1}}{(R - 1)!}\sum_{r=R}^{n-1}\left(\tfrac{1}{2}\right)^{r-R+1},$$

which shows that $\{S_n\}$ is bounded, since the geometric sum on the right is less than 1. Hence by the postulate of Section 3-2 it follows that $\lim_{n\to\infty} S_n$ exists, and we now define the sum of the infinite series (6-14) to be equal to the value of this limit. The infinite series (6-14) is thus defined for all positive $x$.

As we have agreed to write $E(1) = e$, it follows from Eqn (6-14), by setting $x = 1$, that

$$e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} + \cdots \tag{6-16}$$

which, to 15 decimal places, has the numerical value

$$e = 2.718281828459045.$$

A modified argument shows that $E(x)$ is also defined for all negative $x$, so that taking account of Eqn (6-9) we have proved the following result:

THEOREM 6-1 (exponential theorem) For all $x$ it is true that if

$$e = \sum_{n=0}^{\infty}\frac{1}{n!}$$

then

$$e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!}.$$

Let us now dispel any lingering doubts there may be about the uniqueness of $e^x$. Suppose there is a different solution $z(x)$ of Eqn (6-1), with $z(0) = 1$.
Then we must have

$$\frac{dz}{dx} = z \tag{6-17}$$

and so, differencing Eqns (6-11) and (6-17), it is easily shown that

$$\frac{dw}{dx} = w, \tag{6-18}$$

where $w = y - z$. We also have $w(0) = y(0) - z(0) = 0$. Now solving Eqn (6-18) by the same device as before, but this time setting

$$w = \sum_{r=0}^{\infty} b_r x^r, \tag{6-19}$$

we arrive at the recurrence relation

$$(r + 1)b_{r+1} = b_r, \tag{6-20}$$

for $r = 0, 1, 2, \ldots$, which is strictly analogous to Eqn (6-13). However, setting $x = 0$ in Eqn (6-19) and using the condition $w(0) = 0$ we find that $b_0 = 0$, and so it follows from Eqn (6-20) that all the coefficients $b_r$ are zero. Hence from Eqn (6-19) we see that $w(x) = 0$, and thus $y = z$, showing that the function $e^x$ defined by Eqn (6-14) is unique.

Finally, it remains for us to establish the equivalence of the function $E(x)$ defined by Eqn (3-6) and the one denoted by the same symbols in Eqn (6-14). We shall only give the details for positive $x$. Our best method is first to expand the expression in Eqn (3-6) by the binomial theorem, obtaining

$$\left(1 + \frac{x}{n}\right)^n = 1 + n\left(\frac{x}{n}\right) + \frac{n(n - 1)}{2!}\left(\frac{x}{n}\right)^2 + \cdots + \frac{n(n - 1)\cdots 1}{n!}\left(\frac{x}{n}\right)^n.$$

Then, setting $E_{n+1} = [1 + (x/n)]^n$, we rewrite the result in the form

$$E_{n+1} = 1 + x + \frac{x^2}{2!}\left(1 - \frac{1}{n}\right) + \frac{x^3}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{x^n}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{n - 1}{n}\right). \tag{6-21}$$

Defining the number $g(r, n)$ by

$$g(r, n) = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{r}{n}\right),$$

we next write Eqn (6-21) as

$$E_{n+1} = 1 + x + \frac{x^2}{2!}g(1, n) + \frac{x^3}{3!}g(2, n) + \cdots + \frac{x^n}{n!}g(n - 1, n). \tag{6-22}$$

Now the difference $S_{n+1} - E_{n+1}$ is

$$S_{n+1} - E_{n+1} = \frac{x^2}{2!}(1 - g(1, n)) + \frac{x^3}{3!}(1 - g(2, n)) + \cdots + \frac{x^n}{n!}(1 - g(n - 1, n)),$$

which is obviously positive since $0 < g(r, n) < 1$. However, it is readily seen that for any given $r$

$$\lim_{n\to\infty} g(r, n) = 1,$$

showing that

$$\lim_{n\to\infty}(S_{n+1} - E_{n+1}) = 0.$$

From Theorem 3-1 (a) it then follows that

$$\lim_{n\to\infty} E_{n+1} = \lim_{n\to\infty} S_{n+1} = e^x,$$

thereby establishing the equivalence of our two alternative definitions when $x$ is positive. A similar argument also establishes the equivalence when $x$ is negative.

Having now achieved a working definition for $E(x)$ we shall henceforth always denote this function, known as the exponential function, either by $e^x$ or by $\exp(x)$.
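The agreement between the series definition and the limit definition can also be checked numerically. The following is a minimal Python sketch offered only as an illustration, not as part of the original argument. The helper names `partial_sum` and `compound_form` are assumptions introduced here: `partial_sum(x, n)` stands for the partial sum $S_n$ of Eqn (6-15), and `compound_form(x, n)` for the quantity $E_{n+1} = [1 + (x/n)]^n$.

```python
import math


def partial_sum(x, n):
    """Return S_n = 1 + x + x^2/2! + ... + x^(n-1)/(n-1)!, as in Eqn (6-15)."""
    total = 0.0
    term = 1.0  # current term x^r / r!, starting at r = 0
    for r in range(n):
        total += term
        term *= x / (r + 1)  # turn x^r/r! into x^(r+1)/(r+1)!
    return total


def compound_form(x, n):
    """Return E_{n+1} = (1 + x/n)^n, the finite-n form of the limit definition."""
    return (1.0 + x / n) ** n


if __name__ == "__main__":
    x = 2.0
    for n in (5, 50, 500, 5000):
        s = partial_sum(x, n + 1)  # S_{n+1}
        e = compound_form(x, n)    # E_{n+1}
        print(f"n={n:5d}  S={s:.12f}  E={e:.12f}  S-E={s - e:.3e}")
    print("math.exp(x) =", math.exp(x))
```

For positive $x$ the printed difference $S_{n+1} - E_{n+1}$ should stay positive and shrink as $n$ grows, in line with the argument above. The series typically settles much faster than the limit form, which is one practical reason for preferring Eqn (6-14) when a numerical value is wanted.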
It is worth formally recording the differentiability properties of this function $e^x$. However, we first remark that if $f(x) = e^{g(x)}$, where $g(x)$ is a differentiable function of $x$, then, setting $g(x) = u$ so that $f(x) = e^u$ and using the chain rule in the form displayed in Eqn (5-6), we find that

$$\frac{df}{dx} = \frac{df}{du}\cdot\frac{du}{dx} = e^u g'(x) = g'(x)e^{g(x)}.$$

THEOREM 6-2 If $f(x) = e^{g(x)}$, where $g(x)$ is a differentiable function of $x$, then

$$\frac{d}{dx}\{e^{g(x)}\} = g'(x)e^{g(x)}.$$

In particular, if $g(x) = \alpha x$, where $\alpha$ is a constant, then

$$\frac{d}{dx}(e^{\alpha x}) = \alpha e^{\alpha x}.$$

Let us now establish an important property of $e^x$. Consider the quotient $e^x/x^p$, where $p$ is any positive integer. Then from Eqn (6-14) it follows that, for $x > 0$,

$$\frac{e^x}{x^p} = \frac{1 + x + \dfrac{x^2}{2!} + \cdots + \dfrac{x^p}{p!} + \dfrac{x^{p+1}}{(p + 1)!} + \cdots}{x^p} > \frac{x}{(p + 1)!}.$$

Hence we have shown that

$$\lim_{x\to\infty}\frac{e^x}{x^p} > \lim_{x\to\infty}\frac{x}{(p + 1)!} \to \infty.$$

We have proved the following result:

THEOREM 6-3 The function $e^x$ increases more quickly than any positive power of $x$ as $x \to \infty$.

We have already noted that $\lim_{x\to\infty} e^x \to \infty$, and as $e^x = 1/e^{-x}$ it follows that $\lim_{x\to-\infty} e^x = 0$ or, equivalently, $\lim_{x\to\infty} e^{-x} = 0$. From Theorem 6-1 it follows that the function $e^x$ is everywhere positive and since, by virtue of its definition, its derivative is everywhere a strictly monotonic increasing function of $x$ it must be a convex function. A graph of $e^x$ is shown in Fig. 6-1. These last properties are frequently of help when studying limiting problems involving the exponential function, as illustrated in the following examples.

Fig. 6-1 The exponential function

Example 6-1 Deduce the values of the following limits:
(a) $\lim_{x\to\infty}\dfrac{3e^x + x^3 + 1}{2e^x + x^7}$;
(b) $\lim_{x\to\infty}\dfrac{2e^{2x} + x^2 + 2}{3e^{3x} + 7}$;
(c) $\lim_{x\to 0}\dfrac{e^{ax} - e^{bx}}{2x}$.

Solution (a) We have

$$\frac{3e^x + x^3 + 1}{2e^x + x^7} = \frac{3 + (x^3/e^x) + (1/e^x)}{2 + (x^7/e^x)}$$

and from Theorem 6-3 it then follows that all but the initial terms in numerator and denominator must vanish as $x \to \infty$, so that

$$\lim_{x\to\infty}\frac{3e^x + x^3 + 1}{2e^x + x^7} = \frac{3}{2}.$$

(b) In this case we have

$$\frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7} = \frac{2e^{-x} + (x^2/e^{3x}) + (2/e^{3x})}{3 + (7/e^{3x})}.$$

However, this time as $x \to \infty$ the whole numerator tends to zero whilst the denominator approaches the value 3. Hence we have

$$\lim_{x\to\infty}\frac{2e^{2x} + x^2 + 2}{3e^{3x} + 7} = 0.$$

(c) This limit involves an indeterminate form of the type 0/0, so we appeal to Theorem 5-14. Writing $f(x) = e^{ax} - e^{bx}$ and $g(x) = 2x$ we see that $f(0) = g(0) = 0$, and

$$\lim_{x\to 0}\frac{f'(x)}{g'(x)} = \lim_{x\to 0}\frac{ae^{ax} - be^{bx}}{2} = \frac{a - b}{2}.$$

Hence, by the conditions of Theorem 5-14,

$$\lim_{x\to 0}\frac{e^{ax} - e^{bx}}{2x} = \lim_{x\to 0}\frac{ae^{ax} - be^{bx}}{2} = \frac{a - b}{2}.$$

6-2 Differentiation of functions involving the exponential function

The exponential function occurs frequently in mathematics, and all of its differentiability properties follow from Theorem 6-2 combined with the fundamental differentiation theorems of Chapter 5. These results are straightforward and are best illustrated by examples. The first example illustrates the ordinary differentiation of simple combinations of functions.

Example 6-2 Differentiate the following functions $f(x)$:
(a) $f(x) = 2x^2 + 3e^{2x}$;
(b) $f(x) = x^2 e^{3x}$;
(c) $f(x) = 2\exp(x^3 + 2x + 1)$;
(d) $f(x) = e^{2x}/(1 + e^x)$;
(e) $f(x) = \sin(1 + e^{2x})$.

Solution (a)

$$f'(x) = \frac{d}{dx}(2x^2 + 3e^{2x}) = 4x + 3\frac{d}{dx}(e^{2x})$$

and so

$$f'(x) = 4x + 6e^{2x}.$$

(b)

$$f'(x) = e^{3x}\frac{d}{dx}(x^2) + x^2\frac{d}{dx}(e^{3x})$$

so that

$$f'(x) = 2xe^{3x} + 3x^2 e^{3x}.$$

(c) This is a more complicated example of a composite function or, more simply, of a function of a function. Set $u = x^3 + 2x + 1$ so that $f(x) = 2e^u$. Then, by the chain rule,

$$f'(x) = \frac{df}{du}\cdot\frac{du}{dx}$$
but

$$\frac{df}{du} = \frac{d}{du}(2e^u) = 2e^u = 2\exp(x^3 + 2x + 1) \quad\text{and}\quad \frac{du}{dx} = 3x^2 + 2$$

so that, finally,

$$f'(x) = (6x^2 + 4)\exp(x^3 + 2x + 1).$$

(d) Writing $f(x)$ in the form $f(x) = e^{2x}(1 + e^x)^{-1}$ we have

$$f'(x) = (1 + e^x)^{-1}\frac{d}{dx}(e^{2x}) + e^{2x}\frac{d}{dx}[(1 + e^x)^{-1}]$$

or

$$f'(x) = \frac{2e^{2x}}{1 + e^x} + e^{2x}\frac{d}{dx}[(1 + e^x)^{-1}].$$

To evaluate the last term set $1 + e^x = u$, so that we then need to evaluate $\dfrac{d}{dx}\left(\dfrac{1}{u}\right)$ which, by the chain rule, is

$$\frac{d}{dx}\left(\frac{1}{u}\right) = \frac{d}{du}\left(\frac{1}{u}\right)\cdot\frac{du}{dx}.$$

However, $du/dx = e^x$ and $(d/du)(1/u) = -(1/u^2) = -1/(1 + e^x)^2$, showing that

$$\frac{d}{dx}[(1 + e^x)^{-1}] = \frac{-e^x}{(1 + e^x)^2}.$$

Hence, combining our results, we find

$$f'(x) = \frac{2e^{2x}}{1 + e^x} - \frac{e^{3x}}{(1 + e^x)^2}.$$

(e) This is another composite function. Set $u = 1 + e^{2x}$, so that $f(x) = \sin u$. Proceeding as before we then see that

$$f'(x) = \frac{df}{du}\cdot\frac{du}{dx} = 2e^{2x}\cos(1 + e^{2x}).$$

Higher order derivatives are defined, as usual, by repeating the differentiation process the requisite number of times.

Example 6-3 Find $f''(x)$, given that:
(a) $f(x) = x^2 e^{-2x}$;
(b) $f(x) = (x - 1)e^x$.

Solution (a) Proceeding as before we find that

$$f'(x) = 2xe^{-2x} - 2x^2 e^{-2x},$$

and

$$f''(x) = 2e^{-2x} - 4xe^{-2x} - 4xe^{-2x} + 4x^2 e^{-2x}.$$

Collecting terms we obtain

$$f''(x) = 2(1 - 4x + 2x^2)e^{-2x}.$$

(b) $f'(x) = e^x + (x - 1)e^x = xe^x$ so that

$$f''(x) = e^x + xe^x.$$

Partial differentiation of functions involving the exponential function is also straightforward, as the following example indicates.

Example 6-4 Determine $f_x$, $f_y$, and $f_{xy}$, given that

$$f(x, y) = (x^2 + y^2)\exp(x^2 - y^2).$$

Solution

$$\frac{\partial f}{\partial x} = 2x\exp(x^2 - y^2) + (x^2 + y^2)\frac{\partial}{\partial x}[\exp(x^2 - y^2)] = 2x\exp(x^2 - y^2) + 2x(x^2 + y^2)\exp(x^2 - y^2).$$

Notice that $\partial f/\partial x$ comprises the sum of everywhere continuous functions and so is itself everywhere continuous.

$$\frac{\partial f}{\partial y} = 2y\exp(x^2 - y^2) + (x^2 + y^2)\frac{\partial}{\partial y}[\exp(x^2 - y^2)] = 2y\exp(x^2 - y^2) - 2y(x^2 + y^2)\exp(x^2 - y^2).$$

The partial derivative $\partial f/\partial y$ is also seen to be everywhere continuous.
Theorem 5-24 now tells us that $\partial^2 f/\partial x\partial y = \partial^2 f/\partial y\partial x$, so that we may differentiate either $\partial f/\partial x$ or $\partial f/\partial y$ to arrive at $f_{xy}$. We choose to differentiate $f_x$ partially with respect to $y$.

$$\frac{\partial^2 f}{\partial y\partial x} = -4xy\exp(x^2 - y^2) + 4xy\exp(x^2 - y^2) - 4xy(x^2 + y^2)\exp(x^2 - y^2),$$

whence

$$\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial^2 f}{\partial y\partial x} = -4xy(x^2 + y^2)\exp(x^2 - y^2).$$

As a final illustration, let us consider an application of Theorem 5-21 to the exponential function.

Example 6-5 Find $df/dt$, given that $f(x, y) = xy\exp(x^2 + 3y + 1)$, with $x = \sin t$, $y = t^3 + 1$.

Solution Here we must use the chain rule formula for partial differentiation:

$$\frac{df}{dt} = \frac{\partial f}{\partial x}\cdot\frac{dx}{dt} + \frac{\partial f}{\partial y}\cdot\frac{dy}{dt}.$$

Now

$$\frac{\partial f}{\partial x} = y\exp(x^2 + 3y + 1) + xy\frac{\partial}{\partial x}\exp(x^2 + 3y + 1) = y\exp(x^2 + 3y + 1) + 2x^2 y\exp(x^2 + 3y + 1),$$

and thus

$$\frac{\partial f}{\partial x} = y(1 + 2x^2)\exp(x^2 + 3y + 1).$$

Similarly,

$$\frac{\partial f}{\partial y} = x\exp(x^2 + 3y + 1) + xy\frac{\partial}{\partial y}\exp(x^2 + 3y + 1) = x\exp(x^2 + 3y + 1) + 3xy\exp(x^2 + 3y + 1),$$

and thus

$$\frac{\partial f}{\partial y} = x(1 + 3y)\exp(x^2 + 3y + 1).$$

We also have

$$\frac{dx}{dt} = \cos t \quad\text{and}\quad \frac{dy}{dt} = 3t^2,$$

and so $df/dt$ may now be found by direct substitution into the chain rule formula, with the following result:

$$\frac{df}{dt} = [(t^3 + 1)(1 + 2\sin^2 t)\cos t + 3t^2(3t^3 + 4)\sin t]\exp(4 + 3t^3 + \sin^2 t).$$

6-3 The logarithmic function

Having introduced the exponential function there is now a need for an inverse function. The implicit function theorem (Theorem 5-23) tells us that such an inverse function exists and, furthermore, that it is differentiable whenever $(d/dx)(e^x) \ne 0$. However, this is always the case since we have already seen that $(d/dx)(e^x) = e^x$, which is never zero for $x$ in the interval $-\infty < x < \infty$. Hence a differentiable function, inverse to the exponential function, exists for all $x$. We call it the natural logarithmic function and denote it by $\log_e$ whenever it is necessary to indicate that it has the base $e$.
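The idea of the logarithm as the inverse of the exponential function can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the method of the text: it assumes Newton's iteration (a technique not discussed in this section) to solve $e^y = x$ for $y$, and the function name `log_by_inversion` is hypothetical. It relies on two facts already established, namely that the derivative of $e^y$ is $e^y$ and never vanishes, and that $E(a + b) = E(a)E(b)$, which is used to bring $x$ into a convenient range first.

```python
import math


def log_by_inversion(x, tol=1e-14, max_iter=50):
    """Approximate the y with e**y == x (x > 0) by inverting the exponential.

    Assumption: Newton's iteration on g(y) = e**y - x is used purely as an
    illustrative way of inverting exp; the text itself does not prescribe it.
    """
    if x <= 0:
        raise ValueError("the inverse of the exponential exists only for x > 0")

    # Range reduction using e**(a + b) = e**a * e**b:
    # write x = e**k * x_reduced with 1/e <= x_reduced <= e.
    k = 0
    while x > math.e:
        x /= math.e
        k += 1
    while x < 1.0 / math.e:
        x *= math.e
        k -= 1

    # Newton step: y <- y - (e**y - x) / e**y = y - (1 - x * e**(-y)).
    y = 0.0
    for _ in range(max_iter):
        step = 1.0 - x * math.exp(-y)
        y -= step
        if abs(step) < tol:
            break
    return k + y


if __name__ == "__main__":
    for value in (0.001, 0.5, 1.0, 2.0, 1000.0):
        print(value, log_by_inversion(value), math.log(value))
```

The division by $e^y$ in the Newton step is always legitimate because $e^y$ is never zero, which is the same condition that the implicit function theorem requires for the inverse to be differentiable.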
DEFINITION 6-1 We define the natural logarithmic function $\log_e x$ by the requirement that

$$y = \log_e x \iff x = e^y.$$

We may use this definition, together with Corollary 1 to the implicit function theorem, to compute the derivative of $\log_e x$. As $dy/dx = 1/(dx/dy)$ and $x = e^y$, it follows that $dx/dy = e^y$, whence

$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}.$$

Now $e^y$ is essentially positive, so that

$$\frac{d}{dx}(\log_e x) = \frac{1}{x} \quad\text{for } x > 0. \tag{6-23}$$

It is obvious that $\log_e 1 = 0$ and, as $x$ increases strictly monotonically with $y$, it also follows that $\log_e x \to +\infty$ as $x \to +\infty$, and $\log_e x \to -\infty$ as $x \to 0$.

Let us now prove that

$$\lim_{x\to\infty}\frac{\log_e x}{x^{\alpha}} = 0 \quad\text{for all } \alpha > 0.$$

As $x = e^y$ we have

$$\frac{\log_e x}{x^{\alpha}} = \frac{y}{e^{\alpha y}}$$

and so

$$\lim_{x\to\infty}\frac{\log_e x}{x^{\alpha}} = \lim_{y\to\infty}\frac{y}{e^{\alpha y}}.$$

Setting $u = \alpha y$ we arrive at

$$\lim_{x\to\infty}\frac{\log_e x}{x^{\alpha}} = \frac{1}{\alpha}\lim_{u\to\infty}\frac{u}{e^u} = 0,$$

by virtue of Theorem 6-3. Collecting the previous results we arrive at the following theorem.

THEOREM 6-4 If $y = \log_e x$, then
(a) $\dfrac{dy}{dx} = \dfrac{1}{x}$ for $x > 0$;
(b) $\lim_{x\to\infty}\dfrac{\log_e x}{x^{\alpha}} = 0$ for all $\alpha > 0$.

Logarithms to other bases can be used if convenient. They are defined as follows.

DEFINITION 6-2 We define the logarithmic function to the base $c$, denoted by $\log_c x$ where $c$ is a positive number, by the requirement that

$$y = \log_c x \iff x = c^y.$$

For reference purposes we record the following familiar properties of the logarithmic function, established in elementary courses.

Basic properties of the logarithmic function

Let $\log_e$ and $\log_c$ represent logarithms to the bases $e$ and $c$ respectively, and $a$, $b$, $r$ be real numbers; then:
(a) $\log_e ab = \log_e a + \log_e b$;
(b) $\log_e a^r = r\log_e a$;
(c) $\log_c a = \dfrac{\log_e a}{\log_e c}$;
(d) $\log_c e = \dfrac{1}{\log_e c}$.

Results (c) and (d) quoted above are immediately useful if it is necessary to differentiate $\log_a x$. For we have

$$\log_a x = \frac{\log_e x}{\log_e a}$$

so that

$$\frac{d}{dx}(\log_a x) = \frac{1}{\log_e a}\cdot\frac{d}{dx}(\log_e x)$$

whence,

$$\frac{d}{dx}(\log_a x) = \frac{1}{x\log_e a} = \frac{\log_a e}{x}.$$
(6-24)

Let us now find the derivative of the function $a^x$, where $a$ is any positive number. Notice first that, by virtue of Definition 6-1,

$$a = e^{\log_e a},$$

so that

$$a^x = (e^{\log_e a})^x = e^{x\log_e a}.$$

Now $\log_e a$ is simply a constant, so we have

$$\frac{d}{dx}(a^x) = \frac{d}{dx}(e^{x\log_e a}) = \log_e a \, e^{x\log_e a} = a^x\log_e a.$$

We have thus established the useful result

$$\frac{d}{dx}(a^x) = a^x\log_e a. \tag{6-25}$$

This result can also be obtained in another manner. We set

$$y = a^x$$

so that taking the natural logarithm gives

$$\log_e y = x\log_e a.$$

Differentiating this result with respect to $x$ we obtain

$$\frac{d}{dx}(\log_e y) = \frac{d}{dx}(x\log_e a)$$

or

$$\frac{1}{y}\cdot\frac{dy}{dx} = \log_e a,$$

and so

$$\frac{dy}{dx} = \frac{d}{dx}(a^x) = y\log_e a = a^x\log_e a.$$

For our final general result we consider the differentiation of the function $y = \log_e g(x)$, where $g(x)$ is a differentiable function. Setting $u = g(x)$ so that $y = \log_e u$ and using the chain rule gives

$$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} = \frac{1}{u}\cdot\frac{du}{dx}$$

so that, finally,

$$\frac{d}{dx}[\log_e g(x)] = \frac{g'(x)}{g(x)}. \tag{6-26}$$

Henceforth, unless otherwise stated, the natural logarithm will always be used, so for simplicity of notation we shall write $\log$ in place of $\log_e$. Often, in other texts, the notation $\ln$ is used to denote the natural logarithmic function.

Let us now examine some representative cases of limits involving logarithms.

Example 6-6 Evaluate the following limits:
(a) $\lim_{x\to\infty}\dfrac{\log x^3}{x}$;
(b) $\lim_{x\to\infty}\dfrac{\log a^x}{3x + 1}$ with $a > 0$;
(c) $\lim_{x\to\infty}\dfrac{1 + x^3\log[2 + (1/x)]}{3x^3 + 2x^2 + 1}$;
(d) $\lim_{x\to 0}\dfrac{\log(1 + 3x)}{2x}$.

Solution (a) We have

$$\frac{\log x^3}{x} = \frac{3\log x}{x}$$

so that by Theorem 6-4 (b) it follows at once that

$$\lim_{x\to\infty}\frac{\log x^3}{x} = 0.$$

(b) We have

$$\frac{\log a^x}{3x + 1} = \frac{x\log a}{3x + 1}$$

and so

$$\lim_{x\to\infty}\frac{\log a^x}{3x + 1} = \lim_{x\to\infty}\frac{x\log a}{3x + 1} = \tfrac{1}{3}\log a.$$

(c) Using the result

$$\frac{1 + x^3\log[2 + (1/x)]}{3x^3 + 2x^2 + 1} = \frac{(1/x^3) + \log[2 + (1/x)]}{3 + (2/x) + (1/x^3)}$$

it is at once apparent that

$$\lim_{x\to\infty}\frac{1 + x^3\log[2 + (1/x)]}{3x^3 + 2x^2 + 1} = \tfrac{1}{3}\log 2.$$

(d) This is an indeterminate form of the type 0/0.
It is easily verified that Theorem 5-14 (L'Hospital's rule) is applicable so that

$$\lim_{x\to 0} \frac{f(x)}{g(x)} = \lim_{x\to 0} \frac{f'(x)}{g'(x)}$$

with $f(x) = \log(1+3x)$ and $g(x) = 2x$. As $f'(x) = 3/(1+3x)$ and $g'(x) = 2$ it thus follows that

$$\lim_{x\to 0} \frac{\log(1+3x)}{2x} = \lim_{x\to 0} \frac{3}{2(1+3x)} = \frac{3}{2}.$$

Example 6-7 Determine the derivative $dy/dx$ for each of the following functions $y = f(x)$ where:

(a) $f(x) = \log(3x^2+2)$;

(b) $f(x) = \log\tan 2x$;

(c) $f(x) = 3^x x^2$;

(d) $f(x) = (\sin x)^x$.

Solution (a) Here we must apply Eqn (6-26), with $g(x) = 3x^2 + 2$. As $g'(x) = 6x$ it follows at once that

$$\frac{d}{dx}\left[\log(3x^2+2)\right] = \frac{6x}{3x^2+2}.$$

(b) Again we must use Eqn (6-26), but this time with $g(x) = \tan 2x$. As $g'(x) = 2\sec^2 2x$, we have that

$$\frac{d}{dx}(\log\tan 2x) = \frac{2\sec^2 2x}{\tan 2x} = 2\sec 2x\,\operatorname{cosec} 2x.$$

(c) We have

$$\frac{d}{dx}(3^x x^2) = 3^x\frac{d}{dx}(x^2) + x^2\frac{d}{dx}(3^x)$$

which, by virtue of Eqn (6-25), becomes

$$\frac{d}{dx}(3^x x^2) = 2x\cdot 3^x + x^2\, 3^x \log 3$$

giving

$$\frac{d}{dx}(3^x x^2) = (2x + x^2\log 3)\,3^x.$$

(d) We set $y = (\sin x)^x$ and take logarithms to get $\log y = x\log\sin x$. Now, differentiating, we find that

$$\frac{1}{y}\frac{dy}{dx} = \log\sin x + x\frac{d}{dx}(\log\sin x)$$

or

$$\frac{dy}{dx} = (\sin x)^x(\log\sin x + x\cot x).$$

Partial differentiation involving the logarithmic function is equally straightforward. The final example illustrates a typical situation.

Example 6-8 If $u = x\log[1 + (x/y)] + y\log[1 + (y/x)]$, show that

$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = u.$$

Solution We start by computing $\partial u/\partial x$. It is readily seen that

$$\frac{\partial u}{\partial x} = \log\left(1+\frac{x}{y}\right) + x\frac{\partial}{\partial x}\log\left(1+\frac{x}{y}\right) + y\frac{\partial}{\partial x}\log\left(1+\frac{y}{x}\right) = \log\left(1+\frac{x}{y}\right) + \frac{x}{y}\cdot\frac{1}{1+(x/y)} - \frac{y^2}{x^2}\cdot\frac{1}{1+(y/x)}$$

and so

$$\frac{\partial u}{\partial x} = \log\left(1+\frac{x}{y}\right) + \frac{x}{x+y} - \frac{y^2}{x(x+y)}.$$

The symmetry of $x$ and $y$ in $u$ then allows us to interchange $x$ and $y$ in the above partial derivative in order to derive $\partial u/\partial y$ without further calculation. We obtain

$$\frac{\partial u}{\partial y} = \log\left(1+\frac{y}{x}\right) + \frac{y}{x+y} - \frac{x^2}{y(x+y)}.$$

Hereafter, direct substitution verifies that

$$x\frac{\partial u}{\partial x} + y\frac{\partial u}{\partial y} = u.$$

6-4 Hyperbolic functions

It is useful to define new functions called the hyperbolic sine, written $\sinh x$, and the hyperbolic cosine, written $\cosh x$, which are related to the exponential function. This is achieved as follows.

Definition 6-3 (hyperbolic functions) For all real $x$ we define $\sinh x$ and $\cosh x$ by the requirement that

$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}.$$

It is an immediate consequence of the series for $e^x$ and $e^{-x}$ that

$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \cdots + \frac{x^{2n+1}}{(2n+1)!} + \cdots, \tag{6-27}$$

and

$$\cosh x = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \frac{x^6}{6!} + \cdots + \frac{x^{2n}}{(2n)!} + \cdots. \tag{6-28}$$

Furthermore, it also follows from Definition 6-3 that $\sinh x$ is an odd function and $\cosh x$ is an even function.

We now define the hyperbolic tangent, cotangent, cosecant, and secant, denoted by $\tanh x$, $\coth x$, $\operatorname{cosech} x$, and $\operatorname{sech} x$, as follows.

Definition 6-4

$$\tanh x = \frac{\sinh x}{\cosh x}; \qquad \coth x = \frac{\cosh x}{\sinh x}; \qquad \operatorname{cosech} x = \frac{1}{\sinh x}; \qquad \operatorname{sech} x = \frac{1}{\cosh x}.$$

We illustrate how useful identities may be established directly from Definition 6-3. Let us prove that

$$\sinh a\cosh b + \cosh a\sinh b = \sinh(a+b).$$

Substituting for the hyperbolic sines and cosines from Definition 6-3 we obtain

$$\frac{e^a - e^{-a}}{2}\cdot\frac{e^b + e^{-b}}{2} + \frac{e^a + e^{-a}}{2}\cdot\frac{e^b - e^{-b}}{2} = \frac{e^{(a+b)} - e^{-(a+b)}}{2},$$

which proves our result since $[e^{(a+b)} - e^{-(a+b)}]/2 = \sinh(a+b)$. Similar manipulation establishes the validity of all the identities listed below in Table 6-1.

Table 6-1 Identities for hyperbolic functions

$$\sinh(x \pm y) = \sinh x\cosh y \pm \cosh x\sinh y; \tag{6-29}$$

$$\cosh(x \pm y) = \cosh x\cosh y \pm \sinh x\sinh y; \tag{6-30}$$

$$\cosh^2 x - \sinh^2 x = 1; \tag{6-31}$$

$$\tanh^2 x + \operatorname{sech}^2 x = 1; \tag{6-32}$$

$$1 + \operatorname{cosech}^2 x = \coth^2 x. \tag{6-33}$$

Table 6-2 Derivatives of hyperbolic functions

$$\frac{d}{dx}(\sinh x) = \cosh x; \tag{6-34}$$

$$\frac{d}{dx}(\cosh x) = \sinh x; \tag{6-35}$$

$$\frac{d}{dx}(\tanh x) = \operatorname{sech}^2 x; \tag{6-36}$$

$$\frac{d}{dx}(\coth x) = -\operatorname{cosech}^2 x; \tag{6-37}$$

$$\frac{d}{dx}(\operatorname{cosech} x) = -\operatorname{cosech} x\coth x; \tag{6-38}$$

$$\frac{d}{dx}(\operatorname{sech} x) = -\operatorname{sech} x\tanh x. \tag{6-39}$$

Appeal to Definitions 6-3 and 6-4 together with the differentiability properties of the exponential function establishes Table 6-2, the table of derivatives. The behaviour of the hyperbolic functions is indicated graphically in Fig. 6-2 and for comparison the graphs of $y = \tfrac{1}{2}e^x$ and $y = \tfrac{1}{2}e^{-x}$ have been added to Fig. 6-2 (a).

Fig. 6-2 Hyperbolic functions: (a) $y = \sinh x$ and $y = \cosh x$; (b) $y = \tanh x$; (c) $y = \coth x$; (d) $y = \operatorname{cosech} x$; (e) $y = \operatorname{sech} x$.

Functions inverse to the hyperbolic sine and cosine are introduced through the following definitions.

Definition 6-5 The inverse hyperbolic sine, $\operatorname{arcsinh} x$, and the inverse hyperbolic cosine, $\operatorname{arccosh} x$, are defined by the relationships:

(a) $y = \operatorname{arcsinh} x \iff x = \sinh y$;

(b) $y = \operatorname{arccosh} x \iff x = \cosh y$.

Their derivatives are readily obtained by direct use of this definition and we illustrate the process by deriving $d/dx(\operatorname{arcsinh} x)$. If $y = \operatorname{arcsinh} x$, then $x = \sinh y$ and so, differentiating with respect to $x$, we obtain

$$1 = \cosh y\,\frac{dy}{dx},$$

and so

$$\frac{dy}{dx} = \frac{1}{\cosh y} = \frac{1}{\sqrt{1 + \sinh^2 y}}$$

by virtue of identity (6-31) and the fact that $\cosh y$ is essentially positive. Hence, using the fact that $x = \sinh y$, we find that

$$\frac{d}{dx}(\operatorname{arcsinh} x) = \frac{1}{\sqrt{1+x^2}} \quad \text{for all } x.$$

In the case of $y = \operatorname{arccosh} x$ we must proceed with more care. If $y = \operatorname{arccosh} x$, so that $x = \cosh y$, then, as before, differentiating with respect to $x$ gives

$$1 = \sinh y\,\frac{dy}{dx}$$

or,

$$\frac{dy}{dx} = \frac{1}{\sinh y}.$$

Now from the graph in Fig. 6-2 (a) we see that $\sinh y$ is positive if its argument $\operatorname{arccosh} x > 0$ and negative if $\operatorname{arccosh} x < 0$.
Thus two different inverse functions must be defined. If $\operatorname{arccosh} x > 0$, then

$$\frac{dy}{dx} = \frac{1}{\sinh y} = \frac{1}{\sqrt{\cosh^2 y - 1}} = \frac{1}{\sqrt{x^2 - 1}} \quad \text{for } x > 1.$$

Conversely, if $\operatorname{arccosh} x < 0$, then

$$\frac{dy}{dx} = \frac{1}{\sinh y} = \frac{-1}{\sqrt{\cosh^2 y - 1}} = \frac{-1}{\sqrt{x^2 - 1}} \quad \text{for } x > 1.$$

Other inverse hyperbolic functions are defined similarly and it is left to the reader to verify the remaining entries in Table 6-3. (In many books the inverse function is denoted by a superscript $-1$, when $\sinh^{-1} x$ is written in place of $\operatorname{arcsinh} x$, etc.)

Table 6-3 Derivatives of inverse hyperbolic functions (with $a > 0$)

$$\frac{d}{dx}\left(\operatorname{arcsinh}\frac{x}{a}\right) = \frac{1}{\sqrt{x^2+a^2}}, \quad \text{for all } x; \tag{6-40}$$

$$\frac{d}{dx}\left(\operatorname{arccosh}\frac{x}{a}\right) = \frac{1}{\sqrt{x^2-a^2}}, \quad \text{for } \operatorname{arccosh}\frac{x}{a} > 0 \text{ and } \frac{x}{a} > 1; \tag{6-41}$$

$$\frac{d}{dx}\left(\operatorname{arccosh}\frac{x}{a}\right) = \frac{-1}{\sqrt{x^2-a^2}}, \quad \text{for } \operatorname{arccosh}\frac{x}{a} < 0 \text{ and } \frac{x}{a} > 1; \tag{6-42}$$

$$\frac{d}{dx}\left(\operatorname{arctanh}\frac{x}{a}\right) = \frac{a}{a^2-x^2}, \quad \text{for } x^2 < a^2; \tag{6-43}$$

$$\frac{d}{dx}\left(\operatorname{arccoth}\frac{x}{a}\right) = \frac{a}{a^2-x^2}, \quad \text{for } x^2 > a^2; \tag{6-44}$$

$$\frac{d}{dx}\left(\operatorname{arccosech}\frac{x}{a}\right) = \frac{-a}{|x|\sqrt{x^2+a^2}}, \quad \text{for all } x \neq 0; \tag{6-45}$$

$$\frac{d}{dx}\left(\operatorname{arcsech}\frac{x}{a}\right) = \frac{-a}{x\sqrt{a^2-x^2}}, \quad \text{for } \operatorname{arcsech}\frac{x}{a} > 0 \text{ and } 0 < \frac{x}{a} < 1; \tag{6-46}$$

$$\frac{d}{dx}\left(\operatorname{arcsech}\frac{x}{a}\right) = \frac{a}{x\sqrt{a^2-x^2}}, \quad \text{for } \operatorname{arcsech}\frac{x}{a} < 0 \text{ and } 0 < \frac{x}{a} < 1. \tag{6-47}$$

The following examples are representative of the limiting and differentiability problems encountered with hyperbolic functions.

Example 6-9

(a) Evaluate $\displaystyle\lim_{x\to\infty} \frac{5\sinh 3x + xe^x}{4e^{3x}}$;

(b) Find $f'(x)$ if $f(x) = \sinh\,(x^2 + 3x + 1)^{1/2}$;

(c) Find $f'(x)$ given that $f(x) < 0$ is given by $f(x) = \operatorname{arccosh}(\sin^2 x)$;

(d) Determine $f_x$ and $f_y$ given that $f(x, y) = xy\cosh(x^2 + y^2)$.

Solution (a) From Definition 6-3 it is easily seen that for large $x$, $\sinh 3x \approx \tfrac{1}{2}e^{3x}$. Hence, applying the usual arguments, it follows at once that

$$\lim_{x\to\infty} \frac{5\sinh 3x + xe^x}{4e^{3x}} = \lim_{x\to\infty} \frac{(5e^{3x}/2) + xe^x}{4e^{3x}} = \frac{5}{8}.$$

(b)

$$f'(x) = \left[\cosh\,(x^2+3x+1)^{1/2}\right]\cdot\frac{1}{2}\cdot\frac{(2x+3)}{(x^2+3x+1)^{1/2}}$$

so that

$$f'(x) = \frac{(2x+3)}{2(x^2+3x+1)^{1/2}}\cosh\,(x^2+3x+1)^{1/2}.$$

(c) Set $y = \operatorname{arccosh}(\sin^2 x)$ so that $\sin^2 x = \cosh y$. Differentiation with respect to $x$ then gives

$$2\sin x\cos x = \sinh y\,\frac{dy}{dx}$$

or

$$\frac{dy}{dx} = \frac{2\sin x\cos x}{\sinh y}.$$

As we are told that $y = f(x) < 0$ it then follows that

$$\frac{dy}{dx} = \frac{-2\sin x\cos x}{\sqrt{\cosh^2 y - 1}} = \frac{-2\sin x\cos x}{\sqrt{\sin^4 x - 1}}$$

provided $\sin x \neq 1$.

(d)

$$\frac{\partial f}{\partial x} = y\cosh(x^2+y^2) + xy\,\frac{\partial}{\partial x}\cosh(x^2+y^2) = y\cosh(x^2+y^2) + 2x^2y\sinh(x^2+y^2).$$

Similarly,

$$\frac{\partial f}{\partial y} = x\cosh(x^2+y^2) + 2xy^2\sinh(x^2+y^2).$$

6-5 Exponential function with a complex argument

If we formally replace $x$ by $ix$ in the series expansion of $e^x$ in Theorem 6-1 we obtain

$$e^{ix} = 1 + ix - \frac{x^2}{2!} - i\frac{x^3}{3!} + \frac{x^4}{4!} + i\frac{x^5}{5!} - \frac{x^6}{6!} - \cdots + i^n\frac{x^n}{n!} + \cdots.$$

Clearly $e^{ix}$ is a complex number for any fixed real number $x$ and, writing it in the form $e^{ix} = C(x) + iS(x)$, it follows by equating real and imaginary parts that

$$C(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots$$

and

$$S(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + \cdots.$$

Thus, in fact, if $x$ is regarded as a variable, $S(x)$ and $C(x)$ are functions of $x$ and $e^{ix}$ is, in some sense yet to be properly defined, a function of a complex variable.

Assuming that the series for $C(x)$ may be differentiated term by term it is easily verified that

$$C'(x) = -x + \frac{x^3}{3!} - \frac{x^5}{5!} + \frac{x^7}{7!} - \cdots + (-1)^{n+1}\frac{x^{2n+1}}{(2n+1)!} + \cdots.$$

Next, differentiating $C'(x)$ again with respect to $x$ yields

$$C''(x) = -1 + \frac{x^2}{2!} - \frac{x^4}{4!} + \frac{x^6}{6!} - \cdots + (-1)^{n+1}\frac{x^{2n}}{(2n)!} + \cdots,$$

showing that in fact $C''(x) = -C(x)$. Now, setting $x = 0$ in the series for $C(x)$ and $C'(x)$, we find that $C(0) = 1$ and $C'(0) = 0$. Hence the function $C(x)$ is seen to be the solution of the special differential equation

$$y'' + y = 0$$

with $y(0) = 1$ and $y'(0) = 0$. This same differential equation with the conditions on $y$ was encountered in Example 5-13 (a), where it was derived as the equation satisfied by $y = \cos x$ and its derivatives.
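As an informal numerical companion to the series for $C(x)$ and $S(x)$, the following Python sketch sums a finite number of terms of each series and prints the results beside the library cosine and sine. The function names `C` and `S` and the choice of 20 terms are illustrative assumptions, not part of the text; the comparison is a check on the arithmetic, not a substitute for the differential-equation argument.

```python
import math


def C(x, terms=20):
    """Partial sum of 1 - x^2/2! + x^4/4! - ... using `terms` terms."""
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(terms))


def S(x, terms=20):
    """Partial sum of x - x^3/3! + x^5/5! - ... using `terms` terms."""
    return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))


# Tabulate the partial sums next to math.cos and math.sin.
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, C(x), math.cos(x), S(x), math.sin(x))
```

For moderate values of $x$ the partial sums agree with the library values to within rounding error, which is what the identification of the two series with the cosine and sine would lead one to expect.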
Thus the function $C(x)$ is, in reality, the function $\cos x$. An analogous argument establishes that $S(x) = \sin x$. On account of this identification of $C(x)$ and $S(x)$ we may write

$$e^{ix} = \cos x + i\sin x. \tag{6-48}$$

As a direct consequence of replacing $x$ by $-x$ in Eqn (6-48) and using the fact that $\cos x$ is even, but $\sin x$ is odd, we find that

$$e^{-ix} = \cos x - i\sin x. \tag{6-49}$$

Combination of Eqns (6-48) and (6-49) leads to the following definitions of the sine and cosine functions.

Definition 6-6

$$\sin x = \frac{e^{ix} - e^{-ix}}{2i} \quad \text{and} \quad \cos x = \frac{e^{ix} + e^{-ix}}{2}.$$

Comparison of Eqns (4-15) and (6-48) shows that $e^{ix}$ represents a complex number of unit modulus lying on the unit circle drawn about the origin. The argument of $e^{ix}$ is $x$.

Slightly more general than Eqn (6-48) is the complex number $e^{(x+iy)}$ for, by the property of indices together with Eqn (6-48), we have

$$e^{(x+iy)} = e^x\cdot e^{iy} = e^x(\cos y + i\sin y), \tag{6-50}$$

showing that

$$\left|e^{(x+iy)}\right| = e^x \quad \text{and} \quad \arg e^{(x+iy)} = y. \tag{6-51}$$

Thus the modulus-argument form of a general non-zero complex number $z$ may be written

$$z = re^{i\theta}, \quad \text{where } r = |z| \text{ and } \theta = \arg z. \tag{6-52}$$

This is, of course, an alternative form of Eqn (4-15).

As it is true for any exponent $\alpha$ that $(a^x)^{\alpha} = a^{\alpha x}$, it follows that $(e^{ix})^{\alpha} = e^{i\alpha x}$, so that from Eqn (6-48) we arrive at the result

$$(\cos x + i\sin x)^{\alpha} = \cos\alpha x + i\sin\alpha x. \tag{6-53}$$

This is simply de Moivre's theorem (Theorem 4-2) for any exponent $\alpha$ and not just for the integral values used in the first proof of this important theorem.

To close, let us apply these results to give an alternative derivation of the results of Example 4-10, and also to express $\sin^n\theta$ and $\cos^n\theta$ in terms of sums involving $\sin r\theta$ and $\cos r\theta$, as promised in that example. As in Chapter 4, the argument is best presented by example.

Example 6-10

(a) Express $\sin n\theta$ and $\cos n\theta$ in terms of $\cos\theta$ and $\sin\theta$. Deduce the form taken by the result when $n = 4$.

(b) Express $\cos^7\theta$ in terms of $\cos r\theta$.

(c) Express $\sin^5\theta$ in terms of $\sin r\theta$.
Solution (a)

$$\cos n\theta = \operatorname{Re}(e^{in\theta}) = \operatorname{Re}[(e^{i\theta})^n] = \operatorname{Re}[(\cos\theta + i\sin\theta)^n],$$

$$\sin n\theta = \operatorname{Im}(e^{in\theta}) = \operatorname{Im}[(e^{i\theta})^n] = \operatorname{Im}[(\cos\theta + i\sin\theta)^n].$$

When $n = 4$ we have

$$(\cos\theta + i\sin\theta)^4 = \cos^4\theta + 4i\cos^3\theta\sin\theta - 6\cos^2\theta\sin^2\theta - 4i\cos\theta\sin^3\theta + \sin^4\theta.$$

Hence

$$\cos 4\theta = \operatorname{Re}[(\cos\theta + i\sin\theta)^4] = \cos^4\theta - 6\cos^2\theta\sin^2\theta + \sin^4\theta$$

and

$$\sin 4\theta = \operatorname{Im}[(\cos\theta + i\sin\theta)^4] = 4(\cos^3\theta\sin\theta - \cos\theta\sin^3\theta).$$

(b) From Definition 6-6 we may write

$$\cos^7\theta = \left(\frac{e^{i\theta} + e^{-i\theta}}{2}\right)^7.$$

Expanding the right-hand side by the Binomial theorem, simplifying and grouping terms, we obtain

$$\cos^7\theta = \frac{1}{2^6}\left(\frac{e^{7i\theta} + e^{-7i\theta}}{2} + 7\,\frac{e^{5i\theta} + e^{-5i\theta}}{2} + 21\,\frac{e^{3i\theta} + e^{-3i\theta}}{2} + 35\,\frac{e^{i\theta} + e^{-i\theta}}{2}\right).$$

Again using Definition 6-6, we see that this immediately simplifies to

$$\cos^7\theta = \frac{1}{64}(\cos 7\theta + 7\cos 5\theta + 21\cos 3\theta + 35\cos\theta).$$

(c) From Definition 6-6 we may write

$$\sin^5\theta = \left(\frac{e^{i\theta} - e^{-i\theta}}{2i}\right)^5.$$

Expanding the right-hand side, simplifying and grouping terms gives

$$\sin^5\theta = \frac{1}{2^4}\left(\frac{e^{5i\theta} - e^{-5i\theta}}{2i} - 5\,\frac{e^{3i\theta} - e^{-3i\theta}}{2i} + 10\,\frac{e^{i\theta} - e^{-i\theta}}{2i}\right).$$

Again appealing to Definition 6-6, we see that this immediately reduces to

$$\sin^5\theta = \frac{1}{16}(\sin 5\theta - 5\sin 3\theta + 10\sin\theta).$$

A variant of the method used here and in example (b) above is to be found outlined in Problems 6-37 and 6-38.

PROBLEMS

Section 6-1

6-1 Solve the differential equation $dy/dx = y$, with $y(0) = c$, as in Section 6-1, by substituting

$$y = \sum_{r=0}^{\infty} a_r x^r.$$

Hence deduce that, provided $c \neq 0$, the differential equation has the non-trivial solution $y = ce^x$.

6-2 The function $y = e^{-x}$ satisfies the differential equation $dy/dx = -y$, with $y(0) = 1$. Use the method of the previous problem to verify the series solution.

6-3 It follows from the argument preceding Eqn (6-16) in Section 6-1 that

$$0 < S_n - S_R < \frac{x^{R-1}}{(R-1)!},$$

where the integer $R > 2x$. Use this result to deduce the least number of terms that must be included in the series expansion of $e^2$ in order that the error involved is less than 0.01.
PROBLEMS / 297 6-4 Evaluate the following limits: 4e 2j + xe x + 3 (a) lim (b) lim ^ x 5xe 3x + c*+ 1 (x 2 + l)e 3 * + e z + 1, (2x 2 - 3x + l)e 3 * (2 - x 2 )f + 3 _ V^T+v + xy**' ,„ ,. 3(2 e~ 3 * + x 2 + 1 ) (d) ^o 4e* + 2*+l • 6-5 Make use of the series expansion of c x to evaluate the following limits and verify your result by using Theorem 5- 14: (a) lim — ; 1 — e~ x (b) lim -^-r-; 3-_>o sin Ax , , .. & -\-x 6-6 Differentiate the following functions: (a)/(x) = 2e*cosx; (b) /"(jc) = e 3 -* arcsin x; (c)'/(x) = e*/x 2 ; (d) /(x) = e* 8lM . 6-7 Differentiate the following functions: (a) f{x) — arcsin e 2 *; (b)/(*) = v '(*e* + x); (c)/(x) = sinOe*+ 2); (d) /(*) = (e* - l)/(e* + 1). Section 6-2 6-8 Differentiate the following functions: (a) /**) = 3 exp [-(** + *+ 1)]; (b) f(x) = e si " 2j: ; (c) /(jc) = cos [exp (x sin x + 2)]. 6-9 Find the second derivatives of the following functions: (a)/(x) = e 3 * 2 ; (b)/(x) = sin(l + e 2 *); (c) f(x) = e sinr . 298 / EXPONENTIAL, HYPERBOLIC, AND LOGARITHMIC FUNCTIONS CH 6 6- 10 Consider the function f(x) defined as follows: (e- 1 ^ 2 for x # 0, /to = , J 10 for x = 0. Clearly the differentiability properties of this function at the origin must be deduced directly from the definition of a derivative. To deduce these properties show first that for x i= 0, it follows that /'to = |e-^ 2 . Then, by using Definition 5-2 together with Theorem 6-3, prove that/'(0) = 0, and hence deduce that \imf'(x) = f'(0) = 0. x-*0' Finally, deduce that in general, /<">(*) = e~ 1/xi x (Polynomial in Ijx), and hence by using an inductive argument prove that / <n, (0) = for all n. This is an example of a function which is capable of differentiation an arbitrary number of times for all x, and yet which has every derivative equal to zero at one point of its domain of definition. 611 Find SfjBx and dfjdy, given that f(x, y) = e sin (vlx \ 6 12 Show that u == xy + xt ylx satisfies the equation x-^+y— = xy + u. 
ox y Sy y 6- 13 Find d//d/, given that f(x, y) = e 2 * +7 *' where x = cos t, y = sin t. 6 14 Find df/8u and Sfjdv if x f(x, y) = 2 arctan - y with x = u sin v and y = u cos v. Section 6-3 6-15 Evaluate the following limits: , , ,. (x- l)logx 2 (a) lifn v \ s ; x— *cc X (b) lim **££»■. X-+K, Ax + 1 PROBLEMS / 299 (c) lim z—0 log (3 sin x) — log [(1 + x) sin x] It* - 1 (d) lim [log (3* + 1) - log (2x + 5)]; (e) lim J-*CO log (1 + 2e*) 616 Let/(x) and g{x) be functions such tha,t lim/(x) = and lim^(x) = but x— >a x-*a lim 44 = *■■ Then lim lo § H + y (x)] = ]im log [1 + /(x)] 1 ''/**) = lim log {[1 + /(x)]i//(*>}/MArt*>. However, it follows from Chapter 3, Section 3 that lim [1 + /(jc)] 1 '-^ e* e, so that , im ] °g [1 +/ (X)] = «m log e™> = Hnn ^ = *• Apply this result to evaluate the following limits : log(l +lx) _ 2x log (1 + 3 sinx) (a) , im !5S0±H). x^o 2x (b) lim x-*0 (c) lim *— log[l - 2sin 2 (x/2)], use your result to deduce lim (cos x) 1,x . a-—0 617 Apply Theorem 514 to evaluate the limits in Problem 6-16. 618 Differentiate the following functions: (a)/(x) = logO 3 + 7x 2 + 2); (b) f(x) = log sin 2x; I x — V (c) f{x) = log cos 619 If v = [f(x)]o^ then, taking the natural logarithm, log y = £-(x)log/(x). Hence, differentiating with respect to x, it follows that dy dx ^x)log/(x)+<g/'(x) [/(x))^> Use this result to differentiate the following functions ; (a) y = x x ; (b) y = (sin2x)*; (c) v = x* inx ; (d) 'y = 10 lo 8 sin *. 300 / EXPONENTIAL, HYPERBOLIC, AND LOGARITHMIC FUNCTIONS CH 6 6-20 If u = x log (1 + xjy) + y log (1 + yjx) a 82 " o^ 2 " show that x 1 -— = y 2 — . 8x 2 J dy 2 6-21 Find the total derivative dz given that z = log (x 2 + 2y 2 ). 6-22 Show that the function /(*> y) = arctan y/x + log (x 2 + y 2 ) satisfies the equation dx 2 dy 2 6-23 By taking logarithms deduce du/dx, dujdy, and dujdz if u = (xy) 2 . Section 6-4 6-24 Use the definitions to establish the form taken by : (a) sinh x; (b) cosh x; (c) tanh x; when x is large. 
Distinguish between x large and positive and x large and negative. 6-25 Prove by means of the definition that (cosh x + sinh x) n = cosh nx + sinh nx. 6a26 Use the definitions to verify any three of the identities contained in Table 6-1. 6-27 Prove by means of the definitions that : (a) 2 sinh x cosh y = sinh {x + y) + sinh (x — y); (b) 2 cosh x cosh y = cosh (x + y) + cosh (x — y); (c) 2 sinh x sinh y = cosh (x + y) — cosh (X — y). 6-28 Verify any three of the entries in Table 6-2. «i/29 Verify the derivatives of arccosech x/a and arcsech x\a given in Table 6-3. 6-30 Evaluate the following limits, using the series (6-27) and (6-28) where necessary: x 3 cosh 2x + e x (a) lim X— »-oo (b) lim X— *■ — (c) lim x->0 (d) lim (2x 3 + x + l)e 2 * + * 3 e- 2 *' x 3 cosh 2x + e* ,o (2x 3 + ^ + l)e 2 * + x 3 e- 2 *' sinh ax X-+0 x 1 — cosh 2x x^o 3x 2 6-31 Differentiate the following functions: (a) f(x) = sinh 2x cosh 2 x; (b) f(x) = exp (1 + cosh 3x); PROBLEMS / 301 (c)/X*)=«log(tanhx); (d) f{x) = arcsech (x 2 + J) if /(*) > 0; (e) f{x) = cosh (sin 2x). 6-32 Evaluate dujdx and Suj8y given that : (a) uix, y) = sin x cosh xy\ (b) «(x, j) = sinh (x 2 + x sin y + 3y 2 ) ; (c) u(x,y) = xcosh^+2^ Section 6-5 6-33 Establish by means of the definitions that: (a) sin (Jz) = i sinh z; (b) cos (;'z) = coshz; (c) sinh(z'z) = /sin z; (d) cosh (iz) = cos z. 6-34 Given that a, b are positive real numbers, deduce four trigonometric identities by equating real and imaginary parts in each of the following results Qia gib = Qi(a+b) and e* a . e - *^ = £*(«-&), 6-35 Express the following complex numbers in the form re i6 : (a) 1 + i; (b) 1 - /; (c) -80V3 - 1); (d) (-1+/) 8 ; (e) (5+ 140/(4 + /). 6-36 Show by means of de Moivre's theorem that : (a) 32 cos fr 6 = 10 + 15 cos 26 + 6 cos 46 + cos 66; (b) sin 70 = 7 sin 6 - 56 sin 3 6 + 112 sin 5 6-64 sin 7 6. 
6-37 Verify that if $z = e^{i\theta}$, then

$$\cos\theta = \frac{1}{2}\left(z + \frac{1}{z}\right) \quad \text{and} \quad \sin\theta = \frac{1}{2i}\left(z - \frac{1}{z}\right)$$

and, more generally,

$$\cos r\theta = \frac{1}{2}\left(z^r + \frac{1}{z^r}\right) \quad \text{and} \quad \sin r\theta = \frac{1}{2i}\left(z^r - \frac{1}{z^r}\right).$$

By replacing $\cos\theta$ and $\sin\theta$ by their equivalent expressions involving $z$, make use of these results to express $\cos^2\theta\sin^3\theta$ in terms of $\sin n\theta$.

6-38 Use the method of Problem 6-37 to express $\sin^8\theta$ in terms of $\cos n\theta$.

6-39 Consider the function $\cosh z$, where $z = x + iy$. Then, using Definition 6-3, deduce that $\cosh z = 0$ when $z = (2n+1)\pi i/2$, with $n = 0, \pm 1, \pm 2, \ldots$. Use the results of Problem 6-33 to deduce the zeros of $\cos z$.

6-40 Consider the function $\sin z$, where $z = x + iy$. Then, using Definition 6-6, deduce that $\sin z = 0$ when $z = n\pi$, with $n = 0, \pm 1, \pm 2, \ldots$. Use the results of Problem 6-33 to deduce the zeros of $\sinh z$.

Fundamentals of integration

7-1 Definite integrals and areas

The work of this chapter is concerned with the theory of the operation known as integration, which occupies a central position in the calculus. The connection between differentiation and integration is basic to the whole of the calculus and is contained in a result we shall prove later known as the fundamental theorem of calculus. Once again, limiting operations will play an essential part in the development of our argument. In fact we will show not only how they enable a satisfactory general theory of integration to be established, but also how they provide a tool, albeit a clumsy one, for the actual integration of functions. However, aside from a number of simple but important examples, the practical details of the evaluation of integrals of specific classes of function will be deferred until Chapter 8.

We begin by seeking to determine the shaded area $I$ of Fig. 7-1 which is interior to the region bounded above and below by the curve $y = f(x)$ and the $x$-axis, respectively, and to the left and right by the lines $x = a$, $x = b$.
This approach will lead naturally to what is called the definite integral of $f(x)$ over the interval $a \le x \le b$, and it illustrates a valuable geometrical interpretation of the process of integration. Although we use the definite integral to give precise meaning to the notion of the area contained within a closed curve, this appeal to geometry is not actually necessary when defining an integral. Indeed, we shall also show how a purely analytical definition of a definite integral, quite independent of any geometrical arguments, may be formulated.

Fig. 7-1 Area $I$ defined by $y = f(x)$.

Let $f(x)$ be a non-negative continuous function defined in the closed interval $[a, b]$ and consider, for a moment, the conceptual problem that arises when trying to determine the area $I$ defined by it in Fig. 7-1. The only simple plane geometrical figure for which the concept of area is defined in an elementary and unambiguous manner is the rectangle, so that we shall seek to define the area $I$ in terms of the limit of a sum of rectangular areas. It should perhaps be remarked at this point that the derivation of the formula $\pi r^2$ for the area of a circle of radius $r$ involves the concept of integration, although this is invariably avoided in any first encounter by the employment of arguments that are at best only plausible.

We shall start our discussion from the postulates that (a) the area of a rectangle is given by the product length $\times$ breadth, (b) the area of the union of two non-overlapping rectangles is the sum of their separate areas, and (c) if a rectangle is divided into two parts by a curve, then the sum of the separate non-rectangular areas comprising these two parts is equal to the area of the rectangle.

On the basis of postulate (c), we at once see that the area $I$ in Fig. 7-1 exceeds the rectangular area ABEF, but is less than the rectangular area ACDF.
Letting $m$, $M$ denote, respectively, the minimum and maximum values attained by $f(x)$ in $[a, b]$, this result becomes

$$m(b - a) \le I \le M(b - a). \tag{7-1}$$

This inequality, although interesting, must obviously be refined if it is ever to lead to the actual value of $I$. In principle, our approach will be simple, for we shall begin by dividing $[a, b]$ into $n$ adjacent sub-intervals in each of which an inequality of type (7-1) will apply, after which we shall use postulate (b) to find better upper and lower bounds for $I$.

Specifically, we start by choosing any sequence of $n + 1$ numbers $x_0, x_1, \ldots, x_n$ subject only to the requirements that $x_0 = a$, $x_n = b$, and

$$x_0 < x_1 < \cdots < x_{n-1} < x_n.$$

The sequence $\{x_r\}_{r=0}^{n}$ so defined is called a partition $P$ of the interval $[a, b]$, and for any given value of $n$ it is obviously not unique. Next, on each sub-interval $[x_{i-1}, x_i]$, let the function $f(x)$ attain a minimum value $m_i$ and a maximum value $M_i$ and denote the length of the $i$th sub-interval by $\Delta_i$, so that $\Delta_i = x_i - x_{i-1}$.

We now define numbers $\underline{S}_P$ and $\overline{S}_P$ called, respectively, the lower and upper sums taken over the partition $P$, by the expressions

$$\underline{S}_P = m_1\Delta_1 + m_2\Delta_2 + \cdots + m_n\Delta_n = \sum_{r=1}^{n} m_r\Delta_r \tag{7-2}$$

and

$$\overline{S}_P = M_1\Delta_1 + M_2\Delta_2 + \cdots + M_n\Delta_n = \sum_{r=1}^{n} M_r\Delta_r. \tag{7-3}$$

Clearly, as Figs. 7-2 (a), (b) illustrate, $\underline{S}_P$ and $\overline{S}_P$ are, respectively, under- and over-estimates of the area $I$. The fact that $\underline{S}_P \le \overline{S}_P$ is apparent on geometrical grounds, but it also follows without appeal to geometry by considering the difference

$$\overline{S}_P - \underline{S}_P = (M_1 - m_1)\Delta_1 + (M_2 - m_2)\Delta_2 + \cdots + (M_n - m_n)\Delta_n. \tag{7-4}$$

Fig. 7-2 (a) Shaded area represents lower sum $\underline{S}_P$; (b) shaded area represents upper sum $\overline{S}_P$.

In this equation we have, by definition, $\Delta_r > 0$ and $M_r \ge m_r$ for $r = 1, 2, \ldots, n$, so that $\overline{S}_P - \underline{S}_P \ge 0$ or, $\underline{S}_P \le \overline{S}_P$, and thus by postulate (c),

$$\underline{S}_P \le I \le \overline{S}_P. \tag{7-5}$$

It would seem reasonable to suppose that as the number $n$ of points in a partition increases, provided the lengths of all intervals shrink to zero, the limit of both the lower and upper sums must be $I$, the desired area. We prove this in two stages, first considering the effect on the lower and upper sums of the refinement of the partition $P$ by the inclusion of extra points.

It will suffice here to consider only the effect of the inclusion of one extra point $x_r'$ between $x_{r-1}$ and $x_r$ in the partition $P$. The resulting partition $P'$ is called a refinement of $P$, in the sense that although $P'$ has more points than $P$, all points of $P$ are also points of $P'$.

Suppose that in the intervals $[x_{r-1}, x_r']$ and $[x_r', x_r]$ the function $f(x)$ attains the minimum values $m_r'$ and $m_r''$, respectively. Then the effect of the extra point is to replace the term $m_r(x_r - x_{r-1})$ in the lower sum $\underline{S}_P$ by the sum $m_r'(x_r' - x_{r-1}) + m_r''(x_r - x_r')$, thereby generating the sum $\underline{S}_{P'}$ appropriate to the refinement $P'$ of the partition $P$. As it must be true that $m_r \le m_r'$ and $m_r \le m_r''$, it thus follows that

$$m_r'(x_r' - x_{r-1}) + m_r''(x_r - x_r') \ge m_r(x_r - x_{r-1}),$$

whence

$$\underline{S}_P \le \underline{S}_{P'}. \tag{7-6}$$

Identical reasoning involving the maxima $M_r'$ and $M_r''$ attained by $f(x)$ in the intervals $[x_{r-1}, x_r']$ and $[x_r', x_r]$ establishes that

$$\overline{S}_{P'} \le \overline{S}_P. \tag{7-7}$$

Fig. 7-3 Effect of refinement of a partition: (a) area inequality on interval $[x_{r-1}, x_r]$ of $P$; (b) area inequality on interval $[x_{r-1}, x_r]$ of $P'$.

The inequalities leading to results (7-6) and (7-7) are illustrated geometrically in Fig. 7-3. Thus in Fig. 7-3 (a) the area inequalities associated with the interval $[x_{r-1}, x_r]$ of $P$ are displayed, whilst in Fig. 7-3 (b) the corresponding situation is displayed for the refinement $P'$ produced by inserting an additional point $x_r'$ in $[x_{r-1}, x_r]$.
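The effect of a single extra point can also be seen in a small computation. The Python sketch below is illustrative only: the function $x^2$ on $[1, 2]$, the particular partitions, and the helper name `lower_upper_sums` are assumptions chosen for the illustration. Because the chosen function is increasing, its minimum and maximum on each sub-interval are taken at the left and right end-points, so no search for extrema is needed.

```python
def lower_upper_sums(f, points):
    """Return (lower sum, upper sum) of an increasing function f.

    For an increasing f the minimum on [x_{r-1}, x_r] is f(x_{r-1}) and
    the maximum is f(x_r).
    """
    pairs = list(zip(points, points[1:]))
    lower = sum(f(left) * (right - left) for left, right in pairs)
    upper = sum(f(right) * (right - left) for left, right in pairs)
    return lower, upper


def square(x):
    return x * x


P = [1.0, 1.5, 2.0]               # a partition of [1, 2]
P_refined = [1.0, 1.2, 1.5, 2.0]  # the same partition with 1.2 inserted

lower_P, upper_P = lower_upper_sums(square, P)
lower_R, upper_R = lower_upper_sums(square, P_refined)

print("P        :", lower_P, upper_P)
print("refined P:", lower_R, upper_R)
```

In this instance the lower sum rises and the upper sum falls when the extra point is inserted, in the manner that the refinement inequalities describe.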
The further refinement of the partition $P'$ by the inclusion of additional points only serves to reinforce results (7-6) and (7-7). We have thus established that if the partitions $P_1, P_2, \ldots, P_m$ are successive refinements of the partition $P$, then

$$m(b - a) \le \underline{S}_{P_1} \le \underline{S}_{P_2} \le \cdots \le \underline{S}_{P_m} \le I \le \overline{S}_{P_m} \le \overline{S}_{P_{m-1}} \le \cdots \le \overline{S}_{P_1} \le M(b - a). \tag{7-8}$$

Expressed in words, the effect of refinement of a partition is to increase the corresponding lower sum and to decrease the corresponding upper sum, so that $\{\underline{S}_{P_r}\}$ is a monotonic increasing sequence of numbers, and $\{\overline{S}_{P_r}\}$ is a monotonic decreasing sequence of numbers.

For the second and final stage of our argument we introduce the norm $\|\Delta\|_P$ of a partition $P$ by means of the definition

$$\|\Delta\|_P = \max_i\,(x_i - x_{i-1}). \tag{7-9}$$

That is to say, for any partition $P$ of the interval $[a, b]$, the norm $\|\Delta\|_P$ is the length of the longest sub-interval of $[a, b]$ produced by the partition.

Let us consider a sequence of partitions which are successive refinements of $P$ and are such that

$$\lim_{m\to\infty} \|\Delta\|_{P_m} = 0.$$

Then by the postulate of Section 3-2, as $\{\underline{S}_{P_m}\}$ is monotonic increasing and bounded above it must tend to a limit $\underline{S}$ and, similarly, as $\{\overline{S}_{P_m}\}$ is monotonic decreasing and bounded below it must tend to a limit $\overline{S}$, where

$$\underline{S} \le I \le \overline{S}. \tag{7-10}$$

To show that $\underline{S} = \overline{S}$, as would be expected, observe that if

$$\delta_P = \max_i\,(M_i - m_i) \quad \text{for all } i,$$

then Eqn (7-4) gives rise to the inequality

$$\overline{S}_P - \underline{S}_P \le \delta_P(\Delta_1 + \Delta_2 + \cdots + \Delta_n) = \delta_P(b - a). \tag{7-11}$$

Hence, for any sequence of partitions $P_1, P_2, \ldots, P_m, \ldots$ which are refinements of $P$ with the property that $\|\Delta\|_{P_m} \to 0$, it follows from the continuity of $f(x)$ that $\delta_{P_m} \to 0$, thereby showing that $\{\overline{S}_{P_m} - \underline{S}_{P_m}\}$ is a null sequence. Thus $\{\underline{S}_{P_m}\}$ and $\{\overline{S}_{P_m}\}$ both have the same limit.
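The way in which the bound $\delta_P(b - a)$ controls the difference between the upper and lower sums can be tabulated numerically. The following Python sketch is a hedged illustration, not part of the argument: the function $x^3$ on $[0, 1]$, the helper names `uniform_partition` and `gap_and_bound`, and the use of equal sub-intervals with $n = 2, 4, 8, 16, 32$ (each partition being a refinement of the one before) are all assumptions made for the example.

```python
def uniform_partition(a, b, n):
    """Partition of [a, b] into n equal sub-intervals."""
    return [a + (b - a) * r / n for r in range(n + 1)]


def gap_and_bound(f, points):
    """For an increasing f return (upper sum - lower sum, delta_P * (b - a)).

    Since f is increasing, M_i - m_i = f(x_i) - f(x_{i-1}) on each
    sub-interval, and delta_P is the largest of these differences.
    """
    pairs = list(zip(points, points[1:]))
    gap = sum((f(right) - f(left)) * (right - left) for left, right in pairs)
    delta = max(f(right) - f(left) for left, right in pairs)
    return gap, delta * (points[-1] - points[0])


def cube(x):
    return x ** 3


# Doubling n keeps every earlier partition point, so each partition
# is a refinement of its predecessor and the norm 1/n tends to zero.
results = []
for n in (2, 4, 8, 16, 32):
    gap, bound = gap_and_bound(cube, uniform_partition(0.0, 1.0, n))
    results.append((n, gap, bound))
    print(n, gap, bound)
```

In each row the difference of the sums lies below the bound, and both columns shrink as the norm of the partition decreases, which is the behaviour the null-sequence argument relies on.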
Taken in conjunction with Eqn (7-10), we have proved that because of the continuity of $f(x)$, the limit of the lower sums is equal to the limit of the upper sums, and each is equal to the limit $I$ which has been interpreted as the shaded area in Fig. 7-1.

The limiting argument used above certainly suffices to define the area $I$, but before formulating our definition of the definite integral, let us first make a useful generalization of our argument. With the partition $P$ used earlier associate any set of $n$ numbers $\xi_1, \xi_2, \ldots, \xi_n$ for which it is true that

$$x_0 \le \xi_1 \le x_1, \quad x_1 \le \xi_2 \le x_2, \quad \ldots, \quad x_{n-1} \le \xi_n \le x_n.$$

Now form the approximating sum $S_P$ defined by

$$S_P = f(\xi_1)\Delta_1 + f(\xi_2)\Delta_2 + \cdots + f(\xi_n)\Delta_n. \tag{7-12}$$

Then because $m_i \le f(\xi_i) \le M_i$, it follows at once that

$$\underline{S}_{P_m} \le S_{P_m} \le \overline{S}_{P_m}, \tag{7-13}$$

for all refinements $P_1, P_2, \ldots, P_m, \ldots$ of the partition $P$. Consequently, since $\lim \underline{S}_{P_m} = \lim \overline{S}_{P_m} = I$, it follows immediately from Theorem 3-6 that $\lim S_{P_m} = I$.

This important result asserts that if $f(x)$ is continuous on $[a, b]$, then as the partition is refined, so the corresponding upper and lower sums $\overline{S}_{P_m}$, $\underline{S}_{P_m}$ and the approximating sum $S_{P_m}$ all converge to the same limit. We now state this as our first fundamental theorem which forms the basis of our development of the integral.

Theorem 7-1 (first limit theorem for sums defined on a partition) Let $f(x)$ be a continuous non-negative function on the closed interval $a \le x \le b$, and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some partition $P$ of $[a, b]$ with the property that $\lim_{m\to\infty} \|\Delta\|_{P_m} = 0$. Then, if $\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$, and $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are respectively the lower and upper sums associated with $P_m$, it follows that

$$\lim_{m\to\infty} \underline{S}_{P_m} = \lim_{m\to\infty} \overline{S}_{P_m} = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$

This theorem suggests the following form of definition for the definite integral.
Definition 7-1 (definite integral of a continuous non-negative function) Let $f(x)$ be a continuous non-negative function on the closed interval $a \le x \le b$, and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some partition $P$ of $[a, b]$ with the property that $\lim_{m\to\infty} \|\Delta\|_{P_m} = 0$. Then, if $\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$, the definite integral of $f(x)$ integrated over the interval $[a, b]$, and written symbolically

$$\int_a^b f(x)\,dx,$$

is defined to be

$$\int_a^b f(x)\,dx = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i.$$

In the context of a definite integral, the function $f(x)$ is called the integrand, the numbers $a$, $b$ are called the lower and upper limits of integration, respectively, and the sign $\int$ itself is called the integral sign.

In summary then, a definite integral of a positive continuous function $f(x)$ integrated over the interval $[a, b]$ is a positive number defined by means of a limiting process. It may be interpreted geometrically as the shaded area $I$ below the curve $y = f(x)$ as shown in Fig. 7-1.

To show that this is a working definition, in the sense that it can be used to yield a useful answer, let us now apply it to a simple function.

Example 7-1 Evaluate the definite integral

$$\int_a^b x^2\,dx, \quad \text{where } a < b.$$

Solution As $x^2$ is everywhere continuous and is non-negative on the stated interval Definition 7-1 applies. Thus we start by considering a convenient partition $P_n$ in which $[a, b]$ is divided into $n$ equal sub-intervals, each of length $\Delta = (b - a)/n$. Then, if for convenience we identify $\xi_i$ with the right-hand end-point of the $i$th sub-interval, we have $\xi_1 = a + \Delta$, $\xi_2 = a + 2\Delta$, $\xi_3 = a + 3\Delta$, $\ldots$, $\xi_n = a + n\Delta$. Hence, from Definition 7-1,

$$I = \lim_{n\to\infty} \sum_{i=1}^{n} (a + i\Delta)^2\Delta.$$

Expanding and grouping the terms of the summation then gives

$$I = \lim_{n\to\infty}\left[na^2\Delta + 2a\Delta^2(1 + 2 + 3 + \cdots + n) + \Delta^3(1^2 + 2^2 + 3^2 + \cdots + n^2)\right].$$
Using the fact that $\Delta = (b - a)/n$ together with the well-known results

\[ 1 + 2 + 3 + \cdots + n = \frac{n}{2}(n + 1) \quad \text{and} \quad 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}, \]

it follows that

\[ I = \lim_{n\to\infty} \left[ a^2 (b - a) + a (b - a)^2 \left( \frac{n + 1}{n} \right) + \frac{1}{6} (b - a)^3 \, \frac{(n + 1)(2n + 1)}{n^2} \right]. \]

Thus, taking the limit, we find $I = \tfrac{1}{3}(b^3 - a^3)$, and so

\[ \int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3). \]

In terms of numbers, if $a = 1$, $b = 2$, then

\[ \int_1^2 x^2\,dx = \tfrac{1}{3}(2^3 - 1^3) = \tfrac{7}{3}. \]

When the behaviour of $f(x)$ is monotonic over the interval $a \le x \le b$, then Theorem 7-1 coupled with Definition 7-1 can often be used to derive interesting and useful series approximations to the definite integral as the following example illustrates.

Example 7-2 Show that

\[ \sum_{r=1}^{n} \left( \frac{1}{n + r - 1} \right) > \int_1^2 \frac{dx}{x} > \sum_{r=1}^{n} \left( \frac{1}{n + r} \right). \]

Solution In this case $f(x) = 1/x$, which is continuous, positive, and monotonic decreasing on the interval $[1, 2]$ so that Theorem 7-1 and Definition 7-1 apply. We again choose a partition $P_n$ which divides the interval $[1, 2]$ into $n$ equal sub-intervals of length $\Delta = 1/n$. The general point $x_r$ in the partition $P_n$ is, of course, $x_r = 1 + r/n$ so that

\[ f(x_r) = \frac{n}{n + r}. \]

Thus as $f(x)$ is monotonic decreasing, it follows that on the interval $[x_{r-1}, x_r]$, $f(x)$ attains its maximum value $M_r$ at $x_{r-1}$ and its minimum value $m_r$ at $x_r$, where

\[ M_r = \frac{n}{n + r - 1} \quad \text{and} \quad m_r = \frac{n}{n + r}. \]

Hence

\[ \underline{S}_{P_n} = \sum_{r=1}^{n} \left( \frac{n}{n + r} \right) \frac{1}{n} \quad \text{and} \quad \overline{S}_{P_n} = \sum_{r=1}^{n} \left( \frac{n}{n + r - 1} \right) \frac{1}{n}, \]

so that from Theorem 7-1 and Definition 7-1, we deduce that

\[ \sum_{r=1}^{n} \left( \frac{1}{n + r - 1} \right) > \int_1^2 \frac{dx}{x} > \sum_{r=1}^{n} \left( \frac{1}{n + r} \right). \]

A few numbers might help here, so we show in the table below the behaviour of the upper and lower sums $\overline{S}_{P_n}$ and $\underline{S}_{P_n}$ as a function of $n$.

    n      upper sum    lower sum
    5      0.7456       0.6456
    10     0.7188       0.6688
    15     0.7101       0.6768
    ∞      0.6931       0.6931

We shall discover later that the exact result, which is shown in this table against the entry $n = \infty$, is in fact $\log_e 2$.
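The entries of the table in Example 7-2 can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function name `harmonic_bounds` is an assumption introduced here purely for illustration.

```python
import math


def harmonic_bounds(n):
    """Return (upper, lower) sums for f(x) = 1/x on [1, 2] with n equal sub-intervals.

    On [x_{r-1}, x_r], with x_r = 1 + r/n, the decreasing function 1/x has
    maximum n/(n + r - 1) and minimum n/(n + r); each value is multiplied
    by the sub-interval length 1/n, which cancels the factor n.
    """
    upper = sum(1.0 / (n + r - 1) for r in range(1, n + 1))
    lower = sum(1.0 / (n + r) for r in range(1, n + 1))
    return upper, lower


if __name__ == "__main__":
    for n in (5, 10, 15, 1000):
        upper, lower = harmonic_bounds(n)
        print(f"n = {n:5d}  upper = {upper:.4f}  lower = {lower:.4f}")
    print(f"log_e 2 = {math.log(2):.4f}")
```

Because the two sums differ only in their first and last terms, the gap between them is exactly $1/(2n)$, which is why both columns of the table close in on the same limit as $n$ grows.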
Before closing this section let us give brief consideration to the effect on Theorem 7-1 of removing the condition of continuity imposed on the function $f(x)$ and substituting instead the condition that $f(x)$ is bounded. The argument leading to Theorem 7-1 proceeds as before until the stage at which $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are defined. Then, without the continuity of $f(x)$ to ensure that $|M_r - m_r| \to 0$ as $|x_r - x_{r-1}| \to 0$, it is no longer possible to infer that when $\lim \underline{S}_{P_m}$ and $\lim \overline{S}_{P_m}$ exist, they are necessarily equal. However, if they do exist and are equal, it follows as before that $S_{P_m}$ also converges to the same limit. Thus we arrive at the following more general form of Theorem 7-1.

Theorem 7-2 (second limit theorem for sums defined on a partition) Let $f(x)$ be a non-negative bounded function defined on the closed interval $[a, b]$, and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some partition $P$ of $a \le x \le b$ with the property that $\lim_{m\to\infty} \|\Delta\|_{P_m} = 0$. Then, if $\xi_i$ is any point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$, and $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ are respectively the lower and upper sums associated with $P_m$, it follows that if

\[ \lim_{m\to\infty} \underline{S}_{P_m} = \lim_{m\to\infty} \overline{S}_{P_m} = I, \]

it must also be true that

\[ I = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i. \]

The corresponding modification of Definition 7-1 is given below for reference and, because this form of definition was first given by B. Riemann (1826-66), the definite integral is known formally as the Riemann integral. Usually only the term definite integral will be employed.

Definition 7-2 (Riemann integral of a non-negative function) Let $f(x)$ be a non-negative bounded function on the closed interval $a \le x \le b$, and let $P_1, P_2, \ldots, P_m, \ldots$ be a sequence of successive refinements of some partition $P$ of $[a, b]$ with the property that $\lim_{m\to\infty} \|\Delta\|_{P_m} = 0$.
Furthermore, let $\xi_i$ be any point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$, and let $\underline{S}_{P_m}$ and $\overline{S}_{P_m}$ be, respectively, the lower and upper sums associated with $P_m$. Then, if $\lim \underline{S}_{P_m} = \lim \overline{S}_{P_m}$, the Riemann integral of $f(x)$ integrated over the interval $[a, b]$, and written symbolically

\[ \int_a^b f(x)\,dx, \]

is defined to be

\[ \int_a^b f(x)\,dx = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} f(\xi_i)\Delta_i. \]

To show that not all bounded functions are Riemann integrable it is only necessary to consider the integral over the interval $0 \le x \le 1$ of the function

\[ f(x) = \begin{cases} 0 & \text{for } x \text{ rational} \\ 1 & \text{for } x \text{ irrational.} \end{cases} \]

Then clearly $f(x)$ is non-negative and bounded on $[0, 1]$, but by a suitable choice of the numbers $\xi_i$ in the approximating sum of Definition 7-2, the limit of the sum may be made to assume any value between zero and unity. This situation arises because the limits of the upper and lower sums are not the same. In more advanced accounts these difficulties are overcome by defining a more general form of integral known as the Lebesgue integral.

7-2 Integration of arbitrary continuous functions

As most functions assume both positive and negative values in their domain of definition, our notion of a definite integral as formulated so far is rather restrictive, for it requires that the integrand be non-negative. A brief examination of the introductory arguments used in the previous section shows that this restriction stems from our idea of area as being an essentially positive quantity, although this was not stated explicitly at any stage in our argument. Nothing in the limiting arguments that we used requires either the upper and lower sums themselves, or any of the terms comprising them, to be non-negative.
Since a term in either of these sums will be negative when $m_r$ or $M_r$ is negative, that is, when $f(x)$ is negative, it follows that the interpretation of a definite integral as an area may be extended to continuous functions $f(x)$ which assume negative values provided that areas below the $x$-axis are regarded as negative. This is illustrated in Fig. 7-4 in which the positive and negative area contributions to the definite integral of $f(x)$ integrated over the interval $[a, b]$ are marked accordingly.

Thus using this convention when interpreting a definite integral as an area, we may remove the condition that the integrand $f(x)$ be non-negative throughout all of Section 7-1. Because it simply amounts to the deletion of the word 'non-negative', we shall not trouble to reformulate our earlier definitions and theorems to take account of this result. It is interesting to observe that had we introduced the definite integral via the upper and lower sums, without any appeal to graphs and areas, this artificial restriction would never have arisen.

The definition of a definite integral of a function $f(x)$ integrated over the interval $[a, b]$ immediately implies a number of important general results which we now state in the form of a theorem. No proofs will be offered since the results are virtually self-evident.

Theorem 7-3 (properties of definite integrals) Let $f(x)$, $g(x)$ be continuous functions defined on the closed interval $a \le x \le b$, and let $c$ be a constant and $k$ be such that $a < k < b$. Then

(a) $\displaystyle \int_a^b f(x)\,dx = \int_a^k f(x)\,dx + \int_k^b f(x)\,dx$ (Additivity),

(b) $\displaystyle \int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$ (Homogeneity),

(c) $\displaystyle \int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$ (Linearity).

Fig. 7-4 Positive and negative areas defined by $y = f(x)$.

By virtue of these results, the definite integral of the function $f(x)$ appropriate to Fig. 7-4 could, if desired, be written in terms of the sum of three integrals involving non-negative integrands.
To achieve this, notice that $f(x)$ is negative for $k_1 < x < k_2$, so that for all $x$ in this interval, $-f(x)$ is positive. Then, first expressing our integral as the sum of three separate integrals over adjacent intervals

\[ \int_a^b f(x)\,dx = \int_a^{k_1} f(x)\,dx + \int_{k_1}^{k_2} f(x)\,dx + \int_{k_2}^b f(x)\,dx, \tag{7-14} \]

we can replace $-f(x)$ by $|f(x)|$ in the second of these integrals to obtain

\[ \int_a^b f(x)\,dx = \int_a^{k_1} f(x)\,dx - \int_{k_1}^{k_2} |f(x)|\,dx + \int_{k_2}^b f(x)\,dx. \tag{7-15} \]

Each of the integrals on the right-hand side is now the definite integral of a non-negative function as required.

We must now take account of the fact that so far it has been implicit in our definition of a definite integral that $x$ increases positively from $a$ to $b$, where $b > a$. This sense, or direction, of integration is indicated in the definite integral by writing $a$ at the bottom of the integral sign $\int$ to signify the lower limit of integration and by writing $b$ at the top to signify the upper limit of integration. If, despite the fact that $b > a$, their positions as upper and lower limits of integration are reversed, this implies that integration is to be carried out in the direction in which $x$ increases negatively. Because we are now allowing areas to have both magnitude and sign, to be consistent we must compensate for a reversal of the limits of integration by changing the sign of the integral. Hence we arrive at our next definition.

Definition 7-3 (reversal of limits of integration) If $a < b$, then we define the definite integral $\int_b^a f(x)\,dx$ of a continuous function $f(x)$ by the equation

\[ \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx. \]

Example 7-3 Evaluate the definite integral

\[ \int_3^1 2x^2\,dx. \]

Solution From Definition 7-3 we have

\[ \int_3^1 2x^2\,dx = -\int_1^3 2x^2\,dx. \]
Hence an application of Theorem 7-3 (b) together with the result of Example 7-1 shows that

\[ \int_3^1 2x^2\,dx = -2 \int_1^3 x^2\,dx = -2\left(\tfrac{1}{3}\right)(3^3 - 1^3) = -\frac{52}{3}. \]

Since a definite integral is simply a number, the choice of symbol used to denote the argument of the function $f$ forming the integrand is arbitrary, and often it is convenient to replace $x$ by some other variable, say $t$. Thus

\[ \int_a^b f(x)\,dx \quad \text{and} \quad \int_a^b f(t)\,dt \]

are identical in meaning, so that

\[ \int_a^b f(x)\,dx = \int_a^b f(t)\,dt. \tag{7-16} \]

On account of this fact, the variable in the integrand of a definite integral is often called a dummy variable, and it is sometimes said to be 'integrated out' when the integral is evaluated. This fact is usually recognized in modern accounts of the theory of the definite integral by simply writing

\[ \int_a^b f \]

in place of either of the expressions in Eqn (7-16). The full significance of the symbol $dx$, which is suggestive of a differential, comes when changes of variable of the form $x = g(u)$ are made in Eqn (7-16) and it is for this reason that we choose to retain it. This matter will be taken up in detail in the next chapter, where it is shown that because of the chain rule for differentiation, $dx$ can indeed be interpreted as a differential.

Fig. 7-5 (a) Area $I$ bounded by curves $y = f(x)$ and $y = g(x)$; (b) area below $y = f(x)$; (c) positive and negative areas defined by $y = g(x)$.

Now that the definite integral has been extended to arbitrary continuous integrands we are in a position to determine quite general areas. Consider, for example, the situation illustrated in Fig. 7-5 (a) in which it is desired to determine the area $I$ of the shaded region. Then obviously, referring to Figs. 7-5 (b), (c) we have

\[ I = I_1 + I_2 - I_3 + I_4, \]

where $I_1$ to $I_4$ represent the positive areas identified by these symbols.
However, we know that

\[ I_1 = \int_a^b f(x)\,dx, \]

and from the form of argument leading to Eqn (7-15) we also know that

\[ -I_2 = \int_a^{k_1} g(x)\,dx, \quad I_3 = \int_{k_1}^{k_2} g(x)\,dx, \quad -I_4 = \int_{k_2}^b g(x)\,dx, \]

where $k_1$ and $k_2$ are the first and second points of intersection of $y = g(x)$ with the $x$-axis as $x$ increases from $a$ to $b$. However, by Theorem 7-3 (a) we have

\[ -I_2 + I_3 - I_4 = \int_a^b g(x)\,dx, \]

so that combining these results we obtain

\[ I = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx. \]

From Theorem 7-3 (b) it then finally follows that

\[ I = \int_a^b (f(x) - g(x))\,dx. \tag{7-17} \]

Example 7-4 Find the area $I$ between the two curves $y = e^{2x}$ and $y = -x^2$, which is bounded to the left by the line $x = 1$ and to the right by the line $x = 3$.

Solution We start by making the obvious identifications $f(x) = e^{2x}$, $g(x) = -x^2$, $a = 1$ and $b = 3$. Then from Eqn (7-17) it follows that

\[ I = \int_1^3 (e^{2x} + x^2)\,dx, \]

whence, using the results of Example 7-1 and Problem 7-3, we find

\[ I = \tfrac{1}{2}(e^6 - e^2) + \frac{26}{3}. \]

The fact that a definite integral is additive with respect to its interval of integration enables a function to be integrated even when it has discontinuities, provided only that they are finite in number and that elsewhere the function is continuous and bounded. This result is perhaps best seen diagrammatically, though an analytical justification can easily be given without appeal to geometry. By way of example, consider the function $y = f(x)$ illustrated in Fig. 7-6 which is bounded and continuous everywhere except at the discrete number of points $\eta_1, \eta_2, \ldots, \eta_n$. Such a function is said to be piecewise continuous, for obvious reasons.

Fig. 7-6 Piecewise continuous function $y = f(x)$ defining a sequence of areas $I_1, I_2, \ldots, I_{n+1}$.

Using the valid interpretation of a definite integral in terms of area we see that the total shaded area $I$ is the sum of the sequence of areas $I_1, I_2, \ldots, I_{n+1}$, so that we may still write

\[ I = \int_a^b f(x)\,dx, \tag{7-18} \]

but this time with the understanding that

\[ \int_a^b f(x)\,dx = \int_a^{\eta_1 -} f(x)\,dx + \int_{\eta_1 +}^{\eta_2 -} f(x)\,dx + \cdots + \int_{\eta_n +}^{b} f(x)\,dx. \tag{7-19} \]

Here, as before, we have used $\eta_i -$ to signify the limiting process of approaching the point $x = \eta_i$ from the left, and $\eta_i +$ to signify the limiting process of approaching the point $x = \eta_i$ from the right.

Example 7-5 Evaluate the definite integral

\[ I = \int_0^2 f(x)\,dx \quad \text{when} \quad f(x) = \begin{cases} x^2 & \text{for } 0 \le x \le 1 \\ e^{5x} & \text{for } 1 < x \le 2. \end{cases} \]

Solution From Eqn (7-19) we have

\[ I = \int_0^{1-} x^2\,dx + \int_{1+}^{2} e^{5x}\,dx, \]

so that evaluating the integrals and then taking the appropriate limits gives

\[ I = \tfrac{1}{3} + \tfrac{1}{5}(e^{10} - e^{5}). \]

Sometimes a more difficult situation than this arises in which either the integrand tends to infinity at some point in the interval of integration or, perhaps, the interval of integration itself is infinite in length. Such definite integrals are called improper integrals, and the way in which to attribute a value to any such integral is suggested by Eqn (7-19).

Let us illustrate something of the difficulty that can arise if ideas are not made precise. Consider the integral

\[ I = \int_{-1}^{1} \frac{dx}{x^2}. \]

Then since $y = 1/x^2$ is essentially positive, the area under the curve must also be positive. Now if we apply the result of Problem 7-5 we have

\[ \int_{-1}^{1} \frac{dx}{x^2} = -\frac{1}{1} + \frac{1}{-1} = -2, \]

which, since it is negative, contradicts our previous conclusion. What has gone wrong? The trouble is that $1/x^2$ tends to infinity as $x \to 0$, so that the arguments of Problem 7-5 are not applicable, for it was pre-supposed there that the interval of integration excluded the origin.

When dealing with improper integrals of this type in which the integrand has an infinity within the interval of integration we shall assign a value to the integral according to the following definition.
Definition 7-4 (improper integral due to infinity of integrand) Let the function $f(x)$ be continuous throughout the intervals $a \le x < c$ and $c < x \le b$, and suppose that $f(x)$ has a singularity at $x = c$ in the sense that $f(x)$ tends to infinity as $x \to c$. Then the integral of $f(x)$ over the interval of integration $[a, b]$ is said to be improper, and it is defined to have the value

\[ I = \lim_{\varepsilon\to 0} \int_a^{c-\varepsilon} f(x)\,dx + \lim_{\delta\to 0} \int_{c+\delta}^{b} f(x)\,dx, \]

whenever both limits involved exist. Under these circumstances the improper integral will be said to converge to the value $I$. When either of the limits does not exist, the integral will be said to be divergent. If the point $c$ coincides with an end-point of the interval $[a, b]$, then $I$ is defined to be equal to the limit of the single integral for which the interval of integration lies within $[a, b]$.

On the basis of this definition we are now able to determine the value to be attributed to the improper integral used as an illustration above. Let us do this in the form of an example.

Example 7-6 Evaluate the improper integrals:

\[ \text{(a)}\ I_1 = \int_{-1}^{1} \frac{dx}{x^2}, \qquad \text{(b)}\ I_2 = \int_{-1}^{0} \frac{x^2 + 1}{x^2}\,dx. \]

Solution The integrand $1/x^2$ tends to infinity as $x \to 0$, so that for case (a), when appealing to Definition 7-4, we need to make the identifications $a = -1$, $b = 1$, $c = 0$ and $f(x) = 1/x^2$. Thus,

\[ I_1 = \lim_{\varepsilon\to 0} \int_{-1}^{-\varepsilon} \frac{dx}{x^2} + \lim_{\delta\to 0} \int_{\delta}^{1} \frac{dx}{x^2}. \]

Using the result of Problem 7-5 we find that

\[ I_1 = \lim_{\varepsilon\to 0} \left( \frac{1}{\varepsilon} - 1 \right) + \lim_{\delta\to 0} \left( -1 + \frac{1}{\delta} \right) \to \infty. \]

Thus the improper integral (a) is divergent.

In case (b) the integrand is $(x^2 + 1)/x^2$, which again tends to infinity as $x \to 0$. However, in this case we must make the identifications $a = -1$, $b = 0$, $c = 0$, and $f(x) = 1 + 1/x^2$, so that this time the singularity in the integrand occurs at the right-hand end-point of the interval of integration $[-1, 0]$ (that is, at the upper limit of integration).
It then follows from Definition 7-4 that

\[ I_2 = \lim_{\varepsilon\to 0} \int_{-1}^{-\varepsilon} \left( 1 + \frac{1}{x^2} \right) dx, \]

which, from the results of Problems 7-2 (b) and 7-5, becomes

\[ I_2 = \lim_{\varepsilon\to 0} \left( -\varepsilon + 1 + \frac{1}{\varepsilon} - 1 \right) \to \infty. \]

Hence the improper integral (b) is also divergent.

The one remaining form of improper integral requiring consideration occurs when the interval of integration is infinite. In these circumstances we shall assign a value to the integral according to the following definition.

Definition 7-5 (improper integral due to infinite interval of integration) Let the function $f(x)$ be continuous on the interval $[a, \infty)$, then the integral of $f(x)$ over the interval of integration $[a, \infty)$ is said to be improper, and it is defined to have the value

\[ I_1 = \lim_{k\to\infty} \int_a^k f(x)\,dx, \]

whenever this limit exists. Under these circumstances the improper integral will be said to converge to the value $I_1$. When the limit does not exist, the integral will be said to be divergent. Similarly, if the interval of integration is $(-\infty, b]$, then when the limit exists, the improper integral of $f(x)$ over the interval of integration $(-\infty, b]$ is defined to have the value

\[ I_2 = \lim_{k\to\infty} \int_{-k}^{b} f(x)\,dx. \]

Symbolically, these improper integrals will be denoted, respectively, by

\[ I_1 = \int_a^{\infty} f(x)\,dx \quad \text{and} \quad I_2 = \int_{-\infty}^{b} f(x)\,dx. \]

Example 7-7 Evaluate the improper integral

\[ I = \int_3^{\infty} \frac{dx}{x^2}. \]

Solution It follows at once from Definition 7-5 that

\[ I = \lim_{k\to\infty} \int_3^k \frac{dx}{x^2}, \]

so that by virtue of the result of Problem 7-5,

\[ I = \lim_{k\to\infty} \left[ -\frac{1}{k} + \frac{1}{3} \right] = \frac{1}{3}. \]

Hence this improper integral converges to the value $1/3$.

7-3 Integral inequalities

A number of useful inequalities may be deduced concerning definite integrals, the simplest of which has already been stated in Eqn (7-1). Let us now derive our first result of this type, of which Eqn (7-1) represents a special case.

Suppose that the definite integrals of $f(x)$ and $g(x)$ taken over the interval $[a, b]$ both exist. In brief, let us agree to say that $f(x)$ and $g(x)$ are integrable over the interval $[a, b]$. Now suppose that $f(x) \le g(x)$ for $a \le x \le b$.
Then if $P_m$ is a partition of $[a, b]$, we have from Theorem 7-2 that

\[ \int_a^b g(x)\,dx - \int_a^b f(x)\,dx = \int_a^b (g(x) - f(x))\,dx = \lim_{\|\Delta\|_{P_m}\to 0} \sum_{i=1}^{n} (g(\xi_i) - f(\xi_i))\Delta_i, \tag{7-20} \]

where $\xi_i$ is some point in the $i$th sub-interval of length $\Delta_i$ generated by the partition $P_m$. Now since by hypothesis $f(x) \le g(x)$, it follows that $f(\xi_i) \le g(\xi_i)$, so that the right-hand side of Eqn (7-20) must be non-negative. Thus we have proved the following theorem.

Theorem 7-4 (inequality between two definite integrals) Let $f(x) \le g(x)$ be two integrable functions over the interval $[a, b]$. Then,

\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]

Equation (7-1) follows as a trivial consequence of this result, for the theorem implies that if $\phi(x) \le f(x) \le \psi(x)$ are three integrable functions over the interval $[a, b]$, then

\[ \int_a^b \phi(x)\,dx \le \int_a^b f(x)\,dx \le \int_a^b \psi(x)\,dx. \]

Hence, if $m$, $M$ are, respectively, the minimum and maximum values of $f(x)$ on $[a, b]$, our required result follows by setting $\phi(x) = m$, $\psi(x) = M$, when we obtain

\[ m(b - a) \le \int_a^b f(x)\,dx \le M(b - a). \tag{7-21} \]

This last simple result implies a more important result which we now derive by appeal to the intermediate value theorem of Chapter 5. Writing inequality (7-21) in the form

\[ m \le \frac{1}{b - a} \int_a^b f(x)\,dx \le M \]

shows that the number

\[ \frac{1}{b - a} \int_a^b f(x)\,dx \]

is intermediate between $m$ and $M$, which are extreme values of the function $f(x)$ itself. Hence, provided $f(x)$ is continuous, it then follows from the intermediate value theorem that some number $\xi$ exists, strictly between $a$ and $b$, such that

\[ f(\xi) = \frac{1}{b - a} \int_a^b f(x)\,dx. \tag{7-22} \]

This result is called the first mean value theorem for integrals, and it constitutes our next theorem.

Theorem 7-5 (first mean value theorem for integrals) Let $f(x)$ be continuous on the interval $[a, b]$, then there exists a number $\xi$, strictly between $a$ and $b$, for which

\[ \int_a^b f(x)\,dx = (b - a) f(\xi). \]

Fig. 7-7 Area below $y = f(t)$ as a function of the upper limit of integration $x$.
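The number $\xi$ in Theorem 7-5 can be located numerically for a simple integrand. The sketch below is an illustration only and is not taken from the text: the helper names `midpoint_integral` and `mean_value_point` are hypothetical, the integral is approximated by the midpoint rule, and the search for $\xi$ assumes that $f$ is strictly monotonic on $[a, b]$ so that bisection applies.

```python
import math


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the definite integral of f over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def mean_value_point(f, a, b, tol=1e-10):
    """Return (xi, mean) with f(xi) approximately equal to the mean value of f.

    Assumption (not from the text): f is continuous and strictly monotonic
    on [a, b], so f(x) - mean changes sign exactly once and bisection applies.
    """
    mean = midpoint_integral(f, a, b) / (b - a)
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Keep the half-interval on which f(x) - mean changes sign.
        if (f(lo) - mean) * (f(mid) - mean) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi), mean


if __name__ == "__main__":
    xi, mean = mean_value_point(lambda x: x * x, 1.0, 2.0)
    print(f"mean value of x^2 on [1, 2]: {mean:.6f}")  # exact value is 7/3
    print(f"xi with f(xi) = mean:        {xi:.6f}")    # exact value is sqrt(7/3)
```

For $f(x) = x^2$ on $[1, 2]$, Example 7-1 gives the integral $7/3$, so the mean value is $7/3$ and $\xi = \sqrt{7/3}$, which lies strictly between 1 and 2 as the theorem requires.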
7-4 The definite integral as a function of its upper limit: indefinite integral

If the lower limit of a definite integral is held constant, but the upper limit is replaced by the variable $x$, then the numerical value of the integral will clearly depend on $x$. Another way of describing this situation is to say that a definite integral with a variable upper limit $x$ defines a function of $x$. In Fig. 7-7 this idea is illustrated in terms of areas, with the shaded region marked $F(x)$ denoting the area below the curve $y = f(t)$ which is bounded on the left by the line $t = a$, and on the right by the line $t = x$. In terms of the definite integral we have

\[ F(x) = \int_a^x f(t)\,dt. \tag{7-23} \]

Now let us suppose that $f(t)$ is continuous in some interval $[a, b]$, with $a < x < b$. Notice here that for the first time it is necessary to use the dummy variable $t$, because $x$ and $t$ are fulfilling two different roles in Eqn (7-23). To be precise, $x$ represents the upper limit of integration, whilst the dummy variable $t$ represents the general variable in the interval of integration $a \le t \le x$.

Consider the difference

\[ F(x + h) - F(x) = \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt = \int_x^{x+h} f(t)\,dt. \tag{7-24} \]

Then the first mean value theorem for integrals allows us to rewrite Eqn (7-24) in the form

\[ F(x + h) - F(x) = h f(\xi), \tag{7-25} \]

where $x < \xi < x + h$. Now, forming the difference quotient $\{F(x + h) - F(x)\}/h$, we find

\[ \frac{F(x + h) - F(x)}{h} = f(\xi), \]

so that taking the limit as $h \to 0$ gives

\[ F'(x) = \lim_{h\to 0} \left\{ \frac{F(x + h) - F(x)}{h} \right\} = f(x). \tag{7-26} \]

This important result shows that the integrand of integral (7-23) at the upper limit of integration $t = x$ is equal to the derivative of $F(x)$ with respect to $x$. Suppose now that $G(x)$ is any function for which $G'(x) = f(x)$. Then,

\[ G'(x) - F'(x) = \frac{d}{dx}[G(x) - F(x)] = 0, \]

and so from Corollary 5-12

\[ G(x) = F(x) + \text{constant}. \tag{7-27} \]

Combining Eqns (7-23) and (7-27) shows that the most general function $G(x)$ whose derivative is equal to $f(x)$ must be of the form

\[ G(x) = \int_a^x f(t)\,dt + C, \tag{7-28} \]

where $C$ is a constant.

The first term on the right-hand side of Eqn (7-28) is called an indefinite integral. The function $G(x)$ itself is called either a primitive of $f$ or an antiderivative of $f$. We shall usually use the name antiderivative, since this offers an accurate description of the process by which it is to be found. Namely, an antiderivative arises from the process of reversing the operation of differentiation, and the most frequent method of finding antiderivatives utilizes this idea by employing tables of derivatives in reverse. That is to say, by matching an integrand with an entry in a table of derivatives and thereby finding the functional form of $G(x)$ apart from the additive arbitrary constant.

Usually the antiderivative $G(x)$ defined in either Eqn (7-27) or Eqn (7-28) is written symbolically in the form

\[ \int f(x)\,dx = F(x) + C. \tag{7-29} \]

In this notation, the fact that an antiderivative is a function related to the operation of integration, and not just a number as in an ordinary definite integral, is indicated by again employing the integral sign, but this time without limits. On occasions the reader will find books in which an antiderivative is signified by the notation

\[ \int^{x} f(x)\,dx, \]

rather than the notation used in Eqn (7-29).

The following short table lists a few of the antiderivatives which are of most frequent occurrence in mathematics.

Table 7.1 $\int f(x)\,dx = F(x) + C$

    No.    f(x)         F(x)
    1      a (const)    ax
    2      x^n          x^(n+1)/(n+1)
    3      e^x          e^x
    4      sin x        -cos x
    5      cos x        sin x

Other useful elementary antiderivatives that should be memorized, together with an account of systematic methods for finding antiderivatives, are given in the next chapter.
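Each entry of Table 7.1 can be spot-checked by differentiating $F(x)$ numerically and comparing the result with $f(x)$. The following Python sketch is an illustration rather than part of the text: the names `central_difference`, `TABLE` and `max_table_error` are hypothetical, the sample values $a = 3$ and $n = 2$ are arbitrary choices, and the third entry is taken to be $e^x$.

```python
import math


def central_difference(F, x, h=1e-5):
    """Approximate F'(x) by the symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


# Each row: (label, integrand f, antiderivative F with C = 0).
TABLE = [
    ("a (const), a = 3", lambda x: 3.0, lambda x: 3.0 * x),
    ("x^n, n = 2", lambda x: x ** 2, lambda x: x ** 3 / 3.0),
    ("e^x", math.exp, math.exp),
    ("sin x", math.sin, lambda x: -math.cos(x)),
    ("cos x", math.cos, math.sin),
]


def max_table_error(points=(0.5, 1.0, 2.0)):
    """Largest value of |F'(x) - f(x)| over all table rows and sample points."""
    return max(
        abs(central_difference(F, x) - f(x))
        for _, f, F in TABLE
        for x in points
    )


if __name__ == "__main__":
    for label, f, F in TABLE:
        print(f"{label:18s} F'(1) = {central_difference(F, 1.0):.6f}  f(1) = {f(1.0):.6f}")
```

A small value of `max_table_error()` is only numerical evidence at the sampled points; the table itself rests on the derivative rules, not on this check.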
Let us now return to Eqn (7-27) and notice that it follows from this that

\[ G(b) - G(a) = F(b) - F(a) = F(b) = \int_a^b f(x)\,dx. \tag{7-30} \]

Hence we have proved that

\[ \int_a^b f(x)\,dx = G(b) - G(a), \tag{7-31} \]

where $G'(x) = f(x)$. This provides a method for the evaluation of definite integrals, for expressed in words it asserts that the definite integral of $f(x)$ taken over an interval $[a, b]$ is the difference between the value of any antiderivative of $f(x)$ at $x = b$ and $x = a$.

It is now time to express results (7-26) and (7-31) in the form of two basic theorems known, respectively, as the first and second fundamental theorems of calculus.

Theorem 7-6 (first fundamental theorem of calculus) If $f(x)$ is continuous for $a \le x \le b$, and

\[ F(x) = \int_a^x f(t)\,dt, \]

then $F'(x) = f(x)$ for all points $x$ in $[a, b]$. Alternatively expressed, this result may also be written

\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x). \]

Theorem 7-7 (second fundamental theorem of calculus) If $f(x)$ is continuous for $a \le x \le b$ and $G(x)$ is any antiderivative of $f(x)$, then

\[ \int_a^x f(t)\,dt = G(x) - G(a). \]

The statement of Theorem 7-7 is often written in the form

\[ \int_a^b f(x)\,dx = G(x) \Big|_{x=a}^{x=b} \]

with the understanding that $G(x) \big|_{x=a}^{x=b} = G(b) - G(a)$.

It follows from Theorem 7-7 that the definite integral calculated so laboriously in Example 7-1 may be evaluated directly by appeal to entry number 2 in Table 7-1. To see this set $n = 2$, so that $f(x) = x^2$, then $F(x) = x^3/3$, and by Theorem 7-7 we immediately deduce that

\[ \int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3). \]

The systematic employment of the fundamental theorems of calculus will be taken up in detail in Chapter 8, since our concern here is primarily with the theory rather than the practice of integration.

Finally, to emphasize that the indefinite integral is a function, we now give an example of such an integral which defines an important mathematical function.
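Since we have the relationship

\[ \frac{d}{dx} \log_e x = \frac{1}{x}, \quad \text{for } x > 0, \]

it follows from Theorem 7-7 that, provided $a > 0$,

\[ \int_a^x \frac{dt}{t} = \log_e x - \log_e a. \]

Hence, setting $a = 1$ gives the result

\[ \log_e x = \int_1^x \frac{dt}{t}, \tag{7-32} \]

which is illustrated as the shaded area in Fig. 7-8.

Fig. 7-8 Natural logarithm represented as an area.

7-5 Differentiation of an integral containing a parameter

It can sometimes happen that an integrand, in addition to being a function of $x$, also depends on a parameter $\alpha$. Furthermore, the upper and lower limits of the integral may themselves be functions of $\alpha$ so that the value of the integral must then itself depend on $\alpha$. Our concern in this section will be with the differentiation, with respect to $\alpha$, of an integral of the form

\[ I(\alpha) = \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx. \tag{7-33} \]

To derive the form of our result let us begin by assuming that $\phi(\alpha)$, $\psi(\alpha)$ are differentiable functions with respect to $\alpha$ in some interval $c \le \alpha \le d$, and that $f(x, \alpha)$ is both integrable with respect to $x$ on the interval $[\phi(\alpha), \psi(\alpha)]$ and differentiable with respect to $\alpha$. Then, first notice that from the mean value theorem for derivatives, in $c \le \alpha + h \le d$, we have

\[ \begin{aligned} \phi(\alpha + h) &= \phi(\alpha) + h \left( \frac{d\phi}{d\alpha} \right)_{\alpha = \xi}, && \text{with } \alpha < \xi < \alpha + h; \\ \psi(\alpha + h) &= \psi(\alpha) + h \left( \frac{d\psi}{d\alpha} \right)_{\alpha = \eta}, && \text{with } \alpha < \eta < \alpha + h; \\ f(x, \alpha + h) &= f(x, \alpha) + h \left( \frac{\partial f}{\partial \alpha} \right)_{\alpha = \zeta}, && \text{with } \alpha < \zeta < \alpha + h. \end{aligned} \tag{7-34} \]

The partial derivative notation is needed in the last of these results because for this application of the mean value theorem for derivatives we are regarding the variable $x$ as a constant. Now we have

\[ I(\alpha + h) = \int_{\phi(\alpha + h)}^{\psi(\alpha + h)} f(x, \alpha + h)\,dx, \]

so that using results (7-34), and writing $h\phi'$ and $h\psi'$ for the increments $h (d\phi/d\alpha)_{\alpha = \xi}$ and $h (d\psi/d\alpha)_{\alpha = \eta}$, we find

\[ I(\alpha + h) = \int_{\phi(\alpha) + h\phi'}^{\phi(\alpha)} f(x, \alpha + h)\,dx + \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha + h)\,dx + \int_{\psi(\alpha)}^{\psi(\alpha) + h\psi'} f(x, \alpha + h)\,dx. \]

An application of the mean value theorem for integrals (Theorem 7-5) to the first and last terms then shows that

\[ I(\alpha + h) = h \left( \frac{d\psi}{d\alpha} \right)_{\alpha = \eta} f(x', \alpha + h) + \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha + h)\,dx - h \left( \frac{d\phi}{d\alpha} \right)_{\alpha = \xi} f(x'', \alpha + h), \]

where $\psi(\alpha) < x' < \psi(\alpha) + h\psi'$, $\phi(\alpha) < x'' < \phi(\alpha) + h\phi'$.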
Next, forming the difference $I(\alpha + h) - I(\alpha)$, combining integrals and using the final result of (7-34) gives

\[ I(\alpha + h) - I(\alpha) = h \left( \frac{d\psi}{d\alpha} \right)_{\alpha = \eta} f(x', \alpha + h) + h \int_{\phi(\alpha)}^{\psi(\alpha)} \left( \frac{\partial f}{\partial \alpha} \right)_{\alpha = \zeta} dx - h \left( \frac{d\phi}{d\alpha} \right)_{\alpha = \xi} f(x'', \alpha + h). \tag{7-35} \]

Finally, forming the difference quotient $\{I(\alpha + h) - I(\alpha)\}/h$ and taking the limit as $h \to 0$ it follows that $\xi$, $\eta$, and $\zeta$ all tend to $\alpha$, whilst $x'$ tends to $\psi(\alpha)$ and $x''$ tends to $\phi(\alpha)$, whence

\[ \frac{dI}{d\alpha} = \left( \frac{d\psi}{d\alpha} \right) f(\psi, \alpha) - \left( \frac{d\phi}{d\alpha} \right) f(\phi, \alpha) + \int_{\phi(\alpha)}^{\psi(\alpha)} \frac{\partial f}{\partial \alpha}\,dx. \tag{7-36} \]

Theorem 7-8 (differentiation of an integral containing a parameter) Let $\phi(\alpha)$, $\psi(\alpha)$ be differentiable functions with respect to $\alpha$ in some interval $c \le \alpha \le d$, and let $f(x, \alpha)$ be both integrable with respect to $x$ over the interval $\phi(\alpha) \le x \le \psi(\alpha)$ and differentiable with respect to $\alpha$. Then,

\[ \frac{d}{d\alpha} \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx = \left( \frac{d\psi}{d\alpha} \right) f(\psi, \alpha) - \left( \frac{d\phi}{d\alpha} \right) f(\phi, \alpha) + \int_{\phi(\alpha)}^{\psi(\alpha)} \frac{\partial f}{\partial \alpha}\,dx. \]

A useful special case of this arises when $\phi(\alpha) = a$ and $\psi(\alpha) = b$ are constants, so that the only dependence on the parameter $\alpha$ is through the integrand $f(x, \alpha)$. The terms $d\phi/d\alpha$ and $d\psi/d\alpha$ are then identically zero, so that we arrive at the following corollary.

Corollary 7.8 If $f(x, \alpha)$ is both integrable with respect to $x$ over the interval $[a, b]$ and differentiable with respect to $\alpha$, then

\[ \frac{d}{d\alpha} \int_a^b f(x, \alpha)\,dx = \int_a^b \frac{\partial f}{\partial \alpha}\,dx. \]

Example 7-8 Apply the results of Theorem 7-8 to the following integral:

\[ I(\alpha) = \int_{1 + \cos\alpha}^{3 + 2\sin 3\alpha} \frac{dx}{x^2 + \alpha^2}. \]

Solution If we make the identifications $\phi(\alpha) = 1 + \cos\alpha$, $\psi(\alpha) = 3 + 2\sin 3\alpha$, and $f(x, \alpha) = (x^2 + \alpha^2)^{-1}$, it then follows directly from Theorem 7-8 that

\[ \frac{dI}{d\alpha} = \frac{6\cos 3\alpha}{(3 + 2\sin 3\alpha)^2 + \alpha^2} + \frac{\sin\alpha}{(1 + \cos\alpha)^2 + \alpha^2} - 2\alpha \int_{1 + \cos\alpha}^{3 + 2\sin 3\alpha} \frac{dx}{(x^2 + \alpha^2)^2}. \]

7-6 Other geometrical applications of definite integrals

This section offers a brief discussion of the application of the definite integral to the determination of arc length for plane curves, the surface area of a surface of revolution, and the volume of a volume of revolution. Each result will be derived by appeal to the basic definition of a definite integral, since it will first be necessary to define the precise meaning of the concepts that are involved.

Fig. 7-9 (a) Arc length of curve; (b) element of arc length.

(a) Arc length of a plane curve

Consider the plane curve $\Gamma$ with the equation $y = f(x)$ illustrated in Fig. 7-9 (a). Then our task here will be first to define the meaning of the length $s$ of the arc $MN$, and then to deduce a method by which it may be found once the equation of $\Gamma$ has been given. Let $Q_0, Q_1, \ldots, Q_n$ represent any set of points on $\Gamma$, the first of which coincides with the left-hand end-point $M$, and the last of which coincides with the right-hand end-point $N$. Then if $\Delta s_i$ denotes the length of the chord joining $Q_{i-1}$ to $Q_i$, the length $S_n$ of the polygonal line joining $M$ to $N$ is

\[ S_n = \sum_{i=1}^{n} \Delta s_i. \]

Now the projection of the set of points $Q_0, Q_1, \ldots, Q_n$ onto the $x$-axis defines a set of points $a = x_0 < x_1 < \cdots < x_n = b$ which form a partition $P_n$ of the interval $[a, b]$.
Thus, denoting the norm of $P_n$ by $\|\Delta\|_{P_n}$, we shall define the length $s$ of the arc $\Gamma$ from $M$ to $N$ to be

\[ s = \lim_{\|\Delta\|_{P_n}\to 0} \sum_{i=1}^{n} \Delta s_i. \tag{7-37} \]

Now, setting $\Delta_i = x_i - x_{i-1}$ and $\delta_i = f(x_i) - f(x_{i-1})$, it follows directly by an application of Pythagoras' theorem (Fig. 7-9 (b)) that

\[ \Delta s_i = \sqrt{\Delta_i^2 + \delta_i^2} = \sqrt{1 + \left( \frac{\delta_i}{\Delta_i} \right)^2}\; \Delta_i. \]

However, by virtue of the mean value theorem for derivatives we may write, provided that $f(x)$ is differentiable on $[a, b]$,

\[ \frac{\delta_i}{\Delta_i} = \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}} = f'(\xi_i), \]

where $x_{i-1} < \xi_i < x_i$, and so

\[ \Delta s_i = \sqrt{1 + [f'(\xi_i)]^2}\; \Delta_i. \tag{7-38} \]

Thus the desired arc length $s$ will be determined by evaluating

\[ s = \lim_{\|\Delta\|_{P_n}\to 0} \sum_{i=1}^{n} \sqrt{1 + [f'(\xi_i)]^2}\; \Delta_i. \tag{7-39} \]

We see from Definition 7-2 that this is simply the definite integral of the function $\sqrt{1 + [f'(x)]^2}$ integrated from $x = a$ to $x = b$, and hence

\[ s = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx = \int_a^b \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \tag{7-40} \]

Theorem 7-10 (arc length of plane curve) Let $y = f(x)$ be a differentiable function on the interval $[a, b]$. Then the length $s$ of the plane curve $\Gamma$ defined by the graph of this function in the $(x, y)$-plane between the points $(a, f(a))$, $(b, f(b))$ is given by

\[ s = \int_a^b \sqrt{1 + \left( \frac{dy}{dx} \right)^2}\,dx. \]

Example 7-9 Determine the length of arc of the curve $y = \cosh x$ between the points $(1, \cosh 1)$ and $(3, \cosh 3)$.

Solution We have $a = 1$, $b = 3$, $y = \cosh x$, and so $dy/dx = \sinh x$, whence

\[ s = \int_1^3 \sqrt{1 + \sinh^2 x}\,dx = \int_1^3 \cosh x\,dx. \]

Now since $d/dx\,(\sinh x) = \cosh x$, it follows that $\sinh x + C$ is an antiderivative of $\cosh x$, so that by Theorem 7-7 we have

\[ s = \int_1^3 \cosh x\,dx = (\sinh x + C) \Big|_{1}^{3} = \sinh 3 - \sinh 1. \]

Fig. 7-10 Length of parametrically defined curve $\Gamma$.

Theorem 7-10 will fail for curves $\Gamma$ of the type shown in Fig. 7-10, for any representation of the function in the form $y = f(x)$ will not be single valued on the interval $[\alpha, \beta]$, and so it will not be differentiable there.
The difficulty here is easily overcome by using the fact that each point on the curve $\Gamma$ can be uniquely defined and a unique derivative assigned if the curve $\Gamma$ is capable of parametric representation in the form

$$x = \phi(t), \quad y = \psi(t) \quad \text{for } T_0 \le t \le T_1, \qquad (7\text{-}41)$$

with $\phi(t)$, $\psi(t)$ differentiable on $[T_0, T_1]$. Using the result for parametric differentiation

$$\frac{dy}{dx} = \frac{\psi'(t)}{\phi'(t)}$$

in Eqn (7-39), and then employing the differential relationship $\Delta x = \phi'(t)\,\Delta t$ to define $\Delta_i$ in terms of $\Delta t_i$, we find that

$$s = \lim_{\|\Delta\|_{P_n}\to 0} \sum_{i=1}^{n} \sqrt{1 + \left[\frac{\psi'(\xi_i)}{\phi'(\xi_i)}\right]^2}\,\phi'(\xi_i)\,\Delta t_i, \qquad (7\text{-}42)$$

where $t_{i-1} < \xi_i < t_i$. Thereafter, the argument that gave rise to Eqn (7-40) now gives rise to

$$s = \int_{T_0}^{T_1} \sqrt{1 + \left[\frac{\psi'(t)}{\phi'(t)}\right]^2}\,\phi'(t)\,dt = \int_{T_0}^{T_1} \sqrt{[\phi'(t)]^2 + [\psi'(t)]^2}\,dt. \qquad (7\text{-}43)$$

Theorem 7-11 (arc length of parametrically defined curve) Let $\phi(t)$, $\psi(t)$ be differentiable functions in $T_0 \le t \le T_1$. Then the length $s$ of the plane curve defined parametrically by $x = \phi(t)$, $y = \psi(t)$ between the points $(\phi(T_0), \psi(T_0))$, $(\phi(T_1), \psi(T_1))$ is given by

$$s = \int_{T_0}^{T_1} \sqrt{[\phi'(t)]^2 + [\psi'(t)]^2}\,dt.$$

(b) Area of surface of revolution

The name surface of revolution is given to any surface which is generated by rotating a plane curve $y = f(x)$ about either the x-axis or the y-axis. Since the determination of the area in either case is exactly similar, we shall discuss only the case of the revolution of the curve $y = f(x)$ about the x-axis, as shown in Fig. 7-11.

A problem arises here as to how to define the area of a non-cylindrical curved surface. We propose to approach the problem by sectioning the surface into annular strips of width $\Delta_i$ as shown in Fig. 7-11, and then to approximate the area $\Delta S$ of each such annular strip by representing it by the conical area which is obtained by rotating the chord $PQ$ of length $\Delta s_i$ about the x-axis. Then if this element of area of cone between the planes $x = x_{i-1}$ and $x = x_i$ is $\Delta S_i$, this will be given by

$$\Delta S_i = 2\pi\left(\frac{y_{i-1} + y_i}{2}\right)\Delta s_i. \qquad (7\text{-}44)$$

Fig. 7-11 Area of surface of revolution.

Similar elements of area may be defined for each of the other annular strips defined by some partition $P_n$ of the interval $[a, b]$ by the set of points $a = x_0 < x_1 < \cdots < x_n = b$. Thus, denoting the norm of $P_n$ by $\|\Delta\|_{P_n}$, we shall define the area $S$ of the surface of revolution generated by rotating $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, to be

$$S = \lim_{\|\Delta\|_{P_n}\to 0} \sum_{i=1}^{n} \Delta S_i = \lim_{\|\Delta\|_{P_n}\to 0} \pi \sum_{i=1}^{n} (y_{i-1} + y_i)\,\Delta s_i. \qquad (7\text{-}45)$$

Hence, if $f(x)$ is differentiable in $a \le x \le b$, by using result (7-38) we find

$$S = \lim_{\|\Delta\|_{P_n}\to 0} \pi \sum_{i=1}^{n} (y_{i-1} + y_i)\sqrt{1 + [f'(\xi_i)]^2}\,\Delta_i, \qquad (7\text{-}46)$$

where $x_{i-1} < \xi_i < x_i$. Once again our previous form of argument shows that this is just the definite integral of the function $2\pi f(x)\sqrt{1 + [f'(x)]^2}$ integrated from $x = a$ to $x = b$, and so

$$S = 2\pi \int_a^b f(x)\sqrt{1 + [f'(x)]^2}\,dx. \qquad (7\text{-}47)$$

Theorem 7-12 (area of surface of revolution) Let $f(x)$ be a differentiable function on $a \le x \le b$. Then the area $S$ of the surface of revolution generated by rotating the graph of the function $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, is given by

$$S = 2\pi \int_a^b f(x)\sqrt{1 + [f'(x)]^2}\,dx.$$

Example 7-10 Find the area contained between the planes $x = -1$ and $x = 2$ of the surface of revolution about the x-axis of the curve $y = \cosh x$.

Solution We have $a = -1$, $b = 2$, and $f(x) = \cosh x$, and so $f'(x) = \sinh x$, whence

$$S = 2\pi \int_{-1}^{2} \cosh x\,\sqrt{1 + \sinh^2 x}\,dx = 2\pi \int_{-1}^{2} \cosh^2 x\,dx.$$

To evaluate this result we now use the hyperbolic identity $\cosh^2 x = \tfrac{1}{2}(1 + \cosh 2x)$ to obtain

$$S = \pi \int_{-1}^{2} (1 + \cosh 2x)\,dx.$$

Then, as it is easily verified that $\tfrac{1}{2}\sinh 2x + C$ is an antiderivative of $\cosh 2x$, we have from Theorem 7-7 that

$$S = \pi \int_{-1}^{2} (1 + \cosh 2x)\,dx = \pi\left(x + \tfrac{1}{2}\sinh 2x + C\right)\Big|_{-1}^{2} = \tfrac{1}{2}\pi(6 + \sinh 4 + \sinh 2).$$

(c) Volume of revolution

Finally, let us determine the volume of revolution $V$ of the volume shown in Fig. 7-11.
This time, to define the volume of such a figure, we consider cylindrical elements of volume of thickness $\Delta_i$, and place upper and lower bounds on that element of volume by the obvious inequality:

$$\pi \times (\text{least radius of annulus})^2 \times \Delta_i \le \text{element of volume} \le \pi \times (\text{greatest radius of annulus})^2 \times \Delta_i.$$

Then, if $x_{i-1} < \xi_i < x_i$, a volume element $\Delta V_i$ satisfying this inequality and bounded to the left by the plane $x = x_{i-1}$ and to the right by the plane $x = x_i$ is

$$\Delta V_i = \pi [f(\xi_i)]^2\,\Delta_i. \qquad (7\text{-}48)$$

The volume of revolution generated by rotating $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, will then be defined to be

$$V = \lim_{\|\Delta\|_{P_n}\to 0} \pi \sum_{i=1}^{n} [f(\xi_i)]^2\,\Delta_i. \qquad (7\text{-}49)$$

A repetition of the previous form of argument then yields

$$V = \pi \int_a^b [f(x)]^2\,dx. \qquad (7\text{-}50)$$

Notice that we have imposed no differentiability requirements on $f(x)$, so that result (7-50) is applicable even if $f(x)$ is only piecewise continuous.

Theorem 7-13 (volume of solid of revolution) Let $f(x)$ be a piecewise continuous function on $a \le x \le b$. Then the volume of the solid of revolution generated by rotating the curve $y = f(x)$ about the x-axis, and contained between the planes $x = a$ and $x = b$, is given by

$$V = \pi \int_a^b [f(x)]^2\,dx.$$

Example 7-11 Determine the volume of revolution generated by rotating the parabola $y = 1 + x^2$ about the x-axis, and contained between the planes $x = 1$ and $x = 2$.

Solution Here we have $a = 1$, $b = 2$, and $f(x) = 1 + x^2$, so that

$$V = \pi \int_1^2 (1 + x^2)^2\,dx = \pi \int_1^2 (1 + 2x^2 + x^4)\,dx = \pi\left(x + \frac{2x^3}{3} + \frac{x^5}{5}\right)\Big|_1^2 = \frac{178\pi}{15}.$$

7-7 Numerical integration

From the second fundamental theorem of calculus we have seen that the successful analytical evaluation of a definite integral involves the determination of an antiderivative of the integrand. Although in many practical cases of importance an antiderivative can be found, the fact remains that in general this is not possible and Theorem 7-7 is therefore of no avail.
Such, for example, is the case with an integral as simple as

$$\int e^{-x^2}\,dx,$$

for although an antiderivative of $e^{-x^2}$ certainly exists on theoretical grounds, it is not expressible in terms of elementary functions.

Of the many possible methods whereby a numerical estimate of the value of a definite integral may be made, we choose to mention only the very simplest ones here. The general process of evaluating a definite integral by numerical means will be referred to as numerical integration, though the old fashioned term numerical quadrature is still often employed for such a process. The matter of the accuracy of these methods will be taken up elsewhere in connection with applications of Taylor's theorem.

Fig. 7-12 Trapezoidal approximation of area.

(a) Trapezoidal rule

Although a strictly analytical derivation of the so called trapezoidal rule for integration may be given we shall not use this approach, and instead make appeal to the area representation of a definite integral. Consider Fig. 7-12, and let us estimate the shaded area below the curve $y = f(x)$ which we know has the value

$$\int_a^b f(x)\,dx.$$

Let us begin by taking any set of $n + 1$ points $a = x_0 < x_1 < \cdots < x_n = b$, and on each interval $[x_{i-1}, x_i]$, approximate the true area above it by the trapezium obtained by replacing the arc of the curve through the points $(x_{i-1}, f(x_{i-1}))$, $(x_i, f(x_i))$ by the chord joining these two points. Then the area of the trapezium on the interval $[x_{i-1}, x_i]$ is

$$\tfrac{1}{2}\big(f(x_{i-1}) + f(x_i)\big)\,\Delta x_i,$$

where $\Delta x_i = x_i - x_{i-1}$. Thus, adding the $n$ contributions of this type, we arrive at the general trapezoidal rule

$$\int_a^b f(x)\,dx \approx \tfrac{1}{2}\big(f(x_0) + f(x_1)\big)\Delta x_1 + \tfrac{1}{2}\big(f(x_1) + f(x_2)\big)\Delta x_2 + \cdots + \tfrac{1}{2}\big(f(x_{n-1}) + f(x_n)\big)\Delta x_n. \qquad (7\text{-}51)$$

If the interval $[a, b]$ is divided into $n$ equal parts of length $h = (b - a)/n$, then (7-51) becomes the trapezoidal rule for equal intervals

$$\int_a^b f(x)\,dx = h\left[\tfrac{1}{2}f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + \tfrac{1}{2}f(x_n)\right] + e(h), \qquad (7\text{-}52)$$

where an equality sign has now been used because we have included the error term $e(h)$, which recognizes that the error is, in part, dependent on the magnitude of $h$.

(b) Simpson's rule

A different approach involves dividing $[a, b]$ into an even number $n$ of sub-intervals of equal length $h = (b - a)/n$, and then approximating the function over consecutive pairs of sub-intervals by a quadratic polynomial. That is to say, fitting a parabola to the three points $(a, f(a))$, $(a + h, f(a + h))$, $(a + 2h, f(a + 2h))$ comprising the first two sub-intervals, and thereafter repeating the process until the whole of the interval $[a, b]$ has been covered. The value of the definite integral can then be estimated by integrating the successive quadratic approximations over their respective intervals of length $2h$ and adding the results.

This simple idea leads to Simpson's rule for numerical integration which we now formulate in analytical terms. Consider the first interval $[a, a + 2h]$, and represent the function $y = f(x)$ in this interval by the quadratic

$$y = c_0 + c_1 x + c_2 x^2. \qquad (7\text{-}53)$$

Then the approximation to the desired integral taken over this interval is

$$\int_a^{a+2h} f(x)\,dx \approx \int_a^{a+2h} (c_0 + c_1 x + c_2 x^2)\,dx = \left(c_0 x + \frac{c_1 x^2}{2} + \frac{c_2 x^3}{3}\right)\Big|_a^{a+2h}. \qquad (7\text{-}54)$$

To determine the coefficients $c_0$, $c_1$, and $c_2$ in order that the quadratic should pass through the three points $(a, f(a))$, $(a + h, f(a + h))$, $(a + 2h, f(a + 2h))$ we must solve the three simultaneous equations

$$\begin{aligned} f(a) &= c_0 + c_1 a + c_2 a^2, \\ f(a + h) &= c_0 + c_1 (a + h) + c_2 (a + h)^2, \\ f(a + 2h) &= c_0 + c_1 (a + 2h) + c_2 (a + 2h)^2. \end{aligned} \qquad (7\text{-}55)$$

When this is done and the results are substituted into Eqn (7-54) we arrive at the desired result

$$\int_a^{a+2h} f(x)\,dx = \frac{h}{3}\big(f(a) + 4f(a + h) + f(a + 2h)\big) + e(h), \qquad (7\text{-}56)$$

where again we have included the error term by $e(h)$. In its simplest form Eqn (7-56), together with its error term, is called Simpson's rule.
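As a rough numerical illustration of Eqns (7-52) and (7-56), the sketch below applies both formulas, with the error term $e(h)$ dropped, to the integrand $e^{-x^2}$ mentioned at the start of this section. The interval $[0, 1]$, the function names `trapezoidal` and `simpson_basic`, and the use of the error function `math.erf` to supply a reference value are all assumptions made for this illustration and are not taken from the text.

```python
import math

def trapezoidal(f, a, b, n):
    """Trapezoidal rule for n equal intervals: Eqn (7-52) with the error term dropped."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

def simpson_basic(f, a, h):
    """Simpson's rule on the double interval [a, a + 2h]: Eqn (7-56) with the error term dropped."""
    return (h / 3.0) * (f(a) + 4.0 * f(a + h) + f(a + 2.0 * h))

def integrand(x):
    """Illustrative integrand exp(-x^2); it has no elementary antiderivative."""
    return math.exp(-x * x)

# Reference value on the assumed interval [0, 1]: (sqrt(pi)/2) * erf(1).
exact = 0.5 * math.sqrt(math.pi) * math.erf(1.0)

trap = trapezoidal(integrand, 0.0, 1.0, 10)   # ten strips of width h = 0.1
simp = simpson_basic(integrand, 0.0, 0.5)     # one parabola, h = 0.5

print(f"reference    {exact:.6f}")
print(f"trapezoidal  {trap:.6f}  (error {abs(trap - exact):.1e})")
print(f"Simpson      {simp:.6f}  (error {abs(simp - exact):.1e})")
```

Because Eqn (7-56) comes from fitting a parabola, `simpson_basic` should reproduce the integral of any quadratic up to rounding, which gives a quick check on an implementation.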
An explicit form for $e(h)$ in both the trapezium rule and Simpson's rule will be given later. If, now, result (7-56) is applied to the intervals $[a, a + 2h]$, $[a + 2h, a + 4h]$, ..., $[a + (n - 2)h, b]$ and the results are added, we arrive at Simpson's rule for an even number $n$ of intervals

$$\int_a^b f(x)\,dx = \frac{h}{3}\big[f(a) + 4f(a + h) + 2f(a + 2h) + 4f(a + 3h) + \cdots + 4f(a + (n - 1)h) + f(b)\big] + e(h), \qquad (7\text{-}57)$$

where $h = (b - a)/n$.

Example 7-12 Calculate the definite integral

$$I = \int_1^2 \frac{dx}{x}$$

by the trapezoidal rule and by Simpson's rule, taking ten integration steps of length $h = 0.1$.

Solution We start by tabulating the functional values of the integrand $1/x$ at intervals of 0.1.

    x      1/x
    1.0    1.0000
    1.1    0.9091
    1.2    0.8333
    1.3    0.7692
    1.4    0.7143
    1.5    0.6667
    1.6    0.6250
    1.7    0.5882
    1.8    0.5556
    1.9    0.5263
    2.0    0.5000

Then, using the trapezoidal rule (7-52), we find

$$I \approx 0.1 \times [0.5000 + 0.9091 + 0.8333 + 0.7692 + 0.7143 + 0.6667 + 0.6250 + 0.5882 + 0.5556 + 0.5263 + 0.25],$$

whence $I \approx 0.6938$. The same calculation using Simpson's rule, (7-57), gives

$$I \approx \frac{0.1}{3} \times [1.0000 + 4 \times (0.9091) + 2 \times (0.8333) + 4 \times (0.7692) + 2 \times (0.7143) + 4 \times (0.6667) + 2 \times (0.6250) + 4 \times (0.5882) + 2 \times (0.5556) + 4 \times (0.5263) + 0.5000],$$

whence $I \approx 0.6932$.

In actual fact the exact result of this definite integral is $\log_e 2 = 0.69315$. As would have been expected on intuitive grounds, Simpson's rule is more accurate than the trapezoidal rule.

(c) Integration of interpolating polynomials

A direct extension of the previous method that may be exploited systematically to produce integration formulae of high accuracy and flexibility involves the replacement of the function $y = f(x)$ over the interval $[a, b]$ by an interpolating polynomial of degree $n$.
Thus, on the interval $[a, b]$, the function $y = f(x)$ is represented by

$$y = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n, \qquad (7\text{-}58)$$

and the numerical integration formula then follows by writing

$$\int_a^b f(x)\,dx \approx \int_a^b (c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n)\,dx. \qquad (7\text{-}59)$$

Thus, if the error term is again represented by $e(h)$, we obtain the numerical integration formula

$$\int_a^b f(x)\,dx = c_0 (b - a) + \frac{c_1}{2}(b^2 - a^2) + \frac{c_2}{3}(b^3 - a^3) + \cdots + \frac{c_n}{n + 1}(b^{n+1} - a^{n+1}) + e(h). \qquad (7\text{-}60)$$

The difficulty in this approach arises from the fact that the sense in which Eqn (7-58) is to approximate $y = f(x)$ is still to be defined, and this will influence both the method by which the $n + 1$ coefficients $c_0, c_1, \ldots, c_n$ are to be determined and, naturally, the error term $e(h)$.

Probably the simplest choice of approximating polynomial, and the only one to be discussed here, is determined by the requirement that the polynomial and the function should have identical values at $n + 1$ points $x_0 < x_1 < \cdots < x_n$ belonging to $[a, b]$. That is, the requirement that the graph of Eqn (7-58) should pass through the $n + 1$ points $(x_0, f(x_0))$, $(x_1, f(x_1))$, ..., $(x_n, f(x_n))$. Such a polynomial is called a Lagrangian interpolation polynomial, and its form may be written down directly as follows. We illustrate the Lagrangian interpolation polynomial $L_3(x)$ of degree 3, which passes through the four points $(x_0, f(x_0))$, $(x_1, f(x_1))$, $(x_2, f(x_2))$, and $(x_3, f(x_3))$. Higher degree polynomials may be constructed in a similar manner.

$$\begin{aligned} L_3(x) ={}& \frac{(x - x_1)(x - x_2)(x - x_3)}{(x_0 - x_1)(x_0 - x_2)(x_0 - x_3)}\,f(x_0) + \frac{(x - x_0)(x - x_2)(x - x_3)}{(x_1 - x_0)(x_1 - x_2)(x_1 - x_3)}\,f(x_1) \\ &+ \frac{(x - x_0)(x - x_1)(x - x_3)}{(x_2 - x_0)(x_2 - x_1)(x_2 - x_3)}\,f(x_2) + \frac{(x - x_0)(x - x_1)(x - x_2)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)}\,f(x_3). \end{aligned} \qquad (7\text{-}61)$$

This form of approach to the development of an integration formula is essential when, as is often the case, the function $f(x)$ is only known in tabular form.
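The pattern of Eqn (7-61) carries over to any number of points: the term multiplying $f(x_i)$ is the product of the factors $(x - x_j)/(x_i - x_j)$ taken over all $j \ne i$. The following minimal sketch evaluates such a polynomial directly from tabulated values. The function name `lagrange` and the sample data, taken from $f(x) = x^3$, are hypothetical choices made only for this illustration.

```python
def lagrange(xs, ys):
    """Return a function that evaluates the Lagrangian interpolation polynomial
    through the points (xs[i], ys[i]); with four points this is L3(x) of Eqn (7-61)."""
    def L(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    # Factor (x - x_j) / (x_i - x_j): equals 1 at x_i and 0 at x_j.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return L

# Hypothetical table: four samples of f(x) = x**3 at unequally spaced points.
xs = [0.0, 1.0, 2.0, 4.0]
ys = [x ** 3 for x in xs]
L3 = lagrange(xs, ys)

for x in xs + [3.0]:
    print(x, L3(x))
```

Since the sample function is itself a cubic, the degree 3 interpolant should agree with it between the tabulated points as well as at them, apart from rounding error.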
Example 7-13 Given the following tabular values of a function $f(x)$, derive the Lagrangian interpolation formula $L_3(x)$ for $f(x)$.

    r    x_r    f(x_r)
    0    2      2.131
    1    4      1.242
    2    6      4.507
    3    7      9.702

Solution It follows by direct substitution into Eqn (7-61) that

$$\begin{aligned} L_3(x) ={}& \frac{(x - 4)(x - 6)(x - 7)}{(-2)(-4)(-5)} \times (2.131) + \frac{(x - 2)(x - 6)(x - 7)}{(2)(-2)(-3)} \times (1.242) \\ &+ \frac{(x - 2)(x - 4)(x - 7)}{(4)(2)(-1)} \times (4.507) + \frac{(x - 2)(x - 4)(x - 6)}{(5)(3)(1)} \times (9.702). \end{aligned}$$

Simplification of this will yield the required third degree polynomial which may, if desired, then be integrated over any sub-interval of the interval $[2, 7]$ on which $f(x)$ is defined, thereby yielding an approximation to the definite integral of $f(x)$ integrated over that same sub-interval.

PROBLEMS

Section 7-1

7-1 Let $f(x) = \lambda x$ on some closed interval $a \le x \le b$ lying in the positive part of the x-axis, where $\lambda > 0$ is a constant. Then, if $P_n$ is a partition of $[a, b]$ into $n$ sub-intervals of equal length, determine the form of the lower and upper sums $\underline{S}_{P_n}$, $\overline{S}_{P_n}$ for $f(x)$ taken over this partition and prove directly by taking the limit that

$$\lim_{n\to\infty} \underline{S}_{P_n} = \lim_{n\to\infty} \overline{S}_{P_n}.$$

Hence deduce that

$$\int_a^b \lambda x\,dx = \frac{\lambda}{2}(b^2 - a^2).$$

7-2 Let $\lambda, \mu > 0$ be constants, and set $f(x) = \mu + \lambda x$ on some closed interval $a \le x \le b$ lying in the positive part of the x-axis. Show, using the method of Problem 7-1, that

$$\int_a^b (\mu + \lambda x)\,dx = \mu(b - a) + \frac{\lambda}{2}(b^2 - a^2). \qquad (A)$$

Show also by this method that

$$\int_a^b \mu\,dx = \mu(b - a), \qquad (B)$$

and deduce from (A), (B) and the result of Problem 7-1 that

$$\int_a^b (\mu + \lambda x)\,dx = \int_a^b \mu\,dx + \int_a^b \lambda x\,dx.$$

This provides a direct proof of the linearity of the operation of integration in the special case that $f(x) = \mu + \lambda x$.

7-3 Let $f(x) = e^{\lambda x}$, and take $P_n$ to be a partition of the closed interval $[a, b]$ into $n$ sub-intervals of equal length. By taking the numbers $\xi_i$ of Definition 7-1 to be at the left-hand end points of the sub-intervals, compute the approximating sum $S_{P_n}$ corresponding to $f(x) = e^{\lambda x}$, and by finding its limit prove that

$$\int_a^b e^{\lambda x}\,dx = \frac{1}{\lambda}\left(e^{\lambda b} - e^{\lambda a}\right).$$
7-4 If $a < k < b$, use the result of Problem 7-3 to deduce that

$$\int_a^b e^{\lambda x}\,dx = \int_a^k e^{\lambda x}\,dx + \int_k^b e^{\lambda x}\,dx.$$

This provides a direct proof that the operation of integration is additive with respect to the interval of integration in the special case that $f(x) = e^{\lambda x}$.

7-5 Let $[a, b]$ be any closed interval not containing the origin, and denote by $P_m$ the partition of this interval into $m$ equal sub-intervals each of length $(b - a)/m$. Denote by $x_r$ the point $x_r = a + (r/m)(b - a)$ lying at the right-hand end point of the $r$th interval. Then, by setting $\xi_r = \sqrt{x_{r-1} x_r}$ show, by considering $x_{r-1} - \xi_r$ and $x_r - \xi_r$, that $x_{r-1} < \xi_r < x_r$. By writing $f(x) = 1/x^2$ in Definition 7-2, and taking $P_m$ and the points $\xi_r$ in that definition to be as defined above, prove that

$$\int_a^b \frac{dx}{x^2} = \left(\frac{1}{a} - \frac{1}{b}\right).$$

[Hint: Use the fact that

$$\sum_{r=1}^{m} \frac{1}{x_{r-1} x_r} = \sum_{r=1}^{m} \left(\frac{1}{x_r - x_{r-1}}\right)\left(\frac{1}{x_{r-1}} - \frac{1}{x_r}\right).]$$

7-6 Determine the lower bounds $m_r$ and the upper bounds $M_r$ of the function $f(x) = 1/(1 + x^2)$ in each of the $n$ adjacent sub-intervals of length $1/n$ comprising a partition $P_n$ of the closed interval $[0, 1]$. Use these results to deduce the form taken by the upper and lower sums $\overline{S}_{P_n}$, $\underline{S}_{P_n}$ and show that

$$\lim_{n\to\infty} \left(\overline{S}_{P_n} - \underline{S}_{P_n}\right) = 0.$$

Deduce from this that

$$\int_0^1 \frac{dx}{1 + x^2} = \lim_{n\to\infty} n\left\{\frac{1}{n^2 + 1^2} + \frac{1}{n^2 + 2^2} + \frac{1}{n^2 + 3^2} + \cdots + \frac{1}{n^2 + n^2}\right\}$$

or, equivalently,

$$\int_0^1 \frac{dx}{1 + x^2} = \lim_{n\to\infty} n\left\{\frac{1}{n^2} + \frac{1}{n^2 + 1^2} + \frac{1}{n^2 + 2^2} + \cdots + \frac{1}{n^2 + (n - 1)^2}\right\}.$$

We shall see later that this integral has the value $\tfrac{1}{4}\pi$, and so each of these different expressions has this same interesting limit.

Section 7-2

7-7 Outline the proofs of the results of Theorem 7-3.

7-8 If $f(x) = 2x - 3$, use result (A) of Problem 7-2 to evaluate the definite integral

$$\int (2x - 3)\,dx.$$

Rewrite this as the sum of two definite integrals each with a non-negative integrand and verify that their sum leads to the same result.

7-9 Use the result of Problem 7-3 to evaluate the definite integral

$$\int^{2} e^{-3x}\,dx.$$

7-10 Find the area $I$ between the curves $y = x^2 + 2$ and $y = -x + 1$, which is bounded to the left by the line $x = -1$ and to the right by the line $x = 2$.

7-11 Discuss, without attempting to evaluate any integrals that are involved, the problem of determining the area between the curves $y = 1 + \sin x$ and $y = 1 + \cos x$ which is bounded to the left by the line $x = 0$ and to the right by the line $x = 2\pi$.

7-12 Find the area $I$ between the two curves $y = 1/x^2$ and $y = e^{0.5x} - 3$, which is bounded to the left by the line $x = 1$ and to the right by the line $x = 2$.

7-13 Evaluate the integral

$$\int_0^3 f(x)\,dx,$$

given that

$$f(x) = \begin{cases} x & \text{for } 0 < x < 1; \\ 2 + 2x & \text{for } 1 < x < 2; \\ -1 & \text{for } 2 < x < 3. \end{cases}$$

7-14 On the assumption that the definite integral

$$\int_a^b \frac{dx}{\sqrt{1 - x^2}} = \arcsin b - \arcsin a,$$

prove that the improper integral

$$\int_0^1 \frac{dx}{\sqrt{1 - x^2}}$$

is convergent, and determine its value.

7-15 Sketch the area bounded below by the positive x-axis, and above by the line $y = x$ on the interval $0 \le x \le 1$, and by the curve $y = 1/x^2$ on the interval $1 \le x < \infty$. Determine this area $I$ by the use of an improper integral combined with elementary geometrical arguments.

Section 7-3

7-16 Use Theorem 7-4 to place bounds on the value of the definite integral

$$I = \int e^{-x^2} \cos^3 x\,dx.$$

7-17 Evaluate the definite integral

$$\int_{1} x^2\,dx,$$

and use the result to determine the number $\xi$ in Theorem 7-5 when it is applied to this definite integral. Is the number $\xi$ unique? Repeat the argument, but this time applying it to the definite integral

$$\int_{-2}^{2} x^2\,dx.$$

Is there a unique number $\xi$ in this case?

7-18 Prove the following result which is a restricted form of the second mean value theorem for integrals. Let $f(x) > 0$ be continuous and monotonic decreasing on $[a, b]$, and let $g(x) > 0$ be continuous on $[a, b]$. Then,

$$\int_a^b f(x) g(x)\,dx = f(a) \int_a^{\xi} g(x)\,dx,$$

where $a < \xi < b$. State the corresponding form of the theorem when $f(x) > 0$ is continuous and monotonic increasing on $[a, b]$.
[Hint: Consider the integrand $f(a)\{f(x)g(x)/f(a)\}$ and use Theorem 7-4.]

7-19 The requirement of continuity for $f(x)$ in Theorem 7-5 is essential, for without it the result of the theorem may, or may not, be true. Illustrate this by considering step functions $f(x)$ defined on the interval $[1, 4]$, and show that it is possible to define ones for which, (a) no number $\xi$ exists which satisfies Theorem 7-5; (b) an infinity of numbers $\xi$ exist satisfying Theorem 7-5.

Section 7-4

7-20 Use Theorem 7-7 to evaluate the following definite integrals:

(a) $\int_a^b (x^{5/2} + 3e^x)\,dx$, (b) $\int_0 \sin x\,dx$, (c) $\int_0 \sin x\,dx$, (d) $\int_0 |\sin x|\,dx$.

7-21 Use Theorem 7-7 to determine the area contained between the x-axis and the curve $y = 1 + x^3 + 2\sin x$, which is bounded to the left by the line $x = 0$ and to the right by the line $x = \pi$.

7-22 Using the basic properties of the logarithmic function listed in Section 6-3, express $\log_a x$ in terms of an indefinite integral, and sketch the interpretation of the result as an area below a curve.

Section 7-5

7-23 Apply Theorem 7-8 to the following integral, but do not attempt to evaluate the result:

$$\int_{\alpha}^{1 + \alpha^2} e^{-x} \cos \alpha x\,dx.$$

7-24 Apply Theorem 7-8 to the following integral, but do not attempt to evaluate the result:

$$I(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx, \quad (\alpha > 0).$$

7-25 This problem outlines an alternative form of proof for the result of Theorem 7-8. It is based on the chain rule for differentiation and on a direct proof of Corollary 7-8. Define the function $F(\alpha, \phi(\alpha), \psi(\alpha))$ by the equation

$$F(\alpha, \phi(\alpha), \psi(\alpha)) = \int_{\phi(\alpha)}^{\psi(\alpha)} f(x, \alpha)\,dx.$$

Then it follows from the chain rule for differentiation that the derivative of the integral with respect to $\alpha$ is given by

$$\frac{dF}{d\alpha} = \frac{\partial F}{\partial \alpha} + \frac{\partial F}{\partial \psi}\frac{d\psi}{d\alpha} + \frac{\partial F}{\partial \phi}\frac{d\phi}{d\alpha}. \qquad (A)$$

Use Definition 7-3 together with the first fundamental theorem of calculus to prove that

$$\frac{\partial F}{\partial \psi} = f[\psi(\alpha), \alpha] \quad \text{and} \quad \frac{\partial F}{\partial \phi} = -f[\phi(\alpha), \alpha]. \qquad (B)$$

Finally, obtain the statement of Theorem 7-8 by substituting results (B) into (A) and giving a direct proof, as in the text, that

$$\frac{\partial}{\partial \alpha} \int_{\phi}^{\psi} f(x, \alpha)\,dx = \int_{\phi}^{\psi} \frac{\partial f}{\partial \alpha}\,dx,$$

where for the purposes of partial differentiation with respect to $\alpha$, the limits $\phi$, $\psi$ are to be regarded as constants.

Section 7-6

7-26 Express in terms of a definite integral the arc length of the curve $y = 1 + x^2 + \sin 2x$, that lies between the points on the curve corresponding to $x = 1$ and $x = 4$.

7-27 Prove that the circumference of a circle of radius $a$ is $2\pi a$ by using the parametric equations of a circle $x = a \cos t$, $y = a \sin t$ with $0 \le t \le 2\pi$.

7-28 Find the area contained between the planes $x = -2$ and $x = 3$ of the surface of revolution about the x-axis generated by the curve $y = 2 + \cosh x$. [Hint: An antiderivative of $\cosh x$ is $\sinh x + C$.]

7-29 If the curve $y = f(x)$ has an inverse $x = \phi(y)$, state the form taken by Theorem 7-12 when the curve $y = f(x)$ between the points $(a, f(a))$ and $(b, f(b))$ is rotated about the y-axis.

7-30 Determine the volume contained between the parabola $y = 2 + x + x^2$ and the cubic $y = 5 + 2x + x^3$, which lies between the planes $x = 1$ and $x = 2$.

7-31 If the curve $y = f(x)$ has an inverse $x = \phi(y)$, state the form taken by Theorem 7-13 when the curve $y = f(x)$ between the points $(a, f(a))$ and $(b, f(b))$ is rotated about the y-axis.

Section 7-7

7-32 Evaluate the definite integral

$$\int^{3} (x^3 + 2x + 1)\,dx$$

by the trapezoidal rule using four intervals of equal length and then by Simpson's rule for the same intervals. Compare the result with that obtained by direct integration. Infer from your result that Simpson's rule is exact for cubic equations despite the fact that it is based on a parabolic fitting of the function.

7-33 State the form of the Lagrangian interpolation formula $L_2(x)$, and use it to deduce Simpson's rule, (7-56), by applying it to the three points $(a, f(a))$, $(a + h, f(a + h))$ and $(a + 2h, f(a + 2h))$ through which the function $y = f(x)$ passes.
7-34 Let the curve $\Gamma$ be defined in terms of the polar coordinates $(r, \theta)$ by means of the equation

$$r = f(\theta),$$

where $f(\theta)$ is a continuous function. Then if $P_n$ is a partition of the interval $\alpha \le \theta \le \beta$ into the points $\alpha = \theta_0 < \theta_1 < \cdots < \theta_n = \beta$ with the norm $\|\Delta\|_{P_n}$, prove that the area $A$ between the origin and the curve $\Gamma$ which is bounded by the radius vectors $\theta = \alpha$ and $\theta = \beta$ is given by

$$A = \lim_{\|\Delta\|_{P_n}\to 0} \sum_{i=1}^{n} \tfrac{1}{2} f^2(\xi_i)\,\Delta_i,$$

where $\theta_{i-1} < \xi_i < \theta_i$ and $\Delta_i = \theta_i - \theta_{i-1}$. Hence deduce that

$$A = \frac{1}{2} \int_{\alpha}^{\beta} f^2(\theta)\,d\theta.$$

Use this result to find the area swept out by the radius vector drawn from the origin to the Archimedean spiral $r = k\theta$ between the radius vectors $\theta = \alpha$ and $\theta = \beta$, with $\beta > \alpha$.

7-35 Consider a straight rod of length $L$ which has a uniform cross-sectional area. Aligning the x-axis with the rod in such a manner that the origin coincides with the left-hand end point, assume that the mass $M(x)$ of material contained in the rod in the interval $[0, x]$ is given by

$$M(x) = \int_0^x \rho(t)\,dt.$$

Then the essentially non-negative function $\rho(x)$ is called the linear density distribution of the matter in the rod, and by the first fundamental theorem of calculus it follows that $\rho(x) = M'(x)$. Now in mechanics the moment of inertia $I$ about an axis of a point mass $m$ situated at a perpendicular distance $x$ from that axis is defined to be $mx^2$. By considering a partition $P_n$ of $0 \le x \le L$ into the points $0 = x_0 < x_1 < \cdots < x_n = L$ with the norm $\|\Delta\|_{P_n}$, prove that the moment of inertia $I$ of the rod about an axis perpendicular to the rod and passing through an end point is given by

$$I = \lim_{\|\Delta\|_{P_n}\to 0} \sum_{i=1}^{n} \xi_i^2 M'(\xi_i)\,\Delta_i,$$

where $x_{i-1} < \xi_i < x_i$ and $\Delta_i = x_i - x_{i-1}$. Hence deduce that

$$I = \int_0^L x^2 \rho(x)\,dx.$$

In the case of a rod of mass $M$ having a uniform linear density $\rho(x) = \rho_0$, deduce the relationship between $\rho_0$ and $M$ and use it to prove that the moment of inertia of the rod about an axis perpendicular to its length and passing through an end point is

$$\frac{ML^2}{3}.$$

7-36 Consider a circular disk of radius $a$, and suppose that the mass $M(r)$ of material contained within a circle of radius $r$ drawn about its centre is given by

$$M(r) = 2\pi \int_0^r t \rho(t)\,dt.$$

Then the essentially non-negative function $\rho(r)$ is called the area density distribution of the matter in the disk, and by the first fundamental theorem of calculus it follows that $2\pi r \rho(r) = M'(r)$. Use the form of argument outlined in the previous problem to prove that the moment of inertia $I$ of the disk about an axis perpendicular to its plane and passing through its centre is given by

$$I = 2\pi \int_0^a r^3 \rho(r)\,dr.$$

If the disk is of mass $M$ and has a uniform area density $\rho(r) = \rho_0$, deduce the relationship between $\rho_0$ and $M$ and use it to prove that the moment of inertia of the disk about an axis perpendicular to its plane and passing through its centre is

$$\frac{Ma^2}{2}.$$

7-37 Indicate by means of simple examples how the integral inequality (7-1) may be used to place upper and lower bounds on the integrals defining the area $A$ and the moment of inertia $I$ in Problems 7-34 to 7-36.

Systematic integration

8-1 Integration of elementary functions

The main objective of this chapter is to explore some of the systematic methods for determining an antiderivative, that is, a function $F(x)$ whose derivative is equal to some given function $f(x)$. As described in the previous chapter, we shall denote the antiderivative of the function $f$ by $\int f(x)\,dx$ with the understanding that

$$\int f(x)\,dx = F(x) + C \qquad (8\text{-}1)$$

with $C$ an arbitrary constant.
Alternatively, as any indefinite integral of $f$ must also be an antiderivative of $f$, we may identify $F(x)$ in Eqn (8-1) with $\int_a^x f(t)\,dt$ where $a$ is arbitrary, to obtain the equivalent expression

$$\int f(x)\,dx = \int_a^x f(t)\,dt + C. \qquad (8\text{-}2)$$

Remember that the symbol $\int f(x)\,dx$ for the antiderivative of $f$ derives from differentiation and denotes the most general function whose derivative is $f$. The allied symbol $\int_a^b f(x)\,dx$, denoting a definite integral of $f$, derives from integration and is simply a real number. Considering the definition of an antiderivative, we shall say that two antiderivatives are equal if they only differ by a constant.

It should be recalled that the connection between the concepts of an antiderivative and a definite integral is provided by the fundamental theorem of calculus, which asserts that

$$\int_a^b f(x)\,dx = \left\{\int f(x)\,dx\right\}_{x=b} - \left\{\int f(x)\,dx\right\}_{x=a}.$$

In view of Eqn (8-1) this may be written

$$\int_a^b f(x)\,dx = F(b) - F(a). \qquad (8\text{-}3)$$

Very often in texts the term indefinite integral is loosely ascribed to the entire right-hand side of Eqn (8-2) instead of, as here, only to its first term. This is usually justified by the fact that $a$ is arbitrary though, of course, it does not necessarily follow that all possible constants $C$ can be absorbed into the integral by a suitable choice of $a$. For example, we have the antiderivative

$$\int \cos x\,dx = \sin x + C,$$

though if for some particular problem it was appropriate to set $C = 3$, say, then no choice of the arbitrary constant $a$ would enable us to equate

$$\int_a^x \cos x\,dx \quad \text{and} \quad \sin x + 3,$$

for this would imply that $\sin a = -3$.

Unfortunately, the theorems for the differentiation of wide classes of functions seldom have any counterpart for determining antiderivatives.
Ultimately, success in finding an antiderivative depends on whether or not the function $f$ can be so simplified that one may be recognized by using tables of derivatives in reverse: that is, matching the desired derivative $f$ with one in the table, and reading backwards to deduce an antiderivative. Thus, to find the antiderivative of $3 \sec x \tan x$, we first glean from Table 5-1 that

$$\frac{d}{dx}(\sec x) = \sec x \tan x$$

or, equivalently,

$$\frac{d}{dx}(3 \sec x) = 3 \sec x \tan x,$$

showing that the antiderivative is

$$\int 3 \sec x \tan x\,dx = 3 \sec x + C.$$

In colloquial terms, the process of finding the most general antiderivative of the function $f(x)$ is called the 'integration of $f(x)$'. Table 8-1 gives a preliminary working list of important integrals which has been compiled from the tables of derivatives in Chapters 5 and 6.

The two separate results shown against number 3 are usually contracted to

$$\int \frac{dx}{x} = \log |x| + C,$$

with the tacit understanding that the arbitrary constant $C$ differs according as $x$ is positive or negative. With obvious modifications, this convention will be extended to include all integrals involving the logarithmic function. Specific examples involving this convention are to be found in Problems 8-1 to 8-3.

The following statement is equivalent to both Eqn (8-1) and Eqn (8-2), and it arises as a direct consequence of the definition of an antiderivative. We formulate it as a general theorem.

Table 8-1 Basic table for integrals

1. $\int x^n\,dx = \dfrac{x^{n+1}}{n + 1} + C \quad (n \ne -1)$;
2. $\int a^x\,dx = \dfrac{a^x}{\log a} + C \quad (a > 0)$;
3. $\int \dfrac{dx}{x} = \log x + C$ for $x > 0$, and $\int \dfrac{dx}{x} = \log(-x) + C$ for $x < 0$;
4. $\int e^{ax}\,dx = \dfrac{1}{a} e^{ax} + C \quad (a \ne 0)$;
5. $\int \cos ax\,dx = \dfrac{1}{a} \sin ax + C \quad (a \ne 0)$;
6. $\int \sin ax\,dx = -\dfrac{1}{a} \cos ax + C \quad (a \ne 0)$;
7. $\int \dfrac{dx}{\sqrt{a^2 - x^2}} = \arcsin \dfrac{x}{a} + C$ for $|x| < |a|$;
8. $\int \dfrac{dx}{a^2 + x^2} = \dfrac{1}{a} \arctan \dfrac{x}{a} + C \quad (a \ne 0)$;
9. $\int \dfrac{dx}{\sqrt{a^2 + x^2}} = \operatorname{arcsinh} \dfrac{x}{a} + C$;
10. $\int \dfrac{dx}{\sqrt{x^2 - a^2}} = \operatorname{arccosh} \dfrac{x}{a} + C$ for $x > a$, and $\int \dfrac{dx}{\sqrt{x^2 - a^2}} = -\operatorname{arccosh}\left(-\dfrac{x}{a}\right) + C$ for $x < -a$;
11. $\int \dfrac{dx}{a^2 - x^2} = \dfrac{1}{a} \operatorname{arctanh} \dfrac{x}{a} + C$ for $|x| < |a|$;
12. $\int \dfrac{dx}{x^2 - a^2} = -\dfrac{1}{a} \operatorname{arccoth} \dfrac{x}{a} + C$ for $|x| > |a|$.

Theorem 8-1

$$\frac{d}{dx} \int f(x)\,dx = f(x).$$

In words, this general result merely asserts the obvious fact that the derivative of the antiderivative of a function $f(x)$ is the function $f(x)$ itself. Its most frequent application is probably to the verification of antiderivatives. For example, let us use the theorem to verify the antiderivative

$$\int \frac{g'\,dx}{\sqrt{a^2 - g^2}} = \arcsin\left(\frac{g}{a}\right) + C, \qquad (A)$$

where $g = g(x)$ is some differentiable function of $x$ and $|g| < a$. By Theorem 8-1 we must have

$$\frac{d}{dx} \int \frac{g'\,dx}{\sqrt{a^2 - g^2}} = \frac{g'}{\sqrt{a^2 - g^2}}. \qquad (B)$$

Now, differentiating the right-hand side of (A) we find

$$\frac{d}{dx}\left[\arcsin \frac{g}{a} + C\right] = \frac{1}{\sqrt{1 - (g/a)^2}} \cdot \frac{g'}{a} = \frac{g'}{\sqrt{a^2 - g^2}},$$

which is identical with (B). Thus, (A) is verified.

A final general result of great value is the fact that the derivative of a linear combination of functions is equal to the same linear combination of their derivatives (Theorem 5-4). Expressed in terms of antiderivatives this implies the following general theorem.

Theorem 8-2

$$\int (k_1 f + k_2 g)\,dx = k_1 \int f\,dx + k_2 \int g\,dx.$$

It is, of course, this theorem that permits us to simplify many expressions to the point at which antiderivatives may be deduced from tables of standard integrals (antiderivatives) such as Table 8-1. Hence we have

$$\int (5x^2 - 2\cos x)\,dx = 5 \int x^2\,dx - 2 \int \cos x\,dx = \frac{5x^3}{3} - 2 \sin x + C.$$

The separate arbitrary constants associated with each of the antiderivatives on the right-hand side have, of course, been combined into the single arbitrary constant $C$.

The remaining sections of this chapter are concerned with outlining the details of the main techniques available for finding antiderivatives.
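The verification idea behind Theorem 8-1 can also be tried numerically: a candidate antiderivative is differentiated and the result compared with the integrand at a few sample points. The sketch below is only an illustration under stated assumptions. It replaces exact differentiation by a central difference, and the helper names `derivative` and `check_antiderivative`, the step size, the tolerance, the sample points, and the choice $g(x) = \sin x$ with $a = 2$ in result (A) are all assumptions made here rather than anything taken from the text.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x); a numerical stand-in for exact differentiation."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

def check_antiderivative(F, f, points, tol=1e-6):
    """Return True if the estimated F'(x) agrees with f(x) to within tol at every sample point."""
    return all(abs(derivative(F, x) - f(x)) < tol for x in points)

# Candidate from the text: 5x^3/3 - 2 sin x for the integrand 5x^2 - 2 cos x.
# The arbitrary constant C is omitted because it vanishes on differentiation.
F1 = lambda x: 5.0 * x ** 3 / 3.0 - 2.0 * math.sin(x)
f1 = lambda x: 5.0 * x ** 2 - 2.0 * math.cos(x)

# Result (A) with the illustrative choice g(x) = sin x and a = 2, so that |g| < a.
a = 2.0
F2 = lambda x: math.asin(math.sin(x) / a)
f2 = lambda x: math.cos(x) / math.sqrt(a ** 2 - math.sin(x) ** 2)

points = [-1.0, -0.3, 0.4, 1.2]
ok1 = check_antiderivative(F1, f1, points)
ok2 = check_antiderivative(F2, f2, points)
print(ok1, ok2)
```

A check of this kind can only expose a wrong candidate at the sampled points; it is not a substitute for the analytical verification carried out in the text.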
8-2 Integration by substitution

Possibly the most frequently used technique of integration is that in which the variable under the integral sign is changed in a manner which simplifies the task of finding the antiderivative. This process is known as integration by substitution or integration by change of variable. It is in this technique that the full significance of the symbol $dx$ in Eqn (8-1) is first realized. Indeed, by making a straightforward application of the chain rule for differentiation (Theorem 5-7) we shall arrive at a simple mechanical rule for effecting a variable change by using differentials.

Because composite functions (functions of a function) of $x$ often occur under the integral sign we shall consider a general antiderivative of the form
\[ I = \int k(x)\,f[g(x)]\,dx. \]
In order to cover all likely cases we shall consider the effect on $I$ of changing the variable $x$ to the variable $u$, where $x$ and $u$ are related by
\[ g(x) = h(u), \tag{8-4} \]
with $g$, $h$ differentiable functions. Let us start by supposing that
\[ I = \int k(x)\,f[g(x)]\,dx = F(x) + C, \tag{8-5} \]
so that we know
\[ \frac{dF}{dx} = k(x)\,f[g(x)]. \tag{8-6} \]
Applying the chain rule to $F(x)$ gives
\[ \frac{dF(x)}{du} = \frac{dF}{dx}\,\frac{dx}{du}, \]
which, by virtue of Eqn (8-6), may be written
\[ \frac{dF(x)}{du} = k(x)\,f[g(x)]\,\frac{dx}{du}. \]
On the assumption that Eqn (8-4) may be solved for $x$ in the form
\[ x = g^{-1}[h(u)] \tag{8-7} \]
we arrive at the result
\[ \frac{dF(x)}{du} = k\{g^{-1}[h(u)]\}\,f[h(u)]\,\frac{dx}{du}. \tag{8-8} \]
Now by implicit differentiation (Corollary 5-19 (a)) of Eqn (8-4), it follows that provided $g'(x) \neq 0$,
\[ \frac{dx}{du} = \frac{h'(u)}{g'(x)}, \]
so that
\[ \frac{dF(x)}{du} = \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}. \tag{8-9} \]
However, Eqn (8-9) simply asserts that $F(x)$ is an indefinite integral of
\[ \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}} \]
and thus taking the antiderivative yields
\[ \int \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du = F(x) + C^*, \tag{8-10} \]
where $C^*$ is an arbitrary constant.
Comparing Eqns (8-5) and (8-10), and on the understanding that two antiderivatives are equal if they only differ by a constant, we have thus proved that
\[ \int k(x)\,f[g(x)]\,dx = \int \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du. \tag{8-11} \]
This forms the result of the following theorem:

THEOREM 8-3 (integration by substitution) If $g$, $h$ are differentiable functions and $g(x) = h(u)$, with $g'(x) \neq 0$ and $x = g^{-1}[h(u)]$, then
\[ \int k(x)\,f[g(x)]\,dx = \int \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du. \]

Two special cases occur when (a) $k(x) \equiv 1$ and $g(x) \equiv x$, so that $g'(x) = 1$, and (b) $k(x) \equiv 1$ and $h(u) \equiv u$, so that $h'(u) = 1$. These are stated as Corollaries 8-3 (a, b) below, which are the results most often to be found in textbooks.

Corollary 8-3 (a) If $x = h(u)$ is a differentiable function of $u$, then
\[ \int f(x)\,dx = \int f[h(u)]\,h'(u)\,du. \]
In terms of the differential relationship $dh = h'(u)\,du$ this is also capable of expression in the form
\[ \int f(x)\,dx = \int f(h)\,dh. \]

Corollary 8-3 (b) If $g(x) = u$ is a differentiable function of $x$, with $g'(x) \neq 0$, then
\[ \int f[g(x)]\,dx = \int f(u)\left(\frac{dx}{du}\right)du \]
or, as $dx/du = 1/g'[g^{-1}(u)]$,
\[ \int f[g(x)]\,dx = \int \frac{f(u)}{g'[g^{-1}(u)]}\,du. \]

All of these results may be conveniently summarized in the form of a single simple mechanical rule for changing the variable in an antiderivative.

Rule 1 (Integration by substitution) We suppose that in the antiderivative
\[ I = \int k(x)\,f[g(x)]\,dx \]
it is required to change from the variable $x$ to the variable $u$ by means of the relationship $g(x) = h(u)$, where $g$ and $h$ are differentiable functions, with $g'(x) \neq 0$. The result may be deduced from $I$ above by:
(a) replacing $g(x)$ in $f[g(x)]$ by $h(u)$;
(b) solving $g(x) = h(u)$ for $x$ in the form $x = g^{-1}[h(u)]$ and then replacing $x$ in $k(x)$ by this result;
(c) replacing $dx$ by its expression in terms of $du$, obtained from the differential relationship $g'(x)\,dx = h'(u)\,du$;
(d) replacing $x$ in $g'(x)$ by $x = g^{-1}[h(u)]$.

We now illustrate the application of this rule in a series of examples.
Unfortunately, although the rule tells us how to change the variable, it offers us no information on the type of variable change that should be made. That is to say, it does not tell us the functional form of $g$ and $h$. Only experience can help here.

Example 8-1 Evaluate the antiderivative
\[ I = \int x^3\sqrt{1 + x^2}\,dx. \]
Solution This antiderivative is of the most general type contained in Theorem 8-3. First we make the obvious identification $k(x) = x^3$ and then, to remove the square root function which is difficult to manipulate, we shall try setting $1 + x^2 = u^2$. That is to say, in the hope that it will lead to a simpler expression, we make the further identifications $g(x) = 1 + x^2$ and $h(u) = u^2$. The function $f$ in Theorem 8-3 then becomes the square root function, with $\sqrt{1 + x^2} = u$. Rather than solving for $x$, for the moment we shall use the result $x^3 = x \cdot x^2 = x(u^2 - 1)$, when we find $x^3\sqrt{1 + x^2} = xu(u^2 - 1)$. Now $g'(x) = 2x$ and $h'(u) = 2u$, so that the differential relation $g'(x)\,dx = h'(u)\,du$ gives rise to $x\,dx = u\,du$. Hence, in differential form,
\[ x^3\sqrt{1 + x^2}\,dx = u(u^2 - 1)\,x\,dx = u^2(u^2 - 1)\,du, \]
and so by the rule derived from Theorem 8-3,
\[ I = \int x^3\sqrt{1 + x^2}\,dx = \int u^2(u^2 - 1)\,du. \]
The antiderivative on the right-hand side is now straightforward and may be integrated on sight to give
\[ I = \frac{u^5}{5} - \frac{u^3}{3} + C \]
or,
\[ I = \frac{(1 + x^2)^{5/2}}{5} - \frac{(1 + x^2)^{3/2}}{3} + C. \]

Example 8-2 Evaluate the antiderivative
\[ I = \int \sqrt{1 + x^2}\,dx. \]
Solution In this antiderivative $k(x) \equiv 1$, but it is not immediately clear how best to change the variable. It is left to the reader to see why neither of the possible substitutions $u^2 = 1 + x^2$ or $u = 1 + x^2$ brings about any effective simplification. Instead, let us seek to remove the square root by making the substitution $x = \sinh u$, so that the problem becomes analogous to Corollary 8-3 (a). Then $1 + x^2 = 1 + \sinh^2 u = \cosh^2 u$, so that $\sqrt{1 + x^2} = \cosh u$. Next, as $g(x) \equiv x$ and $h(u) = \sinh u$, $g'(x) = 1$, $h'(u) = \cosh u$ and so $dx = \cosh u\,du$. Applying the rule then gives
\[ \sqrt{1 + x^2}\,dx = \cosh u \cdot \cosh u\,du = \cosh^2 u\,du, \]
whence
\[ I = \int \cosh^2 u\,du. \]
Now use the identity $\cosh^2 u = \tfrac{1}{2}(\cosh 2u + 1)$ to give
\[ I = \tfrac{1}{2}\int (\cosh 2u + 1)\,du = \tfrac{1}{4}\sinh 2u + \frac{u}{2} + C. \]
To return to the variable $x$ it is necessary to use the results $u = \operatorname{arcsinh} x$, $\cosh u = \sqrt{1 + x^2}$ together with the identity $\sinh 2u = 2\sinh u \cosh u$ to obtain
\[ I = \tfrac{1}{2}\left[x\sqrt{1 + x^2} + \operatorname{arcsinh} x\right] + C. \]

Example 8-3 Evaluate the antiderivative
\[ I = \int \cos(1 + 3x)\,dx. \]
Solution This antiderivative has $k(x) \equiv 1$, and by setting $1 + 3x = u$ so that $g(x) = 1 + 3x$, $h(u) = u$ it reduces to the situation of Corollary 8-3 (b). Applying the rule we find that $\cos(1 + 3x) = \cos u$ and $3\,dx = du$, whence
\[ I = \int \tfrac{1}{3}\cos u\,du = \tfrac{1}{3}\sin u + C, \]
and thus
\[ I = \tfrac{1}{3}\sin(1 + 3x) + C. \]

Example 8-4 Evaluate the antiderivative
\[ I = \int 2x\sqrt{1 + x^2}\,dx. \]
Solution Setting $u = 1 + x^2$ it follows that $du = 2x\,dx$, so that $2x\sqrt{1 + x^2}\,dx = \sqrt{u}\,du$, whence
\[ I = \int \sqrt{u}\,du = \tfrac{2}{3}u^{3/2} + C = \tfrac{2}{3}(1 + x^2)^{3/2} + C. \]

It is interesting to notice that when the situation found in Example 8-4 is expressed in terms of Theorem 8-3 by making the identification $k(x) = g'(x)$ and then setting $u = g(x)$ it gives rise to the general result
\[ \int g'(x)\,f[g(x)]\,dx = \int f(u)\,du. \tag{8-12} \]
This is not, of course, a new result since it is no more than the statement of Corollary 8-3 (a) with the roles of $x$ and $u$ interchanged.

It is an immediate consequence of Eqn (8-3) that Theorem 8-3, together with its corollaries, also applies to definite integrals provided that the limits are also transformed by the same transformation law.
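The remark about transforming the limits can be illustrated numerically with the integrand of Example 8-4. The sketch below is an illustration under stated assumptions: the limits $0$ and $1$ and the composite Simpson helper `simpson` are hypothetical choices, not taken from the text. It compares the integral in $x$ with the integral in $u = 1 + x^2$ taken between the transformed limits.

```python
import math

def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def g(x):
    # u = g(x), as in Example 8-4
    return 1.0 + x * x

a, b = 0.0, 1.0  # illustrative limits in x (an assumption)

# Integral in x between a and b.
lhs = simpson(lambda x: 2.0 * x * math.sqrt(g(x)), a, b)
# Integral in u between the transformed limits g(a) and g(b).
rhs = simpson(math.sqrt, g(a), g(b))
# Value from the antiderivative (2/3) u^(3/2) found in Example 8-4.
exact = (2.0 / 3.0) * (g(b) ** 1.5 - g(a) ** 1.5)

print(lhs, rhs, exact)
```

The three printed numbers should agree closely; leaving the limits as $0$ and $1$ in the $u$-integral would give a different value.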
The restatement of Theorem 8-3 in terms of definite integrals is as follows:

THEOREM 8-4 (integration of definite integrals by substitution) If $g$, $h$ are differentiable functions and $g(x) = h(u)$, with $g'(x) \neq 0$ and $x = g^{-1}[h(u)]$, $u = h^{-1}[g(x)]$, then
\[ \int_a^b k(x)\,f[g(x)]\,dx = \int_{h^{-1}[g(a)]}^{h^{-1}[g(b)]} \frac{k\{g^{-1}[h(u)]\}\,f[h(u)]\,h'(u)}{g'\{g^{-1}[h(u)]\}}\,du. \]

One specially simple case of this theorem merits recording in the form of a corollary. It is the result corresponding to Eqn (8-12) and is obtained by making the identifications $k(x) = g'(x)$, $u = g(x)$.

Corollary 8-4 If $u = g(x)$ is a differentiable function, then
\[ \int_a^b f[g(x)]\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du. \]

When expressed in the form of a mechanical rule, Theorem 8-4 is as straightforward to apply as was our previous rule.

Rule 2 (Integrating definite integrals by substitution) We suppose that in the definite integral
\[ I = \int_a^b k(x)\,f[g(x)]\,dx \]
it is required to change from the variable $x$ to the variable $u$ by means of the relationship $g(x) = h(u)$, where $g$ and $h$ are differentiable functions, with $g'(x) \neq 0$. The result may be deduced from $I$ above by:
(a) transforming the differential expression $k(x)\,f[g(x)]\,dx$ as indicated in Rule 1;
(b) solving $g(x) = h(u)$ for $u$ in the form $u = h^{-1}[g(x)]$ and replacing the upper limit $b$ by $h^{-1}[g(b)]$ and the lower limit $a$ by $h^{-1}[g(a)]$.

Example 8-5 Evaluate the definite integral
\[ I = \int_0^1 x^2\sqrt{1 - x^2}\,dx. \]
Solution Let us make the substitution $x = \sin u$, so that $dx = \cos u\,du$, when
\[ x^2\sqrt{1 - x^2}\,dx = \sin^2 u \cdot \cos u \cdot \cos u\,du = \sin^2 u \cos^2 u\,du. \]
Then, as $u = \arcsin x$, using the principal branch of the inverse sine function, we find from Rule 2 that
\[ \int_0^1 x^2\sqrt{1 - x^2}\,dx = \int_{\arcsin 0}^{\arcsin 1} \sin^2 u \cos^2 u\,du = \int_0^{\pi/2} \sin^2 u \cos^2 u\,du. \]
To evaluate this last definite integral we use a technique from Chapter 6 which is often helpful. From Definition 6-6 we may write
\[ \sin^2 u \cos^2 u = \left(\frac{e^{iu} - e^{-iu}}{2i}\right)^2 \left(\frac{e^{iu} + e^{-iu}}{2}\right)^2 = \left(\frac{e^{2iu} - 2 + e^{-2iu}}{-4}\right)\left(\frac{e^{2iu} + 2 + e^{-2iu}}{4}\right) = -\frac{1}{16}\left(e^{4iu} - 2 + e^{-4iu}\right), \]
and thus $\sin^2 u \cos^2 u = \tfrac{1}{8}(1 - \cos 4u)$. Using this result in the definite integral, which may then be evaluated on sight, we finally obtain
\[ I = \frac{1}{8}\int_0^{\pi/2} (1 - \cos 4u)\,du = \frac{1}{8}\left[u - \tfrac{1}{4}\sin 4u\right]_0^{\pi/2} = \frac{\pi}{16}, \]
and so
\[ \int_0^1 x^2\sqrt{1 - x^2}\,dx = \frac{\pi}{16}. \]

Example 8-6 Evaluate the definite integral
\[ I = \int_0^1 (2x + 5)\cosh(x^2 + 5x + 1)\,dx. \]
Solution Inspection shows that this example is of the form of Corollary 8-4, with the function $f \equiv \cosh$ and $g(x) = x^2 + 5x + 1$. As $g(0) = 1$, $g(1) = 7$, by setting $u = g(x)$ we at once obtain
\[ I = \int_1^7 \cosh u\,du = \sinh 7 - \sinh 1. \]

8-3 Integration by parts

This most valuable technique is based on Theorem 5-5, concerning the derivative of the product of two functions. That theorem asserts that if $f$, $g$ are two differentiable functions of $x$, then
\[ \frac{d}{dx}[f(x)g(x)] = f(x)g'(x) + f'(x)g(x). \]
Taking the antiderivative of this result gives
\[ f(x)g(x) = \int f(x)g'(x)\,dx + \int g(x)f'(x)\,dx \]
which, on rearrangement, becomes
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx. \tag{8-13} \]
This is one form of the required result. Using the differential notation $df = f'(x)\,dx$, $dg = g'(x)\,dx$ enables this to be contracted to the equivalent and easily remembered alternative form
\[ \int f\,dg = fg - \int g\,df. \tag{8-14} \]
These results are now formulated as our next theorem:

THEOREM 8-5 (integration by parts) If $f$, $g$ are differentiable functions of $x$, then
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx \]
or, expressed in differential notation,
\[ \int f\,dg = fg - \int g\,df. \]

This useful theorem is the nearest possible approach to a general theorem for finding the antiderivative of the product of two functions. It depends on the fact that often the antiderivative $\int g\,df$ is easier to determine than the antiderivative $\int f\,dg$. Naturally, the technique of integration by substitution can also be employed when evaluating $\int g\,df$.

When definite integrals are involved it is not difficult to see that the result is still valid provided the limits are also applied to the product $fg$.
The general result is as follows:

THEOREM 8-6 (integration by parts: definite integral) If $f$, $g$ are differentiable functions of $x$ in $[a, b]$, then
\[ \int_a^b f(x)g'(x)\,dx = \Big[f(x)g(x)\Big]_a^b - \int_a^b g(x)f'(x)\,dx = [f(b)g(b)] - [f(a)g(a)] - \int_a^b g(x)f'(x)\,dx. \]

As before, we illustrate both of these theorems by means of a series of examples. These have been carefully chosen to demonstrate a variety of situations in which integration by parts is useful.

Example 8-7 Evaluate the antiderivative
\[ I = \int x^k \log x\,dx \quad \text{for } x > 0,\ k \neq -1. \]
Solution The problem here, as with all applications of the technique of integration by parts, is to decide upon the functions $f$ and $g$. A little experimentation will soon convince the reader that $I$ will only simplify if we set $f(x) = \log x$ and $g(x) = x^{k+1}/(k + 1)$, for then $g'(x) = x^k$ and $f'(x) = 1/x$. Accordingly we write $I$ in the form
\[ I = \int \log x\; d\!\left[\frac{x^{k+1}}{k+1}\right]. \]
Applying Theorem 8-5 gives
\[ I = \frac{x^{k+1}\log x}{k+1} - \int \frac{x^{k+1}}{k+1}\cdot\frac{1}{x}\,dx = \frac{x^{k+1}\log x}{k+1} - \frac{x^{k+1}}{(k+1)^2} + C. \]

Example 8-8 Evaluate the definite integral
\[ I = \int_0^{1/2} \arcsin x\,dx. \]
Solution This time we make the identifications $f(x) = \arcsin x$ and $g(x) = x$ and write
\[ \int_0^{1/2} \arcsin x\;d[x] = \Big[x\arcsin x\Big]_0^{1/2} - \int_0^{1/2} \frac{x\,dx}{\sqrt{1 - x^2}}. \tag{A} \]
We have
\[ \Big[x\arcsin x\Big]_0^{1/2} = \frac{1}{2}\cdot\frac{\pi}{6} = \frac{\pi}{12}, \]
but the definite integral on the right-hand side is still not recognizable. To simplify it let us now set $u = 1 - x^2$ so that $x\,dx = -\tfrac{1}{2}\,du$; using Theorem 8-4 we obtain
\[ \int_0^{1/2} \frac{x\,dx}{\sqrt{1 - x^2}} = -\frac{1}{2}\int_1^{3/4} \frac{du}{\sqrt{u}} = -\Big[u^{1/2}\Big]_1^{3/4} = 1 - \frac{\sqrt{3}}{2}. \]
Combining this result with (A) gives
\[ \int_0^{1/2} \arcsin x\,dx = \frac{\pi}{12} + \frac{\sqrt{3}}{2} - 1. \]

Example 8-9 Evaluate the antiderivative
\[ I = \int e^{ax}\sin bx\,dx. \]
Solution This time we choose to make the identification $f(x) = \sin bx$, $g(x) = (1/a)e^{ax}$ and to write $I$ in the form
\[ I = \int \sin bx\; d\!\left[\frac{1}{a}e^{ax}\right]. \]
Integrating by parts we find
\[ \int e^{ax}\sin bx\,dx = \frac{1}{a}e^{ax}\sin bx - \frac{b}{a}\int e^{ax}\cos bx\,dx. \]
Now let us use this same device on the second term above to obtain
\[ \int e^{ax}\sin bx\,dx = \frac{1}{a}e^{ax}\sin bx - \frac{b}{a}\int \cos bx\; d\!\left[\frac{1}{a}e^{ax}\right] = \frac{1}{a}e^{ax}\sin bx - \frac{b}{a^2}e^{ax}\cos bx - \frac{b^2}{a^2}\int e^{ax}\sin bx\,dx + C. \]
Combining terms gives
\[ \left(1 + \frac{b^2}{a^2}\right)\int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2} + C, \]
and so
\[ \int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2 + b^2} + C^*, \]
where $C^*$ is related to $C$ by $C^* = a^2C/(a^2 + b^2)$. In fact there is no necessity to distinguish between $C$ and $C^*$, since as $C$ was an arbitrary constant of integration, $C^*$ is also an arbitrary constant. For this reason it is not customary to redefine arbitrary constants when, as above, they are simply multiplied by a constant factor.

8-4 Reduction formulae

It not infrequently happens that an antiderivative $I$ involving a parameter $m$ may be reduced by means of the technique of integration by parts to an expression in which the parameter has a value differing by an integer $k$ from its original value. If we denote such an antiderivative by $I_m$, then a typical situation is the one in which we arrive at an expression of the form
\[ I_m = A(m) + I_{m-1}, \tag{8-15} \]
where $A(m)$ is some known function. Expressions of this form provide an algorithm for the computation of any antiderivative of the given type once one of them is known, for the $I_m$ are then defined recursively by this relation in terms of $I_1$, say. It is customary to refer to expressions of the general form of Eqn (8-15) as reduction formulae. The same idea is equally applicable, without essential modification, to definite integrals.

Example 8-10 Determine the reduction formula for
\[ I_m = \int \cos^m\theta\,d\theta. \]
Use the result to determine $I_7$.
Solution We rewrite $I_m$ as follows and use integration by parts.
\[ \begin{aligned} I_m &= \int \cos^{m-1}\theta\;d(\sin\theta) \\ &= \cos^{m-1}\theta\sin\theta - \int \sin\theta\,(m-1)\cos^{m-2}\theta\,(-\sin\theta)\,d\theta \\ &= \cos^{m-1}\theta\sin\theta + (m-1)\int \cos^{m-2}\theta\sin^2\theta\,d\theta \\ &= \cos^{m-1}\theta\sin\theta + (m-1)\int \cos^{m-2}\theta\,(1 - \cos^2\theta)\,d\theta \\ &= \cos^{m-1}\theta\sin\theta + (m-1)\int \cos^{m-2}\theta\,d\theta - (m-1)\int \cos^m\theta\,d\theta. \end{aligned} \]
Recalling the definition of $I_m$ we discover that this may be re-expressed in terms of $I_m$ and $I_{m-2}$ as
\[ I_m = \cos^{m-1}\theta\sin\theta + (m-1)I_{m-2} - (m-1)I_m, \]
whence we arrive at the required reduction formula
\[ I_m = \frac{\cos^{m-1}\theta\sin\theta}{m} + \left(\frac{m-1}{m}\right)I_{m-2}. \]
Setting $m = 7$ gives
\[ \begin{aligned} I_7 &= \frac{\cos^6\theta\sin\theta}{7} + \frac{6}{7}I_5 \\ &= \frac{\cos^6\theta\sin\theta}{7} + \frac{6}{7}\left(\frac{\cos^4\theta\sin\theta}{5} + \frac{4}{5}I_3\right) \\ &= \frac{\cos^6\theta\sin\theta}{7} + \frac{6}{35}\cos^4\theta\sin\theta + \frac{24}{35}\left(\frac{\cos^2\theta\sin\theta}{3} + \frac{2}{3}I_1\right). \end{aligned} \]
As $I_1 = \int \cos\theta\,d\theta = \sin\theta + C$ this gives the result
\[ \int \cos^7\theta\,d\theta = \frac{1}{7}\cos^6\theta\sin\theta + \frac{6}{35}\cos^4\theta\sin\theta + \frac{8}{35}\cos^2\theta\sin\theta + \frac{16}{35}\sin\theta + C. \]

Example 8-11 Evaluate the definite integral
\[ J_m = \int_0^{\pi/2} \cos^m\theta\,d\theta \]
and show that it is equal to $\int_0^{\pi/2} \sin^m\theta\,d\theta$.
Solution We can make use of the reduction formula determined in the previous example. It follows from
\[ I_m = \frac{\cos^{m-1}\theta\sin\theta}{m} + \left(\frac{m-1}{m}\right)I_{m-2} \]
that the definite integral $J_m$ obeys the reduction formula
\[ J_m = \left[\frac{\cos^{m-1}\theta\sin\theta}{m}\right]_0^{\pi/2} + \left(\frac{m-1}{m}\right)J_{m-2} = \left(\frac{m-1}{m}\right)J_{m-2}. \]
We must now consider separately even and odd values of $m$. Firstly, if $m$ is even, so that we may write $m = 2n$, then
\[ J_{2n} = \frac{2n-1}{2n}\cdot\frac{2n-3}{2n-2}\cdots\frac{1}{2}\,J_0. \]
Secondly, if $m$ is odd, so that we may write $m = 2n + 1$, then
\[ J_{2n+1} = \frac{2n}{2n+1}\cdot\frac{2n-2}{2n-1}\cdots\frac{2}{3}\,J_1. \]
As $J_0 = \int_0^{\pi/2} d\theta = \tfrac{1}{2}\pi$ and $J_1 = \int_0^{\pi/2} \cos\theta\,d\theta = 1$, we obtain:
\[ J_{2n} = \frac{1\cdot 3\cdot 5\cdots(2n-1)}{2\cdot 4\cdot 6\cdots 2n}\cdot\frac{\pi}{2}, \qquad J_{2n+1} = \frac{2\cdot 4\cdot 6\cdots 2n}{3\cdot 5\cdot 7\cdots(2n+1)}. \]
Finally let us prove that
\[ J_m = \int_0^{\pi/2} \cos^m x\,dx = \int_0^{\pi/2} \sin^m x\,dx. \]
To achieve this make the variable change $x = \tfrac{1}{2}\pi - u$ in $J_m$ to obtain
\[ \int_0^{\pi/2} \cos^m x\,dx = -\int_{\pi/2}^{0} \cos^m(\tfrac{1}{2}\pi - u)\,du = \int_0^{\pi/2} \cos^m(\tfrac{1}{2}\pi - u)\,du = \int_0^{\pi/2} \sin^m u\,du. \]

This last result is of some interest historically, as it provided the first infinite product representation for $\pi$. One form of the argument used to derive this result proceeds as follows. It is readily seen from the expressions for $J_{2n}$ and $J_{2n+1}$ that
\[ \frac{\pi}{2} = \left[\frac{2\cdot 4\cdot 6\cdots 2n}{3\cdot 5\cdots(2n-1)}\right]^2 \frac{1}{2n+1}\cdot\frac{J_{2n}}{J_{2n+1}}. \tag{8-16} \]
Now in the interval $(0, \tfrac{1}{2}\pi)$ the following inequalities hold:
\[ \sin^{2n-1}x > \sin^{2n}x > \sin^{2n+1}x > 0, \]
so that as
\[ J_m = \int_0^{\pi/2} \sin^m x\,dx, \]
it follows at once that
\[ J_{2n-1} > J_{2n} > J_{2n+1} > 0. \]
This is equivalent to
\[ \frac{J_{2n-1}}{J_{2n+1}} > \frac{J_{2n}}{J_{2n+1}} > 1, \tag{8-17} \]
but as
\[ \frac{J_{2n-1}}{J_{2n+1}} = \frac{2n+1}{2n} \]
we must have
\[ \lim_{n\to\infty} \frac{J_{2n-1}}{J_{2n+1}} = 1. \tag{8-18} \]
By virtue of Eqns (8-17) and (8-18) it also follows that
\[ \lim_{n\to\infty} \frac{J_{2n}}{J_{2n+1}} = 1. \]
So, taking the limit of Eqn (8-16) as $n \to \infty$, we arrive at the expression
\[ \frac{\pi}{2} = \lim_{n\to\infty}\left(\frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdot\frac{6}{5}\cdots\frac{2n-2}{2n-1}\cdot\frac{2n}{2n-1}\cdot\frac{2n}{2n+1}\right). \tag{8-19} \]
This famous result, called an infinite product, was first obtained by the 17th-century mathematician John Wallis. If $S_n$ denotes the $n$th partial product
\[ S_n = \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n-2}{2n-1}\cdot\frac{2n}{2n-1}\cdot\frac{2n}{2n+1}, \]
then the limit in Eqn (8-19) is to be interpreted to mean that $|\tfrac{1}{2}\pi - S_n| \to 0$ as $n \to \infty$.

Reduction formulae may involve more than one parameter, as the final example illustrates.

Example 8-12 Show that
\[ I_{m,n} = \int \sin^m x\cos^n x\,dx \]
satisfies the reduction formula
\[ (m + n)I_{m,n} = -\sin^{m-1}x\cos^{n+1}x + (m - 1)I_{m-2,n}. \]
Solution Write $I_{m,n}$ in the form shown below and integrate by parts.
\[ \begin{aligned} I_{m,n} &= \int \sin^{m-1}x\cos^n x\;d(-\cos x) \\ &= -\sin^{m-1}x\cos^{n+1}x - \int (-\cos x)\left[(m-1)\sin^{m-2}x\cos^{n+1}x - n\sin^m x\cos^{n-1}x\right]dx \\ &= -\sin^{m-1}x\cos^{n+1}x + (m-1)I_{m-2,n+2} - nI_{m,n}. \end{aligned} \]
Next reduce $I_{m-2,n+2}$ to a simpler form by writing
\[ I_{m-2,n+2} = \int \sin^{m-2}x\cos^{n+2}x\,dx = \int \sin^{m-2}x\cos^n x\,(1 - \sin^2 x)\,dx, \]
which shows that
\[ I_{m-2,n+2} = I_{m-2,n} - I_{m,n}. \]
Using this to eliminate $I_{m-2,n+2}$ from the previous result gives
\[ I_{m,n} = -\sin^{m-1}x\cos^{n+1}x + (m-1)I_{m-2,n} - (m-1)I_{m,n} - nI_{m,n} \]
or,
\[ (m + n)I_{m,n} = -\sin^{m-1}x\cos^{n+1}x + (m-1)I_{m-2,n}. \]

8-5 Integration of rational functions: partial fractions

It will be recalled from Chapter 2 that a rational fraction is a quotient $N(x)/D(x)$, in which $N(x)$ and $D(x)$ are polynomials.
Antiderivatives of rational fractions are often required and in this section we indicate ways of expressing the fractions as the sum of simpler expressions, the antiderivatives of which are either known or may be found by standard methods. Our approach to the general problem of finding the antiderivative
\[ \int \frac{N(x)}{D(x)}\,dx \]
will be to first consider some important special cases.

Case (a) Suppose that $N(x)$ is of degree 0 and $D(x)$ is a polynomial of degree 1 and write
\[ \frac{N(x)}{D(x)} = \frac{1}{cx + d}. \]
Then, making the substitution $u = cx + d$, we find
\[ \int \frac{dx}{cx + d} = \frac{1}{c}\int \frac{du}{u} = \frac{1}{c}\log|u| + C \]
and so
\[ \int \frac{dx}{cx + d} = \frac{1}{c}\log|cx + d| + C. \]
A similar argument establishes that, for $n > 1$,
\[ \int \frac{dx}{(cx + d)^n} = \frac{-1}{c(n - 1)}\cdot\frac{1}{(cx + d)^{n-1}} + C. \]

Case (b) Suppose $N(x)$ is of degree 0 and $D(x)$ is of degree 2 and write
\[ \frac{N(x)}{D(x)} = \frac{1}{ax^2 + bx + c}. \]
Then completing the square in the denominator $D(x)$ gives
\[ ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 + \left(\frac{c}{a} - \frac{b^2}{4a^2}\right)\right] = a\left[\left(x + \frac{b}{2a}\right)^2 + \alpha\right], \]
where $\alpha = (c/a) - (b^2/4a^2)$ may be positive, negative, or zero. Making the variable change $u = x + (b/2a)$ then shows that
\[ \int \frac{dx}{ax^2 + bx + c} = \frac{1}{a}\int \frac{du}{u^2 + \alpha}. \]
This is a standard integral which may be identified from Table 8-1 once the sign of $\alpha$ has been determined. It will involve either the function arctan or the function arctanh.

Case (c) Suppose $N(x)$ is of degree 1 and $D(x)$ is of degree 2 and write
\[ \frac{N(x)}{D(x)} = \frac{px + q}{ax^2 + bx + c}. \]
Then we can write
\[ I = \int \frac{px + q}{ax^2 + bx + c}\,dx = \int \frac{(p/2a)(2ax + b) + \{q - (pb/2a)\}}{ax^2 + bx + c}\,dx, \]
from which we find
\[ I = \frac{p}{2a}\int \frac{2ax + b}{ax^2 + bx + c}\,dx + \left(\frac{2aq - pb}{2a}\right)\int \frac{dx}{ax^2 + bx + c}. \]
The second antiderivative is the one discussed in (b) above, and by setting $u = ax^2 + bx + c$, the first antiderivative reduces to
\[ \int \frac{2ax + b}{ax^2 + bx + c}\,dx = \int \frac{du}{u} = \log|u| + C = \log|ax^2 + bx + c| + C. \]
Combining this result with that of Case (b) then leads to the desired antiderivative $I$.
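Cases (b) and (c) combine into a short computation. The following Python sketch is an illustration under stated assumptions, not a general implementation: it covers only $\alpha > 0$ (the arctan branch of Case (b)), and the function names, coefficient values and limits are hypothetical. It evaluates the closed form and compares it with a Simpson-rule estimate of the same definite integral.

```python
import math

def simpson(f, lo, hi, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def antiderivative_case_c(p, q, a, b, c, x):
    """Antiderivative of (p*x + q)/(a*x**2 + b*x + c) for alpha > 0 (constant omitted)."""
    alpha = c / a - b * b / (4.0 * a * a)
    if alpha <= 0.0:
        raise ValueError("this sketch covers only alpha > 0 (the arctan branch)")
    u = x + b / (2.0 * a)
    # First term of Case (c): (p/2a) log|a x^2 + b x + c|.
    log_part = (p / (2.0 * a)) * math.log(abs(a * x * x + b * x + c))
    # Case (b): (1/a) * integral of du/(u^2 + alpha), entry 8 of Table 8-1.
    case_b = math.atan(u / math.sqrt(alpha)) / (a * math.sqrt(alpha))
    return log_part + ((2.0 * a * q - p * b) / (2.0 * a)) * case_b

# Hypothetical coefficients chosen so that alpha = c/a - b^2/(4 a^2) > 0.
p, q, a, b, c = 3.0, 1.0, 2.0, 1.0, 4.0
lo, hi = 0.0, 2.0

closed_form = (antiderivative_case_c(p, q, a, b, c, hi)
               - antiderivative_case_c(p, q, a, b, c, lo))
numeric = simpson(lambda x: (p * x + q) / (a * x * x + b * x + c), lo, hi)
print(closed_form, numeric)
```

For $\alpha < 0$ the same structure applies with the arctanh or arccoth entry of Table 8-1 in place of the arctan term.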
Case (d) Suppose $N(x)$ is of degree 1 and $D(x)$ is a quadratic raised to the power $n > 1$ and write
\[ \frac{N(x)}{D(x)} = \frac{px + q}{(ax^2 + bx + c)^n}. \]
Then, using the identity
\[ px + q = \left(\frac{p}{2a}\right)(2ax + b) + \left(q - \frac{pb}{2a}\right) \]
enables us to write
\[ \int \frac{px + q}{(ax^2 + bx + c)^n}\,dx = \left(\frac{p}{2a}\right)\int \frac{2ax + b}{(ax^2 + bx + c)^n}\,dx + \left(q - \frac{pb}{2a}\right)\int \frac{dx}{(ax^2 + bx + c)^n}. \]
Setting $u = ax^2 + bx + c$ in the first antiderivative on the right-hand side then leads to
\[ \int \frac{2ax + b}{(ax^2 + bx + c)^n}\,dx = \int \frac{du}{u^n} = \left(\frac{-1}{n - 1}\right)\frac{1}{u^{n-1}} + C = \left(\frac{-1}{n - 1}\right)\frac{1}{(ax^2 + bx + c)^{n-1}} + C. \]
The second antiderivative on the right-hand side must be evaluated by means of a reduction formula. In the case $n = 1$ we have the obvious result
\[ \int \frac{2ax + b}{ax^2 + bx + c}\,dx = \log|ax^2 + bx + c| + C. \]

Having considered a number of special cases we must now examine how we should proceed when $D(x)$ is any polynomial with real coefficients, and the degree of the polynomial $N(x)$ is less than that of $D(x)$. The coefficient $a_0$ of the highest power of $x$ in $D(x)$ will be assumed to be unity, since if this is not the case it can always be made so by division of $N(x)$ and $D(x)$ by $a_0$. Now we know from Corollary 4-1 (b) that $D(x)$ may be factorized into real factors of the form
\[ D(x) = (x - a)^k (x - b)^l \cdots (x^2 + px + q)^m, \tag{8-22} \]
where $x = a, b, \ldots$, are real roots with multiplicities $k, l, \ldots$, and $(x^2 + px + q)^m$ represents an $m$-fold repeated pair of complex conjugate roots. Then from elementary algebraic considerations it may be shown that when the degree of $N(x)$ is less than that of $D(x)$ we may always set
\[ \begin{aligned} \frac{N(x)}{D(x)} ={}& \frac{A_1}{(x - a)} + \frac{A_2}{(x - a)^2} + \cdots + \frac{A_k}{(x - a)^k} + \frac{B_1}{(x - b)} + \frac{B_2}{(x - b)^2} + \cdots + \frac{B_l}{(x - b)^l} + \cdots \\ &+ \frac{P_1 x + Q_1}{(x^2 + px + q)} + \frac{P_2 x + Q_2}{(x^2 + px + q)^2} + \cdots + \frac{P_m x + Q_m}{(x^2 + px + q)^m}. \end{aligned} \tag{8-23} \]
That is to say, every rational fraction may be expressed as a sum of simple fractions of the types whose antiderivatives were obtained in Cases (a) to (d).
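As a small illustration of an expansion of the form (8-23), the Python sketch below determines the constants in
$x/[(x + 2)^2(x - 1)] = A/(x + 2) + B/(x + 2)^2 + C/(x - 1)$, the fraction treated in Example 8-14. It is only a sketch under stated assumptions: instead of equating coefficients of powers of $x$, it evaluates the cross-multiplied identity at three sample points and solves the resulting linear system exactly. The helper `solve_linear` and the sample points are hypothetical choices, not part of the text.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact Fraction arithmetic."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)]
            for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def identity_row(x):
    # Cross-multiplied identity: x = A(x+2)(x-1) + B(x-1) + C(x+2)^2.
    return [(x + 2) * (x - 1), (x - 1), (x + 2) ** 2]

points = [0, 2, 3]  # any three distinct sample values will do
A, B, C = solve_linear([identity_row(x) for x in points], points)
print(A, B, C)
```

Exact rational arithmetic is used so that the constants come out as fractions rather than rounded decimals.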
The expression on the right-hand side of Eqn (8-23) is called a partial fraction expansion of the rational fraction $N(x)/D(x)$ and the coefficients $A_1, A_2, \ldots, P_m, Q_m$ are called undetermined coefficients. The undetermined coefficients may be found by cross-multiplication of this expression, followed by equating the coefficients of equal powers of $x$. Antiderivatives of rational fractions $N(x)/D(x)$ may thus be found by a combination of the method of partial fractions and the results of Cases (a) to (d).

If the degree of $N(x)$ exceeds that of $D(x)$ by $n$, then the situation may be reduced to the one just described by simply adding to the partial fraction expansion (8-23) the extra terms
\[ R_0 + R_1 x + R_2 x^2 + \cdots + R_n x^n. \]
This result can also be achieved by first dividing $N(x)$ by $D(x)$. The circumstances usually dictate which approach is the easier.

Example 8-13 Evaluate
\[ I = \int \left(\frac{x^3 + 5x^2 + 9x + 5}{x^2 + 3x + 1}\right)dx. \]
Solution Here, as the degree of $N(x)$ only exceeds that of $D(x)$ by one, we shall start by dividing the integrand to get
\[ \frac{x^3 + 5x^2 + 9x + 5}{x^2 + 3x + 1} = x + 2 + \frac{2x + 3}{x^2 + 3x + 1}, \]
when
\[ I = \int (x + 2)\,dx + \int \frac{2x + 3}{x^2 + 3x + 1}\,dx. \]
The first antiderivative is trivial, whilst the second is of the form discussed in Case (d) (with $n = 1$), so that
\[ I = \frac{x^2}{2} + 2x + \log|x^2 + 3x + 1| + C. \]

Example 8-14 Evaluate
\[ I = \int \frac{x\,dx}{(x + 2)^2(x - 1)}. \]
Solution In this case we must adopt the partial fraction expansion
\[ \frac{x}{(x + 2)^2(x - 1)} = \frac{A}{x + 2} + \frac{B}{(x + 2)^2} + \frac{C}{x - 1}. \]
Cross-multiplication gives
\[ x = A(x + 2)(x - 1) + B(x - 1) + C(x + 2)^2 \]
or
\[ x = A(x^2 + x - 2) + B(x - 1) + C(x^2 + 4x + 4). \]
Equating coefficients of equal powers of $x$ gives:
Coefficient of $x^2$: $0 = A + C$
Coefficient of $x$: $1 = A + B + 4C$
Coefficient of $x^0$: $0 = -2A - B + 4C$,
showing that $A = -1/9$, $B = 2/3$, and $C = 1/9$. We may thus write
\[ \int \frac{x\,dx}{(x + 2)^2(x - 1)} = -\frac{1}{9}\int \frac{dx}{x + 2} + \frac{2}{3}\int \frac{dx}{(x + 2)^2} + \frac{1}{9}\int \frac{dx}{x - 1}. \]
These antiderivatives were all discussed in Case (a), so that using those results we obtain
\[ I = -\frac{1}{9}\log|x + 2| - \frac{2}{3}\cdot\frac{1}{x + 2} + \frac{1}{9}\log|x - 1| + C. \]

Example 8-15 Find the antiderivative
\[ I = \int \frac{x^4 - x^3 + 5x^2 + x + 3}{(x + 1)(x^2 - x + 1)^2}\,dx. \]
Solution Here $N(x) = x^4 - x^3 + 5x^2 + x + 3$ and $D(x) = (x + 1)(x^2 - x + 1)^2$, so that the degree of $N(x)$ is 4 and the degree of $D(x)$ is 5. Following on from our earlier reasoning we must set
\[ \frac{x^4 - x^3 + 5x^2 + x + 3}{(x + 1)(x^2 - x + 1)^2} = \frac{A}{x + 1} + \frac{Bx + C}{x^2 - x + 1} + \frac{Dx + E}{(x^2 - x + 1)^2}. \]
Cross-multiplication gives the identity
\[ x^4 - x^3 + 5x^2 + x + 3 = A(x^2 - x + 1)^2 + (Bx + C)(x + 1)(x^2 - x + 1) + (Dx + E)(x + 1). \]
Instead of expanding the right-hand side and then equating coefficients of equal powers of $x$ as in the previous example, we shall use the fact that $(x + 1)$ is a factor of $D(x)$ to simplify this expression. Setting $x = -1$ we find that $9 = 9A$, or $A = 1$, and so
\[ x^4 - x^3 + 5x^2 + x + 3 = (x^2 - x + 1)^2 + (Bx + C)(x^3 + 1) + (Dx + E)(x + 1), \]
whence
\[ x^3 + 2x^2 + 3x + 2 = (Bx + C)(x^3 + 1) + (Dx + E)(x + 1). \]
Having eliminated $A$ we now proceed as before and equate coefficients of equal powers of $x$ to find $B$, $C$, $D$, and $E$:
Coefficient of $x^4$: $0 = B$
Coefficient of $x^3$: $1 = C$
Coefficient of $x^2$: $2 = D$
Coefficient of $x$: $3 = B + E + D$
Coefficient of $x^0$: $2 = C + E$.
Thus, $B = 0$, $C = 1$, $D = 2$, $E = 1$ and so
\[ I = \int \frac{dx}{x + 1} + \int \frac{dx}{x^2 - x + 1} + \int \frac{2x + 1}{(x^2 - x + 1)^2}\,dx = I_1 + I_2 + I_3. \]
Now
\[ I_1 = \int \frac{dx}{x + 1} = \log|x + 1| + C_1 \]
and
\[ I_2 = \int \frac{dx}{(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2} = \frac{2}{\sqrt{3}}\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + C_2. \]
To evaluate $I_3$ write
\[ I_3 = \int \frac{2x - 1}{(x^2 - x + 1)^2}\,dx + \int \frac{2\,dx}{(x^2 - x + 1)^2} = \frac{-1}{(x^2 - x + 1)} + \int \frac{2\,dx}{[(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2]^2}. \]
Next, setting $x - \tfrac{1}{2} = (\sqrt{3}/2)\tan\theta$, so that $dx = (\sqrt{3}/2)\sec^2\theta\,d\theta$, gives
\[ \int \frac{2\,dx}{[(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2]^2} = \int \frac{\sqrt{3}\sec^2\theta\,d\theta}{(\tfrac{3}{4}\sec^2\theta)^2} = \frac{16\sqrt{3}}{9}\int \cos^2\theta\,d\theta. \]
Using the identity $\cos^2\theta = \tfrac{1}{2}(1 + \cos 2\theta)$ this may be evaluated to give
\[ \int \frac{2\,dx}{[(x - \tfrac{1}{2})^2 + (\sqrt{3}/2)^2]^2} = \frac{8\sqrt{3}}{9}\left[\theta + \tfrac{1}{2}\sin 2\theta\right] + C_3 = \frac{8\sqrt{3}}{9}\left\{\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + \frac{\sqrt{3}}{4}\left(\frac{2x - 1}{x^2 - x + 1}\right)\right\} + C_3. \]
Hence we have shown that
\[ I_3 = \frac{-1}{(x^2 - x + 1)} + \frac{8\sqrt{3}}{9}\left\{\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + \frac{\sqrt{3}}{4}\left(\frac{2x - 1}{x^2 - x + 1}\right)\right\} + C_3. \]
Adding $I_1$, $I_2$, and $I_3$ to find $I$ finally gives
\[ I = \log|x + 1| + \frac{14\sqrt{3}}{9}\arctan\left(\frac{2x - 1}{\sqrt{3}}\right) + \frac{4x - 5}{3(x^2 - x + 1)} + C. \]
A factor $(x^2 - x + 1)^3$ in the denominator would have led to $\int \cos^4\theta\,d\theta$ and so, in general, we would obtain antiderivatives of the form $\int \cos^{2n}\theta\,d\theta$.

8-6 Other special techniques of integration

A great variety of different methods exist for evaluating particular types of antiderivative, and in this final section we illustrate only a few specially useful ones with the help of some examples. Extensive tables of integrals are readily available and, where possible, should be used to minimise tedious manipulation.

8-6 (a) Substitution t = tan x/2

If we write $t = \tan x/2$ it is easily proved by means of trigonometric identities that
\[ \sin x = \frac{2t}{1 + t^2} \quad\text{and}\quad \cos x = \frac{1 - t^2}{1 + t^2}. \tag{8-24} \]
Using these results we can also establish the differential relation
\[ dx = \frac{2\,dt}{1 + t^2}. \tag{8-25} \]
Consequently, in principle, any rational fraction $R(\sin x, \cos x)$ that involves only the sine and cosine functions may be transformed by means of (8-24) into a rational fraction involving $t$.
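Relations (8-24) and (8-25) are easy to check numerically before relying on them. The Python sketch below is an illustration only: the sample angles, the value of $t$ and the step size are arbitrary assumptions, and the angles are kept inside $(-\pi, \pi)$ so that $x = 2\arctan t$.

```python
import math

def half_angle_forms(x):
    """Return sin x and cos x computed from t = tan(x/2) via Eqn (8-24)."""
    t = math.tan(x / 2.0)
    return 2.0 * t / (1.0 + t * t), (1.0 - t * t) / (1.0 + t * t)

def dx_dt(t):
    """Eqn (8-25) written as a derivative: dx/dt = 2/(1 + t^2)."""
    return 2.0 / (1.0 + t * t)

angles = [-2.5, -1.0, 0.3, 1.2, 3.0]
for x in angles:
    s, c = half_angle_forms(x)
    # Both differences should be at rounding-error level.
    print(x, s - math.sin(x), c - math.cos(x))

# Since x = 2 arctan t, a central difference should reproduce dx/dt.
t, h = 0.7, 1e-6
numeric = (2.0 * math.atan(t + h) - 2.0 * math.atan(t - h)) / (2.0 * h)
print(numeric, dx_dt(t))
```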
On account of (8-24) and (8-25), it then follows that
\[ I = \int R(\sin x, \cos x)\,dx = \int R\left[\frac{2t}{1 + t^2}, \frac{1 - t^2}{1 + t^2}\right]\frac{2\,dt}{1 + t^2}. \]
Thus $I$ has been transformed into an antiderivative of a rational function involving $t$.

Example 8-16 Evaluate
\[ I = \int \frac{\cos x\,dx}{1 + \sin x}. \]
Solution Transforming to the variable $t$ as indicated above gives
\[ I = \int \frac{2(1 - t)}{(1 + t^2)(1 + t)}\,dt. \]
It is readily established that
\[ \frac{2(1 - t)}{(1 + t^2)(1 + t)} = \frac{2}{1 + t} - \frac{2t}{1 + t^2}, \]
showing that
\[ I = \int \frac{2\,dt}{1 + t} - \int \frac{2t\,dt}{1 + t^2} = \log(1 + t)^2 - \log(1 + t^2) + C. \]
Thus
\[ I = \log\left[\frac{(1 + t)^2}{1 + t^2}\right] + C, \]
whence from (8-24),
\[ I = \log(1 + \sin x) + C. \]

8-6 (b) Integration of $R[x, \sqrt{ax^2 + bx + c}]$

We define $R[x, \sqrt{ax^2 + bx + c}]$ to be a rational fraction involving $x$ and $\sqrt{ax^2 + bx + c}$. Special cases of this general type in which $b = 0$ have been encountered in Examples 8-2 and 8-5 where it was shown that the substitutions $x = \sin u$ and $x = \sinh u$ can be used to reduce the integrand to one involving only trigonometric or hyperbolic functions. If it is of trigonometric type then the technique of (a) above may be used to reduce the integrand further to a rational function. If the integrand is of hyperbolic type then the substitution $t = \tanh x/2$, together with
\[ \sinh x = \frac{2t}{1 - t^2} \quad\text{and}\quad \cosh x = \frac{1 + t^2}{1 - t^2} \tag{8-26} \]
and the differential relation
\[ dx = \frac{2\,dt}{1 - t^2}, \tag{8-27} \]
will again reduce the integrand to a rational function.

If $b \neq 0$, then completing the square under the square root sign gives
\[ \sqrt{ax^2 + bx + c} = \sqrt{a\left[\left(x + \frac{b}{2a}\right)^2 + \left(\frac{c}{a} - \frac{b^2}{4a^2}\right)\right]}. \]
The substitution $u = x + (b/2a)$ will then reduce the problem to one of the two special cases just discussed, according to the signs of $a$ and $[(c/a) - (b^2/4a^2)]$.

Example 8-17 Evaluate
\[ I = \int \frac{dx}{\sqrt{2 - 3x - 4x^2}}. \]
Solution First we complete the square under the square root sign to obtain
\[ I = \int \frac{dx}{\sqrt{4[41/64 - (x + 3/8)^2]}}. \]
Then, setting $u = x + \tfrac{3}{8}$ this becomes
\[ I = \frac{1}{2}\int \frac{du}{\sqrt{(41/64) - u^2}} = \frac{1}{2}\arcsin\frac{8u}{\sqrt{41}} + C \]
and thus
\[ I = \frac{1}{2}\arcsin\left(\frac{8x + 3}{\sqrt{41}}\right) + C. \]

8-6 (c) Integration by means of differentiation under integral sign

This approach utilizes the idea of differentiation under the integral sign with respect to a parameter. It relies on finding a known antiderivative involving a parameter $\alpha$, say, with the property that the derivative of its integrand with respect to this parameter $\alpha$ is capable of being simply related to the integrand of the desired antiderivative. Specifically, the method uses the result that if
\[ F(x, \alpha) = \int f(x, \alpha)\,dx \]
then,
\[ \frac{\partial F(x, \alpha)}{\partial\alpha} = \int \frac{\partial f(x, \alpha)}{\partial\alpha}\,dx. \]

Example 8-18 Evaluate by means of differentiation under the integral sign the antiderivative
\[ I = \int \frac{dx}{(x^2 + a^2)^{3/2}}. \]
Solution We first note that the integrand $1/(x^2 + a^2)^{3/2}$ is simply related to the derivative
\[ \frac{\partial}{\partial a}\left[\frac{1}{(x^2 + a^2)^{1/2}}\right] = \frac{-a}{(x^2 + a^2)^{3/2}}. \]
Accordingly, let us consider the familiar antiderivative
\[ \int \frac{dx}{(x^2 + a^2)^{1/2}} = \operatorname{arcsinh}\frac{x}{a} + C. \]
Then
\[ \frac{\partial}{\partial a}\int \frac{dx}{(x^2 + a^2)^{1/2}} = \frac{\partial}{\partial a}\left[\operatorname{arcsinh}\frac{x}{a} + C\right] \]
and so
\[ -a\int \frac{dx}{(x^2 + a^2)^{3/2}} = -\left(\frac{x}{a^2}\right)\frac{1}{((x/a)^2 + 1)^{1/2}} \]
or,
\[ \int \frac{dx}{(x^2 + a^2)^{3/2}} = \frac{x}{a^2(x^2 + a^2)^{1/2}} + C'. \]
The arbitrary constant $C'$ has been added since we are deducing an antiderivative and not just an indefinite integral.

8-6 (d) Integration of trigonometric functions involving multiple angles

Antiderivatives of products of trigonometric functions involving multiple angles are of considerable importance and the most frequently occurring ones are:
\[ I_1 = \int \sin mx\cos nx\,dx, \tag{8-28} \]
\[ I_2 = \int \sin mx\sin nx\,dx, \tag{8-29} \]
\[ I_3 = \int \cos mx\cos nx\,dx. \tag{8-30} \]
These are easily evaluated by appeal to the trigonometric identities:
\[ \sin mx\cos nx = \tfrac{1}{2}[\sin(m + n)x + \sin(m - n)x], \tag{8-31} \]
\[ \sin mx\sin nx = \tfrac{1}{2}[\cos(m - n)x - \cos(m + n)x], \tag{8-32} \]
\[ \cos mx\cos nx = \tfrac{1}{2}[\cos(m + n)x + \cos(m - n)x]. \tag{8-33} \]
Substitution of these identities into the above antiderivatives produces:
\[ I_1 = \begin{cases} -\dfrac{1}{2}\left[\dfrac{\cos(m - n)x}{(m - n)} + \dfrac{\cos(m + n)x}{(m + n)}\right] + C & \text{for } m^2 \neq n^2, \\[2ex] -\dfrac{1}{4m}\cos 2mx + C & \text{for } m = n, \end{cases} \tag{8-34} \]
\[ I_2 = \begin{cases} \dfrac{1}{2}\left[\dfrac{\sin(m - n)x}{(m - n)} - \dfrac{\sin(m + n)x}{(m + n)}\right] + C & \text{for } m^2 \neq n^2, \\[2ex] \dfrac{1}{2m}(mx - \sin mx\cos mx) + C & \text{for } m = n, \end{cases} \tag{8-35} \]
\[ I_3 = \begin{cases} \dfrac{1}{2}\left[\dfrac{\sin(m - n)x}{(m - n)} + \dfrac{\sin(m + n)x}{(m + n)}\right] + C & \text{for } m^2 \neq n^2, \\[2ex] \dfrac{1}{2m}(mx + \sin mx\cos mx) + C & \text{for } m = n. \end{cases} \tag{8-36} \]

Example 8-19 Evaluate the following two antiderivatives:
\[ I_1 = \int \sin 3x\cos 5x\,dx, \qquad I_2 = \int \sin^2 3x\,dx. \]
Solution The antiderivatives follow immediately by substitution in (8-34) and (8-35):
\[ I_1 = \frac{\cos 2x}{4} - \frac{\cos 8x}{16} + C, \qquad I_2 = \frac{x}{2} - \frac{\sin 3x\cos 3x}{6} + C. \]

PROBLEMS

Section 8-1

8-1 Find the following antiderivatives: (a) (d) [4^; (e) ficos4^dx; (f) $\int 3^x\,dx$.

8-2 Verify by means of differentiation that
\[ \int \frac{dx}{\sqrt{x^2 - a^2}} = \log\left|x + \sqrt{x^2 - a^2}\right| + C. \]
Compare this form of result with that shown against entry 10 of Table 8-1.

8-3 Verify by means of differentiation that
\[ \int \frac{dx}{a^2 - b^2x^2} = \frac{1}{2ab}\log\left|\frac{a + bx}{a - bx}\right| + C. \]
Compare this more general result with those shown against entries 11 and 12 of Table 8-1.

8-4 Verify by means of differentiation that [ Compare this form of result with that shown against entry 9 of Table 8-1.

8-5 Use the result of Theorem 8-1 to verify the following general results:
(a) $\int \dfrac{f'}{f}\,dx = \log|f| + C$;
(b) $\int f^{(n+1)}g\,dx = gf^{(n)} - g'f^{(n-1)} + g''f^{(n-2)} + \cdots + (-1)^n g^{(n)}f + (-1)^{n+1}\int g^{(n+1)}f\,dx$;
(c) $\int \dfrac{gf' - fg'}{g^2}\,dx = \dfrac{f}{g} + C$;
(d) $\int \left(\dfrac{f'}{f} - \dfrac{g'}{g}\right)dx = \log\left|\dfrac{f}{g}\right| + C$.

8-6 Apply the results of Problem 8-5 together with some slight manipulation to determine the following antiderivatives:
(a) $\int \left(\dfrac{2x\sin x - x^2\cos x}{\sin^2 x}\right)dx$;
(b) j{ J?2*Z*ll ) dx;
(c) $\int x^2 e^{x/2}\,dx$;
(d) $\int \left(\dfrac{x\sinh x - 3\cosh x}{x\cosh x}\right)dx$.

8-7 Evaluate the following antiderivatives:
(a) $\int (x^2 + 3\sin x + 1)\,dx$;
(b) $\int (4^x + 2\cos 2x)\,dx$;
(c) $\int (4\sinh x + \sin x)\,dx$;
(d) J (e M + 3)d.v.
8-8 Use the following identities to evaluate the four antiderivatives listed below:

$$\sinh mx \cosh nx = \tfrac{1}{2}[\sinh (m + n)x + \sinh (m - n)x]$$

$$\sinh mx \sinh nx = \tfrac{1}{2}[\cosh (m + n)x - \cosh (m - n)x]$$

$$\cosh mx \cosh nx = \tfrac{1}{2}[\cosh (m + n)x + \cosh (m - n)x]$$

(a) $\displaystyle\int \sinh 4x \cosh 2x\,dx$;

(b) $\displaystyle\int \sinh x \sinh 3x\,dx$;

(c) $\displaystyle\int \cosh 4x \cosh 2x\,dx$;

(d) $\displaystyle\int \cosh^2 2x\,dx$.

Section 8-2

Use the indicated substitutions to evaluate the following antiderivatives.

8-9 $\displaystyle\int \frac{dx}{x\sqrt{x^2 - 4}}$, $x = 1/u$.

8-10 $\displaystyle\int \sqrt{1 - x^2}\,dx$, $x = \sin u$.

8-11 C tanh x dx , . TT, - Z rC> cosh x=l + u 2 . J 2 V(cosh x - 1)

8-12 $\displaystyle\int \cos x \sqrt{\sin x}\,dx$, $\sin x = u$.

8-13 $\displaystyle\int x(3x^2 + 1)^5\,dx$, $3x^2 + 1 = u$.

8-14 /v§TT)' ^+0-..

Evaluate the following antiderivatives by means of a suitable trigonometric substitution.

J V(l - * a ) f V(x* + 816 I vv ~ ' J) dx. 817 I -=^- -dx.

Evaluate the following definite integrals.

8-18 f (3x + 1) sinh (x 3 + x + 3)dx.

8-19 f *V(1 + * a )d*- f V(* -

8-20 f a/(* - 2)dx.

Section 8-3

Evaluate the following antiderivatives using the technique of integration by parts.

8-22 $\displaystyle\int e^{ax} \sin x\,dx$.

8-23 $\displaystyle\int x e^{ax}\,dx$.

8-24 $\displaystyle\int \frac{x\,dx}{\sin^2 x}$.

8-25 $\displaystyle\int \sin x \sinh x\,dx$.

8-26 J 7* cos x dx.

8-27 J log 2 x dx.

8-28 $\displaystyle\int x \arcsin x\,dx$.

Section 8-4

8-29 Given that $I_n = \int (1 - x^3)^n\,dx$, where $n$ is an integer, show that

$$(3n + 1)I_n = x(1 - x^3)^n + 3n I_{n-1}.$$

Hence prove that

$$\int_0^1 (1 - x^3)^5\,dx = \frac{3^6}{2^4 \cdot 7 \cdot 13}.$$

8-30 The integral I m is defined by lm - Show that x - r - - -j dx for integral m >. 0. (x 2 + l) m+3 6 _ m + 2 Jm-l - 7 lm, m - I and by using the substitution x = tan prove that 1 sin 7 cos 5 6 d0 = - • o

8-31 Show that for integral $n > 1$,

$$\int_0^{\pi/2} x^n \sin x\,dx = n\int_0^{\pi/2} x^{n-1} \cos x\,dx$$

and

$$\int_0^{\pi/2} x^n \cos x\,dx = \left(\tfrac{1}{2}\pi\right)^n - n\int_0^{\pi/2} x^{n-1} \sin x\,dx.$$

Use the result to evaluate

$$\int_0^{\pi/2} x^2 \cos x\,dx.$$

8-32 The function $I_{p,q}$ is defined by

$$I_{p,q} = \int x^p (\log x)^q\,dx$$

in which $p, q$ are positive integers. Show that

$$(p + 1)I_{p,q} + q I_{p,q-1} = x^{p+1} (\log x)^q.$$
8-33 If $T_n = \int \tan^n \theta\,d\theta$, where $n \neq 1$ is a positive integer, show that

$$T_n = \frac{\tan^{n-1} \theta}{n - 1} - T_{n-2}.$$

Use this result to evaluate tan 6 d0.

8-34 The function $I_{m,n}$ is defined by

$$I_{m,n} = \int x^m (a + bx)^n\,dx,$$

in which $m, n$ are positive integers. Prove that

$$b(m + n + 1)I_{m,n} + ma I_{m-1,n} = x^m (a + bx)^{n+1}.$$

8-35 The function $I_{m,n}$ is defined by

$$I_{m,n} = \int \sin^m \theta \cos^n \theta\,d\theta,$$

in which $m, n$ are positive integers. Show that $I_{m,n}$ satisfies the reduction formula

$$(m + n)I_{m,n} - (n - 1)I_{m,n-2} = \sin^{m+1} \theta \cos^{n-1} \theta.$$

Section 8-5

Evaluate the following antiderivatives by means of partial fractions.

8-36 f ^ J (x - l)(x + 2){x + 3) 8 . 37 r *;-f + v J x 2 - 5x + 6 838 f 3 f >4 . • J x 6 - 2x i + x 8-41 f ^ J x 3 - 4x 2 + 5x - 2 C x 2 + 2 8 -43j ( , + 1) ; x _ 2) d- f- *4 + 4*3+lb; 2 +12*+8 J (x 2 + 2x + 3) 2 (x + 1)

Section 8-6

Evaluate the following antiderivatives by means of the substitution $t = \tan x/2$.

8-45 $\displaystyle\int \frac{dx}{3 + 5 \cos x}$.

8-46 $\displaystyle\int \frac{dx}{\sin x + \cos x}$.

8-47 $\displaystyle\int \frac{dx}{8 - 4 \sin x + 7 \cos x}$.

8-48 $\displaystyle\int \frac{\sin x}{(1 - \cos x)^3}\,dx$.

Evaluate the following antiderivatives by means of one or more suitable substitutions.

r dx J V(2 + 3x - 849 , ,_ „ ^ 3x- 6 dx. V(x 2 - 4x + 5) dx xV(.l - * 2 ) 8-52 J v(* 2 + 2x + 5) dx 8-53 J (X-1W&- 2) 8-54 J V(* - x 2 ) dx.

Use the technique of differentiation under the integral sign to evaluate the following antiderivatives.

8-55 $\displaystyle\int x e^{ax}\,dx$.

8-56 / (*» +* a y (Hint: start from / ^TI 2 " in Table 8-1.)

8-57 $\displaystyle\int \frac{dx}{(a^2 - x^2)^{3/2}}$. (Hint: Start from $\displaystyle\int \frac{dx}{\sqrt{a^2 - x^2}}$ in Table 8-1.)

8-58 $\displaystyle\int x a^x\,dx$. (Hint: Start from $\displaystyle\int a^x\,dx$ in Table 8-1.)

Evaluate the following trigonometric antiderivatives.

8-59 $\displaystyle\int \cos x \cos 2x\,dx$.

8-60 $\displaystyle\int \sin ax \sin (ax + \varepsilon)\,dx$, $a$, $\varepsilon$ non-zero constants.

8-61 $\displaystyle\int \cos x \cos^2 3x\,dx$.

8-62 $\displaystyle\int \sin x \sin 2x \sin 3x\,dx$.

Use the results of this chapter together with Definitions 7-4 and 7-5 of Chapter 7 to classify the following improper integrals as convergent or divergent.
Determine the value of all improper integrals that are convergent, stating any conditions that must be imposed to ensure this.

.65 r dx J (1 + x)Vx /* 00 66 j cos x * /*oo •68 e^i 8-66 I cos x dx. 8-68 e^dx.

Linear transformations and matrices

9-1 Introductory ideas

This chapter is concerned with the branch of mathematics known as linear algebra. One aspect of this subject has already been encountered, namely vectors, and it is now necessary to develop in a more general context various of the ideas that were first introduced there. Central to the entire subject is the fundamental idea that the algebraic operations of addition, subtraction, and multiplication can be made meaningful when applied to an array of numbers or functions considered as a single entity. An example will help here to indicate one of the many different ways in which such an array may arise, and at the same time to show something of the type of algebra it is reasonable to want to perform on an array.

Three chemical plants numbered 1 to 3 each have separate sources of raw material from which each one produces the same four products numbered 1 to 4. Let plant number $m$ produce product number $n$ at a cost $a_{mn}$ units per ton, then the production costs of the complex of chemical plants are conveniently characterized by the following table of the twelve quantities $a_{mn}$.

Table 9-1

                    Product
    Plant     1         2         3         4
      1     a_11      a_12      a_13      a_14
      2     a_21      a_22      a_23      a_24
      3     a_31      a_32      a_33      a_34

In writing this table or array of quantities $a_{mn}$ we have used the convention that the first of the two suffixes attached to the quantity $a_{mn}$ refers to the row number in which $a_{mn}$ appears, and the second to the column number. Thus the entry $a_{23}$ occurs in row 2, column 3, whilst the entry $a_{32}$ occurs in row 3, column 2. The important use of suffixes in this way is strictly analogous to a map reference in which the first entry is a latitude and the second a longitude. Thus the double suffix notation used here serves to identify the position in the array to which the associated quantity is assigned.

On account of the use to which the suffixes have been put, we can now dispense with the extreme left-hand column and the top row of Table 9-1, which only serve for identification purposes, and write instead

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \tag{9-1}$$

with the understanding that the symbol $\mathbf{A}$ represents the array of quantities originally contained in Table 9-1.

Returning now to the physical situation from which the array (9-1) was derived, let us suppose that at some time the quality of the raw materials changes, so that a revised Table 9-1 then applies in which entry $a_{mn}$ is replaced by the new entry $b_{mn}$. Then, in terms of our concise notation, we can characterize this new situation by defining an array $\mathbf{B}$ as follows:

$$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ b_{31} & b_{32} & b_{33} & b_{34} \end{bmatrix}. \tag{9-2}$$

In terms of the information at our disposal, we know that the change in the cost of product $n$ from chemical plant $m$ is $a_{mn} - b_{mn}$, whilst the average cost of product $n$ from plant $m$ is $\tfrac{1}{2}(a_{mn} + b_{mn})$. Hence, if $\mathbf{C}$ is the array of change in costs of products and $\mathbf{D}$ is the array of the average costs of products, in our new notation we may write:

$$\mathbf{C} = \begin{bmatrix} a_{11} - b_{11} & a_{12} - b_{12} & a_{13} - b_{13} & a_{14} - b_{14} \\ a_{21} - b_{21} & a_{22} - b_{22} & a_{23} - b_{23} & a_{24} - b_{24} \\ a_{31} - b_{31} & a_{32} - b_{32} & a_{33} - b_{33} & a_{34} - b_{34} \end{bmatrix} \tag{9-3}$$

and

$$\mathbf{D} = \begin{bmatrix} \tfrac{1}{2}(a_{11} + b_{11}) & \tfrac{1}{2}(a_{12} + b_{12}) & \tfrac{1}{2}(a_{13} + b_{13}) & \tfrac{1}{2}(a_{14} + b_{14}) \\ \tfrac{1}{2}(a_{21} + b_{21}) & \tfrac{1}{2}(a_{22} + b_{22}) & \tfrac{1}{2}(a_{23} + b_{23}) & \tfrac{1}{2}(a_{24} + b_{24}) \\ \tfrac{1}{2}(a_{31} + b_{31}) & \tfrac{1}{2}(a_{32} + b_{32}) & \tfrac{1}{2}(a_{33} + b_{33}) & \tfrac{1}{2}(a_{34} + b_{34}) \end{bmatrix}. \tag{9-4}$$

The form of these results is suggestive, for it would seem that by defining subtraction of two similar arrays to mean the array formed by the subtraction of corresponding elements, we may write

$$\mathbf{C} = \mathbf{A} - \mathbf{B}.$$
(9-5) Similarly, if addition of two similar arrays is taken to mean the array formed by the addition of corresponding entries, and the multiplication of an array by a factor is taken to mean the array formed by the multiplication of each entry by that factor, we may write

$$\mathbf{D} = \tfrac{1}{2}(\mathbf{A} + \mathbf{B}). \tag{9-6}$$

Hence, in a natural manner, we are starting to perform what appear to be conventional algebraic operations on an entire array of numbers, rather than on the individual entries in the arrays themselves.

In mathematical terms an array of the form shown on the right-hand side of Eqn (9-1) is called a matrix of order $(3 \times 4)$. Here, analogous to the double suffix notation already introduced, the first number is taken to refer to the total number of rows in the matrix and the second number to refer to the total number of columns in the matrix.

In terms of the simple physical situation used to introduce the notion of a matrix and its associated algebra we have so far given no indication of the interpretation to be placed upon multiplication. To elucidate the form taken by this operation when applied to matrices, we again return to our physical situation and consider the cost of buying $c_1$, $c_2$, $c_3$, and $c_4$ tons, respectively, of products 1, 2, 3, and 4 from each of the three chemical plants in turn. If the product costs are as shown in Table 9-1, and the costs of the orders are denoted by $d_1$, $d_2$, and $d_3$, it is readily seen that

$$\begin{aligned} d_1 &= a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 \\ d_2 &= a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 \\ d_3 &= a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + a_{34}c_4. \end{aligned} \tag{9-7}$$

In terms of the matrix $\mathbf{A}$ in Eqn (9-1), the right-hand side of the first equation in (9-7) is obtained by multiplying successive entries in the first row of $\mathbf{A}$ by $c_1$, $c_2$, $c_3$, and $c_4$, respectively, and then adding the four products.
The same process will generate the right-hand side of both the second and third equation in (9-7), provided that the entries in the second and third rows of matrix $\mathbf{A}$ are used in place of those in the first row. If the four numbers $c_1$, $c_2$, $c_3$, and $c_4$ are arranged in a column which is then regarded as a $(4 \times 1)$ matrix, the basic operation of matrix multiplication is seen to be the multiplication of a row of the first matrix into the column of the second to yield a single number. Thus, in terms of the first row of $\mathbf{A}$ expressed as a $(1 \times 4)$ matrix, we have the definition

$$a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + a_{14}c_4 = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix},$$

where juxtaposition is used to imply multiplication of the row and column matrices on the right-hand side. Similarly, in terms of the second row of $\mathbf{A}$ expressed as a $(1 \times 4)$ matrix, our definition yields

$$a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + a_{24}c_4 = \begin{bmatrix} a_{21} & a_{22} & a_{23} & a_{24} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix},$$

and a corresponding result is also true for the third row of $\mathbf{A}$ when expressed as a $(1 \times 4)$ matrix. This special form of product is called either the inner product or the scalar product of a row matrix and a column matrix.

Collectively these results suggest that we should write Eqns (9-7) in the matrix form

$$\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} \tag{9-8}$$

with the understanding, as before, that multiplication is implied by juxtaposition and means the inner product of rows of the first matrix with the column of the second matrix. To be consistent, equality of two matrices must then be taken to mean the equality of corresponding entries in two matrices of similar order. Using this convention our suffix notation works for us in the sense that the row number and the column number, taken in that order, which are involved in an inner product are the row and column numbers of the location into which that product is to be put.
Thus in matrix equation (9-8), the number $d_2$ is in row 2, column 1 of the left-hand column matrix, and it is the result of forming the inner product of row 2 of the first matrix on the right-hand side with column 1 of the second matrix. (The second matrix here only has one column.)

If the column matrix with entries $d_1$, $d_2$, $d_3$ is denoted by $\mathbf{D}$, and the column matrix with entries $c_1$, $c_2$, $c_3$, and $c_4$ is denoted by $\mathbf{C}$, then Eqn (9-8) can be reduced to the deceptively simple equation

$$\mathbf{D} = \mathbf{AC}. \tag{9-9}$$

It should be noticed that the resemblance to the algebra of real numbers ends here, because although multiplication is a commutative operation for real numbers, it is an easy task for the reader to verify that the matrix product $\mathbf{CA}$ is not even defined for the matrices involved here. Later we shall see that the non-commutative character of matrix multiplication is not the only difference between the field of real numbers and matrices.

The result of matrix multiplication using numbers is illustrated in the following example:

$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 2 & 1 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix}.$$

We remark in passing that the name scalar product of a row matrix and a column matrix derives from a comparison with the scalar product of two vectors. Namely, if $\boldsymbol{\alpha} = \alpha_1 \mathbf{i} + \alpha_2 \mathbf{j} + \alpha_3 \mathbf{k}$, $\boldsymbol{\beta} = \beta_1 \mathbf{i} + \beta_2 \mathbf{j} + \beta_3 \mathbf{k}$ are two vectors, then $\boldsymbol{\alpha} \cdot \boldsymbol{\beta} = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \alpha_3 \beta_3$, which is just the result of forming the inner product of a row matrix with entries $\alpha_1$, $\alpha_2$, $\alpha_3$ and a column matrix with entries $\beta_1$, $\beta_2$, $\beta_3$. Because of this similarity it is customary to refer to matrices comprising only one row or one column as row vectors or column vectors, respectively. Thus a general $(1 \times n)$ row vector may be considered as a matrix representation of an ordinary form of vector having $n$ components, and which belongs to an $n$-dimensional space.
This simple idea proves to be very fruitful in more advanced accounts of linear algebra where it leads to the study of what are called $n$-dimensional vector spaces. These spaces have properties very similar to those discussed in Chapter 4 and, as in three dimensions, the scalar product is related to the geometrical operation of projection in the space. In an $n$-dimensional vector space a fundamental set of row or column vectors called a basis takes the place of the unit vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ and leads to the important idea of linear independence which will be examined later.

Because of the shape of the array, a general $(m \times n)$ matrix is called a rectangular matrix. The rule just devised for the product of a $(3 \times 4)$ matrix and a $(4 \times 1)$ column vector also applies to the product $\mathbf{AB}$ of two rectangular matrices $\mathbf{A}$ and $\mathbf{B}$, provided only that the number of columns in $\mathbf{A}$ is equal to the number of rows of $\mathbf{B}$. This last requirement follows directly from the concept of an inner product which is only defined when the number of entries in a row of $\mathbf{A}$ is equal to the number of entries in a column of $\mathbf{B}$. Once again the suffix notation works for us, because the inner product of row $p$ of matrix $\mathbf{A}$ and column $q$ of matrix $\mathbf{B}$ is the number $c_{pq}$, which is found in row $p$ and column $q$ of the product matrix $\mathbf{C} = \mathbf{AB}$. Consider the following example which illustrates the application of this rule:

$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 2 & 1 & 4 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 0 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ -2 & 7 \\ 0 & 11 \end{bmatrix}.$$

Then, for example, the entry in row 3, column 2 of the product matrix is the number 11, which is the inner product of row 3 of the first matrix involved in the product and column 2 of the second matrix involved in the product.

Notice that the rule for forming an inner product also determines the shape of the product matrix $\mathbf{C} = \mathbf{AB}$, for $\mathbf{C}$ must have as many rows as $\mathbf{A}$ and as many columns as $\mathbf{B}$. (Think about this and check it.)
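The row-into-column rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mat_mul`, the representation of a matrix as a list of rows, and the numerical matrices are assumptions made here for the purpose of the example, not part of the text.

```python
def mat_mul(a, b):
    """Multiply matrices given as lists of rows, using inner products."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    # The inner product is only defined when a row of a and a column of b
    # contain the same number of entries.
    if cols_a != rows_b:
        raise ValueError("not conformable: columns of a must equal rows of b")
    # Entry (i, j) of the product is the inner product of row i of a
    # with column j of b, so the product has rows_a rows and cols_b columns.
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]


# Illustrative (3 x 4) and (4 x 2) matrices.
A = [[1, 2, 1, 0],
     [0, 1, 1, 3],
     [1, 2, 1, 4]]
B = [[2, 1],
     [1, 2],
     [0, 2],
     [-1, 1]]

C = mat_mul(A, B)
print(C)                    # [[4, 7], [-2, 7], [0, 11]]
print(len(C), len(C[0]))    # 3 2
```

The product has three rows, as the first matrix does, and two columns, as the second does. Calling `mat_mul(B, A)` with these values raises `ValueError`, because the two columns of `B` cannot be paired with the three rows of `A`.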
In fact these arguments may be formulated into a useful short-hand rule for checking that two matrices are conformable for multiplication, and at the same time displaying the shape of the product matrix.

Rule 1 (Multiplication conformability rule) If $\mathbf{A}$ is an $(m \times n)$ matrix and $\mathbf{B}$ is a $(p \times q)$ matrix, then the matrix product $\mathbf{AB}$ may be formed provided $n = p$. The resultant product matrix then has the form $(m \times q)$. Symbolically we write this

$$(m \times n)(p \times q) = (m \times q) \quad \text{only if } n = p.$$

Thus matrix products of the form $(3 \times 7)(7 \times 2)$ are conformable for multiplication and yield a $(3 \times 2)$ matrix. Matrix products of the form $(7 \times 3)(5 \times 4)$ are not defined and certainly do not yield a $(7 \times 4)$ matrix.

This rule has various important implications, and at this stage in our argument we would draw attention to the fact that even when for two matrices $\mathbf{A}$ and $\mathbf{B}$, both the matrix products $\mathbf{AB}$ and $\mathbf{BA}$ are defined, they are not usually equal. Indeed, the order of the two product matrices may be different, as the following example shows. If

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 0 & -1 \\ 4 & 1 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}$$

then

$$\mathbf{AB} = \begin{bmatrix} -1 & 4 & 1 \\ 1 & -1 & 0 \\ 3 & 9 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{BA} = \begin{bmatrix} 5 & 1 \\ -1 & -3 \end{bmatrix}.$$

A different but most important way in which matrices can arise is in dealing with sets of simultaneous equations. Consider the following set of simultaneous equations:

$$\begin{aligned} x + y + 2z &= 4 \\ 2x - y + 3z &= 9 \\ 3x - y - z &= 2. \end{aligned}$$

These equations may be written in matrix form by introducing a column vector with entries $x$, $y$, $z$ and then using the rule of matrix multiplication to write

$$\begin{bmatrix} 1 & 1 & 2 \\ 2 & -1 & 3 \\ 3 & -1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 2 \end{bmatrix}.$$

With only a little practice, the reader will quickly learn to transcribe systems of equations into matrix form, for the patterns of numbers involved in the two numerical matrices are identical to the patterns of numbers in the equations themselves. For obvious reasons the $(3 \times 3)$ matrix is called the coefficient matrix of the simultaneous equations.
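As a small check on the transcription of simultaneous equations into matrix form, the following Python sketch multiplies the coefficient matrix into a trial column vector. The helper name `mat_vec` is an assumption made for this illustration, and the trial values $x = 1$, $y = -1$, $z = 2$ are not given in the text; they were found by ordinary elimination and are used here only to test the matrix form.

```python
def mat_vec(a, x):
    """Multiply a matrix (list of rows) into a column vector (list)."""
    if any(len(row) != len(x) for row in a):
        raise ValueError("each row must have as many entries as the vector")
    # Each entry of the result is the inner product of one row with x.
    return [sum(coeff * value for coeff, value in zip(row, x)) for row in a]


# Coefficient matrix and right-hand side of the three equations.
A = [[1, 1, 2],
     [2, -1, 3],
     [3, -1, -1]]
K = [4, 9, 2]

# Trial solution (an assumption of this sketch, obtained by elimination).
X = [1, -1, 2]

print(mat_vec(A, X))        # [4, 9, 2]
print(mat_vec(A, X) == K)   # True
```

Each row of the coefficient matrix reproduces the left-hand side of one equation, so agreement of the product with the right-hand column confirms that the matrix form and the original equations say the same thing.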
As in this case there are three equations and three unknowns, the coefficient matrix is square in shape. In general the name square matrix will be given to any $(n \times n)$ matrix.

If the coefficient matrix above is denoted by $\mathbf{A}$, and the column vectors with entries $x$, $y$, $z$ and 4, 9, 2 are denoted, respectively, by $\mathbf{X}$ and $\mathbf{K}$, we arrive at the matrix equation

$$\mathbf{AX} = \mathbf{K}.$$

There is a great temptation to attempt to solve this for $\mathbf{X}$ by dividing by $\mathbf{A}$, but as it is meaningless to divide two arrays of numbers this approach must be abandoned. Later we will return to this matter and resolve the difficulty by introducing the concept of the inverse of a square matrix via the operation of multiplication.

One final and important way in which matrices may arise is in connection with what are called linear transformations. The idea involved here is perhaps best understood if described in terms of coordinate transformations, and for this purpose we now confine attention to a special change of coordinates in a plane.

Suppose a set of rectangular cartesian axes $O\{x', y'\}$ in a plane is derived from a set of rectangular cartesian axes $O\{x, y\}$ by rotation about $O$ through an angle $\theta$. Then under this process a point $P$ in the $(x, y)$-plane with coordinates $(\xi, \eta)$ appears as a point with coordinates $(\xi', \eta')$ in the $(x', y')$-plane, as shown in Fig. 9-1. Simple geometrical considerations show that

$$\begin{aligned} \xi' &= \xi \cos \theta - \eta \sin \theta \\ \eta' &= \xi \sin \theta + \eta \cos \theta. \end{aligned}$$

Fig. 9-1 Rotation in a plane.

Now this result is true for any point $P$ in the $(x, y)$-plane and its map in the $(x', y')$-plane, so that with complete generality we may display the effect of this coordinate transformation by writing

$$\begin{aligned} x' &= x \cos \theta - y \sin \theta \\ y' &= x \sin \theta + y \cos \theta. \end{aligned} \tag{9-10}$$

If the axes $O\{x', y'\}$ and $O\{x, y\}$ are thought of as belonging to two different but superimposed planes with a common origin, then Eqns (9-10) may
be regarded as describing the relationship between points in each plane when corresponding axes are inclined at an angle $\theta$. In this respect the transformation described by Eqns (9-10) can be regarded as a function or mapping, in the sense of Chapter 2, of the set of points comprising the $(x, y)$-plane into the set of points comprising the $(x', y')$-plane. The mapping is obviously one to one, and both the domain and range of the mapping are the set of points comprising the plane itself.

In matrix notation the relationship becomes

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}. \tag{9-11}$$

Hence by pursuing the simple idea of the geometrical operation of the rotation of a plane about the origin we have arrived at the matrix

$$\mathbf{R}_\theta = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}. \tag{9-12}$$

The idea involved here is a much more general one than that involved in simultaneous equations, since $\mathbf{R}_\theta$ contains a complete description of how an entire plane transforms or maps, together with whatever specific curves of interest it may contain. In addition to this we have also produced an example of a matrix whose entries, or elements as we shall call them henceforth, are functions of a single real variable. Accordingly, it is reasonable to ask whether any meaning can be given to the entity $d\mathbf{R}_\theta/d\theta$, where $\mathbf{R}_\theta$ is a matrix whose elements are functions of the real variable $\theta$.

This is not an abstract matter, for in mechanics and many other subjects it is frequently convenient to work with axes that are fixed in a rotating body. Indeed, the same sort of idea was implicit in the example first used to introduce matrices. In that case by regarding the quality of the raw material as a function of the time $t$, we arrive at a $(3 \times 4)$ matrix $\mathbf{A}(t)$ whose elements $a_{mn}(t)$ are functions of time and any attempt to examine rates of change involves considering the meaning of $d\mathbf{A}/dt$.

The term linear transformation in relation to the rotation transformation (9-11) comes about as follows.
Consider the effect of a rotation $\theta$ on the two points $(\alpha, \beta)$ and $(\gamma, \delta)$ which map into the points $(\alpha', \beta')$ and $(\gamma', \delta')$, respectively. Then from Eqns (9-10) we have

$$\begin{aligned} \alpha' &= \alpha \cos \theta - \beta \sin \theta \\ \beta' &= \alpha \sin \theta + \beta \cos \theta \end{aligned} \qquad \text{and} \qquad \begin{aligned} \gamma' &= \gamma \cos \theta - \delta \sin \theta \\ \delta' &= \gamma \sin \theta + \delta \cos \theta, \end{aligned}$$

whence

$$\begin{aligned} \alpha' + \gamma' &= (\alpha + \gamma) \cos \theta - (\beta + \delta) \sin \theta \\ \beta' + \delta' &= (\alpha + \gamma) \sin \theta + (\beta + \delta) \cos \theta. \end{aligned}$$

So, setting

$$\mathbf{X} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \qquad \mathbf{X}^* = \begin{bmatrix} \gamma \\ \delta \end{bmatrix}$$

we have in fact shown that

$$\mathbf{R}_\theta \mathbf{X} + \mathbf{R}_\theta \mathbf{X}^* = \mathbf{R}_\theta (\mathbf{X} + \mathbf{X}^*), \tag{9-13}$$

which asserts that multiplication by $\mathbf{R}_\theta$ is distributive with respect to addition. It is the general property described by Eqn (9-13) that is used to characterize a linear transformation, and it is on account of this that $\mathbf{R}_\theta \mathbf{X}$ is called a linear transformation of the vector $\mathbf{X}$. In fact matrix multiplication is always distributive with respect to addition, as we shall see later.

Thus far in our introductory presentation of matrices only intuitive arguments have been used. This approach has been adopted deliberately in an attempt to emphasize that matrices arise naturally, and that an obvious algebra suggests itself for their manipulation. To proceed further it now becomes necessary to formalize these ideas in exact mathematical terms, and then to develop them in systematic form to the point at which they can be used as a useful tool.

9-2 Matrix algebra

In this section we return to the fundamental ideas connected with matrices and their algebra which were outlined on an intuitive basis in Section 9-1. This time, however, our discussion will be more formal and, relying on our introductory account to provide motivation, we shall proceed quickly through the basic definitions and theorems, which will be illustrated by example. The problem of the solution of systems of linear equations and a discussion of linear transformations and some of their applications will be presented in subsequent sections.

Definition 9-1 (matrix and its order) A matrix is a rectangular array of elements $a_{ij}$ involving $m$ rows and $n$ columns.
The first suffix $i$ in element $a_{ij}$ is called the row index of the element and the second suffix $j$ is called the column index of the element. These indices specify the row number and column number in which the element is located, with row 1 occurring at the top of the array and column 1 at the extreme left. A matrix with $m$ rows and $n$ columns is said to be of order $m$ by $n$ and this is written $(m \times n)$. The order describes the shape of the matrix.

Special names are given to certain types of matrix and we now describe and give examples of some of the more frequently used terms.

(a) A row matrix or row vector is any matrix of order $(1 \times n)$. The following is an example of a row vector of order $(1 \times 4)$:

[3 7 2].

(b) A column matrix or column vector is any matrix of order $(n \times 1)$. The following is an example of a column vector of order $(3 \times 1)$:

11

(c) A square matrix is any matrix of order $(n \times n)$. The following is an example of a square matrix of order $(3 \times 3)$:

"1 2 4" 3 2 5 1 3_

Three particular cases of square matrices that are worthy of note are the diagonal matrix, the symmetric matrix and the skew-symmetric matrix. Of these, the diagonal matrix has non-zero elements only on what is called the principal diagonal, which runs from the top left of the matrix to the bottom right. The principal diagonal is also often referred to as the leading diagonal. The following is an example of a diagonal matrix of order $(4 \times 4)$:

3 0" 2 5.

The diagonal matrix in which every element of the principal diagonal is unity is called either the unit matrix or the identity matrix, and it is usually denoted by $\mathbf{I}$. The unit matrix of order $(3 \times 3)$ thus has the form

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

A symmetric matrix is one in which the elements obey the rule $a_{ij} = a_{ji}$ so that the pattern of numbers has a reflection symmetry about the principal diagonal.
A typical symmetric matrix of order $(3 \times 3)$ is:

2 -2

A skew-symmetric matrix is one in which the elements obey the rule $a_{ij} = -a_{ji}$, so that the principal diagonal must contain zeros, whilst the pattern of numbers has a reflection symmetry about the principal diagonal but with a reversal of sign. A typical skew-symmetric matrix of order $(3 \times 3)$ is:

$$\begin{bmatrix} 0 & 1 & 5 \\ -1 & 0 & -3 \\ -5 & 3 & 0 \end{bmatrix}.$$

(d) A null matrix is the name given to a matrix of any order which contains only zero elements. It is usually denoted by the symbol $\mathbf{0}$. The null matrix of order $(2 \times 3)$ has the form

$$\mathbf{0} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Definition 9-2 (equality of matrices) Two matrices $\mathbf{A}$ and $\mathbf{B}$ with general elements $a_{ij}$ and $b_{ij}$, respectively, are equal only when they are both of the same order and $a_{ij} = b_{ij}$ for all possible pairs of indices $(i, j)$.

Example 9-1 Is it possible for the following pair of matrices to be equal and, if so, for what value of $a$ does equality occur:

$$\begin{bmatrix} 5 & a^3 \\ a^2 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 5 & -27 \\ 9 & 1 \end{bmatrix}?$$

Solution The matrices are both of the same order and hence they will be equal when their corresponding elements are equal. As corresponding elements on the principal diagonal are indeed equal, we need only confine attention to the off-diagonal elements. Thus the matrices will be equal if there is a common solution to the two equations $a^2 = 9$ and $a^3 = -27$. Obviously, equality will occur if $a = -3$.

Definition 9-3 (addition of matrices) Two matrices $\mathbf{A}$ and $\mathbf{B}$ with general elements $a_{ij}$ and $b_{ij}$, respectively, will be said to be conformable for addition only if they are both of the same order. Their sum $\mathbf{C} = \mathbf{A} + \mathbf{B}$ is the matrix $\mathbf{C}$ with elements $c_{ij} = a_{ij} + b_{ij}$.

As addition of real numbers is commutative we have $a_{ij} + b_{ij} = b_{ij} + a_{ij}$. This shows that addition of conformable matrices must also be commutative, whence

$$\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}. \tag{9-14}$$

Now addition of real numbers is also associative so that $(a_{ij} + b_{ij}) + c_{ij} = a_{ij} + (b_{ij} + c_{ij})$.
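Hence if $a_{ij}$, $b_{ij}$, and $c_{ij}$ are general elements of matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ which are conformable for addition, then this also implies that addition of matrices is associative, whence

$$(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C}). \tag{9-15}$$

Results (9-14) and (9-15) comprise our first theorem.

Theorem 9-1 (matrix addition is both commutative and associative) If $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are matrices which are conformable for addition, then

(a) $\mathbf{A} + \mathbf{B} = \mathbf{B} + \mathbf{A}$ (Matrix Addition is Commutative);

(b) $(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})$ (Matrix Addition is Associative).

Example 9-2 Determine the constants $a$, $b$, $c$, and $d$ in order that the following matrix equation should be valid:

$$\begin{bmatrix} 0 & a & 3 \\ b & 2 & 2 \end{bmatrix} + \begin{bmatrix} c & 1 & 2 \\ 1 & 1 & d \end{bmatrix} = \begin{bmatrix} 4 & 3 & 5 \\ 7 & 3 & 5 \end{bmatrix}.$$

Solution Adding the two matrices on the left-hand side we arrive at the matrix equation

$$\begin{bmatrix} c & (a + 1) & 5 \\ (b + 1) & 3 & (d + 2) \end{bmatrix} = \begin{bmatrix} 4 & 3 & 5 \\ 7 & 3 & 5 \end{bmatrix}.$$

Equating corresponding elements shows that $a = 2$, $b = 6$, $c = 4$, and $d = 3$.

Definition 9-4 (multiplication by scalar) If $k$ is a scalar and the matrix $\mathbf{A}$ has elements $a_{ij}$, then the matrix $\mathbf{B} = k\mathbf{A}$ is the same order as $\mathbf{A}$ and has elements $k a_{ij}$.

Example 9-3 Determine $2\mathbf{A} + 5\mathbf{B}$, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} -1 & 3 \\ 4 & 2 \end{bmatrix}.$$

Solution

$$2\mathbf{A} + 5\mathbf{B} = 2\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + 5\begin{bmatrix} -1 & 3 \\ 4 & 2 \end{bmatrix}$$

or,

$$2\mathbf{A} + 5\mathbf{B} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \end{bmatrix} + \begin{bmatrix} -5 & 15 \\ 20 & 10 \end{bmatrix}$$

whence

$$2\mathbf{A} + 5\mathbf{B} = \begin{bmatrix} -3 & 19 \\ 26 & 18 \end{bmatrix}.$$

Definition 9-5 (difference of two matrices) If the matrices $\mathbf{A}$ and $\mathbf{B}$ are both of the same order, then their difference $\mathbf{A} - \mathbf{B}$ is defined by the relation

$$\mathbf{A} - \mathbf{B} = \mathbf{A} + (-1)\mathbf{B}.$$

Example 9-4 Determine $\mathbf{A} - \mathbf{B}$, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} 4 & 2 \\ 3 & 1 \\ 0 & -2 \end{bmatrix}.$$

Solution

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} + (-1)\begin{bmatrix} 4 & 2 \\ 3 & 1 \\ 0 & -2 \end{bmatrix}$$

and so

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} 1 & 3 \\ 4 & -2 \\ 1 & 6 \end{bmatrix} + \begin{bmatrix} -4 & -2 \\ -3 & -1 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} -3 & 1 \\ 1 & -3 \\ 1 & 8 \end{bmatrix}.$$

Definition 9-6 (matrix multiplication) The two matrices $\mathbf{A}$ and $\mathbf{B}$ with general elements $a_{ij}$ and $b_{ij}$ are said to be conformable for matrix multiplication provided that the number of columns in $\mathbf{A}$ equals the number of rows in $\mathbf{B}$.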
If $\mathbf{A}$ is of order $(m \times n)$ and $\mathbf{B}$ is of order $(n \times r)$, then the matrix product $\mathbf{AB}$ is the matrix $\mathbf{C}$ of order $(m \times r)$ with elements $c_{ij}$, where

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

The number $c_{ij}$ is called the inner product of the $i$th row of $\mathbf{A}$ with the $j$th column of $\mathbf{B}$.

Example 9-5 Determine $\mathbf{A} + \mathbf{BC}$, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 1 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{C} = \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 2 \end{bmatrix}.$$

Solution Matrix $\mathbf{B}$ is of order $(2 \times 3)$ and matrix $\mathbf{C}$ is of order $(3 \times 2)$, showing that $\mathbf{B}$ and $\mathbf{C}$ are conformable for multiplication. We have

$$\mathbf{BC} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 8 \\ 7 & 10 \end{bmatrix}$$

and so

$$\mathbf{A} + \mathbf{BC} = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} + \begin{bmatrix} 7 & 8 \\ 7 & 10 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 9 & 13 \end{bmatrix}.$$

On account of the fact that matrix multiplication is not normally commutative, it is important to use a terminology that distinguishes between matrix multipliers that appear on the left or the right in a matrix product. This is achieved by adopting the convention that when matrix $\mathbf{B}$ is multiplied by matrix $\mathbf{A}$ from the left to form the product $\mathbf{AB}$, we shall say that $\mathbf{B}$ is pre-multiplied by $\mathbf{A}$. Conversely, when the matrix $\mathbf{B}$ is multiplied by $\mathbf{A}$ from the right to form the product $\mathbf{BA}$, we shall say that $\mathbf{B}$ is post-multiplied by $\mathbf{A}$.

The most important results concerning matrix multiplication are contained in the following theorem, which asserts that matrix multiplication is distributive with respect to addition and that it is also associative.

Theorem 9-2 (matrix multiplication is distributive and associative) If matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ are conformable for multiplication, then:

(a) matrix multiplication is distributive with respect to addition, so that $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$;

(b) matrix multiplication is associative, so that $\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}$.

Proof To establish result (a) let $\mathbf{B}$ and $\mathbf{C}$ be of order $(m \times n)$, and denote their general elements by $b_{ij}$ and $c_{ij}$, respectively, so that the general element of $\mathbf{B} + \mathbf{C}$ is $b_{ij} + c_{ij}$.
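Then if $\mathbf{A}$ is of order $(r \times m)$ with general element $a_{ij}$, and $d_{ij}$ is the general element of $\mathbf{D} = \mathbf{A}(\mathbf{B} + \mathbf{C})$ which is of order $(r \times n)$, we have from Definition 9-6 that

$$d_{ij} = a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}).$$

Performing the indicated multiplications and re-grouping we have

$$d_{ij} = (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}).$$

However, from Definition 9-6 this is seen to be equivalent to

$$\mathbf{D} = \mathbf{AB} + \mathbf{AC},$$

which was to be proved.

Result (b) may be established in similar fashion, and to achieve this we assume $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$ to be respectively of order $(p \times q)$, $(q \times m)$, and $(m \times n)$ with general elements $a_{ij}$, $b_{ij}$, and $c_{ij}$. From Definition 9-6 we know that the general element occurring in row $i$, column $j$ of the product $\mathbf{BC}$ has the form

$$b_{i1}c_{1j} + b_{i2}c_{2j} + \cdots + b_{im}c_{mj},$$

so that the general element $d_{ij}$ occurring in row $i$, column $j$ of the product $\mathbf{D} = \mathbf{A}(\mathbf{BC})$ which is of order $(p \times n)$ must have the form

$$\begin{aligned} d_{ij} = {} & a_{i1}(b_{11}c_{1j} + b_{12}c_{2j} + \cdots + b_{1m}c_{mj}) \\ & + a_{i2}(b_{21}c_{1j} + b_{22}c_{2j} + \cdots + b_{2m}c_{mj}) + \cdots \\ & + a_{iq}(b_{q1}c_{1j} + b_{q2}c_{2j} + \cdots + b_{qm}c_{mj}). \end{aligned}$$

Re-grouping of the terms then gives

$$\begin{aligned} d_{ij} = {} & (a_{i1}b_{11} + a_{i2}b_{21} + \cdots + a_{iq}b_{q1})c_{1j} \\ & + (a_{i1}b_{12} + a_{i2}b_{22} + \cdots + a_{iq}b_{q2})c_{2j} + \cdots \\ & + (a_{i1}b_{1m} + a_{i2}b_{2m} + \cdots + a_{iq}b_{qm})c_{mj}. \end{aligned}$$

Appealing once more to Definition 9-6 we find that this is equivalent to

$$\mathbf{D} = (\mathbf{AB})\mathbf{C},$$

which was to be proved.

Example 9-6 If

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}, \qquad \mathbf{C} = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}$$

verify that (a) $\mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{AB} + \mathbf{AC}$, (b) $\mathbf{A}(\mathbf{BC}) = (\mathbf{AB})\mathbf{C}$.

Solution (a) We have

$$\mathbf{B} + \mathbf{C} = \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix}$$

so that

$$\mathbf{A}(\mathbf{B} + \mathbf{C}) = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 7 & 10 \end{bmatrix};$$

whereas $\mathbf{AB} = \begin{bmatrix} -1 & 7 \end{bmatrix}$ and $\mathbf{AC} = \begin{bmatrix} 8 & 3 \end{bmatrix}$, so that $\mathbf{AB} + \mathbf{AC} = \begin{bmatrix} 7 & 10 \end{bmatrix}$.

(b) We have

$$\mathbf{BC} = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 4 \\ 4 & 1 \end{bmatrix}$$

so that

$$\mathbf{A}(\mathbf{BC}) = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 11 & 4 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 19 & 6 \end{bmatrix};$$

whereas

$$\mathbf{AB} = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 7 \end{bmatrix},$$

whence

$$(\mathbf{AB})\mathbf{C} = \begin{bmatrix} -1 & 7 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 19 & 6 \end{bmatrix}.$$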
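The two parts of Theorem 9-2 can also be checked numerically. The Python sketch below is an illustration only: the helper names `mat_add` and `mat_mul` and the list-of-rows representation are assumptions of this sketch, and the matrices are those of Example 9-6. A numerical check of this kind confirms the results for particular matrices but is of course no substitute for the proof.

```python
def mat_add(a, b):
    """Add two matrices of the same order, element by element."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("matrices are not conformable for addition")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_mul(a, b):
    """Multiply two matrices; entry (i, j) is row i of a into column j of b."""
    if len(a[0]) != len(b):
        raise ValueError("matrices are not conformable for multiplication")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[1, 2]]                 # order (1 x 2)
B = [[1, 3], [-1, 2]]        # order (2 x 2)
C = [[2, 1], [3, 1]]         # order (2 x 2)

# (a) Distributive law: A(B + C) = AB + AC.
lhs_a = mat_mul(A, mat_add(B, C))
rhs_a = mat_add(mat_mul(A, B), mat_mul(A, C))
print(lhs_a, rhs_a)          # [[7, 10]] [[7, 10]]

# (b) Associative law: A(BC) = (AB)C.
lhs_b = mat_mul(A, mat_mul(B, C))
rhs_b = mat_mul(mat_mul(A, B), C)
print(lhs_b, rhs_b)          # [[19, 6]] [[19, 6]]

# Multiplication is not commutative in general: BC and CB differ here.
print(mat_mul(B, C))         # [[11, 4], [4, 1]]
print(mat_mul(C, B))         # [[1, 8], [2, 11]]
```

The last two lines show why the order of the factors in each product must be preserved when the laws are applied.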
An important matrix operation involves the interchange of rows and columns of a matrix, thereby changing a matrix of order $(m \times n)$ into one of order $(n \times m)$. Thus a row vector is changed into a column vector and a matrix of order $(3 \times 2)$ is changed into a matrix of order $(2 \times 3)$. This operation is called the operation of transposition and is denoted by the addition of a prime to the matrix in question.

Definition 9-7 (transposition operation) If $\mathbf{A}$ is a matrix of order $(m \times n)$, then its transpose $\mathbf{A}'$ is the matrix of order $(n \times m)$ which is derived from $\mathbf{A}$ by the interchange of rows and columns. Symbolically, if $a_{ij}$ is the element in the $i$th row and $j$th column of $\mathbf{A}$, then $a_{ji}$ is the element in the corresponding position in $\mathbf{A}'$.

Example 9-7 Find $\mathbf{A}'$ and $(\mathbf{A}')'$, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 4 & 7 & 3 \\ 2 & -1 & 4 & -1 \end{bmatrix}.$$

Solution Writing the first row in place of the first column and the second row in place of the second column, as is required by Definition 9-7, we find that

$$\mathbf{A}' = \begin{bmatrix} 1 & 2 \\ 4 & -1 \\ 7 & 4 \\ 3 & -1 \end{bmatrix}.$$

The same argument shows that

$$(\mathbf{A}')' = \begin{bmatrix} 1 & 4 & 7 & 3 \\ 2 & -1 & 4 & -1 \end{bmatrix}.$$

It is obvious from the definition of the transpose operation that $(\mathbf{A}')' = \mathbf{A}$, as was indeed illustrated in the last example. It is also obvious from Definitions 9-3 and 9-5 that if $\mathbf{A}$ and $\mathbf{B}$ are conformable for addition, then

$$(\mathbf{A} \pm \mathbf{B})' = \mathbf{A}' \pm \mathbf{B}'. \tag{9-16}$$

Now if $\mathbf{A}$ is of order $(m \times n)$ and $\mathbf{B}$ is of order $(n \times r)$, and the general matrix elements are $a_{ij}$ and $b_{ij}$, respectively, the element $c_{ij}$ in the $i$th row and $j$th column of the matrix product $\mathbf{C} = \mathbf{AB}$ is

$$c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

By definition, this is the element that will appear in the $j$th row and $i$th column of $(\mathbf{AB})'$.

Applying the transpose operation separately to $\mathbf{A}$ and $\mathbf{B}$ we find that $\mathbf{A}'$ is of order $(n \times m)$ and $\mathbf{B}'$ is of order $(r \times n)$, so that only the matrix product $\mathbf{B}'\mathbf{A}'$ is conformable.
Now the elements of the $j$th row of $\mathbf{B}'$ are the elements of the $j$th column of $\mathbf{B}$, and the elements of the $i$th column of $\mathbf{A}'$ are the elements of the $i$th row of $\mathbf{A}$, so that the element $d_{ji}$ in the $j$th row and $i$th column of the product $\mathbf{D} = \mathbf{B}'\mathbf{A}'$ must be

$$d_{ji} = b_{1j}a_{i1} + b_{2j}a_{i2} + \cdots + b_{nj}a_{in}$$

or, equivalently,

$$d_{ji} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$

However, equating elements in the $j$th row and $i$th column of $(\mathbf{AB})'$ and $\mathbf{B}'\mathbf{A}'$ we find that $c_{ij} = d_{ji}$, and so

$$(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'. \tag{9-17}$$

We summarize these results into a final theorem.

Theorem 9-3 (properties of transposition operation) If $\mathbf{A}$ and $\mathbf{B}$ are conformable for addition or multiplication, as required, then:
(a) $(\mathbf{A}')' = \mathbf{A}$ (transposition is reflexive);
(b) $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$;
(c) $(\mathbf{A} - \mathbf{B})' = \mathbf{A}' - \mathbf{B}'$;
(d) $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$.

Example 9-8 Verify that $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$, given that:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix}.$$

Solution We have

$$\mathbf{AB} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 2 \\ 16 & 2 \end{bmatrix}, \quad \text{so that} \quad (\mathbf{AB})' = \begin{bmatrix} 11 & 16 \\ 2 & 2 \end{bmatrix}.$$

However,

$$\mathbf{B}'\mathbf{A}' = \begin{bmatrix} 2 & 3 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 11 & 16 \\ 2 & 2 \end{bmatrix},$$

which is equal to $(\mathbf{AB})'$.

9-3 Determinants

The notion of a determinant, when first introduced in Chapter 4, was that of a single number associated with a square array of numbers. In its subsequent application in that chapter it was used in a subsidiary role to simplify the manipulation of the vector product, and in that capacity it gave rise to a vector. The determinant made yet another appearance in Chapter 5 when, in connection with the change of variable in partial differentiation, it contained functions as elements, and was called a Jacobian. In this role it is often called a functional determinant, and it gives rise to a function that is closely related to the one-to-one nature of the change of variables involved.

These are but two of the situations in which determinants occur in different branches of mathematics, and it is the object of this section to examine some of the most important algebraic properties of determinants.
Our results will only be proved for determinants of order 3 but they are, in fact, all true for determinants of any order. We begin by rewriting Definition 4-16 using the matrix element notation as follows:

Definition 9-8 (third order determinant) Let $\mathbf{A}$ be the square matrix of order $(3 \times 3)$

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Then the expression

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

is called the third order determinant associated with the square matrix $\mathbf{A}$, and it is defined to be the number

$$|\mathbf{A}| = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix},$$

where for any numbers $a$, $b$, $c$, and $d$,

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

The notation $\det \mathbf{A}$ is also frequently used in place of $|\mathbf{A}|$ to signify the determinant of $\mathbf{A}$.

This definition has a number of consequences of considerable value in simplifying the manipulation of determinants. Let us confine attention to the third order determinant, which is typical of all orders of determinant, and expand the last line of Definition 9-8. We have

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}, \tag{9-18}$$

showing that one, and only one, element of each row and each column of the determinant appears in each of the products on the right-hand side defining $|\mathbf{A}|$.

Hence, if any row or column of a determinant is multiplied by a factor $\lambda$, then the value of the determinant is multiplied by $\lambda$, since a factor $\lambda$ will appear in each product on the right-hand side of Eqn (9-18). Conversely, if any row or column of a determinant is divided by a factor $\lambda$, then the value of the determinant is divided by $\lambda$. It is also obvious from Eqn (9-18) that $|\mathbf{A}| = 0$ if all the elements of a row or column of $|\mathbf{A}|$ are zero, or if all the corresponding elements of two rows or columns of $|\mathbf{A}|$ are equal.

Suppose, for example, that $\lambda = 3$ and

$$|\mathbf{A}| = \begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 4 & 1 & 2 \end{vmatrix}.$$

Then it is easily shown that $|\mathbf{A}| = -5$, so that $3|\mathbf{A}| = -15$.
Now this result could have been obtained equally well by using the above argument and multiplying any row or any column of $|\mathbf{A}|$ by 3. If the first row of $|\mathbf{A}|$ is multiplied by 3 we have

$$3|\mathbf{A}| = \begin{vmatrix} 3 & 6 & 9 \\ 2 & 1 & 1 \\ 4 & 1 & 2 \end{vmatrix} = -15$$

or, alternatively, if the third column is multiplied by 3 we have

$$3|\mathbf{A}| = \begin{vmatrix} 1 & 2 & 9 \\ 2 & 1 & 3 \\ 4 & 1 & 6 \end{vmatrix} = -15.$$

It is readily verified from Eqn (9-18) that interchanging any two rows or columns of $|\mathbf{A}|$ changes its sign. Thus we have

$$\begin{vmatrix} 1 & 4 & 3 \\ 2 & 1 & 4 \\ 9 & 4 & -6 \end{vmatrix} = - \begin{vmatrix} 1 & 3 & 4 \\ 2 & 4 & 1 \\ 9 & -6 & 4 \end{vmatrix},$$

in which the determinant on the left has been obtained from the one on the right by interchanging the second and third columns.

A particularly simple case arises when $|\mathbf{A}|$ is the determinant associated with a diagonal matrix $\mathbf{A}$, for then all off-diagonal elements are automatically zero. This implies that Eqn (9-18) reduces to $|\mathbf{A}| = a_{11}a_{22}a_{33}$, which is just the product of the elements of the principal diagonal. Thus if

$$|\mathbf{A}| = \begin{vmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 4 \end{vmatrix},$$

then $|\mathbf{A}| = (3)(-2)(4) = -24$.

Another useful result is that the value of a determinant is unchanged when elements of a row (or column) have added to them some multiple of the corresponding elements of some other row (or column). We prove this result by direct expansion in the following typical case. Consider the determinant $|\mathbf{D}|$ obtained from $|\mathbf{A}|$ by adding to the elements of column 3 of $|\mathbf{A}|$, $\lambda$ times the corresponding elements in column 2 of $|\mathbf{A}|$ to obtain:

$$|\mathbf{D}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} + \lambda a_{12} \\ a_{21} & a_{22} & a_{23} + \lambda a_{22} \\ a_{31} & a_{32} & a_{33} + \lambda a_{32} \end{vmatrix}.$$

Then at once Definition 9-8 asserts that

$$\begin{aligned} |\mathbf{D}| = {} & a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} + a_{11} \begin{vmatrix} a_{22} & \lambda a_{22} \\ a_{32} & \lambda a_{32} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} \\ & - a_{12} \begin{vmatrix} a_{21} & \lambda a_{22} \\ a_{31} & \lambda a_{32} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} + \lambda a_{12} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \end{aligned}$$

Now the second term on the right-hand side is zero, whilst the fourth and last terms cancel leaving only three remaining terms.
These are seen to comprise the definition of $|\mathbf{A}|$, so that we have proved that $|\mathbf{D}| = |\mathbf{A}|$ or, in symbols, that

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} + \lambda a_{12} \\ a_{21} & a_{22} & a_{23} + \lambda a_{22} \\ a_{31} & a_{32} & a_{33} + \lambda a_{32} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

A similar result would have been obtained had different columns been used or, indeed, had rows been used instead of columns.

An obvious implication of this result is that if a row (or column) of a determinant is expressible as the sum of multiples of other rows (or columns) of the determinant, then the value of the determinant must be zero. This is so because by subtraction of this sum of multiples of other rows (or columns) from the row (or column) in question, it is possible to produce a row (or column) containing only zero elements.

Let us illustrate how a determinant may be simplified by means of this result. Consider the determinant

$$|\mathbf{A}| = \begin{vmatrix} 7 & 18 & 8 \\ 1 & 5 & 7 \\ 3 & 9 & 4 \end{vmatrix}.$$

Subtracting twice the third row from the first row we find

$$|\mathbf{A}| = \begin{vmatrix} 1 & 0 & 0 \\ 1 & 5 & 7 \\ 3 & 9 & 4 \end{vmatrix},$$

whence $|\mathbf{A}| = (5)(4) - (7)(9) = -43$.

Let us summarize our findings in the form of a theorem.

Theorem 9-4 (properties of determinants)
(a) A determinant in which all the elements of a row or column are zero, itself has the value zero;
(b) A determinant in which all corresponding elements in two rows (or columns) are equal has the value zero;
(c) If the elements of a row (or column) of a determinant are multiplied by a factor $\lambda$, then the value of the determinant is multiplied by $\lambda$;
(d) The value of a determinant associated with a diagonal matrix is equal to the product of the elements on the principal diagonal;
(e) The value of a determinant is unaltered by adding to the elements of any row (or column), a constant multiple of the corresponding elements of any other row (or column);
(f) If a row (or column) of a determinant is expressible as the sum of multiples of other rows (or columns) of the determinant, then its value is zero.
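Several of these properties are easy to confirm by direct computation. The sketch below, in plain Python, evaluates a third order determinant exactly as in Definition 9-8 and applies it to the determinant with rows (1, 2, 3), (2, 1, 1), (4, 1, 2) used in the scaling illustration above; the function names `det2` and `det3` are illustrative and not part of the text.

```python
def det2(a, b, c, d):
    """Second order determinant | a b ; c d | = ad - bc."""
    return a * d - b * c


def det3(M):
    """Third order determinant expanded by the first row, as in Definition 9-8."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = M
    return (a11 * det2(a22, a23, a32, a33)
            - a12 * det2(a21, a23, a31, a33)
            + a13 * det2(a21, a22, a31, a32))


A = [[1, 2, 3], [2, 1, 1], [4, 1, 2]]

base = det3(A)
# Property (c): multiply the first row by 3.
scaled = det3([[3 * x for x in A[0]], A[1], A[2]])
# Interchanging two rows changes the sign.
swapped = det3([A[1], A[0], A[2]])
# Property (e): add twice row 2 to row 1.
sheared = det3([[x + 2 * y for x, y in zip(A[0], A[1])], A[1], A[2]])
# Property (f): replace row 3 by row 1 + row 2.
dependent = det3([A[0], A[1], [x + y for x, y in zip(A[0], A[1])]])
# Property (d): diagonal matrix with diagonal 3, -2, 4.
diagonal = det3([[3, 0, 0], [0, -2, 0], [0, 0, 4]])

print(base, scaled, swapped, sheared, dependent, diagonal)
```

Integer arithmetic is used throughout so that the comparisons are exact; with floating-point entries a small tolerance would be needed when testing for a zero determinant.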
Higher order determinants can be defined with exactly similar properties to those enumerated in the theorem above. Thus the determinant $|\mathbf{A}|$ of order $n$ associated with the square matrix $\mathbf{A}$ of order $(n \times n)$ has $n!$ terms in its expansion, each of which contains one, and only one, element from each row and column of $\mathbf{A}$.

Definition 9-9 (fourth order determinant) If $\mathbf{A}$ is the square matrix of order $(4 \times 4)$

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix},$$

then the expression

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix}$$

is called the fourth order determinant associated with the square matrix $\mathbf{A}$, and it is defined to be the number

$$|\mathbf{A}| = a_{11} \begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} & a_{24} \\ a_{31} & a_{32} & a_{34} \\ a_{41} & a_{42} & a_{44} \end{vmatrix} - a_{14} \begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{vmatrix}.$$

An inductive argument applied to Definitions 9-8 and 9-9 shows one way in which higher order determinants may be defined, but clearly our notation needs some simplification to avoid unwieldy expressions of the type given above. This is achieved by the introduction of the minor and the cofactor of an element of a square matrix.

Definition 9-10 (minors and cofactors) Let $\mathbf{A}$ be a square matrix of order $(n \times n)$ with general element $a_{ij}$, and let $|\mathbf{A}|$ be the determinant of order $n$ associated with $\mathbf{A}$. Denote by $M_{ij}$ the determinant of order $(n - 1)$ associated with the matrix of order $(n - 1) \times (n - 1)$ derived from $\mathbf{A}$ by the deletion of row $i$ and column $j$. Then $M_{ij}$ is called the minor of the element $a_{ij}$ of $\mathbf{A}$, and $A_{ij} = (-1)^{i+j}M_{ij}$ is called the cofactor of the element $a_{ij}$ of $\mathbf{A}$.

Example 9-9 Find the minors and cofactors of the matrix

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 4 \\ 1 & 2 & 1 \end{bmatrix}.$$

Solution The minor $M_{11}$ is derived from $\mathbf{A}$ by deleting row 1 and column 1 and equating $M_{11}$ to the determinant formed by the remaining elements.
That is,

$$M_{11} = \begin{vmatrix} 1 & 4 \\ 2 & 1 \end{vmatrix} = -7.$$

Similarly, minor $M_{12}$ is derived from $\mathbf{A}$ by deleting row 1 and column 2 and equating $M_{12}$ to the determinant formed by the remaining elements. That is,

$$M_{12} = \begin{vmatrix} 2 & 4 \\ 1 & 1 \end{vmatrix} = -2.$$

Identical reasoning then shows that $M_{13} = 3$, $M_{21} = -6$, $M_{22} = -2$, $M_{23} = 2$, $M_{31} = -3$, $M_{32} = -2$, and $M_{33} = 1$. As the cofactors $A_{ij} = (-1)^{i+j}M_{ij}$, it follows that $A_{11} = -7$, $A_{12} = 2$, $A_{13} = 3$, $A_{21} = 6$, $A_{22} = -2$, $A_{23} = -2$, $A_{31} = -3$, $A_{32} = 2$, and $A_{33} = 1$.

If $\mathbf{A}$ is a square matrix with general element $a_{ij}$ and corresponding cofactor $A_{ij}$, it is easily seen that:
(a) if $\mathbf{A}$ is of order $(2 \times 2)$, then $|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12}$,
(b) if $\mathbf{A}$ is of order $(3 \times 3)$, then $|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13}$,
(c) if $\mathbf{A}$ is of order $(4 \times 4)$, then $|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} + a_{14}A_{14}$.

This suggests that if $\mathbf{A}$ is of order $(n \times n)$, then for $|\mathbf{A}|$ we could adopt the definition

$$|\mathbf{A}| = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n}. \tag{9-19}$$

This is a true statement and could be accepted as a definition, but it is not the most general one which may be adopted. To see this we return to Eqn (9-18) and re-arrange the terms on the right-hand side to give

$$|\mathbf{A}| = a_{31}(a_{12}a_{23} - a_{13}a_{22}) - a_{32}(a_{11}a_{23} - a_{13}a_{21}) + a_{33}(a_{11}a_{22} - a_{12}a_{21}).$$

Hence, working backwards, we have

$$|\mathbf{A}| = a_{31} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} - a_{32} \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} + a_{33} \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix},$$

thereby showing that it is also true that

$$|\mathbf{A}| = a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33}. \tag{9-20}$$

We now have two equivalent but different looking expressions for $|\mathbf{A}|$, either of which could be taken as the definition of $|\mathbf{A}|$. The expression in (b) above involves the elements and cofactors of the first row of $\mathbf{A}$ and the expression in Eqn (9-20) involves the elements and cofactors of the third row of $\mathbf{A}$. A repetition of this argument involving other rearrangements of the terms of Eqn (9-18) shows that $|\mathbf{A}|$ may be evaluated as the sum of the products of the elements and their cofactors of any row or column of $\mathbf{A}$.
This very valuable and general result is known as the Laplace expansion theorem, and it is true for determinants of any order though we have only proved it for a third order determinant. Let us state this result formally as it would apply to a determinant of order $n$.

Theorem 9-5 (Laplace expansion theorem) The determinant $|\mathbf{A}|$ associated with any $(n \times n)$ square matrix $\mathbf{A}$ is obtained by summing the products of the elements and their cofactors in any row or column of $\mathbf{A}$. If $\mathbf{A}$ has the general element $a_{ij}$ and the corresponding cofactor is $A_{ij}$, then this result is equivalent to:

Expansion by elements of a row

$$|\mathbf{A}| = \sum_{j=1}^{n} a_{ij}A_{ij} \quad \text{for } i = 1, 2, \ldots, n;$$

Expansion by elements of a column

$$|\mathbf{A}| = \sum_{i=1}^{n} a_{ij}A_{ij} \quad \text{for } j = 1, 2, \ldots, n.$$

Example 9-10 Evaluate the determinant

$$|\mathbf{A}| = \begin{vmatrix} 1 & 4 & 2 \\ 3 & -2 & 1 \\ 1 & 5 & 2 \end{vmatrix}$$

by expanding it (a) in terms of the elements of row 2, and (b) in terms of the elements of column 3.

Solution

$$\text{(a)} \quad |\mathbf{A}| = -3 \begin{vmatrix} 4 & 2 \\ 5 & 2 \end{vmatrix} + (-2) \begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix} - 1 \begin{vmatrix} 1 & 4 \\ 1 & 5 \end{vmatrix} = 5;$$

$$\text{(b)} \quad |\mathbf{A}| = 2 \begin{vmatrix} 3 & -2 \\ 1 & 5 \end{vmatrix} - 1 \begin{vmatrix} 1 & 4 \\ 1 & 5 \end{vmatrix} + 2 \begin{vmatrix} 1 & 4 \\ 3 & -2 \end{vmatrix} = 5.$$

An important extension of Theorem 9-5 asserts that the sum of the products of the elements of any row (or column) of a square matrix $\mathbf{A}$ with the cofactors corresponding to the elements of a different row (or column) is zero. This is easily proved as follows.

Let $\mathbf{A}$ be a matrix of order $(n \times n)$, and let $\mathbf{B}$ be obtained from $\mathbf{A}$ by replacing row $q$ of $\mathbf{A}$ by row $p$. Then $\mathbf{B}$ has the elements of rows $p$ and $q$ equal, so that by Theorem 9-4 (b) it follows that $|\mathbf{B}| = 0$. The cofactors of the elements of row $q$ do not involve row $q$ itself, so they are the same for $\mathbf{B}$ as for $\mathbf{A}$. Expanding $|\mathbf{B}|$ in terms of elements of row $q$ by Theorem 9-5 we then find

$$|\mathbf{B}| = a_{p1}A_{q1} + a_{p2}A_{q2} + \cdots + a_{pn}A_{qn} = 0,$$

which was to be proved. A similar argument establishes the corresponding result for columns and so we have proved our assertion.

Theorem 9-6 The sum of the products of the elements of any row (or column) of a square matrix $\mathbf{A}$ with the cofactors corresponding to the elements of a different row (or column) is zero.
Symbolically, if $a_{ij}$ is the general element of $\mathbf{A}$ and $A_{ij}$ is its cofactor, then:

Expansion by elements of a row

$$\sum_{i=1}^{n} a_{pi}A_{qi} = 0 \quad \text{if } p \neq q;$$

and

Expansion by elements of a column

$$\sum_{i=1}^{n} a_{ip}A_{iq} = 0 \quad \text{if } p \neq q.$$

Example 9-11 Verify that the sum of the products of the elements of column 1 and the corresponding cofactors of column 2 of the following matrix is zero:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 2 \\ 3 & 1 & 3 \end{bmatrix}.$$

Solution The elements of column 1 are $a_{11} = 1$, $a_{21} = 4$, $a_{31} = 3$. The cofactors corresponding to the elements of the second column are $A_{12} = -6$, $A_{22} = -3$, $A_{32} = 6$. Hence

$$a_{11}A_{12} + a_{21}A_{22} + a_{31}A_{32} = (1)(-6) + (4)(-3) + (3)(6) = 0.$$

9-4 Linear dependence and linear independence

We are now in a position to discuss the important idea of linear independence. This concept has already been used implicitly in Chapter 4 when the three mutually orthogonal unit vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ were introduced, comprising what in linear algebra is called a basis for the vector space. By this we mean that all other vectors are expressible in terms of the vectors comprising the basis through the operations of scaling and vector addition, but that no member of the basis itself is expressible in terms of the other members of the basis. Thus no choice of the scalars $\lambda$, $\mu$ can ever make the vectors $\mathbf{i}$ and $\lambda\mathbf{j} + \mu\mathbf{k}$ equal.

It is in this sense that the unit vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ comprising the basis for ordinary vector analysis are linearly independent, and obviously any other set of unit vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ which are not co-planar, and no two of which are parallel, would serve equally well as a basis for this space.

The same idea carries across to matrices when the term vector is interpreted to mean either a matrix row vector or a matrix column vector.
Thus three column vectors $\mathbf{C}_1$, $\mathbf{C}_2$, and $\mathbf{C}_3$ related by $\mathbf{C}_3 = \mathbf{C}_1 + 2\mathbf{C}_2$ are not linearly independent, whereas the three row vectors $\mathbf{R}_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$, $\mathbf{R}_2 = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$, and $\mathbf{R}_3 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}$ are obviously linearly independent, because no choice of the scalars $\lambda$, $\mu$ can ever make the vectors $\mathbf{R}_1$ and $\lambda\mathbf{R}_2 + \mu\mathbf{R}_3$ equal. It is these ideas that underlie the formulation of the following definition.

Definition 9-11 (linear dependence and linear independence) The set of $n$ matrix row or column vectors $\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_n$ which are conformable for addition will be said to be linearly dependent if there exist $n$ scalars $\alpha_1, \alpha_2, \ldots, \alpha_n$, not all zero, such that

$$\alpha_1\mathbf{V}_1 + \alpha_2\mathbf{V}_2 + \cdots + \alpha_n\mathbf{V}_n = \mathbf{0}.$$

When no such set of scalars exists, so that this relationship is only true when $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, then the $n$ matrix vectors $\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_n$ will be said to be linearly independent.

In the event that the $n$ matrix vectors in Definition 9-11 represent the rows or columns of a rectangular matrix $\mathbf{A}$, the linear dependence or independence of the vectors $\mathbf{V}_1, \mathbf{V}_2, \ldots, \mathbf{V}_n$ becomes a statement about the linear dependence or independence of the rows or columns of $\mathbf{A}$. In particular, if $\mathbf{A}$ is a square matrix, and linear dependence exists between its rows (or columns), then by definition it is possible to express at least one row (or column) of $\mathbf{A}$ as the sum of multiples of the other rows (or columns). Thus from Theorem 9-4 (f), we see that linear dependence amongst the rows or columns of a square matrix $\mathbf{A}$ implies the condition $|\mathbf{A}| = 0$. Similarly, if $|\mathbf{A}| \neq 0$ then the rows and columns of $\mathbf{A}$ cannot be linearly dependent.

Theorem 9-7 (test for linear independence) The rows and columns of a square matrix $\mathbf{A}$ are linearly independent if, and only if, $|\mathbf{A}| \neq 0$. Conversely, linear dependence is implied between rows or columns of a square matrix $\mathbf{A}$ if $|\mathbf{A}| = 0$.
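The test of Theorem 9-7 is simple to mechanize. The following is a minimal sketch in plain Python that evaluates a determinant of any order by the Laplace expansion along the first row (Theorem 9-5) and then tests for a non-zero value. The function names are illustrative, and the columns `C1` and `C2` are made-up values chosen only so that a third column $\mathbf{C}_3 = \mathbf{C}_1 + 2\mathbf{C}_2$ can be formed; they are not taken from the text.

```python
def minor(M, i, j):
    """Matrix left after deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))


def is_linearly_independent(M):
    """Theorem 9-7: rows and columns are independent exactly when |M| != 0."""
    return det(M) != 0


# The row vectors R1, R2, R3 form the unit matrix.
unit = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Hypothetical columns with C3 = C1 + 2*C2, assembled into a square matrix.
C1, C2 = [1, 0, 2], [0, 1, 1]
C3 = [a + 2 * b for a, b in zip(C1, C2)]
dependent = [list(row) for row in zip(C1, C2, C3)]

print(is_linearly_independent(unit))       # True
print(is_linearly_independent(dependent))  # False
```

The recursive expansion involves of the order of $n!$ products, so this sketch is intended only for small matrices with exact (integer or rational) entries.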
Example 9-12 Test the following matrices for linear independence between rows or columns:

$$\mathbf{A} = \begin{bmatrix} 1 & 4 & 3 \\ 2 & 10 & 7 \\ 4 & -6 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{B} = \begin{bmatrix} 1 & 1 & 0 \\ 3 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}.$$

Solution We shall apply Theorem 9-7 by examining $|\mathbf{A}|$ and $|\mathbf{B}|$. A simple calculation shows that $|\mathbf{A}| = 0$, so that linear dependence exists between either the rows or the columns of $\mathbf{A}$. In fact, denoting the columns of $\mathbf{A}$ by $\mathbf{C}_1$, $\mathbf{C}_2$, and $\mathbf{C}_3$, we have $\mathbf{C}_2 = 2(\mathbf{C}_3 - \mathbf{C}_1)$. As $|\mathbf{B}| = -3$ the rows and columns of $\mathbf{B}$ are linearly independent.

Let us now give consideration to any linear independence that may exist between the rows or columns of a rectangular matrix $\mathbf{A}$ of order $(m \times n)$. If $r$ rows (or columns) of $\mathbf{A}$ are linearly independent, where $r \leq \min(m, n)$, then Theorem 9-7 implies that there is at least one determinant of order $r$ that may be formed by taking these $r$ rows (or columns) which is non-zero, but that all determinants of order greater than $r$ must of necessity vanish. This number $r$ is called the rank of the matrix $\mathbf{A}$, and it represents the greatest number of linearly independent rows or columns existing in $\mathbf{A}$. If, for example, $\mathbf{A}$ is a square matrix of order $(n \times n)$ and $|\mathbf{A}| \neq 0$, this implies that the rank of $\mathbf{A}$ must be $n$.

Definition 9-12 (rank of a matrix) The rank $r$ of a matrix $\mathbf{A}$ is the greatest number of linearly independent rows or columns that exist in the matrix $\mathbf{A}$. Numerically, $r$ is equal to the order of the largest order non-vanishing determinant $|\mathbf{B}|$ associated with any square matrix $\mathbf{B}$ which can be constructed from $\mathbf{A}$ by combination of $r$ rows and $r$ columns.

Example 9-13 Find the rank of the following matrix:

$$\mathbf{A} = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ -1 & 1 & 1 & -1 & 1 \\ -3 & 0 & 1 & -1 & 0 \end{bmatrix}.$$

Solution The largest order of determinant that can be constructed in this case from the rows and columns of $\mathbf{A}$ is 3. As there is certainly one such determinant that is non-vanishing, namely the one associated with the first three columns of $\mathbf{A}$, the rank of $\mathbf{A}$ must be 3.
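The fact that other non-vanishing determinants of order three may be constructed from $\mathbf{A}$ is immaterial (e.g., take the last three columns).

9-5 Inverse and adjoint matrix

The operation of division is not defined for matrices, but a multiplicative inverse matrix denoted by $\mathbf{A}^{-1}$ can be defined for any square matrix $\mathbf{A}$ for which $|\mathbf{A}| \neq 0$. This multiplicative inverse $\mathbf{A}^{-1}$ is unique and has the property that

$$\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I},$$

where $\mathbf{I}$ is the unit matrix, and it is defined in terms of what is called the matrix adjoint to $\mathbf{A}$. The uniqueness follows from the fact that if $\mathbf{B}$ and $\mathbf{C}$ are each inverse to $\mathbf{A}$, then $\mathbf{B}(\mathbf{AC}) = (\mathbf{BA})\mathbf{C}$, so that $\mathbf{BI} = \mathbf{IC}$, or $\mathbf{B} = \mathbf{C}$.

Definition 9-13 (adjoint matrix) Let $\mathbf{A}$ be a square matrix, then the transpose of the matrix of cofactors of $\mathbf{A}$ is called the matrix adjoint to $\mathbf{A}$, and it is denoted by $\operatorname{adj} \mathbf{A}$. A square matrix and its adjoint are both of the same order.

Example 9-14 Find the matrix adjoint to:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 1 & 0 \\ 2 & 1 & 2 \end{bmatrix}.$$

Solution The cofactors $A_{ij}$ of $\mathbf{A}$ are: $A_{11} = 2$, $A_{12} = -6$, $A_{13} = 1$, $A_{21} = -3$, $A_{22} = 0$, $A_{23} = 3$, $A_{31} = -1$, $A_{32} = 3$, and $A_{33} = -5$. Hence the matrix of cofactors has the form

$$\begin{bmatrix} 2 & -6 & 1 \\ -3 & 0 & 3 \\ -1 & 3 & -5 \end{bmatrix},$$

so that its transpose, which by definition is $\operatorname{adj} \mathbf{A}$, is

$$\operatorname{adj} \mathbf{A} = \begin{bmatrix} 2 & -3 & -1 \\ -6 & 0 & 3 \\ 1 & 3 & -5 \end{bmatrix}.$$

Now from Theorems 9-5 and 9-6, we see that the effect of forming either the product $(\operatorname{adj} \mathbf{A})\mathbf{A}$ or the product $\mathbf{A}(\operatorname{adj} \mathbf{A})$ is to produce a diagonal matrix in which each element of the leading diagonal is $|\mathbf{A}|$. That is, we have shown that

$$(\operatorname{adj} \mathbf{A})\mathbf{A} = \mathbf{A}(\operatorname{adj} \mathbf{A}) = \begin{bmatrix} |\mathbf{A}| & 0 & \cdots & 0 \\ 0 & |\mathbf{A}| & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & |\mathbf{A}| \end{bmatrix}, \tag{9-21}$$

whence

$$(\operatorname{adj} \mathbf{A})\mathbf{A} = \mathbf{A}(\operatorname{adj} \mathbf{A}) = |\mathbf{A}|\,\mathbf{I}. \tag{9-22}$$

Thus, provided $|\mathbf{A}| \neq 0$, by writing

$$\mathbf{A}^{-1} = \frac{\operatorname{adj} \mathbf{A}}{|\mathbf{A}|}$$

we arrive at the result

$$\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}. \tag{9-23}$$

The matrix $\mathbf{A}^{-1}$ is called the matrix inverse to $\mathbf{A}$ and it is only defined for square matrices $\mathbf{A}$ for which $|\mathbf{A}| \neq 0$. A square matrix whose associated determinant is non-vanishing is called a non-singular matrix.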
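The relations (9-22) and (9-23) can be checked by direct computation. The sketch below, in plain Python, builds the adjoint as the transpose of the matrix of cofactors and forms the inverse with exact rational arithmetic from the standard `fractions` module. It takes the matrix of Example 9-14 to have rows (1, 2, 1), (3, 1, 0), (2, 1, 2), which is the matrix consistent with the cofactors listed there; the function names are illustrative only.

```python
from fractions import Fraction


def minor(M, i, j):
    """Matrix left after deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by Laplace expansion along the first row."""
    if not M:
        return 1  # convention for the empty determinant, used when M is 1 x 1
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))


def adjoint(M):
    """Transpose of the matrix of cofactors (Definition 9-13)."""
    n = len(M)
    cofactors = [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)]
                 for i in range(n)]
    return [list(column) for column in zip(*cofactors)]


def matmul(A, B):
    """Matrix product AB."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 1], [3, 1, 0], [2, 1, 2]]
adj_A = adjoint(A)
det_A = det(A)

# A (adj A) should equal |A| I, as in Eqn (9-22).
product = matmul(A, adj_A)

# Provided |A| != 0, the inverse is adj A divided by |A|.
inverse = [[Fraction(x, det_A) for x in row] for row in adj_A]

print(adj_A)
print(det_A)
print(product)
```

Forming `matmul(A, inverse)` and `matmul(inverse, A)` then returns the unit matrix in both cases, in agreement with Eqn (9-23).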
Although the inverse matrix is only defined for non-singular square matrices, the adjoint matrix is defined for any square matrix, irrespective of whether or not it is non-singular.

Definition 9-14 (inverse matrix) If $\mathbf{A}$ is a square matrix for which $|\mathbf{A}| \neq 0$, the matrix inverse to $\mathbf{A}$, which is denoted by $\mathbf{A}^{-1}$, is defined by the relationship

$$\mathbf{A}^{-1} = \frac{\operatorname{adj} \mathbf{A}}{|\mathbf{A}|}.$$

Example 9-15 Find the matrix inverse to the matrix $\mathbf{A}$ of Example 9-14 above.

Solution It is easily found from the cofactors already computed that $|\mathbf{A}| = -9$. This follows, for example, by expanding $|\mathbf{A}|$ in terms of elements of the first row to obtain $|\mathbf{A}| = (1)(2) + (2)(-6) + (1)(1) = -9$. Hence from Definition 9-14, we have

$$\mathbf{A}^{-1} = \frac{\operatorname{adj} \mathbf{A}}{|\mathbf{A}|} = \left(-\frac{1}{9}\right) \begin{bmatrix} 2 & -3 & -1 \\ -6 & 0 & 3 \\ 1 & 3 & -5 \end{bmatrix} = \begin{bmatrix} -2/9 & 1/3 & 1/9 \\ 2/3 & 0 & -1/3 \\ -1/9 & -1/3 & 5/9 \end{bmatrix}.$$

The steps in the determination of an inverse matrix are perhaps best remembered in the form of a rule.

Rule 2 (Determination of inverse matrix) To determine the matrix $\mathbf{A}^{-1}$ which is inverse to the square matrix $\mathbf{A}$ proceed as follows:
(a) Construct the matrix of cofactors of $\mathbf{A}$;
(b) Transpose the matrix of cofactors of $\mathbf{A}$ to obtain $\operatorname{adj} \mathbf{A}$;
(c) Calculate $|\mathbf{A}|$ and, if it is not zero, divide $\operatorname{adj} \mathbf{A}$ by $|\mathbf{A}|$ to obtain $\mathbf{A}^{-1}$;
(d) If $|\mathbf{A}| = 0$, then $\mathbf{A}^{-1}$ is not defined.

It is a trivial consequence of Definition 9-14 and the fact that for any square matrix $\mathbf{A}$, $|\mathbf{A}| = |\mathbf{A}'|$ (see Problem 9-34), that

$$(\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}. \tag{9-24}$$

Also, if $\mathbf{A}$ and $\mathbf{B}$ are non-singular matrices of the same order, then

$$(\mathbf{B}^{-1}\mathbf{A}^{-1})\mathbf{AB} = \mathbf{I} = \mathbf{AB}(\mathbf{B}^{-1}\mathbf{A}^{-1}),$$

showing that

$$(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}. \tag{9-25}$$

Accepting the result of Problem 9-35 as being valid for square matrices $\mathbf{A}$, $\mathbf{B}$ of arbitrary order $(n \times n)$, so that $|\mathbf{AB}| = |\mathbf{A}||\mathbf{B}|$, we are able to prove another useful result concerning the inverse matrix. If $|\mathbf{A}| \neq 0$, then $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$, showing that $|\mathbf{A}\mathbf{A}^{-1}| = 1$, or $|\mathbf{A}||\mathbf{A}^{-1}| = 1$. It follows from this that:

$$|\mathbf{A}| = 1/|\mathbf{A}^{-1}|.$$
(9-26)

One final result follows directly from the obvious fact that $(\mathbf{A}^{-1})^{-1}\mathbf{A}^{-1} = \mathbf{I}$, which is always true provided $|\mathbf{A}^{-1}| \neq 0$. If we post-multiply this result by $\mathbf{A}$ we find

$$(\mathbf{A}^{-1})^{-1}\mathbf{A}^{-1}\mathbf{A} = \mathbf{IA},$$

giving $(\mathbf{A}^{-1})^{-1}\mathbf{I} = \mathbf{A}$, whence

$$(\mathbf{A}^{-1})^{-1} = \mathbf{A}. \tag{9-27}$$

Theorem 9-8 (properties of inverse matrix) If $\mathbf{A}$ and $\mathbf{B}$ are non-singular square matrices of the same order, then:
(a) $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$;
(b) $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$;
(c) $(\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}$;
(d) $(\mathbf{A}^{-1})^{-1} = \mathbf{A}$;
(e) $|\mathbf{A}| = 1/|\mathbf{A}^{-1}|$.

Example 9-16 Verify that $(\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}$, given that

$$\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.$$

Solution We have

$$\mathbf{A}^{-1} = \begin{bmatrix} -2 & 3/2 \\ 1 & -1/2 \end{bmatrix}, \quad \text{so that} \quad (\mathbf{A}^{-1})' = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}.$$

However,

$$\mathbf{A}' = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \text{so that} \quad (\mathbf{A}')^{-1} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix},$$

confirming that $(\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}$.

9-6 Matrix functions of a single variable

All the matrix results that have been obtained so far are equally valid whether applied to matrices whose elements are numerical constants, or to matrices whose elements are functions of a single variable $t$. When the latter is the case it is convenient to copy the notation for a function used hitherto, and to represent the matrix by writing $\mathbf{A}(t)$. In many respects it is convenient to regard all matrices in this manner, since matrices with constant number elements correspond to the subset of all possible matrices $\mathbf{A}(t)$ in which all elements are constant functions.

When the elements of $\mathbf{A}(t)$ are all differentiable with respect to $t$ in some interval, it is reasonable to define a derivative of $\mathbf{A}(t)$ with respect to $t$, and for this purpose we shall work with the following definition.

Definition 9-15 (derivative of a matrix) Let $\mathbf{A}(t)$ be a matrix of order $(m \times n)$ whose elements $a_{ij}(t)$ are all differentiable functions of $t$ in some common interval $t_0 < t < t_1$. Then the derivative of $\mathbf{A}(t)$ with respect to $t$ in $t_0 < t < t_1$, written $d\mathbf{A}/dt$, is defined to be the matrix of order $(m \times n)$ with elements $da_{ij}/dt$. The matrix $\mathbf{A}(t)$ will be said to be differentiable in $t_0 < t < t_1$.
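Symbolically this result becomes:

$$\frac{d}{dt} \begin{bmatrix} a_{11}(t) & a_{12}(t) & \cdots & a_{1n}(t) \\ a_{21}(t) & a_{22}(t) & \cdots & a_{2n}(t) \\ \vdots & \vdots & & \vdots \\ a_{m1}(t) & a_{m2}(t) & \cdots & a_{mn}(t) \end{bmatrix} = \begin{bmatrix} \dfrac{da_{11}}{dt} & \dfrac{da_{12}}{dt} & \cdots & \dfrac{da_{1n}}{dt} \\ \dfrac{da_{21}}{dt} & \dfrac{da_{22}}{dt} & \cdots & \dfrac{da_{2n}}{dt} \\ \vdots & \vdots & & \vdots \\ \dfrac{da_{m1}}{dt} & \dfrac{da_{m2}}{dt} & \cdots & \dfrac{da_{mn}}{dt} \end{bmatrix}$$

for $t_0 < t < t_1$.

Example 9-17 Find $d\mathbf{A}/dt$ given that:

$$\mathbf{A}(t) = \begin{bmatrix} \cosh t & \sin t & \cosh 2t \\ \sinh t & \cos t & \sinh 2t \end{bmatrix}.$$

Solution From Definition 9-15 we have at once:

$$\frac{d\mathbf{A}}{dt} = \begin{bmatrix} \sinh t & \cos t & 2\sinh 2t \\ \cosh t & -\sin t & 2\cosh 2t \end{bmatrix}$$

for all $t$.

If $a_{ij}(t)$ and $b_{ij}(t)$ are differentiable functions in some common interval $t_0 < t < t_1$, then we know from the work of Chapter 5 that

$$\frac{d}{dt}(a_{ij} \pm b_{ij}) = \frac{da_{ij}}{dt} \pm \frac{db_{ij}}{dt},$$

and so, applying the rule for differentiating a product to each term,

$$\frac{d}{dt}(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}) = \left(\frac{da_{i1}}{dt}b_{1j} + \frac{da_{i2}}{dt}b_{2j} + \cdots + \frac{da_{in}}{dt}b_{nj}\right) + \left(a_{i1}\frac{db_{1j}}{dt} + a_{i2}\frac{db_{2j}}{dt} + \cdots + a_{in}\frac{db_{nj}}{dt}\right).$$

Consequently, it then follows directly from Definitions 9-3 to 9-6 that for suitably conformable matrices $\mathbf{A}$ and $\mathbf{B}$:

$$\frac{d}{dt}(\mathbf{A} \pm \mathbf{B}) = \frac{d\mathbf{A}}{dt} \pm \frac{d\mathbf{B}}{dt}; \tag{9-28}$$

$$\frac{d}{dt}(\lambda\mathbf{A}) = \lambda\frac{d\mathbf{A}}{dt}, \quad \text{for any constant scalar } \lambda; \tag{9-29}$$

and

$$\frac{d}{dt}(\mathbf{AB}) = \frac{d\mathbf{A}}{dt}\mathbf{B} + \mathbf{A}\frac{d\mathbf{B}}{dt}. \tag{9-30}$$

Notice that in general $d\mathbf{A}^2/dt \neq 2\mathbf{A}(d\mathbf{A}/dt)$, for setting $\mathbf{B} = \mathbf{A}$ in Eqn (9-30) yields

$$\frac{d\mathbf{A}^2}{dt} = \frac{d\mathbf{A}}{dt}\mathbf{A} + \mathbf{A}\frac{d\mathbf{A}}{dt}. \tag{9-31}$$

It also follows that if $\mathbf{K}$ is a constant matrix in the sense that its elements are constant functions of $t$, then

$$\frac{d\mathbf{K}}{dt} = \mathbf{0}. \tag{9-32}$$

Using the results of Theorems 9-3 (d) and 9-8 (b) together with Eqn (9-30), we can derive two useful results. The first result applies to any two matrices $\mathbf{A}$, $\mathbf{B}$ which are conformable for multiplication and is

$$\frac{d}{dt}(\mathbf{AB})' = \frac{d}{dt}(\mathbf{B}'\mathbf{A}') = \frac{d\mathbf{B}'}{dt}\mathbf{A}' + \mathbf{B}'\frac{d\mathbf{A}'}{dt}; \tag{9-33}$$

the second result applies to any two non-singular square matrices which are conformable for multiplication and is

$$\frac{d}{dt}(\mathbf{AB})^{-1} = \frac{d}{dt}(\mathbf{B}^{-1}\mathbf{A}^{-1}) = \frac{d\mathbf{B}^{-1}}{dt}\mathbf{A}^{-1} + \mathbf{B}^{-1}\frac{d\mathbf{A}^{-1}}{dt}. \tag{9-34}$$

We now summarize these results in the form of a general theorem.

Theorem 9-9 (properties of matrix differentiation) Let $\mathbf{A}(t)$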
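and $\mathbf{B}(t)$ be suitably conformable matrices which are differentiable in some common interval $t_0 < t < t_1$, and let $\mathbf{K}$ be a constant matrix and $\lambda$ a scalar. Then throughout the interval $t_0 < t < t_1$:
(a) $\dfrac{d}{dt}(\mathbf{A} + \mathbf{B}) = \dfrac{d\mathbf{A}}{dt} + \dfrac{d\mathbf{B}}{dt}$;
(b) $\dfrac{d}{dt}(\mathbf{A} - \mathbf{B}) = \dfrac{d\mathbf{A}}{dt} - \dfrac{d\mathbf{B}}{dt}$;
(c) $\dfrac{d}{dt}(\lambda\mathbf{A}) = \lambda\dfrac{d\mathbf{A}}{dt}$ ($\lambda$ a constant scalar);
(d) $\dfrac{d}{dt}(\mathbf{AB}) = \dfrac{d\mathbf{A}}{dt}\mathbf{B} + \mathbf{A}\dfrac{d\mathbf{B}}{dt}$;
(e) $\dfrac{d\mathbf{K}}{dt} = \mathbf{0}$ ($\mathbf{K}$ a constant matrix);
(f) $\dfrac{d}{dt}(\mathbf{AB})' = \dfrac{d\mathbf{B}'}{dt}\mathbf{A}' + \mathbf{B}'\dfrac{d\mathbf{A}'}{dt}$;
(g) $\dfrac{d}{dt}(\mathbf{AB})^{-1} = \dfrac{d\mathbf{B}^{-1}}{dt}\mathbf{A}^{-1} + \mathbf{B}^{-1}\dfrac{d\mathbf{A}^{-1}}{dt}$, where $\mathbf{A}$ and $\mathbf{B}$ are non-singular matrices.

Example 9-18 Verify Eqn (9-33) for the matrices

$$\mathbf{A}(t) = \begin{bmatrix} t & 1 \\ -1 & t^2 \end{bmatrix} \quad \text{and} \quad \mathbf{B}(t) = \begin{bmatrix} 2 & t^2 \\ t^3 & 1 \end{bmatrix}.$$

Solution We have

$$\mathbf{AB} = \begin{bmatrix} 2t + t^3 & 1 + t^3 \\ t^5 - 2 & 0 \end{bmatrix}, \quad \text{so that} \quad (\mathbf{AB})' = \begin{bmatrix} 2t + t^3 & t^5 - 2 \\ 1 + t^3 & 0 \end{bmatrix},$$

and thus

$$\frac{d}{dt}(\mathbf{AB})' = \begin{bmatrix} 2 + 3t^2 & 5t^4 \\ 3t^2 & 0 \end{bmatrix}.$$

Now,

$$\mathbf{A}'(t) = \begin{bmatrix} t & -1 \\ 1 & t^2 \end{bmatrix} \quad \text{and} \quad \mathbf{B}'(t) = \begin{bmatrix} 2 & t^3 \\ t^2 & 1 \end{bmatrix},$$

so that

$$\frac{d\mathbf{A}'}{dt} = \begin{bmatrix} 1 & 0 \\ 0 & 2t \end{bmatrix} \quad \text{and} \quad \frac{d\mathbf{B}'}{dt} = \begin{bmatrix} 0 & 3t^2 \\ 2t & 0 \end{bmatrix}.$$

Using these results we have

$$\begin{aligned} \frac{d\mathbf{B}'}{dt}\mathbf{A}' + \mathbf{B}'\frac{d\mathbf{A}'}{dt} &= \begin{bmatrix} 0 & 3t^2 \\ 2t & 0 \end{bmatrix} \begin{bmatrix} t & -1 \\ 1 & t^2 \end{bmatrix} + \begin{bmatrix} 2 & t^3 \\ t^2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2t \end{bmatrix} \\ &= \begin{bmatrix} 3t^2 & 3t^4 \\ 2t^2 & -2t \end{bmatrix} + \begin{bmatrix} 2 & 2t^4 \\ t^2 & 2t \end{bmatrix} = \begin{bmatrix} 2 + 3t^2 & 5t^4 \\ 3t^2 & 0 \end{bmatrix} = \frac{d}{dt}(\mathbf{AB})'. \end{aligned}$$

9-7 Solution of systems of linear equations

A system of $m$ linear inhomogeneous equations in the $n$ variables $x_1, x_2, \ldots, x_n$ has the general form

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= k_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= k_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= k_m, \end{aligned} \tag{9-35}$$

where the term inhomogeneous refers to the fact that not all of the numbers $k_1, k_2, \ldots, k_m$ are zero. Defining the matrices

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad \mathbf{K} = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix},$$

this system can be written

$$\mathbf{AX} = \mathbf{K}. \tag{9-36}$$

Here $\mathbf{A}$ is called the coefficient matrix, $\mathbf{X}$ the solution vector, and $\mathbf{K}$ the inhomogeneous vector. In the event that $m = n$ and $|\mathbf{A}| \neq 0$ it follows that $\mathbf{A}^{-1}$ exists, so that pre-multiplication of Eqn (9-36) by $\mathbf{A}^{-1}$ gives for the solution vector,

$$\mathbf{X} = \mathbf{A}^{-1}\mathbf{K}.$$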
(9-37)

This method of solution is of more theoretical than practical interest because the task of computing $\mathbf{A}^{-1}$ becomes prohibitive when $n$ is much greater than three. However, one useful method of solution for small systems of such equations ($n < 4$), known as Cramer's rule, may be deduced from Eqn (9-37).

Consideration of Eqn (9-37) and Definition 9-14 shows that $x_i$, the $i$th element in the solution vector $\mathbf{X}$, is given by

$$x_i = \frac{k_1A_{1i} + k_2A_{2i} + \cdots + k_nA_{ni}}{|\mathbf{A}|} \tag{9-38}$$

for $i = 1, 2, \ldots, n$, where $A_{ij}$ is the cofactor of $\mathbf{A}$ corresponding to element $a_{ij}$. Using Laplace's expansion theorem we then see that the numerator of Eqn (9-38) is simply the expansion of $|\mathbf{A}_i|$ by the elements of its $i$th column, where $\mathbf{A}_i$ denotes the matrix derived from $\mathbf{A}$ by replacing the $i$th column of $\mathbf{A}$ by the column vector $\mathbf{K}$. Thus we have derived the simple result

$$x_i = \frac{|\mathbf{A}_i|}{|\mathbf{A}|} \quad \text{for } i = 1, 2, \ldots, n, \tag{9-39}$$

which expresses the elements of the solution vector $\mathbf{X}$ of Eqn (9-35) in terms of determinants.

Rule 3 (Cramer's rule) To solve $n$ linear inhomogeneous equations in $n$ variables proceed as follows:
(a) Compute $|\mathbf{A}|$, the determinant of the coefficient matrix and, if $|\mathbf{A}| \neq 0$, proceed to the next step;
(b) Compute the modified coefficient determinants $|\mathbf{A}_i|$, $i = 1, 2, \ldots, n$, where $\mathbf{A}_i$ is derived from $\mathbf{A}$ by replacing the $i$th column of $\mathbf{A}$ by the inhomogeneous vector $\mathbf{K}$;
(c) Then the solutions $x_1, x_2, \ldots, x_n$ are given by $x_i = |\mathbf{A}_i| / |\mathbf{A}|$ for $i = 1, 2, \ldots, n$;
(d) If $|\mathbf{A}| = 0$ the method fails.

Example 9-19 Use Cramer's rule to solve the equations:

$$\begin{aligned} x_1 + 3x_2 + x_3 &= 8 \\ 2x_1 + x_2 + 3x_3 &= 7 \\ x_1 + x_2 - x_3 &= 2. \end{aligned}$$

Solution The coefficient matrix $\mathbf{A}$ and the modified coefficient matrices $\mathbf{A}_1$, $\mathbf{A}_2$, and $\mathbf{A}_3$ are obviously:

$$\mathbf{A} = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 1 & 3 \\ 1 & 1 & -1 \end{bmatrix}, \quad \mathbf{A}_1 = \begin{bmatrix} 8 & 3 & 1 \\ 7 & 1 & 3 \\ 2 & 1 & -1 \end{bmatrix}, \quad \mathbf{A}_2 = \begin{bmatrix} 1 & 8 & 1 \\ 2 & 7 & 3 \\ 1 & 2 & -1 \end{bmatrix} \quad \text{and} \quad \mathbf{A}_3 = \begin{bmatrix} 1 & 3 & 8 \\ 2 & 1 & 7 \\ 1 & 1 & 2 \end{bmatrix}.$$

Hence $|\mathbf{A}| = 12$, $|\mathbf{A}_1| = 12$, $|\mathbf{A}_2| = 24$, and $|\mathbf{A}_3| = 12$, so that

$$x_1 = \frac{|\mathbf{A}_1|}{|\mathbf{A}|} = 1, \quad x_2 = \frac{|\mathbf{A}_2|}{|\mathbf{A}|} = 2, \quad x_3 = \frac{|\mathbf{A}_3|}{|\mathbf{A}|} = 1.$$

In the more general case in which $m = n$, but $|\mathbf{A}| = 0$, the inverse matrix does not exist and so any method using $\mathbf{A}^{-1}$ must fail.
When $|A| = 0$ we must consider more carefully what is meant by a solution. In general, when a solution vector $X$ exists whose elements simultaneously satisfy all the equations in the system, the equations will be said to be consistent. If no solution vector exists having this property then the equations will be said to be inconsistent. Consider the following equations:

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
4x_1 - 2x_2 + x_3 &= 4 \\
5x_1 - x_2 + 3x_3 &= 1.
\end{aligned}
\]

These equations are obviously inconsistent, because the left-hand side of the third equation is just the sum of the left-hand sides of the first two equations, whereas the right-hand sides are not so related (that is, $1 \neq 9 + 4$). In effect, what we are saying is that there is a linear dependence between the rows of the left-hand side of the equations which is not shared by the inhomogeneous terms. The row linear dependence in the coefficient matrix $A$ is obviously related to the rank of $A$, and we now offer a brief discussion of one way in which the general problem of consistency may be approached.

Obviously, when working conventionally with the individual equations comprising (9-35) we know that: (a) equations may be scaled, (b) equations may be interchanged, and (c) multiples of one equation may be added to another. This implies that if we consider the coefficient matrix $A$ of the system and supplement it on the right by the elements of the inhomogeneous vector $K$ to form what is called the augmented matrix, then these same operations are valid for the rows of the augmented matrix. Clearly, the rank will not be affected by these operations. If the ranks of $A$ and of the augmented matrix, denoted by $(A, K)$, are the same, then the equations must be consistent; otherwise they must be inconsistent.

Definition 9-16 (augmented matrix and elementary row operations) Suppose that $AX = K$, where

\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},
\quad
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\quad\text{and}\quad
K = \begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}.
\]

Then the augmented matrix, written $(A, K)$, is defined to be the matrix

\[
(A, K) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & k_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & k_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & k_n \end{bmatrix}.
\]

An elementary row operation performed on an augmented matrix is any one of the following:

(a) scaling of all elements in a row by a factor $\lambda$;

(b) interchange of any two rows;

(c) addition of a multiple of one row to another row.

An augmented matrix will be said to have been reduced to echelon form by elementary row operations when the first non-zero element in any row is a unity, and it lies to the right of the unity in the row above.

Example 9-20 Perform elementary row operations on the augmented matrix corresponding to the inconsistent equations above to reduce them to echelon form. Find the ranks of $A$ and $(A, K)$.

Solution The augmented matrix is

\[
(A, K) = \begin{bmatrix} 1 & 1 & 2 & 9 \\ 4 & -2 & 1 & 4 \\ 5 & -1 & 3 & 1 \end{bmatrix}.
\]

Subtract from the elements of row 3 the sum of the corresponding elements in rows 1 and 2 to obtain

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 4 & -2 & 1 & 4 \\ 0 & 0 & 0 & -12 \end{bmatrix}.
\]

Subtract from the elements of row 2 four times the corresponding elements in row 1 to obtain

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & -6 & -7 & -32 \\ 0 & 0 & 0 & -12 \end{bmatrix}.
\]

Divide row 2 by $-6$ and row 3 by $-12$ to obtain

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & 7/6 & 16/3 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]

This is now in echelon form and the rank of the matrix comprising the first three columns is 2, which must be the same as the rank of the coefficient matrix $A$. The rank of $(A, K)$ must be the same as the rank of the echelon equivalent of the augmented matrix, which is clearly 3.

The general conclusion that may be reached from the echelon form of an augmented matrix $(A, K)$ is that equations are consistent only when the ranks of $A$ and $(A, K)$ are the same. If the equations are consistent, and $A$ is of order $(n \times n)$ and the rank $r < n$, we shall have fewer equations than variables. In these circumstances we may solve for $r$ of the variables $x_i$ in terms of the $n - r$ remaining ones, which can then be assigned arbitrary values.
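The reduction to echelon form and the comparison of ranks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than part of the original text: the names `echelon` and `rank` are hypothetical, exact fractions are used so that no rounding occurs, and the rank is taken to be the number of non-zero rows in the echelon form. The data are those of Example 9-20.

```python
from fractions import Fraction


def echelon(rows):
    """Reduce a matrix to echelon form using only elementary row operations."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a row at or below pivot_row with a non-zero entry in this column.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]          # (b) interchange rows
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]      # (a) scale to a leading 1
        for r in range(pivot_row + 1, len(m)):
            factor = m[r][col]
            # (c) subtract a multiple of the pivot row from a lower row
            m[r] = [x - factor * p for x, p in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


def rank(rows):
    """Rank = number of non-zero rows in the echelon form."""
    return sum(1 for row in echelon(rows) if any(x != 0 for x in row))


A = [[1, 1, 2], [4, -2, 1], [5, -1, 3]]
AK = [[1, 1, 2, 9], [4, -2, 1, 4], [5, -1, 3, 1]]
print(rank(A), rank(AK))   # unequal ranks signal inconsistent equations
```

The order in which the row operations are applied differs from that chosen in Example 9-20, but the ranks found (2 for $A$ and 3 for $(A, K)$) are the same, since elementary row operations do not alter rank.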
Theorem 9-10 (solution of inhomogeneous systems) The inhomogeneous system of equations

\[
AX = K,
\]

where $A$ is of order $(n \times n)$ and $X$, $K$ are of order $(n \times 1)$, has a unique solution if $|A| \neq 0$. If $|A| = 0$, then the equations are only consistent when the ranks of $A$ and $(A, K)$ are equal. In this case, if the rank $r < n$, it is possible to solve for $r$ variables in terms of the $n - r$ remaining variables, which may then be assigned arbitrary values.

Example 9-21 Solve the following equations by reducing the augmented matrix to echelon form:

\[
\begin{aligned}
x_1 + 3x_2 - x_3 &= 6 \\
8x_1 + 9x_2 + 4x_3 &= 21 \\
2x_1 + x_2 + 2x_3 &= 3.
\end{aligned}
\]

Solution The augmented matrix is

\[
(A, K) = \begin{bmatrix} 1 & 3 & -1 & 6 \\ 8 & 9 & 4 & 21 \\ 2 & 1 & 2 & 3 \end{bmatrix}.
\]

Subtract from the elements of row 2 the sum of three times the corresponding element in row 3 and twice the corresponding element in row 1 to obtain

\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 0 & 0 & 0 & 0 \\ 2 & 1 & 2 & 3 \end{bmatrix}.
\]

Interchange rows 2 and 3 to obtain

\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 2 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

Subtract twice row 1 from row 2 and divide the resulting row 2 by $-5$ to obtain

\[
\begin{bmatrix} 1 & 3 & -1 & 6 \\ 0 & 1 & -4/5 & 9/5 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

This is now in echelon form and clearly the ranks of $A$ and $(A, K)$ are both 2, showing that the equations are consistent. However, only two equations exist between the three variables $x_1$, $x_2$, and $x_3$, for the echelon form of the augmented matrix may be seen to be equivalent to the two scalar equations

\[
x_1 + 3x_2 - x_3 = 6 \quad\text{and}\quad x_2 - \tfrac{4}{5}x_3 = \tfrac{9}{5}.
\]

Hence, assigning $x_3$ arbitrarily, we find that

\[
x_1 = \tfrac{3}{5} - \tfrac{7}{5}x_3 \quad\text{and}\quad x_2 = \tfrac{9}{5} + \tfrac{4}{5}x_3.
\]

When the inhomogeneous vector $K = 0$, the resulting system of equations $AX = 0$ is said to be homogeneous. Consider the case of a homogeneous system of $n$ equations involving the $n$ variables $x_1, x_2, \ldots, x_n$. Then it is obvious that a trivial solution $x_1 = x_2 = \cdots = x_n = 0$, corresponding to $X = 0$, always exists, but a non-trivial solution, in the sense that not all $x_1, x_2, \ldots, x_n$ are zero, can only occur if $|A| = 0$.
To see this, notice that if $|A| \neq 0$ then $A^{-1}$ exists, so that pre-multiplication of $AX = 0$ by $A^{-1}$ gives at once the trivial solution $X = 0$ as being the only possible solution. Conversely, if $|A| = 0$, then certainly at least one row of $A$ is linearly dependent upon the other rows, so that there are effectively fewer than $n$ independent equations and the variables $x_1, x_2, \ldots, x_n$ need not all be zero.

When a non-trivial solution exists to a homogeneous system of $n$ equations involving $n$ variables it cannot be unique, for if $X$ is a solution vector, then so also is $\lambda X$, where $\lambda$ is a scalar. As in our previous discussion, if the rank of $A$, which is of order $(n \times n)$, is $r$, then we may solve for $r$ of the variables $x_1, x_2, \ldots, x_n$ in terms of the $n - r$ remaining ones, which can then be assigned arbitrary values.

Theorem 9-11 (solution of homogeneous systems) The homogeneous system of equations

\[
AX = 0,
\]

where $A$ is of order $(n \times n)$ and $X$, $0$ are of order $(n \times 1)$, always has the trivial solution $X = 0$. It has a non-trivial solution only when $|A| = 0$. If $A$ is of rank $r < n$, it is possible to solve for $r$ variables in terms of the $n - r$ remaining variables, which may then be assigned arbitrary values. If $X$ is a non-trivial solution, so also is $\lambda X$, where $\lambda$ is an arbitrary scalar.

Example 9-22 Solve the equations

\[
\begin{aligned}
x_1 - x_2 + x_3 &= 0 \\
2x_1 + x_2 - x_3 &= 0 \\
x_1 + 5x_2 - 5x_3 &= 0.
\end{aligned}
\]

Solution There is the trivial solution $x_1 = x_2 = x_3 = 0$ and, since the determinant associated with the coefficient matrix vanishes, there are also non-trivial solutions. The augmented matrix is now

\[
(A, 0) = \begin{bmatrix} 1 & -1 & 1 & 0 \\ 2 & 1 & -1 & 0 \\ 1 & 5 & -5 & 0 \end{bmatrix},
\]

which is easily reduced by elementary row transformations to the echelon form

\[
\begin{bmatrix} 1 & -1 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]

This shows that there are only two equations between the three variables $x_1$, $x_2$, and $x_3$, for the echelon form of the augmented matrix is seen to be equivalent to the two scalar equations

\[
x_1 - x_2 + x_3 = 0 \quad\text{and}\quad x_2 - x_3 = 0.
\]

Hence, assigning $x_3$ arbitrarily, we have for our solution $x_1 = 0$ and $x_2 = x_3 = k$ (say).
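The one-parameter family of solutions found in Example 9-22 can be checked by direct substitution. The short Python sketch below is an illustration only; the helper names `mat_vec` and `det3` are hypothetical, and the sample values of $k$ are arbitrary choices.

```python
def mat_vec(a, x):
    """Return the product A X for a matrix A (list of rows) and a vector X (list)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def det3(a):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))


A = [[1, -1, 1], [2, 1, -1], [1, 5, -5]]

# The determinant vanishes, so non-trivial solutions are permitted.
print(det3(A))

# X = (0, k, k) should give A X = 0 for every choice of k,
# which also illustrates that a scalar multiple of a solution is a solution.
for k in (1, -2, 7):
    print(k, mat_vec(A, [0, k, k]))
```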
A practical numerical method of solution called Gaussian elimination is usually used when dealing with inhomogeneous systems of $n$ equations involving $n$ variables. This is essentially the same method as the one described above for the reduction of an augmented matrix to echelon form. The only difference is that it is not necessary to make the first non-zero element appearing in any row, in the position corresponding to the leading diagonal, equal to unity. We illustrate the method by example.

Example 9-23 Solve the following equations by Gaussian elimination:

\[
\begin{aligned}
x_1 - x_2 - x_3 &= 0 \\
3x_1 + x_2 + 2x_3 &= 6 \\
2x_1 + 2x_2 + x_3 &= 2.
\end{aligned}
\]

Solution The augmented matrix is

\[
(A, K) = \begin{bmatrix} 1 & -1 & -1 & 0 \\ 3 & 1 & 2 & 6 \\ 2 & 2 & 1 & 2 \end{bmatrix}.
\]

Subtracting three times row 1 from row 2 and twice row 1 from row 3 gives

\[
\begin{bmatrix} 1 & -1 & -1 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 4 & 3 & 2 \end{bmatrix}.
\]

Subtraction of row 2 from row 3 gives

\[
\begin{bmatrix} 1 & -1 & -1 & 0 \\ 0 & 4 & 5 & 6 \\ 0 & 0 & -2 & -4 \end{bmatrix}.
\]

The solution is now found by the process of 'back-substitution' using the scalar equations corresponding to this modified augmented matrix. That is, the equations

\[
\begin{aligned}
x_1 - x_2 - x_3 &= 0 \\
4x_2 + 5x_3 &= 6 \\
-2x_3 &= -4.
\end{aligned}
\]

The last equation gives $x_3 = 2$ and, using this result in the second, then gives $x_2 = -1$. Combination of these results in the first equation then gives $x_1 = 1$.

It is not proposed to offer more than a few general remarks about the solutions of $m$ equations involving $n$ variables. If the equations are consistent, but there are more equations than variables so that $m > n$, it is clear that there must be linear dependence between the equations. In the case that the rank of the coefficient matrix is equal to $n$ there will obviously be a unique solution for, despite appearances, there will be only $n$ linearly independent equations involving $n$ variables. If, however, the rank $r$ is less than $n$ we are in the situation of solving for $r$ of the variables in terms of the remaining $n - r$ variables, whose values may be assigned arbitrarily.
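Returning to the square case of Example 9-23, the elimination and back-substitution steps can be sketched as follows. This Python fragment is an illustration under stated assumptions and is not taken from the original text: the function name `gauss_solve` is hypothetical, exact fractions are used in place of floating-point arithmetic, and a row interchange is made only when a zero appears in the pivot position (practical codes usually choose the pivot more carefully in order to limit rounding error).

```python
from fractions import Fraction


def gauss_solve(a, k):
    """Solve A X = K (A square, non-singular) by elimination and back-substitution."""
    n = len(a)
    # Build the augmented matrix (A, K) with exact rational entries.
    m = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(a, k)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("|A| = 0: no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Subtract a multiple of the pivot row; the pivot is NOT scaled to unity.
            m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    # Back-substitution, starting from the last equation.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


A = [[1, -1, -1], [3, 1, 2], [2, 2, 1]]
K = [0, 6, 2]
print(gauss_solve(A, K))
```

For the data of Example 9-23 the sketch passes through the same two intermediate matrices as the worked solution and returns $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.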
In the remaining case, where there are fewer equations than variables, we have $m < n$. When this system is consistent it follows that at least $n - m$ variables must be assigned arbitrary values.

9-8 Eigenvalues and eigenvectors

Let us examine the consequence of requiring that in the system

\[
AX = K, \tag{9-40}
\]

where $A$ is of order $(n \times n)$ and $X$, $K$ are of order $(n \times 1)$, the vector $K$ is proportional to the vector $X$ itself. That is, we are requiring that $K = \lambda X$, where $\lambda$ is some scalar multiplier as yet unknown. This requires us to solve the system

\[
AX = \lambda X, \tag{9-41}
\]

which is equivalent to the homogeneous system

\[
(A - \lambda I)X = 0, \tag{9-42}
\]

where $I$ is the unit matrix.

Now we know from Theorem 9-11 that Eqn (9-42) can only have a non-trivial solution when the determinant associated with the coefficient matrix vanishes, so that we must have

\[
|A - \lambda I| = 0. \tag{9-43}
\]

When expanded, this determinant gives rise to an algebraic equation of degree $n$ in $\lambda$ of the form

\[
\lambda^n + \alpha_1\lambda^{n-1} + \alpha_2\lambda^{n-2} + \cdots + \alpha_n = 0. \tag{9-44}
\]

The determinant (9-43) is called the characteristic determinant associated with $A$ and Eqn (9-44) is called the characteristic equation. It has $n$ roots $\lambda_1, \lambda_2, \ldots, \lambda_n$, each of which is called either an eigenvalue, a characteristic root, or, in some texts, a latent root of $A$.

Example 9-24 Find the characteristic equation and the eigenvalues corresponding to

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}.
\]

Solution We have

\[
A - \lambda I = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 - \lambda & 2 \\ 3 & -\lambda \end{bmatrix},
\]

so that

\[
|A - \lambda I| = \begin{vmatrix} 1 - \lambda & 2 \\ 3 & -\lambda \end{vmatrix} = \lambda^2 - \lambda - 6.
\]

Thus the characteristic equation is

\[
\lambda^2 - \lambda - 6 = 0,
\]

and its roots, the eigenvalues of $A$, are $\lambda = 3$ and $\lambda = -2$.

No consideration will be given here to the interpretation that is to be placed on the appearance of repeated roots of the characteristic equation, and henceforth we shall always assume that all the eigenvalues (roots) are distinct.
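For a matrix of order $(2 \times 2)$ the characteristic equation is a quadratic whose coefficients are minus the sum of the diagonal elements and the determinant, so the calculation of Example 9-24 is easily mechanized. The Python sketch below is illustrative only: the function names are hypothetical, and real eigenvalues are assumed (a negative discriminant is simply reported as an error).

```python
import math


def characteristic_coefficients(a):
    """Return (b, c) such that |A - lambda I| = lambda^2 + b*lambda + c for a 2 x 2 matrix."""
    (a11, a12), (a21, a22) = a
    return -(a11 + a22), a11 * a22 - a12 * a21


def eigenvalues_2x2(a):
    """Roots of the characteristic equation of a 2 x 2 matrix (real roots assumed)."""
    b, c = characteristic_coefficients(a)
    disc = b * b - 4 * c
    if disc < 0:
        raise ValueError("complex eigenvalues are outside the scope of this sketch")
    root = math.sqrt(disc)
    return (-b + root) / 2, (-b - root) / 2


A = [[1, 2], [3, 0]]
print(characteristic_coefficients(A))  # coefficient of lambda and the constant term
print(eigenvalues_2x2(A))              # the two eigenvalues
```

For the matrix of Example 9-24 the coefficients returned correspond to $\lambda^2 - \lambda - 6$, and the roots are $3$ and $-2$.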
Returning to Eqn (9-42) and setting $\lambda = \lambda_i$, where $\lambda_i$ is any one of the eigenvalues, we can then find a corresponding solution vector $X_i$ which, because of Theorem 9-11, will only be determined to within an arbitrary scalar multiplier. This vector $X_i$ is called either an eigenvector, a characteristic vector, or a latent vector of $A$ corresponding to $\lambda_i$. The eigenvectors of a square matrix $A$ are of fundamental importance in both the theory of matrices and in their application, and some indication of this will be given later in Section 15-8.

Example 9-25 Find the eigenvectors of the matrix $A$ in Example 9-24.

Solution Use the fact that the eigenvalues have been determined as being $\lambda = 3$ and $\lambda = -2$ and make the identifications $\lambda_1 = 3$ and $\lambda_2 = -2$. Now let the eigenvectors $X_1$ and $X_2$, corresponding to $\lambda_1$ and $\lambda_2$, be denoted by

\[
X_1 = \begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix}
\quad\text{and}\quad
X_2 = \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix}.
\]

Then for the case $\lambda = \lambda_1$, Eqn (9-42) becomes

\[
\begin{bmatrix} (1 - 3) & 2 \\ 3 & (0 - 3) \end{bmatrix}\begin{bmatrix} x_1^{(1)} \\ x_2^{(1)} \end{bmatrix} = 0,
\]

whence

\[
-2x_1^{(1)} + 2x_2^{(1)} = 0 \quad\text{and}\quad 3x_1^{(1)} - 3x_2^{(1)} = 0.
\]

These are automatically consistent by virtue of their manner of definition, so that we find that $x_1^{(1)} = x_2^{(1)}$. So, arbitrarily assigning to $x_1^{(1)}$ the value $x_1^{(1)} = 1$, we find that the eigenvector $X_1$ corresponding to $\lambda_1 = 3$ is

\[
X_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.
\]

A similar argument for $\lambda = \lambda_2$ gives

\[
\begin{bmatrix} (1 + 2) & 2 \\ 3 & (0 + 2) \end{bmatrix}\begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} = 0,
\]

whence

\[
3x_1^{(2)} + 2x_2^{(2)} = 0.
\]

Again, arbitrarily assigning to $x_1^{(2)}$ the value $x_1^{(2)} = 1$, we find that $x_2^{(2)} = -3/2$. Thus the eigenvector $X_2$ corresponding to $\lambda_2 = -2$ is

\[
X_2 = \begin{bmatrix} 1 \\ -\tfrac{3}{2} \end{bmatrix}.
\]

Obviously $\mu X_1$ and $\mu X_2$ are also eigenvectors for any arbitrary scalar $\mu$.

9-9 Linear transformations

Any introductory account of matrices would be incomplete were the basic idea of a linear transformation not to be mentioned. Some discussion of this important concept has already been offered in Section 9-1, and we now develop the idea a little further.
Indeed, to recapitulate briefly, it was explained there how a linear transformation is just a simple form of mapping of the points of one plane into the points of another. This idea is still useful when a matrix vector $X$ of order $(n \times 1)$ is mapped by a matrix transformation into what is called its image $\tilde{X}$ under the transformation. In this context the elements of $X$ are usually considered to be the components of a vector in an $n$-dimensional space, so that $X$ then specifies a point in that space, and $\tilde{X}$ is its image point under the linear transformation. We propose to work with the following straightforward definition of such a transformation.

Definition 9-17 (linear transformation) A general linear transformation or point transformation of the vector $X$ of order $(n \times 1)$ into the image $\tilde{X}$ of order $(n \times 1)$ is defined to be a transformation of the form

\[
\tilde{X} = AX + K,
\]

where the coefficient matrix $A$ is of order $(n \times n)$ and the vector $K$ is of order $(n \times 1)$.

The special case considered in Section 9-1 involved a mapping of points of the plane brought about solely by a rotation of the plane through an angle $\theta$ about the origin. In that case the transformation corresponded to $K = 0$, and

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{9-45}
\]

This matrix is called an orthogonal matrix because it has the property that $A' = A^{-1}$, and it is representative of a very important class of square matrices. The first row of $A$ is seen to contain the direction cosines of $Ox'$ with respect to $Ox$ and $Oy$, whilst the second row contains the direction cosines of $Oy'$ with respect to $Ox$ and $Oy$.

More generally, consider the rectangular axes $O\{x_1, x_2, x_3\}$ which are arbitrarily rotated about origin $O$ to form the axes system $O\{x_1', x_2', x_3'\}$, in which the direction cosine of $Ox_i'$ with respect to $Ox_j$ is $\nu_{ij}$. Then the matrix

\[
A = \begin{bmatrix} \nu_{11} & \nu_{12} & \nu_{13} \\ \nu_{21} & \nu_{22} & \nu_{23} \\ \nu_{31} & \nu_{32} & \nu_{33} \end{bmatrix} \tag{9-46}
\]

is strictly analogous to matrix (9-45), and it is easily seen that $X$ and $\tilde{X}$ are related by

\[
\tilde{X} = AX. \tag{9-47}
\]

In the special case that the rotation is only about the $x_3$-axis through an angle $\theta$ in the sense shown in Fig. 9-1, then $\nu_{13} = \nu_{31} = \nu_{32} = \nu_{23} = 0$ and $\nu_{33} = 1$, and

\[
A = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{9-48}
\]

When discussing an application of a linear transformation to the theory of elasticity in the next section we shall have occasion to refer to this matrix again.

Aside from the rotation transformation characterized by Eqns (9-46) and (9-47) there are three other simple transformations worthy of note and these are listed below. It is left as an exercise for the reader to verify their main properties when related to the plane which give rise to their names.

1. The identity transformation This is the transformation $\tilde{X} = X$, and it corresponds to the case $K = 0$ and $A = I$. Under this transformation $X$ and its image $\tilde{X}$ are coincident.

2. The translation transformation This is the transformation $\tilde{X} = X + K$, and it corresponds to an arbitrary non-zero vector $K$ and $A = I$. The effect of the transformation is to translate $X$ to its image $\tilde{X}$, without rotation or change of scale.

3. The dilatation transformation This is a transformation $\tilde{X} = AX$, in which $A$ is a non-singular diagonal matrix. Its effect when mapping $X$ into $\tilde{X}$ is to change the scale of the different elements of $X$ without translation or rotation. In the special case that all the diagonal elements are equal, say to $\lambda$, where $\lambda > 1$, its effect is one of a magnification of $X$.

Example 9-26 If

\[
X = \begin{bmatrix} x \\ y \end{bmatrix},\quad \tilde{X} = \begin{bmatrix} x' \\ y' \end{bmatrix}
\quad\text{and}\quad
A = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix},
\]

deduce the image of the curve $y = \sinh x$ under the transformation $\tilde{X} = AX$.

Solution We have

\[
\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},
\]

so that $x' = 5x$ and $y' = 2y$. Thus the image curve of $y = \sinh x$ is given parametrically by $x' = 5x$ and $y' = 2\sinh x$ or, equivalently, by

\[
y' = 2\sinh(x'/5).
\]

9-10 Applications of matrices and linear transformations

It is the object of this final section to indicate a few of the diverse applications of the work of this chapter.
Of necessity, we will be able to do no more than outline this large and fruitful field of study, and for our first example we look to the notion of rank to enable us to prove an important result in dimensional analysis known as the Buckingham Pi theorem.

9-10(a) Application of rank to dimensional analysis: Buckingham Pi theorem

In many branches of engineering and science, a valuable method of approach to difficult problems is via the method of dimensional analysis touched on briefly at the start of Chapter 5. In essence, this method seeks first to characterize a physical situation by forming dimensionless groups from the variables involved, and then to determine the functional relationships which relate these dimensionless groups. Our contribution will be to the first part of this process, for we shall determine how many dimensionless groups exist.

Let us suppose that a physical situation is described by $n$ variables $u_1, u_2, \ldots, u_n$, each of which corresponds to a physical quantity. Suppose also that each of these quantities is capable of expression dimensionally in terms of length $[L]$, mass $[M]$, and time $[T]$, and that $u_i$ has dimensions $[L]^{a_i}[M]^{b_i}[T]^{c_i}$. Then the product of powers

\[
u_1^{k_1}u_2^{k_2}\cdots u_n^{k_n}, \tag{9-49}
\]

where $k_1, k_2, \ldots, k_n$ are real numbers, must have dimensions

\[
[L]^{a_1k_1 + a_2k_2 + \cdots + a_nk_n}\,[M]^{b_1k_1 + b_2k_2 + \cdots + b_nk_n}\,[T]^{c_1k_1 + c_2k_2 + \cdots + c_nk_n}.
\]

Such products of powers will be dimensionless, in the sense that they are pure numbers having dimensions $[L]^0[M]^0[T]^0$, only if

\[
\begin{aligned}
a_1k_1 + a_2k_2 + \cdots + a_nk_n &= 0 \\
b_1k_1 + b_2k_2 + \cdots + b_nk_n &= 0 \\
c_1k_1 + c_2k_2 + \cdots + c_nk_n &= 0
\end{aligned}
\]

or, equivalently, if

\[
\begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \\ c_1 & c_2 & \cdots & c_n \end{bmatrix}\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix} = 0. \tag{9-50}
\]

Now if the rank of the coefficient matrix of order $(3 \times n)$ in Eqn (9-50) is $r$, then we know from the work of Section 9-7 that it is possible to express $r$ of the variables $k_1, k_2, \ldots, k_n$ in terms of the remaining $n - r$ variables, which may be assigned arbitrary values. That is to say, it will be possible to form $n - r$ independent dimensionless quantities $\pi_1, \pi_2, \ldots, \pi_{n-r}$ from the $n$ variables $u_1, u_2, \ldots, u_n$. The dimensionless variables $\pi_i$ are called Pi-variables. Hence we have proved the following result.

Theorem 9-12 (Buckingham Pi theorem) Let a physical situation be capable of description in terms of $n$ physical quantities $u_1, u_2, \ldots, u_n$, where $u_i$ has dimensions $[L]^{a_i}[M]^{b_i}[T]^{c_i}$. Then, if $r$ is the rank of the matrix

\[
\begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \\ c_1 & c_2 & \cdots & c_n \end{bmatrix},
\]

the physical situation is capable of description in terms of $n - r$ dimensionless variables $\pi_1, \pi_2, \ldots, \pi_{n-r}$ formed from the variables $u_1, u_2, \ldots, u_n$.

This is best illustrated by example. In the slow viscous flow of a fluid between parallel planes, some functional relationship of the form

\[
V = f(k, d, \eta)
\]

exists between the average flow velocity $V$, the pressure gradient $k$ along the flow, the distance $d$ between the planes and the viscosity $\eta$. The dimensions of these quantities, which will form the matrix in the Buckingham Pi theorem, are shown in the table below:

\[
\begin{array}{c|rrrr}
 & V & k & d & \eta \\ \hline
L & 1 & -2 & 1 & -1 \\
M & 0 & 1 & 0 & 1 \\
T & -1 & -2 & 0 & -1
\end{array}
\]

The rank of the $(3 \times 4)$ matrix whose elements comprise the entries in this table is 3, as may be seen, for example, by using elementary row operations to reduce it to its echelon equivalent

\[
\begin{bmatrix} 1 & -2 & 1 & -1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix},
\]

in which the determinant formed from the first three columns is non-zero. Thus, from the conditions of the theorem, the number of $\pi$ variables is $4 - 3 = 1$. A dimensionless grouping in this case is $kd^2/\eta V$, and any product of powers of the form shown in (9-49) must be a power of this one dimensionless group. Hence this physical problem is capable of description in terms of the one dimensionless grouping $\pi = kd^2/\eta V$. As the velocity profile across the flow only depends on the distance $x$ from one of the walls, our result implies that all such flows will be characterized by one curve describing the variation of $\pi$ with $x/d$.
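The counting argument for the viscous-flow example can be checked numerically. The following Python sketch is an illustration only and not part of the original text: the function name `rank` and the variable names are hypothetical, the columns of the dimension matrix are taken in the order $V$, $k$, $d$, $\eta$, and the exponent vector $(-1, 1, 2, -1)$ is simply the set of powers appearing in the group $kd^2/\eta V$.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if src is None:
            continue
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * p for x, p in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Rows: exponents of L, M, T.  Columns: V, k, d, eta.
D = [[1, -2, 1, -1],
     [0, 1, 0, 1],
     [-1, -2, 0, -1]]

n_groups = len(D[0]) - rank(D)          # n - r dimensionless groups
print(n_groups)

# Exponents of V, k, d, eta in the grouping k d^2 / (eta V).
exponents = [-1, 1, 2, -1]
dims = [sum(d * k for d, k in zip(row, exponents)) for row in D]
print(dims)                              # net powers of L, M, T in the grouping
```

The rank found is 3, giving a single dimensionless group, and the net powers of $L$, $M$, and $T$ in $kd^2/\eta V$ all vanish, as Eqn (9-50) requires.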
9-10(b) Differentials as linear transformations

We now consider a generalization of the total differential as described in Theorem 5-19 and subsequently used in Theorem 5-22. Let us suppose that

\[
\begin{aligned}
u_1 &= f_1(x_1, x_2, \ldots, x_n) \\
u_2 &= f_2(x_1, x_2, \ldots, x_n) \\
&\;\;\vdots \\
u_n &= f_n(x_1, x_2, \ldots, x_n),
\end{aligned}
\tag{9-51}
\]

then it follows from Theorem 5-19 and the properties of matrices that

\[
\begin{bmatrix} du_1 \\ du_2 \\ \vdots \\ du_n \end{bmatrix}
=
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_n}
\end{bmatrix}
\begin{bmatrix} dx_1 \\ dx_2 \\ \vdots \\ dx_n \end{bmatrix}.
\tag{9-52}
\]

This can be written

\[
du = A\,dx \tag{9-53}
\]

by identifying $du$, $dx$ with the $(n \times 1)$ column vectors in Eqn (9-52) in the obvious manner, and $A$ with the $(n \times n)$ matrix of partial derivatives. Viewed in this light, Eqn (9-53) may be seen to be a local linear transformation mapping $dx$ into $du$. The adjective local is used here because the transformation will only be a linear transformation when $A$ is a constant matrix, and as the elements of $A$ are functions of $x_1, x_2, \ldots, x_n$, they can only be approximated by constants in the neighbourhood of any fixed point $P$ with coordinates $(x_1^P, x_2^P, \ldots, x_n^P)$. For different points $P$, the transformation $A$ will be different, showing that Eqn (9-53) represents a more general type of transformation than a general linear point transformation.

Transformation (9-53) will be one-to-one provided that $A^{-1}$ exists, for then a unique inverse mapping

\[
dx = A^{-1}\,du \tag{9-54}
\]

will exist. The condition for this is, of course, that $|A| \neq 0$ at the point $P$. This will be recognized as the non-vanishing Jacobian condition already encountered in Chapter 5.

Fig. 9-2 Spherical polar coordinates.

By way of example, consider the relationship between the spherical polar coordinates $(r, \phi, \theta)$ and the Cartesian coordinates $(x, y, z)$ illustrated in Fig. 9-2 and described by

\[
\begin{aligned}
x &= r\sin\theta\cos\phi \\
y &= r\sin\theta\sin\phi \\
z &= r\cos\theta.
\end{aligned}
\]

Making the identifications $u_1 = x$, $u_2 = y$, $u_3 = z$, and $x_1 = r$, $x_2 = \theta$, $x_3 = \phi$, a simple calculation shows that Eqn (9-52) will take the form

\[
\begin{bmatrix} dx \\ dy \\ dz \end{bmatrix}
=
\begin{bmatrix}
\sin\theta\cos\phi & r\cos\theta\cos\phi & -r\sin\theta\sin\phi \\
\sin\theta\sin\phi & r\cos\theta\sin\phi & r\sin\theta\cos\phi \\
\cos\theta & -r\sin\theta & 0
\end{bmatrix}
\begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix}.
\tag{9-55}
\]

Denoting the square matrix in Eqn (9-55) by $A$, it is easily established that the Jacobian determinant $|A| = r^2\sin\theta$. Calculating the inverse matrix $A^{-1}$ and using it to deduce the inverse mapping we have, provided $r^2\sin\theta \neq 0$, that

\[
\begin{bmatrix} dr \\ d\theta \\ d\phi \end{bmatrix}
=
\frac{1}{r^2\sin\theta}
\begin{bmatrix}
r^2\sin^2\theta\cos\phi & r^2\sin^2\theta\sin\phi & r^2\sin\theta\cos\theta \\
r\sin\theta\cos\theta\cos\phi & r\sin\theta\cos\theta\sin\phi & -r\sin^2\theta \\
-r\sin\phi & r\cos\phi & 0
\end{bmatrix}
\begin{bmatrix} dx \\ dy \\ dz \end{bmatrix}.
\tag{9-56}
\]

9-10(c) Linear transformation of the stress tensor

In the mathematical theory of elasticity it is useful to introduce the concept of the stress vector associated with any plane element of area within a solid body. The magnitude of the stress vector is the force per unit area acting on that plane element of area, and its sense is the sense of the force which is exerted on that element, located at point $P$, say, by the surrounding material. In a solid, unlike a liquid, this force depends on the orientation of the element of area, and it is convenient to describe the situation at point $P$ by considering elements of plane area normal to each of the unit vectors $x_1$, $x_2$, $x_3$ of a rectangular Cartesian system $O\{x_1, x_2, x_3\}$. If the components in the $x_1$, $x_2$, and $x_3$ directions of the stress acting on the element of area with $x_i$ as its normal are $\tau_{i1}$, $\tau_{i2}$, and $\tau_{i3}$, then the complete information concerning the components of stress acting on all three mutually orthogonal elements of area at $P$ will be contained in the following table:

\[
\begin{array}{l|ccc}
\text{Stress components at } P & 1 & 2 & 3 \\ \hline
\text{Surface normal to } x_1 & \tau_{11} & \tau_{12} & \tau_{13} \\
\text{Surface normal to } x_2 & \tau_{21} & \tau_{22} & \tau_{23} \\
\text{Surface normal to } x_3 & \tau_{31} & \tau_{32} & \tau_{33}
\end{array}
\]

In general there will be a different table of this type for each point $P$ in the solid.
The matrix $T$ defined by

\[
T = \begin{bmatrix} \tau_{11} & \tau_{12} & \tau_{13} \\ \tau_{21} & \tau_{22} & \tau_{23} \\ \tau_{31} & \tau_{32} & \tau_{33} \end{bmatrix}
\]

is called the stress tensor at the point $P$, and it is fundamental to the development of the mathematical theory of elasticity. Let us now indicate how the stress tensor transforms when axes centred on $P$ are rotated, since this is a situation of considerable practical importance, being related to the determination of the directions of minimum and maximum stress at any point in a solid body.

Fig. 9-3 Rotation $\theta$ about $x_3$-axis.

For this purpose we shall assume that no external moments act on the body, for then it can be shown that $T$ is symmetric. In addition, we will set $\tau_{13} = \tau_{23} = \tau_{33} = 0$, which characterizes what is called a plane state of stress, since all the forces then lie in the $(x_1, x_2)$-plane.

The appropriate rotation matrix $A$ relating the system $O\{x_1, x_2, x_3\}$ to $O\{x_1', x_2', x_3'\}$ when a rotation $\theta$ about the $x_3$-axis has been made is that given in Eqn (9-48) (see Fig. 9-3). Hence, setting

\[
F = AT, \tag{9-57}
\]

then the elements of row $i$ of $F$ will contain the components of the transformed force vector acting on the element of area with $Ox_i'$ as normal. To relate this result to the stress components $\tau_{ij}'$ relative to the new axes $O\{x_1', x_2', x_3'\}$, we must use the fact that $\tau_{ij}'$ is equal to the projection of the force acting on the element of area normal to $Ox_i'$ along $Ox_j'$. To achieve this result by matrices we must post-multiply $A$ by the transpose $F'$ of $F$. This is so because row $i$ of $A$ contains the direction cosines of axis $Ox_i'$ and row $j$ of $F$ contains the components of the force acting on the element of area with $Ox_j'$ as normal, and the rule for matrix multiplication is 'rows into columns'. Thus if $\tilde{T}$ is the transformed stress tensor, then

\[
\tilde{T} = AF' = AT'A', \tag{9-58}
\]

but as $T$ is symmetric, $T' = T$, giving

\[
\tilde{T} = ATA'. \tag{9-59}
\]

Using the fact that

\[
T = \begin{bmatrix} \tau_{11} & \tau_{12} & 0 \\ \tau_{12} & \tau_{22} & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

when evaluating the indicated matrix products in Eqn (9-59) then shows that the stress components of the transformed stress tensor are

\[
\begin{aligned}
\tau_{11}' &= \tau_{11}\cos^2\theta + \tau_{22}\sin^2\theta - 2\tau_{12}\sin\theta\cos\theta \\
\tau_{22}' &= \tau_{11}\sin^2\theta + \tau_{22}\cos^2\theta + 2\tau_{12}\sin\theta\cos\theta \\
\tau_{12}' &= (\tau_{11} - \tau_{22})\sin\theta\cos\theta + \tau_{12}(\cos^2\theta - \sin^2\theta)
\end{aligned}
\tag{9-60}
\]

with $\tau_{13}' = \tau_{23}' = \tau_{33}' = 0$. These results form the basis of many important studies involving plane stress in solids on which no external moment is acting.

PROBLEMS

Section 9-1

9-1 Suggest two physical situations in which the outcomes may be displayed in the form of a matrix.

9-2 Find the sum $A + B$ and difference $A - B$ of the matrices " 1 2 3 4 " " 2 3 1 2 " A = 2 12 2 .12 0. , B = 2 2 . 1 -2 1 1 .

9-3 Evaluate the following inner products: "1" "2" " 2" (a) [2 113] 2 2 ; (b) [1 -2 7 4] 3 ; (c) [2 -1 3 1] -1 3 -1- -1- - 1.

9-4 Evaluate the following matrix products: 3 12" "1 2" 1 -1 12 2 2 ; (b) 1 1 1 0_ -1 1- (a) -1 2 1 3" ~-i r 2 2 1 -1 1 -1 1 1 1 1. - 3 2-

9-5 State which of the following forms of matrix product are defined and, where appropriate, give the shape of the resulting product matrix:

(a) $(7 \times 3)(3 \times 9)$; (b) $(5 \times 3)(2 \times 3)$; (c) $(1 \times 9)(9 \times 1)$; (d) $(3 \times 1)(1 \times 4)$.

9-6 If the matrices $I$, $A$, and $B$ are given by

\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad
A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \\ 5 & 1 & 4 \end{bmatrix}
\]

and B = 1 2 1 -1 3 1 2 1 2 1 show that (a) $IA = AI = A$; (b) $IB = B$ but that $BI$ is not defined.

9-7 Give an example of matrices $A$ and $B$ for which:

(a) the product $AB$ is defined but the product $BA$ is not;

(b) the products $AB$ and $BA$ are both defined but are matrices of different order;

(c) the products $AB$ and $BA$ are both defined and are the same order as $A$ and $B$, but they are not equal.
9-8 Display each of the following sets of simultaneous equations in matrix form: (a) 2x + 4/ + z = 9 x - 3y + 2z = -4 x + y — z = 1, (b) w + 2x - y = 4 x - 3y + 2z = - 1 2w + 5x - 3z = Aw — y + Az = 2, (c) 3w + x-2y + 4z=l w — 3x + y — 3z = 4 w + Ix + 2y + 5z = 2, (d) 2x + y — z = Ax 3x + 2y + Az = Ay x - 3y + 2z = Iz. 9-9 Let matrices R„ and R^, be defined as follows: cos 8 -sin 81 r CO s 4. -sin f ■ „ „ and R * = _sm cos 0J v [sin <f> cos <f> R„ = 434 / LINEAR TRANSFORMATIONS AND MATRICES CH 9 and let X = ~x~ , X' = V" -y- -/- and X' -El- Then, if X' = R S X and X" = R^X', show by matrix multiplication and use of trigonometric identities that R#[RflX] = Rg+^X. Interpret this result geometrically. Section 9-2 9-10 Construct the matrices of order (3 X 2) whose general element atj has the form : (a) aij = / 2 + f - 2ij; (b) at) «= sin iO cosjd. 9-11 State which of the following pairs of matrices can be made equal by assigning suitable values to the constants a, b, and c. Where appropriate, determine what these values must be. fa) (b) "1 2 1 0" "1 2 1 0" 3 a b 2 and 3 1 2 2 » 1 2 c 1_ .1 2 4 1. 1 5 a 2" "1 5 1 2" 2 «2 3 b and 2 4 3 4 > .4 3 2 c_ .4 3 2 1. 1 (a + b) 3 " "1 4 3" (fl + c) 2 4 and 2 4 1 2 (b + c)_ .1 2 2. (c) 9-12 Find the numbers a, b, c, and d in order that the following matrix equation should be valid: '6 4 6" 6 -1 -2 .3 6 6. 9-13 Use Definitions 9-3 and 9-4 to prove that if ?., n are scalars and matrices A and B are conformable for addition, then (a) A(A + B) = M. + AB, (b) AA + M = (A + /*)A. 9-14 Determine 3 A + 2B and 2A — 6B given that "2 - 2a 1 5" "2 3 r 3 2 -b + 3 </ 4 = 3c 4 1. .3 2 5. _ ri 37' ~ L2 -1 6 and B = 1 *]• 3 2j PROBLEMS / 435 915 If and D '2 1 3 2 1 B = 1 1 0" 1 1 3 1 "2 3 4~ , c = 1 5 6 find the matrix products A B and C D. 9-16 This example shows that the matrix product A B = does not necessarily imply either that A = or that B = 0. If, T 1 -1 11 -3 2 -1 -2 1 oJ and B = "1 2 2 4 .1 2 find A B and B A and show that AB^BA. 
9-17 Show that the matrix equation AX = K, where A = "i 3 r ~Xl~ l i 2 , x = X2 .2 2 0. . x ^ and K = may be solved for x u x 2 , and x 3 by pre-multiplication by B, where -i i B = -i - i -iJ 9 18 Use matrix multiplication to verify the results of Theorem 9-2 when A B and C are of the form ' ' "1 3 2" --1 2 r ~-2 2 1 1 4 , B = 3 -2 -l , and C = 2 4 .2 3 1_ . 1 4 2_ .13 1 9-19 If A is a square matrix, then the associative property of matrices allows us to write A" without ambiguity because, for example, A 3 = A(A A) = (A A)A. If "cosh x sinh x ^sinh x cosh * use the hyperbolic identities to express A* and A 3 in their simplest form and use induction to deduce the form of A". 436 / LINEAR TRANSFORMATIONS AND MATRICES CH 9 2 5 7 9" ; (0 4 3 1_ 4 2 19' 2 4 9-20 Transpose the following matrices: (a) [1 4 17 3]; (b) (d) 9-21 Use Definition 9-7 and Theorem 9-3 to prove that: (a) the sum of a square matrix and its transpose is a symmetric matrix; (b) the difference of a square matrix and its transpose is a skew-symmetric matrix. Illustrate each of these results by an example. 9-22 Verify that (A B)' = B' A', given that "-4 2" 3 3 2 -1 -2" 1 ; (e) "4" 3 1 0. -0- A = 1 4 7 9 -3 1 and B = 3 1 -5 6 9-23 If a matrix A contains complex numbers as elements it is said to be a complex matrix. Its complex conjugate is denoted by A* and is defined to be the matrix obtained from A by replacing each element by its complex conjugate. Show from this and the definitions given in the text that : (a) (A*)* = A; (b) (A±B)* = A* ±B*; (c) (/<A)* = //A*, where /i is any complex number and ,« is its complex conjugate. 9-24 Find the complex conjugates of the matrices A and B, where A = 1 2 + 'i , „ r ' J - 2i i and B = - 2/ i J 11 + i 1 + i J and, taking /i = 1 — i, use them to verify the results of the previous problem. 
Section 9-3 9-25 Evaluate the determinants v (a) 1 3 1 2 ; (b) 2 5 ; (c) 4 7 1 3 7 -5 -5 / 9m Without expanding the determinant, prove that L 1 + a\ a\ ax «2 1 + a2 a% 03 03 1 + «3 = (1 + a\ + 02 + 03). PROBLEMS / 437 9-27 Use Theorem 9-4 to simplify the following determinants before expansion : (a) (0 A| = A| = 42 61 50 3 2 4 6 5 2 1 5 5 17 56 4 1 7 (b) |A| = 9 16 2 9-28 Without expanding prove that + — _ x 2 + oi 2 0102 aids i- a?fX\ X 2 + a% 2 0203 asai dllgh x 2 + as 2 9-29 Show without expansion that a 2 b 2 c 2 a b c 1 1 1 = x\x 2 + ai 2 + a 2 2 + A3 2 ). = (a - b)(a - c)(b - c). This determinant is called an alternant determinant. Illustrate the result by means of a numerical example and verify it by direct expansion. 9-30 Prove that sin (x + Jit) sin x cos x | A | = sin {x + Jit) cos x sin x 1 a 1 - a is independent of a, and express it as a function of x. 9-31 Find the minors My and cofactors Ay of each element a i} in the matrix i -! y 9-32 If A is an arbitrary matrix of order (3 x 3) with general element a« and co- factor At], show by direct expansion that: (a) aiiAm + ai2A3?. + ai3A33 = 0; (b) Ol3Al2 + £23/422 + 033/432 = 0. 9-33 Use the Laplace expansion theorem to expand determinant (b) in Problem 9-25 first in terms of elements of the third row, and then in terms of elements of the third column. 9-34 If A is a matrix of order (3 x 3) and A' is its transpose, prove by direct expansion that I A I - I A' I. 438 / LINEAR TRANSFORMATIONS AND MATRICES CH 9 Use the Laplace expansion theorem to prove that this result is true for a square matrix A of any order. 9-35 Verify by direct expansion that for any square matrices A and B of order (2 X 2): |AB| = 1A[|B|. This result is, in fact, valid for square matrices A and B of any order. Section 9-4 9-36 Which of the following sets of vectors are linearly independent, and where linear dependence exists determine its form: (a) Ci (b) Ri = [1 9 -2 14], R 2 =[-2 -18 4 -28]; "3 _ " 0" " 0" , C 2 = -7 , C3 = .0. 0. 
.15. "2" "1" "1" "5" 1 , C2 = 1 , C3 = 2 , C4 = 6 .0. .2. .1. .4. (c) Ci = 9-37 Test the following matrices for linear independence between their rows or columns: (a) 1 2 -1 0" " 2 3 r "1 2 1 5 - 2 3 1 1 1 1 2 ; (b) -2 -3 1 -1 2 -2 ; (c) 2 12 10 2 1 1 2 3- — 1 -2 2 0- -5 3 7 7. 9-38 Find the rank of the following matrix: 2 1 4 3' A= -1 2 4-2 6 7 -4 -12 14 -12 9-39 Construct an example of a matrix of order (4 x 3) which is (a) of rank 2, and (b) of rank 3. Section 9-5 9-40 Show that adj A = A when -_4 _3 _3" A= 1 1 .443. 9-41 Find the matrix adjoint to each of the following matrices: PROBLEMS / 439 (a) 9-42 Set '1 2 3' 2 3 2 3 3 4 (b) 1 2 3' 1 3 4 1 4 3 (c) a b c d caca-ca and equate corresponding elements to determine the inverse of ca- 9-43 Find the inverse of " 3 -2 -1" A = -4 1 -1 .201. Verify that: (a) A" 1 A = AA- 1 = I; (b) (A-i)-i = A. 9-44 Given that A and B are "1 2 r "1 -1 A = 1 4 2 and B = 2 _0 3 2. .1 verify that (A B)- 1 = B 1 A" 1 . Section 9-6 9-45 Find dA/df and determine the largest interval about the origin in which it is defined, given that AW = ~2t 3 tanr cosfl .3 4-f« 1 + /J' •46 Given that A(0 = "cosh t sinh /") _sinh t coshtj It < 2 J verify results (d), (f), and (g) of Theorem 9-9. 9-47 Show that for the matrix "cosf — sinr" _sin t cos /_ it is true that (d/d^A 2 = 2A(dA/dO, but that this is not true for the matrix A(/) = 440 / LINEAR TRANSFORMATIONS AND MATRICES CH 9 A(0 L2 /tj" 9-48 Show that if A(f ) is a non-singular matrix, then At At Verify this result when fcos t — sin r~l A= . |_sin / cos tj Section 9-7 9-49 Solve the following equations using Cramer's rule: Xl + X2+ X3= 7 2xi — X2 + 2x3 — 8 3xi + 2*2 — xa = 11. 9-50 Solve the equations of the previous example using the inverse matrix method and compare the task with the previous method. 9-51 Solve the following equations using Cramer's rule: Xl — X2 + X3 — Xi — 1 2xi — X2 + 3X3 + X4 = 2 Xl + JC2 + 2X3 + 2X4 = 3 Xl + X2 + X3 + -X4 = 3. 
9-52 Write down the augmented matrix corresponding to the equations : 2xi — X2 + 3X3 = 1 3xi + 2x2 — X3 = 4 xi — 4x2 + 7x3 = 3. Show, by reducing this matrix to its echelon equivalent, that these equations are inconsistent. 9-53 Write down the augmented matrix corresponding to the equations: 3xi + 2x2 — X3 = 4 2xi — 5x2 + 2x3 = 1 5xi + 16x2 — 7x3 = 10. Show, by reducing this matrix to its echelon equivalent, that these equations are consistent and solve them. 9-54 Solve the following equations, in which a is an arbitrary constant, by reducing the augmented matrix to echelon form : Xl + <XX2 + <XX3 = 1 OXl + X2 + 2aX3 = —4 axi — 0x2 + 4x3 = 2. Consider the effect of a on the solution. PROBLEMS / 441 9-55 Solve the following homogeneous equations, in which a is an arbitrary con- stant, by reducing the augmented matrix to echelon form: <xxi — xi — X3 = — Xl + aX2 — xs = — Xl — X2 + 0CX3 = 0. Consider the effect of a on the solution. 9-56 Solve the following equations using Gaussian elimination : l-202;ci - 4-371*2 + 0-651x3 = 19-447 -3-141xi + 2-243x2 - l-626x 3 = -13-702 0-268x1 - 0-876x2 + l-341x 3 = 6-849. 9-57 Discuss briefly, but do not solve, the following sets of equations: (a) xi + x 2 = 1 (b) xi + x 2 = 1 2xi - x 2 = 5; 2xi - x 2 = 5 xi — X2 = 0; (C) Xl + X2 = 1 (d) Xl + X2 - X3 = 2xi - x 2 = 5 2xi - x 2 - 5x3 = 0. — xi — 2x2 = 0; Section 9-8 9-58 Write down the characteristic equations for the following matrices: (a) A = "1 4" 1 1 ■ (b) A = "1 2" 2 11. .0 2 1_ 9-59 Find the eigenvalues and eigenvectors of A = ' 1 -2 -r 0. 9-60 Prove that the eigenvalues of a diagonal matrix of any order are given by the elements on the leading diagonal. What form do the eigenvectors take. Section 9-9 9-61 Verify that the matrix A in Eqn (9-46) is orthogonal, and justify the assertion that X = A X describes the effect of a general rotation of the rectangular cartesian axes 0{xi, X2, X3}. 
9-62 Justify the name reflection transformation of the plane when applied to a transformation of the form X = AX, where either A = -1 01 .0 -lj or A = -1 01 lj 442 / LINEAR TRANSFORMATIONS AND MATRICES CH 9 9-63 Show that if X = AX, Where A = "cos 6 _sin 6 —sin 6' cos 9_ then XX = ■- XX, where the prime signifies the transpose operation. Interpret this result geo* metrically. 9-64 If X = ~x~ , x = ~x~ .y. J- „ , and A = "_1_ _ J_ V2 V2 1 _1_ -V2 V2J deduce the image of the curve y = x 2 under the transformation X = AX. Is the shape of the curve changed ? 9-65 If X = deduce the image of the curve y — x 2 '+ 2x + 1 under the transformation X = AX. Describe the effect of the transformation in geometrical terms. x~ , x = ~x~ , and A = "-3 0' J- J- •0 3. Section 9- 10 9-66 How many dimensionless groups of variables (w variables) characterize a physical situation described by: (a) the four physical quantities: work (L 2 MT~ 2 ), viscosity (L^MT' 1 ), pressure (,L~ l MT~ 2 ) and mass transfer rate (Mr 1 ); (b) the five physical quantities: length (L), viscosity (L^MT' 1 ), velocity (LT- 1 ), area (Z. 2 ) and pressure (L^MT' 2 ). 9-67 Express in matrix form the relationship between the differentials dx, dy and d«, dv, given that u = sinh (x 3 + y 3 ), v = cosh (x 3 — y 3 ). For what values of x and y does this transformation fail to have an inverse? PROBLEMS / 443 9-68 Given that u = x 2 + ly + 1, v = X s - Ixy + y 3 and p = sin (u + v), q — cos (« — v), display in matrix form the relationship between the differentials dx, dy and d«, dv and between d«, dy and dp, dq. Use matrix multiplication to express directly the relationship between the differentials dx, dy and dp, dq. 9-69 Justify the matrix equations (9-55) and (9-56). 9-70 Verify that the square matrix in (9-55) is an orthogonal matrix. 9-71 Perform the calculations required in (9-59) to give the transformed stress tensor components (9-60). 
Functions of a complex variable

10-1 Sequences of complex numbers and limits

When considering a definition of a sequence $\{z_n\}$ of complex numbers, we should first examine to what extent the work of Chapter 3 on sequences of real numbers is still relevant to complex sequences. It will obviously be necessary to formulate new definitions, and this will be our next task. However, since a sequence $\{u_n\}$ of real numbers is just a special case of a sequence $\{z_n\}$ of complex numbers, any new definitions must be compatible with the corresponding situations in Chapter 3 when related to real sequences. Therefore, the behaviour of sequences of complex numbers will be directly determined by the behaviour of the sequences of real numbers that may be formed by considering separately the real and imaginary parts of $\{z_n\}$. Thus if $z_n = [1 + (1/n)] + i(1/n^2)$ we would need to consider the two real sequences $\{1 + (1/n)\}$ and $\{1/n^2\}$ associated with $\{z_n\}$.

Here we must note that expressions such as 'monotonic', 'finitely oscillating', and 'bounded above' cannot be applied to sequences of complex numbers as they cannot be ordered like the real numbers.

Definition 10-1 (limit of complex sequence) The infinite sequence $\{z_n\}$ of complex numbers $z_n = x_n + iy_n$ will be said to converge or tend to the limit $\gamma = \mu + i\nu$ if, and only if, for every $\varepsilon > 0$ there exists a number $N$, such that for $n > N$, $|\gamma - z_n| < \varepsilon$. When the sequence $\{z_n\}$ is convergent to $\gamma$ in this sense we shall write $\lim_{n\to\infty} z_n = \gamma$.

This definition is easily seen to reduce to Definition 3-3 when applied to a sequence of real numbers, for then the complex modulus and the absolute value become identical in meaning. The essential difference between Definitions 3-3 and 10-1 is embodied in the following theorem.
Theorem 10-1 (conditions for convergence) Let $\{z_n\}$ be an infinite sequence of complex numbers $z_n = x_n + iy_n$. Then necessary and sufficient conditions for $\lim_{n\to\infty} z_n = \gamma$, where $\gamma = \mu + i\nu$, are that $\lim_{n\to\infty} x_n = \mu$ and $\lim_{n\to\infty} y_n = \nu$.

Proof A paraphrase of this theorem would be that if $\{z_n\}$ converges to $\gamma$, then the sequence of the real parts of $\{z_n\}$ converges to the real part of $\gamma$ and the sequence of the imaginary parts of $\{z_n\}$ converges to the imaginary part of $\gamma$.

To establish the necessity of the conditions of the theorem suppose that for some positive number $\varepsilon$, $|\gamma - z_n| < \varepsilon$ for $n > N$. Then $|\gamma - z_n| = |\mu + i\nu - (x_n + iy_n)| = |(\mu - x_n) + i(\nu - y_n)|$, and so by the definition of the modulus of a complex number, $|\gamma - z_n| = [(\mu - x_n)^2 + (\nu - y_n)^2]^{1/2}$. Neglecting first the non-negative term $(\mu - x_n)^2$, and then the non-negative term $(\nu - y_n)^2$, shows that $|\gamma - z_n| \ge |\nu - y_n|$ and $|\gamma - z_n| \ge |\mu - x_n|$. Hence $|\mu - x_n| < \varepsilon$ and $|\nu - y_n| < \varepsilon$ for $n > N$ showing, by virtue of Definition 3-3, that $\lim_{n\to\infty} x_n = \mu$ and $\lim_{n\to\infty} y_n = \nu$.

The sufficiency of these conditions is almost immediate. If $\lim x_n = \mu$ and $\lim y_n = \nu$, then for any positive $\varepsilon$ choose $N$ such that $|\mu - x_n| < \varepsilon$ and $|\nu - y_n| < \varepsilon$ for $n > N$. Then, as $|\gamma - z_n| = [(\mu - x_n)^2 + (\nu - y_n)^2]^{1/2}$, it follows that $|\gamma - z_n| < \sqrt{2\varepsilon^2} = \varepsilon\sqrt{2}$. This establishes our result because $\varepsilon$ was arbitrary and so $|\gamma - z_n|$ can always be made arbitrarily small by a suitable choice of $\varepsilon$.

The fact that a sequence of real numbers can only have one limit implies the uniqueness of $\mu$ and $\nu$, and hence the uniqueness of $\gamma$. Consequently we have arrived at the following result.

Corollary 10-1 If the sequence $\{z_n\}$ of complex numbers is convergent, then it only has one limit.

Example 10-1 Examine each of the following sequences $\{z_n\}$ of complex numbers for convergence and, where appropriate, determine the limit.
(a) z„ = I 1 + -I + *' sin y 2 (c) z n = n sin - + iV n W( n + 6) — \/n]. n Solutions We shall obtain our results by means of a direct application of Theorem 10-1. (a) Making the identifications 1 mr x n = 1 + - y n = sin — n 2 we see that limx» = 1, whereas the sequence {y n } has no limit since y n n— *-oo assumes successively only the three values 1, 0, and — 1. Hence the sequence {z n } does not converge and so has no limit. (b) Making the identifications 2n + 1 /n-\Y -W x n - -3^-, y we see that lim x n = f and lim y n = 1 . n—*oo n~*<x> Hence the sequence {z„} converges and lim z n = f + /. n— *-oo As the numbers f and 1 are not members of their defining sequences {x n } and {y n }, the complex limit y is not included as a member of the sequence (c) Make the identifications . 2 x n = n sin -, j„ = \/nW(n + 6) — \/n]. Then lim ;r„ = 2 SEC 10-1 SEQUENCES OF COMPLEX NUMBERS AND LIMITS / 447 and lim y„ = lim \/n . \/n = lim« 1 + = 3. Thus the sequence {z n } converges and lim z n = 2 + 3i. n-*oo For the same reason as in (b) above, the limit 2 + 3/ is not a member of the sequence {z n }. Arguments essentially similar to those given in Theorem 101 establish results from complex sequences that are strictly analogous to those of Theorem 3-1. We state them below without proof. theorem 10-2 If it can be shown that {w„} and {z„} are two convergent sequences of complex numbers with lim w n = X and lim z n = y, then n— f oo n— *■ oo (a) wi + z\, wz + Z2, wz + Z3, . . . is a sequence such that lim(w„ + z„) = X + y; n—*oo (b) w\Z\, W2Z2, W3Z3, ... is a sequence such that lim w n z n = Xy ; n— *oo (c) provided y =£ 0, w\\z\, wzjzz, W3/Z3, ... is a sequence such that Km (5) - X,r. Example 10-2 If w„ = [«(1 + /)/(« + 1)] and z n = (1/n) + [(n 2 + 1)/ (2« 2 + 3)]/, find (a) lim (w n + z„); (b) lim (w n z n ); and (b) lim (w n jz n ). n-*oo n-*oo n—>oo Solution By inspection we have lim w„ = 1 + i and lim z» = \i. 
Hence by Theorem 10-2,

(a) $\lim_{n\to\infty} (w_n + z_n) = (1 + i) + \tfrac{1}{2}i = 1 + \tfrac{3}{2}i$;

(b) $\lim_{n\to\infty} (w_n z_n) = (1 + i)\,\tfrac{1}{2}i = \tfrac{1}{2}(i - 1)$;

(c) $\lim_{n\to\infty} (w_n / z_n) = (1 + i)/(\tfrac{1}{2}i) = 2 - 2i$.

If the terms of a sequence are plotted as points in the complex plane, Theorem 10-1 and its Corollary imply that when that sequence is convergent, it will have the property that with increasing $n$ its points will cluster ever closer to the one point that represents its limit $\gamma$. In other cases it may happen that although a sequence is not convergent, nevertheless when its terms are plotted in the complex plane they cluster around two or more distinct points. By analogy with sequences of real numbers, these points will be called limit points of the complex sequence, and for their definition they require the notion of a neighbourhood of a point. Accordingly, we shall use the term a neighbourhood of the point $\zeta$ in the complex plane to mean the interior of any circle centred on $\zeta$. This idea enables us to define a limit point.

Definition 10-2 (limit point) The point $\zeta$ will be called a limit point of the sequence $\{z_n\}$ of complex numbers if every neighbourhood of $\zeta$ contains at least one point of $\{z_n\}$ other than the point $\zeta$ itself.

It is an immediate consequence of this definition that every neighbourhood of a limit point $\zeta$ of $\{z_n\}$ contains an infinite number of points of $\{z_n\}$. We again emphasize that Theorem 10-1 together with its Corollary imply that a convergent sequence $\{z_n\}$ of complex numbers can have only one limit point.

Example 10-3 Identify the limit points of the sequence $\{z_n\}$ where $z_n = \left(2 - \frac{1}{n}\right) + i(-1)^n\left(1 + \frac{1}{n}\sin\frac{n\pi}{2}\right)$.

Solution Make the identifications $x_n = 2 - \frac{1}{n}$ and $y_n = (-1)^n\left(1 + \frac{1}{n}\sin\frac{n\pi}{2}\right)$. Then $\{x_n\}$ converges to the limit 2 and thus has one limit point, whilst $\{y_n\}$ does not converge but has the two limit points 1 and $-1$. Hence the sequence $\{z_n\}$ has the two limit points $2 + i$ and $2 - i$.
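The clustering described in Example 10-3 can also be seen numerically. The following is a minimal sketch, assuming the sequence of that example is $z_n = (2 - 1/n) + i(-1)^n[1 + (1/n)\sin(n\pi/2)]$; the function name `z` and the particular indices printed are illustrative choices, not part of the text.

```python
import math

def z(n):
    """n-th term of the sequence assumed for Example 10-3."""
    x = 2 - 1 / n
    y = (-1) ** n * (1 + math.sin(n * math.pi / 2) / n)
    return complex(x, y)

# Even-indexed terms should cluster near 2 + i, odd-indexed terms near 2 - i.
for n in (10, 11, 1000, 1001, 10**6, 10**6 + 1):
    target = complex(2, 1) if n % 2 == 0 else complex(2, -1)
    print(n, z(n), abs(z(n) - target))
```

The printed distances shrink for both the even and the odd subsequence, which is consistent with the sequence having two limit points and therefore no limit.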
10-2 Curves and regions

The notions of a curve and a region in the real plane may be immediately extended to the complex plane. As a closed and not necessarily smooth curve is a connected set of points which serves to delimit two areas of the plane, which we shall call the interior and exterior regions relative to that curve, we ought first to define a curve $C$ in the complex plane. It is frequently convenient to give a parametric representation by expressing $C$ as the set of points

$z = x(s) + iy(s)$ for $a \le s \le b$, (10-1)

where $x(s)$ and $y(s)$ are continuous real functions of the parameter $s$. It should be apparent from Section 2-5 and subsequent work that the requirement of continuity for the real functions $x(s)$, $y(s)$ will ensure that $C$ is a continuous curve (that is, unbroken), but that it does not necessarily possess a tangent at every point. As a simple illustration $C$ might be a rectangle, for then tangents would not be defined at the corners though the curve would be continuous everywhere. We shall return to these general matters later when a continuous function of a complex variable has been defined. For conciseness let us henceforth call such curves $C$ continuous curves.

For a less trivial example, suppose that the curve $C$ in the complex plane is defined by $z = x(s) + iy(s)$, where

$x(s) = \sin s$ for $-\tfrac{1}{2}\pi \le s \le \tfrac{3}{2}\pi$,

$y(s) = \sin^2 s$ for $-\tfrac{1}{2}\pi \le s \le \tfrac{1}{2}\pi$, and $y(s) = 1$ for $\tfrac{1}{2}\pi \le s \le \tfrac{3}{2}\pi$.

Fig. 10-1 Continuous curve C having no tangent defined at P and Q.

Then it is readily seen that $C$ is the continuous closed curve comprising the parabola $y = x^2$ in the interval $-1 \le x \le 1$, together with the points of the line $y = 1$ common to that same interval. The curve $C$ is shown in Fig. 10-1 and it is continuous everywhere, though it is not smooth everywhere for no tangent can be defined at points P and Q. The darkly shaded area in that
The darkly shaded area in that 450 / FUNCTIONS OF A COMPLEX VARIABLE CH 10 Figure comprises points which are interior relative to C and form the interior region, whilst the lightly shaded area comprises points which are exterior relative to C and form the exterior region. When speaking in terms of regions, the points comprising the curve C itself are usually called the boundary points and they may, or may not, belong to a region. A parametric representation of a curve C is not always the most convenient method for its description in the complex plane and, on occasions, it is better to identify the points z comprising a curve directly in terms of z itself. When necessary, regions are usually defined in the complex plane by means of a combination of curves and inequalities, as was done in the real plane. Example 10-4 Describe the curve C defined by the equation U-2|=| and use the result to define the region exterior to C. Solution This expression defines a connected set of points that all have a modulus 3/2 relative to the point z = 2 as origin, that is to say, the set of points which are all distant 3/2 from the point z = 2. Hence the equation | z — 2 | =3/2 describes a circle C of radius 3/2 centred on the point z = 2. Algebraically, the same result is obtained by writing z = x + iy, when | z — 2 | = | (x — 2) + iy |, so that from the definition of the modulus of a complex number, | z — 2 | = 3/2 is seen to be equivalent to the algebraic equation (x — 2) 2 + y 2 = 9/4. This is a circle of radius 3/2 centred on the point (2, 0). The region exterior to C is the entire complex plane less the points lying in and on this circle. Example 10-5 Describe the region interior to and including the curve C defined by arg (z - 1) - arg (z - /) = \n, and also satisfying the inequalities i < Re z < f and Im z > 0. Solution Consider the construction in Fig. 10-2 (a) in which P is the point z = 1, Q is the point z = i and R is a general point z. 
Simple geometrical arguments then establish that the angle y is related to the angles a and /? by the equation y = 77 + a — /S. However, the line PR is the vector z — 1, whilst the line QR is the vector z — /, so that arg (z — ;') = a and arg (z — 1) = /?. Since by the conditions of the problem we must have /3 — ot = \-n, it follows that y = \tt. The angle QRP is thus a right angle and hence the curve C must be a semi-circle drawn SEC 10-2 CURVES AND REGIONS / 451 A k y Complex plane f) <^f\<l M /HA p x> o Complex plane \ I H (a) i I I (b) Fig. 10-2 Region in complex plane: (a) boundary curve; (b) region interior to C and satisfying stated inequalities. from P to Q with PQ as its diameter. The semi-circle must lie above the diameter PQ, since were.the general point R to be taken below that line the equation relating the arguments would no longer be satisfied. To define the lower semi-circle the following condition would be needed : arg - 1) - arg (z - = -\n. To complete the solution to the problem it is now necessary to interpret the inequalities. The inequality \ < Re z < f describes the narrow strip bounded by the lines x = J and x = f , with the points of the line x = J excluded from consideration. The inequality Im z > is the half plane above and including the x-axis itself. Figure 10-2 (b) presents a composite diagram with the shaded area representing the region satisfying all the conditions of the problem. Boundary points belonging to the region are indicated by a heavy line and those excluded by a dotted line. Notice from this and the previous example that there is more than one way of specifying a given curve and region. The condition arg (z - 1) - arg (z - i) = \-n is an alternative expression of the condition \z-\-¥\ = ^Y with Re z > 0, Im z > 0, which, in turn, is an alternative expression of the algebraic condition (x - W + (y-h) 2 = h with x > 0, y > 0. 
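The equivalence between the argument condition and the circle condition in Example 10-5 can be checked numerically. The sketch below assumes the circle in question has centre $\tfrac{1}{2} + \tfrac{1}{2}i$ and radius $1/\sqrt{2}$, so that it passes through $z = 1$ and $z = i$; the helper names `arg_difference` and `on_circle`, and the sample angles, are hypothetical choices made only for illustration.

```python
import cmath
import math

def arg_difference(z):
    """arg(z - 1) - arg(z - i), wrapped into the interval (-pi, pi]."""
    d = cmath.phase(z - 1) - cmath.phase(z - 1j)
    while d > math.pi:
        d -= 2 * math.pi
    while d <= -math.pi:
        d += 2 * math.pi
    return d

centre = 0.5 + 0.5j          # assumed centre of the circle through 1 and i
radius = 1 / math.sqrt(2)    # assumed radius

def on_circle(theta):
    """Point of the circle at polar angle theta measured about the centre."""
    return centre + radius * cmath.exp(1j * theta)

# Arc from P (theta = -pi/4) to Q (theta = 3*pi/4) passing through 1 + i.
upper = [on_circle(t) for t in (-0.5, 0.0, math.pi / 4, 1.0, 2.0)]
# Remaining arc, which passes through the origin.
lower = [on_circle(t) for t in (3.0, 3.5, 4.0, 4.5, 5.0)]

for z in upper + lower:
    print(z, arg_difference(z))
```

On the first arc the printed difference is $\tfrac{1}{2}\pi$ and on the second it is $-\tfrac{1}{2}\pi$, in agreement with the remark in the solution about the lower semi-circle.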
10-3 Function of a complex variable, limits, and continuity

In Chapter 2 we used the term 'a real valued function of a real variable' to mean any rule that associates with each real number from the domain of definition of the function a unique real number from the range of that function. Symbolically, if $D$ denotes the set of points in the domain of a function $f$, and $R$ denotes the set of points in the range of $f$, this relationship or mapping is given by $R = f(D)$. These ideas still hold good when the domain $D$ and the range $R$ include complex numbers. Thus if $z$ is any point in $D$, and $w$ is the unique number assigned to $z$ by the function $f$, we write

$w = f(z)$. (10-2)

The number $z = x + iy$ is allowed to assume any value in $D$ and so, if desired, could be called a complex independent variable, when $w$ could then properly be called a complex dependent variable. Usually we shall simply refer to $z$ and $w$ as complex variables. It must be appreciated that, like $z$, the variable $w$ has a real part and an imaginary part, both of which are in general dependent on $x$ and $y$ through the variable $z = x + iy$. We summarize these ideas formally as follows.

Definition 10-3 (function of a complex variable) We shall say that $f$ is a function of the complex variable $z = x + iy$, and write $w = f(z)$, if $f$ associates a unique complex number $w = u + iv$ with each complex number $z$ belonging to some region $D$ of the complex plane.

Specific examples of functions of a complex variable are: (a) $w = iz + 1$; (b) $w = z\bar{z}$; (c) $w = z^2 + 2z + 1$; (d) $w = 1/(z - 2)$; (e) $w = \sin z$. With the exception of (d), which is not defined for $z = 2$, these functions are defined for all $z$. The difference between a function of a complex variable and a real valued function of a real variable is made clear by expressing these examples in real and imaginary form.
Thus writing $z = x + iy$ and $w = u + iv$ we find:

(a) $w = i(x + iy) + 1 = (1 - y) + ix$, showing that $u = 1 - y$, $v = x$;

(b) $w = (x + iy)(x - iy) = x^2 + y^2$, showing that $u = x^2 + y^2$, $v = 0$. This is an example of a function that always maps a complex variable into a real variable.

(c) $w = (x + iy)^2 + 2(x + iy) + 1 = (x^2 + 2x - y^2 + 1) + i(2y + 2xy)$, showing that $u = x^2 + 2x - y^2 + 1$, $v = 2y(1 + x)$;

(d) $w = 1/(x + iy - 2) = [(x - 2) - iy]/(x^2 + y^2 - 4x + 4)$, showing that $u = (x - 2)/(x^2 + y^2 - 4x + 4)$, $v = -y/(x^2 + y^2 - 4x + 4)$, provided only that $x = 2$ and $y = 0$ do not hold together, that is, provided $z \ne 2$;

(e) $w = \sin z = \sin (x + iy) = \sin x \cos iy + \cos x \sin iy$, and so using the results of Problem 6-33, that $\cos iy = \cosh y$, $\sin iy = i \sinh y$, we arrive at $w = \sin x \cosh y + i \cos x \sinh y$. Thus in this case $u = \sin x \cosh y$, $v = \cos x \sinh y$.

Any function of $x$, $y$ and complex constants that gives rise to a unique complex number when $x$ and $y$ are specified defines a function of the complex variable $z$ by virtue of the relationship $z = x + iy$. For suppose that $(x + y + 1) + i(x - 2y) = f(z)$, then to determine $f(z)$ when $z = 1 + 2i$ we simply write $x + iy = 1 + 2i$, showing that $x = 1$, $y = 2$, after which it follows from the form of $f(z)$ that $f(1 + 2i) = 4 - 3i$.

Our Definition 10-1 of a limit of a sequence of complex numbers extends without difficulty to include the concept of a limit of a function of a complex variable. In essence, we shall say that $f(z)$ has the limit $w_0$ as $z \to z_0$ and will write

$\lim_{z \to z_0} f(z) = w_0$ (10-3)

when, for any small $\varepsilon > 0$, we can always ensure that $|f(z) - w_0| < \varepsilon$ by confining $z$ to some suitably small circular neighbourhood $|z - z_0| < \delta$ of the point $z_0$. That is to say $f(z)$ can be made arbitrarily close to $w_0$ by taking $z$ sufficiently close to $z_0$, irrespective of the manner of approach of $z$ to $z_0$. As in the real variable case, we do not require that $f(z)$ be defined at $z_0$ or, if it is, that $f(z_0)$ should equal $w_0$.
Expressed formally this becomes:

Definition 10-4 (limit of a function of a complex variable) The function $f(z)$ will be said to tend to the limit $w_0$ as $z \to z_0$, and we shall write $\lim_{z \to z_0} f(z) = w_0$, if, and only if, for any $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(z) - w_0| < \varepsilon$ when $|z - z_0| < \delta$ with $z \ne z_0$.

This form of statement should be compared with that in Definition 3-8 relating to a real valued function of two real variables. There is no essential difference, since the complex modulus is equal to the distance function $\rho$ used in that definition.

Example 10-6 Prove that $\lim_{z \to 1+i} (z^2 + 1) = 1 + 2i$.

Solution The result is self-evident since the function $f(z) = z^2 + 1$ is uniquely defined for all $z$, but let us prove it using Definition 10-4. We have

$|f(z) - (1 + 2i)| = |z^2 - 2i| = |[z - (1 + i)][z + (1 + i)]|$

and from the properties of the modulus this becomes

$|f(z) - (1 + 2i)| = |z - 1 - i| \cdot |z + 1 + i| = |z - 1 - i| \cdot |z - 1 - i + 2(1 + i)| \le |z - 1 - i| \{ |z - 1 - i| + 2|1 + i| \}$.

Hence we may make $|f(z) - (1 + 2i)| < \varepsilon$, where $\varepsilon > 0$ is arbitrarily small, provided that we choose the number $\delta > 0$ such that $|z - 1 - i| < \delta$ and $\delta\{\delta + 2|1 + i|\} < \varepsilon$. The conditions of Definition 10-4 are satisfied, thereby establishing that $1 + 2i$ is the limit. In other words, as $z$ approaches the value $1 + i$, so the function $f(z) = z^2 + 1$ approaches the number $1 + 2i$, which is its limit. In this case it also happens to be true that $\lim_{z \to z_0} f(z) = f(z_0)$.

Example 10-7 Prove that $\lim_{z \to 2i} \left(\frac{z^2 + 4}{z - 2i}\right) = 4i$.

Solution Unlike the previous situation, the function $f(z) = (z^2 + 4)/(z - 2i)$ is not defined when $z = 2i$. To establish the desired result we notice that

$|f(z) - 4i| = \left|\frac{z^2 + 4}{z - 2i} - 4i\right| = \left|\frac{z^2 - 4iz - 4}{z - 2i}\right| = \left|\frac{(z - 2i)^2}{z - 2i}\right| = |z - 2i|$.

Thus we can ensure that $|f(z) - 4i| < \varepsilon$ by taking $|z - 2i| < \delta$, where here $\delta = \varepsilon$.
The conditions of Definition 10-4 are satisfied, and thus we have established that $\lim_{z \to 2i} (z^2 + 4)/(z - 2i) = 4i$, despite the fact that the function $f(z) = (z^2 + 4)/(z - 2i)$ is not defined at $z = 2i$.

The results of Theorem 10-2 generalize to give limit theorems for functions of a complex variable.

Theorem 10-3 (operations on limits of complex functions) If $f(z)$ and $g(z)$ are two complex functions for which $\lim_{z \to z_0} f(z) = v_0$ and $\lim_{z \to z_0} g(z) = w_0$, then

(a) $\lim_{z \to z_0} [f(z) + g(z)] = v_0 + w_0$;

(b) $\lim_{z \to z_0} f(z)\,g(z) = v_0 w_0$;

(c) provided $w_0 \ne 0$, $\lim_{z \to z_0} [f(z)]/[g(z)] = v_0/w_0$.

The proofs of these results follow directly from Definition 10-4 and are left to the reader.

Example 10-8 Apply the results of Theorem 10-3 to the functions $f(z) = z^2 + 2z + 1$ and $g(z) = 1 - iz$ to determine the limits of $f(z) + g(z)$, $f(z)\,g(z)$, and $f(z)/g(z)$ as $z \to i$.

Solution The functions $f(z)$ and $g(z)$ are defined for all $z$ and so it is easily seen that $\lim_{z \to i} f(z) = \lim_{z \to i} (z^2 + 2z + 1) = 2i$ and $\lim_{z \to i} g(z) = \lim_{z \to i} (1 - iz) = 2$. These results, which have been obtained by direct substitution, may be verified by using Definition 10-4, as in Example 10-6. Results (a), (b), and (c) of Theorem 10-3 may thus be applied to yield:

(a) $\lim_{z \to i} [f(z) + g(z)] = 2(1 + i)$;

(b) $\lim_{z \to i} f(z)\,g(z) = 4i$;

(c) as $\lim_{z \to i} g(z) = 2 \ne 0$, $\lim_{z \to i} f(z)/g(z) = i$.

It is now a simple step to extend the idea of continuity for, as with real valued functions of a real variable (cf. Definition 3-9), we shall say that the function $f(z)$ is continuous at $z_0$ if $\lim_{z \to z_0} f(z) = w_0$ exists and $f(z_0) = w_0$. We thus arrive at the following statement.

Definition 10-5 (continuity of a function of a complex variable) The complex function $f(z)$ will be said to be continuous at $z_0$ if: (a) $\lim_{z \to z_0} f(z) = w_0$ exists, and (b) $f(z_0) = w_0$. A complex function will be said to be continuous in a region of the complex plane if it is continuous at all points of that region.
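As a numerical illustration of Theorem 10-3, and of the requirement that a limit should not depend on the manner of approach, the limits of Example 10-8 may be sampled close to $z = i$ along several directions. This is only a sketch: the step length `step` and the set `directions` are arbitrary illustrative choices, and sampling a few directions suggests, but of course does not prove, the existence of a limit.

```python
def f(z):
    return z * z + 2 * z + 1

def g(z):
    return 1 - 1j * z

z0 = 1j                      # the point approached in Example 10-8
step = 1e-6                  # illustrative small distance from z0
directions = (1, -1, 1j, -1j, (1 + 1j) / abs(1 + 1j))

samples = []
for d in directions:
    z = z0 + step * d
    samples.append((f(z) + g(z), f(z) * g(z), f(z) / g(z)))
    print(d, samples[-1])
```

Each row printed should lie close to the triple $2 + 2i$, $4i$, $i$ found in the example, whichever direction is used.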
Example 10-9 Prove that the function $f(z) = a + bz$ is continuous everywhere.

Solution If $z_0$ is any complex number we have $|f(z) - f(z_0)| = |a + bz - a - bz_0| = |b|\,|z - z_0|$, so that for any $\varepsilon > 0$, $|f(z) - f(z_0)| < \varepsilon$ if $|z - z_0| < \delta$ provided that we take $\delta = \varepsilon/|b|$ (when $b = 0$ the difference is identically zero and any $\delta$ will serve). We have proved that $\lim_{z \to z_0} f(z) = a + bz_0$, which is condition (a) of Definition 10-5. Condition (b) is obviously true as $f(z_0) = a + bz_0$ for all $z_0$. As $z_0$ was arbitrary, it follows that we have proved the required property of continuity for $f(z)$. Notice that by first setting $b = 0$ and then setting $a = 0$, $b = 1$, the continuity of the functions $f(z) = a$ (constant) and $f(z) = z$ follow as special cases.

Example 10-10 Prove that the function $f_n(z) = z^n$, where $n$ is a positive integer, is continuous for all $z$.

Solution The proof is by induction. In the previous example we proved as a special case the continuity of $f_1(z) = z$. If we assume that $f_m(z)$ is continuous, then since $f_{m+1}(z) = z^{m+1} = z \cdot z^m = f_1(z)\,f_m(z)$, it follows directly from Theorem 10-3 (b) that $f_{m+1}(z)$ is continuous. Thus if $P(m)$ is the property that $f_m(z)$ is continuous, we have proved directly that $P(1)$ is true and also that if $P(m)$ is true, then so also is $P(m + 1)$. Hence it follows by induction that $P(m)$ is true for all $m$, which establishes our result.

Further use of Definition 10-5 coupled with Theorem 10-3 makes it a straightforward matter to establish many other important and useful results concerning continuity. Typical of results that follow from such reasoning are that a complex polynomial $P(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n$ is continuous everywhere, whilst a complex rational function

$R(z) = \frac{a_0 + a_1 z + a_2 z^2 + \cdots + a_m z^m}{b_0 + b_1 z + b_2 z^2 + \cdots + b_n z^n}$

is continuous everywhere except at the $n$ zeros of the denominator.

It is interesting to give an alternative proof of the continuous nature of a polynomial $P(z)$.
As $z = x + iy$, it follows that we may express $P(z)$ in the form $P(z) = Q_1(x, y) + iQ_2(x, y)$, where $Q_1(x, y)$ and $Q_2(x, y)$ are real polynomial functions each with general terms of the form $x^s y^t$ in which $s$, $t$ are either zero or positive integers. Now from the behaviour of real functions of two real variables we know that $Q_1(x, y)$ and $Q_2(x, y)$ must be continuous functions of $x$ and $y$ everywhere in the plane. However, if $z_1$ and $z_2$ are any two points with $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$, then

$|P(z_2) - P(z_1)| = |Q_1(x_2, y_2) - Q_1(x_1, y_1) + i[Q_2(x_2, y_2) - Q_2(x_1, y_1)]| \le |Q_1(x_2, y_2) - Q_1(x_1, y_1)| + |Q_2(x_2, y_2) - Q_2(x_1, y_1)|$.

Now as $Q_1(x, y)$ and $Q_2(x, y)$ are continuous, it is true that

$\lim_{(x_2, y_2) \to (x_1, y_1)} Q_1(x_2, y_2) = Q_1(x_1, y_1)$ and $\lim_{(x_2, y_2) \to (x_1, y_1)} Q_2(x_2, y_2) = Q_2(x_1, y_1)$,

and so $|P(z_2) - P(z_1)|$ may be made arbitrarily small by taking $z_2$ sufficiently close to $z_1$. This proves our assertion of the continuity of $P(z)$ for all $z$, since $z_1$, $z_2$ were arbitrary points in the complex plane.

Obvious extensions of the other continuity theorems proved for real variables are also possible and the most useful ones are summarized below without further proof.

Theorem 10-4 (continuity theorem for complex functions) If $f(z)$ and $g(z)$ are two complex functions each continuous at $z = z_0$, then

(a) $f(z) + g(z)$ is continuous at $z_0$;

(b) $f(z)\,g(z)$ is continuous at $z_0$;

(c) $f(z)/g(z)$ is continuous at $z_0$ provided $g(z_0) \ne 0$;

(d) if $f(w)$ is continuous at $w = w_0$, and $w = g(z)$ is continuous at $z = z_0$, with $w_0 = g(z_0)$, then the composite function (function of a function) $f[g(z)]$ is continuous at $z = z_0$.

It is, for example, condition (d) of this theorem that validates the assertion that $(z^2 + 3z + 2)^3$ is continuous everywhere. (Why?)
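One possible line of reasoning for the question just posed, offered only as a sketch and taking for granted the continuity of polynomials discussed above, is to exhibit the function as a composite of two polynomials. The names $g$ and $f$ below are simply those used in part (d) of Theorem 10-4:

\[
g(z) = z^2 + 3z + 2, \qquad f(w) = w^3, \qquad f[g(z)] = (z^2 + 3z + 2)^3 .
\]

For any point $z_0$, the polynomial $g$ is continuous at $z_0$ and the polynomial $f$ is continuous at $w_0 = g(z_0)$, so part (d) gives the continuity of $f[g(z)]$ at $z_0$. As $z_0$ was arbitrary, the composite function is continuous everywhere.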
10-4 Derivatives - Cauchy-Riemann equations

Thus far the related concepts of a limit, a function, and of continuity have been successfully extended to include a function of a complex variable. It is now reasonable to attempt to generalize the notion of a derivative, and at this point we encounter a major dissimilarity between a function of a complex variable and a real valued function of two real variables. Indeed, whereas we have already seen that most real valued functions of two real variables are partially differentiable with respect to those variables, it will shortly be shown that the operation of differentiation can only be defined for a very special class of complex functions.

Before discovering the exact nature of the restriction on a complex function if it is to be differentiable, we must extend our definition of a derivative in a manner compatible with the real variable case.

Definition 10-6 (Derivative of a complex function) Let $w = f(z)$ be defined in some neighbourhood of the point $z = z_0$ and let $|h|$ be sufficiently small for $z = z_0 + h$ to lie within this neighbourhood. Then, if the difference quotient

$$\frac{f(z_0 + h) - f(z_0)}{h}$$

tends to the limit $\gamma$ as $|h| \to 0$, we shall call $\gamma$ the derivative of $f(z)$ at $z_0$ and will write either

$$f'(z_0) = \gamma \quad\text{or}\quad \left.\frac{dw}{dz}\right|_{z = z_0} = \gamma.$$

If this difference quotient has a limit for all points $z_0$ of some region in which $w = f(z)$ is defined, then $f(z)$ will be said to be differentiable in that region. The derivative, as a function of a general point $z$, will be denoted either by $f'(z)$ or $dw/dz$.

Alternatively expressed, this definition asserts that the complex number $\gamma$ is the derivative of $w = f(z)$ at $z = z_0$ if, for every $\varepsilon > 0$, there exists a $\delta$ such that

$$\left|\frac{f(z_0 + h) - f(z_0)}{h} - \gamma\right| < \varepsilon \quad\text{for}\quad 0 < |h| < \delta.$$
Notice that although $h$ is small when $|h|$ is small, the condition $|h| \to 0$ that is imposed in our definition of a derivative requires the limit defining the derivative to exist for all possible methods of approach of $h$ towards zero. This means that if the derivative is to exist, then it must be independent of the manner in which $h \to 0$. This is a vitally important feature of the definition and one to which we shall return.

Example 10-11 Prove that if $n > 0$ is an integer and $w = z^n$, then

$$\frac{dw}{dz} = nz^{n-1}$$

for all $z$.

Solution Consider some point $z_0$ and form the difference quotient

$$\frac{(z_0 + h)^n - z_0^n}{h},$$

where $h$ is any complex number. Then by the binomial theorem

$$\frac{(z_0 + h)^n - z_0^n}{h} = \frac{z_0^n + nhz_0^{n-1} + \dfrac{n(n-1)}{2!}h^2 z_0^{n-2} + \cdots + h^n - z_0^n}{h}$$

and thus

$$\frac{(z_0 + h)^n - z_0^n}{h} = nz_0^{n-1} + \frac{n(n-1)}{2!}hz_0^{n-2} + \cdots + h^{n-1}.$$

Now as $|h| \to 0$ implies $h \to 0$, taking the limit of this expression as $|h| \to 0$ we arrive at the derivative of the function $w = z^n$ at the point $z_0$:

$$\lim_{|h| \to 0}\left[\frac{(z_0 + h)^n - z_0^n}{h}\right] = nz_0^{n-1}.$$

Since the point $z_0$ was arbitrary this result is true for all $z_0$, and so the function is differentiable for all $z$ and

$$\frac{d(z^n)}{dz} = nz^{n-1}.$$

A more subtle argument shows that this result is, in fact, true for any value of $n$ and not just for $n$ a positive integer.

Example 10-12 Prove that if $w = \sin z$, then

$$\frac{dw}{dz} = \cos z$$

for all $z$.

Solution Let $z_0$ be any value of $z$ and form the difference quotient

$$\frac{\sin(z_0 + h) - \sin z_0}{h},$$

where $h$ is any complex number. Then using a familiar trigonometric identity we have

$$\frac{\sin(z_0 + h) - \sin z_0}{h} = \frac{\sin z_0 \cos h + \cos z_0 \sin h - \sin z_0}{h} = \cos z_0\left(\frac{\sin h}{h}\right) - \sin z_0\left(\frac{1 - \cos h}{h}\right).$$

Now $w = f(z)$ will be differentiable if the limit of the right-hand side of this expression can be shown to exist. This is most easily done by utilizing the formal power series expansions for the sine and cosine functions, which show that

$$\frac{\sin h}{h} = \frac{1}{h}\left[h - \frac{h^3}{3!} + \frac{h^5}{5!} - \cdots\right] = 1 - \frac{h^2}{3!} + \frac{h^4}{5!} - \cdots$$

and

$$\frac{1 - \cos h}{h} = \frac{1}{h}\left[1 - \left(1 - \frac{h^2}{2!} + \frac{h^4}{4!} - \cdots\right)\right] = \frac{h}{2!} - \frac{h^3}{4!} + \cdots.$$

It is clear from these that because $|h| \to 0$ implies $h \to 0$, then

$$\lim_{|h| \to 0}\left(\frac{\sin h}{h}\right) = 1, \qquad \lim_{|h| \to 0}\left(\frac{1 - \cos h}{h}\right) = 0.$$

Returning to our problem, taking the limit of the difference quotient as $|h| \to 0$ and using the above limits gives for the derivative of $w = \sin z$ at the point $z_0$ the result

$$\lim_{|h| \to 0}\left[\frac{\sin(z_0 + h) - \sin z_0}{h}\right] = \cos z_0.$$

Once again, as $z_0$ was arbitrary, we have shown that $w = \sin z$ is differentiable for all $z$, so we may write

$$\frac{d}{dz}(\sin z) = \cos z.$$

Alternative derivations of the two limits involved in this example are indicated in Problems 10-22 and 10-23.

The following theorem is an obvious extension of Theorems 5-4 to 5-8 relating to the real variable case.

Theorem 10-5 (Rules of differentiation) If $f$, $g$ are differentiable functions in some region, then throughout that region:
(a) $\dfrac{d}{dz}[f(z) + g(z)] = \dfrac{df}{dz} + \dfrac{dg}{dz}$ (Derivative of Sum);
(b) $\dfrac{d}{dz}[f(z)\,g(z)] = f(z)\dfrac{dg}{dz} + g(z)\dfrac{df}{dz}$ (Derivative of Product);
(c) $\dfrac{d}{dz}\left[\dfrac{f(z)}{g(z)}\right] = \dfrac{g(z)(df/dz) - f(z)(dg/dz)}{[g(z)]^2}$ provided $g(z) \neq 0$ (Derivative of Quotient);
(d) $\dfrac{d}{dz}\{f[g(z)]\} = f'[g(z)]\,g'(z)$ or, by writing $u = g(z)$, this takes the form $\dfrac{d}{dz}\{f[g(z)]\} = \dfrac{df}{du}\dfrac{du}{dz}$ (Chain Rule);
(e) $f(z)$ and $g(z)$ are continuous functions of $z$ (Differentiability implies Continuity).

Proof All these results may be established directly from the definition of a derivative by arguments that are essentially similar to the real variable case. We give the proofs of (a) and (e) as illustrations.

Result (a) follows because

$$\frac{d}{dz}[f(z) + g(z)] = \lim_{|h| \to 0}\left[\frac{f(z + h) + g(z + h) - f(z) - g(z)}{h}\right] = \lim_{|h| \to 0}\left[\frac{f(z + h) - f(z)}{h}\right] + \lim_{|h| \to 0}\left[\frac{g(z + h) - g(z)}{h}\right] = \frac{df}{dz} + \frac{dg}{dz}.$$

Result (e) follows because differentiability of a function $f(z)$ requires the difference quotient $[f(z + h) - f(z)]/h$ to have a limit as $|h| \to 0$, which in turn requires that $|f(z + h) - f(z)| \to 0$ as $|h| \to 0$.
This is just the formal statement that $f(z)$ is continuous and so our assertion is proved.

Example 10-13 Use the derivatives established so far together with Theorem 10-5 to differentiate the functions:
(a) $w = z^2 + 3\sin z$;
(b) $w = z^3 \sin z$;
(c) $w = 1/(1 + z)$;
(d) $w = \sin(z^2 + z + 3)$.

Solution (a) Using Theorem 10-5 (a) with $f(z) = z^2$, $g(z) = 3\sin z$ we obtain

$$\frac{dw}{dz} = 2z + 3\cos z$$

for all $z$.

(b) Using Theorem 10-5 (b) with $f(z) = z^3$, $g(z) = \sin z$ we obtain

$$\frac{dw}{dz} = z^3\cos z + 3z^2\sin z$$

for all $z$.

(c) Using Theorem 10-5 (c) with $f(z) = 1$, $g(z) = 1 + z$ we obtain

$$\frac{dw}{dz} = \frac{-1}{(1 + z)^2}$$

for $z \neq -1$.

(d) Writing $w = \sin u$, where $u = z^2 + z + 3$, enables us to apply the chain rule (Theorem 10-5 (d)):

$$\frac{dw}{dz} = \frac{d}{du}(\sin u)\,\frac{du}{dz} = (\cos u)(2z + 1),$$

whence

$$\frac{dw}{dz} = (2z + 1)\cos(z^2 + z + 3).$$

Let us now explore more carefully the implications of the requirements of differentiability. This is perhaps best prefaced by an illustration of a simple function of a complex variable that is not differentiable. We shall attempt to compute the derivative at $z_0 = 0$ of the function $f(z) = \bar{z}$, where $z = x + iy$. We have $f(z) = x - iy$, from which it follows that $f(0) = 0$, so that in computing the required derivative we are led to consider the behaviour of the difference quotient

$$\frac{f(0 + h) - f(0)}{h} = \frac{f(h) - 0}{h} = \frac{\bar{h}}{h}$$

as $|h| \to 0$. Writing $h = \alpha + i\beta$ this becomes

$$\frac{\bar{h}}{h} = \frac{\alpha - i\beta}{\alpha + i\beta} = \frac{\alpha^2 - \beta^2}{\alpha^2 + \beta^2} - i\left(\frac{2\alpha\beta}{\alpha^2 + \beta^2}\right).$$

Obviously this expression can have no limit as $|h| \to 0$ because the result is dependent on the manner of approach of $h$ to zero. To see this we need take only two special cases:
(a) if $\alpha = 0$ and $\beta \to 0$, then $\lim_{\beta \to 0}(\bar{h}/h) = -1$,
whereas
(b) if $\beta = 0$ and $\alpha \to 0$, then $\lim_{\alpha \to 0}(\bar{h}/h) = 1$.

The limit thus depends on the manner in which $h \to 0$ so that no derivative exists in the sense of Definition 10-6.
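The dependence on the direction of approach can also be seen numerically. The following is a minimal sketch, not part of the original argument: the helper names `difference_quotient` and `quotients_along_rays`, the step size, and the comparison function $z^2$ at the sample point $1 + 2i$ are all assumptions chosen for illustration. It evaluates the difference quotient of Definition 10-6 for increments $h$ of fixed small modulus along eight rays and reports how far apart the resulting values lie.

```python
import cmath

def difference_quotient(f, z0, h):
    """Return (f(z0 + h) - f(z0)) / h for a nonzero complex increment h."""
    return (f(z0 + h) - f(z0)) / h

def quotients_along_rays(f, z0, size=1e-6, n_directions=8):
    """Evaluate the quotient for increments of modulus `size` along several rays."""
    results = []
    for k in range(n_directions):
        angle = 2 * cmath.pi * k / n_directions
        h = size * cmath.exp(1j * angle)
        results.append(difference_quotient(f, z0, h))
    return results

# The conjugate function at the origin, and z**2 at an arbitrary sample point.
conj_values = quotients_along_rays(lambda z: z.conjugate(), 0j)
square_values = quotients_along_rays(lambda z: z * z, 1 + 2j)

# Largest distance between any two of the computed quotients.
spread_conj = max(abs(a - b) for a in conj_values for b in conj_values)
spread_square = max(abs(a - b) for a in square_values for b in square_values)

print("spread for conj(z) at 0:", spread_conj)    # close to 2: no single limit
print("spread for z**2 at 1+2j:", spread_square)  # of the order of `size`
```

For the conjugate function the quotients range over the whole unit circle, so their spread stays near 2 however small the increment is made, whereas for $z^2$ the spread shrinks with the increment.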
Obviously some conditions must be devised such that it is possible to decide, without appeal to Definition 10-6, whether or not a given function $f(z)$ has a unique derivative, that is, whether or not the limit of the difference quotient in Definition 10-6 is independent of the manner in which $h \to 0$.

Consider a function $f(z)$, assumed to be differentiable in some region, and express it in the form

$$f(z) = u + iv, \tag{10-4}$$

where $u$, $v$ are functions of $x$ and $y$ by virtue of the relationship $z = x + iy$. (Cf. the illustrative examples (a) to (e) following Definition 10-3.) Let us now compute the derivative of $f(z)$ and, in doing so, appeal to Fig. 10-3.

Fig. 10-3 Derivative of a complex function.

As $f(z)$ is assumed to be differentiable, we shall choose an arbitrary $h$ as shown in the Figure and allow it to tend to zero along the line $QP$ inclined at an angle $\alpha$ to the $x$-axis. Then if $h = \lambda + i\mu$, it follows that $z + h = x + \lambda + i(y + \mu)$, and so if we also make use of the alternative representation of $h$ in the form $h = |h|e^{i\alpha}$, where $|h| = (\lambda^2 + \mu^2)^{1/2}$, we have

$$f'(z) = \lim_{|h| \to 0}\left[\frac{f(z + h) - f(z)}{h}\right] = \lim_{|h| \to 0}\left[\frac{f[(x + \lambda) + i(y + \mu)] - f(x + iy)}{(\lambda^2 + \mu^2)^{1/2}e^{i\alpha}}\right]$$

or

$$f'(z) = e^{-i\alpha}\lim_{|h| \to 0}\left[\frac{u(x + \lambda, y + \mu) + iv(x + \lambda, y + \mu) - u(x, y) - iv(x, y)}{(\lambda^2 + \mu^2)^{1/2}}\right]. \tag{10-5}$$

As $f(z)$ is assumed to be differentiable, result (10-5) must be independent of the angle $\alpha$.
To see the implications of this let us first consider the real part of the bracketed expression inside the limit (10-5), which is

$$\frac{u(x + \lambda, y + \mu) - u(x, y)}{(\lambda^2 + \mu^2)^{1/2}}. \tag{10-6}$$

By adding to and subtracting from the numerator of this expression the term $u(x, y + \mu)$, it is soon verified, with a little manipulation, that it is equivalent to

$$\left[\frac{u(x + \lambda, y + \mu) - u(x, y + \mu)}{\lambda}\right]\frac{\lambda}{(\lambda^2 + \mu^2)^{1/2}} + \left[\frac{u(x, y + \mu) - u(x, y)}{\mu}\right]\frac{\mu}{(\lambda^2 + \mu^2)^{1/2}}. \tag{10-7}$$

Geometry tells us that

$$\frac{\lambda}{(\lambda^2 + \mu^2)^{1/2}} = \cos\alpha, \qquad \frac{\mu}{(\lambda^2 + \mu^2)^{1/2}} = \sin\alpha,$$

so, when taken in conjunction with the fact that $|h| \to 0$ implies $\lambda \to 0$, $\mu \to 0$, the limit of expression (10-7) as $|h| \to 0$ becomes

$$\frac{\partial u}{\partial x}\cos\alpha + \frac{\partial u}{\partial y}\sin\alpha. \tag{10-8}$$

An identical argument applied to the imaginary part of the bracketed expression inside the limit (10-5) yields the result

$$\frac{\partial v}{\partial x}\cos\alpha + \frac{\partial v}{\partial y}\sin\alpha.$$

Hence, the limit (10-5) is equivalent to

$$f'(z) = e^{-i\alpha}\left[\left(\frac{\partial u}{\partial x}\cos\alpha + \frac{\partial u}{\partial y}\sin\alpha\right) + i\left(\frac{\partial v}{\partial x}\cos\alpha + \frac{\partial v}{\partial y}\sin\alpha\right)\right]. \tag{10-9}$$

For $f'(z)$ to be independent of the manner in which $h \to 0$, it follows that Eqn (10-9) must be independent of the value of $\alpha$. In particular, the real and imaginary parts of this expression must be independent of $\alpha$. Expressing $f'(z)$ in real-imaginary form we obtain

$$f'(z) = \left[\frac{\partial u}{\partial x}\cos^2\alpha + \frac{\partial v}{\partial y}\sin^2\alpha + \left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right)\sin\alpha\cos\alpha\right] + i\left[\frac{\partial v}{\partial x}\cos^2\alpha - \frac{\partial u}{\partial y}\sin^2\alpha + \left(\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right)\sin\alpha\cos\alpha\right]. \tag{10-10}$$

Inspection shows that it can only be independent of $\alpha$ if both the following conditions are satisfied:

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad\text{and}\quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{10-11}$$

These are known as the Cauchy-Riemann equations and are fundamental to the development of the theory of functions of a complex variable.
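Conditions (10-11) lend themselves to a quick numerical check. The sketch below is only an illustration under stated assumptions: the helper names `partials` and `cauchy_riemann_residuals`, the central-difference step, the sample point $(0.7, -0.4)$, and the four sample functions are choices made here, not taken from the text. It estimates $u_x$, $u_y$, $v_x$, $v_y$ by finite differences and prints the two residuals $u_x - v_y$ and $u_y + v_x$, which should both be close to zero when the Cauchy-Riemann equations hold at the point.

```python
import cmath

def partials(f, x, y, step=1e-6):
    """Central-difference estimates of u_x, u_y, v_x, v_y for f = u + iv."""
    fx = (f(complex(x + step, y)) - f(complex(x - step, y))) / (2 * step)
    fy = (f(complex(x, y + step)) - f(complex(x, y - step))) / (2 * step)
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residuals(f, x, y):
    """Return (u_x - v_y, u_y + v_x); both vanish when Eqns (10-11) hold."""
    ux, uy, vx, vy = partials(f, x, y)
    return ux - vy, uy + vx

# Sample functions (assumed for illustration only).
samples = {
    "z**2": lambda z: z * z,
    "cos z": cmath.cos,
    "conj z": lambda z: z.conjugate(),
    "|z|": lambda z: complex(abs(z), 0.0),
}

for name, f in samples.items():
    r1, r2 = cauchy_riemann_residuals(f, 0.7, -0.4)
    print(f"{name:7s} u_x - v_y = {r1:+.6f}   u_y + v_x = {r2:+.6f}")
```

A pair of small residuals at one point is only evidence, not a proof; a nonzero residual, on the other hand, does show that the function has no derivative at that point.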
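An immediate consequence of the Cauchy-Riemann equations is that Eqn (10-10) may be written either as

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} \quad (\text{case } \alpha = 0) \tag{10-12}$$

or as

$$f'(z) = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y} \quad (\text{case } \alpha = \tfrac{1}{2}\pi). \tag{10-13}$$

It has thus been established that if a function $f(z)$ is to have a uniquely determined derivative at a point in the sense of Definition 10-6, then it must satisfy the Cauchy-Riemann equations (10-11).

We now check whether the converse also holds, namely whether the satisfaction of the Cauchy-Riemann equations by a function automatically implies that the function has a unique derivative. Let $w = u + iv$ be a function such that $u$, $v$ satisfy Eqns (10-11). Consider first the function $u$ at some point $z = x + iy$. We know from Chapter 5 that at a neighbouring point $z + h$ with $h = \lambda + i\mu$, for $\Delta u = u(x + \lambda, y + \mu) - u(x, y)$ we may substitute the expression

$$\Delta u = \frac{\partial u}{\partial x}\lambda + \frac{\partial u}{\partial y}\mu + \varepsilon_1\lambda + \eta_1\mu,$$

where $\varepsilon_1, \eta_1 \to 0$ as $\lambda, \mu \to 0$ provided that $u_x$ and $u_y$ are continuous. A similar result is of course true for $\Delta v$, the change in $v$ consequent upon moving from $z$ to $z + h$, though for $\varepsilon_1$, $\eta_1$ we must substitute $\varepsilon_2$, $\eta_2$ and require that $v_x$, $v_y$ are continuous. Thus if $\Delta f = f(z + h) - f(z)$, we have

$$\Delta f = \frac{\partial u}{\partial x}\lambda + \frac{\partial u}{\partial y}\mu + i\left(\frac{\partial v}{\partial x}\lambda + \frac{\partial v}{\partial y}\mu\right) + \varepsilon_1\lambda + \eta_1\mu + i\varepsilon_2\lambda + i\eta_2\mu.$$

Using the Cauchy-Riemann equations this can be re-expressed as

$$\Delta f = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)h + (\varepsilon_1 + i\varepsilon_2)\lambda + (\eta_1 + i\eta_2)\mu,$$

whence

$$\frac{\Delta f}{h} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} + (\varepsilon_1 + i\varepsilon_2)\left(\frac{\lambda}{h}\right) + (\eta_1 + i\eta_2)\left(\frac{\mu}{h}\right). \tag{10-14}$$

However, $|\lambda| \le |h|$, $|\mu| \le |h|$ so that

$$\left|\frac{\lambda}{h}\right| \le 1, \qquad \left|\frac{\mu}{h}\right| \le 1;$$

and as $\varepsilon_1$, $\varepsilon_2$, $\eta_1$, and $\eta_2$ all tend to zero as $\lambda, \mu \to 0$, by taking the limit of Eqn (10-14) as $|h| \to 0$ we arrive at

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}.$$

The fact that $f(z)$ is assumed to satisfy the Cauchy-Riemann equations and to have continuous partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ has thus enabled us to prove that $f(z)$ has a unique derivative. We have established the following fundamental theorem.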
Theorem 10-6 (Cauchy-Riemann theorem) If $u(x, y)$ and $v(x, y)$ have continuous first order partial derivatives in some region, then necessary and sufficient conditions that $f(z) = u + iv$ should have a derivative at each point $z = x + iy$ of that region are that

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$

Results (10-12) and (10-13) may be used to deduce the form of $f'(z)$ by using the simple observation that when $z$ is purely real, so that $z = x$, the forms assumed by $f'(z)$ and $f'(x)$ are identical. Similarly, when $z$ is purely imaginary, so that $z = iy$, the forms of $f'(z)$ and $f'(iy)$ are identical. This gives the following straightforward rule for determining the derivative $f'(z)$ of the function $f(z)$ which is sometimes helpful.

Rule 1 (Determination of the derivative of a complex function) If $f(z) = u + iv$ satisfies the Cauchy-Riemann equations, then the derivative $f'(z)$ expressed in terms of $z$ may either be deduced
(a) from the result

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}$$

by formally setting $y = 0$, and then replacing $x$ by $z$; or
(b) from the result

$$f'(z) = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}$$

by formally setting $x = 0$, and then replacing $iy$ by $z$.

Example 10-14 Determine which of the following functions satisfy the Cauchy-Riemann equations and thus possess uniquely defined derivatives. Give the form of this derivative when it is defined.
(a) $w = z^2$;
(b) $w = \cos z$;
(c) $w = |z|$.

Solution (a) If $w = z^2$, then $w = (x + iy)^2 = x^2 - y^2 + i2xy$ and so $u = x^2 - y^2$, $v = 2xy$. So $u_x = 2x$, $u_y = -2y$, $v_x = 2y$, and $v_y = 2x$. It is readily seen that these expressions satisfy the Cauchy-Riemann equations and so we may conclude that $w = z^2$ possesses a unique derivative. It follows from Eqn (10-12) that $f'(z) = 2x + i2y = 2z$. This result was so simple that appeal to Rule 1 was not necessary.

(b) If $w = \cos z$, then $w = \cos(x + iy) = \cos x\cos iy - \sin x\sin iy$, whence $w = \cos x\cosh y - i\sin x\sinh y$, and so $u = \cos x\cosh y$, $v = -\sin x\sinh y$.
Hence, $u_x = -\sin x\cosh y$, $u_y = \cos x\sinh y$, $v_x = -\cos x\sinh y$ and $v_y = -\sin x\cosh y$. Here also it is immediately apparent that the expressions satisfy the Cauchy-Riemann equations, showing that $w = \cos z$ possesses a unique derivative.

Let us choose to work with Rule 1 (a) to determine $f'(z)$ in terms of $z$. We must therefore start with the equation

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}.$$

In this case we find

$$f'(z) = -\sin x\cosh y - i\cos x\sinh y.$$

Then, setting $y = 0$ and replacing $x$ by $z$ gives $f'(z) = -\sin z$. It is instructive to compare this rapid method with the direct approach we now indicate:

$$f'(z) = -\sin x\cosh y - i\cos x\sinh y = -\sin x\cos iy - \cos x\sin iy = -\sin(x + iy) = -\sin z.$$

(c) If $w = |z|$, then $w = (x^2 + y^2)^{1/2}$, showing that $u = (x^2 + y^2)^{1/2}$, $v = 0$. Then, as $u_x = x/(x^2 + y^2)^{1/2}$, $u_y = y/(x^2 + y^2)^{1/2}$, $v_x = v_y = 0$, it is clear that $w = |z|$ cannot satisfy the Cauchy-Riemann equations anywhere in the complex plane: for $z \neq 0$ the derivatives $u_x$ and $u_y$ cannot both vanish, and at $z = 0$ they do not exist. We conclude that $w = |z|$ has no derivative at any point in the complex plane.

Example 10-15 Determine the constants $a$ and $b$ in order that

$$w = x^2 + ay^2 - 2xy + i(bx^2 - y^2 + 2xy)$$

should satisfy the Cauchy-Riemann equations. Deduce the derivative of $w$.

Solution Here we have $u = x^2 + ay^2 - 2xy$, $v = bx^2 - y^2 + 2xy$ so that $u_x = 2x - 2y$, $u_y = 2ay - 2x$, $v_x = 2bx + 2y$, and $v_y = -2y + 2x$. It is certainly true that $u_x = v_y$, so that the first of the Cauchy-Riemann equations is automatically satisfied. For the second equation to be satisfied we must require that $u_y = -v_x$, or $2ay - 2x = -(2bx + 2y)$. This is only possible if $a = -1$, $b = 1$.

Now as $f'(z) = u_x + iv_x$, we have

$$f'(z) = 2x - 2y + i(2x + 2y).$$

Again, working with Rule 1 (a) gives $f'(z) = 2(1 + i)z$.

Had we chosen to work with Rule 1 (b) to express $f'(z)$ in terms of $z$ we should have started from the equation $f'(z) = v_y - iu_y$, which in this case becomes

$$f'(z) = -2y + 2x + i(2y + 2x).$$
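Then, setting $x = 0$ and this time replacing $iy$ by $z$, we again arrive at $f'(z) = 2(1 + i)z$.

As the complex number $z$ can also be expressed in modulus argument form by writing $z = re^{i\theta}$, it is necessary to know the form taken by the Cauchy-Riemann equations in terms of the variables $(r, \theta)$. This is most readily achieved by appeal to Theorem 5-22. It follows directly from Theorem 5-22 that:

$$\frac{\partial u}{\partial x} = \frac{\partial r}{\partial x}\frac{\partial u}{\partial r} + \frac{\partial\theta}{\partial x}\frac{\partial u}{\partial\theta}, \qquad \frac{\partial u}{\partial y} = \frac{\partial r}{\partial y}\frac{\partial u}{\partial r} + \frac{\partial\theta}{\partial y}\frac{\partial u}{\partial\theta},$$
$$\frac{\partial v}{\partial x} = \frac{\partial r}{\partial x}\frac{\partial v}{\partial r} + \frac{\partial\theta}{\partial x}\frac{\partial v}{\partial\theta}, \qquad \frac{\partial v}{\partial y} = \frac{\partial r}{\partial y}\frac{\partial v}{\partial r} + \frac{\partial\theta}{\partial y}\frac{\partial v}{\partial\theta}. \tag{10-15}$$

In these equations $(r, \theta)$ are the polar coordinates of the point $(x, y)$ and so $x = r\cos\theta$, $y = r\sin\theta$. (See Eqns (4-13).) These relationships may now be used to determine $\partial r/\partial x$, $\partial r/\partial y$, $\partial\theta/\partial x$, $\partial\theta/\partial y$ as follows. Differentiating $r^2 = x^2 + y^2$ partially with respect to $x$ and $y$ in turn gives

$$\frac{\partial r}{\partial x} = \frac{x}{r} = \cos\theta, \qquad \frac{\partial r}{\partial y} = \frac{y}{r} = \sin\theta,$$

whilst differentiating $\tan\theta = y/x$ in the same way gives

$$\frac{\partial\theta}{\partial x} = -\frac{y}{r^2} = -\frac{\sin\theta}{r}, \qquad \frac{\partial\theta}{\partial y} = \frac{x}{r^2} = \frac{\cos\theta}{r}.$$

Combination of these results with Eqns (10-15), followed by some simple manipulation, then establishes that the polar form of the Cauchy-Riemann equations is

$$\frac{\partial u}{\partial r} = \frac{1}{r}\frac{\partial v}{\partial\theta} \quad\text{and}\quad \frac{1}{r}\frac{\partial u}{\partial\theta} = -\frac{\partial v}{\partial r}. \tag{10-16}$$

Functions $f(z)$ that are uniquely defined in some neighbourhood of a point $z_0$ and satisfy the Cauchy-Riemann equations at $z_0$ and throughout that neighbourhood are called either analytic or regular functions. Points at which a function ceases to be analytic are called singularities of the function. Thus the function $f(z) = 1/(z + 1)$ is easily seen to be analytic everywhere except at the point $z = -1$, which is a singularity.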
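As a brief illustration of the polar form of the Cauchy-Riemann equations, $u_r = (1/r)v_\theta$ and $(1/r)u_\theta = -v_r$, the following worked check applies it to $w = z^n$ with $n$ a positive integer. The choice of this function is an assumption made here for illustration; it relies only on de Moivre's form $z^n = r^n(\cos n\theta + i\sin n\theta)$.

```latex
\begin{aligned}
u &= r^{n}\cos n\theta, & v &= r^{n}\sin n\theta,\\[2pt]
\frac{\partial u}{\partial r} &= n r^{n-1}\cos n\theta, &
\frac{1}{r}\frac{\partial v}{\partial\theta} &= \frac{1}{r}\,n r^{n}\cos n\theta = n r^{n-1}\cos n\theta,\\[2pt]
\frac{1}{r}\frac{\partial u}{\partial\theta} &= -\,n r^{n-1}\sin n\theta, &
-\frac{\partial v}{\partial r} &= -\,n r^{n-1}\sin n\theta .
\end{aligned}
```

Both polar equations are therefore satisfied for every $r > 0$, in agreement with the differentiability of $z^n$ established in Example 10-11.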
Supposing that $u_{xy}$, $v_{xy}$ exist and are continuous, it follows directly by partial differentiation of the Cauchy-Riemann equations $u_x = v_y$, $u_y = -v_x$ that

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{10-17}$$

These equations are identical in form and are examples of an important partial differential equation called Laplace's equation, any solution of which is called a harmonic function. The harmonic functions $u$ and $v$ associated with an analytic function $f(z) = u + iv$ are called conjugate harmonic functions.

For example, we have seen that $\cos z = \cos x\cosh y - i\sin x\sinh y$ is an analytic function with $u = \cos x\cosh y$, $v = -\sin x\sinh y$. Now both $u$ and $v$ are such that $u_{xy}$, $v_{xy}$ are continuous, so it follows immediately that $u$ and $v$ satisfy Eqns (10-17). Hence $u = \cos x\cosh y$, $v = -\sin x\sinh y$ are conjugate harmonic functions. The term conjugate is, of course, used here in a different sense from when discussing complex conjugates.

If $u$, $v$ are harmonic functions and we consider the analytic function $w = u + iv$, then an obvious modification of the arguments that gave rise to Rule 1 leads to the following rule for the expression of $w$ in terms of $z$.

Rule 2 (Expression of an analytic function in terms of z) If $u$, $v$ are conjugate harmonic functions, then the analytic function $w = u + iv$ expressed in terms of $z$ may be deduced either by:
(a) formally setting $y = 0$ in the expression $w = u + iv$ and then replacing $x$ by $z$; or
(b) formally setting $x = 0$ in the expression $w = u + iv$ and then replacing $iy$ by $z$.

Example 10-16 Show that $u = 2xy + 3y$ is harmonic and determine its harmonic conjugate $v$. Express the functions $dw/dz$ and $w = u + iv$ in terms of $z$.

Solution We have $u_x = 2y$, $u_{xx} = 0$, $u_y = 2x + 3$, $u_{yy} = 0$, showing that $u_{xx} + u_{yy} = 0$. Hence $u$ is harmonic. If $v$ is to be the harmonic conjugate of $u$ then the functions $u$, $v$ must satisfy the Cauchy-Riemann equations $u_x = v_y$, $u_y = -v_x$.
Using the known expressions for $u_x$, $u_y$ we find that (a) $2y = v_y$, and (b) $2x + 3 = -v_x$. Integration then gives:

from (a), $v = y^2 + f(x) + \text{const}$,
from (b), $v = -x^2 - 3x + g(y) + \text{const}$,

where as yet $f(x)$ is an arbitrary function of $x$ and $g(y)$ is an arbitrary function of $y$. However, as these are two alternative expressions for the same function $v$ they must be identical, whence $f(x) = -(x^2 + 3x)$ and $g(y) = y^2$. Thus we have arrived at the expression

$$v = y^2 - x^2 - 3x + \text{const}$$

for the function $v$, which is the harmonic conjugate of $u$.

Applying Rule 1 (a) to find $f'(z)$ requires that we start from

$$f'(z) = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}$$

or, in this case, from $f'(z) = 2y - i(2x + 3)$. So, setting $y = 0$ and replacing $x$ by $z$ gives $f'(z) = -i(2z + 3)$.

To express $w = u + iv$ in terms of $z$ we must work with Rule 2. We have

$$w = (2xy + 3y) + i(y^2 - x^2 - 3x) + \text{const},$$

so that if we apply Rule 2 (a), we must set $y = 0$ and replace $x$ by $z$ to arrive at

$$w = -i(z^2 + 3z) + \text{const}.$$

It is important to notice when using Rule 2 that the functions $u$ and $v$ must be conjugate harmonic functions, since otherwise they will not satisfy the Cauchy-Riemann equations and the rule will be inapplicable. Indeed, if the rule is applied to harmonic functions that are not conjugate, then the functions of $z$ that are generated by Rules 2 (a) and 2 (b) may or may not be identical. In neither case will the result be correct. For example, $u = \sin x\cosh y$ and $v = \cos x\cosh y$ are harmonic functions but they are not harmonic conjugates. Applying Rule 2 (a) to $w = u + iv$ generates the function $w = \sin z + i\cos z$, whereas applying Rule 2 (b) generates the function $w = i\cos z$. For a different example, take $u = x^2 - y^2$ and $v = xy$, which are also harmonic functions that are not conjugate. In this case both Rules 2 (a) and 2 (b) generate the same function $w = z^2$, though of course this also is incorrect.
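The warning about Rule 2 can be illustrated with a few lines of arithmetic. The sketch below is an illustration only: the helper names `rule_2a` and `direct` are hypothetical, the arbitrary constant in $v$ is taken as zero, and the test point is an arbitrary choice. It applies Rule 2 (a) to the conjugate pair of Example 10-16 and to the non-conjugate pair $u = x^2 - y^2$, $v = xy$, and compares the result with $u + iv$ evaluated directly.

```python
def rule_2a(u, v, z):
    """Rule 2 (a): set y = 0 in u + iv, then replace x by z."""
    return u(z, 0) + 1j * v(z, 0)

def direct(u, v, z):
    """Evaluate u(x, y) + i v(x, y) at z = x + iy."""
    return complex(u(z.real, z.imag), v(z.real, z.imag))

# Example 10-16: a conjugate harmonic pair (arbitrary constant taken as zero).
u1 = lambda x, y: 2 * x * y + 3 * y
v1 = lambda x, y: y * y - x * x - 3 * x

# Harmonic but not conjugate: the rule returns z**2, which is not u + iv.
u2 = lambda x, y: x * x - y * y
v2 = lambda x, y: x * y

z = complex(0.5, 1.25)  # arbitrary test point
print(abs(rule_2a(u1, v1, z) - direct(u1, v1, z)))     # rounding error only
print(abs(rule_2a(u1, v1, z) + 1j * (z * z + 3 * z)))  # agrees with -i(z^2 + 3z)
print(abs(rule_2a(u2, v2, z) - direct(u2, v2, z)))     # clearly nonzero
```

For the conjugate pair the rule reproduces $u + iv$ at the test point, whereas for the second pair the function it generates differs from $u + iv$, as the text asserts.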
10-5 Conformal mapping

Thus far we have examined some of the analytical consequences of requiring that a function $w = f(z)$ be differentiable. Let us now pursue this matter further by studying some of the geometrical implications of differentiability.

Take two complex planes, which we shall refer to as the z-plane and the w-plane, the connection between their respective points being through the differentiable function $w = f(z)$. Because each value of $z$ gives rise to a unique value of $w$, it follows that any curve $\gamma$ in the z-plane must correspond to some other curve $\Gamma$ in the w-plane. In this sense the w-plane can correctly be described as a mapping of the z-plane.

For a specific illustration, let us determine how the straight line $y = \alpha x$ in the z-plane is mapped by the function $w = iz + (1 + i)$ onto the w-plane. We begin by setting $w = u + iv$, $z = x + iy$, after which a simple calculation yields $u = 1 - y$, $v = x + 1$. Hence to find the line in the w-plane that corresponds to $y = \alpha x$ in the z-plane it is now only necessary to set $y = \alpha x$ in these expressions for $u$, $v$ and then to eliminate $x$ between them. Performing these operations we find $u = 1 - \alpha x$, $v = x + 1$, whence

$$v = -\left(\frac{1}{\alpha}\right)u + \frac{1 + \alpha}{\alpha}.$$

Fig. 10-4 Mapping by the function $w = iz + (1 + i)$.

This is again an equation of a straight line but this time in the w-plane. The line passes through the point $(0, (1 + \alpha)/\alpha)$ and has the gradient $-1/\alpha$. Representative lines $\gamma_1$, $\gamma_2$ are shown in the z-plane of Fig. 10-4 and their respective maps or images are shown as the lines $\Gamma_1$, $\Gamma_2$ in the associated w-plane. The lines $\gamma_1$, $\gamma_2$ correspond, respectively, to $\alpha = 1$, $\alpha = 2$.

It is not difficult to see that the map in the w-plane has been obtained from the map in the z-plane by first rotating the original pair of lines anti-clockwise through an angle $\tfrac{1}{2}\pi$ and then translating the resulting picture to the point $1 + i$ as a new origin.
More important than this, however, is the fact that the angle $\theta$ between the lines $\gamma_1$, $\gamma_2$ is equal to the angle between the lines $\Gamma_1$, $\Gamma_2$ and, moreover, the sense of rotation is preserved. That is to say, if $\gamma_2$ is inclined to $\gamma_1$ at an angle $\theta$, measured anti-clockwise, then $\Gamma_2$ is also inclined to $\Gamma_1$ at an angle $\theta$, measured anti-clockwise.

This is no chance result and, indeed, we now prove that if a function $f(z)$ is analytic (that is, satisfies the Cauchy-Riemann equations and so has a uniquely defined derivative) then, except for points $z_0$ at which $f'(z_0) = 0$, the function $w = f(z)$ will preserve both the angle and the sense of rotation when mapping intersecting curves $\gamma_1$, $\gamma_2$ in the z-plane onto corresponding intersecting curves $\Gamma_1$, $\Gamma_2$ in the w-plane. These properties of a mapping or transformation are recognized by saying that the transformation is conformal.

To prove this general result we now consider a function $w = f(z)$ that is analytic in some region of the z-plane and take a point $z_0$ in that region at which $f'(z_0) \neq 0$. Let $\gamma_1$, $\gamma_2$ be two curves drawn in the z-plane that intersect at $z_0$ and let $z_1$ denote a point $Q$ on the curve $\gamma_1$ as indicated in Fig. 10-5.

Fig. 10-5 Conformal mapping $w = f(z)$.

We shall suppose that as $Q$ moves away from $P$ along $\gamma_1$ in the direction indicated by an arrow in the Figure, so the point $w_1 = f(z_1)$, which we denote by $Q'$, moves away from point $P'$ in the direction indicated. This process thus associates a sense of direction with each of the corresponding curves $\gamma_1$ and $\Gamma_1$. A similar argument defines directions along $\gamma_2$ and $\Gamma_2$.

Now as $Q$ approaches $P$, so the secant $PQ$ will assume its limiting position in which, when it is inclined at an angle $\alpha_1$ to the $x$-axis, it is tangent to $\gamma_1$ at $z_0$. As $PQ = z_1 - z_0$ we have

$$\alpha_1 = \lim_{z_1 \to z_0}\arg(z_1 - z_0).$$

Identical reasoning shows that

$$\beta_1 = \lim_{w_1 \to w_0}\arg(w_1 - w_0),$$

where $\beta_1$ is the angle of the tangent to $\Gamma_1$ at $P'$ measured from the $u$-axis.
Hence we have

$$\beta_1 - \alpha_1 = \lim_{w_1 \to w_0}\arg(w_1 - w_0) - \lim_{z_1 \to z_0}\arg(z_1 - z_0)$$

and, as $\arg a - \arg b = \arg(a/b)$, this may be written

$$\beta_1 - \alpha_1 = \lim_{z_1 \to z_0}\arg\left(\frac{w_1 - w_0}{z_1 - z_0}\right).$$

However, as we are assuming $f(z)$ is differentiable,

$$f'(z_0) = \lim_{z_1 \to z_0}\left(\frac{w_1 - w_0}{z_1 - z_0}\right),$$

and provided $f'(z_0) \neq 0$ it then follows that

$$\beta_1 - \alpha_1 = \arg f'(z_0). \tag{10-18}$$

In the case that $f'(z_0) = 0$, the amplitude of $f'(z_0)$ is indeterminate. Such points are called critical points of $f(z)$, by analogy with the real variable case.

We have seen that $f'(z_0)$ is unique, so that the expression on the right-hand side of Eqn (10-18) is a constant. The result must, then, also be true for any other curve $\gamma_2$, say, and its map $\Gamma_2$. Hence we have

$$\beta_1 - \alpha_1 = \beta_2 - \alpha_2 \quad\text{or}\quad \alpha_2 - \alpha_1 = \beta_2 - \beta_1.$$

The curves $\gamma_1$, $\gamma_2$ were any two curves which intersected at $z_0$, so we have proved the following result.

Theorem 10-7 (Conformal mapping) If $f(z)$ is analytic in some region, then apart from those points $z_0$ in that region for which $f'(z_0) = 0$, the mapping $w = f(z)$ preserves both the angle and the sense of rotation when mapping intersecting directed pairs of curves in the z-plane into corresponding intersecting directed pairs of curves in the w-plane. Such a mapping is said to be conformal.

To close this chapter we now examine some important special conformal mappings. Rather than emphasize the algebraic details of the transformations or mappings, we shall aim primarily at interpretation in terms of basic geometrical operations such as translation, rotation, and change of scale (dilatation).

10-5 (a) The general linear transformation

The general linear transformation is the name given to the mapping described by the equation

$$w = az + b, \tag{10-19}$$

where $a$, $b$ are arbitrary constants with $a \neq 0$. Our introductory example was of this form with $a = i$, $b = 1 + i$.
The mapping (10-19) obviously satisfies the Cauchy-Riemann equations and, as $dw/dz = a \neq 0$, it has no critical points and so provides a conformal mapping of the entire z-plane. To appreciate the geometrical effect of this mapping consider first the case in which $a = 1$ so that

$$w = z + b.$$

This has the effect of generating the w-plane by simply adding a constant complex number $b$ to every point in the z-plane. Using the vectorial representation of complex numbers this is seen to be equivalent to generating the w-plane by shifting the entire z-plane through a distance $|b|$ parallel to the vector $b$. Such a mapping is accordingly called a translation.

Another way of expressing this result is by saying that if the w- and z-planes were to be superimposed, then the $O\{u, v\}$ axes would be obtained by translating the $O\{x, y\}$ axes, without rotation, such that in their new position the origin coincided with the point $z = -b$. To see this, remember that $b$ is a vector and that the position vector of the origin of $O\{u, v\}$ is $-b$ relative to $O\{x, y\}$ (since $w = 0$ when $z = -b$), whereas the position vector of the origin of $O\{x, y\}$ relative to $O\{u, v\}$ is $b$. Consequently, we may conclude that the mapping $w = z + b$ leaves invariant the shape and size of any curve in the z-plane.

Next we consider the consequences of setting $b = 0$ so that

$$w = az.$$

If we write $a = \rho e^{i\alpha}$ and $z = re^{i\theta}$, we have

$$w = \rho r e^{i(\alpha + \theta)}.$$

This shows that the effect on the z-plane of the mapping $w = az$ is to multiply the modulus of $z$ by a constant factor $\rho$ and to increase the argument of $z$ by a constant angle $\alpha$. Hence $w = az$ corresponds to a magnification, or dilatation, of every $z$ by a constant factor $|a| = \rho$, and a rotation about the origin of every $z$ by a constant angle $\alpha = \arg a$.

Thus we may deduce that the general linear transformation $w = az + b$ of the z-plane may be described geometrically as the combination of a dilatation, a rotation, and a translation.
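The decomposition just described can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names `linear_map_parts` and `apply_in_steps` are hypothetical, and the sample point is arbitrary. It uses the standard-library call `cmath.polar` to split $a$ into modulus and argument, and then applies the dilatation, rotation, and translation one after another, using the coefficients $a = i$, $b = 1 + i$ of the introductory example.

```python
import cmath

def linear_map_parts(a, b):
    """Split w = a*z + b into (scale factor |a|, rotation angle arg a, translation b)."""
    if a == 0:
        raise ValueError("a must be nonzero for a linear transformation")
    scale, angle = cmath.polar(a)
    return scale, angle, b

def apply_in_steps(z, a, b):
    """Apply the dilatation, then the rotation, then the translation."""
    scale, angle, shift = linear_map_parts(a, b)
    dilated = scale * z
    rotated = dilated * cmath.exp(1j * angle)
    return rotated + shift

# Coefficients of the introductory example w = iz + (1 + i).
a, b = 1j, 1 + 1j
scale, angle, shift = linear_map_parts(a, b)
print("scale:", scale, " angle:", angle, " shift:", shift)

z = 2 - 3j  # arbitrary sample point
print(apply_in_steps(z, a, b), a * z + b)  # the two values agree to rounding
```

For $a = i$ the scale factor is 1 and the angle is $\tfrac{1}{2}\pi$, which matches the description of the introductory example as an anti-clockwise quarter turn followed by a shift to $1 + i$.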
In the trivial case $a = 1$, $b = 0$ the mapping reduces to an identity.

10-5 (b) The mapping $w = z^n$

A typical example of this form is provided by the function $w = z^2$. As it is interesting to interpret mappings in terms of both polar coordinates and cartesian coordinates, let us first study the polar representation. To do this we set $z = re^{i\theta}$, $w = \rho e^{i\phi}$, when we find

$$\rho(\cos\phi + i\sin\phi) = r^2(\cos 2\theta + i\sin 2\theta),$$

showing that $\rho = r^2$ and $\phi = 2\theta + 2n\pi$, where $n = 0, 1, 2, \ldots$. However, for our purposes we shall disregard this ambiguity of the angle $\phi$ with respect to multiples of $2\pi$, since all angles in polar coordinates are indeterminate in this manner.

In words, the effect of the mapping $w = z^2$ is to square the modulus of every number $z$ and to double its argument. This is very easily illustrated by appeal to Fig. 10-6 depicting the mapping of a shaded portion of an annular region in the z-plane into another, larger, annular region in the w-plane. The conformal nature of the mapping is reflected by the fact that at the corresponding corners of the figures the angles between the boundary lines together with their senses have been preserved. They are of course equal to $\tfrac{1}{2}\pi$ in this instance.

Fig. 10-6 The polar mapping $w = z^2$.

Because of the properties just outlined it is readily seen that the function $w = z^2$ maps the upper half z-plane onto the entire w-plane. When this is done it is necessary to exclude the origin in the w-plane together with all the points on the positive $u$-axis, since these are mapped twice. In fact they correspond to points on both the positive and negative parts of the real axis in the z-plane. The origin in the w-plane is in fact the image of a critical point, for $w' = 2z$ vanishes at $z = 0$. This exclusion of a line of points in the w-plane is often described by saying that the w-plane has been cut along the real axis.
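The statements about $w = z^2$ can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names `image_tangent_angle` and `angle_between_images`, the step size, the sample point $1 + 1.5i$, and the two ray directions ($0$ and $\tfrac{1}{3}\pi$) are choices made here. It estimates the angle between the images of two short rays leaving a point $z_0$, first at an ordinary point and then at the critical point $z_0 = 0$, where the angle is doubled rather than preserved.

```python
import cmath

def image_tangent_angle(f, z0, direction_angle, step=1e-6):
    """Approximate direction, in the w-plane, of the image of a short ray from z0."""
    h = step * cmath.exp(1j * direction_angle)
    return cmath.phase(f(z0 + h) - f(z0))

def angle_between_images(f, z0, angle1, angle2):
    """Signed angle from the image of ray 1 to the image of ray 2, in (-pi, pi]."""
    diff = image_tangent_angle(f, z0, angle2) - image_tangent_angle(f, z0, angle1)
    return cmath.phase(cmath.exp(1j * diff))

def square(z):
    return z * z

z0 = 1 + 1.5j
r, theta = cmath.polar(z0)
rho, phi = cmath.polar(square(z0))
print(rho, r * r)       # modulus is squared
print(phi, 2 * theta)   # argument is doubled

third = cmath.pi / 3
print(angle_between_images(square, z0, 0.0, third))  # close to pi/3: preserved
print(angle_between_images(square, 0j, 0.0, third))  # close to 2*pi/3: doubled
```

The second result illustrates why points with $f'(z_0) = 0$ are excluded from Theorem 10-7.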
The effect of the mapping is more striking if it is displayed in terms of $x$ and $y$ by again setting $w = u + iv$, but this time writing $z = x + iy$ to obtain

$$u = x^2 - y^2, \qquad v = 2xy.$$

These equations show, for example, that the straight line $x = \alpha$ maps into the curve $u = \alpha^2 - y^2$, $v = 2\alpha y$ in the w-plane which, after elimination of $y$, is seen to be equivalent to $v^2 = 4\alpha^2(\alpha^2 - u)$. Similarly, the straight line $y = \beta$ may be seen to map into the curve $v^2 = 4\beta^2(\beta^2 + u)$ in the w-plane. These equations describe two parabolas that are symmetrical about the $u$-axis, as shown in Fig. 10-7.

Fig. 10-7 The Cartesian mapping $w = z^2$.

The lines $x = 1$, $y = 3/2$ denoted by $\gamma_1$ and $\gamma_2$, respectively, in the z-plane map into the parabolas $\Gamma_1$ and $\Gamma_2$ in the w-plane. These parabolas intersect in the pair of points $P'$ and $P''$ in the w-plane, namely $w = -5/4 \pm 3i$. As the mapping is single valued, the point $z = 1 + 3i/2$ denoted by $P$ in the z-plane (that is, the point $(1, 3/2)$) maps onto only one of them, $w = (1 + 3i/2)^2 = -5/4 + 3i$; the other is the image of $z = 1 - 3i/2$, because the line $y = -3/2$ maps into the same parabola $\Gamma_2$. Again the conformal nature of the transformation is reflected in the easily checked geometrical fact that the two families of parabolas are mutually orthogonal, as are the lines $x = \text{const}$, $y = \text{const}$ in the z-plane.

The more general mapping $w = z^n$ may be analysed in similar fashion, though the algebraic complexity is naturally greater. When $n$ is integral the mapping may be seen to transform the sector $0 < \arg z < 2\pi/n$ into the complete w-plane with a suitable cut along the $u$-axis. (Care must be exercised when $n$ is fractional for then the mapping is many valued. We shall not pursue this matter further.)

10-5 (c) The inversion $w = 1/z$

For obvious reasons the mapping $w = 1/z$ is called the inversion mapping. Its geometrical effect may be deduced by setting $w = \rho e^{i\phi}$, $z = re^{i\theta}$ to find

$$\rho(\cos\phi + i\sin\phi) = \frac{1}{r}(\cos\theta - i\sin\theta).$$
Arguing as with the function $w = z^2$, we then see that this implies that $\rho = 1/r$, $\phi = -\theta$. Expressed in words, the inversion mapping $w = 1/z$ transforms a point in the $z$-plane with modulus $r$ and argument $\theta$ into a point in the $w$-plane with modulus $1/r$ and argument $-\theta$.

This may be interpreted geometrically by appeal to Fig. 10-8, in which the $w$- and $z$-planes are shown superimposed with a common origin, and $P$ is any point in the $z$-plane with $P'$ denoting its image in the $w$-plane. The circle shown in Fig. 10-8 is the unit circle $|z| = 1$, and point $Q$ on the radius vector drawn from $O$ to $P$ is such that $OP \cdot OQ = 1$. Hence if $OP = r$, then $OQ = 1/r$. In geometrical terms point $Q$ is said to have been obtained by inverting point $P$ with respect to the unit circle. Point $P'$, which is the image in the $w$-plane of the point $P$ in the $z$-plane, is then obtained by reflecting $Q$ in the $x$-axis. Thus the mapping $w = 1/z$ corresponds to the inversion of points $z$ with respect to the unit circle, followed by their reflection in the real axis.

Fig. 10-8 Inversion in unit circle followed by reflection in the $x$-axis.

The inversion mapping thus maps the points interior to the unit circle about the origin of the $z$-plane onto the exterior of the unit circle about the origin of the $w$-plane, and vice versa. The two unit circles map onto one another.

Algebraically, we write $w = u + iv$, $z = x + iy$, when

$$u = \frac{x}{x^2 + y^2}, \qquad v = \frac{-y}{x^2 + y^2}.$$

To learn how the line $x = \alpha$ in the $z$-plane maps onto the $w$-plane we need only set $x = \alpha$ in the expressions for $u$ and $v$ and then eliminate $y$ to obtain the equation

$$u^2 + v^2 - \frac{u}{\alpha} = 0.$$

Similarly, the line $y = \beta$ in the $z$-plane maps onto the curve in the $w$-plane defined by the equation

$$u^2 + v^2 + \frac{v}{\beta} = 0.$$

When these equations are rewritten in the form

$$\left(u - \frac{1}{2\alpha}\right)^2 + v^2 = \left(\frac{1}{2\alpha}\right)^2 \quad\text{and}\quad u^2 + \left(v + \frac{1}{2\beta}\right)^2 = \left(\frac{1}{2\beta}\right)^2$$

it is easily seen that the line $x = \alpha$ in the $z$-plane has for its image in the $w$-plane a circle of radius $1/(2|\alpha|)$
with its centre at $(1/(2\alpha), 0)$, whilst the line $y = \beta$ in the $z$-plane has for its image in the $w$-plane a circle of radius $1/(2|\beta|)$ with its centre at $(0, -1/(2\beta))$. We may conclude that lines parallel to the $x$- and $y$-axes map onto circles in the $w$-plane which pass through the origin and have their centres on the coordinate axes: on the $v$-axis for lines parallel to the $x$-axis, and on the $u$-axis for lines parallel to the $y$-axis. Had the general straight line $y = mx + c$ in the $z$-plane been mapped, then this same form of argument would have shown that any such line not passing through the origin will transform into a circle through the origin in the $w$-plane. Lines through the origin in the $z$-plane transform into lines through the origin in the $w$-plane. The verification of these remarks is left as an exercise for the reader.

10-5 (d) The bilinear transformation

Any mapping of the general form

$$w = \frac{az + b}{cz + d} \tag{10-21}$$

is called a bilinear transformation or a linear fractional transformation. The general linear transformation and the inversion mapping are special cases of the bilinear transformation. We now show that bilinear transformations are characterized by the property that they map circles and straight lines in the $z$-plane onto circles and straight lines in the $w$-plane, though not necessarily in this order.

Let us now write the transformation (10-21) in the form

$$w = \frac{a}{c} - \frac{ad - bc}{c^2\left(z + \dfrac{d}{c}\right)}. \tag{10-22}$$

We assume $c \neq 0$ and $ad - bc \neq 0$; this is justified since if $c = 0$ the transformation reduces to the general linear transformation, whereas if $ad - bc = 0$, then $w$ reduces to a constant. So, if we define new variables $z_1$ and $z_2$ by

$$z_1 = z + \frac{d}{c}, \qquad z_2 = \frac{1}{z_1}, \tag{10-23}$$

then (10-22) becomes

$$w = \frac{a}{c} - \left(\frac{ad - bc}{c^2}\right) z_2. \tag{10-24}$$

We must now consider the sequential effect of the mappings that transform from the $z$-plane to the $w$-plane via the intermediate planes $z_1$ and $z_2$. The mapping from the $z$-plane to the $z_1$-plane is a pure translation and thus leaves the shape and size of all curves invariant.
The mapping from the $z_1$-plane to the $z_2$-plane is an inversion and, as we have just seen, maps straight lines not passing through the origin onto circles, and straight lines through the origin onto straight lines. Finally, the mapping from the $z_2$-plane to the $w$-plane is a general linear transformation and so comprises a rotation, a change of scale, and a translation. Hence, in particular, this final mapping will transform straight lines into straight lines and circles into circles. This justifies our earlier statement that the bilinear transformation maps straight lines and circles into straight lines and circles, though not necessarily in this order.

Example 10-17 Find the image in the $w$-plane of the circle $|z| = 2$ if

$$w = \frac{z - i}{z + i}.$$

Solution Setting $w = u + iv$, $z = x + iy$ we find that

$$u = \frac{x^2 + y^2 - 1}{x^2 + y^2 + 2y + 1}, \qquad v = \frac{-2x}{x^2 + y^2 + 2y + 1}.$$

Now the circle $|z| = 2$ has the equation $x^2 + y^2 = 4$, which used in the expressions for $u$, $v$ gives

$$u = \frac{3}{2y + 5}, \qquad v = \frac{-2x}{2y + 5}.$$

Next, solving these for $x$ and $y$, we find

$$x = \frac{-3v}{2u}, \qquad y = \frac{1}{2}\left(\frac{3}{u} - 5\right),$$

so that on the required circle $x^2 + y^2 = 4$ this pair of equations is equivalent to

$$3(u^2 + v^2) - 10u + 3 = 0.$$

When this equation is expressed in the form

$$\left(u - \frac{5}{3}\right)^2 + v^2 = \frac{16}{9}$$

it can be recognized as the equation of a circle in the $w$-plane having a radius of $4/3$ and its centre at the point $(5/3, 0)$.

This conclusion could have been obtained more easily by using the following argument. The equation

$$w = \frac{z - i}{z + i}$$

is equivalent to

$$z = i\left(\frac{1 + w}{1 - w}\right).$$

Hence, as $z\bar{z} = x^2 + y^2$, we have

$$x^2 + y^2 = i\left(\frac{1 + w}{1 - w}\right)(-i)\left(\frac{1 + \bar{w}}{1 - \bar{w}}\right) = \frac{1 + w + \bar{w} + w\bar{w}}{1 - w - \bar{w} + w\bar{w}}.$$

In terms of $w = u + iv$, $\bar{w} = u - iv$ this becomes

$$x^2 + y^2 = \frac{1 + 2u + u^2 + v^2}{1 - 2u + u^2 + v^2}$$

and, on the circle $x^2 + y^2 = 4$, it reduces to our previous result $3(u^2 + v^2) - 10u + 3 = 0$.
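The result of Example 10-17 and the decomposition (10-22) to (10-24) can both be checked numerically. The sketch below uses only Python's standard library; the helper names `bilinear` and `via_decomposition`, the number of sample points, and the tolerances are illustrative assumptions rather than anything prescribed by the text.

```python
import cmath
import math


def bilinear(z: complex, a: complex, b: complex, c: complex, d: complex) -> complex:
    """Direct evaluation of w = (a z + b) / (c z + d), Eqn (10-21)."""
    return (a * z + b) / (c * z + d)


def via_decomposition(z: complex, a: complex, b: complex, c: complex, d: complex) -> complex:
    """Same mapping built from a translation, an inversion and a linear map."""
    z1 = z + d / c                               # translation, Eqn (10-23)
    z2 = 1 / z1                                  # inversion, Eqn (10-23)
    return a / c - ((a * d - b * c) / c**2) * z2  # linear map, Eqn (10-24)


# Coefficients of Example 10-17: w = (z - i) / (z + i).
A, B, C, D = 1, -1j, 1, 1j

# Sample the circle |z| = 2 at 360 equally spaced points.
circle = [cmath.rect(2.0, 2 * math.pi * k / 360) for k in range(360)]
images = [bilinear(z, A, B, C, D) for z in circle]

# Distance of each image point from the claimed centre (5/3, 0).
distances = [abs(w - complex(5 / 3, 0)) for w in images]

# Largest disagreement between the direct and decomposed evaluations.
max_gap = max(abs(bilinear(z, A, B, C, D) - via_decomposition(z, A, B, C, D))
              for z in circle)
```

Every entry of `distances` should agree with the radius $4/3$ to rounding error, and `max_gap` should be negligibly small.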
10-6 Applications of conformal mapping

In any first account of the theory of conformal mapping, it is impossible to do more than merely indicate its application in science and engineering. From the fields of elasticity, electromagnetic theory, fluid mechanics, and heat conduction in which these ideas play important roles, we choose just one simple example. Our choice, from fluid mechanics, is solving the problem of the two-dimensional flow of an incompressible fluid around the interior of a wedge shaped region, on the assumption that the flow has a special property which enables it to be classified as being irrotational. These are in fact conditions which are usually valid in most low speed flows of ordinary fluids.

In books on fluid mechanics it is established that if $q_1$ and $q_2$ are the $x$ and $y$ components of velocity at a point in an incompressible inviscid fluid that is undergoing two-dimensional flow, then under the stated conditions these components may be written in the form

$$q_1 = \frac{\partial\phi}{\partial x}, \qquad q_2 = \frac{\partial\phi}{\partial y}, \tag{10-25}$$

where $\phi(x, y)$ is a function called the velocity potential of the flow. The lines $\phi(x, y) = \text{constant}$ are called equipotentials. Using the vector interpretation of complex numbers we may thus represent the fluid velocity $q$ by the complex variable

$$q = \frac{\partial\phi}{\partial x} + i\frac{\partial\phi}{\partial y}. \tag{10-26}$$

It can also be established that if fluid is neither created nor lost within the flow region, then $\phi(x, y)$ must be such that

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0. \tag{10-27}$$

Thus $\phi$ satisfies Laplace's equation and so is harmonic. Introducing the harmonic conjugate of $\phi$, which we shall denote by $\psi(x, y)$, enables us to define a further complex function $F(z)$ by the equation

$$F(z) = \phi(x, y) + i\psi(x, y). \tag{10-28}$$

This is called the complex potential and $\psi(x, y)$ itself is called the stream function of the flow. Now by the nature of the construction of $F(z)$, it is differentiable in the sense of Definition 10-6 and so satisfies the Cauchy-Riemann equations.
Hence

$$\phi_x = \psi_y, \qquad \phi_y = -\psi_x$$

or, in terms of $q_1$ and $q_2$,

$$q_1 = \phi_x = \psi_y, \qquad q_2 = \phi_y = -\psi_x. \tag{10-29}$$

These relationships provide the justification for the name stream function, for they show that the velocity vector is everywhere normal to the curves $\phi(x, y) = \text{const}$. This follows because on $\phi(x, y) = \text{const}$, $\phi_x\,dx + \phi_y\,dy = 0$, showing that

$$\frac{dy}{dx} = -\frac{\phi_x}{\phi_y}.$$

Hence if $n$ is the gradient of the normal to a curve $\phi(x, y) = \text{const}$, then $n(dy/dx) = -1$, whence $n = \phi_y/\phi_x$. However, from results (10-29) this is equivalent to $n = q_2/q_1$, which is the slope of the curve traced by a fluid particle. Moreover, on a curve $\psi(x, y) = \text{const}$ we have $\psi_x\,dx + \psi_y\,dy = 0$, so that by (10-29) its slope is $dy/dx = -\psi_x/\psi_y = \phi_y/\phi_x = q_2/q_1$ also. Hence the curves $\psi(x, y) = \text{const}$ are curves along which fluid flows and so can properly be called streamlines.

Consider the complex potential

$$F(z) = U_0 z, \tag{10-30}$$

where $U_0$ is a positive real number. Then we have at once

$$\phi = U_0 x, \qquad \psi = U_0 y. \tag{10-31}$$

The streamlines $\psi = \alpha$ are thus the lines $y = \alpha/U_0$, and the velocity $q$ is

$$q = \frac{\partial\phi}{\partial x} + i\frac{\partial\phi}{\partial y} = U_0.$$

Thus the complex potential $F(z) = U_0 z$ must characterize a uniform flow, with velocity $U_0$ parallel to the $x$-axis and directed in the sense of increasing $x$. This is illustrated in Fig. 10-9 (a).

Fig. 10-9 Transformation of fluid flow: (a) uniform flow in upper half plane; (b) flow inside wedge.

Now if we consider the transformation

$$w = z^{1/3}, \tag{10-32}$$

then we know from the arguments used in connection with the mapping $w = z^n$ that it will map the upper half of the $z$-plane onto the wedge $0 < \arg w < \tfrac{1}{3}\pi$ in the $w$-plane. Then, as (10-32) is equivalent to $z = w^3$, we must have

$$x + iy = (u^3 - 3uv^2) + i(3u^2 v - v^3),$$

giving

$$x = u^3 - 3uv^2, \qquad y = 3u^2 v - v^3. \tag{10-33}$$

Hence the velocity potential is

$$\phi = U_0(u^3 - 3uv^2) \tag{10-34}$$

and the stream function

$$\psi = U_0(3u^2 v - v^3). \tag{10-35}$$

Thus the curves $\psi = \text{const}$ define the streamlines inside the wedge shaped region, and some representative streamlines are shown in Fig. 10-9 (b).
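As a check on Eqns (10-34) and (10-35), the real and imaginary parts of $U_0 w^3$ can be compared with the stated velocity potential and stream function, and the stream function can be evaluated on the two walls of the wedge. This is a minimal sketch: the value $U_0 = 1$, the sample points, and the function names are assumptions made purely for illustration.

```python
import cmath
import math

U0 = 1.0  # assumed free-stream speed; any positive real number would do


def complex_potential(w: complex, u0: float = U0) -> complex:
    """Complex potential in the wedge, F = U0 * w**3 (from z = w**3)."""
    return u0 * w**3


def potential_and_stream(u: float, v: float, u0: float = U0) -> tuple[float, float]:
    """Velocity potential and stream function, Eqns (10-34) and (10-35)."""
    phi = u0 * (u**3 - 3 * u * v**2)
    psi = u0 * (3 * u**2 * v - v**3)
    return phi, psi


# An interior point of the wedge 0 < arg w < pi/3 (arbitrary choice).
u_p, v_p = 1.2, 0.4
phi_p, psi_p = potential_and_stream(u_p, v_p)
F_p = complex_potential(complex(u_p, v_p))

# Points on the two walls of the wedge: arg w = 0 and arg w = pi/3.
wall_lower = cmath.rect(2.0, 0.0)
wall_upper = cmath.rect(2.0, math.pi / 3)
psi_lower = potential_and_stream(wall_lower.real, wall_lower.imag)[1]
psi_upper = potential_and_stream(wall_upper.real, wall_upper.imag)[1]
```

Both walls give $\psi = 0$ to rounding error, so together they form the single streamline that is the image of the real axis $y = 0$ of the uniform flow.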
To determine the speed at any point within the wedge we use the fact that

$$\frac{dF}{dz} = \frac{\partial\phi}{\partial x} + i\frac{\partial\psi}{\partial x} = q_1 - iq_2,$$

showing that the speed $|q|$ is given by

$$|q| = \left|\frac{dF}{dz}\right|. \tag{10-36}$$

As the complex potential is

$$F = U_0 w^3, \tag{10-37}$$

applying (10-36) with $w$ as the independent variable in the wedge we have

$$\frac{dF}{dw} = 3U_0 w^2$$

and, finally,

$$|q| = 3U_0|w^2| = 3U_0|(u + iv)^2| = 3U_0(u^2 + v^2). \tag{10-38}$$

Thus at a point $P$ with coordinates $(u_0, v_0)$ within the wedge, the speed $|q| = 3U_0(u_0^2 + v_0^2)$. The streamline through the point $P$ is provided by Eqn (10-35), for the constant associated with this streamline through $P$ must be $U_0(3u_0^2 v_0 - v_0^3)$, so that the streamline itself has the equation

$$3u^2 v - v^3 = 3u_0^2 v_0 - v_0^3.$$

As mentioned at the beginning of this section, conformal mapping has many other applications, all related to solutions of Laplace's equation in two dimensions. The application described here can provide no more than an indication of one of these situations.

PROBLEMS

Section 10-1

10-1 Test the following sequences $\{z_n\}$ for convergence and, where appropriate, find the limit, stating whether or not it is a member of the sequence. (a) z„ = 2» + /'3-«; 3 1 (b) z n = n tan - + in sin - ; n n (c) *» = „-(_i). + 4 '; (e)z n = isin^+icos^.

10-2 Give examples of: (a) a non-convergent sequence $\{z_n\}$; (b) a convergent sequence $\{z_n\}$ with limit $2 + 3i$.

10-3 Given that the sequences $\{w_n\}$, $\{z_n\}$ are defined by -=( 1+ -I) + '-(5^t) - *— -l + 'fci^)- find the limits of the sequences $\{w_n + z_n\}$, $\{w_n z_n\}$ and $\{w_n/z_n\}$.

10-4 Identify the limit points of the sequence $\{z_n\}$ where

10-5 The general term of the sequence $\{z_n\}$ is _ / 2«2 + 1 \ . / «>W \ *•" (3^ + 2 w + 3J + ' COS (^nj- Find values of $a$ for which $\{z_n\}$ has: (a) one limit point, (b) two limit points, and state their location. Are the values of $a$ unique?

10-6 Construct examples of a sequence $\{z_n\}$ which has: (a) two limit points; (b) three limit points; (c) no limit points.
486 / FUNCTIONS OF A COMPLEX VARIABLE CH 10 Section 10-2 10-7 Sketch each of the following curves defined in the complex plane: (a) x = s, y = \/(l - s 2 ) for -1 < s < 1 ; (b) x = a sin s, y = b cos s for < s < 2-n (a, b real) ; (c) x = cosh s, y = sinh j for — co < j < co ; (d) |z+2-/| =3; (e) zz = 4. Sketch the region defined by each of the following sets of inequalities and indicate when the boundary points belong to the region so defined. 10-8 Im(z + iz) > and Re z > 0. 10-9 2 < | z | < 3 with < arg z < \n. 10-10 1 < | z - 1 | < 2 and 1 < | z + 1 | < 2. 10-11 Sketch the region that lies inside the curve defined by arg (z + 2) - arg (z + 3) = in and is such that Im z > J. Give an alternative representation of this region. 1012 Draw the curve C defined by arg (z - - arg (z - 1) = \t*. Problem 10-13 10-13 Define the figure-eight-shaped curve shown in the diagram in terms of argu- ments of complex numbers. The curves Ci and Cz are arcs of circles with centres Oi and O2, respectively. 10-14 Sketch a simply shaped region in the complex plane and define it: (a) parametrically; (b) directly in terms of z. PROBLEMS / 487 Section 103 10-15 For what values of z are the following complex functions defined : (a) w = z 2 + iz + 1 ; (b) w = (z - l)/(z - 2); (c) (z + l)(z - i)(z 2 + 4); (d) h> = sinh z. 10- 16 If/(z) = u + iv, find the expressions for the functions u, v in terms of x, y given that: (a) /(z) = z 2 + zz + 1 ; (b) /(z) = £±i; (c) /(z) = cosh z; (d) /(z) = cos z. 1017 Given the following forms of /(z) deduce their value if z = 1 + 2i: (a) /(z) = x 2 + 3xj + iy 2 ; x 2 + 2;> + 1. x 2 + y (c) /(z) = sin y (x 2 - ;> 2 ) + / cos ^f (x 2 + if), (")/(*)= r24 ., ;2 1018 Use Definition 10-4 to prove lim (2z 2 - 1) = -(1 + 4/). z— 1 -i 1019 Use Definition 10-4 to prove Hm (£=!)- -6. z— 8/2 \2z + 3 J 10-20 Use Definition 10-4 to prove (2 - /z)(z 2 - 1) hm — r; — u — = 2( - 2 - ')• z— 1 (.Z — 1) 10-21 Given that/(z) = z 2 + z - 2,g(z) = z + 2 deduce: (a) lim[/(z) + 2 (? 
(z)]; z— 2 (b) limf(z)g(z); z->-i (0 Hm M z-i-2^0) 10-22 Prove that lim fc\ = 1*1-0 \ z J by considering lim 1*1 o \ zz / writing z = x + iy, and then arriving at the result by displaying the function whose limit is to be considered in terms of its real and imaginary parts. 488 / FUNCTIONS OF A COMPLEX VARIABLE CH 10 Deduce that lim |s|-*o (sin az\ — )-"■ where a may be a complex number. 10-23 Use the result lim l^) = 1 UI-o\ z / established in Problem 10-22 above together with the identity — cos 2 z (sin z\ 1 (l + : to prove that lim (Lz™l\ = o. 10-24 For what value of a is the function 3z for z t^ i for z = i, continuous at z = /. 10-25 Give an example of a function /(z) that : (a) is continuous everywhere ; (b) has a limit 3 + 2i as z — >- 1 + i, but is not continuous at z = 1 + /'. 10-26 Use Definition 10-5 to give a direct proof that /(z) = z 2 is continuous everywhere. 10-27 Use the trigonometric identity Iz + zo\ . /z - zo\ sin z — sin zo = 2 cos I — - — I . sin I — - — I and the last result of Problem 10-22 above to give a direct proof that/(z) = sin z is continuous for all z. 10-28 Give reasons to justify the assertion that /(z) = z sin (z 2 + 3z + 2) + l/(z + 2 - /) is continuous everywhere except at z = — 2 + i. Section 10-4 10-29 Use Definition 10-6 to prove that if w = az 2 , where a is any constant, then dtv — = 2az dz for all z. 10-30 Use Definition 10-6 to prove that if /(z) is a differentiable function of z in some region, then in that region & WW -& + >%■ PROBLEMS / 489 10-31 By using the series representation of the hyperbolic sine function prove that /sinh z\ Po(— J" Then, using the identity sinh zi — sinh Z2 = 2 sinh [(zi — z 2 )/2] cosh [{zi + z2>/2], which may be derived directly from identity (6-29), show by means of Definition 10-6 that if w = sinh z, then dw -r- = cosh z dz for all z. 10-32 Show by means of Definition 10-6 that the function /(z) = | z | is not differ- entiable at the origin. 
Find the limiting value assumed by the difference quotient at the origin (that is, with zo = 0) as h -»■ along the line y = he. 10-33 Determine which of the following functions /(z) satisfy the Cauchy-Riemann equations : (a) /-(z) = z 3 -;z 2 + 3; (b)/(z) = cosh(z + 3/); (c) /(z) = z sin z + zz; (d) /(z) = (*3 - 3xy 2 ) + iQx 2 v - y 3 ); (e) /(z) = z(r + z)/2; (f ) /(z) = sinh 3x cos j + i cosh 3x sin j\ 10-34 Find the points, if any, at which the following functions are not analytic: (a) /(z) = 3z + sinhz; (b) /(z) = z\(z + 2); (c) f(z) = cos 1/z; (d)/(z) = |^. 10-35 Find the values of the constants a and b in order that the functions w should satisfy the Cauchy-Riemann equations : (a) w = a sin x cosh Z>/ + /2 cos jr sinh/; (b) w = x 3 - oxy 2 - x + 1 + /'(3^ 2 - by 3 - 1). 10-36 Using the method outlined in the text, show that if x = r cos 8, y = r sin 9, then the polar form of the Cauchy-Riemann equations is : — = l— a \ d JL- 8v dr ~ r' dd 7' dd Jr 10-37 Determine which of the following functions /(z) satisfy the Cauchy-Riemann equations : (a) w = (r 2 cos 2 6 + 2) + ;> 2 sin 2 0; (b) w - (r 3 cos 36 + 2r cos 6 + 4) + i(r 3 sin 36 + r sin «); (c) w = |r + -| cos d + i (r - -\ sin 9; (d) w = r 2 cos 2 9 + /V 2 sin 2 6 + 4; (e) w = sin (r cos 9) . cosh (r sin 9) + / cos (r cos 9) . sinh (r sin 9). 490 / FUNCTIONS OF A COMPLEX VARIABLE CH 10 10-38 Find the values of the constants a, b, and c in order that the following functions should satisfy the Cauchy-Riemann equations: (a) w = a log r + i(6 + br); (b) w = r a cos £0 + ibr c sin \ d. 10-39 Verify that the following functions w satisfy the Cauchy-Riemann equations and in each case express the derivative of w as a function of z: (a) w = (x z — 3xy 2 + y) + i(3x 2 y — y 3 — x); (b) w = (x sinh x cos y — y cosh x sin y) + i(y sinh x cos y + * cosh x sin _y) ; (c) w = e ax (cos ay + i sin ay). 10-40 Find which of the following pairs of functions are harmonic conjugates. 
Deduce the representation of w = u + iv in terms of z for the pairs that are harmonic conjugates, first by using Rule 2 (a), and then by using Rule 2 (b) : (a) u = x 2 — y 2 + 1y, v = 2x(j — 1); (b) u = sin x cosh j, v = cos x sinh y; (c) u = x sin x cosh y — y cos x sinh j, y = — (x cos x sinh y + y sin x cosh j) ; (d) u = sinh x cos j, t> = cosh x sin y. 10-41 Show by differentiation that v = x 2 — y 2 + 2y is harmonic and deduce its harmonic conjugate u. Express the function w = u + iv, and its derivative, in terms of z. 10-42 Show by differentiation that u = cosh x cos y is harmonic and deduce its harmonic conjugate v. Express the function w = u + iv, and its derivative, in terms of z. Section 10 5 10-43 Sketch the images in the >v-plane of the line y = 2x — 1 in the z-plane that result from the mappings : (a) w = iz — (2 + i) ; (b) tv = 2z + 3; (c) w = (l + 7)2+ 1. 10-44 Determine the images in the iv-plane of the circle | z — 1 | = 1 in the z-plane that result from the mappings : (a) w = Iz — i; (b) w = (i - 1)2 + 2. In each case shade the regions in the w-plane that correspond to the interior of the circle | z — 1 | = 1. 10-45 Sketch the region in the iv-plane corresponding to the region x > 2, y < x in the z-plane given that w = (2i - X)z + (1 + /). 10-46 Determine the equation of the line in the tv-plane which is the image of the line x = 1 in the z-plane under the mapping w = z 3 . 10-47 Give an algebraic proof that if c =£ 0, then the general straight line y = tnx + c in the z-plane is mapped by the transformation w = \\z onto a circle in the w-plane. PROBLEMS / 491 10-48 Find the image in the w-plane of the circle \ z\ = 2 if 2z + ;' w = -• Z — 51 10-49 Show that w = e z maps the straight lines y = const in the z-plane onto straight lines through the origin in the tv-plane, and the straight lines x = const in the z-plane onto circles about the origin in the w-plane. 
10-50 Locate the critical points of $w = \sin z$ and show that it maps the region $-\tfrac{1}{2}\pi < x < \tfrac{1}{2}\pi$, $y > 0$ in the $z$-plane onto the upper half of the $w$-plane.

Section 10-6

10-51 Using the argument given in the text, show how the complex potential $F(z) = U_0 z$ and the mapping $w = z^{1/2}$ may be used to find the streamlines indicated in the figure. Find the speed of flow at a point $P$ with coordinates $(u_0, v_0)$ and determine the streamline and the equipotential through $P$.

Problem 10-51 (figure)

Scalars, vectors, and fields

11-1 Curves in space

If the coordinates $(x, y, z)$ of a point $P$ in space are described by

$$x = f(t), \qquad y = g(t), \qquad z = h(t), \tag{11-1}$$

where $f$, $g$, $h$ are continuous functions of $t$, then as $t$ increases so the point $P$ moves in space tracing out some curve. It follows that Eqns (11-1) represent a parametric description of a curve $\Gamma$ in space and, furthermore, that they define a direction along the curve $\Gamma$ corresponding to the direction in which $P$ moves as $t$ increases. For example, the parametric equations

$$x = 2\cos 2\pi t, \qquad y = 2\sin 2\pi t, \qquad z = 2t,$$

for $0 \le t \le 1$ describe one turn of a helix, as may be seen by noticing that the projection of the point $P$ on the $(x, y)$-plane traces one revolution of the circle $x^2 + y^2 = 4$ as $t$ increases from $t = 0$ to $t = 1$, whilst the $z$-coordinate of $P$ steadily increases from $z = 0$ to $z = 2$.

If we now denote by $\mathbf{r}$ the position vector $OP$ of a point $P$ on $\Gamma$ relative to the origin $O$ of our coordinate system, and introduce the triad of orthogonal unit vectors $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ used in Chapter 4, it follows that (Fig. 11-1)

$$\mathbf{r} = f(t)\mathbf{i} + g(t)\mathbf{j} + h(t)\mathbf{k}. \tag{11-2}$$

Expressions of this form are called vector functions of one real variable, in which the dependence on the parameter $t$ is often displayed concisely by writing $\mathbf{r} = \mathbf{r}(t)$. The name vector function arises because $\mathbf{r}$ is certainly a vector and, as it depends on the real independent variable $t$, it must also be a function in the sense that to each $t$ there corresponds a vector $\mathbf{r}(t)$.
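The helix used as an example above gives a concrete vector function of one real variable. The following minimal sketch represents $\mathbf{r}(t)$ as a Python function returning the three components; the name `helix` and the sampling of $t$ are illustrative choices only.

```python
import math


def helix(t: float) -> tuple[float, float, float]:
    """Position vector r(t) = (2 cos 2*pi*t, 2 sin 2*pi*t, 2t) of the helix."""
    angle = 2 * math.pi * t
    return (2 * math.cos(angle), 2 * math.sin(angle), 2 * t)


# Sample one full turn, 0 <= t <= 1, at eleven equally spaced parameter values.
samples = [helix(k / 10) for k in range(11)]

# The projection on the (x, y)-plane stays on the circle x**2 + y**2 = 4 ...
radii_squared = [x * x + y * y for x, y, _ in samples]

# ... while the z-coordinate increases steadily from 0 to 2.
heights = [z for _, _, z in samples]
```

Here `radii_squared` stays at $4$ to rounding error and `heights` is strictly increasing, which is the content of the remark about the projection on the $(x, y)$-plane.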
Knowledge of the vector function $\mathbf{r}(t)$ implies knowledge of the three scalar functions $f$, $g$, and $h$, and conversely.

Fig. 11-1 Vector function of one variable interpreted as a curve in space.

The geometrical analogy used here to interpret a general vector function $\mathbf{r}(t)$ is particularly valuable in dynamics where the point $P(t)$ with position vector $\mathbf{r}(t)$ usually represents a moving particle, and the curve $\Gamma$ its trajectory in space. Under these conditions it is frequently most convenient if the parameter $t$ is identified with the time, though in some circumstances identification with the distance $s$ to $P$ measured along $\Gamma$ from some fixed point on $\Gamma$ is preferable. Useful though these geometrical and dynamical analogies are, we shall in the main use them only to help further our understanding of general vector functions.

The name vector function suggests, correctly, that it is possible to give satisfactory meanings to the terms limit, continuity, and derivative when applied to $\mathbf{r}(t)$. As in the ordinary calculus, the key concept is that of a limit. Intuitively the idea of a limit is clear: when we say $\mathbf{u}(t)$ tends to a limit $\mathbf{v}$ as $t \to t_0$, we mean that when $t$ is close to $t_0$, the vector function $\mathbf{u}(t)$ is in some sense close to the vector $\mathbf{v}$. In what sense though can the two vectors $\mathbf{u}(t)$ and $\mathbf{v}$ be said to be close to one another? Ultimately, all that is necessary is to interpret this as meaning that $|\mathbf{u}(t) - \mathbf{v}|$ is small. So, we shall say that $\mathbf{u}(t)$ tends to the limit $\mathbf{v}$ as $t \to t_0$ if, by taking $t$ sufficiently close to $t_0$, it is possible to make $|\mathbf{u}(t) - \mathbf{v}|$ arbitrarily small. As with our previous notion of continuity we shall then say that $\mathbf{u}(t)$ is continuous at $t_0$ if $\lim_{t \to t_0} \mathbf{u}(t) = \mathbf{v}$ and, in addition, $\mathbf{u}(t_0) = \mathbf{v}$.
We incorporate these ideas into a formal definition as follows:

Definition 11-1 (vector functions: limits and continuity) Let $\mathbf{u}(t) = u_1(t)\mathbf{i} + u_2(t)\mathbf{j} + u_3(t)\mathbf{k}$ and $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$, then if for any $\varepsilon > 0$ there is some number $\delta$ such that $|\mathbf{u}(t) - \mathbf{v}| < \varepsilon$ when $|t - t_0| < \delta$, we shall say that $\mathbf{u}(t)$ tends to the limit $\mathbf{v}$ as $t \to t_0$, and write

$$\lim_{t \to t_0} \mathbf{u}(t) = \mathbf{v}.$$

If in addition $\mathbf{u}(t_0) = \mathbf{v}$, then $\mathbf{u}(t)$ will be said to be continuous at $t = t_0$. A vector function that is continuous at all points in the interval $a < t < b$ will be said to be continuous throughout that interval. As usual, a vector function that is not continuous at $t = t_0$ will be said to be discontinuous.

It is obvious from this definition that $\mathbf{u}(t)$ can only tend to the limit $\mathbf{v}$ as $t \to t_0$ if the limit of each component of $\mathbf{u}(t)$ is equal to the corresponding component of the vector $\mathbf{v}$. Thus the limit of a vector function of one variable is directly related to the limits of the three scalar functions of one variable $u_1(t)$, $u_2(t)$, and $u_3(t)$. This is proved by writing

$$|\mathbf{u}(t) - \mathbf{v}| = \left[(u_1(t) - v_1)^2 + (u_2(t) - v_2)^2 + (u_3(t) - v_3)^2\right]^{1/2},$$

showing that $|\mathbf{u}(t) - \mathbf{v}| < \varepsilon$ as $t \to t_0$ is only possible if

$$\lim_{t \to t_0}\,(u_i(t) - v_i) = 0 \quad\text{for } i = 1, 2, 3,$$

or

$$\lim_{t \to t_0} u_1(t) = v_1, \qquad \lim_{t \to t_0} u_2(t) = v_2, \qquad \lim_{t \to t_0} u_3(t) = v_3.$$

A systematic application of these arguments enables the following theorem to be proved.

Theorem 11-1 (continuous vector functions) If the vector functions $\mathbf{u}(t)$, $\mathbf{v}(t)$ are defined and continuous throughout the interval $a < t < b$, then the vector functions $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t) \times \mathbf{v}(t)$, and the scalar function $\mathbf{u}(t) \cdot \mathbf{v}(t)$ are also defined and continuous throughout that same interval.

Example 11-1 At what points are the vector functions $\mathbf{u}(t)$, $\mathbf{v}(t)$ discontinuous if

$$\mathbf{u}(t) = \sin t\,\mathbf{i} + \sec t\,\mathbf{j} + \frac{1}{t - 1}\mathbf{k}, \qquad \mathbf{v}(t) = t\,\mathbf{i} + (1 + t^2)\mathbf{j} + e^t\mathbf{k}.$$

Verify by direct calculation that $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t) \cdot \mathbf{v}(t)$, and $\mathbf{u}(t) \times \mathbf{v}(t)$ are continuous functions in any interval not containing a point of discontinuity of $\mathbf{u}(t)$ or $\mathbf{v}(t)$.
Solution The $\mathbf{i}$ component of $\mathbf{u}(t)$ is defined and continuous for all $t$, whereas the $\mathbf{j}$ component is discontinuous for $t = (2n + 1)\tfrac{1}{2}\pi$ with $n = 0, \pm 1, \pm 2, \ldots$ and the $\mathbf{k}$ component is discontinuous for the single value $t = 1$. All three components of $\mathbf{v}(t)$ are continuous for all $t$. We have by vector addition

$$\mathbf{u}(t) + \mathbf{v}(t) = (t + \sin t)\mathbf{i} + (1 + t^2 + \sec t)\mathbf{j} + \left(e^t + \frac{1}{t - 1}\right)\mathbf{k},$$

showing that the components of $\mathbf{u}(t) + \mathbf{v}(t)$ give rise to the same points of discontinuity as the function $\mathbf{u}(t)$. We may thus conclude that the vector sum is continuous throughout any interval not containing one of these points. For example, $\mathbf{u}(t) + \mathbf{v}(t)$ is continuous in both the open interval $(\tfrac{1}{2}\pi, 3\pi/2)$ and the closed interval $[5, 7]$ but it is discontinuous in $(0, \pi)$.

The scalar product $\mathbf{u}(t) \cdot \mathbf{v}(t)$ is given by

$$\mathbf{u}(t) \cdot \mathbf{v}(t) = t\sin t + (1 + t^2)\sec t + \frac{e^t}{t - 1},$$

which is, of course, a scalar. Again we see by inspection that the scalar product is continuous in any interval not containing a point of discontinuity of $\mathbf{u}(t)$.

The vector product $\mathbf{u}(t) \times \mathbf{v}(t)$ is

$$\mathbf{u}(t) \times \mathbf{v}(t) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \sin t & \sec t & 1/(t - 1) \\ t & 1 + t^2 & e^t \end{vmatrix},$$

giving

$$\mathbf{u}(t) \times \mathbf{v}(t) = \left[e^t\sec t - \frac{1 + t^2}{t - 1}\right]\mathbf{i} + \left[\frac{t}{t - 1} - e^t\sin t\right]\mathbf{j} + \left[(1 + t^2)\sin t - t\sec t\right]\mathbf{k}.$$

Here also inspection of the components shows that the vector product is continuous in any interval not containing a point of discontinuity of $\mathbf{u}(t)$.

The following definition (interpreted later) shows that, as might be expected, the idea of a derivative can also be applied to vector functions of one variable.

Definition 11-2 (derivative of vector function) Let $\mathbf{u}(t)$ be a continuous vector function throughout some interval $a < t < b$ at each point of which the limit

$$\lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t}$$

is defined. Then $\mathbf{u}(t)$ is said to be differentiable throughout that interval with the derivative

$$\frac{d\mathbf{u}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t}.$$

The geometrical interpretation of the derivative of a vector function of a real variable is apparent in Fig. 11-2.
In that figure the curve $\Gamma$ is described by a point $P(t)$ with position vector $\mathbf{u}(t)$ relative to $O$. The point denoted by $P'(t + \Delta t)$ is the position assumed by $P$ at time $t + \Delta t$, so that $OP = \mathbf{u}(t)$, $OP' = \mathbf{u}(t + \Delta t)$, and $PP' = \Delta\mathbf{u}$ is the increment in $\mathbf{u}(t)$ consequent upon the increment $\Delta t$ in $t$.

Fig. 11-2 Geometrical interpretation of $d\mathbf{u}/dt$.

It is obvious that as $\Delta t \to 0$, so the vector $\Delta\mathbf{u}$ tends to the line of the tangent to the curve $\Gamma$ at $P(t)$ with $\Delta\mathbf{u}$ being directed from $P$ to $P'$. To interpret $d\mathbf{u}/dt$ in terms of components when $\mathbf{u}(t) = u_1(t)\mathbf{i} + u_2(t)\mathbf{j} + u_3(t)\mathbf{k}$, we need only observe that

$$\frac{d\mathbf{u}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{u}(t + \Delta t) - \mathbf{u}(t)}{\Delta t} = \lim_{\Delta t \to 0}\left[\frac{u_1(t + \Delta t) - u_1(t)}{\Delta t}\right]\mathbf{i} + \lim_{\Delta t \to 0}\left[\frac{u_2(t + \Delta t) - u_2(t)}{\Delta t}\right]\mathbf{j} + \lim_{\Delta t \to 0}\left[\frac{u_3(t + \Delta t) - u_3(t)}{\Delta t}\right]\mathbf{k},$$

from which it follows that

$$\frac{d\mathbf{u}}{dt} = \frac{du_1}{dt}\mathbf{i} + \frac{du_2}{dt}\mathbf{j} + \frac{du_3}{dt}\mathbf{k}. \tag{11-3}$$

The unit vector $\mathbf{T}$ that is tangent to $\Gamma$ at $P(t)$ and points in the direction in which $P(t)$ will move with increasing $t$ is obviously

$$\mathbf{T} = \frac{d\mathbf{u}/dt}{\left|d\mathbf{u}/dt\right|}. \tag{11-4}$$

If $s$ is the distance to $P$ measured positively in the sense $P$ to $P'$ along $\Gamma$ from some fixed point on that curve (Fig. 11-2), then we know from our work with differentials that $du_1 = u_1'\,dt$, $du_2 = u_2'\,dt$, $du_3 = u_3'\,dt$. Now as the differentials $du_1$, $du_2$, $du_3$ are mutually orthogonal and represent the increments in the coordinates $[u_1(t), u_2(t), u_3(t)]$ of $P$ to an adjacent point distant $ds$ away along $\Gamma$ with coordinates $[u_1(t + dt), u_2(t + dt), u_3(t + dt)]$, we may apply Pythagoras' theorem to obtain

$$(ds)^2 = (u_1'\,dt)^2 + (u_2'\,dt)^2 + (u_3'\,dt)^2,$$

whence

$$\frac{ds}{dt} = \left[\left(\frac{du_1}{dt}\right)^2 + \left(\frac{du_2}{dt}\right)^2 + \left(\frac{du_3}{dt}\right)^2\right]^{1/2}. \tag{11-5}$$

Comparison of Eqns (11-3) and (11-5) then gives the result

$$\left|\frac{d\mathbf{u}}{dt}\right| = \frac{ds}{dt}, \tag{11-6}$$

from which we see that if $t$ is regarded as time, then the vector function $\mathbf{v} = d\mathbf{u}/dt$ is the velocity vector of $P(t)$ as it moves with speed $ds/dt$ along $\Gamma$ in the direction of $\mathbf{T}$. These results merit recording as a theorem.
Theorem 11-2 Let $\mathbf{u}(t) = u_1(t)\mathbf{i} + u_2(t)\mathbf{j} + u_3(t)\mathbf{k}$ be a differentiable vector function of the real variable $t$, then

$$\frac{d\mathbf{u}}{dt} = \frac{du_1}{dt}\mathbf{i} + \frac{du_2}{dt}\mathbf{j} + \frac{du_3}{dt}\mathbf{k}.$$

If $\Gamma$ denotes the curve traced out by the point $P(t)$ with position vector $\mathbf{u}(t)$ as $t$ increases, and $s$ is the distance to $P(t)$ measured along $\Gamma$ from some fixed point, then

$$\frac{ds}{dt} = \left|\frac{d\mathbf{u}}{dt}\right|$$

and the unit tangent $\mathbf{T}$ to the curve $\Gamma$ at $P(t)$ oriented in the sense of increasing $t$ is

$$\mathbf{T} = \frac{d\mathbf{u}/dt}{\left|d\mathbf{u}/dt\right|}.$$

As a consequence of this theorem we may write

$$\frac{d\mathbf{u}}{dt} = \frac{ds}{dt}\left(\frac{d\mathbf{u}/dt}{\left|d\mathbf{u}/dt\right|}\right) = \frac{ds}{dt}\mathbf{T}, \tag{11-7}$$

which is a result of considerable use in dynamics when $t$ is identified with time.

Higher order derivatives such as $d^2\mathbf{u}/dt^2$ and $d^3\mathbf{u}/dt^3$ may also be defined in the obvious fashion as $d^2\mathbf{u}/dt^2 = (d/dt)(d\mathbf{u}/dt)$, $d^3\mathbf{u}/dt^3 = (d/dt)(d^2\mathbf{u}/dt^2)$, provided only that the components of $\mathbf{u}(t)$ have suitable differentiability properties. Thus, for example, if the second derivatives of the components of $\mathbf{u}(t)$ exist we have

$$\frac{d^2\mathbf{u}}{dt^2} = \frac{d^2u_1}{dt^2}\mathbf{i} + \frac{d^2u_2}{dt^2}\mathbf{j} + \frac{d^2u_3}{dt^2}\mathbf{k}. \tag{11-8}$$

We have seen that if $t$ is identified with time and $\mathbf{u}(t)$ is the position vector of a point $P$, then $d\mathbf{u}/dt$ is the velocity vector of $P$. It follows from this same argument that $d^2\mathbf{u}/dt^2$ is the acceleration vector of $P$.

Example 11-2 The position vector $\mathbf{r}$ of a particle at time $t$ is given by

$$\mathbf{r} = a\cos\omega t\,\mathbf{i} + a\sin\omega t\,\mathbf{j} + \alpha t^2\mathbf{k},$$

where $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ have their usual meanings and $a$, $\omega$, and $\alpha$ are constants. Find the acceleration vector at time $t$, and deduce the times at which it will be perpendicular to the position vector. Hence deduce the unit tangent to the particle trajectory at these times.

Solution By making the identifications $\mathbf{u} = \mathbf{r}$, $u_1(t) = a\cos\omega t$, $u_2(t) = a\sin\omega t$ and $u_3(t) = \alpha t^2$ and then applying Theorem 11-2 we find that the velocity vector is

$$\frac{d\mathbf{r}}{dt} = -a\omega\sin\omega t\,\mathbf{i} + a\omega\cos\omega t\,\mathbf{j} + 2\alpha t\,\mathbf{k}.$$

A further differentiation yields the required acceleration vector

$$\frac{d^2\mathbf{r}}{dt^2} = -a\omega^2\cos\omega t\,\mathbf{i} - a\omega^2\sin\omega t\,\mathbf{j} + 2\alpha\mathbf{k}.$$
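Expressed vectorially, the condition that $\mathbf{r}$ and $d^2\mathbf{r}/dt^2$ should be perpendicular is simply that $\mathbf{r} \cdot (d^2\mathbf{r}/dt^2) = 0$. Hence to find the time at which this condition is satisfied we must solve the equation

$$(a\cos\omega t\,\mathbf{i} + a\sin\omega t\,\mathbf{j} + \alpha t^2\mathbf{k}) \cdot (-a\omega^2\cos\omega t\,\mathbf{i} - a\omega^2\sin\omega t\,\mathbf{j} + 2\alpha\mathbf{k}) = 0.$$

Forming the required scalar product gives

$$-a^2\omega^2\cos^2\omega t - a^2\omega^2\sin^2\omega t + 2\alpha^2 t^2 = 0,$$

which immediately simplifies to $a^2\omega^2 = 2\alpha^2 t^2$, showing that the desired times are

$$t = \pm\frac{a\omega}{\alpha\sqrt{2}}.$$

To deduce the unit tangent $\mathbf{T}$ at these times we use the fact that

$$\mathbf{T} = \frac{d\mathbf{r}/dt}{\left|d\mathbf{r}/dt\right|},$$

where here

$$\left|\frac{d\mathbf{r}}{dt}\right| = \sqrt{a^2\omega^2 + 4\alpha^2 t^2}.$$

Denoting by $\mathbf{T}_\pm$ the unit tangent to the trajectory at $t = \pm a\omega/(\alpha\sqrt{2})$, we find by substitution of these values of $t$ in the above expression that, for $a\omega > 0$,

$$\mathbf{T}_+ = \frac{1}{\sqrt{3}}\left(-\sin\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{i} + \cos\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{j} + \sqrt{2}\,\mathbf{k}\right)$$

and

$$\mathbf{T}_- = \frac{1}{\sqrt{3}}\left(\sin\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{i} + \cos\frac{a\omega^2}{\alpha\sqrt{2}}\,\mathbf{j} - \sqrt{2}\,\mathbf{k}\right).$$

With the obvious differentiability requirements, if $\mathbf{u}(t)$ and $\mathbf{v}(t)$ are differentiable vector functions with respect to $t$, then so also are $\mathbf{u} + \mathbf{v}$, $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{u} \times \mathbf{v}$, and $\phi\mathbf{u}$, where $\phi = \phi(t)$ is a scalar function of $t$. As the following theorem is easily proved by resolution of the vector functions involved into component form, it is stated without proof.

Theorem 11-3 (differentiation, sums and products of vector functions) If $\mathbf{u}(t)$ and $\mathbf{v}(t)$ are differentiable vector functions throughout some interval $a < t < b$ and $\phi(t)$ is a differentiable scalar function throughout that same interval then,

(a) $\dfrac{d}{dt}(\mathbf{u} + \mathbf{v}) = \dfrac{d\mathbf{u}}{dt} + \dfrac{d\mathbf{v}}{dt}$;

(b) $\dfrac{d}{dt}(\phi\mathbf{u}) = \phi\dfrac{d\mathbf{u}}{dt} + \dfrac{d\phi}{dt}\mathbf{u}$;

(c) $\dfrac{d}{dt}(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot \dfrac{d\mathbf{v}}{dt} + \dfrac{d\mathbf{u}}{dt} \cdot \mathbf{v}$;

(d) $\dfrac{d}{dt}(\mathbf{u} \times \mathbf{v}) = \mathbf{u} \times \dfrac{d\mathbf{v}}{dt} + \dfrac{d\mathbf{u}}{dt} \times \mathbf{v}$;

and, if $\mathbf{c}$ is a constant vector,

(e) $\dfrac{d\mathbf{c}}{dt} = \mathbf{0}$;

where the order of the vector products on the right-hand side of (d) must be strictly observed.

When considering the geometry of twisted curves in space it is convenient to identify points on a curve $\Gamma$ by specifying their distance $s$ measured along the curve itself from some fixed point. This is of course equivalent to identifying $t$ with $s$ in the position vector $\mathbf{r}(t)$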
so that $\Gamma$ is then defined as the locus of the points having the equation $\mathbf{r} = \mathbf{r}(s)$. This equation is called the intrinsic equation of the curve $\Gamma$. In terms of the intrinsic equation it follows from Eqn (11-7) that the unit tangent $\mathbf{T}$ to the curve $\Gamma$ at $\mathbf{r} = \mathbf{r}(s)$ is
$$\mathbf{T} = \frac{d\mathbf{r}}{ds}. \tag{11-9}$$
Now although $\mathbf{T}$ is a vector function of $s$, it is also a unit vector, and so $\mathbf{T}\cdot\mathbf{T} = 1$. Differentiating this scalar product with respect to $s$ by means of Theorem 11-3 (c) then gives
$$\frac{d\mathbf{T}}{ds}\cdot\mathbf{T} + \mathbf{T}\cdot\frac{d\mathbf{T}}{ds} = 0$$
or, as vectors in a scalar product commute,
$$\mathbf{T}\cdot\frac{d\mathbf{T}}{ds} = 0.$$
Hence, provided $d\mathbf{T}/ds \neq \mathbf{0}$, the derivative of the unit tangent $\mathbf{T}$ with respect to $s$ is normal to $\mathbf{T}$. Next, denoting by $\mathbf{N}$ the unit vector along $d\mathbf{T}/ds$, we define the essentially positive scalar function $\kappa = \kappa(s)$ by means of the equation
$$\frac{d\mathbf{T}}{ds} = \kappa\mathbf{N}. \tag{11-10}$$
Here $\kappa$ is called the curvature of the curve at the point in question, and on account of the relationship between $\mathbf{T}$ and $\mathbf{N}$, the vector $\mathbf{N}$ is called the principal unit normal to the curve $\Gamma$ at that point. As $\kappa$ is positive by definition and $\mathbf{N}$ is a unit vector it follows from Eqn (11-10) that
$$\kappa = \left|\frac{d\mathbf{T}}{ds}\right|. \tag{11-11}$$
It is convenient to define a third and mutually orthogonal unit vector $\mathbf{B}$ called the unit binormal by means of the equation
$$\mathbf{B} = \mathbf{T}\times\mathbf{N}. \tag{11-12}$$
The three unit vectors $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ are, in general, all functions of $s$ and they serve as a specially useful triad of mutually orthogonal unit reference vectors at points on the curve $\Gamma$. It is important to appreciate that in general $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, and $\kappa$ vary from point to point on the curve $\Gamma$, being always defined in relation to the local properties of the curve in question. The positive number $\rho = 1/\kappa$ defined at each point of the curve $\Gamma$ is called the radius of curvature of the curve at that point.

Example 11-3 Find $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, and the scalars $\kappa$, $\rho$ for the curve defined parametrically in terms of $t$ by the expression
$$\mathbf{r} = 2\cos(t + \mu)\,\mathbf{i} - 2\sin(t + \mu)\,\mathbf{j} + 4t\,\mathbf{k},$$
where $\mu$ is a constant.
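Hence deduce the values of these quantities at the point on the curve corresponding to $t = 0$.

Solution First notice that $t$ is not the arc length $s$ along the curve, because were this the case then it would follow that $ds/dt = 1$, whereas from Eqn (11-5) we have
$$\frac{ds}{dt} = \sqrt{4\cos^2(t + \mu) + 4\sin^2(t + \mu) + 16} = 2\sqrt{5}.$$
Now, using Eqn (11-9) we have
$$\mathbf{T} = \frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{dt}\frac{dt}{ds} = \left(\frac{d\mathbf{r}}{dt}\right)\bigg/\left(\frac{ds}{dt}\right), \quad\text{whence}\quad \mathbf{T} = \frac{1}{2\sqrt{5}}\left(\frac{d\mathbf{r}}{dt}\right).$$
Thus
$$\mathbf{T} = \frac{1}{2\sqrt{5}}\frac{d}{dt}\bigl(2\cos(t + \mu)\,\mathbf{i} - 2\sin(t + \mu)\,\mathbf{j} + 4t\,\mathbf{k}\bigr) = \frac{1}{2\sqrt{5}}\bigl(-2\sin(t + \mu)\,\mathbf{i} - 2\cos(t + \mu)\,\mathbf{j} + 4\,\mathbf{k}\bigr),$$
and so
$$\mathbf{T} = -\frac{1}{\sqrt{5}}\bigl(\sin(t + \mu)\,\mathbf{i} + \cos(t + \mu)\,\mathbf{j} - 2\,\mathbf{k}\bigr).$$
Next, to find $\mathbf{N}$ and $\kappa$ we write Eqn (11-10) as
$$\kappa\mathbf{N} = \frac{d\mathbf{T}}{ds} = \frac{d\mathbf{T}}{dt}\frac{dt}{ds} = \left(\frac{d\mathbf{T}}{dt}\right)\bigg/\left(\frac{ds}{dt}\right).$$
Hence
$$\kappa\mathbf{N} = \frac{1}{2\sqrt{5}}\frac{d}{dt}\left(\frac{-\sin(t + \mu)\,\mathbf{i} - \cos(t + \mu)\,\mathbf{j} + 2\,\mathbf{k}}{\sqrt{5}}\right) = \frac{1}{10}\bigl(-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr).$$
Using $\kappa = |d\mathbf{T}/ds|$, it then follows that
$$\kappa = \left|\frac{-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}}{10}\right| = \frac{1}{10}$$
and, consequently, that
$$\mathbf{N} = -\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}.$$
Since the radius of curvature $\rho$ is defined by the relationship $\rho = 1/\kappa$, we have $\rho = 10$. Finally, using the definition $\mathbf{B} = \mathbf{T}\times\mathbf{N}$ gives
$$\mathbf{B} = -\frac{1}{\sqrt{5}}\bigl(\sin(t + \mu)\,\mathbf{i} + \cos(t + \mu)\,\mathbf{j} - 2\,\mathbf{k}\bigr)\times\bigl(-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr),$$
whence
$$\mathbf{B} = -\frac{1}{\sqrt{5}}\bigl(2\sin(t + \mu)\,\mathbf{i} + 2\cos(t + \mu)\,\mathbf{j} + \mathbf{k}\bigr).$$
The point on the curve corresponding to $t = 0$ is $\mathbf{r}(0) = 2\cos\mu\,\mathbf{i} - 2\sin\mu\,\mathbf{j}$, and at this point:
$$\mathbf{T}(0) = -\frac{1}{\sqrt{5}}(\sin\mu\,\mathbf{i} + \cos\mu\,\mathbf{j} - 2\,\mathbf{k}),\quad \mathbf{N}(0) = -\cos\mu\,\mathbf{i} + \sin\mu\,\mathbf{j},\quad \mathbf{B}(0) = -\frac{1}{\sqrt{5}}(2\sin\mu\,\mathbf{i} + 2\cos\mu\,\mathbf{j} + \mathbf{k}).$$
The curvature $\kappa = 1/10$ is independent of $t$, and so $\kappa$ is the same for all points on the curve, as is the radius of curvature $\rho = 1/\kappa = 10$.

Thus far we have defined the triad of unit vectors $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ which serve as a moving set of reference vectors along the curve $\Gamma$. We have also calculated the derivative $d\mathbf{T}/ds$, and to complete our examination of these vectors it only remains for us to find $d\mathbf{B}/ds$ and $d\mathbf{N}/ds$.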
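For our starting point we take Eqn (11-12), which we differentiate with respect to $s$, using Theorem 11-3 (d), to obtain
$$\frac{d\mathbf{B}}{ds} = \frac{d\mathbf{T}}{ds}\times\mathbf{N} + \mathbf{T}\times\frac{d\mathbf{N}}{ds}$$
which, on account of Eqn (11-10) and the fact that $\mathbf{N}\times\mathbf{N} = \mathbf{0}$, reduces to
$$\frac{d\mathbf{B}}{ds} = \mathbf{T}\times\frac{d\mathbf{N}}{ds}.$$
Next, forming the vector product of this equation with $\mathbf{N}$ and expanding the resulting triple vector product on the right-hand side gives
$$\mathbf{N}\times\frac{d\mathbf{B}}{ds} = \left(\mathbf{N}\cdot\frac{d\mathbf{N}}{ds}\right)\mathbf{T} - (\mathbf{N}\cdot\mathbf{T})\frac{d\mathbf{N}}{ds}.$$
However, as $\mathbf{N}$ is a unit vector it follows, as in the derivation of Eqn (11-10), that $\mathbf{N}\cdot(d\mathbf{N}/ds) = 0$, whilst the orthogonality of $\mathbf{N}$ and $\mathbf{T}$ implies that $\mathbf{N}\cdot\mathbf{T} = 0$. Thus,
$$\mathbf{N}\times\frac{d\mathbf{B}}{ds} = \mathbf{0},$$
and hence the vectors $\mathbf{N}$ and $d\mathbf{B}/ds$ must be parallel, differing only by a scalar factor. This scalar factor is in general a function of $s$. It is conventionally written $-\tau$, where $\tau$ is called the torsion of the curve $\Gamma$, so we can write
$$\frac{d\mathbf{B}}{ds} = -\tau\mathbf{N}. \tag{11-13}$$
If required, the torsion $\tau$ may be calculated by using the obvious result
$$\tau = -\mathbf{N}\cdot\frac{d\mathbf{B}}{ds}. \tag{11-14}$$
See Problems 11-16 to 11-18 for an alternative treatment of the calculation of $\rho$ and $\tau$.

The manner of construction of $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ is such that they form a right-handed set in this order and, consequently,
$$\mathbf{B} = \mathbf{T}\times\mathbf{N},\quad \mathbf{T} = \mathbf{N}\times\mathbf{B},\quad \mathbf{N} = \mathbf{B}\times\mathbf{T}. \tag{11-15}$$
This relationship is indicated in Fig. 11-3 for a point $P$ on the curve $\Gamma$. To find $d\mathbf{N}/ds$ we differentiate the last result of Eqn (11-15) with respect to $s$, and use Eqns (11-10), (11-13) together with the other results of Eqn (11-15) to obtain
$$\frac{d\mathbf{N}}{ds} = \frac{d\mathbf{B}}{ds}\times\mathbf{T} + \mathbf{B}\times\frac{d\mathbf{T}}{ds} = -\tau\,\mathbf{N}\times\mathbf{T} + \kappa\,\mathbf{B}\times\mathbf{N},$$
whence
$$\frac{d\mathbf{N}}{ds} = \tau\mathbf{B} - \kappa\mathbf{T}. \tag{11-16}$$

Fig. 11-3 Moving triad of reference vectors.

The study of the geometrical properties of space curves using the calculus techniques is called the differential geometry of curves, and it has as its basis the three equations
$$\frac{d\mathbf{T}}{ds} = \kappa\mathbf{N},\quad \frac{d\mathbf{B}}{ds} = -\tau\mathbf{N},\quad \frac{d\mathbf{N}}{ds} = \tau\mathbf{B} - \kappa\mathbf{T}, \tag{11-17}$$
which are called the Serret-Frenet equations.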
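The definitions above translate directly into a numerical procedure. The following is a minimal sketch, not part of the original text, that estimates $\mathbf{T}$, $\mathbf{N}$, $\mathbf{B}$, $\kappa$ and $\tau$ for the helix of Example 11-3 by central differences, using $d/ds = (d/dt)/(ds/dt)$. The value chosen for $\mu$, the step size `h`, and all function names are assumptions made purely for illustration.

```python
import math

MU = 0.0  # assumed value of the constant mu in Example 11-3


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def scale(c, a):
    return tuple(c * x for x in a)


def velocity(t):
    """dr/dt for r = 2cos(t+mu) i - 2sin(t+mu) j + 4t k."""
    return (-2.0 * math.sin(t + MU), -2.0 * math.cos(t + MU), 4.0)


def d_ds(f, t, h=1e-4):
    """Central-difference estimate of df/ds = (df/dt) / (ds/dt)."""
    speed = norm(velocity(t))  # ds/dt = |dr/dt|
    diff = tuple(p - m for p, m in zip(f(t + h), f(t - h)))
    return scale(1.0 / (2.0 * h * speed), diff)


def tangent(t):
    """Unit tangent T = (dr/dt) / |dr/dt|."""
    v = velocity(t)
    return scale(1.0 / norm(v), v)


def normal(t):
    """Principal unit normal N, the unit vector along dT/ds."""
    dT = d_ds(tangent, t)
    return scale(1.0 / norm(dT), dT)


def binormal(t):
    """Unit binormal B = T x N, Eqn (11-12)."""
    return cross(tangent(t), normal(t))


def curvature(t):
    """kappa = |dT/ds|, Eqn (11-11)."""
    return norm(d_ds(tangent, t))


def torsion(t):
    """tau = -N . dB/ds, Eqn (11-14)."""
    return -dot(normal(t), d_ds(binormal, t))


if __name__ == "__main__":
    t = 0.3
    print("curvature:", curvature(t))
    print("torsion:  ", torsion(t))
```

The printed curvature can be compared with the value $\kappa = 1/10$ found in Example 11-3. The same functions may be used to confirm numerically that $d\mathbf{N}/ds$ agrees with $\tau\mathbf{B} - \kappa\mathbf{T}$ at any chosen $t$.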
Naturally, similar ideas lead to the differential geometry of surfaces, though we shall make no further use of such ideas in this first account of the subject.

Example 11-4 Find the torsion of the circular helix of Example 11-3.

Solution In the previous example it was shown that $ds/dt = 2\sqrt{5}$ and
$$\mathbf{N} = -\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j},\quad \mathbf{B} = -\frac{1}{\sqrt{5}}\bigl(2\sin(t + \mu)\,\mathbf{i} + 2\cos(t + \mu)\,\mathbf{j} + \mathbf{k}\bigr).$$
Hence,
$$\frac{d\mathbf{B}}{ds} = \left(\frac{d\mathbf{B}}{dt}\right)\bigg/\left(\frac{ds}{dt}\right) = -\frac{1}{5}\bigl(\cos(t + \mu)\,\mathbf{i} - \sin(t + \mu)\,\mathbf{j}\bigr).$$
An application of Eqn (11-14) gives
$$\tau = \frac{1}{5}\bigl[-\cos(t + \mu)\,\mathbf{i} + \sin(t + \mu)\,\mathbf{j}\bigr]\cdot\bigl[\cos(t + \mu)\,\mathbf{i} - \sin(t + \mu)\,\mathbf{j}\bigr] = -\frac{1}{5}.$$
That the torsion is constant might have been anticipated, for the circular helix in question is similar to a screw thread with a constant pitch, and consequently its curvature and twist properties must be the same at all points.

11-2 Antiderivatives and integrals of vector functions

The notion of an antiderivative, already encountered in Chapter 8, extends naturally to a vector function of a real variable.

Definition 11-3 (antiderivative of a vector function) The vector function $\mathbf{F}(t)$ of the real variable $t$ will be said to be the antiderivative of the vector function $\mathbf{f}(t)$ if
$$\frac{d\mathbf{F}}{dt} = \mathbf{f}(t).$$
Naturally, an antiderivative $\mathbf{F}(t)$ is indeterminate so far as an additive arbitrary constant vector $\mathbf{C}$ is concerned, because by Theorem 11-3 (e), $d\mathbf{C}/dt = \mathbf{0}$. Continuing the convention adopted in Chapter 8, the operation of antidifferentiation with respect to a vector function of the single real variable $t$ will be denoted by $\int$, so that
$$\int\mathbf{f}(t)\,dt = \mathbf{F}(t) + \mathbf{C}, \tag{11-18}$$
where $\mathbf{C}$ is an arbitrary constant vector. It is obvious that Eqn (11-18), when taken in conjunction with Theorem 11-2, implies the following result.
Theorem 11-4 (antiderivative of vector function) If $\int\mathbf{f}(t)\,dt = \mathbf{F}(t) + \mathbf{C}$, where $\mathbf{f}(t) = f_1(t)\mathbf{i} + f_2(t)\mathbf{j} + f_3(t)\mathbf{k}$, $\mathbf{F}(t) = F_1(t)\mathbf{i} + F_2(t)\mathbf{j} + F_3(t)\mathbf{k}$ and $\mathbf{C} = C_1\mathbf{i} + C_2\mathbf{j} + C_3\mathbf{k}$ is an arbitrary constant vector, then
$$\int f_i(t)\,dt = F_i(t) + C_i,\quad i = 1, 2, 3,\quad\text{with}\quad \frac{dF_i}{dt} = f_i(t).$$
Expressed in words, the antiderivative of $\mathbf{f}(t)$ has components equal to the antiderivatives of the components of $\mathbf{f}(t)$. As with the scalar case, in many books the entire right-hand side of Eqn (11-18) is loosely referred to as the indefinite integral of the vector function $\mathbf{f}(t)$, rather than as here using this term to refer only to its first member.

Example 11-5 Find the antiderivative of $\mathbf{f}(t)$ given that
$$\mathbf{f}(t) = \cos t\,\mathbf{i} + (1 + t^2)\,\mathbf{j} + e^{-t}\,\mathbf{k}.$$
Solution It follows immediately from Theorem 11-4 that
$$\int\mathbf{f}(t)\,dt = \mathbf{i}\int\cos t\,dt + \mathbf{j}\int(1 + t^2)\,dt + \mathbf{k}\int e^{-t}\,dt = \sin t\,\mathbf{i} + \left(t + \frac{t^3}{3}\right)\mathbf{j} - e^{-t}\,\mathbf{k} + \mathbf{C}.$$

The obvious modification to Theorem 11-4 to enable us to work with definite integrals of vector functions of a single real variable comprises the next theorem. Because it is strictly analogous to the scalar case it is offered without proof.

Theorem 11-5 (definite integral of vector function) If $\mathbf{F}(t)$ is an antiderivative of $\mathbf{f}(t)$, then
$$\int_a^b\mathbf{f}(t)\,dt = \mathbf{F}(b) - \mathbf{F}(a).$$

Example 11-6 Evaluate the definite integral
$$\int_0^{\pi/4}(t^2\,\mathbf{i} + \sec^2 t\,\mathbf{j} + \mathbf{k})\,dt.$$
Solution From Theorem 11-5 we have the result
$$\int_0^{\pi/4}(t^2\,\mathbf{i} + \sec^2 t\,\mathbf{j} + \mathbf{k})\,dt = \left[\frac{t^3}{3}\,\mathbf{i} + \tan t\,\mathbf{j} + t\,\mathbf{k}\right]_0^{\pi/4} = \frac{\pi^3}{192}\,\mathbf{i} + \mathbf{j} + \frac{\pi}{4}\,\mathbf{k}.$$
A slightly more interesting application of a definite integral is provided by the following example concerning the motion of a particle in space.

Example 11-7 A point moving in space has acceleration $\sin 2t\,\mathbf{i} - \cos 2t\,\mathbf{k}$. Find the equation of its path if it passes through the point with position vector $\mathbf{r}_0 = \mathbf{j} + 2\mathbf{k}$ with velocity $2\mathbf{j}$ at time $t = 0$.

Solution If $\mathbf{r}$ is the general position vector of the point at time $t$, then the velocity $\mathbf{v}(t) = d\mathbf{r}/dt$ and the acceleration $\mathbf{a}(t) = d^2\mathbf{r}/dt^2$.
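Hence
$$\frac{d^2\mathbf{r}}{dt^2} = \sin 2t\,\mathbf{i} - \cos 2t\,\mathbf{k},$$
so that integrating the acceleration equation from $0$ to $t$ and replacing $t$ in the integrand by the dummy variable $\tau$ gives
$$\int_0^t\left(\frac{d^2\mathbf{r}}{d\tau^2}\right)d\tau = \int_0^t(\sin 2\tau\,\mathbf{i} - \cos 2\tau\,\mathbf{k})\,d\tau.$$
Hence
$$\left[\frac{d\mathbf{r}}{d\tau}\right]_0^t = -\tfrac{1}{2}\bigl[\cos 2\tau\,\mathbf{i} + \sin 2\tau\,\mathbf{k}\bigr]_0^t$$
and so
$$\mathbf{v}(t) = \mathbf{v}_0 + \tfrac{1}{2}(1 - \cos 2t)\,\mathbf{i} - \tfrac{1}{2}\sin 2t\,\mathbf{k}.$$
Now from the initial conditions of the problem $\mathbf{v}_0 = 2\mathbf{j}$, so that the velocity equation becomes
$$\mathbf{v}(t) = \tfrac{1}{2}(1 - \cos 2t)\,\mathbf{i} + 2\,\mathbf{j} - \tfrac{1}{2}\sin 2t\,\mathbf{k}.$$
To find the equation of the path a further integration is required so, setting $\mathbf{v}(t) = d\mathbf{r}/dt$, integrating the velocity equation from $0$ to $t$ gives
$$\int_0^t\left(\frac{d\mathbf{r}}{d\tau}\right)d\tau = \int_0^t\bigl(\tfrac{1}{2}(1 - \cos 2\tau)\,\mathbf{i} + 2\,\mathbf{j} - \tfrac{1}{2}\sin 2\tau\,\mathbf{k}\bigr)\,d\tau.$$
Hence
$$\bigl[\mathbf{r}(\tau)\bigr]_0^t = \bigl[\tfrac{1}{2}(\tau - \tfrac{1}{2}\sin 2\tau)\,\mathbf{i} + 2\tau\,\mathbf{j} + \tfrac{1}{4}\cos 2\tau\,\mathbf{k}\bigr]_0^t$$
and so
$$\mathbf{r}(t) = \mathbf{r}_0 + \tfrac{1}{2}(t - \tfrac{1}{2}\sin 2t)\,\mathbf{i} + 2t\,\mathbf{j} + \tfrac{1}{4}(\cos 2t - 1)\,\mathbf{k}.$$
Again appealing to the initial conditions of the problem we find that $\mathbf{r}_0 = \mathbf{j} + 2\mathbf{k}$, so that, finally, the particle path must be
$$\mathbf{r}(t) = \tfrac{1}{2}(t - \tfrac{1}{2}\sin 2t)\,\mathbf{i} + (1 + 2t)\,\mathbf{j} + \tfrac{1}{4}(7 + \cos 2t)\,\mathbf{k}.$$

The form of definite integral of a vector function so far considered is itself a vector. We now discuss one final generalization of the notion of a definite integral involving a vector function that generates a scalar.

Let a curve $\Gamma$ defined parametrically in terms of the arc length $s$ have the general position vector $\mathbf{r} = \mathbf{r}(s)$ and unit tangent vector $\mathbf{T}(s)$, and let $\mathbf{F}(s)$ be a vector function of $s$. Then at any point of $\Gamma$ the scalar function $\phi(s) = \mathbf{F}(s)\cdot\mathbf{T}(s)$ represents the component of $\mathbf{F}(s)$ tangential to $\Gamma$. If the scalar function $\phi(s)$ is then integrated from $s = a$ to $s = b$, this is obviously equivalent to integrating the tangential component of $\mathbf{F}(s)$ along $\Gamma$ from the point $\mathbf{r} = \mathbf{r}(a)$ to the point $\mathbf{r} = \mathbf{r}(b)$. An integral of this form is therefore called either a line integral or a curvilinear integral of the vector function $\mathbf{F}(s)$ taken along the curve $\Gamma$, which is sometimes referred to as the path of integration.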
Definition 11-4 (line integral of vector function) The line integral of the vector function $\mathbf{F}(s)$ taken along the curve $\Gamma$ between the points $A$ and $B$ with position vectors $\mathbf{r} = \mathbf{r}(a)$ and $\mathbf{r} = \mathbf{r}(b)$, respectively, is the quantity
$$J = \int_a^b\phi(s)\,ds = \int_a^b\mathbf{F}\cdot\mathbf{T}\,ds,$$
where $\phi(s) = \mathbf{F}(s)\cdot\mathbf{T}(s)$, $s$ denotes arc length along $\Gamma$, and $\mathbf{T}(s)$ is the unit tangent vector to $\Gamma$.

In terms of the general position vector $\mathbf{r}$ of a point on the curve and the fact that $s$ is the arc length along $\Gamma$, we obviously have the relationship $d\mathbf{r} = \mathbf{T}\,ds$, so that the line integral may also be written
$$J = \int_A^B\mathbf{F}\cdot d\mathbf{r}$$
or, more simply still if $\Gamma$ denotes part of a curve, as
$$J = \int_\Gamma\mathbf{F}\cdot d\mathbf{r}.$$
In component form, setting the differential $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$ and $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$, we have at once
$$\int_\Gamma\mathbf{F}\cdot d\mathbf{r} = \int_\Gamma F_1\,dx + F_2\,dy + F_3\,dz. \tag{11-19}$$
If desired, the line integral (11-19) may be defined vectorially in terms of the limit of a sum in a manner strictly analogous to the definition of an ordinary definite integral. To achieve this, let the interval $a \le s \le b$ be divided into $n$ sub-intervals $s_{i-1} \le s \le s_i$, with $i = 1, 2, \ldots, n$, where $s_0 = a$ and $s_n = b$. Then setting $d\mathbf{r}_i = \mathbf{r}(s_i) - \mathbf{r}(s_{i-1})$ as in Fig. 11-4, the line integral (11-19) may be approximated by the sum
$$J_n = \sum_{i=1}^n\mathbf{F}(s_i)\cdot d\mathbf{r}_i. \tag{11-20}$$

Fig. 11-4 Line integral of $\mathbf{F}$ along $\Gamma$.

If the number of sub-divisions $n$ is now allowed to tend to infinity in such a manner that the lengths of all the sub-divisions tend to zero then, as with an ordinary definite integral, we arrive at the result
$$\int_\Gamma\mathbf{F}\cdot d\mathbf{r} = \lim_{n\to\infty}\sum_{i=1}^n\mathbf{F}(s_i)\cdot d\mathbf{r}_i. \tag{11-21}$$
When used in this context, the differential $d\mathbf{r}_i$ is usually called a line element of the curve $\Gamma$ joining $A$ to $B$.

Example 11-8 Evaluate the line integral
$$\int_\Gamma\mathbf{F}\cdot d\mathbf{r},$$
given that $\mathbf{F} = yz\,\mathbf{i} + 2xz\,\mathbf{j} + xy\,\mathbf{k}$ and $\Gamma$ is that part of the circular helix $x = a\cos t$, $y = a\sin t$, $z = kt$ that corresponds to the interval $0 \le t \le 2\pi$.

Solution First we use Eqn (11-19) to write the line integral as
$$\int_\Gamma\mathbf{F}\cdot d\mathbf{r} = \int_\Gamma yz\,dx + 2xz\,dy + xy\,dz.$$
Now along the path $\Gamma$ we have the relationships $x = a\cos t$, $y = a\sin t$, $z = kt$, which imply the differential relationships $dx = -a\sin t\,dt$, $dy = a\cos t\,dt$, $dz = k\,dt$. Hence
$$\int_\Gamma\mathbf{F}\cdot d\mathbf{r} = \int_0^{2\pi}\bigl(-a^2kt\sin^2 t + 2a^2kt\cos^2 t + a^2k\sin t\cos t\bigr)\,dt = a^2k\left[\frac{t^2}{4} + \frac{3}{4}t\sin 2t + \frac{1}{8}\cos 2t\right]_0^{2\pi} = a^2\pi^2k.$$

11-3 Some applications

Kinematics, an important branch of mechanics, is essentially concerned with the geometrical aspect of the motion of particles along curves. Of particular importance is that class of motions that occur entirely in one plane, and so are called planar motions. In many of these situations, for example particle motion in an orbit, the position of a particle is best defined in terms of the polar coordinates $(r, \theta)$ in the plane of the motion. Let us then determine expressions for the velocity and acceleration of a particle in terms of polar coordinates.

Fig. 11-5 Planar motion of particle in terms of polar coordinates.

We first appeal to Fig. 11-5, which represents a particle $P$ moving in the indicated direction along the curve $\Gamma$. The unit vectors $\mathbf{R}$, $\boldsymbol{\Theta}$ are normal to each other and are such that $\mathbf{R}$ is directed from $O$ to $P$ along the radius vector $OP$, and $\boldsymbol{\Theta}$ points in the direction of increasing $\theta$. Then clearly $\mathbf{R}$ and $\boldsymbol{\Theta}$ are vector functions of the single variable $\theta$, with
$$\mathbf{R} = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}\quad\text{and}\quad\boldsymbol{\Theta} = -\sin\theta\,\mathbf{i} + \cos\theta\,\mathbf{j}.$$
It follows from these relationships that
$$\frac{d\mathbf{R}}{d\theta} = \boldsymbol{\Theta} \tag{11-22}$$
and
$$\frac{d\boldsymbol{\Theta}}{d\theta} = -\mathbf{R}. \tag{11-23}$$
In terms of the unit vectors $\mathbf{R}$, $\boldsymbol{\Theta}$ the point $P$ has the position vector
$$\mathbf{r} = r\mathbf{R}, \tag{11-24}$$
so that the velocity $d\mathbf{r}/dt$ must be
$$\frac{d\mathbf{r}}{dt} = \frac{dr}{dt}\mathbf{R} + r\frac{d\mathbf{R}}{dt} = \frac{dr}{dt}\mathbf{R} + r\frac{d\mathbf{R}}{d\theta}\frac{d\theta}{dt},$$
showing that the velocity vector of $P$ is
$$\dot{\mathbf{r}} = \dot{r}\mathbf{R} + r\dot\theta\,\boldsymbol{\Theta}, \tag{11-25}$$
where differentiation with respect to time has been denoted by a dot. Here the quantity $\dot{r}$ is called the radial component of velocity and $r\dot\theta$ is called the transverse component of velocity.
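A further differentiation with respect to time yields for the acceleration vector $\ddot{\mathbf{r}} = d^2\mathbf{r}/dt^2$ the expression
$$\ddot{\mathbf{r}} = \ddot{r}\mathbf{R} + \dot{r}\dot{\mathbf{R}} + \dot{r}\dot\theta\,\boldsymbol{\Theta} + r\ddot\theta\,\boldsymbol{\Theta} + r\dot\theta\,\dot{\boldsymbol{\Theta}}$$
or
$$\ddot{\mathbf{r}} = \ddot{r}\mathbf{R} + \dot{r}\dot\theta\frac{d\mathbf{R}}{d\theta} + (\dot{r}\dot\theta + r\ddot\theta)\boldsymbol{\Theta} + r\dot\theta^2\frac{d\boldsymbol{\Theta}}{d\theta}.$$
Hence by Eqns (11-22) and (11-23) this is seen to be equivalent to
$$\ddot{\mathbf{r}} = (\ddot{r} - r\dot\theta^2)\mathbf{R} + (2\dot{r}\dot\theta + r\ddot\theta)\boldsymbol{\Theta}. \tag{11-26}$$
The quantity $\ddot{r} - r\dot\theta^2$ is called the radial component of acceleration, and $2\dot{r}\dot\theta + r\ddot\theta$ is called the transverse component of acceleration.

Example 11-9 A particle is constrained to move with constant speed $v$ along the cardioid $r = a(1 + \cos\theta)$. Prove that
$$v = 2a\dot\theta\cos\left(\frac{\theta}{2}\right),$$
and show that the radial component of the acceleration is constant.

Solution From Eqn (11-25) and the expression $r = a(1 + \cos\theta)$, it follows that the velocity vector $\dot{\mathbf{r}}$ is given by
$$\dot{\mathbf{r}} = -a\sin\theta\,\dot\theta\,\mathbf{R} + a(1 + \cos\theta)\dot\theta\,\boldsymbol{\Theta}.$$
Now as $v^2 = \dot{\mathbf{r}}\cdot\dot{\mathbf{r}}$, we have
$$v^2 = a^2\dot\theta^2\sin^2\theta + a^2\dot\theta^2(1 + \cos\theta)^2 = 2a^2\dot\theta^2(1 + \cos\theta).$$
Using the identity $1 + \cos\theta = 2\cos^2(\theta/2)$ in this expression and taking the square root yields the required result $v = 2a\dot\theta\cos(\theta/2)$.

To complete the problem we now make appeal to the fact that the radial acceleration component is $\ddot{r} - r\dot\theta^2$, whilst by supposition $v = $ constant. From our previous working we know that $v^2 = 2a^2\dot\theta^2(1 + \cos\theta)$, so that differentiating with respect to $t$ and cancelling $\dot\theta$ gives
$$\ddot\theta = \frac{\dot\theta^2\sin\theta}{2(1 + \cos\theta)}$$
or, as $\dot\theta^2 = v^2/\bigl(2a^2(1 + \cos\theta)\bigr)$,
$$\ddot\theta = \frac{v^2\sin\theta}{4a^2(1 + \cos\theta)^2}.$$
Hence as $\ddot{r} = -a(\cos\theta\,\dot\theta^2 + \sin\theta\,\ddot\theta)$, substituting for $r$, $\dot\theta^2$, and $\ddot\theta$ in the radial component of acceleration we find, as required, that
$$\ddot{r} - r\dot\theta^2 = -\frac{3v^2}{4a} = \text{constant}.$$

A vector treatment of particle dynamics follows quite naturally from the ideas presented so far. Thus a particle of variable mass $m$ moving with velocity $\mathbf{v}$ has, by definition, the linear momentum $\mathbf{M}$, where $\mathbf{M} = m\mathbf{v}$.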
Now by Newton's second law of motion we know that, with a suitable choice of units, we may equate the force $\mathbf{F}$ to the rate of change of momentum, so it follows that we may write
$$\mathbf{F} = \frac{d\mathbf{M}}{dt}.$$
However,
$$\frac{d\mathbf{M}}{dt} = \frac{dm}{dt}\mathbf{v} + m\frac{d\mathbf{v}}{dt}$$
and hence
$$\mathbf{F} = \frac{dm}{dt}\mathbf{v} + m\frac{d\mathbf{v}}{dt}. \tag{11-27}$$
In the case of a particle of constant mass $m$, we have $dm/dt = 0$, reducing Eqn (11-27) to the familiar equation of motion
$$\mathbf{F} = m\mathbf{a}, \tag{11-28}$$
where $\mathbf{a} = d\mathbf{v}/dt$ is the acceleration.

Similarly, the angular momentum of a particle of fixed mass $m$ about the origin is defined by the relation
$$\boldsymbol{\Omega} = \mathbf{r}\times m\mathbf{v},$$
where $\mathbf{r}$ is the position vector of the particle relative to the origin and $\mathbf{v} = d\mathbf{r}/dt$ is its velocity. Then the rate of change of angular momentum about the origin is
$$\frac{d\boldsymbol{\Omega}}{dt} = m\mathbf{v}\times\mathbf{v} + m\mathbf{r}\times\frac{d\mathbf{v}}{dt} = \mathbf{r}\times\mathbf{F}, \tag{11-29}$$
by virtue of Eqn (11-28) and the fact that $\mathbf{v}\times\mathbf{v} = \mathbf{0}$. This is the vector form of the principle of angular momentum, which asserts that the rate of change of angular momentum about the origin is equal to the moment about the origin of the force acting on the particle.

The line integral
$$J = \int_\Gamma\mathbf{F}\cdot d\mathbf{r}$$
also occurs naturally in many contexts, perhaps the simplest of which is in connection with the work done by a force. If $\mathbf{F}$ is identified with a force, and $d\mathbf{r}$ is a displacement along some specific curve $\Gamma$ joining points $A$ and $B$, then $J$ represents the work done by the varying force $\mathbf{F}$ as it moves its point of application along the curve $\Gamma$ from $A$ to $B$ (cf. Fig. 11-4). In the special case that $\mathbf{F}$ is a constant force and $\Gamma$ is a straight line segment with end points at $s = a$ and $s = b$ this simplifies to an already familiar result. Suppose that $\mathbf{F} = F\boldsymbol{\alpha}$ and $d\mathbf{r} = ds\,\boldsymbol{\beta}$, where $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$ are constant unit vectors inclined at an angle $\theta$, then
$$J = \int_a^b\mathbf{F}\cdot d\mathbf{r} = \int_a^b F(\boldsymbol{\alpha}\cdot\boldsymbol{\beta})\,ds = F(b - a)\cos\theta.$$
Thus, as would be expected in these circumstances, the work done by $\mathbf{F}$ is the product of the component $F\cos\theta$ of the force $\mathbf{F}$ along the line of motion and the total displacement $(b - a)$.
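The approximating sum of Eqn (11-20) is easy to evaluate numerically, and the constant-force case just described provides a convenient check. The sketch below is illustrative only: the function names, the force magnitude, the angle and the end points are all assumed values chosen for the demonstration, not quantities taken from the text.

```python
import math


def line_integral(F, r, a, b, n=1000):
    """Approximate the line integral by the sum of F(r(s_i)) . dr_i, Eqn (11-20).

    F maps a point (x, y, z) to a vector; r maps the parameter s to a point.
    """
    total = 0.0
    prev = r(a)
    for i in range(1, n + 1):
        s_i = a + (b - a) * i / n
        cur = r(s_i)
        dr = tuple(c - p for c, p in zip(cur, prev))  # line element dr_i
        total += sum(f * d for f, d in zip(F(cur), dr))
        prev = cur
    return total


# Assumed illustrative data: a force of magnitude 3 along alpha = i,
# acting along a straight path in the direction beta at 60 degrees to alpha.
THETA = math.pi / 3
FORCE = 3.0
ALPHA = (1.0, 0.0, 0.0)
BETA = (math.cos(THETA), math.sin(THETA), 0.0)


def constant_force(_point):
    return tuple(FORCE * c for c in ALPHA)


def straight_path(s):
    return tuple(s * c for c in BETA)


work = line_integral(constant_force, straight_path, 1.0, 5.0)
print(work, FORCE * (5.0 - 1.0) * math.cos(THETA))
```

For a constant force on a straight segment the sum telescopes, so the two printed numbers should agree to rounding error whatever the value of `n`. For a varying force or a curved path the sum only approaches the integral as `n` grows.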
The line integral also occurs in fields other than particle dynamics. In fluid mechanics, for example, if $\mathbf{F}$ is identified with the fluid velocity $\mathbf{v}$ and $\Gamma$ is some closed curve drawn in the fluid, then the scalar quantity $\gamma$ defined by the line integral
$$\gamma = \int_\Gamma\mathbf{v}\cdot d\mathbf{r}$$
is called the circulation around the curve $\Gamma$. In more advanced works it is shown that $\gamma$ provides a measure of the degree of rotational motion present in a fluid. For a special class of fluid flows known as potential flows the circulation is everywhere zero, irrespective of the choice of $\Gamma$. These flows are said to be irrotational and are of fundamental importance. Line integrals around closed curves are generally denoted by the symbol $\oint$ with the convention that the path of integration is taken anti-clockwise, so that for the circulation $\gamma$ we would write
$$\gamma = \oint_\Gamma\mathbf{v}\cdot d\mathbf{r}.$$
A reversal of the direction of integration around $\Gamma$ would change the sign of $\gamma$.

An exactly similar application of the line integral occurs in electromagnetic theory, where the electromotive force (e.m.f.) between the ends $A$ and $B$ of a wire coinciding with a curve $\Gamma$ is related to the electric field vector $\mathbf{E}$ by the line integral
$$\text{e.m.f.} = \int_A^B\mathbf{E}\cdot d\mathbf{r}.$$

Example 11-10 Find the work done by a force $\mathbf{F} = yz\,\mathbf{i} + x\,\mathbf{j} + xz\,\mathbf{k}$ in moving its point of application along the curve $\Gamma$ defined by $x = t$, $y = t^2$, $z = t^3$ from the point with parameter $t = 1$ to the point with parameter $t = 2$.

Solution
$$\text{Work done} = \int_\Gamma\mathbf{F}\cdot d\mathbf{r} = \int_\Gamma(yz\,\mathbf{i} + x\,\mathbf{j} + xz\,\mathbf{k})\cdot(dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \int_\Gamma yz\,dx + x\,dy + xz\,dz.$$
Now as $x = t$, $y = t^2$, $z = t^3$, it follows that $dx = dt$, $dy = 2t\,dt$, $dz = 3t^2\,dt$, so that $yz\,dx = t^5\,dt$, $x\,dy = 2t^2\,dt$ and $xz\,dz = 3t^6\,dt$. Substituting in the above expression, we find
$$\text{Work done} = \int_1^2(t^5 + 2t^2 + 3t^6)\,dt = \left[\frac{t^6}{6} + \frac{2t^3}{3} + \frac{3t^7}{7}\right]_1^2 = \frac{2923}{42}\ \text{units}.$$

Example 11-11 If the fluid velocity $\mathbf{v} = x^2y\,\mathbf{i}$, determine the circulation $\gamma$ of $\mathbf{v}$ around the contour $\Gamma$ comprising the boundary of the rectangle $x = \pm a$, $y = \pm b$.
[Fig. 11-6: the rectangular contour $\Gamma$ in the $(x, y)$-plane with corners $P$, $Q$, $R$, $S$ and sides $x = \pm a$, $y = \pm b$.]

Solution By definition, the circulation $\gamma$ is
$$\gamma = \oint_\Gamma\mathbf{v}\cdot d\mathbf{r} = \oint_\Gamma x^2y\,\mathbf{i}\cdot(dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}) = \oint_\Gamma x^2y\,dx,$$
where the direction of integration is anti-clockwise around $\Gamma$. Now the line integral around $\Gamma$ may be represented as the sum of four integrals as follows,
$$\gamma = \int_P^Q x^2y\,dx + \int_Q^R x^2y\,dx + \int_R^S x^2y\,dx + \int_S^P x^2y\,dx,$$
where the limits refer to the corners of the rectangle in Fig. 11-6. The first and third integrals vanish since $x$ is constant along $PQ$ and $RS$, with the consequence that $dx = 0$. Along $QR$, $y = b$ and along $SP$, $y = -b$, so that
$$\gamma = \int_a^{-a}bx^2\,dx + \int_{-a}^{a}(-b)x^2\,dx = -\frac{4a^3b}{3}.$$

11-4 Fields, gradient, and directional derivative

The scalar function
$$\phi = \sqrt{1 - x^2} + \sqrt{1 - y^2} + \sqrt{1 - z^2}$$
is defined within and on the cube shaped domain $|x| \le 1$, $|y| \le 1$, $|z| \le 1$ and assigns a specific number $\phi$ to every point within that region. In the language of vector analysis, $\phi$ is said to define a scalar field throughout the cube. In general, any scalar function $\phi$ of position will define a scalar field within its domain of definition. A typical physical example of a scalar field is provided by the temperature at each point of a body.

Similarly, if $\mathbf{F}$ is a vector function of position, we say that $\mathbf{F}$ defines a vector field throughout its domain of definition in the sense that it assigns a specific vector to each point. Thus the vector function $\mathbf{F} = \sin x\,\mathbf{i} + xy\,\mathbf{j} + ye^z\,\mathbf{k}$ defines a vector field throughout all space.

As heat flows in the direction of decreasing temperature, it follows that associated with the scalar temperature field within a body there must also be a vector field which assigns to each point a vector describing the direction and maximum rate of flow of heat. Other physical examples of vector fields are provided by the velocity field $\mathbf{v}$ throughout a fluid, and the magnetic field $\mathbf{H}$ throughout a region.
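To examine more closely the nature of a scalar field, and to see one way in which a special type of vector field arises, we must now define what is called the gradient of a scalar function. This is a vector differentiation operation that associates a vector field with every continuously differentiable scalar function.

Definition 11-5 (gradient of scalar function) If the scalar function $\phi(x, y, z)$ is a continuously differentiable function with respect to the independent variables $x$, $y$, and $z$ then the gradient of $\phi$, written $\operatorname{grad}\phi$, is defined to be the vector
$$\operatorname{grad}\phi = \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k}.$$

For the moment let it be understood that $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a specific point, and consider a displacement from it $d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} + dz\,\mathbf{k}$. Then it follows from the definition of $\operatorname{grad}\phi$ that
$$d\mathbf{r}\cdot\operatorname{grad}\phi = \frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz,$$
in which it is supposed that $\operatorname{grad}\phi$ is evaluated at $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$. Theorem 5-19 then asserts that the right-hand side of this expression is simply the total differential $d\phi$ of the scalar function $\phi$, so that we have the result
$$d\phi = d\mathbf{r}\cdot\operatorname{grad}\phi. \tag{11-27}$$
If we set $ds = |d\mathbf{r}|$, then $d\mathbf{r}/ds$ is the unit vector in the direction of $d\mathbf{r}$. Writing $\mathbf{a} = d\mathbf{r}/ds$, Eqn (11-27) is thus seen to be equivalent to
$$\frac{d\phi}{ds} = \mathbf{a}\cdot\operatorname{grad}\phi. \tag{11-28}$$
Because $\mathbf{a}\cdot\operatorname{grad}\phi$ is the projection of $\operatorname{grad}\phi$ along the unit vector $\mathbf{a}$, expression (11-28) is called the directional derivative of $\phi$ in the direction of $\mathbf{a}$. In other words, $\mathbf{a}\cdot\operatorname{grad}\phi$ is the rate of change of $\phi$ with respect to distance measured in the direction of $\mathbf{a}$. We have already utilized the notion of a directional derivative in connection with the derivation of the Cauchy-Riemann equations, though at that time neither the term nor vector notation was employed.

As the largest value of the projection $\mathbf{a}\cdot\operatorname{grad}\phi$ at a point occurs when $\mathbf{a}$ is taken in the same direction as $\operatorname{grad}\phi$, it follows that $\operatorname{grad}\phi$ points in the direction in which the directional derivative of $\phi$ attains its maximum value, that is, the direction in which $\phi$ increases most rapidly. In more advanced treatments of the gradient operator it is this last property that is used to define $\operatorname{grad}\phi$, since it is essentially independent of the coordinate system that is utilized. From this more general point of view our Definition 11-5 then becomes the interpretation of $\operatorname{grad}\phi$ in terms of rectangular Cartesian coordinates.

The vector differential operator $\nabla$, pronounced either 'del' or 'nabla', is defined in terms of rectangular Cartesian coordinates as
$$\nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}.$$
As the name implies, $\nabla$ is a vector differential operator, not a vector. It only generates a vector when it acts on a suitably differentiable scalar function. We have the obvious result that
$$\operatorname{grad}\phi \equiv \frac{\partial\phi}{\partial x}\mathbf{i} + \frac{\partial\phi}{\partial y}\mathbf{j} + \frac{\partial\phi}{\partial z}\mathbf{k} = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)\phi \equiv \nabla\phi. \tag{11-30}$$

Example 11-12 Determine $\operatorname{grad}\phi$ if $\phi = z^2\cos(xy - \tfrac{1}{4}\pi)$, and hence deduce its value at the point $(1, \tfrac{1}{2}\pi, 1)$.

Solution We have
$$\frac{\partial\phi}{\partial x} = -yz^2\sin(xy - \tfrac{1}{4}\pi),\quad \frac{\partial\phi}{\partial y} = -xz^2\sin(xy - \tfrac{1}{4}\pi)\quad\text{and}\quad\frac{\partial\phi}{\partial z} = 2z\cos(xy - \tfrac{1}{4}\pi).$$
Hence
$$\operatorname{grad}\phi = -yz^2\sin(xy - \tfrac{1}{4}\pi)\,\mathbf{i} - xz^2\sin(xy - \tfrac{1}{4}\pi)\,\mathbf{j} + 2z\cos(xy - \tfrac{1}{4}\pi)\,\mathbf{k}.$$
At the point $(1, \tfrac{1}{2}\pi, 1)$ we thus have
$$(\operatorname{grad}\phi)_{(1,\frac{1}{2}\pi,1)} = \frac{1}{\sqrt{2}}\bigl(-\tfrac{1}{2}\pi\,\mathbf{i} - \mathbf{j} + 2\,\mathbf{k}\bigr).$$

Example 11-13 If $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$, and $r = |\mathbf{r}|$, deduce the form taken by $\operatorname{grad}r^n$.

Solution As $r = (x^2 + y^2 + z^2)^{1/2}$, it follows from Eqn (11-30) and the chain rule that
$$\operatorname{grad}r^n = \left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)r^n = \left(\mathbf{i}\frac{\partial r}{\partial x}\frac{d}{dr} + \mathbf{j}\frac{\partial r}{\partial y}\frac{d}{dr} + \mathbf{k}\frac{\partial r}{\partial z}\frac{d}{dr}\right)r^n = nr^{n-1}\left(\frac{\partial r}{\partial x}\mathbf{i} + \frac{\partial r}{\partial y}\mathbf{j} + \frac{\partial r}{\partial z}\mathbf{k}\right).$$
However,
$$\frac{\partial r}{\partial x} = \frac{x}{r},\quad\frac{\partial r}{\partial y} = \frac{y}{r},\quad\frac{\partial r}{\partial z} = \frac{z}{r},$$
and so
$$\operatorname{grad}r^n = nr^{n-2}(x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = nr^{n-2}\,\mathbf{r}.$$

The following theorem is an immediate consequence of the definition of the gradient operator and of the operation of partial differentiation.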
Theorem 11-6 (properties of gradient operator) If $\phi$ and $\psi$ are two continuously differentiable scalar functions in some domain $D$, and $a$, $b$ are scalar constants, then
(a) $\operatorname{grad}a = \mathbf{0}$;
(b) $\operatorname{grad}(a\phi + b\psi) = a\operatorname{grad}\phi + b\operatorname{grad}\psi$;
(c) $\operatorname{grad}(\phi\psi) = \phi\operatorname{grad}\psi + \psi\operatorname{grad}\phi$.

The surfaces $\phi(x, y, z) = $ constant associated with a scalar function $\phi$ are called level surfaces of $\phi$. If we form the total differential of $\phi$ at a point on a specific level surface $\phi = $ constant then $d\phi = 0$ and, as in Eqn (5-23), we obtain the result
$$\frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy + \frac{\partial\phi}{\partial z}dz = 0.$$
This is equivalent to
$$d\mathbf{r}\cdot\operatorname{grad}\phi = 0, \tag{11-31}$$
where now $d\mathbf{r}$ is constrained to lie in the level surface. This vector condition shows that $\operatorname{grad}\phi$ must be normal to $d\mathbf{r}$, and as $d\mathbf{r}$ is constrained to be an arbitrary tangential vector to the level surface at the point in question, it follows that the vector $\operatorname{grad}\phi$ must be normal to the level surface. The unit normal $\mathbf{n}$ to the surface is thus $\mathbf{n} = \operatorname{grad}\phi/|\operatorname{grad}\phi|$. Notice that this normal is unique apart from its sign. This simple argument has proved the following general result.

Theorem 11-7 (normal to level surface) If $\phi$ is a continuously differentiable scalar function, the unit normal $\mathbf{n}$ to any point of the level surface $\phi = $ constant is determined by
$$\mathbf{n} = \frac{\operatorname{grad}\phi}{|\operatorname{grad}\phi|}.$$

Example 11-14 If $\phi = x^2 + 3xy^2 + yz^3 - 12$, find the unit normal $\mathbf{n}$ to the level surface $\phi = 3$ at the point $(1, 2, 1)$. Deduce the equation of the tangent plane to the level surface at this point.

Solution The level surface $\phi = 3$ is defined by the equation $\psi = 0$, where
$$\psi = x^2 + 3xy^2 + yz^3 - 15.$$
Hence
$$\operatorname{grad}\psi = (2x + 3y^2)\,\mathbf{i} + (6xy + z^3)\,\mathbf{j} + 3yz^2\,\mathbf{k}$$
which, at $(1, 2, 1)$, becomes
$$(\operatorname{grad}\psi)_{(1,2,1)} = 14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}.$$
As $\psi = 0$ is the desired level surface, it follows from Theorem 11-7 that the unit normal to this surface at the point $(1, 2, 1)$ must be
$$\mathbf{n} = \frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{14^2 + 13^2 + 6^2}} = \frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}.$$
Now the equation of a plane is $\mathbf{n}\cdot\mathbf{r} = p$, where $\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ is a general point on the plane, $\mathbf{n}$ is the unit normal to the plane, and $p$ is its perpendicular distance from the origin. The point $\mathbf{r}_0 = \mathbf{i} + 2\mathbf{j} + \mathbf{k}$ is a point on the plane so that $\mathbf{n}\cdot\mathbf{r} = \mathbf{n}\cdot\mathbf{r}_0\ (= p)$. Hence
$$\left(\frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}\right)\cdot(x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) = \left(\frac{14\mathbf{i} + 13\mathbf{j} + 6\mathbf{k}}{\sqrt{401}}\right)\cdot(\mathbf{i} + 2\mathbf{j} + \mathbf{k}),$$
showing that the required equation is $14x + 13y + 6z = 46$.

We have seen how the gradient operator associates a vector field $\operatorname{grad}\phi$ with every continuously differentiable scalar field $\phi$. Any vector field $\mathbf{F} = \operatorname{grad}\phi$ which is expressible as the gradient of a scalar field $\phi$ is called a conservative vector field, and $\phi$ is then referred to as the scalar potential associated with the vector field.

This has an important implication when line integrals involving conservative vector fields are considered. Let us suppose that $\mathbf{F} = \operatorname{grad}\phi$, and that
$$J = \int_A^B\mathbf{F}\cdot d\mathbf{r}. \tag{11-32}$$
Then
$$J = \int_A^B\operatorname{grad}\phi\cdot d\mathbf{r}, \tag{11-33}$$
and by virtue of Eqn (11-27) this can be written
$$J = \int_A^B d\phi = \phi(B) - \phi(A). \tag{11-34}$$
Hence when $\mathbf{F}$ belongs to a conservative vector field, results (11-32) and (11-34) show that the line integral $J$ of $\mathbf{F}$ depends only on the end points of the path of integration, and not on the path itself.

This fundamental result has far reaching consequences and forms the basis of many important developments, of which gravitational potential theory is but one. Suppose, for example, that $\mathbf{F}$ is identified with a conservative force field, then result (11-34) represents the change in the potential energy of a particle as it moves from point $A$ to point $B$.
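Path independence is easily observed numerically. In the sketch below, which is offered only as an illustration, the scalar potential $\phi = x^2y + z$, the two paths from $(0, 0, 0)$ to $(1, 1, 1)$, and all function names are assumptions made for the purpose of the demonstration. The line integral of $\operatorname{grad}\phi$ is approximated along each path by a sum of the kind used to define the line integral, with the field evaluated at the midpoint of each sub-interval.

```python
def phi(p):
    """Assumed scalar potential phi = x^2 y + z."""
    x, y, z = p
    return x * x * y + z


def grad_phi(p):
    """grad phi = 2xy i + x^2 j + k for the potential above."""
    x, y, z = p
    return (2.0 * x * y, x * x, 1.0)


def line_integral(F, r, a, b, n=2000):
    """Sum of F . dr over n sub-intervals, with F taken at each midpoint."""
    total = 0.0
    for i in range(n):
        t0 = a + (b - a) * i / n
        t1 = a + (b - a) * (i + 1) / n
        p0, p1 = r(t0), r(t1)
        mid = r(0.5 * (t0 + t1))
        total += sum(f * (q - p) for f, p, q in zip(F(mid), p0, p1))
    return total


def straight(t):
    """Straight line from (0, 0, 0) to (1, 1, 1)."""
    return (t, t, t)


def twisted(t):
    """A different path with the same end points."""
    return (t, t * t, t ** 3)


J_straight = line_integral(grad_phi, straight, 0.0, 1.0)
J_twisted = line_integral(grad_phi, twisted, 0.0, 1.0)
potential_difference = phi(straight(1.0)) - phi(straight(0.0))
print(J_straight, J_twisted, potential_difference)
```

Both sums should lie close to the potential difference $\phi(B) - \phi(A)$, in line with Eqn (11-34). Replacing `grad_phi` by a field that is not a gradient would in general give different values along the two paths.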
That $J$ depends only on the difference between $\phi(B)$ and $\phi(A)$ and not on the path joining $A$ to $B$ explains why, when using potential energy considerations in mechanics, no consideration need be given to the path that is followed.

11-5 An application to fluid mechanics

Let the velocity field $\mathbf{v}$ in a fluid as a function of position $(x, y, z)$ and the time $t$ be denoted by
$$\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}, \tag{11-35}$$
where $v_i = v_i(x, y, z, t)$ for $i = 1, 2, 3$. Clearly, if at any fixed time $t = t_1$, $d\mathbf{r}$ denotes a differential displacement along the line of flow at the point with position vector $\mathbf{r} = \mathbf{r}(x, y, z, t_1)$ then $d\mathbf{r}$ must be parallel to the velocity vector $\mathbf{v}$ at that point. Hence the respective components of $d\mathbf{r}$ and $\mathbf{v}$ must be proportional. The lines determined in this manner, which are everywhere tangential to the velocity field vector, are called the streamlines of the flow field. More properly these should be called stream surfaces since in three space dimensions they correspond to surfaces. In the case of a general vector field $\mathbf{F}$, not necessarily defining a velocity field, they are called field lines.

The condition that $d\mathbf{r}$ must be parallel to $\mathbf{v}$ implies that the field lines or streamlines must satisfy the equations
$$\frac{dx}{v_1} = \frac{dy}{v_2} = \frac{dz}{v_3}. \tag{11-36}$$
Equations of this form are called differential equations, and methods for their solution will be explored systematically in the last three chapters.

If, now, $\mathbf{r}$ is the position vector of a fluid particle at time $t$, we have the obvious vector equation
$$\frac{d\mathbf{r}}{dt} = \mathbf{v}, \tag{11-37}$$
which implies the three scalar differential equations
$$\frac{dx}{dt} = v_1,\quad\frac{dy}{dt} = v_2,\quad\frac{dz}{dt} = v_3. \tag{11-38}$$
Together, the solutions of these last three equations define curves called the particle trajectories. The particle trajectories are functions of the time, and are so named because they describe the path followed by individual fluid particles as they pass through the flow field.
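Equations (11-38) can also be integrated step by step when no closed-form solution is available. The following sketch, which is not part of the original text, applies the classical fourth-order Runge-Kutta method (a standard numerical technique chosen here for illustration) to the velocity field $\mathbf{v} = 2t^2x\,\mathbf{i} + ty\,\mathbf{j} + 3z^2\,\mathbf{k}$ of Example 11-15, which follows. The starting point, time interval, step count and function names are assumed values. The closed forms used for comparison are those obtained by separating the variables in each of the three equations.

```python
import math


def velocity(t, p):
    """Velocity field v = 2 t^2 x i + t y j + 3 z^2 k."""
    x, y, z = p
    return (2.0 * t * t * x, t * y, 3.0 * z * z)


def rk4_trajectory(p0, t0, t1, n=400):
    """Integrate dr/dt = v(t, r) from t0 to t1 with n Runge-Kutta steps."""
    h = (t1 - t0) / n
    t, p = t0, tuple(p0)
    for _ in range(n):
        k1 = velocity(t, p)
        k2 = velocity(t + h / 2, tuple(c + h / 2 * k for c, k in zip(p, k1)))
        k3 = velocity(t + h / 2, tuple(c + h / 2 * k for c, k in zip(p, k2)))
        k4 = velocity(t + h, tuple(c + h * k for c, k in zip(p, k3)))
        p = tuple(c + h / 6 * (a + 2 * b + 2 * d + e)
                  for c, a, b, d, e in zip(p, k1, k2, k3, k4))
        t += h
    return p


def exact_trajectory(p0, t):
    """Closed form for a particle at p0 when t = 0 (valid while 3 z0 t < 1)."""
    x0, y0, z0 = p0
    return (x0 * math.exp(2.0 * t ** 3 / 3.0),
            y0 * math.exp(t * t / 2.0),
            z0 / (1.0 - 3.0 * z0 * t))


start = (1.0, 1.0, 1.0)   # assumed particle position at t = 0
numerical = rk4_trajectory(start, 0.0, 0.2)
analytical = exact_trajectory(start, 0.2)
print(numerical)
print(analytical)
```

The two printed triples should agree closely. Note that the $z$-coordinate of this particular particle becomes unbounded as $t$ approaches $1/3$, so the integration interval has deliberately been kept short.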
Example 11-15 Find the differential equations of the streamlines and particle trajectories corresponding to the fluid velocity field
$$\mathbf{v} = 2t^2x\,\mathbf{i} + ty\,\mathbf{j} + 3z^2\,\mathbf{k}.$$
Solution In this case $v_1 = 2t^2x$, $v_2 = ty$, and $v_3 = 3z^2$, showing that the differential equations describing the streamlines are
$$\frac{dx}{2t^2x} = \frac{dy}{ty} = \frac{dz}{3z^2}\quad(\text{with } t = \text{constant}),$$
whilst the differential equations describing the particle trajectories are
$$\frac{dx}{dt} = 2t^2x,\quad\frac{dy}{dt} = ty,\quad\frac{dz}{dt} = 3z^2.$$
In this case all of these differential equations are of the type called variables separable, which may be solved by direct integration after some slight re-arrangement. Although the main discussion of the solution of differential equations will be postponed until the final chapters of this book, it will be instructive to solve the ones that have arisen in connection with this problem.

The differential equations defining the streamlines are equivalent to two different relationships between the three space variables $x$, $y$, and $z$. We choose to work with the first and last pairs of the equations which are, respectively, equivalent to
$$\frac{dx}{x} = 2t\frac{dy}{y}\quad\text{and}\quad\frac{dy}{y} = \frac{t\,dz}{3z^2},$$
with $t$ regarded as a constant. Taking the antiderivatives of these gives
$$\log x = 2t\{\log y + \text{constant}\}\quad\text{and}\quad\log y = -\frac{t}{3z} + \text{constant}.$$
Re-arrangement shows that the streamlines or stream surfaces are described by the equations
$$x = (C_1y)^{2t},\quad y = e^{C_2}e^{-t/3z},$$
where $C_1$, $C_2$ are arbitrary constants. If flow in the plane $z = z_0$ is considered, then these equations define a curve that is correctly called a streamline. It would be the curve
$$x = C_1^{2t}e^{2C_2t}e^{-2t^2/3z_0},\quad y = e^{C_2}e^{-t/3z_0}.$$
The particle trajectories are found in similar fashion by finding the antiderivatives of
$$\frac{dx}{x} = 2t^2\,dt,\quad\frac{dy}{y} = t\,dt,\quad\frac{dz}{z^2} = 3\,dt.$$
Hence
$$\log x = \frac{2}{3}t^3 + \text{constant},\quad\log y = \frac{t^2}{2} + \text{constant},\quad-\frac{1}{z} = 3t + \text{constant},$$
showing that
$$x = C_3e^{2t^3/3},\quad y = C_4e^{t^2/2},\quad z = \frac{1}{C_5 - 3t},$$
where $C_3$, $C_4$, and $C_5$ are arbitrary constants. The position vector of a particle must thus be
$$\mathbf{r} = C_3e^{2t^3/3}\,\mathbf{i} + C_4e^{t^2/2}\,\mathbf{j} + \frac{1}{C_5 - 3t}\,\mathbf{k}.$$

PROBLEMS

Section 11-1

11-1 Sketch and give a brief description of the curves described by the following vector functions of a single real variable $t$:
(a) $\mathbf{r} = a\cos 2\pi t\,\mathbf{i} + b\sin 2\pi t\,\mathbf{j} + t\,\mathbf{k}$;
(b) $\mathbf{r} = a\cos 2\pi t\,\mathbf{i} + b\sin 2\pi t\,\mathbf{j} + t^2\,\mathbf{k}$;
(c) $\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$.

11-2 State which of the following vector functions are everywhere continuous and, if they have points of discontinuity, where these occur:
( a > u w=(rr^) i+ (r^)j + ' 2k;
(b) U (o = (7-Z--2) * + tan 'i + ?e ~' k ;
(c) $\mathbf{u}(t) = \tanh t\,\mathbf{i} + \cosh t\,\mathbf{j} + t\sinh t\,\mathbf{k}$;
(d)u(0= (^-L^-X i + I sin r| j + 3k.

11-3 A vector function $\mathbf{u}(t)$ of a real variable $t$ may be assigned left- and right-hand limits $\mathbf{u}(t_0-) = \lim_{t\to t_0-}\mathbf{u}(t)$ and $\mathbf{u}(t_0+) = \lim_{t\to t_0+}\mathbf{u}(t)$ with respect to the point $t_0$ in an obvious manner. The vector function $\mathbf{u}(t)$ will be continuous at $t = t_0$ if $\mathbf{u}(t_0-) = \mathbf{u}(t_0+)$. Use this concept to determine which of the following vector functions are continuous at the stated points:
, .. , . sinh 2t . ,. , , (a) u(0 = i + /e'j + cosh tk at t = 0;
(fn _ a n\ 1 i + cosh t\ + tanh tk at t = a.

11-4 Form the vector functions $\mathbf{u}(t) + \mathbf{v}(t)$, $\mathbf{u}(t)\times\mathbf{v}(t)$, and the scalar function $\mathbf{u}(t)\cdot\mathbf{v}(t)$ given that:
„(,) = ,2 i + sinhfj+ jj_lfj and $\mathbf{v}(t) = 2t\,\mathbf{i} + \cosh t\,\mathbf{j} + \sin t\,\mathbf{k}$.

11-5 Determine $d\mathbf{u}/dt$ and $d^2\mathbf{u}/dt^2$ for the vectors $\mathbf{u}$ defined in (a) and (c) of Problem 11-1, and find $d\mathbf{u}/dt$ for the vector
r , d 2 r u = - + (a . r)b + a x — ,
where $\mathbf{r} = \mathbf{r}(t)$, $r = |\mathbf{r}|$ and $\mathbf{a}$, $\mathbf{b}$ are constant vectors.

11-6 The position vector of a particle at time $t$ is
$$\mathbf{r} = \cos(t - 1)\,\mathbf{i} + \sinh(t - 1)\,\mathbf{j} + \alpha t^3\,\mathbf{k}.$$
Find the condition imposed on $\alpha$ by requiring that at time $t = 1$ the acceleration vector is normal to the position vector.
11-7 Find the unit tangent $\mathbf{T}$ to the curve $\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$ at the points corresponding to $t = 0$ and $t = 1$.

11-8 Prove results (a) to (c) of Theorem 11-3.

11-9 Prove result (d) of Theorem 11-3: (a) by expansion of the vector product $\mathbf{u}\times\mathbf{v}$ followed by subsequent recombination of the results; and (b) using determinants.

11-10 Find $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ when t = Jw, given that $\mathbf{r} = (\cos t + \sin^2 t)\,\mathbf{i} + \sin t\,(1 - \cos t)\,\mathbf{j} - \cos t\,\mathbf{k}$.

11-11 Find $\mathbf{B}$, $\mathbf{T}$, $\mathbf{N}$, $\kappa$, and $\tau$ for the helix $\mathbf{r} = (1 - \cos t)\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$, when t = 1 77.

11-12 Prove that if $\mathbf{r}(t)$ is a suitably differentiable function of the real variable $t$, and $s$ is the arc length along the parametrically defined curve $\mathbf{r} = \mathbf{r}(t)$, then with the usual notation

$$\frac{d\mathbf{r}}{dt} = \frac{ds}{dt}\,\mathbf{T}, \qquad \frac{d^2\mathbf{r}}{dt^2} = \frac{d^2s}{dt^2}\,\mathbf{T} + \kappa\left(\frac{ds}{dt}\right)^2\mathbf{N}.$$

Hence show that

$$\frac{d\mathbf{r}}{dt}\times\frac{d^2\mathbf{r}}{dt^2} = \kappa\left(\frac{ds}{dt}\right)^3\mathbf{B},$$

and deduce that

$$\mathbf{B} = \frac{\dfrac{d\mathbf{r}}{dt}\times\dfrac{d^2\mathbf{r}}{dt^2}}{\left|\dfrac{d\mathbf{r}}{dt}\times\dfrac{d^2\mathbf{r}}{dt^2}\right|}, \qquad \kappa = \frac{1}{\rho} = \frac{\left|\dfrac{d\mathbf{r}}{dt}\times\dfrac{d^2\mathbf{r}}{dt^2}\right|}{\left|\dfrac{d\mathbf{r}}{dt}\right|^3}.$$

11-13 Apply the results of Problem 11-12 to deduce $\mathbf{B}$, $\mathbf{T}$, and $\mathbf{N}$ for the curve $\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$ when $t = 1$.

11-14 This problem gives an elementary derivation of the radius and circle of curvature for a plane curve. If at a point $(\xi, \eta)$ on a curve $y = f(x)$, a circle of radius $\rho$ and centre $(\alpha, \beta)$ is tangent to the curve and has the same second derivative, then it is called the circle of curvature at $(\xi, \eta)$. The number $\rho$ is called the radius of curvature and $(\alpha, \beta)$ the centre of curvature. Let the circle of curvature at $(\xi, \eta)$ have the equation

$$(X - \alpha)^2 + (Y - \beta)^2 = \rho^2,$$

where $(X, Y)$ is a general point on the circle (see figure). By differentiation of this equation with respect to $X$, and using the tangency condition $dY/dX = f'(\xi)$ at $(\xi, \eta)$, show that

$$(\xi - \alpha) + (\eta - \beta)\,f'(\xi) = 0.$$

By a further differentiation of the equation with respect to $X$, and by using the equality of second derivatives $d^2Y/dX^2 = f''(\xi)$ at $(\xi, \eta)$, show that

$$1 + (f'(\xi))^2 + (\eta - \beta)\,f''(\xi) = 0.$$
Use the fact that $(\xi, \eta)$ lies on the circle of curvature to deduce that

$$\alpha = \xi - \frac{f'(\xi)}{f''(\xi)}\left(1 + (f'(\xi))^2\right)$$

and that

$$\rho = \frac{\left[1 + (f'(\xi))^2\right]^{3/2}}{|f''(\xi)|}.$$

Find the centre and radius of curvature of the curve $y = 1 + x^2$ at the point (1, 1).

11-15 Use the results of Problem 11-12 to show that the circular helix $\mathbf{r} = a\cos t\,\mathbf{i} + a\sin t\,\mathbf{j} + bt\,\mathbf{k}$ has the constant radius of curvature $\rho = (a^2 + b^2)/a$.

11-16 Show from the Serret-Frenet equations and the fact that $\mathbf{T} = d\mathbf{r}/ds$, that the torsion $\tau$ may be expressed in the form: dr As "djr dV ds 2 ds 3 ©"

11-17 If $\mathbf{r} = \mathbf{r}(t)$, where the parameter $t$ is not the arc length along the curve, prove that

$$\frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{ds}\frac{ds}{dt}, \qquad \frac{d^2\mathbf{r}}{dt^2} = \frac{d^2\mathbf{r}}{ds^2}\left(\frac{ds}{dt}\right)^2 + \frac{d\mathbf{r}}{ds}\frac{d^2s}{dt^2},$$

$$\frac{d^3\mathbf{r}}{dt^3} = \frac{d^3\mathbf{r}}{ds^3}\left(\frac{ds}{dt}\right)^3 + 3\,\frac{d^2\mathbf{r}}{ds^2}\frac{ds}{dt}\frac{d^2s}{dt^2} + \frac{d\mathbf{r}}{ds}\frac{d^3s}{dt^3}.$$

Hence show that

$$\frac{d\mathbf{r}}{dt}\cdot\left(\frac{d^2\mathbf{r}}{dt^2}\times\frac{d^3\mathbf{r}}{dt^3}\right) = \frac{d\mathbf{r}}{ds}\cdot\left(\frac{d^2\mathbf{r}}{ds^2}\times\frac{d^3\mathbf{r}}{ds^3}\right)\left(\frac{ds}{dt}\right)^6.$$

Use the result of Problem 11-12 to deduce that

$$\tau = \frac{\dfrac{d\mathbf{r}}{dt}\cdot\left[\dfrac{d^2\mathbf{r}}{dt^2}\times\dfrac{d^3\mathbf{r}}{dt^3}\right]}{\left|\dfrac{d\mathbf{r}}{dt}\times\dfrac{d^2\mathbf{r}}{dt^2}\right|^2} \qquad\text{and}\qquad \kappa = \frac{1}{\rho} = \frac{\left|\dfrac{d\mathbf{r}}{dt}\times\dfrac{d^2\mathbf{r}}{dt^2}\right|}{\left|\dfrac{d\mathbf{r}}{dt}\right|^3}.$$

11-18 Apply the result of Problem 11-17 to find the torsion $\tau$ of the non-constant pitch helix r = cos ti . . /e ( - e~ e \ I + S1I1/J + / jk.

Section 11-2

11-19 Find the antiderivative of the following two functions $\mathbf{f}(t)$:
(a) f(0 = cosh 2ti + - j + t 3 k;
(b) $\mathbf{f}(t) = t^2\sin t\,\mathbf{i} + e^{t}\,\mathbf{j} + \log t\,\mathbf{k}$.

11-20 Verify the following antiderivatives using Definition 11-3:
(a) $\displaystyle\int \mathbf{r}\cdot\frac{d\mathbf{r}}{dt}\,dt = \tfrac{1}{2}\,\mathbf{r}\cdot\mathbf{r} + C = \tfrac{1}{2}r^2 + C$;
(b) $\displaystyle\int\left(\frac{d\mathbf{r}}{dt}\cdot\frac{d^2\mathbf{r}}{dt^2}\right)dt = \tfrac{1}{2}\,\frac{d\mathbf{r}}{dt}\cdot\frac{d\mathbf{r}}{dt} + C = \tfrac{1}{2}\left(\frac{d\mathbf{r}}{dt}\right)^2 + C$;
(c) $\displaystyle\int \mathbf{r}\times\frac{d^2\mathbf{r}}{dt^2}\,dt = \mathbf{r}\times\frac{d\mathbf{r}}{dt} + \mathbf{C}$,
where $C$, $\mathbf{C}$ are arbitrary constants.

11-21 Use the result of Problem 11-20 to express $d\mathbf{r}/dt$ in terms of $\mathbf{r}$, given that $\mathbf{r}$ satisfies the vector differential equation

$$\frac{d^2\mathbf{r}}{dt^2} + n^2\,\mathbf{r} = 0.$$

11-22 Evaluate the definite integral > (t 2 e ( i + t log t\ + t 2 k)dt. J\

11-23 The displacement of a particle $P$ is given in terms of the time $t$ by $\mathbf{r} = \cos 2t\,\mathbf{i} + \sin 2t\,\mathbf{j} + t^2\,\mathbf{k}$. If $v$ and $f$ are the magnitudes of the velocity and acceleration respectively, show that

11-24 A point moving in space has acceleration $\cos t\,\mathbf{i} + \sin t\,\mathbf{j}$. Find the equation of its path if it passes through the point $(-1, 0, 0)$ with velocity $-\mathbf{j} + \mathbf{k}$ at time $t = 0$.

11-25 Evaluate the line integral of $\mathbf{F} = xy\,\mathbf{i} + yz\,\mathbf{j} + z\,\mathbf{k}$ along the contour defined by $\mathbf{r} = t\,\mathbf{i} + t^2\,\mathbf{j} + t^3\,\mathbf{k}$ from $t = 0$ to $t = 1$.

11-26 Evaluate the line integral of $\mathbf{F} = 4xy\,\mathbf{i} - 2x^2\,\mathbf{j} + 3z\,\mathbf{k}$ from the origin to the point $(2, 1, 0)$ along the contour:
(a) from the origin to the point $(2, 0, 0)$ and then from the point $(2, 0, 0)$ to the point $(2, 1, 0)$;
(b) from the origin to the point $(0, 1, 0)$ and then from the point $(0, 1, 0)$ to the point $(2, 1, 0)$;
(c) from the origin to the point $(2, 1, 0)$ along the straight line joining these two points.
(Hint: the contours (a), (b), (c) all lie in the plane $z = 0$.)

11-27 Evaluate the line integral of $\mathbf{F} = 4xy\,\mathbf{i} + 2x^2\,\mathbf{j} + 3z\,\mathbf{k}$ from the origin to the point $(2, 1, 0)$ along the contours of Problem 11-26.

Section 11-3

11-28 A particle moves in a curve given by $r = a(1 - \cos\theta)$ with $d\theta/dt = 3$. Find the components of velocity and acceleration. Show that the velocity is zero when $\theta = 0$. Find the acceleration when $\theta = 0$.

11-29 A particle moves on that portion of the curve $r = a e^{\theta}\cos\theta$ ($a$ = constant) for which < 6 < £ir, so that its radial velocity $u$ remains constant. Find its transverse velocity and its radial and transverse components of acceleration as functions of $u$ and $\theta$.

11-30 If the fluid velocity $\mathbf{v} = y\,\mathbf{i} + 2x\,\mathbf{j}$, determine the circulation $\gamma$ by integrating anti-clockwise around the rectangular contour $x = \pm a$, $y = \pm b$. Show that the sign of $\gamma$ is reversed if the direction of integration is taken clockwise around the same contour.
11-31 Consider the three rectangular regions (a) $0 < x < 1$, $-1 < y < 1$, (b) $0 < x < 1$, $1 < y < 2$, and (c) $0 < x < 1$, $-1 < y < 2$, and denote their boundary curves by $\Gamma_1$, $\Gamma_2$, and $\Gamma_3$. If $\mathbf{F} = 2y\,\mathbf{i} + x\,\mathbf{j}$, evaluate the three line integrals

$$J_1 = \oint_{\Gamma_1}\mathbf{F}\cdot d\mathbf{r}, \qquad J_2 = \oint_{\Gamma_2}\mathbf{F}\cdot d\mathbf{r}, \qquad J_3 = \oint_{\Gamma_3}\mathbf{F}\cdot d\mathbf{r},$$

and hence show that $J_1 + J_2 = J_3$.

11-32 Given that $\mathbf{F} = \cos y\,\mathbf{i} + \sin x\,\mathbf{j}$, evaluate the line integral of $\mathbf{F}$ taken anti-clockwise around the triangle with vertices at the points (0, 0), (£w, 0), (£*•> £")•

11-33 A vector field $\mathbf{F}$ is said to be irrotational if its line integral around any closed curve $\Gamma$ is zero. By integrating around two conveniently chosen contours, deduce which of the following vector fields are irrotational:
(a) $\mathbf{F} = y\sinh z\,\mathbf{i} + x\sinh z\,\mathbf{j} + xy\cosh z\,\mathbf{k}$;
(b) $\mathbf{F} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$;
(c) F = xyzH + x 2 z a \ + x 2 yzk.

Section 11-4

11-34 Find the gradient of the following functions $\phi$:
(a) $\phi = \cosh xyz$;
(b) $\phi = x^2 + y^2 + z^2$;
(c) $\phi = xy\tanh(x - z)$.

11-35 Find the directional derivative of the following functions $\phi$ in the direction of the vector $(\mathbf{i} + 2\mathbf{j} - 2\mathbf{k})$:
(a) $\phi = 3x^2 + xy^2 + yz$;
(b) $\phi = x^2yz + \cos y$;
(c) <l> = 1 1 xyz.

11-36 If new independent variables $\xi$, $\eta$, $\zeta$ are introduced through the equations $\xi = x + \alpha$, $\eta = y + \beta$, and $\zeta = z + \gamma$, where $\alpha$, $\beta$, and $\gamma$ are constants, and $\phi$ is a suitably differentiable function, prove that

$$\left(\mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}\right)\phi = \left(\mathbf{i}\frac{\partial}{\partial\xi} + \mathbf{j}\frac{\partial}{\partial\eta} + \mathbf{k}\frac{\partial}{\partial\zeta}\right)\phi.$$

Deduce from this result the fact that grad $\phi$ is unchanged by a translation of the origin of the coordinates. This property is described by saying that grad $\phi$ is invariant with respect to a translation of the coordinate system.

11-37 If new independent variables $\xi$, $\eta$, $\zeta$ are introduced through the equations

$$\xi = a_{11}x + a_{12}y + a_{13}z, \qquad \eta = a_{21}x + a_{22}y + a_{23}z, \qquad \zeta = a_{31}x + a_{32}y + a_{33}z,$$

and $\phi$ is a suitably differentiable function, prove that /. 8 . 8 , 8 \ , I. <> .8 . 3 \ Deduce from this result the fact that grad $\phi$ is unchanged by a rotation of the coordinate system.
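This property is described by saying that grad $\phi$ is invariant with respect to a rotation of the coordinate system.

11-38 If $\mathbf{a}$ is a constant vector and $\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$, $r = |\mathbf{r}|$, prove that (a) grad $(\mathbf{a}\cdot\mathbf{r}) = \mathbf{a}$; (b) gradr = r; (c)grad^ =-~

11-39 By using the Cartesian representation of grad $\phi$ as expressed in Definition 11-5, prove that
(a) grad $(a\phi + b\psi) = a$ grad $\phi$ $+\ b$ grad $\psi$;
(b) grad $(\phi\psi) = \phi$ grad $\psi$ $+\ \psi$ grad $\phi$,
where $a$, $b$ are scalar constants and $\phi$, $\psi$ are suitably differentiable functions.

11-40 A vector field $\mathbf{F}$ will be irrotational if it is expressible in the form $\mathbf{F} = \operatorname{grad}\phi$, with $\phi$ a scalar potential. Find the most general scalar potential $\phi$ that will give rise to the irrotational vector field

$$\mathbf{F} = \left(x + \tfrac{1}{2}y^2z^2\right)\mathbf{i} + xyz^2\,\mathbf{j} + xy^2z\,\mathbf{k}.$$

11-41 Find the unit normal $\mathbf{n}$ to the surface $x^2 + 2y^2 - z^2 - 8 = 0$ at the point $(1, 2, 1)$. Deduce the equation of the tangent plane to the surface at this point.

11-42 Find the unit normal $\mathbf{n}$ to the surface $x^2 - 4y^2 + 2z^2 = 6$ at the point $(2, 2, 3)$. Deduce the equation of the plane which has $\mathbf{n}$ as its normal and which passes through the origin.

11-43 If $(x_0, y_0, z_0)$ is a point on the conic surface

$$\frac{x^2}{a} + \frac{y^2}{b} + \frac{z^2}{c} = 1,$$

show that the tangent plane to the surface at that point is

$$\frac{xx_0}{a} + \frac{yy_0}{b} + \frac{zz_0}{c} = 1.$$

11-44 The vector field $\mathbf{F}$ is generated by the scalar potential $\phi = x^2y$. Verify directly by integration that the line integral of $\mathbf{F}$ along each of the three paths of Problem 11-26 is equal to 4. Confirm this result by using the fact that if $\mathbf{F} = \operatorname{grad}\phi$, then

$$\int_A^B \mathbf{F}\cdot d\mathbf{r} = \phi(B) - \phi(A).$$

11-45 The Newtonian law of gravitation asserts that the force of attraction between point masses $m_1$, $m_2$ distant $r$ apart acts along the line joining them and has magnitude $(Gm_1m_2)/r^2$, where $G$ is the gravitational constant. Show that this force law corresponds to a potential $\phi = (Gm_1m_2)/r$.

11-46 If $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$ is a vector field, then the scalar operator $\mathbf{v}\cdot\operatorname{grad}$ expressed in Cartesian coordinates is defined to be

$$\mathbf{v}\cdot\operatorname{grad} = \mathbf{v}\cdot\nabla = v_1\frac{\partial}{\partial x} + v_2\frac{\partial}{\partial y} + v_3\frac{\partial}{\partial z}.$$

Hence if $\mathbf{F}$, $\phi$ are suitably differentiable vector and scalar fields, respectively, it follows that $\mathbf{v}\cdot\operatorname{grad}\phi$ is a scalar and $\mathbf{v}\cdot\operatorname{grad}\mathbf{F}$ is a vector. Given that $\mathbf{v} = xy\,\mathbf{i} + y\,\mathbf{j} + xz\,\mathbf{k}$, F = xH + y*] - z 2 k, and $\phi = x^3yz^2$, find (a) $\mathbf{v}\cdot\operatorname{grad}\phi$; (b) $\mathbf{v}\cdot\operatorname{grad}\mathbf{F}$; (c) $\mathbf{v}\cdot\operatorname{grad}\mathbf{v}$.

11-47 Special differential operators called the divergence and the curl of a vector can be defined in terms of Cartesian coordinates by means of the operator $\nabla$. If $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$ is a suitably differentiable vector field, then the divergence of $\mathbf{F}$ is denoted either by div $\mathbf{F}$ or $\nabla\cdot\mathbf{F}$ and is the scalar defined by

$$\operatorname{div}\mathbf{F} = \nabla\cdot\mathbf{F} = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.$$

The curl of $\mathbf{F}$ is denoted either by curl $\mathbf{F}$ or $\nabla\times\mathbf{F}$ and is the vector defined by

$$\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{vmatrix}.$$

Show that

$$\operatorname{curl}\mathbf{F} = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\mathbf{i} + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\mathbf{j} + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\mathbf{k}.$$

If $\phi$ is a suitably differentiable scalar function show by direct substitution into the definitions that
(a) $\operatorname{div}(\phi\mathbf{F}) = \mathbf{F}\cdot\operatorname{grad}\phi + \phi\operatorname{div}\mathbf{F}$;
(b) $\operatorname{curl}(\phi\mathbf{F}) = \operatorname{grad}\phi\times\mathbf{F} + \phi\operatorname{curl}\mathbf{F}$.

11-48 Find $\nabla\cdot\mathbf{F}$ and $\nabla\times\mathbf{F}$ given that $\mathbf{F} = x^2y^2\,\mathbf{i} + y^2z^2\,\mathbf{j} + xz\,\mathbf{k}$.

11-49 Prove from the definitions that (a) $\operatorname{curl}\operatorname{grad}\phi = 0$, (b) $\operatorname{div}\operatorname{curl}\mathbf{F} = 0$, where $\phi$, $\mathbf{F}$ are suitably differentiable scalar and vector functions respectively.

11-50 Give an example of a differentiable scalar potential $\phi$ and vector field $\mathbf{F}$. Use them to confirm the results of Problem 11-49 by means of direct differentiation.

11-51 In the slow one-dimensional flow of a viscous fluid between parallel plates the velocity field has the form -(-*) k, where the plates coincide with the planes $x = \pm d$ and the $z$-axis points in the direction of flow. By selecting a convenient contour in the $(x, z)$-plane, prove that the circulation is non-zero so that the flow cannot be irrotational.

Section 11-5

11-52 The velocity field describing a fluid flow is $\mathbf{v} = 2\,\mathbf{i} + yt\,\mathbf{j} + \mathbf{k}$.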
Write down the differential equations describing the streamlines and the particle trajectories and solve them as in Example 11-15 in the text.

Series, Taylor's theorem and its uses

12-1 Series

The term series denotes the sum of the members of a sequence of numbers $\{a_n\}$, in which $a_n$ represents the general term. The number of terms added may be finite or infinite, according as the sequence used is finite or infinite in the sense of Chapter 3. The sum to $N$ terms of the infinite sequence $\{a_n\}$ is written

$$a_1 + a_2 + \cdots + a_N = \sum_{n=1}^{N} a_n,$$

and it is called a finite series because the number of terms involved in the summation is finite. The so called infinite series derived from the infinite sequence $\{a_n\}$ by the addition of all its terms is written

$$a_1 + a_2 + \cdots + a_r + \cdots = \sum_{n=1}^{\infty} a_n.$$

The following are specific examples of numerical series of essentially different types:

(a) $\displaystyle\sum_{n=1}^{N} n^2 = 1^2 + 2^2 + \cdots + N^2$, in which the general term $a_n = n^2$;

(b) $\displaystyle\sum_{n=0}^{\infty}\frac{1}{n!} = 1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{r!} + \cdots$, in which the general term $a_n = 1/n!$;

(c) $\displaystyle\sum_{n=1}^{\infty}\frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{r} + \cdots$, in which the general term $a_n = 1/n$;

(d) $\displaystyle\sum_{n=1}^{\infty}\frac{2n^2 + 1}{4n + 2} = \frac{1}{2} + \frac{9}{10} + \frac{19}{14} + \cdots + \frac{2r^2 + 1}{4r + 2} + \cdots$, in which the general term $a_n = (2n^2 + 1)/(4n + 2)$;

(e) $\displaystyle\sum_{n=1}^{\infty}(-1)^{n+1} = 1 - 1 + 1 - 1 + \cdots + (-1)^{r+1} + \cdots$, in which the general term $a_n = (-1)^{n+1}$.

Only (a) is a finite series; the remainder are infinite.

There is obviously no difficulty in assigning a sum to a finite series, but how are we to do this in the case of an infinite series? A practical approach would be to attempt to approximate the infinite series by means of a finite series comprising only its first $N$ terms. To justify this it would be necessary to show in some way that the sum of the remainder $R_N$ of the series after $N$ terms tends to zero as $N$ increases and, even better if possible, to obtain an upper bound for $R_N$.
This was, of course, the approach adopted in Chapter 6 when discussing the exponential series which comprises example (b). In the event of an upper bound for $R_N$ being available, this could be used to deduce the number of terms that need be taken in order to determine the sum to within a specified accuracy.

The spirit of this practical approach to the summation of series is exactly what is adopted in a rigorous discussion of series. The first question to be determined is whether or not a given series has a unique sum; the estimation of the remainder term follows afterwards, and usually proves to be more difficult.

To assist us in our formal discussion of series we use the already familiar notion of the $n$th partial sum $S_n$ of the series $\sum_{n=1}^{\infty} a_n$, which is defined to be the finite sum

$$S_n = \sum_{r=1}^{n} a_r = a_1 + a_2 + \cdots + a_n.$$

Then, in terms of $S_n$, we have the following definition of convergence, which is in complete agreement with the approach we have just outlined.

Definition 12-1 (convergence of series) The series $\sum_{n=1}^{\infty} a_n$ will be said to be convergent to the finite sum $S$ if its $n$th partial sum $S_n$ is such that

$$\lim_{n\to\infty} S_n = S.$$

If the limit of $S_n$ is not defined, or is infinite, the series will be said to be divergent.

The remainder after $n$ terms, $R_n$, is given by

$$R_n = a_{n+1} + a_{n+2} + \cdots + a_{n+r} + \cdots,$$

so that if $\{S_n\}$ converges to the limit $S$, then $R_n = S - S_n$ and Definition 12-1 is obviously equivalent to requiring that

$$\lim_{n\to\infty}(S - S_n) = \lim_{n\to\infty} R_n = 0.$$

Example 12-1 Find the $n$th partial sum of the series

$$1 + \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \cdots + \frac{1}{3^n} + \cdots$$

and hence show that it converges to the sum 3/2. Find the remainder after $n$ terms and deduce how many terms need be summed in order to yield a result in which the error does not exceed 0.01.

Solution This series is a geometric progression with initial term unity and common ratio 1/3.
Its sum to $n$ terms, which is the desired $n$th partial sum $S_n$, may be determined by a well known formula (see Problem 12-2) which gives

$$S_n = \frac{1 - (1/3)^n}{1 - 1/3} = \frac{3}{2}\left[1 - (1/3)^n\right].$$

We have

$$\lim_{n\to\infty} S_n = \lim_{n\to\infty}\frac{3}{2}\left[1 - \left(\frac{1}{3}\right)^n\right] = 3/2,$$

showing that the series is convergent to the sum 3/2. As $S_n$ is the sum to $n$ terms, the remainder after $n$ terms, $R_n$, must be given by $R_n = 3/2 - S_n$, and so

$$R_n = \frac{1}{2}\left(\frac{1}{3}\right)^{n-1}.$$

If the remainder must not exceed 0.01, then $R_n \le 0.01$, from which it is easily seen that the number $n$ of terms needed is $n \ge 5$. The determination of $R_n$ was simple in this instance because we were fortunate enough to have available an explicit formula for $S_n$. In general such a formula is seldom available.

The definition of convergence has immediate consequences as regards the addition and subtraction of series. Suppose $\sum a_n$ and $\sum b_n$ are convergent series with sums $\alpha$, $\beta$. (It is customary to omit summation limits when they are not important.) Let their respective partial sums be $S_n = a_1 + a_2 + \cdots + a_n$ and $S_n' = b_1 + b_2 + \cdots + b_n$, and consider the series $\sum(a_n + b_n)$ which has the partial sum $S_n'' = S_n + S_n'$. Then

$$\lim_{n\to\infty} S_n'' = \lim_{n\to\infty}(S_n + S_n') = \lim_{n\to\infty} S_n + \lim_{n\to\infty} S_n' = \alpha + \beta,$$

showing that

$$\sum_{n=1}^{\infty}(a_n + b_n) = \alpha + \beta.$$

A corresponding result for the difference of two series may be proved in similar fashion. We have established the following general result.

Theorem 12-1 (sum and difference of convergent series) If the series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are convergent to the respective sums $\alpha$ and $\beta$, then

$$\sum_{n=1}^{\infty}(a_n + b_n) = \alpha + \beta; \qquad \sum_{n=1}^{\infty}(a_n - b_n) = \alpha - \beta.$$

Example 12-2 Suppose that $a_n = (1/2)^n$ and $b_n = (1/3)^n$, so that the series involved are again geometric progressions with $\sum_{n=0}^{\infty}(1/2)^n = 2$ and $\sum_{n=0}^{\infty}(1/3)^n = 3/2$. Then it follows from Theorem 12-1 that

$$\sum_{n=0}^{\infty}\left[(1/2)^n + (1/3)^n\right] = 7/2 \qquad\text{and}\qquad \sum_{n=0}^{\infty}\left[(1/2)^n - (1/3)^n\right] = 1/2.$$
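The arithmetic of Example 12-1 can be checked with exact rational numbers. The following is a minimal sketch; the helper names `partial_sum`, `remainder` and `terms_needed` are illustrative choices and are not notation from the text. It sums the geometric progression term by term, forms the remainder as $3/2 - S_n$, and searches for the first $n$ at which the remainder does not exceed 0.01.

```python
from fractions import Fraction

def partial_sum(n, ratio=Fraction(1, 3)):
    """S_n = 1 + ratio + ... + ratio**(n - 1), summed term by term."""
    return sum(ratio**k for k in range(n))

def remainder(n, ratio=Fraction(1, 3)):
    """R_n = S - S_n, where S = 1/(1 - ratio) is the sum of the progression."""
    return 1 / (1 - ratio) - partial_sum(n, ratio)

def terms_needed(tol, ratio=Fraction(1, 3)):
    """Smallest n for which the remainder R_n does not exceed tol."""
    n = 1
    while remainder(n, ratio) > tol:
        n += 1
    return n

print(partial_sum(5))                    # 121/81
print(remainder(5))                      # 1/162
print(terms_needed(Fraction(1, 100)))    # 5

# Example 12-2: the termwise sum of the two progressions approaches 2 + 3/2 = 7/2.
combined = partial_sum(60, Fraction(1, 2)) + partial_sum(60, Fraction(1, 3))
print(float(combined))
```

Exact fractions are used so that the comparison with the tolerance is not affected by rounding.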
Let us now derive a number of standard tests by which the convergence or divergence of a series may be established. We begin with a test for divergence.

Suppose first that a series $\sum a_n$ with $n$th partial sum $S_n$ converges to the sum $S$. Then from our discussion of the convergence of a sequence given in Chapter 3, we know that for any $\varepsilon > 0$ there must exist some integer $N$ such that

$$|S_n - S| < \varepsilon \quad\text{for } n > N.$$

This immediately implies the additional result $|S_{n+1} - S| < \varepsilon$. Hence,

$$\varepsilon + \varepsilon > |S_{n+1} - S| + |S_n - S| = |S_{n+1} - S| + |S - S_n| \ge |S_{n+1} - S_n|.$$

However, as $S_{n+1} - S_n = a_{n+1}$, we have proved that $|a_{n+1}| < 2\varepsilon$ for $n > N$. As $\varepsilon$ was arbitrary, this shows that for a series to be convergent, it is necessary that

$$\lim_{n\to\infty}|a_n| = 0$$

or, equivalently,

$$\lim_{n\to\infty} a_n = 0.$$

If this is not the case then the series $\sum_{n=1}^{\infty} a_n$ must diverge. This condition thus provides us with a positive test for divergence.

Theorem 12-2(a) (test for divergence) The series $\sum_{n=1}^{\infty} a_n$ diverges if $\lim_{n\to\infty} a_n \ne 0$.

This theorem shows, for example, that the series (d) is divergent, because $a_n = (2n^2 + 1)/(4n + 2)$, and hence it increases without bound as $n$ increases. It is important to take note of the fact that this theorem gives no information in the event that $\lim_{n\to\infty} a_n = 0$. Although we have shown that this is a necessary condition for convergence, it is not a sufficient condition because divergent series exist for which the condition is true.

Theorem 12-2(a) gives no information about either series (b) or (c) as in each case $\lim_{n\to\infty} a_n = 0$. In fact, by using another argument, we have already proved that the series representation for e in (b) is convergent, whereas we shall prove shortly that the harmonic series (c) is divergent. Series (e) must also be divergent according to our definition, because $a_n$ oscillates finitely between 1 and $-1$, and also $S_n$ does not tend to any limit.
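A short numerical experiment can make the distinction between necessary and sufficient conditions concrete. The sketch below tabulates partial sums of the harmonic series (c) and of series (e); the helper `partial_sums` is an illustrative name, and the output is only suggestive evidence, not a proof of divergence.

```python
def partial_sums(term, n_max):
    """Return the list [S_1, S_2, ..., S_{n_max}] for the series with nth term term(n)."""
    total, sums = 0.0, []
    for n in range(1, n_max + 1):
        total += term(n)
        sums.append(total)
    return sums

# Series (c): the terms 1/n tend to zero, yet the partial sums keep growing.
harmonic = partial_sums(lambda n: 1.0 / n, 10_000)
print(harmonic[99], harmonic[999], harmonic[9999])

# Series (e): the terms do not tend to zero and the partial sums oscillate.
oscillating = partial_sums(lambda n: (-1) ** (n + 1), 6)
print(oscillating)
```

The harmonic partial sums grow roughly like $\log n$, which is why the growth looks slow in the printed values even though it never stops.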
The terms of series are not always of the same sign, and so it is useful to associate with the series $\sum a_n$ the companion series $\sum|a_n|$. If this latter series is convergent, then the series $\sum a_n$ is said to be absolutely convergent. It can happen that although $\sum a_n$ is convergent, $\sum|a_n|$ is divergent. When this occurs the series $\sum a_n$ is said to be conditionally convergent. Now when terms of differing signs are involved, the sum of the absolute values of the terms of a series clearly exceeds the sum of the terms of the series, and so it seems reasonable to expect that absolute convergence implies convergence. Let us prove this fact.

Theorem 12-2(b) (absolute convergence implies convergence) If the series $\sum_{n=1}^{\infty}|a_n|$ is convergent, then so also is the series $\sum_{n=1}^{\infty} a_n$.

Proof The proof of this result is simple. Let $S_n = a_1 + a_2 + \cdots + a_n$ and $S_n' = |a_1| + |a_2| + \cdots + |a_n|$ be the $n$th partial sums of the series $\sum a_n$ and $\sum|a_n|$, respectively. Then, as $a_r + |a_r|$ is either zero or $2|a_r|$, it follows that

$$0 \le S_n + S_n' \le 2S_n'.$$

Now by supposition $\lim_{n\to\infty} S_n' = S'$ exists, so that taking limits we arrive at

$$0 \le \lim_{n\to\infty}(S_n + S_n') \le 2S'.$$

This implies that the series with $n$th term $a_n + |a_n|$ must be convergent, since its partial sums are non-decreasing and bounded above by $2S'$, and hence, using Theorem 12-1, that $\sum a_n$ must be convergent.

Example 12-3 Consider the series

$$\sum_{n=0}^{\infty}\frac{(-1)^n}{n!} = 1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots.$$

As $a_n = (-1)^n/n!$, we have $|a_n| = 1/n!$, which is the general term of the exponential series defining e. Thus Theorem 12-2(b), and the convergence of the exponential series, together imply the convergence of $\sum(-1)^n/n!$. In fact this is the series representation of 1/e.

Suppose $\sum b_n$ is a convergent series of positive terms, and that $\sum a_n$ is a series with the property that if $N$ is some positive integer, then $|a_n| \le b_n$ for $n > N$. Then clearly the convergence of $\sum b_n$ implies the convergence of $\sum|a_n|$ and, by Theorem 12-2(b), also the convergence of $\sum a_n$.
By a similar argument, if for $n > N$, $0 \le b_n \le a_n$, and $\sum b_n$ is known to be divergent, then clearly $\sum a_n$ must also be divergent. We incorporate these results into a useful comparison test.

Theorem 12-3 (comparison test)
(a) Convergence test Let $\sum b_n$ be a convergent series of positive terms, and let $\sum a_n$ be a series with the property that there exists a positive integer $N$ such that $|a_n| \le b_n$ for $n > N$. Then $\sum a_n$ is an absolutely convergent series.
(b) Divergence test Let $\sum b_n$ be a divergent series of positive terms, and let $\sum a_n$ be a series of positive terms with the property that there exists a positive integer $N$ such that $0 \le b_n \le a_n$ for $n > N$. Then $\sum a_n$ is a divergent series.

Example 12-4
(a) Consider the series $\sum_{n=1}^{\infty}[2 + (-1)^n]/2^n$. We have

$$a_n = \frac{2 + (-1)^n}{2^n} \le \frac{3}{2^n},$$

and as $\sum_{n=1}^{\infty} 3/2^n = 3\sum_{n=1}^{\infty} 1/2^n = 3$, the conditions of Theorem 12-3 (a) are satisfied if we set $b_n = 3/2^n$. It thus follows that the series $\sum a_n$ is convergent.

(b) Consider the series $\sum_{n=1}^{\infty}(n + 1)/n^2$. Here we have

$$a_n = \frac{n + 1}{n^2} = \frac{1}{n}\left(\frac{n + 1}{n}\right) > \frac{1}{n},$$

and as the harmonic series $\sum 1/n$ is divergent, the conditions of Theorem 12-3 (b) are satisfied when we set $b_n = 1/n$. Hence $\sum a_n$ is divergent.

Fig. 12.1 Comparison between series and integral: panels (a) and (b).

A powerful test for the convergence or divergence of a series $\sum a_n$ of positive terms follows by a comparison of the shaded rectangles in Fig. 12.1. Let $f(x)$ be a non-increasing function defined for $1 \le x < \infty$ which decreases to zero as $x$ tends to infinity, and let $f(n) = a_n$, where $n$ is an integer. Then we have the obvious inequality

$$\sum_{r=2}^{n} f(r) \le \int_1^n f(x)\,dx \le \sum_{r=1}^{n} f(r)$$

or, equivalently,

$$\sum_{r=2}^{n} a_r \le \int_1^n f(x)\,dx \le \sum_{r=1}^{n} a_r.$$

As the right-hand side of this inequality only exceeds the left-hand side by the single term $a_1$, it must follow that in the limit, the infinite series $\sum a_r$ and the integral

$$\lim_{n\to\infty}\int_1^n f(x)\,dx$$

converge or diverge together. This conclusion may be incorporated into a test as follows.

Theorem 12-4 (integral test) Let $f(x)$ be a positive non-increasing function defined on $1 \le x < \infty$ with $\lim_{x\to\infty} f(x) = 0$. Then, if $a_n = f(n)$, the series $\sum_{n=1}^{\infty} a_n$ converges or diverges according as

$$\int_1^{\infty} f(x)\,dx$$

is finite or infinite.

Corollary 12-4 ($R_N$ deduced from integral test) Let $f(x)$ be a positive non-increasing function defined on $1 \le x < \infty$ with $\lim_{x\to\infty} f(x) = 0$, and let $\sum_{n=1}^{\infty} a_n$ be convergent, where $a_n = f(n)$. Then the remainder $R_N$ after $N$ terms satisfies the inequality

$$R_N < \int_N^{\infty} f(x)\,dx.$$

Proof The result follows at once from the obvious inequality

$$\sum_{r=N+1}^{N'} a_r \le \int_N^{N'} f(x)\,dx \le \sum_{r=N}^{N'} a_r$$

by taking the limit as $N' \to \infty$. This is possible because, by hypothesis, $\sum a_n$ is convergent so that the improper integral involved exists.

Example 12-5
(a) Consider the series $\sum_{n=1}^{\infty} 1/n^k$, where $k > 0$. Then the function $f(x) = 1/x^k$ satisfies the conditions of Theorem 12-4. Hence this series converges or diverges according as

$$\lim_{n\to\infty}\int_1^n \frac{dx}{x^k}$$

is finite or infinite. If $k \ne 1$ we have

$$\lim_{n\to\infty}\int_1^n \frac{dx}{x^k} = \left(\frac{1}{1 - k}\right)\lim_{n\to\infty}\left[\frac{1}{n^{k-1}} - 1\right].$$

Hence for $0 < k < 1$ this limit is infinite, showing that the series is divergent for $k$ in this range, whereas for $k > 1$ this limit has the finite value $1/(k - 1)$, showing that the series is convergent for $k > 1$. Applying Corollary 12-4 shows that when $k > 1$, the remainder $R_N$ after $N$ terms must satisfy the inequality

$$R_N < \frac{N^{1-k}}{k - 1}.$$

When $k = 1$ we obtain the harmonic series, which must be treated separately. As it follows that

$$\lim_{n\to\infty}\int_1^n \frac{dx}{x} = \lim_{n\to\infty}\log n \to \infty,$$

we have proved that the harmonic series is divergent.
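The remainder bound of Corollary 12-4 can be checked numerically for the case $k = 2$ of part (a). The sketch below relies on the known value $\sum 1/n^2 = \pi^2/6$, which is a fact quoted here as an assumption and is not derived in the text; the function names are likewise illustrative.

```python
import math

def partial_sum(N, k):
    """Sum of the first N terms of the series with general term 1/n**k."""
    return sum(1.0 / n**k for n in range(1, N + 1))

def remainder_bound(N, k):
    """Upper bound N**(1 - k)/(k - 1) from Corollary 12-4, valid only for k > 1."""
    return N ** (1 - k) / (k - 1)

# Known closed form for k = 2 (assumed here, not proved in the text).
exact = math.pi**2 / 6

for N in (10, 100, 1000):
    actual = exact - partial_sum(N, 2)
    print(N, actual, remainder_bound(N, 2))
```

In each row the actual remainder lies just below the bound $1/N$, which suggests that for this series the integral estimate is quite sharp.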
(b) Consider the series $\sum_{n=1}^{\infty} n/(1 + n^2)$. Here we set $f(x) = x/(1 + x^2)$, so we must examine

$$L = \lim_{n\to\infty}\int_1^n \frac{x\,dx}{1 + x^2}.$$

Setting $x^2 = u$ we find

$$L = \lim_{n\to\infty}\tfrac{1}{2}\left[\log(1 + n^2) - \log 2\right] \to \infty.$$

Hence the series is divergent.

Two other useful tests known as the ratio test and the $n$th root test may be derived from Theorem 12-3, essentially using a geometric progression for purposes of comparison. The idea involved in these tests is that a series is tested against itself, and that its convergence or divergence is then deduced from the rate at which successive terms decrease or increase.

Suppose that $\sum a_n$ is a series for which the ratio $a_{n+1}/a_n$ is always defined and that

$$\lim_{n\to\infty}|a_{n+1}/a_n| = L,$$

where $L < 1$. Let $r$ be some fixed number such that $L < r < 1$. Then the existence of the limit $L$ implies that there exists an integer $N$ such that

$$|a_{n+1}| < r\,|a_n| \quad\text{for } n > N.$$

Hence it follows that

$$|a_{N+2}| < r\,|a_{N+1}|, \qquad |a_{N+3}| < r\,|a_{N+2}| < r^2\,|a_{N+1}|, \ldots,$$

and in general $|a_{N+m+1}| < r^m\,|a_{N+1}|$. Thus if $R_N$ is the remainder after $N$ terms we have

$$R_N = \sum_{n=N+1}^{\infty} a_n \le \sum_{n=N+1}^{\infty}|a_n| < |a_{N+1}|\,(1 + r + r^2 + \cdots). \tag{*}$$

The expression in brackets is a convergent geometric progression because, by hypothesis, $r < 1$. As the remainder term $R_N$ is finite, and is less than the sum of the absolute values of the terms comprising the tail of the series, it is easily seen that the series $\sum a_n$ must be absolutely convergent. If $L > 1$ the terms grow in size, and the series $\sum a_n$ is divergent. Nothing may be deduced if $L = 1$, for then the series may either be convergent or divergent as illustrated by Example 12-5 (a). In that case $a_{n+1}/a_n = n^k/(n + 1)^k$, giving $\lim_{n\to\infty}|a_{n+1}/a_n| = 1$; and the series was seen to be divergent for $0 < k \le 1$ and convergent for $k > 1$. Expressed formally, as follows, these results are called the ratio test.
CO theorem 12-5 (ratio test) If the series ^ o n is such that a n ^ and Cln+l lim «-*00 a n = L, then (a) the series Sa w converges absolutely if L < 1, (b) the series 2a„ diverges if L > 1 , (c) the test fails if L = 1. Example 12-6 (a) Consider the series Then a n ¥= and 2=2 = r_i )2 » + i <" + W"" a„ K ' (n + l)»+i, !» / 1\ _ = ( _ 1)2 „ +1 ( 1+ -J l\-» Hence lim n-*co tf»+l -, , r„ 1 /( 1+ ;)"'"' 5 - where the final result follows by virtue of the work of Section 3-3. As e > 1, the ratio test proves the absolute convergence of this series. (b) Consider the series 2 l/ w - Here a n = l/«! =^=0 and n = l ni 1 dn+l a n («+!)! n + 1 Ctn+l On SEC 12-1 SERIES / 541 Hence lim On+1 = lim — n—a> n + 1 = 0, and as < 1 the ratio test proves the series to be convergent. 00 (c) Consider the series 2 3"/w. n = l Then a n ^ and flit+i = / 3"+* \/n\ _ / n \ _ a n \ n+ [j\3n)- \„+\)~ tf»+l a n Now lim n— *oo 0«+l On = lim 3n „-*«, n + 1 = 3, and as 3 > 1 the ratio test proves the series to be divergent. (d) Consider the series J l/(2« + l) 2 . «=i Then n,^0 and gn+l Now lim n— *oo = / 2 "+ 1 N i 2 \2n + 3/ ~ tf»+i «n tf»+i On n-.oo \2n + 3/ so that the ratio test fails in this case. In fact the series is convergent, as may readily be proved by use either of the comparison test, with b n = Ijn 2 , or the integral test. As the remainder term R N used in the proof of the ratio test may be either positive or negative, the estimate (*) is equivalent to I Rn I < | a N+1 \(l+ r + r*+- • •) or, summing the geometric progression, to Rn\<- 1 -r This simple result provides an estimate of the error if the summation is terminated after N terms and comprises our next result. Corollary 12-5 (R N deduced from ratio test) Let the series J* a„ be con- vergent, and let the ratio test be applicable with " =1 542 / SERIES, TAYLOR'S THEOREM AND ITS USES CH 12 lim n->oo tf«+l a n = L. 
Then, if r is a number such that L < r < 1, the remainder R N after N terms is such that \R»\< laN+l1 1 -r Let us use Example 12-6 (a) to illustrate this and to compute \ Rs\. We have L = 1/e and, as e = 2-7182 . . ., we could take r = 0-5. Then 1/(1 - r) = 2, whence , 2NI Hence \Rs\< 48/625. In the so called nth root test, appeal is also made to the geometric pro- gression to prove convergence. Suppose the series £#« is such that lim VI an\=L, and that L <\. Then if r is some definite number such that L < r < 1, the existence of the limit implies that there exists an integer N such that "VI a„\ < r for n> N. Hence | a n \ < r n for n > JV. Thus, as with the ratio test, the remainder after N terms may be overestimated by the sum of the absolute values of the remaining terms, and the result still further overestimated in terms of | fljv-n | and a geometric progression with common ratio r. As r < 1 this re- mainder is finite, thereby establishing that 2a„ is absolutely convergent. If L > 1, then successive terms grow and the series is divergent. As with the ratio test, the nth root test fails when L = 1, for then Sa„ may be either convergent or divergent. Stated formally we have : 00 theorem 12-6 («th root test) If the series J a n is such that lim "VI a n \=L, then (a) the series 2a„ is absolutely convergent if L < 1, (b) the series 2a„ is divergent if L > 1 , (c) the test fails if L = 1. Example 12-7 (a) Consider the series SEC 12 ' 1 SERIES / 543 nk \ n V / nk V n =i \3n + 1/ where k is a constant. Then / nk \ n °n — I I and lim » VI «» I = h'm nk k 3«+ 1 3 Thus the nth root test shows that the series will be convergent if k < 3 and divergent if k > 3. It fails if k = 3, though Theorem 12-2 then shows the series to be divergent. 00 (b) Consider the series £ nj2 n . »=i Then a„ = n\l n = | a n |, and "VI «» 1 = i B V«- Taking logarithms we find log [VK |] = log | + - log w. 
Now by Theorem 6-4 (b) we know that $\lim_{n\to\infty} (\log n)/n = 0$, so that
$$\lim_{n\to\infty}\log\left[\sqrt[n]{|a_n|}\right] = \log\tfrac12, \quad\text{whence}\quad \lim_{n\to\infty}\sqrt[n]{|a_n|} = \tfrac12.$$
As $\tfrac12 < 1$ the test thus proves convergence. In this instance it would have been simpler to use the ratio test to prove convergence.

If $\sum a_n$ is convergent by the $n$th root test, then we have seen that a number $N$ exists such that $|a_n| < r^n$ for $n > N$, where $0 < r < 1$. Hence we have
$$R_N = \sum_{n=N+1}^{\infty} a_n \le |R_N| \le \sum_{n=N+1}^{\infty} |a_n| < \sum_{n=N+1}^{\infty} r^n = \frac{r^{N+1}}{1-r},$$
and so
$$|R_N| < \frac{r^{N+1}}{1-r}.$$
We express this overestimate of the remainder term as a corollary to the $n$th root test.

Corollary 12-6 ($R_N$ deduced from $n$th root test) Let $\sum_{n=1}^{\infty} a_n$ be convergent by the $n$th root test with
$$\lim_{n\to\infty}\sqrt[n]{|a_n|} = L.$$
Then if $r$ is a number such that $L < r < 1$, and $N$ is large enough that $|a_n| < r^n$ for all $n > N$, the remainder $R_N$ after $N$ terms is such that
$$|R_N| < \frac{r^{N+1}}{1-r}.$$

We illustrate the arithmetic of this result with the remainder after three terms of the convergent series of Example 12-7 (b). In that case $L = \tfrac12$ so that we must choose $r$ such that $\tfrac12 < r < 1$. If we select $r = \tfrac58$, then the formula gives
$$\frac{8}{3}\left(\frac{5}{8}\right)^4 = \frac{625}{1536}.$$
Had $r$ been chosen closer to the value $\tfrac12$, then a smaller value would have been obtained. Thus by taking $r = 9/16$ the formula gives $6561/28672$. These values bound $|R_3|$ only if $|a_n| < r^n$ for every $n > 3$. For this series the condition reads $n < (2r)^n$, which with $r = \tfrac58$ fails for $2 \le n \le 10$ and holds for every $n \ge 11$, so that $N$ must be at least 10 before the estimate applies. Indeed the first three terms sum to $11/8$ and the series sums to 2, so that $R_3 = \tfrac58$. The closer $r$ is taken to $L$, the larger $N$ must be.

For our final result we prove that all series in which the signs of terms alternate, whilst the absolute values of successive terms decrease monotonically to zero, are convergent. Such series are called alternating series and are of the general form
$$\sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - a_4 + \cdots,$$
where $a_n > 0$ for all $n$. To prove our assertion of convergence we assume $a_1 > a_2 > a_3 > \cdots$ and $\lim a_n = 0$, and first consider the partial sum $S_{2r}$ corresponding to an even number of terms $2r$. We write $S_{2r}$ in the form
$$S_{2r} = (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2r-1} - a_{2r}).$$
Then, because $a_1 > a_2 > a_3 > \cdots$, it follows that $S_{2r} > 0$.
By a slight rearrangement of the brackets we also have
$$S_{2r} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots - (a_{2r-2} - a_{2r-1}) - a_{2r},$$
showing that as all the brackets and quantities are positive, $S_{2r} < a_1$. Hence, as $S_{2r}$ is a bounded monotonic increasing sequence (each further bracket $(a_{2r+1} - a_{2r+2})$ added to $S_{2r}$ is positive), we know from Chapter 3 that it must tend to a limit $S$, where $0 < S < a_1$. Next consider the partial sum $S_{2r+1}$ corresponding to an odd number of terms $2r + 1$. We may write
$$S_{2r+1} = S_{2r} + a_{2r+1}.$$
Then, taking the limit of $S_{2r+1}$ we have
$$\lim S_{2r+1} = \lim S_{2r} + \lim a_{2r+1} = S,$$
because by supposition $\lim a_{2r+1} = 0$. Thus both the partial sums $S_{2r}$ and the partial sums $S_{2r+1}$ tend to the same limit $S$. Hence we have proved that for $n$ both even and odd
$$\lim_{n\to\infty} S_n = S,$$
thereby showing that the series converges.

Theorem 12-7 (alternating series test) The series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ converges if $a_n > 0$ and $a_{n+1} < a_n$ for all $n$ and, in addition, $\lim a_n = 0$.

Example 12-8 (a) Consider the alternating series
$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{2^n} = \frac{1}{2} - \frac{1}{2^2} + \frac{1}{2^3} - \frac{1}{2^4} + \cdots,$$
in which the absolute value of the general term is $a_n = 1/2^n$. Then, as it is true that $a_{n+1} < a_n$ and $\lim a_n = 0$, the test shows that the series is convergent.

(b) Consider the alternating series
$$\sum_{n=1}^{\infty} (-1)^{n+1}\sqrt[n+1]{2} = \sqrt{2} - \sqrt[3]{2} + \sqrt[4]{2} - \sqrt[5]{2} + \cdots,$$
in which the absolute value of the general term is $a_n = \sqrt[n+1]{2}$. Now it is true that $a_{n+1} < a_n$, but $\lim a_n = 1$, so that the last condition of the theorem is violated, rendering it inapplicable. Theorem 12-2 shows the series to be divergent.

The form of argument that was used to show $0 < S_{2r} < a_1$ also shows that
$$0 < \sum_{r=2m+1}^{\infty} (-1)^{r+1} a_r < a_{2m+1}$$
and, by a slight modification, that
$$-a_{2m} < \sum_{r=2m}^{\infty} (-1)^{r+1} a_r < 0.$$
As $R_{2m} = \sum_{r=2m+1}^{\infty} (-1)^{r+1} a_r$ is the remainder after an even number $2m$ of terms, and $R_{2m-1} = \sum_{r=2m}^{\infty} (-1)^{r+1} a_r$ is the remainder after an odd number $2m - 1$ of terms, it follows that if $N$ is either even or odd, then
$$0 < |R_N| < a_{N+1}.$$
Expressed in words this asserts that when an alternating series is terminated after the $N$th term, the absolute value of the error involved is less than the magnitude $a_{N+1}$ of the next term.

Corollary 12-7 ($R_N$ for alternating series) If the alternating series $\sum_{n=1}^{\infty} (-1)^{n+1} a_n$ converges under the conditions of Theorem 12-7, and $R_N$ is the remainder after $N$ terms, then
$$0 < |R_N| < a_{N+1}.$$

Using the convergent alternating series in Example 12-8 (a) for purposes of illustration we see that $a_n = 1/2^n$, and so the remainder $R_N$ must be such that $0 < |R_N| < 1/2^{N+1}$. For example, termination of the summation of this series after five terms would result in an error whose absolute magnitude is less than $1/64$.

A calculation involving the summation of a finite number of terms is often facilitated by grouping and interchanging their order. Although these operations are legitimate when the number of terms involved is finite, we must question their validity when dealing with an infinite number of terms. Later we shall show that the grouping of terms is permissible for any convergent series, but that rearrangement of terms is only permissible in a series when it is absolutely convergent, for only then does this operation leave the sum unaltered.

An example will help here to indicate the dangers of manipulating a series without first questioning the legitimacy of the operations to be performed upon it. Consider the alternating series
$$1 - \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 - \tfrac16 + \cdots,$$
which is seen to be convergent by virtue of our last theorem, and denote its sum by $S$. Then we have
$$S = 1 - \tfrac12 + \tfrac13 - \tfrac14 + \tfrac15 - \tfrac16 + \tfrac17 - \tfrac18 + \tfrac19 - \tfrac1{10} + \tfrac1{11} - \tfrac1{12} + \cdots$$
or, on rearranging the terms,
$$S = 1 - \tfrac12 - \tfrac14 + \tfrac13 - \tfrac16 - \tfrac18 + \tfrac15 - \tfrac1{10} - \tfrac1{12} + \cdots$$
$$= \left(1 - \tfrac12\right) - \tfrac14 + \left(\tfrac13 - \tfrac16\right) - \tfrac18 + \left(\tfrac15 - \tfrac1{10}\right) - \tfrac1{12} + \cdots = \tfrac12 S.$$
This can only be true if $S = 0$, but clearly this is impossible because Corollary 12-7 above shows that the error in the summation after only one term is less than $\tfrac12$ and therefore $S$ is certainly positive with $\tfrac12 < S < 1$.
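The two arrangements written out above can be compared numerically. The following Python sketch is an illustration only: the function names and the number of terms are arbitrary choices, not part of the text. It sums many terms of the series in its natural order and in the rearranged order (one positive term followed by two negative terms). The natural order should settle near $\log 2 \approx 0.693$, which is the known sum of this series, while the rearranged order should settle near half that value.

```python
import math

def natural_order(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... taken in its natural order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))

def rearranged_order(n_blocks):
    """Partial sum of the rearrangement: block j is one positive term
    followed by two negative terms, 1/(2j-1) - 1/(4j-2) - 1/(4j)."""
    return sum(1 / (2 * j - 1) - 1 / (4 * j - 2) - 1 / (4 * j)
               for j in range(1, n_blocks + 1))

natural = natural_order(200_000)
rearranged = rearranged_order(200_000)

print(f"natural order:    {natural:.6f}")
print(f"rearranged order: {rearranged:.6f}")
print(f"log 2:            {math.log(2):.6f}")
```

Both partial sums use exactly the same terms of the original series in the limit; only the order in which they are taken differs.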
What has gone wrong? The answer is that in a sense we are 'robbing Peter to pay Paul'. This occurs because both the series $\sum 1/(2n+1)$ and the series $\sum 1/2n$, from which are derived the positive and negative terms in our series, are divergent, and we have so rearranged the terms that they are weighted in favour of the negative ones. Other rearrangements could in fact be made to yield any sum that was desired. In other words, we are working with a series that is only conditionally convergent, and not absolutely convergent. It would seem from this that perhaps if a series $\sum a_n$ is absolutely convergent, then its terms should be capable of rearrangement and grouping without altering the sum. Let us prove the truth of this conjecture, but first we prove the simpler result that the grouping or bracketing of the terms of a convergent series leaves its sum unaltered.

Suppose that $\sum a_n$ is a convergent series with sum $S$. Take as representative of the possible groupings of its terms the series derived from $\sum a_n$ by the insertion of parentheses (brackets) as indicated below:
$$(a_1 + a_2) + (a_3 + a_4 + a_5) + a_6 + (a_7 + a_8) + \cdots.$$
Now denote the bracketed terms by $b_1, b_2, \ldots$, where $b_1 = a_1 + a_2$, $b_2 = a_3 + a_4 + a_5, \ldots$, so that we have associated a new series $\sum b_n$ with the original series $\sum a_n$. If the $n$th partial sums of $\sum a_n$ and $\sum b_n$ are $S_n$ and $S'_n$, respectively, then the partial sums $S'_1, S'_2, S'_3, S'_4, \ldots$ of $\sum b_n$ are, in reality, the partial sums $S_2, S_5, S_6, S_8, \ldots$ of $\sum a_n$. As $\sum a_n$ is convergent to $S$ by hypothesis, any subsequence of its partial sums $\{S_n\}$ must also converge to $S$. In particular this applies to the sequence $S_2, S_5, S_6, S_8, \ldots$, derived by the inclusion of parentheses. Hence $\sum b_n$ is also convergent to the sum $S$, which proves our result.

We now examine the effect of rearranging the terms of a series.
Let $\sum a_n$ be absolutely convergent so that $\sum |a_n|$ must be convergent, and let $\sum b_n$ be a rearrangement of $\sum a_n$. Then, as the terms of $\sum |b_n|$ are in one-to-one correspondence with those of $\sum |a_n|$, it is clear that $\sum |b_n| = \sum |a_n|$, from which we deduce that $\sum b_n$ is also absolutely convergent. Next we must show that $\sum a_n$ and $\sum b_n$ have the same sum. If $S_n$ is the $n$th partial sum of $\sum a_n$ which has the sum $S$, then by taking $n$ sufficiently large we may make $|S_n - S|$ as small as we wish; say less than an arbitrarily small positive number $\epsilon$. Now let $S'_m$ be the $m$th partial sum of $\sum b_n$. Then, as $S_n$ contains the first $n$ terms of $\sum a_n$, with their suffixes in sequential order, by taking $m$ large enough we can obviously make $S'_m$ contain all the terms of $S_n$ together with $m - n$ additional terms $a_p, a_q, \ldots, a_r$, where $n < p < q < \cdots < r$. Hence we may write
$$S'_m = S_n + a_p + a_q + \cdots + a_r,$$
whence
$$S'_m - S = S_n - S + a_p + a_q + \cdots + a_r.$$
Taking absolute values gives
$$|S'_m - S| \le |S_n - S| + |a_p| + |a_q| + \cdots + |a_r|.$$
Now, $n$ was chosen such that $|S_n - S| < \epsilon$, so that
$$|S'_m - S| < \epsilon + |a_p| + |a_q| + \cdots + |a_r|.$$
However, the remaining terms on the right-hand side of this inequality all occur after $a_n$ in the series $\sum a_n$. Because $\sum |a_n|$ is convergent, $n$ may also be taken large enough that the sum of $|a_k|$ over all $k > n$ is less than $\epsilon$, so that their total contribution cannot exceed $\epsilon$, and thus
$$|S'_m - S| < 2\epsilon.$$
This shows that the $m$th partial sum of $\sum b_n$ converges to the sum $S$, so that rearrangement of the terms of an absolutely convergent series is permissible and does not affect its sum.

Theorem 12-8 (grouping and rearrangement of series) If the series $\sum_{n=1}^{\infty} a_n$ is convergent, then parentheses may be inserted into the series without affecting its sum. If, in addition, the series $\sum_{n=1}^{\infty} a_n$ is absolutely convergent, then its terms may be rearranged without altering its sum.

Example 12-9 (a) Consider the series
$$\sum_{m=1}^{\infty}\frac{1}{m(m+1)},$$
which is easily seen to be absolutely convergent by use of the comparison test with $b_m = 1/m^2$. As absolute convergence obviously implies convergence, the first part of Theorem 12-8 asserts that we may group terms by inserting parentheses as we wish. So, using the identity
$$\frac{1}{m(m+1)} = \frac{1}{m} - \frac{1}{m+1},$$
we find for the $n$th partial sum $S_n$ the expression
$$S_n = \sum_{m=1}^{n}\left(\frac{1}{m} - \frac{1}{m+1}\right).$$
Now successive terms in this summation cancel, or telescope as the process is sometimes called, leaving only the first and the last. This is best seen by writing out the expression for $S_n$ in full as follows:
$$S_n = \left(1 - \frac12\right) + \left(\frac12 - \frac13\right) + \cdots + \left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right) = 1 - \frac{1}{n+1}.$$
Hence, if the sum of the series is $S$, we have
$$S = \lim_{n\to\infty} S_n = \lim_{n\to\infty}\left[1 - \frac{1}{n+1}\right] = 1.$$

(b) Consider the series
$$1 + 1 + \frac12 + \frac13 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{2^3} + \frac{1}{3^3} + \cdots,$$
which can be shown to be absolutely convergent by an extension of the $n$th root test. (See Problem 12-14.) The second part of Theorem 12-8 is applicable, so that we may rearrange terms and, denoting the sum by $S$, we obtain
$$S = \sum_{n=0}^{\infty}\frac{1}{2^n} + \sum_{n=0}^{\infty}\frac{1}{3^n} = \frac{1}{1-\frac12} + \frac{1}{1-\frac13} = \frac72.$$

The use of parentheses in a divergent series can sometimes produce a convergent series and, conversely, when attempting to alter the form of a convergent series a divergent series may sometimes be produced inadvertently. For instance, taking Example 12-9 (a), we could have written
$$\sum_{n=1}^{\infty}\frac{1}{n(n+1)} = \sum_{n=1}^{\infty}\left(\frac{n+1}{n} - \frac{n+2}{n+1}\right) = \sum_{n=1}^{\infty}\frac{n+1}{n} - \sum_{n=1}^{\infty}\frac{n+2}{n+1}$$
$$= 2 + \sum_{n=2}^{\infty}\frac{n+1}{n} - \sum_{n=2}^{\infty}\frac{n+1}{n} = 2,$$
which we know to be an incorrect result. The error is, of course, contained in the first line in which we attempt to equate an absolutely convergent series with the difference between two divergent series.

12-2 Power series

Up to now we have been concerned entirely with series that did not contain the variable $x$. A more general type of series called a power series in $(x - x_0)$ has the general form
$$\sum_{n=0}^{\infty} a_n (x - x_0)^n = a_0 + a_1(x - x_0) + a_2(x - x_0)^2 + \cdots, \qquad (12\text{-}1)$$
in which the coefficients $a_0, a_1, \ldots, a_n, \ldots$ are constants.
When $x$ is assigned some fixed value $\xi$, say, the power series Eqn (12-1) reduces to an ordinary series of the kind discussed in the previous section, and so may be tested for convergence by any appropriate test mentioned there. For simplicity we now apply the ratio test to series Eqn (12-1), allowing $x$ to remain a free variable, in order to try to deduce the interval for $x$ in which the series is absolutely convergent. If $\alpha_n(x)$ is the absolute value of the ratio of the $(n + 1)$th term to the $n$th term as a function of $x$, we have
$$\alpha_n(x) = \left|\frac{a_{n+1}(x - x_0)^{n+1}}{a_n (x - x_0)^n}\right| = \left|\frac{a_{n+1}}{a_n}\right| |x - x_0|.$$
Now for any specific value of $x$, the ratio test asserts that the series will be convergent if $\lim_{n\to\infty} \alpha_n(x) < 1$, whence we must require
$$\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| |x - x_0| < 1.$$
Thus the largest value $r$, say, of $|x - x_0|$ for which this is true is given by
$$r = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right|, \qquad (12\text{-}2)$$
provided that this limit exists. The inequality
$$|x - x_0| < r \qquad (12\text{-}3)$$
thus defines the $x$-interval $(x_0 - r,\ x_0 + r)$ within which the power series Eqn (12-1) is absolutely convergent. For $x$ outside this interval the ratio test shows that the power series must be divergent. (See Fig. 12-2.) The interval itself is called the interval of convergence of the power series, and the number $r$ is called the radius of convergence of the power series. The interval of convergence has been deliberately displayed in the form of an open interval because the ratio test can offer no information about the behaviour of the series at the end points. In fact the power series may either be convergent or divergent at these points.

[Fig. 12-2 Interval of convergence: divergent for $x < x_0 - r$, absolutely convergent for $x_0 - r < x < x_0 + r$, divergent for $x > x_0 + r$.]

The radius of convergence of a power series can also be deduced from the $n$th root test, when it is easily seen that
$$r = \lim_{n\to\infty}\frac{1}{\sqrt[n]{|a_n|}}, \qquad (12\text{-}4)$$
provided that this limit exists.
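Formulas (12-2) and (12-4) can be tried out numerically by evaluating the ratio and the root at a single large value of $n$. The Python sketch below is an illustration under stated assumptions: the helper names are invented here, and the coefficients $a_n = 1/(n\,2^n)$ are a hypothetical example (not taken from the text) for which both limits equal 2. A single large $n$ only approximates the limit, and the root formula typically approaches it more slowly than the ratio formula.

```python
def radius_by_ratio(coeff, n):
    """Estimate r from Eqn (12-2) as |a_n / a_(n+1)| at one large n."""
    return abs(coeff(n) / coeff(n + 1))

def radius_by_root(coeff, n):
    """Estimate r from Eqn (12-4) as 1 / |a_n|**(1/n) at one large n."""
    return 1.0 / abs(coeff(n)) ** (1.0 / n)

def example_coeff(n):
    """Hypothetical coefficients a_n = 1 / (n * 2**n); the exact radius is 2."""
    return 1.0 / (n * 2.0 ** n)

ratio_estimate = radius_by_ratio(example_coeff, 500)
root_estimate = radius_by_root(example_coeff, 500)

print("ratio estimate of r:", ratio_estimate)
print("root estimate of r: ", root_estimate)
```

Neither estimate says anything about the end points $x_0 \pm r$, which must still be examined separately.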
Definition 12-2 (radius of convergence of power series) The radius of convergence $r$ of the power series $\sum_{n=0}^{\infty} a_n (x - x_0)^n$ is defined either as
$$r = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| \quad\text{or}\quad r = \lim_{n\to\infty}\frac{1}{\sqrt[n]{|a_n|}},$$
provided that these limits exist.

Example 12-10 (a) Let us show that the series for the exponential function is absolutely convergent for all real $x$. We have
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots,$$
in which the general coefficient is $a_n = 1/n!$. Now
$$\left|\frac{a_n}{a_{n+1}}\right| = \frac{(n+1)!}{n!} = n + 1,$$
so that
$$r = \lim_{n\to\infty}(n + 1) = \infty.$$
We have thus proved that the power series for $e^x$ is absolutely convergent for all real $x$. This was an example of a power series with an infinite radius of convergence.

(b) Consider the series
$$x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots,$$
which reduces to the illustrative example following Corollary 12-7 when $x = 1$. We shall see later that this is the power series expansion of $\log(1 + x)$. Then, again applying limit (12-2), we have $a_n = (-1)^{n+1}/n$, and so
$$\left|\frac{a_n}{a_{n+1}}\right| = \frac{n+1}{n}.$$
Thus we have
$$r = \lim_{n\to\infty}\left(\frac{n+1}{n}\right) = 1.$$
Hence, the series is absolutely convergent for $|x| < 1$. As we already know the series is convergent for $x = 1$, and divergent for $x = -1$, for then it becomes the harmonic series with the signs of all terms reversed, we have proved that the power series for $\log(1 + x)$ is absolutely convergent for $-1 < x < 1$ and convergent for $-1 < x \le 1$. This was an example of a power series with radius of convergence unity.

(c) Consider the series
$$1 + x + (2x)^2 + (3x)^3 + \cdots + (nx)^n + \cdots,$$
then $a_n = n^n$ so that
$$\frac{1}{\sqrt[n]{|a_n|}} = \frac{1}{n}.$$
Hence, from Eqn (12-4),
$$r = \lim_{n\to\infty}\frac{1}{n} = 0.$$
This series has zero radius of convergence and so is absolutely convergent only when $x = 0$. That is to say this power series has a finite sum, and so is convergent, only at the one point $x = 0$ on the real line.

As a power series is yet another example of the representation of a function of the variable $x$, it is reasonable to enquire how we may differentiate and integrate functions that are so defined.
For simplicity we will take $x_0 = 0$, and work with the power series about the origin
$$f(x) = \sum_{n=0}^{\infty} a_n x^n. \qquad (12\text{-}5)$$
This is no restriction because Eqn (12-1) can be brought into this form by shifting the origin by means of the change of variable $t = x - x_0$. We will assume that Eqn (12-5) has a radius of convergence $r > 0$.

Intuition suggests that the derivative of $f(x)$ could be obtained by differentiating the right-hand side of Eqn (12-5) term by term and, similarly, that $\int_0^x f(t)\,dt$ could be obtained by term by term integration. However, extreme caution must be exercised in such matters for we have already seen that what is legitimate for the sum of a finite number of terms is not necessarily legitimate for an infinite series. Furthermore, we are now dealing with an infinite series of functions, and not just an ordinary series. In fact we shall show that termwise differentiation and integration of a power series is always permissible when $x$ lies within the interval of convergence $-r < x < r$ of Eqn (12-5).

The justification of termwise differentiation that we now offer is perhaps the most subtle and difficult proof to be found in this book. It has been included because differentiation of functions defined by a power series is fundamental to many branches of mathematics. In fact we have already employed termwise differentiation when deriving the series representation for $e^x$ in Chapter 6, and we shall use it again when discussing differential equations. The proof of this result also serves to indicate how any study of the subject beyond this level must, of necessity, involve the notion of uniform convergence. This aspect of the proof is not emphasized here, since it is beyond the scope of a first account.

Our object will be to prove that the function
$$F(x) = \sum_{n=1}^{\infty} n a_n x^{n-1} \qquad (12\text{-}6)$$
is the derivative of the function $f(x)$ of Eqn (12-5), that is to say that $f'(x) = F(x)$.
First notice that Eqns (12-5) and (12-6) have the same radius of convergence. This follows because, by hypothesis,
$$\lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| = r,$$
and the ratio of the $m$th to the $(m + 1)$th coefficient of Eqn (12-6) is $m a_m / (m+1) a_{m+1}$, whence
$$\lim_{m\to\infty}\left|\frac{m a_m}{(m+1) a_{m+1}}\right| = \lim_{m\to\infty}\left(\frac{m}{m+1}\right)\cdot\lim_{m\to\infty}\left|\frac{a_m}{a_{m+1}}\right| = r.$$

Next, if $x$ and $x + h$ are points in the interval of convergence, form the difference quotient
$$\frac{f(x+h) - f(x)}{h} = \sum_{n=1}^{\infty} a_n\left(\frac{(x+h)^n - x^n}{h}\right). \qquad (12\text{-}7)$$
The grouping of terms on the right-hand side is permissible because of the absolute convergence of the power series for $f(x)$ in $-r < x < r$. Then, applying the mean value theorem for derivatives (Theorem 5-12) to the general term on the right-hand side of Eqn (12-7), we have
$$(x + h)^n - x^n = h n \xi_n^{\,n-1},$$
where $x < \xi_n < x + h$ for $n = 1, 2, \ldots$. Thus we arrive at the result
$$\frac{f(x+h) - f(x)}{h} = \sum_{n=1}^{\infty} n a_n \xi_n^{\,n-1}. \qquad (12\text{-}8)$$
Then, as Eqns (12-5) and (12-6) have the same radius of convergence, we may consider the difference between Eqns (12-6) and (12-8), again using the fact that absolute convergence permits rearrangement of terms to give
$$F(x) - \frac{f(x+h) - f(x)}{h} = \sum_{n=2}^{\infty} n a_n\left(x^{n-1} - \xi_n^{\,n-1}\right),$$
or
$$\left|F(x) - \frac{f(x+h) - f(x)}{h}\right| \le \sum_{n=2}^{\infty} n |a_n|\left|x^{n-1} - \xi_n^{\,n-1}\right|.$$
Let us again use the mean value theorem for derivatives to obtain the result
$$x^{n-1} - \xi_n^{\,n-1} = (n-1)(x - \xi_n)\,\eta_n^{\,n-2},$$
where $x < \eta_n < \xi_n$. Then, as $|x - \xi_n| < |h|$, we have
$$\left|F(x) - \frac{f(x+h) - f(x)}{h}\right| < |h|\sum_{n=2}^{\infty} n(n-1)|a_n|\,|\eta_n|^{\,n-2}, \qquad (12\text{-}9)$$
for $-r < x < r$. Now the form of argument used to prove that the power series Eqn (12-6) has radius of convergence $r$ also proves that the series on the right-hand side of this inequality has radius of convergence $r$. So, allowing $h$ to tend to zero, as the sum of the series is finite the right-hand side of Eqn (12-9) also tends to zero whilst the difference quotient approaches $f'(x)$. Hence we have proved our result.
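The result just proved can be checked numerically on a simple case. The Python sketch below is an illustration only and assumes a particular example not used in the text at this point: the geometric series $f(x) = \sum x^n = 1/(1-x)$ for $|x| < 1$, whose termwise derivative $F(x) = \sum n x^{n-1}$ should equal $1/(1-x)^2$. The function names, the point $x = 0.5$, the number of terms and the step $h$ are all arbitrary choices.

```python
def f_partial(x, terms):
    """Partial sum of the geometric series: sum of x**n for n = 0 .. terms-1."""
    return sum(x ** n for n in range(terms))

def F_partial(x, terms):
    """Partial sum of the termwise derivative: sum of n * x**(n-1), n = 1 .. terms-1."""
    return sum(n * x ** (n - 1) for n in range(1, terms))

x = 0.5        # a point inside the interval of convergence -1 < x < 1
terms = 60     # enough terms for the tail to be negligible at x = 0.5
h = 1e-6       # small step for the difference quotient of Eqn (12-7)

difference_quotient = (f_partial(x + h, terms) - f_partial(x, terms)) / h
termwise = F_partial(x, terms)
closed_form = 1.0 / (1.0 - x) ** 2

print("difference quotient:", difference_quotient)
print("termwise derivative:", termwise)
print("1/(1-x)^2:          ", closed_form)
```

Agreement at one point is of course no proof; it only shows the difference quotient and the termwise series behaving as the argument above predicts.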
The difficult part of this proof was in showing that the right-hand side of Eqn (12-9) can be made arbitrarily small independently of $x$ in the interval of convergence. This is the property of uniform convergence mentioned in Chapter 3. As differentiability implies continuity we have, as an incidental result, proved that a power series is continuous within its interval of convergence. A more direct proof is indicated in Problem 12-19 at the end of the chapter.

The termwise integrability of power series is easier to prove. Denote by $H(x)$ the series
$$H(x) = \sum_{n=0}^{\infty} a_n \frac{x^{n+1}}{n+1}, \qquad (12\text{-}10)$$
which is obtained by termwise integration of Eqn (12-5). The assertion to be proved is that
$$H(x) = \int_0^x f(t)\,dt.$$
Now the ratio of the $n$th to the $(n + 1)$th coefficients of Eqn (12-10) is $(n + 1) a_{n-1} / n a_n$, whence
$$\lim_{n\to\infty}\left|\frac{(n+1)\,a_{n-1}}{n\,a_n}\right| = \lim_{n\to\infty}\left(\frac{n+1}{n}\right)\cdot\lim_{n\to\infty}\left|\frac{a_{n-1}}{a_n}\right| = r.$$
This shows that the power series Eqn (12-10) also has radius of convergence $r$. We have just established that a power series is differentiable for $x$ within its interval of convergence, so that $H'(x) = f(x)$ for $-r < x < r$. Thus by the fundamental theorem of calculus
$$\int_0^x f(t)\,dt = H(x) - H(0) = H(x),$$
which was to be proved.

Let us collect together these results into the form of a theorem.

Theorem 12-9 (differentiation and integration of power series) Let the function $f(x)$ be defined by the power series
$$f(x) = \sum_{n=0}^{\infty} a_n x^n,$$
with radius of convergence $r > 0$. Then, within the common interval of convergence $-r < x < r$,
(a) $f(x)$ is a continuous function;
(b) $f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}$;
(c) $\int_0^x f(t)\,dt = \sum_{n=0}^{\infty} \dfrac{a_n}{n+1}\,x^{n+1}$.

Example 12-11 Find the radius and interval of convergence of
$$f(x) = \sum_{n=1}^{\infty}\frac{x^n}{n(n+1)}.$$
Deduce $f'(x)$ and find its interval of convergence.

Solution The $n$th coefficient $a_n$ of the power series for $f(x)$ is $a_n = 1/n(n + 1)$, and so the radius of convergence $r$ is given by
$$r = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty}\frac{n+2}{n} = 1.$$
To specify the complete interval of convergence it remains to examine the behaviour of the power series at the end points of the interval $-1 < x < 1$. The series may be seen to be convergent at $x = 1$ by using the comparison test with $b_n = 1/n^2$. When $x = -1$ the series becomes an alternating series and is seen to be convergent by Theorem 12-7. Thus the complete interval of convergence for $f(x)$ is $-1 \le x \le 1$. Under the conditions of Theorem 12-9 (b) we may differentiate the power series for $f(x)$ term by term within $-1 < x < 1$, so that
$$f'(x) = \sum_{n=1}^{\infty}\frac{x^{n-1}}{n+1}.$$
To specify the complete interval of convergence for this new series which, by Theorem 12-9 (b), is certainly convergent in $-1 < x < 1$, we must again examine the end points of the interval $-1 < x < 1$. The series for $f'(x)$ becomes an alternating series when $x = -1$, and is convergent by Theorem 12-7. At $x = 1$ it becomes the harmonic series less its first term, and so is divergent. The complete interval of convergence for $f'(x)$ is thus $-1 \le x < 1$. The effect of termwise differentiation has been to produce divergence of the differentiated series at the right-hand end point of an interval of convergence at which $f(x)$ is convergent.

Example 12-12 Find the power series representation of $\arctan x$ by considering the integral
$$\arctan x = \int_0^x \frac{dt}{1 + t^2}.$$
Deduce a series expansion for $\tfrac14\pi$.

Solution An application of the Binomial Theorem to the function $(1 + a)^{-1}$ gives the result
$$\frac{1}{1+a} = 1 - a + a^2 - a^3 + a^4 - \cdots,$$
for $-1 < a < 1$. Setting $a = t^2$ we arrive at the power series representation of $(1 + t^2)^{-1}$,
$$\frac{1}{1+t^2} = 1 - t^2 + t^4 - t^6 + t^8 - \cdots. \qquad (A)$$
The conditions of Theorem 12-9 (c) apply, and we may integrate this power series term by term to obtain
$$\arctan x = \int_0^x \frac{dt}{1+t^2} = \int_0^x \left(1 - t^2 + t^4 - t^6 + t^8 - \cdots\right) dt$$
or,
$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \qquad (B)$$
This is the desired power series for $\arctan x$ and by the conditions of Theorem 12-9 (c) it is certainly convergent within the interval $-1 < x < 1$, which is the interval of convergence of the original power series Eqn (A). At each of the end points $x = \pm 1$ of this interval, the power series Eqn (B) becomes an alternating series which is seen to be convergent by Theorem 12-7. Hence the interval of convergence of the integrated series Eqn (B) is $-1 \le x \le 1$. Using the fact that $\arctan 1 = \tfrac14\pi$, we find
$$\tfrac14\pi = 1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots.$$

12-3 Taylor's theorem

So far we have discussed the convergence properties of a function $f(x)$ which is defined by a given power series. Let us now reverse this idea and enquire how, when given a specific function $f(x)$, its power series representation may be obtained. Otherwise expressed, we are asking how the coefficients $a_n$ in the power series
$$f(x) = \sum_{n=0}^{\infty} a_n x^n \qquad (12\text{-}11)$$
may be determined when $f(x)$ is some given function.

First, by setting $x = 0$, we discover that $f(0) = a_0$. Then, on the assumption that the power series Eqn (12-11) has a radius of convergence $r > 0$, differentiate it term by term to obtain
$$f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}, \qquad (12\text{-}12)$$
for $-r < x < r$. Again setting $x = 0$ shows that $f'(0) = a_1$. Differentiating Eqn (12-12) again with respect to $x$ yields
$$f''(x) = \sum_{n=2}^{\infty} n(n-1) a_n x^{n-2}, \qquad (12\text{-}13)$$
from which we conclude $f''(0) = 2!\,a_2$. Proceeding systematically in this manner gives the general result
$$f^{(n)}(x) = \sum_{m=n}^{\infty} m(m-1)\cdots(m-n+1)\,a_m x^{m-n}, \qquad (12\text{-}14)$$
so that $f^{(n)}(0) = n!\,a_n$. Thus the coefficients in power series Eqn (12-11) are determined by the formula
$$a_n = \frac{f^{(n)}(0)}{n!} \qquad (12\text{-}15)$$
for $n \ge 1$ and $a_0 = f(0)$. Substituting these coefficients into Eqn (12-11) we finally arrive at the power series
$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots.$$
(12-16)

The expression on the right-hand side of this equation is known as the Maclaurin series for $f(x)$, and it presupposes that $f(x)$ is differentiable an infinite number of times. To justify the use of the equality sign in Eqn (12-16) it is, of course, necessary to test the series for convergence to verify that its radius of convergence $r > 0$, and to show that $|f(x) - S_n(x)| \to 0$ as $n \to \infty$, where $S_n(x)$ is the sum of the first $n$ terms of the Maclaurin series. We shall return to this matter later.

To transform Eqn (12-16) into a power series in $(x - x_0)$ we set $x = x_0 + h$ and let $f(x_0 + h) = \phi(h)$. Then $\phi'(h) = f'(x_0 + h)$, $\phi''(h) = f''(x_0 + h), \ldots$, $\phi^{(n)}(h) = f^{(n)}(x_0 + h), \ldots$. It thus follows that $\phi^{(n)}(0) = f^{(n)}(x_0)$ for $n \ge 1$ and $\phi(0) = f(x_0)$. The Maclaurin series for $\phi(h)$ is
$$\phi(h) = \phi(0) + h\phi'(0) + \frac{h^2}{2!}\phi''(0) + \cdots + \frac{h^n}{n!}\phi^{(n)}(0) + \cdots,$$
or, reverting to the function $f$,
$$f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{(x - x_0)^2}{2!} f''(x_0) + \cdots + \frac{(x - x_0)^n}{n!} f^{(n)}(x_0) + \cdots. \qquad (12\text{-}17)$$
Expressed in this form the expression on the right-hand side is called the Taylor series for $f(x)$ about the point $x = x_0$.

Example 12-13 Find the Maclaurin series for $\log(1 + x)$ and $\log(1 - x)$. Deduce the expansion for $\log[(1 + x)/(1 - x)]$.

Solution Setting $f(x) = \log(1 + x)$ we find
$$f'(x) = \frac{1}{1+x},\quad f''(x) = \frac{-1}{(1+x)^2},\ \ldots,\quad f^{(n)}(x) = \frac{(-1)^{n-1}(n-1)!}{(1+x)^n},$$
and so $f^{(n)}(0) = (-1)^{n-1}(n-1)!$ for $n \ge 1$ and $f(0) = 0$. Combining this expression for $f^{(n)}(0)$ with Eqn (12-16) gives for the Maclaurin series for $\log(1 + x)$,
$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
This has already been examined for convergence in Example 12-10 (b) and found to be absolutely convergent in the interval $-1 < x < 1$. In the case of the function $\log(1 - x)$ the same argument shows that $f^{(n)}(0) = -(n-1)!$ for $n \ge 1$ and $f(0) = 0$, so that the Maclaurin series for $\log(1 - x)$ has the form
$$\log(1 - x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \frac{x^4}{4} - \cdots.$$
This can readily be seen to have $-1 \le x < 1$ for its interval of convergence. Using the fact that $\log\{(1 + x)/(1 - x)\} = \log(1 + x) - \log(1 - x)$ gives the desired result
$$\log\left(\frac{1+x}{1-x}\right) = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \frac{x^7}{7} + \cdots\right)$$
for $-1 < x < 1$.

Strictly speaking, we are not yet entitled to use the equality sign between the function and its Maclaurin series, as we have not yet established the convergence of the $n$th partial sum of the series to the function it represents. We will do this later.

Example 12-14 Use Taylor's series to express the polynomial $P(x) = x^4 + 3x^3 + x^2 + 2x + 1$ in terms of powers of $(x - 1)$.

Solution To utilize the Taylor series in Eqn (12-17) we must set $x_0 = 1$ and $f(x) = P(x)$. Then a simple calculation shows that $P(1) = 8$, $P'(1) = 17$, $P''(1) = 32$, $P'''(1) = 42$, $P^{(iv)}(1) = 24$ and $P^{(n)}(1) = 0$ for $n \ge 5$. Hence we arrive at the finite power series
$$P(x) = 8 + (x-1)\cdot 17 + \frac{(x-1)^2}{2!}\cdot 32 + \frac{(x-1)^3}{3!}\cdot 42 + \frac{(x-1)^4}{4!}\cdot 24,$$
or
$$P(x) = 8 + 17(x - 1) + 16(x - 1)^2 + 7(x - 1)^3 + (x - 1)^4.$$
The use of the equality sign is fully justified here since we are dealing with a finite power series.

It can happen that the derivatives of a function $f(x)$ are not defined at $x = 0$ so that its formal Maclaurin series expansion cannot be obtained. In this case, provided the function is infinitely differentiable at the point $x = x_0$, then $f(x)$ may be expanded in a Taylor series about that point. Such a case is discussed by the following simple example.

Example 12-15 Derive the $n$th derivative $f^{(n)}(x)$ of the function $f(x) = x \log x$, and show that $f^{(n)}(0)$ is not defined. Deduce the Taylor series expansion of $f(x)$ about the point $x = 1$.

Solution Direct differentiation shows that $f^{(1)}(x) = 1 + \log x$, $f^{(2)}(x) = 1/x$, $f^{(3)}(x) = -1/x^2$, $f^{(4)}(x) = 2!/x^3$, $f^{(5)}(x) = -3!/x^4, \ldots$, and in general
$$f^{(n)}(x) = \frac{(-1)^n (n-2)!}{x^{n-1}}$$
for $n \ge 2$. Hence it is clear that $f^{(n)}(0)$ is not defined for any $n$.
However, the numbers $f^{(n)}(1)$ are defined for all $n$ and $f^{(n)}(1) = (-1)^n (n-2)!$ for $n \ge 2$ and $f(1) = 0$, $f^{(1)}(1) = 1$. The Taylor series for $x \log x$ can now be obtained from Eqn (12-17) by making the identification $x_0 = 1$ and then using the derivatives $f^{(n)}(1)$ which have just been computed. We find
$$x \log x = (x - 1) + \frac{(x-1)^2}{1\cdot 2} - \frac{(x-1)^3}{2\cdot 3} + \frac{(x-1)^4}{3\cdot 4} - \frac{(x-1)^5}{4\cdot 5} + \cdots,$$
which is the desired result. Again, we have used the equality sign without first showing that the $n$th partial sum of the Taylor series converges to $x \log x$ as $n \to \infty$.

Regarding this as a power series in the variable $t = (x - 1)$ we find that the coefficient $a_n$ of the power $t^n$ is $a_n = (-1)^n / n(n-1)$, whence the radius of convergence
$$r = \lim_{n\to\infty}\left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty}\frac{n(n+1)}{(n-1)n} = 1.$$
The power series is thus absolutely convergent in the interval $-1 < t < 1$ or, equivalently, in $0 < x < 2$. The series is convergent when $x = 2$, because then it becomes an alternating series. It is also convergent when $x = 0$ by comparison with the series with the general term $b_n = 1/n^2$. In fact we can do better than this when $x = 0$, for then we can actually sum the series. Aside from the first term, which becomes $-1$, the sum of the remaining terms must be $+1$ by virtue of Example 12-9 (a), showing that if the equality sign may be believed, then
$$\lim_{x\to 0}(x \log x) = 0.$$
This is encouraging, because it is in agreement with the result which can be obtained from Theorem 6-4 (b) by replacing $x$ by $1/x$. This would strongly suggest that our series is in fact equal to $x \log x$ in the complete interval of convergence $0 \le x \le 2$.

We have attempted to emphasize that although we have indicated how a Maclaurin or Taylor series may be associated with a function $f(x)$ that is infinitely differentiable, the general question of just exactly when the series is equal to the function with which it is associated still remains open.
To indicate that an infinitely differentiable function need not be represented by its Maclaurin series at more than a single point, despite the fact the series is convergen