The original of this book is in the Cornell University Library. There are no known copyright restrictions in the United States on the use of the text.

http://www.archive.org/details/cu31924001546971

An Elementary Treatise on FREQUENCY CURVES and their Application in the Analysis of Death Curves and Life Tables

by Arne Fisher.

Translated from the Danish by E. A. Vigfusson. With an Introduction by Raymond Pearl, Professor of Biometry and Vital Statistics, Johns Hopkins University, Baltimore.

American Edition. New York. THE MACMILLAN COMPANY. 1922.

Printed by Bianco Luno, Copenhagen.

INTRODUCTION

The fact that actuarial science is fundamentally a branch of biology rather than of mathematics is overlooked far more generally than ought to be the case. Most people, even those of education and wide culture, are inclined to look upon an actuary as a particularly crabbed, narrow, and intellectually dusty kind of mathematician. In reality his subject is one of the liveliest in the whole domain of biology, and none surpasses it in its practical interest and importance to mankind. For what the actuary is, or at least should be, always trying to formulate more and more definitely are the laws which determine the duration of human life. Why the actuary in fact is too often intellectually but little more than a sort of glorified computer is really only the result of a defect in the teaching of biology in our colleges and universities. It has only lately come to be recognized anywhere that a biologist needed a substantial foundation in mathematics in order successfully to practice a biological profession.
It is not too rash a prediction to say that the time is presently coming when no important actuarial post will be held by a mathematician who knows little or no biology. The vigor and originality of his biological outlook will be valued as highly as the rigidity of his mathematical substructure now is.

The thing which chiefly makes this book by my friend Arne Fisher notable lies, in a broad sense, in the fact that it is a highly original and absolutely novel essay in general biology. The language is to a considerable extent mathematical, to be sure, but the subject matter, the mode of logical approach, and the significant conclusion: all these are pure biology. Unfortunately many biologists will not be able to appreciate its significance, or even to read it intelligently. But this is their loss, and at the same time an exposure of the dire poverty of their intellectual equipment for dealing with the problems of their science.

There are two broad features of Fisher's work which want emphasis. The first is the successful construction of a life table from a knowledge of deaths alone. That the construction is successful his results set forth in this book abundantly demonstrate. To have done this is a mathematical and actuarial achievement of the first rank. It may fairly be regarded as fundamentally the most significant advance in actuarial theory since Halley. It opens out wonderful possibilities of research on the laws of mortality, in directions which have hitherto been wholly impossible of attack. The criterion by which the significance of a new technique in any branch of science is evaluated is just this of the degree to which it opens up new fields of research. By this criterion Fisher's work stands in a high and secure position.
But of vastly more significance, considered purely as an intellectual achievement, is his discovery of the fundamental biological law relating the several causes of death to each other, which made the technical accomplishment possible. More than one accepted text book on vital statistics has scornfully instructed its readers that no good whatever could come from any tabulation or study of death ratios; that they must be avoided as the pestilence by any statistician who would be orthodox. But orthodoxy and discovery are as incompatible intellectually as oil and water are physically, a cosmic law often overlooked by our "safe and sane" scientific gentry. This book is an outstanding demonstration that this law is still in operation. Fisher has had the temerity to study the ratios of deaths from one cause or group of causes to those from another group, or to all causes together, and has discovered that there abides a real and hitherto unsuspected lawfulness in these ratios. Here again his pioneer work opens out alluring vistas to the thoughtful biometrician.

Altogether we of America are to be warmly congratulated that this brilliant Danish mathematical biologist has chosen to come and live with us.

Baltimore, November 1921. Raymond Pearl.

AUTHOR'S PREFACE

The classical method of measuring mortality rests essentially upon the fundamental principles first enunciated by the British astronomer, Halley, in his construction of the famous Breslau Life Table. Since the time of Halley this method has been so thoroughly investigated and has been perfected to such an extent that new developments along this line cannot be expected. Any improvements on the original principles of Halley are after all nothing but refinements in graduating methods; and even in this line it appears that the limit of further perfection has been reached.
Halley's method, which is purely empirical in scope and principle, rests primarily upon the knowledge of the number of persons exposed to risk at various ages and the correlated number of deaths among such exposures. In all cases where such information is at hand the old and tried method meets all requirements to our full satisfaction; and it would appear superfluous to try to supplant it with fundamentally different principles.

In presenting the new method outlined in this little book I wish to state most emphatically that it has never been my intention to try to supersede the conventional methods of construction of mortality tables wherever such methods are applicable. My proposed method is only a supplement to the former tools of statisticians and actuaries, and aims to utilize numerous statistical materials to which the older system of Halley is not applicable.

The idea, whether it is new or not, meets in reality a very frequent need in mortality investigations. It is a well known fact that in the determination of certain statistical ratios it is easier to determine the numerator than the denominator, as for instance in life or sickness assurance, where the losses can be ascertained with a very close degree of accuracy, while the number of persons exposed to risk at various ages is often difficult to obtain. Similar remarks hold true in the case of numerous statistical summaries of mortuary records as published in most government reports on vital statistics. The desire to utilize this enormous statistical material was what led me to try the proposed method.

In principle the plan is fundamentally different from that of the empirical method of Halley, inasmuch as I have attempted to substitute the inductive principle for that of pure empiricism.
In the first place, I consider the $d_x$ curve, or the number of deaths by attained ages among the survivors of an original cohort of say 1,000,000 entrants at age 10, as being generated as a compound curve of a limited number (say 8 or less) of subsidiary component curves of either the Laplacean-Charlier or Poisson-Charlier type.

The method of induction now consists in determining the constants or parameters of these subsidiary curves. These parameters fall into two separate categories:

A. The statistical characteristics or semi-invariants which determine the relative frequency distribution by attained age at death, as expressed by the mean, the dispersion, the skewness and the excess of each subsidiary or component curve.

B. The areas of each subsidiary or component curve.

The working hypothesis which I have put forward is that the relative frequency distributions of deaths by attained ages, classified according to a limited number of groups (generally 8 or less) of causes of death among the survivors of the original cohort of entrants, tend to cluster around certain ages in such a way that it is possible from biological considerations to estimate in practice, with a sufficiently close degree of approximation, the statistical characteristics or semi-invariants of the relative frequency distributions of the component curves, corresponding to a previously chosen classification of causes of death (into 8 or less subsidiary groups). This implies briefly that I suppose it is possible from biological considerations to select a priori the statistical characteristics of the category mentioned above under A.
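The idea of a compound curve can be illustrated by a minimal sketch under loudly stated assumptions. Each component is taken here to be a plain Laplacean (normal) curve, so that skewness and excess are ignored, and the three sets of mean, dispersion and area are invented for illustration only; they are not values from this treatise. The sketch merely adds the component curves age by age to obtain a compound $d_x$ curve.

```python
import math


def normal_component(x, mean, dispersion, area):
    """Deaths at age x from one component: its area times a normal density.

    Assumption: skewness and excess are ignored (plain normal curve).
    """
    z = (x - mean) / dispersion
    density = math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))
    return area * density


# Hypothetical components (mean age at death, dispersion, area).
# These figures are invented for illustration, not taken from the book.
components = [
    (30.0, 8.0, 150_000.0),
    (55.0, 10.0, 350_000.0),
    (72.0, 9.0, 500_000.0),
]

ages = range(10, 101)

# Compound d_x curve: the sum of the component curves at each age.
d = {x: sum(normal_component(x, m, s, a) for m, s, a in components) for x in ages}

# Total deaths between ages 10 and 100; slightly below the sum of the areas
# because the tails of the normal curves extend beyond this age range.
total_deaths = sum(d.values())
```

The areas are chosen to add up to 1,000,000, the size of the cohort mentioned above, so `total_deaths` falls just short of that figure.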
Once this hypothesis is accepted as a true supposition, the areas of each of the component curves can be determined by purely deductive methods (as for instance the method of least squares) from the observed values of the proportionate death ratios $R_B(x)$ ($x$ = 10, 11, 12, ..., 100; $B$ = I, II, III, ...) corresponding to the groups of causes of death.

Thus the parameters as determined in this manner exhaust the given statistical material, i.e. the observed proportionate death ratios $R_B(x)$. A mere addition of the subsidiary or component curves gives us then the compound $d_x$ curve, from which it is an easy task to find the functions $l_x$ and $q_x$.

The scheme as we have briefly outlined it above is, therefore, not a cut-and-dried doctrine or a sort of "mathematical alchemy" as some of my critics have implied. Nor is it an authoritative or infallible dogma. The keystone upon which its success depends is merely a working hypothesis, i.e. a temporary or preliminary supposition. I suppose something to be true and try to ascertain whether, in the light of that supposed truth, certain facts fit together better than they do with any other supposition hitherto tried. The validity of the working hypothesis must, in my opinion, be proved or disproved either by independent methods and principles of construction of mortality tables, such as for instance the empirical principle of Halley, hitherto exclusively used by the actuaries, or through additional biological studies.¹

¹ The biological basis of Mr. Fisher's working hypothesis, which is of far greater importance than the purely ancillary mathematical deduction, has apparently been overlooked by many of his American critics, such as Little, Thompson and Carver. Dr. Carver in the Proceedings of the Casualty Actuarial Society of America (Vol. VI, page 357) remarks that "if we can construct a table from death alone as in Proc. Vol. IV, and by dividing these deaths by $q_x$, determine the unenumerated population, why not the converse?" The answer to this remark is obvious. In the case of mortuary records, Fisher considered two different and distinct attributes, namely 1) the purely quantitative attribute of attained age at death, and 2) the purely biological attribute of cause of death, which in conjunction with the working hypothesis to a certain extent aims to replace the unknown exposures. If we were to follow Dr. Carver's facetious suggestion and, to use his phrase, "go the proposed plan one better by using enumerated populations only", we should, however, encounter a statistical series with the single attribute of attained age only, but no second attribute corresponding to that of the biological factor of the cause of death. Criticism of the sort of Dr. Carver's brings to light the fundamentally different principles applied by Mr. Fisher in sharp contradistinction to the purely empirical methods of the orthodox actuary and statistician. (Translator.)

In the meantime I feel justified in presenting to my readers the practical results obtained by this method, which, although perhaps not unimpeachable in respect to mathematical rigour, nevertheless in my opinion offers a means to attack a vast bulk of collected statistical data against which our former actuarial tools proved useless. The celebrated Russian mathematician Tchebycheff once made a remark to the effect that in the antique past the Gods proposed certain problems to be solved by man, later on the problems were presented by half-gods and great men, while now dire necessity forces us to seek some solution to numerous practical problems connected with our daily conduct. The problem towards which I have made an attempt to offer a sort of solution in the present little essay is one of these numerous problems of dire necessity mentioned by Tchebycheff, and I hope that my work along this line, imperfect as it is, may nevertheless prove a beginning towards more improved methods in the same direction.

In conclusion I wish to extend my thanks to a number of friends and colleagues in America, Europe and Japan who have kept on encouraging me in my work along these lines in spite of much adverse criticism from certain statistical and actuarial circles. I wish in this connection to thank Mr. F. L. Hoffman, Statistician of the Prudential Insurance Company, for permitting me to apply the method to various collections of mortuary records while working as a computer in his department. My thanks are also due to Mr. E. A. Vigfusson for making the translation from my rough Danish notes. If the resulting English is perhaps open to criticism, I beg to remind the reader that my original manuscript was written in Danish and translated into English by an Icelander, while the composition and proof reading was done by a Copenhagen firm.

To Professor Glover of the University of Michigan I also wish to extend my thanks for inviting me to deliver a series of lectures on the construction of mortality tables before his classes in actuarial methods during the month of March 1919. This invitation afforded me the first opportunity to bring the proposed method before a professional body of statistical readers.

Last but not least I desire to acknowledge my obligations to Professor Pearl, whose introductory note I consider the strongest part of the book. In these departments of knowledge the appreciation of one's peers is after all the only real reward one can possibly expect. The fact that this eminent biologist has recognized that the nucleus of the whole problem is of a purely biological nature, and that the mathematical analysis is merely ancillary, is particularly pleasing to me, because it represents my own view in this particular matter. p.
t. Newark, U. S. A., November 1921. Arne Fisher.

TRANSLATOR'S PREFACE

During the spring of 1919 the attention of the present writer was called to a brief paper entitled Note on the Construction of Mortality Tables by means of Compound Frequency Curves by the Danish statistician, Mr. Arne Fisher. The novelty and originality of this paper impressed me to such an extent that I became desirous of obtaining more detailed information about the process than that which necessarily was contained in the above summary note, originally printed in the Proceedings of the Casualty Actuarial Society of America. I wrote therefore to Mr. Fisher and inquired whether he intended to publish any further studies on this subject. From his reply I learned that he had delivered a series of lectures on this very topic before Professor Glover's insurance classes at the University of Michigan during the month of March 1919, but that the proposed method had been met with such captious opposition in certain actuarial circles that he had decided to abandon the plan of publishing anything further on the subject and had even destroyed the English notes prepared for the Michigan lectures.

In the meantime the proposed scheme had received considerable attention in actuarial circles in Europe and Japan, and several highly commendatory reviews had appeared in the English and Continental insurance periodicals and various scientific journals, notably the Journal of the Royal Statistical Society and the Bulletin de l'Association des Actuaires Suisses.

The proposed method seemed indeed so novel and unique that I could not help feeling that it deserved a better fate than that of being forgotten. I suggested therefore to Mr. Fisher that he prepare a new manuscript. But unfortunately his time did not allow this.
He consented, however, to turn over to me his original Danish notes on the subject, from which he had prepared his Michigan lectures, and permitted me to make an English translation for the Scandinavian Insurance Magazine. I gladly availed myself of this opportunity to bring this fundamental work before an international body of readers and started on the translation in the summer of 1919.

At the same time Mr. Fisher decided to put the proposed method and working hypothesis to a very severe test, which would meet even the most stringent requirements of some of his critics and their contention that the method would fail in the case of a rapidly changing population group. For this purpose he selected a series of statistical data contained in the annual reports and statements of a number of the leading Japanese Life Assurance Offices, relating to their mortuary records for the four-year period from 1914 to 1917. More than 35,000 records of male lives, arranged according to the Japanese list of causes of death and grouped in quinquennial age intervals, formed the basis for the construction of the final life table, which was completed in November 1919. This table, which like Mr. Fisher's other tables was derived without any information of the number of lives exposed to risk at various ages, is shown in the addenda of this treatise.

Immediately after its construction Mr. Fisher sent this table to the well known Japanese actuary, Mr. T. Yano, and asked him for an opinion regarding the trustworthiness of the final death rates $q_x$ as derived by his new method. The Japanese actuary's answer arrived in April 1920. Mr. Yano had, after the receipt of Mr. Fisher's letter, ascertained the exposures and deaths among male lives at each separate age for about 40 Japanese life offices during the period 1914-1917 and constructed by means of the conventional methods a complete series of $q_x$ by integral ages from age 10 to 90.
These ungraduated data are shown as a broken line polygon in the appended diagram (Figure 1). In spite of the fact that Mr. Fisher had no information whatever about the exposed to risk, the agreement of the continuous curve of $q_x$ as determined by the frequency curve method with Mr. Yano's ungraduated data is so close that I think further comments superfluous. The slight differences in younger ages might indeed arise from the fact that Mr. Yano had access to all the experience (containing more than 45,000 deaths) of all the Japanese companies, whereas Fisher only used the mortuary records as published by some of the leading Japanese companies.

Like all scientific methods of induction Mr. Fisher's proposed plan rests upon a working hypothesis, namely that it is possible from biological considerations to group the deaths among the survivors at various ages in any mortality table according to causes in such a manner that their percentage or relative frequency distribution according to attained age at death will conform to a previously selected system or family of Laplacean-Charlier or Poisson-Charlier frequency curves.

[Fig. 1. Mr. Yano's ungraduated values of $q_x$ (broken line polygon) and the continuous curve of $q_x$ determined by the frequency curve method.]

Mr. Fisher himself is very frank in stating that this is a working hypothesis upon which hinges the success of the whole method. One of the main objections of his critics is that it seems impossible to prove the truth of this working hypothesis. Naturally its truth cannot be proved by mathematics or logic any more than we can prove or disprove the existence of Euclidean space, which in itself constitutes a working hypothesis for most of our applied mathematics. Mr.
Fisher's critics might as well be asked to prove or disprove Newton's hypothetical laws of motion and attraction as extended by Maxwell and Hertz, or the newer hypothesis recently put forward by the relativists, or the Lorentz hypothesis of contraction. It would indeed be a terrific blow to science and the extension of knowledge if it were required that no working hypothesis be allowed in scientific work unless such hypothesis could be proved to be true. What position would biology occupy to-day if biologists had insisted that Darwin's great hypothesis be proved before it could be allowed as a foundation in the study of evolution?

The most convincing answer to Mr. Fisher's captious critics among the old school of actuaries and statisticians is, however, the undisputed fact that his working hypothesis as such really does work. As pointed out by Dr. Pearl in the introductory note of this book, the results set forth in the present treatise abundantly demonstrate this fact. The six widely different mortality tables shown in the addenda stand as mute and yet most eloquent evidence of the fact that the method works. It might indeed not appear impertinent to suggest that Mr. Fisher's actuarial critics would render a greater service to their profession by proving that these six mortality tables cannot be considered as reasonable approximations to tables derived by orthodox means from the same population groups than by starting to pooh-pooh and ridicule his proposed method.

Winnipeg, Canada, November 1921. E. A. Vigfusson.

"Nothing is less warranted in science than an uninquiring and unhoping spirit. In matters of this kind, those who despair are almost invariably those who have never tried to succeed." W. Stanley Jevons.

CHAPTER I (TRANSLATED BY MISS DICKSON)

AN INTRODUCTION TO THE THEORY OF FREQUENCY CURVES

1. INTRODUCTION

The following method of constructing mortality tables from mortuary records by sex, age and cause of death rests essentially upon the theory of frequency curves originally introduced by the great Laplace and of recent years further developed and extended through the elegant and far reaching researches of the Scandinavian school of statisticians under the leadership of Gram, Charlier and Thiele and their disciples. This method is, however, comparatively little known and unfortunately not always fully appreciated by the majority of English statisticians and actuaries, who prefer to apply the well known methods of the eminent English biometrician, Karl Pearson. For this reason it may be advisable to give a preliminary sketch of Charlier's methods so as to obtain a better understanding of the following chapters dealing with the more specific problem of mortality tables. The treatment must necessarily be brief and represents essentially an outline of the more detailed theory which I hope to present in my forthcoming second volume of the Mathematical Theory of Probabilities.

By the method of Charlier any frequency function is expressed as an infinite series rather than as a closed and compact algebraic or transcendental expression, as in the Pearsonian methods. By infinite series most students understand the famous power series which bear the names of Taylor and Maclaurin. In these series the function is derived as an infinite series of ascending powers of the independent variable, whose coefficients are expressed by means of the correlated successive derivatives of the function for specific values of $x$. Thus for instance we know that the Maclaurin series may be written as follows:

$$ f(x) = f(0) + \frac{x}{1!} f'(0) + \frac{x^2}{2!} f''(0) + \ldots + \frac{x^n}{n!} f^{(n)}(0) + \ldots $$

where $f^{(n)}(0)$ is the symbol for the value of the $n$th derivative when $x = 0$ and $n = 1, 2, 3, 4, \ldots$
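As a small illustration of the Maclaurin series just written down, the following sketch sums its first terms for a function whose derivatives at zero are known. The choice of $f(x) = e^x$, the number of terms and the function name are assumptions made for this example only.

```python
import math


def maclaurin_partial_sum(derivatives_at_zero, x):
    """Sum f(0) + x f'(0)/1! + x^2 f''(0)/2! + ... over the supplied values.

    derivatives_at_zero[n] is the value of the n-th derivative at x = 0.
    """
    return sum(
        d * x**n / math.factorial(n)
        for n, d in enumerate(derivatives_at_zero)
    )


# For f(x) = e^x every derivative at x = 0 equals 1.
terms = 15
approximation = maclaurin_partial_sum([1.0] * terms, 1.0)

# Difference between the partial sum and the true value of e.
error = abs(approximation - math.e)
```

For this particular function the partial sums converge quickly; as the text goes on to remark, comparatively few functions behave so conveniently.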
There are, however, contrary to the belief of many immature students, only comparatively few functions which allow a rigorous expansion by this method, in which the derived functions and the differential calculus play the leading roles. But on the other hand there are other methods of expansion in infinite series which are more general and by which the coefficients of the independent variable are expressed by operations other than those of differentiation. One of these methods is to express the coefficients as definite integrals either of the unknown function itself or of some auxiliary function. The range of practical problems which lay themselves open to a successful attack along those lines is much wider than the corresponding range of practical problems to which we may apply the Taylor series.

Speaking generally as a layman (who continuously has to face practical rather than abstract problems) and specifically as a mathematical novice (who considers mathematics as a means rather than as an end), this fact appears to me quite obvious from a purely philosophical point of view. In nature and in all practical observations we encounter finite and not infinitesimal quantities. In other words, what we actually observe are finite sums or definite integrals, i. e. the limit of a sum of infinitely small component parts. The definite integral rather than the derivative and the differential seems, therefore, to be the more elementary and primitive operation and the one which suggests itself first hand. The history of mathematics indeed bears out this contention. Archimedes had (as shown by the researches of the Danish scholar, Heiberg) laid the essential foundation for an integral calculus about 250 B. C. And nearly 22 centuries later, almost simultaneously with the historical discovery of Heiberg, another Scandinavian, the Swedish mathematician and actuary, Fredholm, gave to the world his epoch-making work on integral equations. Fredholm's monumental memoir "Sur une nouvelle méthode pour la résolution du problème de Dirichlet" was first published in the "Öfversigt af Akademiens Förhandlingar" (Stockholm 1900). Measured by time the subject of integral equations is thus a mere infant in the history of mathematical discoveries. Measured by its importance it has already become a classic. Its application to a steadily increasing number of essentially practical problems in almost every branch of science has placed it in a central position of modern mathematical research, and it bids fair to become the most important branch of mathematics.

Fredholm in introducing his now famous infinite determinants, known as the Fredholmean determinants, had a forerunner in the Danish actuary, Gram, whose Doctor's dissertation "Om Rækkeudviklinger ved de mindste Kvadraters Metode" (Copenhagen 1879) gave prominence to a certain class of functions which later on have become known as orthogonal functions, and by which Gram actually gave the first expansion of a frequency distribution or frequency curve in an infinite series. Scandinavians in general and Scandinavian actuaries in particular may, therefore, feel proud of their share in imparting knowledge on this important subject, which makes a strong bid to place mathematics on a higher plane than ever before, not alone as an abstract but equally well as an applied science. The genius of the Italian renaissance, Leonardo da Vinci, as early as 1479 proclaimed "that no part of human knowledge could lay claim to the title of science before it had passed through the stage of mathematical demonstration".
Comparatively few branches of learning measure up to the standard of Leonardo da Vinci, and our learned friends among the economists and sociologists have a long road to travel before they succeed in placing their methods in the coveted niche of science. But the new vistas of possibilities opened up to them by means of M. Fredholm's discovery ought to furnish them a powerful tool towards the attainment of the high standard set by the great Italian. The principal theorems of integral equations are bound to be especially fruitful in their application to mathematical statistics and the problems of frequency curves and frequency surfaces together with the associated problems of mathematical correlation.

2. FREQUENCY DISTRIBUTIONS AND FUNCTIONS

If $N$ successive observations originating from the same essential circumstances or the same source of causes are made in respect to a certain statistical variate, $x$, and if the individual observations $o_i$ ($i = 1, 2, 3, \ldots, N$) are permuted in an ascending order, then this particular permutation is said to form a frequency distribution of $x$ and is denoted by the symbol $F(x)$.

The relative frequencies of this specific permutation, that is the ratio which each absolute frequency or group of frequencies bears to the total number of observations, form what is called a relative frequency function or probability function, denoted by the symbol $\varphi(x)$.

If the statistical variate is continuous or a graduated variate, such as heights of soldiers, ages at death of assured lives, physical and astronomical precision measurements, etc., then $\varphi(z)\,dz$ is the probability that the variate $x$ satisfies the following relation

$$ z - \tfrac{1}{2}\,dz < x < z + \tfrac{1}{2}\,dz $$

or that $x$ falls between the above limits.
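The two definitions above can be sketched for a small set of observations. The sketch assumes, for simplicity, a variate taking whole-number values, and the ten observations are invented for illustration; the names `F` and `phi` simply mirror the symbols $F$ and $\varphi$.

```python
from collections import Counter

# Hypothetical observations of a whole-number variate, in the order recorded.
observations = [5, 3, 4, 5, 6, 4, 5, 5, 4, 7]
N = len(observations)

# Arrange the observations in ascending order and count them:
# the absolute frequencies F(z).
F = dict(sorted(Counter(observations).items()))

# Relative frequencies: each absolute frequency divided by the total number N.
phi = {z: count / N for z, count in F.items()}

# The relative frequencies of all observed values add up to unity.
total_relative_frequency = sum(phi.values())
```

Each relative frequency is the absolute frequency divided by $N$, so that the absolute frequency can always be recovered as $N$ times the relative frequency.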
If the statistical variate assumes integral (discrete) values only, such as the number of alpha particles radiated from certain metals and radio-active gases as polonium and helium, the number of fin rays in fishes, or the number of petals in flowers, then $\varphi(z)$ is the probability that $x$ assumes the value $z$.

From the above definitions it follows a fortiori that

(a) $F(z) = N\varphi(z)$ (Integral variates)

(b) $F(z)\,dz = N\varphi(z)\,dz$ (Graduated variates)

Interpreting the above results graphically we find that (a) will be represented by a series of disconnected or discrete points while (b) will be represented by a continuous curve.

As to the function $\varphi(z)$ we make for the present no other assumptions than those following immediately from the customary definition of a mathematical probability. That is to say, the function $\varphi(z)$ must be real and positive. Moreover it must also satisfy the relation

$$ \int_{-\infty}^{+\infty} \varphi(z)\,dz = 1, $$

or in the case of discrete variates:

$$ \sum \varphi(z) = 1, $$

which is but the mathematical way of expressing the simple hypothetical disjunctive judgment that the variate is sure to assume some one or several values in the interval from $-\infty$ to $+\infty$. The zero point is arbitrarily chosen and need not coincide with the natural zero of the number scale. Thus for instance if we in the case of height of recruits choose the zero point of the frequency curve at 170 centimeters, an observation of 180 centimeters would be recorded as +10 and an observation of 160 centimeters as -10.

3. PROPERTY OF CONSTANTS OR PARAMETERS

In regard to a frequency function we may assume a priori that it will depend only upon the variate $x$ and certain mathematical relations into which this variate enters with a number of constants $\lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots$, symbolically expressed by the notation

$$ F(x, \lambda_1, \lambda_2, \lambda_3, \lambda_4, \ldots) $$

where the $\lambda$'s are the constants and $x$ the variate.
All these constants or parameters are naturally independent of $x$ and represent some peculiar properties or characteristic essentials of the frequency function as expressed in the original observations $o_i$ ($i = 1, 2, 3, \ldots, N$). We may, therefore, say that each constant or statistical parameter entering into the final mathematical form for the frequency function is a function of the observations $o_i$. This fact may be expressed in the following symbolic form:

$$
\begin{aligned}
\lambda_1 &= S_1(o_1, o_2, o_3, \ldots, o_N) \\
&\;\;\vdots \\
\lambda_N &= S_N(o_1, o_2, o_3, \ldots, o_N).
\end{aligned}
$$

But from purely a priori considerations we are able to tell something else about the functions $S_i$ ($i = 1, 2, 3, \ldots, N$). It is only when permuting the various $o$'s in an ascending magnitude according to the natural number scale that we obtain a frequency function. This arrangement itself has, however, no influence upon any one of the $o$'s, which were generated before this purely arbitrary permutation took place. The ultimate and previously measured effects of the causes as reflected in each individual numerical observation, $o_i$, depend only upon the origin of causes which form the fundamental basis for the statistical object under investigation and do not depend upon the order in which the individual $o$'s occur in the series of observations. Suppose for instance that the observations occurred in the following order

$$ o_1, o_2, o_3, \ldots, o_N. $$

By permuting these elements in their natural order we obtain the frequency distribution $F(x)$. But the very same distribution could have been obtained if the observations had occurred in any other order, as for instance

$$ o_7, o_9, o_N, \ldots, o_3, \ldots, o_1, $$

so long as all of the individual $o$'s were retained in the original records. Or to take a concrete example, consider the study of the number of policyholders according to attained ages in a life assurance office. We write the age of each individual policyholder on a small card.
When all the ages have been written on individual cards they may be permuted according to attained age and the resulting series is a frequency function of the age x. We may now mix these cards just as we mix ordinary playing cards in a game of whist, and we get another permutation, in general different from the order in which we originally recorded the ages on the cards. But this new permutation can equally well be used to produce the frequency function if we are only sure to retain all the cards and do not add any new cards.

4. Parameters as Symmetric Functions.

The various functions $S_i(o_1, o_2, o_3, \ldots, o_N)$ are, therefore, symmetric functions, that is functions which are left unaltered by arbitrarily permuting the N elements o, and no interchange whatever of the values of the various o's in those symmetric functions can have any influence upon the final form of the frequency function or frequency curve, F(x).

We now introduce under the name of power sums a certain well known form of fundamental symmetrical functions defined by the following relations

$$s_0 = o_1^0 + o_2^0 + o_3^0 + \cdots + o_N^0 = N$$
$$s_1 = o_1 + o_2 + o_3 + \cdots + o_N = \sum o_i$$
$$s_2 = o_1^2 + o_2^2 + o_3^2 + \cdots + o_N^2 = \sum o_i^2$$
$$\vdots$$
$$s_N = o_1^N + o_2^N + o_3^N + \cdots + o_N^N = \sum o_i^N$$

Moreover, a well known theorem in elementary algebra tells us that every symmetric function may be expressed as a function of $s_1, s_2, s_3, \ldots, s_N$. From this theorem it follows a fortiori that we are able to express the constants $\lambda$ in the frequency curve as functions of the power sums of the observations. While such a procedure is possible, theoretically at least, we should, however, in most cases find it a very tedious and laborious task in actual practice.
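The card-shuffling argument can be tried out on a small scale. The sketch below is only an illustration: the list of ages is invented for the purpose and the helper name `power_sums` is our own. It computes the power sums of a set of observations before and after mixing the cards and shows that they are unaltered.

```python
import random

def power_sums(observations, max_order):
    """Return [s_0, s_1, ..., s_max_order], where s_r is the sum of o_i**r."""
    return [sum(o ** r for o in observations) for r in range(max_order + 1)]

# Hypothetical attained ages written on the cards (illustrative data only).
ages = [34, 41, 41, 52, 29, 60, 47, 41, 38, 52]

shuffled = ages[:]                    # copy, so the recorded order is kept
random.Random(0).shuffle(shuffled)    # mix the cards

original_sums = power_sums(ages, 4)
shuffled_sums = power_sums(shuffled, 4)

print(original_sums == shuffled_sums)  # the power sums ignore the order
```

Since the observations are whole numbers the comparison is exact. Adding or removing a card, on the other hand, changes at least $s_0$.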
It, therefore, remains to be seen whether it is possible to transform these symmetrical functions of the power sums of the observations into some other symmetric functions, which are more flexible and workable in practical computations and which can be expressed in terms of the various values of s.

5. Thiele's Semi-Invariants.

It is the great achievement of Thiele to have been the first mathematician to realize this possibility and make this transformation by introducing into the theory of frequency curves a peculiar system of symmetrical functions which he called semi-invariants and denoted by the symbols $\lambda_1, \lambda_2, \lambda_3, \ldots$ Starting with power sums, $s_i$, Thiele defines these by the following identity, which is identical in respect to $\omega$:

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots} = s_0+\frac{s_1}{1!}\omega+\frac{s_2}{2!}\omega^2+\frac{s_3}{3!}\omega^3+\cdots \tag{1}$$

Since $s_r = \sum o_i^r$ the right hand side of the equation may also be written as

$$e^{o_1\omega}+e^{o_2\omega}+e^{o_3\omega}+\cdots = \sum e^{o_i\omega}.$$

Differentiating (1) with respect to $\omega$ we have

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\cdots}\left(\lambda_1+\frac{\lambda_2\omega}{1!}+\frac{\lambda_3\omega^2}{2!}+\cdots\right) = \left(s_0+\frac{s_1}{1!}\omega+\frac{s_2}{2!}\omega^2+\cdots\right)\left(\lambda_1+\frac{\lambda_2\omega}{1!}+\frac{\lambda_3\omega^2}{2!}+\cdots\right) = s_1+\frac{s_2}{1!}\omega+\frac{s_3}{2!}\omega^2+\cdots$$

Multiplying out and equating the various coefficients of equal powers of $\omega$ we finally have

$$s_1 = \lambda_1 s_0$$
$$s_2 = \lambda_1 s_1 + \lambda_2 s_0$$
$$s_3 = \lambda_1 s_2 + 2\lambda_2 s_1 + \lambda_3 s_0$$
$$s_4 = \lambda_1 s_3 + 3\lambda_2 s_2 + 3\lambda_3 s_1 + \lambda_4 s_0$$

where the coefficients follow the law of the binomial theorem. Solving for $\lambda$ we have

$$\lambda_1 = s_1 : s_0$$
$$\lambda_2 = (s_2 s_0 - s_1^2) : s_0^2$$
$$\lambda_3 = (s_3 s_0^2 - 3 s_2 s_1 s_0 + 2 s_1^3) : s_0^3$$
$$\lambda_4 = (s_4 s_0^3 - 4 s_3 s_1 s_0^2 - 3 s_2^2 s_0^2 + 12 s_2 s_1^2 s_0 - 6 s_1^4) : s_0^4$$

The semi-invariants $\lambda$ in respect to an arbitrary origin and unit are as we noted defined by the relation

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots} = e^{o_1\omega}+e^{o_2\omega}+e^{o_3\omega}+\cdots$$

where $o_1, o_2, o_3, \ldots$ are the individual observations. Let us now change to another coordinate system with another unit and origin defined by the following linear transformations:

$$o'_i = a\,o_i + c \qquad (i = 1, 2, 3, \ldots).$$
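Before the change of origin and unit is carried through, the expressions just given for $\lambda_1$ to $\lambda_4$ in terms of the power sums may be tried on a few numbers. The following is a minimal sketch under our own assumptions: the function name `semi_invariants` and the sample observations are invented for illustration. It also verifies the four relations with binomial coefficients from which the expressions were solved.

```python
def semi_invariants(observations):
    """Return the power sums s_0..s_4 and the semi-invariants lambda_1..lambda_4."""
    s = [sum(o ** r for o in observations) for r in range(5)]
    s0, s1, s2, s3, s4 = s
    lam1 = s1 / s0
    lam2 = (s2 * s0 - s1 ** 2) / s0 ** 2
    lam3 = (s3 * s0 ** 2 - 3 * s2 * s1 * s0 + 2 * s1 ** 3) / s0 ** 3
    lam4 = (s4 * s0 ** 3 - 4 * s3 * s1 * s0 ** 2 - 3 * s2 ** 2 * s0 ** 2
            + 12 * s2 * s1 ** 2 * s0 - 6 * s1 ** 4) / s0 ** 4
    return s, (lam1, lam2, lam3, lam4)

# Hypothetical observations (illustrative data only).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 6]
(s0, s1, s2, s3, s4), (lam1, lam2, lam3, lam4) = semi_invariants(sample)

# The relations obtained by equating coefficients of equal powers of omega.
tolerance = 1e-9
relations_hold = (
    abs(s1 - lam1 * s0) < tolerance
    and abs(s2 - (lam1 * s1 + lam2 * s0)) < tolerance
    and abs(s3 - (lam1 * s2 + 2 * lam2 * s1 + lam3 * s0)) < tolerance
    and abs(s4 - (lam1 * s3 + 3 * lam2 * s2 + 3 * lam3 * s1 + lam4 * s0)) < tolerance
)
print(lam1, lam2, lam3, lam4, relations_hold)
```

Here $\lambda_1$ is the mean of the observations and $\lambda_2$ the mean square deviation from it, which gives a quick sanity check on any such routine.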
The semi-invariants in this new system are given by the relation

$$s_0\,e^{\frac{\lambda'_1\omega}{1!}+\frac{\lambda'_2\omega^2}{2!}+\frac{\lambda'_3\omega^3}{3!}+\cdots} = e^{o'_1\omega}+e^{o'_2\omega}+e^{o'_3\omega}+\cdots = e^{(a o_1+c)\omega}+e^{(a o_2+c)\omega}+\cdots$$

Since the various values of $\lambda'$ do not depend upon the quantity $\omega$ we may without changing the value of the semi-invariants replace $\omega$ by $\omega : a$ in the above equations, which gives

$$s_0\,e^{\frac{\lambda'_1\omega}{a\,1!}+\frac{\lambda'_2\omega^2}{a^2\,2!}+\frac{\lambda'_3\omega^3}{a^3\,3!}+\cdots} = e^{(a o_1+c)\frac{\omega}{a}}+e^{(a o_2+c)\frac{\omega}{a}}+e^{(a o_3+c)\frac{\omega}{a}}+\cdots = e^{\frac{c\omega}{a}}\left[e^{o_1\omega}+e^{o_2\omega}+e^{o_3\omega}+\cdots\right] = s_0\,e^{\frac{c\omega}{a}}\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots}$$

Taking the logarithms on both sides of the equation we have

$$\frac{\lambda'_1\omega}{a\,1!}+\frac{\lambda'_2\omega^2}{a^2\,2!}+\frac{\lambda'_3\omega^3}{a^3\,3!}+\cdots = \frac{c\omega}{a}+\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots$$

Differentiating successively with respect to $\omega$ we have

$$\frac{\lambda'_1}{a}+\frac{\lambda'_2\omega}{a^2}+\frac{\lambda'_3\omega^2}{2!\,a^3}+\cdots = \frac{c}{a}+\lambda_1+\lambda_2\omega+\frac{\lambda_3\omega^2}{2!}+\cdots$$
$$\frac{\lambda'_2}{a^2}+\frac{\lambda'_3\omega}{a^3}+\cdots = \lambda_2+\lambda_3\omega+\cdots$$
$$\frac{\lambda'_3}{a^3}+\cdots = \lambda_3+\cdots$$

Letting $\omega = 0$ we therefore have

$$\frac{\lambda'_1}{a} = \lambda_1+\frac{c}{a} \quad\text{or}\quad \lambda'_1 = a\lambda_1 + c$$
$$\frac{\lambda'_2}{a^2} = \lambda_2 \quad\text{or}\quad \lambda'_2 = a^2\lambda_2$$
$$\frac{\lambda'_3}{a^3} = \lambda_3 \quad\text{or}\quad \lambda'_3 = a^3\lambda_3$$

from which we deduce the following relations

$$\lambda_1(ax+c) = a\lambda_1(x)+c$$
$$\lambda_r(ax+c) = a^r\lambda_r(x) \quad\text{for } r > 1,$$

which shows how the semi-invariants change by introducing a new origin and a new unit.

We shall for the present leave the semi-invariants and only ask the reader to bear in mind the above relations between $\lambda$ and s, of which we shall later on make use in determining the constants in the frequency curve $\varphi(x)$.

6. The Fourier Integrals.

Before discussing the generation of the total frequency curve it will, however, be necessary to demonstrate some auxiliary mathematical formulae from the theory of definite integrals and integral equations which will be of use in the following discussion as mathematical tools with which to attack the collected statistical data or the numerical observations. One of these tools is found in the celebrated integral theorem by Fourier, which was the first integral equation to be successfully treated.
We shall in the following demonstration adhere to the elegant and simple solution by M. Charlier. Charlier in his proof supposes that a function, $F(\omega)$, is defined through the following convergent series

$$F(\omega) = \alpha\left[f(0)+f(\alpha)e^{\alpha\omega i}+f(2\alpha)e^{2\alpha\omega i}+\cdots+f(-\alpha)e^{-\alpha\omega i}+f(-2\alpha)e^{-2\alpha\omega i}+\cdots\right]$$

or

$$F(\omega) = \alpha\sum_{m=-\infty}^{m=+\infty} f(\alpha m)\,e^{\alpha m\omega i} \tag{2}$$

where $i = \sqrt{-1}$. We then see by the well known theorem of Cauchy that the integral

$$\mathfrak{F}(\omega) = \int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \tag{3}$$

is finite and convergent. If we now let $m\alpha = x$ and let $\alpha = 0$ as a limiting value, $\alpha$ becomes equal to dx and $f(\alpha m) = f(x)$. Consequently we may write

$$\lim_{\alpha = 0} F(\omega) = \mathfrak{F}(\omega).$$

Multiplying (2) by $e^{-r\alpha\omega i}\,d\omega$ and integrating between the limits $-\pi/\alpha$ and $+\pi/\alpha$ we get on the left an expression of the form

$$\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega$$

and on the right a sum of definite integrals of which, however, all but the term containing $f(r\alpha)$ as a factor will vanish. This particular term reduces to

$$\alpha f(r\alpha)\int_{-\pi/\alpha}^{+\pi/\alpha} d\omega \quad\text{or}\quad 2\pi f(r\alpha).$$

Hence we have

$$f(r\alpha) = \frac{1}{2\pi}\int_{-\pi/\alpha}^{+\pi/\alpha} F(\omega)\,e^{-r\alpha\omega i}\,d\omega. \tag{4a}$$

By letting $\alpha$ converge toward zero and by the substitution $r\alpha = x$ this equation reduces to

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\mathfrak{F}(\omega)\,e^{-x\omega i}\,d\omega. \tag{4b}$$

Charlier has suggested the name conjugated Fourier function of f(x) for the expression $\mathfrak{F}(\omega)$. We then have, if we introduce a new function $\psi(\omega)$ defined by the simple relation $\sqrt{2\pi}\,\psi(\omega) = \lim_{\alpha=0} F(\omega)$:

$$\psi(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} f(x)\,e^{x\omega i}\,dx \tag{5a}$$

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}\psi(\omega)\,e^{-x\omega i}\,d\omega. \tag{5b}$$

The equations (5a) and (5b) are known as integral equations of the first kind. The expression $e^{x\omega i}$ (or $e^{-x\omega i}$) is known as the nucleus of the equation. If in (5b) we know the value of $\psi(\omega)$ we are able to determine f(x). Inversely, if we know f(x) we may find $\psi(\omega)$ from (5a).

7. The Frequency Curve as the Solution of an Integral Equation.

We are now in a position to make use of the semi-invariants of Thiele, which hitherto in our discussion have appeared as a rather disconnected and alien member.
On page 13 we saw that the semi-invariants could be expressed by the relation

$$s_0\,e^{\frac{\lambda_1}{1!}\omega+\frac{\lambda_2}{2!}\omega^2+\frac{\lambda_3}{3!}\omega^3+\cdots} = \sum e^{o_i\omega}$$

where $o_i$ $(i = 1, 2, 3, \ldots)$ denotes the individual observations.

The definition of the semi-invariants does not necessitate that all the o's must be different. If some of the o's are exactly alike it is self-evident that the term $e^{o_i\omega}$ must be repeated as often as $o_i$ occurs among all of the observations. If therefore $N\varphi(o_i)$ denotes the absolute frequency of $o_i$, where $\varphi(o_i)$ is the relative frequency function, then the definition of the semi-invariants may be written as:

$$e^{\frac{\lambda_1}{1!}\omega+\frac{\lambda_2}{2!}\omega^2+\frac{\lambda_3}{3!}\omega^3+\cdots}\sum\varphi(o_i) = \sum\varphi(o_i)\,e^{o_i\omega}.$$

For continuous variates, x, the above sums are transformed into definite integrals of the form

$$e^{\frac{\lambda_1}{1!}\omega+\frac{\lambda_2}{2!}\omega^2+\frac{\lambda_3}{3!}\omega^3+\cdots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{x\omega}\,dx.$$

Let us now substitute the quantity $\omega\sqrt{-1}$, or $i\omega$, for $\omega$ in the above identity. We then have:

$$e^{\frac{\lambda_1}{1!}i\omega+\frac{\lambda_2}{2!}i^2\omega^2+\frac{\lambda_3}{3!}i^3\omega^3+\cdots}\int_{-\infty}^{+\infty}\varphi(x)\,dx = \int_{-\infty}^{+\infty}\varphi(x)\,e^{ix\omega}\,dx$$

under the supposition that this transformation holds in the complex region in which the function is defined.

In this equation the definite integrals are of special importance. The factor $\int\varphi(x)\,dx$ is, of course, equal to unity according to the simple considerations set forth on page seven. The integral on the right hand side of the equation is, however, apart from the constant factor $\sqrt{2\pi}$ nothing more than the $\psi$ function in the conjugate Fourier function if we let $\varphi(x) = f(x)$, and

$$e^{\frac{\lambda_1}{1!}i\omega+\frac{\lambda_2}{2!}i^2\omega^2+\frac{\lambda_3}{3!}i^3\omega^3+\cdots} = \sqrt{2\pi}\,\psi(\omega).$$

According to (5b) we may, therefore, write f(x) or $\varphi(x)$ as

$$\varphi(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{\frac{\lambda_1}{1!}i\omega+\frac{\lambda_2}{2!}i^2\omega^2+\frac{\lambda_3}{3!}i^3\omega^3+\cdots-x\omega i}\,d\omega$$

as the most general form of the frequency function $\varphi(x)$ expressed by means of semi-invariants.

8. First Approximate Solution.

The exactness with which $\varphi(x)$ is reproduced depends, of course, upon the number of $\lambda$'s we decide to consider in the above formula. As a first approximation we may omit all $\lambda$'s above the order 2 or all terms in the exponent with indices higher than 2. Bearing in mind that $i^2 = -1$ we therefore have as a first approximation

$$\varphi_0(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{i\omega(\lambda_1-x)-\frac{\lambda_2}{2}\omega^2}\,d\omega.$$

The above definite integral was first evaluated by Laplace by means of the following elegant analysis. Using the well known Eulerean relation for complex quantities the above integral may be written as

$$\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos[(\lambda_1-x)\omega]\,d\omega + i\int_{-\infty}^{+\infty} e^{-\frac{\lambda_2}{2}\omega^2}\sin[(\lambda_1-x)\omega]\,d\omega.$$

The imaginary member vanishes because the factor $e^{-\frac{\lambda_2}{2}\omega^2}$ is an even function and $\sin[(\lambda_1-x)\omega]$ an uneven function. The area from $-\infty$ to 0 will therefore equal the area from 0 to $+\infty$, but be opposite in sign, which reduces the total area from $-\infty$ to $+\infty$, or the integral in question, to zero.

In regard to the first term, similar conditions hold except that $\cos[(\lambda_1-x)\omega]$ is an even function and the integral may hence be written as

$$I = 2\int_0^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos(r\omega)\,d\omega \quad\text{where } r = \lambda_1 - x.$$

Regarding the parameter r as a variable and differentiating I in respect to this variable we have

$$\frac{dI}{dr} = -2\int_0^{\infty}\omega\,e^{-\frac{\lambda_2}{2}\omega^2}\sin(r\omega)\,d\omega.$$

From this we have by partial integration:

$$\frac{dI}{dr} = \frac{2}{\lambda_2}\Big[e^{-\frac{\lambda_2}{2}\omega^2}\sin(r\omega)\Big]_0^{\infty} - \frac{2r}{\lambda_2}\int_0^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\cos(r\omega)\,d\omega = -\frac{rI}{\lambda_2} \quad\text{or}\quad \frac{1}{I}\frac{dI}{dr} = -\frac{r}{\lambda_2}.$$

From which we find

$$\log I = -\frac{r^2}{2\lambda_2}+\log A$$

where log A is a constant. Hence we have:

$$I = A\,e^{-\frac{r^2}{2\lambda_2}}.$$

In order to determine A we let r = 0 and we have

$$I_0 = A = 2\int_0^{\infty} e^{-\frac{\lambda_2}{2}\omega^2}\,d\omega = 2\sqrt{\frac{\pi}{2\lambda_2}} = \sqrt{\frac{2\pi}{\lambda_2}}.$$

This finally gives the expression for $\varphi_0(x)$ in the following form:

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi\lambda_2}}\,e^{-\frac{(x-\lambda_1)^2}{2\lambda_2}}$$

as a preliminary approximation for the frequency curve $\varphi(x)$.
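The evaluation of this integral can be confirmed by direct numerical quadrature. The sketch below is merely a check under stated assumptions: the values chosen for $\lambda_1$ and $\lambda_2$ are arbitrary, the function names are our own, and the infinite range is cut off at a point where the integrand is negligible. Only the even cosine part is integrated, as in the analysis above, and the result is compared with the closed expression.

```python
import math

def phi0_by_inversion(x, lam1, lam2, omega_max=40.0, steps=40000):
    """(1/2pi) times the integral of exp(i*w*(lam1 - x) - lam2*w*w/2) over w.

    The sine part vanishes, so twice the cosine part is integrated over
    [0, omega_max] with the trapezoidal rule.
    """
    r = lam1 - x
    h = omega_max / steps
    total = 0.0
    for k in range(steps + 1):
        w = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-lam2 * w * w / 2.0) * math.cos(r * w)
    return 2.0 * total * h / (2.0 * math.pi)

def phi0_closed_form(x, lam1, lam2):
    """The Laplacean probability function with mean lam1 and variance lam2."""
    return math.exp(-(x - lam1) ** 2 / (2.0 * lam2)) / math.sqrt(2.0 * math.pi * lam2)

lam1, lam2 = 1.5, 2.0   # arbitrary illustrative values
max_error = max(
    abs(phi0_by_inversion(x, lam1, lam2) - phi0_closed_form(x, lam1, lam2))
    for x in (-3.0, 0.0, 1.5, 2.0, 5.0)
)
print(max_error)
```

The two expressions agree to within the rounding error of the quadrature at every point tried.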
The first mathematical deduction of this approximate expression for a frequency curve is found in the monumental work by Laplace on Probabilities, and the function $\varphi_0(x)$ entering in the expression $\varphi_0(x)\,dx$, which gives the probability that the variate will fall between $x - \tfrac{1}{2}dx$ and $x + \tfrac{1}{2}dx$, is therefore known as the Laplacean probability function or sometimes as the Normal Frequency Curve of Laplace. The same curve was, as we have mentioned previously, also deduced independently by Gauss in connection with his studies on the distribution of accidental errors in precision measurements.

Laplace's probability function, $\varphi_0(x)$, possesses some remarkable properties which it might be well worth while to consider. Introducing a slightly different system of notation by writing $\lambda_1 = M$ and $\sqrt{\lambda_2} = \sigma$, $\varphi_0(x)$ reduces to the following form:

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-M)^2}{2\sigma^2}}$$

which is the form introduced by Pearson. The frequency curve, $\varphi_0(x)$, is here expressed in reference to a Cartesian coordinate system with origin at the zero point of the natural number system and whose unit of measurement is also equivalent to the natural number unit. It is, however, not necessary to use this system in preference to any other system. In fact, we may choose arbitrarily any other origin and any other unit standard without altering the properties of the curve. Suppose, therefore, that we take M as the origin and $\sigma$ as the unit of the system. The frequency function then reduces to

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2:2}.$$

Since the integral of $\varphi_0(x)$ from $-\infty$ to $+\infty$ equals unity the following equation must necessarily hold:

$$\int_{-\infty}^{+\infty} e^{-x^2:2}\,dx = \sqrt{2\pi}.$$

9. Development by Polynomials.

The Laplacean Probability Function possesses, however, some other remarkable properties which are of great use in expanding a function in a series. Starting with $\varphi_0(x)$ we may by repeated differentiation obtain its various derivatives. Denoting such derivatives by $\varphi_1(x), \varphi_2(x), \varphi_3(x), \ldots$ respectively we have the following relations.¹

$$\varphi_0(x) = e^{-x^2:2}$$
$$\varphi_1(x) = -x\,\varphi_0(x)$$
$$\varphi_2(x) = (x^2-1)\,\varphi_0(x)$$
$$\varphi_3(x) = -(x^3-3x)\,\varphi_0(x)$$
$$\varphi_4(x) = (x^4-6x^2+3)\,\varphi_0(x)$$

and in general for the nth derivative:

$$\varphi_n(x) = (-1)^n\left[x^n-\frac{n(n-1)}{2}x^{n-2}+\frac{n(n-1)(n-2)(n-3)}{2\cdot 4}x^{n-4}-\frac{n(n-1)(n-2)(n-3)(n-4)(n-5)}{2\cdot 4\cdot 6}x^{n-6}+\cdots\right]\varphi_0(x).$$

¹ In the following computations we have omitted temporarily the constant factor $1:\sqrt{2\pi}$ of $\varphi_0(x)$ and its derivatives.

It can be readily seen that the derivatives of $\varphi_0(x)$ are represented throughout as products of polynomials of x and the function $\varphi_0(x)$ itself. The various polynomials

$$H_0(x) = 1$$
$$H_1(x) = -x$$
$$H_2(x) = x^2-1$$
$$H_3(x) = -(x^3-3x)$$
$$H_4(x) = x^4-6x^2+3$$

and so forth are generally known as Hermite's polynomials from the name of the French mathematician, Hermite, who first introduced these polynomials in mathematical analysis.

With the signs as defined here, the following relations can be shown to exist between three successive polynomials

$$H_{n+1}(x)+xH_n(x)+nH_{n-1}(x) = 0$$

and

$$\frac{d^2H_n(x)}{dx^2}-x\frac{dH_n(x)}{dx}+nH_n(x) = 0.$$

A numerical 10 decimal place tabulation of the first six Hermite polynomials for values of x up to 4 and progressing by intervals of 0.01 is given by Jørgensen in his Danish work "Frekvensflader og Korrelation".

There exist now some very important relations between the Hermite polynomials and the derivatives of $\varphi_0(x)$, or between $H_n(x)$ and $\varphi_n(x)$. Consider for the moment the two following series of functions

$$\varphi_0(x),\ \varphi_1(x),\ \varphi_2(x),\ \varphi_3(x),\ \varphi_4(x),\ \ldots$$
$$H_0(x),\ H_1(x),\ H_2(x),\ H_3(x),\ H_4(x),\ \ldots$$

where $\varphi_n(x) = H_n(x)\,\varphi_0(x)$ and where $\lim\varphi_n(x) = 0$ for $x = \pm\infty$. We shall now prove that the two series $\varphi_n(x)$ and $H_n(x)$ form a biorthogonal system in the interval $-\infty$ to $+\infty$, that is to say that they are (1) real and continuous in the whole plane, (2) no one of them is identically zero in the plane, (3) every pair of them, $\varphi_n(x)$ and $H_m(x)$, satisfy the relation
$$\int_{-\infty}^{+\infty}\varphi_n(x)H_m(x)\,dx = 0 \qquad (n \neq m).$$

We have the self evident relation (letting x = z)

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = \int_{-\infty}^{+\infty}H_m(z)H_n(z)\varphi_0(z)\,dz = \int_{-\infty}^{+\infty}H_n(z)\varphi_m(z)\,dz.$$

Since this relation holds for all values of m and n it is only necessary to prove the proposition for n > m. For if it holds for n > m it will according to the above relation also hold for n < m.

By partial integration we have:

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = \Big[H_m(z)\varphi_{n-1}(z)\Big]_{-\infty}^{+\infty}-\int_{-\infty}^{+\infty}H'_m(z)\varphi_{n-1}(z)\,dz$$

where $H'_m(z)$ is the first derivative of $H_m(z)$. The first member on the right reduces to 0 since $\varphi_{n-1}(z) = 0$ for $z = \pm\infty$. We have therefore:

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = -\int_{-\infty}^{+\infty}H'_m(z)\varphi_{n-1}(z)\,dz$$
$$\int_{-\infty}^{+\infty}H'_m(z)\varphi_{n-1}(z)\,dz = -\int_{-\infty}^{+\infty}H''_m(z)\varphi_{n-2}(z)\,dz$$
$$\int_{-\infty}^{+\infty}H''_m(z)\varphi_{n-2}(z)\,dz = -\int_{-\infty}^{+\infty}H'''_m(z)\varphi_{n-3}(z)\,dz.$$

Continuing this process we obtain finally an expression of the form

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = (-1)^{m+1}\int_{-\infty}^{+\infty}H_m^{(m+1)}(z)\,\varphi_{n-m-1}(z)\,dz,$$

where $H_m^{(m+1)}(z)$ is the (m+1)th derivative of $H_m(z)$ and $n-m-1 \geq 0$. Since $H_m(z)$ is a polynomial of the mth degree its (m+1)th derivative is zero and we have finally that

$$\int_{-\infty}^{+\infty}H_m(z)\varphi_n(z)\,dz = 0$$

for all values of m and n where $n \neq m$.

For m = n we proceed in exactly the same manner, but stop at the nth integration. We have, therefore, by replacing m by n in the above partial integrations

$$\int_{-\infty}^{+\infty}H_n(z)\varphi_n(z)\,dz = (-1)^n\int_{-\infty}^{+\infty}H_n^{(n)}(z)\,\varphi_0(z)\,dz.$$

The nth derivative of $H_n(z)$ is, however, nothing but a constant and equal to $(-1)^n\,n!$. Hence we have finally

$$\int_{-\infty}^{+\infty}H_n(z)\varphi_n(z)\,dz = (-1)^n(-1)^n\,n!\int_{-\infty}^{+\infty}e^{-z^2:2}\,dz = n!\,\sqrt{2\pi}.$$

The above analysis thus proves that the functions $H_m(z)$ and $\varphi_n(z)$ are biorthogonal to each other for all values of n different from m throughout the whole plane.

We can now make use of these relations between the infinite set of biorthogonal functions $H_m(z)$ and $\varphi_n(z)$ in solving the problem of expanding an arbitrary function $\varphi(z)$ in a series of the form

$$\varphi(z) = c_0\varphi_0(z)+c_1\varphi_1(z)+c_2\varphi_2(z)+\cdots$$

the series to hold in the interval from $-\infty$ to $+\infty$.

If we know that $\varphi(z)$ can be developed into a series of this form, which after multiplication by any continuous function can be integrated term for term, then we are able to give a formal determination of the coefficients c. This formal determination of any one of the c's, say $c_i$, consists in multiplying the above series by $H_i(z)$ and integrating each term from $-\infty$ to $+\infty$. All the terms except the one containing the product $H_i(z)\varphi_i(z)$ vanish and we have for $c_i$:

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{i!\,\sqrt{2\pi}}.$$

If we define the Hermite functions as

$$H_0(z) = 1$$
$$H_1(z) = z$$
$$H_2(z) = z^2-1$$
$$H_3(z) = z^3-3z$$
$$H_4(z) = z^4-6z^2+3$$

the above formula takes on the form

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz} = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{(-1)^i\,i!\,\sqrt{2\pi}}$$

which we shall prefer to use in the following discussion.

It will be noted that this purely formal calculation of the coefficients c is very similar to the determination of the constants in a Fourier Series, where as a matter of fact the system of functions

$$\cos z,\ \cos 2z,\ \cos 3z,\ \ldots,\ \sin z,\ \sin 2z,\ \sin 3z,\ \ldots$$

is biorthogonal in the interval $0 < z < 2\pi$.

But the reader must not forget that the above representation is only a formal one, and we do not know if it is valid. To prove its validity we must first show that the series is convergent and secondly that it actually represents $\varphi(z)$ for all values of z.

This is by no means a simple task and it cannot be done by elementary methods. A Russian mathematician, Vera Myller-Lebedeff, has, however, given an elegant solution by means of some well known theorems from the Fredholm integral equations.
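The orthogonality on which this formal determination rests may be checked numerically. The sketch below takes the polynomials in the second form given above, $H_0 = 1$, $H_1 = z$, $H_2 = z^2 - 1$ and so on, and builds them from the recurrence $H_{k+1} = zH_k - kH_{k-1}$, which holds for that sign convention. The function names, the cut-off of the range and the step of the quadrature are our own assumptions. With the factor $1:\sqrt{2\pi}$ restored in $\varphi_0$, the integral of $H_mH_n\varphi_0$ should be n! for m = n and zero otherwise.

```python
import math

def hermite(n, z):
    """H_n(z) with H_0 = 1 and H_1 = z, built from H_{k+1} = z*H_k - k*H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, z
    for k in range(1, n):
        h_prev, h = h, z * h - k * h_prev
    return h

def inner_product(m, n, z_max=12.0, steps=4800):
    """Trapezoidal estimate of the integral of H_m * H_n * phi_0 over the real
    line, phi_0 being taken with its factor 1/sqrt(2*pi)."""
    step = 2.0 * z_max / steps
    total = 0.0
    for k in range(steps + 1):
        z = -z_max + k * step
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * hermite(m, z) * hermite(n, z) * math.exp(-z * z / 2.0)
    return total * step / math.sqrt(2.0 * math.pi)

# Table of integrals for m, n = 0..4: diagonal n!, all other entries zero.
gram = [[inner_product(m, n) for n in range(5)] for m in range(5)]
for row in gram:
    print([round(value, 6) for value in row])
```

The diagonal reproduces 1, 1, 2, 6, 24 and the remaining entries vanish to within rounding, which is what allows each coefficient $c_i$ to be isolated by a single integration.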
She has among other things proved the following criterion: "Every function $\varphi(z)$ which together with its first two derivatives is finite and continuous in the interval from $-\infty$ to $+\infty$ and which vanishes together with its derivatives for $z = \pm\infty$ can be developed into an infinite series of the form:

$$\varphi(z) = \sum c_i\,e^{-z^2:2}\,H_i(z)$$

where $H_i(z)$ is the Hermite polynomial of order i".

10. Gram's Series.

It is, however, not our intention to follow up this treatment, which is outside the scope of an elementary treatise like this, and we shall in its place give an approximate representation of the frequency function, $\varphi(z)$, by a method which in many respects is similar to that introduced by the Danish actuary Gram in his epochmaking work "Udviklingsrækker", which contains the first known systematic development of a skew frequency function. Gram's problem in a somewhat modified form may briefly be stated as follows:

Being given an arbitrary relative frequency function, $\varphi(z)$, continuous and finite in the interval $-\infty$ to $+\infty$ (and which vanishes for $z = \pm\infty$), to determine the constant coefficients $c_0, c_1, c_2, c_3, \ldots$ in such a way that the series

$$\frac{c_0\varphi_0(z)}{\sqrt{\varphi_0(z)}}+\frac{c_1\varphi_1(z)}{\sqrt{\varphi_0(z)}}+\frac{c_2\varphi_2(z)}{\sqrt{\varphi_0(z)}}+\cdots+\frac{c_n\varphi_n(z)}{\sqrt{\varphi_0(z)}}$$

gives the best approximation to the quantity $\varphi(z):\sqrt{\varphi_0(z)}$ in the sense of the method of least squares. That is to say we wish to determine the constants c in such a manner that the sum of the squares of the differences between the function and the approximate series becomes a minimum. This means that the expression

$$\int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}-\sum\frac{c_i\varphi_i(z)}{\sqrt{\varphi_0(z)}}\right]^2dz$$

must be a minimum.

On the basis of this condition we have

$$\frac{1}{\sqrt{\varphi_0(z)}}\sum c_i\varphi_i(z) = \sqrt{\varphi_0(z)}\sum c_iH_i(z) = U(z)$$

where the unknown coefficients c must be so determined that

$$I = \int_{-\infty}^{+\infty}\left[\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}-U(z)\right]^2dz$$

equals a minimum.
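Taking the partial derivatives in respect to $c_i$ we have

$$\frac{\partial I}{\partial c_i} = -2\int_{-\infty}^{+\infty}\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}\,\frac{\partial U(z)}{\partial c_i}\,dz+\frac{\partial}{\partial c_i}\int_{-\infty}^{+\infty}[U(z)]^2\,dz.$$

Now since

$$\int_{-\infty}^{+\infty}[U(z)]^2\,dz = \int_{-\infty}^{+\infty}\left\{c_0^2[H_0(z)]^2+c_1^2[H_1(z)]^2+\cdots+c_n^2[H_n(z)]^2\right\}\varphi_0(z)\,dz,$$

we get

$$\frac{\partial I}{\partial c_i} = -2\int_{-\infty}^{+\infty}\frac{\varphi(z)}{\sqrt{\varphi_0(z)}}\,H_i(z)\sqrt{\varphi_0(z)}\,dz+2c_i\int_{-\infty}^{+\infty}[H_i(z)]^2\,\varphi_0(z)\,dz$$

where the latter integral, taken with $\varphi_i(z) = H_i(z)\varphi_0(z)$ as above, equals

$$\int_{-\infty}^{+\infty}\varphi_i(z)H_i(z)\,dz = i!\,\sqrt{2\pi}.$$

Equating to zero and solving for $c_i$ we finally obtain the following value for $c_i$:

$$c_i = \frac{1}{i!\,\sqrt{2\pi}}\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz \qquad (i = 1, 2, 3, \ldots).$$

This solution is gotten by the introduction of $\sqrt{\varphi_0(z)}$ which serves to make all terms of the form

$$c_i\varphi_i(z):\sqrt{\varphi_0(z)} = \sqrt{\varphi_0(z)}\;c_iH_i(z) \qquad (i = 1, 2, 3, \ldots, n)$$

orthogonal to each other in the interval $-\infty$ to $+\infty$.

In all the above expansions of a frequency series we have used the expression $\varphi_0(z) = e^{-z^2:2}$ as the generating function (see footnote on page 26), while as a matter of fact the true value of $\varphi_0(z)$ is given by the equation $\varphi_0(z) = e^{-z^2:2}:\sqrt{2\pi}$. The definite integral on page 32, taken with the Hermite functions as defined there,

$$(-1)^i\int_{-\infty}^{+\infty}H_i(z)\varphi_i(z)\,dz = i!\int_{-\infty}^{+\infty}e^{-z^2:2}\,dz = i!\,\sqrt{2\pi}$$

will therefore have to be divided by $\sqrt{2\pi}$, and the value of the general coefficient $c_i$ will henceforth be reduced to

$$c_i = \frac{\int_{-\infty}^{+\infty}\varphi(z)H_i(z)\,dz}{(-1)^i\,i!}$$

where $H_i(z)$ is the Hermite polynomial of order i defined by the relation

$$H_i(z) = z^i-\frac{i(i-1)}{2}z^{i-2}+\frac{i(i-1)(i-2)(i-3)}{2\cdot 4}z^{i-4}-\frac{i(i-1)(i-2)(i-3)(i-4)(i-5)}{2\cdot 4\cdot 6}z^{i-6}+\cdots$$

On this basis we obtain the following values for the first few coefficients:

$$c_0 = \int_{-\infty}^{+\infty}\varphi(z)\,dz = 1$$
$$c_1 = (-1)^1\int_{-\infty}^{+\infty}z\,\varphi(z)\,dz : 1!$$
$$c_2 = (-1)^2\int_{-\infty}^{+\infty}(z^2-1)\,\varphi(z)\,dz : 2!$$
$$c_3 = (-1)^3\int_{-\infty}^{+\infty}(z^3-3z)\,\varphi(z)\,dz : 3!$$
$$c_4 = (-1)^4\int_{-\infty}^{+\infty}(z^4-6z^2+3)\,\varphi(z)\,dz : 4!$$

While the above development of an arbitrary frequency distribution has reference to $\varphi(z)$, or the relative frequency function, it is, however, equally well adapted to the representation of absolute frequencies as expressed by the function, F(z).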
If N is the total number of individual observations, or in other words the area of the frequency curve, we evidently have

$$F(z) = N\varphi(z) \quad\text{or}\quad \int_{-\infty}^{+\infty}F(z)\,dz = N\int_{-\infty}^{+\infty}\varphi(z)\,dz = N.$$

Since N is a constant quantity we may, therefore, write the expansion of F(z) as follows:

$$F(z) = N\left[c_0\varphi_0(z)+c_1\varphi_1(z)+c_2\varphi_2(z)+\cdots\right] = N\sum c_i\varphi_i(z), \qquad \varphi_i(z) = (-1)^iH_i(z)\,\varphi_0(z),$$

where the coefficients $c_i$ have the value

$$c_i = \frac{(-1)^i}{i!\,N}\int_{-\infty}^{+\infty}F(z)H_i(z)\,dz \quad\text{for } i = 1, 2, 3, \ldots$$

and where

$$N = \int_{-\infty}^{+\infty}F(z)\,dz.$$

Since all the Hermite functions are polynomials in z, it can be readily seen that the coefficients c may be expressed as functions of the power sums or of the previously mentioned symmetrical functions s, where

$$s_r = \int_{-\infty}^{+\infty}z^rF(z)\,dz.$$

These particular integrals, originally introduced by Thiele in the development of the semi-invariants, have been called by Pearson the "moments" of the frequency function, F(z), and $s_r$ is called the rth moment of the variate z with respect to an arbitrary origin. It can be readily seen that the moment of order zero, or $s_0$, is

$$s_0 = \int_{-\infty}^{+\infty}z^0F(z)\,dz = N = N\int_{-\infty}^{+\infty}\varphi(z)\,dz.$$

Hence we have for the first coefficient $c_0$:

$$c_0 = \int_{-\infty}^{+\infty}F(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 1.$$

We are, however, in a position to further simplify the expression for F(z). As already mentioned we are at liberty to choose arbitrarily both the origin and the unit of the Cartesian coordinate system for the frequency curve without changing the properties of this curve. Now by making a proper choice of the Cartesian system of reference we can make the coefficients $c_1$ and $c_2$ vanish. In order to obtain this object the origin of the system must be so chosen that

$$c_1 = -\int_{-\infty}^{+\infty}zF(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 0.$$

This means that the semi-invariant $s_1 : s_0 = \lambda_1$ must vanish. It can be readily seen that the above expression for $\lambda_1$ is nothing more than the usual form for the mean value of a series of variates.
Moreover, we know that the algebraic sum (or in the case of continuous variates, the integral) of the variates around the mean value is always equal to zero. Hence by writing for z the expression (z - M) when M equals the mean value or $\lambda_1$ we can always make $c_1$ vanish.

To attain our second object of making $c_2$ vanish we must choose the unit of the coordinate system in such a way that the expression

$$c_2 = \frac{(-1)^2}{2!}\int_{-\infty}^{+\infty}F(z)H_2(z)\,dz : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

which implies that

$$\left[\int_{-\infty}^{+\infty}F(z)\,z^2\,dz-\int_{-\infty}^{+\infty}F(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz = 0$$

or that $s_2 : s_0 - 1 = 0$, or when expressed in terms of the semi-invariants that

$$\lambda_2 = (s_2s_0-s_1^2) : s_0^2 = 1.$$

But by choosing the mean as the origin of the system the term $s_1 : s_0$ is equal to 0 and we have therefore $\lambda_2 = \sigma^2 = s_2 : s_0 = 1$. Hence, by selecting as the unit of our coordinate system $\sqrt{\lambda_2}$ or $\sigma$, where $\sigma$ is technically known as the dispersion or standard deviation of the series of variates, we can make the second coefficient $c_2$ vanish.

In respect to the coefficients $c_3$ and $c_4$ we have now

$$c_3 = \frac{(-1)^3}{3!}\left[\int_{-\infty}^{+\infty}z^3F(z)\,dz-3\int_{-\infty}^{+\infty}zF(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz$$

which reduces to $-\lambda_3 : 3!$, while

$$c_4 = \frac{1}{4!}\left[\int_{-\infty}^{+\infty}z^4F(z)\,dz-6\int_{-\infty}^{+\infty}z^2F(z)\,dz+3\int_{-\infty}^{+\infty}F(z)\,dz\right] : \int_{-\infty}^{+\infty}F(z)\,dz$$

which reduces to

$$c_4 = \frac{1}{4!}\left(\frac{s_4}{s_0}-3\right) = \frac{\lambda_4}{4!}.$$

While the coefficients of higher order may be determined with equal ease, it will in general be found that the majority of moderately skew frequency distributions can be expressed by means of the first 4 parameters or coefficients.

11. Coefficients and Semi-Invariants.

We shall now show how the same results for the values of the coefficients may be obtained from the definition of the semi-invariants. Since we have proven that a frequency function, F(z), may be expressed by the series

$$F(z) = N\sum c_i\varphi_i(z)$$

we may from the definition of the semi-invariants write down the following identity:

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots} = N\int_{-\infty}^{+\infty}e^{z\omega}\left(c_0\varphi_0(z)+c_1\varphi_1(z)+c_2\varphi_2(z)+\cdots\right)dz$$

where N is the area of the frequency curve. The general term on the right hand side of the equation will be of the form

$$c_r\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz$$

where the integral may be evaluated by partial integration as follows:

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz = \Big[e^{z\omega}\varphi_{r-1}(z)\Big]_{-\infty}^{+\infty}-\omega\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)\,dz,$$

and where the first term on the right vanishes leaving

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz = (-\omega)^1\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)\,dz.$$

Continuing in the same manner we obtain by successive integrations

$$(-\omega)^1\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-1}(z)\,dz = (-\omega)^2\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-2}(z)\,dz$$
$$(-\omega)^2\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-2}(z)\,dz = (-\omega)^3\int_{-\infty}^{+\infty}e^{z\omega}\varphi_{r-3}(z)\,dz$$

from which we finally obtain the relation

$$\int_{-\infty}^{+\infty}e^{z\omega}\varphi_r(z)\,dz = (-\omega)^r\int_{-\infty}^{+\infty}e^{z\omega}\varphi_0(z)\,dz = \frac{(-\omega)^r}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{z\omega}\,e^{-z^2:2}\,dz.$$

This latter integral may be written as

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{z\omega-\frac{z^2}{2}}\,dz = e^{\frac{\omega^2}{2}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}e^{-\frac{(z-\omega)^2}{2}}\,dz = e^{\frac{\omega^2}{2}}.$$

Consequently the relation between the semi-invariants and the frequency function may be written as follows:

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{\lambda_2\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots} = N\,e^{\frac{\omega^2}{2}}\left[c_0-c_1\omega+c_2\omega^2-c_3\omega^3+\cdots\right]$$

or

$$s_0\,e^{\frac{\lambda_1\omega}{1!}+\frac{(\lambda_2-1)\omega^2}{2!}+\frac{\lambda_3\omega^3}{3!}+\cdots} = N\left[c_0-c_1\omega+c_2\omega^2-c_3\omega^3+\cdots\right].$$

By successive differentiation with respect to $\omega$ and by equating the coefficients of equal powers of $\omega$ we get in a manner similar to that shown on page 13 the following results:

$$c_0 = \frac{s_0}{N} = \frac{s_0}{s_0} = 1$$
$$c_1 = -\lambda_1$$
$$c_2 = \frac{1}{2!}\left[(\lambda_2-1)+\lambda_1^2\right]$$
$$c_3 = -\frac{1}{3!}\left[\lambda_3+3(\lambda_2-1)\lambda_1+\lambda_1^3\right]$$
$$c_4 = \frac{1}{4!}\left[\lambda_4+4\lambda_3\lambda_1+3(\lambda_2-1)^2+6(\lambda_2-1)\lambda_1^2+\lambda_1^4\right].$$

If we now again choose the origin at $\lambda_1$, or let $\lambda_1 = 0$, and choose $\sqrt{\lambda_2} = 1$ as the unit of our coordinate system we have:

$$c_0 = 1,\quad c_1 = 0,\quad c_2 = 0,\quad c_3 = -\frac{1}{3!}\lambda_3,\quad c_4 = \frac{1}{4!}\lambda_4.$$

12. Linear Transformation.

The theoretical development of the above formulae explicitly assumes that the variate, z, is measured in terms of the dispersion or $\sqrt{\lambda_2(z)}$ and with $\lambda_1(z)$ as the origin of the coordinate system. In practice the observations or statistical data are, however, invariably expressed with reference to an arbitrarily chosen origin (in the majority of cases the natural zero of the number scale) and expressed in terms of standard units, such as centimeters, grams, years, integral numbers, etc.

Let us denote the general variate in such arbitrarily selected systems of reference by x. Our problem then consists in transforming the various semi-invariants, $\lambda_1(x), \lambda_2(x), \lambda_3(x), \lambda_4(x), \ldots$ to the z system of reference with $\lambda_1(z)$ as its origin and $\sqrt{\lambda_2(z)}$ as its unit. Such a transformation may always be brought about by means of the linear substitution

$$z = ax+b$$

which in a purely geometrical sense implies both a change of origin and unit. On page 16 we proved the following general properties of the semi-invariants

$$\lambda_1(z) = \lambda_1(ax+b) = a\lambda_1(x)+b$$
$$\lambda_r(z) = \lambda_r(ax+b) = a^r\lambda_r(x).$$

Let us now write $\lambda_1(x) = M$ and $\lambda_2(x) = \sigma^2$; we then have the following relations:

$$\lambda_1(z) = aM+b$$
$$\lambda_2(z) = a^2\sigma^2.$$

Since the coordinate system of reference must be chosen in such a manner that $\lambda_1(z) = 0$ and $\sqrt{\lambda_2(z)} = 1$ we have:

$$aM+b = 0$$
$$a\sigma = 1,$$

from which we obtain $a = \dfrac{1}{\sigma}$ and $b = -\dfrac{M}{\sigma}$, which brings z on the form $z = (x-M):\sigma$ while $\varphi_0(z)$ becomes

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-M)^2:2\sigma^2}.$$

Moreover, we have $\lambda_r(z) = \lambda_r(x):\sigma^r$ for all values of $r \geq 2$.

We are now able to epitomize the computations of the semi-invariants under the following simple rules.

(1) Compute $\lambda_1(x)$ in respect to an arbitrary origin. The numerical value of this parameter with opposite sign is the origin of the frequency curve.

(2) Compute $\lambda_r(x)$ for all values of $r \geq 2$. The numerical values of those parameters divided with $(\sqrt{\lambda_2(x)})^r$, or $\sigma^r$, for r = 2, 3, 4, ... are the semi-invariants of the frequency curve.

13. Charlier's Scheme of Computation.

The general formulae for the semi-invariants were given on page 13. In practical work it is, however, of importance to proceed along systematic lines and to furnish an automatic check for the correctness of the computations. Several systems facilitating such work have been proposed by various writers, but the most simple and elegant is probably the one proposed by M. Charlier and which is shown in detail with the necessary control checks on the following page. Charlier employs moments, while we in the following demonstration shall prefer the use of the semi-invariants.

If we define the power sums of the relative frequencies $\varphi(x)$ by the relation

$$m_r = \int_{-\infty}^{+\infty}x^rF(x)\,dx : \int_{-\infty}^{+\infty}F(x)\,dx \qquad (r = 0, 1, 2, 3, \ldots),$$

we find that the expressions for the semi-invariants as given on page 13 may be written as follows:

$$\lambda_1 = m_1$$
$$\lambda_2 = m_2-m_1^2$$
$$\lambda_3 = m_3-3m_2m_1+2m_1^3$$
$$\lambda_4 = m_4-4m_3m_1-3m_2^2+12m_2m_1^2-6m_1^4$$

The advantage of the Charlier scheme for the computation of the semi-invariants lies in the fact that it furnishes an automatic check of the final results. If we expand the expression $(x+1)^4F(x)$ we have:

$$x^4F(x)+4x^3F(x)+6x^2F(x)+4xF(x)+F(x)$$

or

$$\sum(x+1)^4F(x) = s_4+4s_3+6s_2+4s_1+s_0,$$

which serves as an independent control check of the computations.

Moreover, another check is furnished by the relation

$$m_4 = \lambda_4+4m_1\lambda_3+6m_1^2\lambda_2+3\lambda_2^2+m_1^4.$$

In order to illustrate the scheme we choose the following age distribution of 1130 pensioned functionaries in a large American Public Utility corporation.

    Ages       No. of Pensioners
    35-39              1
    40-44              6
    45-49             17
    50-54             48
    55-59            118
    60-64            224
    65-69            286
    70-74            248
    75-79            128
    80-84             38
    85-89             13
    over 90            3

The complete calculations of the coefficients c are shown in the appended scheme by Charlier.
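The two control checks can be tried directly on the class frequencies of the table above. The following is a minimal sketch and not a reproduction of the printed scheme: the variable names are our own, the provisional origin is placed at the interval 65-69 with the 5-year interval as unit, and the open class "over 90" is assumed to occupy the next interval, x = 5.

```python
# Class marks in 5-year units reckoned from the provisional origin 65-69 (x = 0).
# Assumption: the open class "over 90" is treated as the next interval, x = 5.
x_values = list(range(-6, 6))
frequencies = [1, 6, 17, 48, 118, 224, 286, 248, 128, 38, 13, 3]

def power_sum(r):
    """s_r = sum of x**r * F(x) over all classes."""
    return sum(x ** r * f for x, f in zip(x_values, frequencies))

s0, s1, s2, s3, s4 = (power_sum(r) for r in range(5))

# First control check: sum of (x+1)^4 F(x) against s4 + 4 s3 + 6 s2 + 4 s1 + s0.
shifted_sum = sum((x + 1) ** 4 * f for x, f in zip(x_values, frequencies))
check_one = shifted_sum == s4 + 4 * s3 + 6 * s2 + 4 * s1 + s0

# Semi-invariants from the power sums of the relative frequencies.
m1, m2, m3, m4 = s1 / s0, s2 / s0, s3 / s0, s4 / s0
lam1 = m1
lam2 = m2 - m1 ** 2
lam3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
lam4 = m4 - 4 * m3 * m1 - 3 * m2 ** 2 + 12 * m2 * m1 ** 2 - 6 * m1 ** 4

# Second control check: m4 rebuilt from the semi-invariants.
rebuilt_m4 = lam4 + 4 * m1 * lam3 + 6 * m1 ** 2 * lam2 + 3 * lam2 ** 2 + m1 ** 4
check_two = abs(rebuilt_m4 - m4) < 1e-9

print(s0, s1, s2, s3, s4, check_one, check_two)
```

Both checks are identities, so a failure of either points to an arithmetical slip in the columns rather than to any peculiarity of the data. The first is exact in whole numbers, the second holds to within rounding.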
[Charlier's scheme for the pensioner data, with its control checks: the tabulated figures are not legible in this copy.]

The above computations give the numerical values of the frequency function, which may now be written as follows:

$$F(x) = 1130\,[\varphi_0(z) + .0258\,\varphi_3(z) + .0158\,\varphi_4(z)],$$

where

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x + .0195}{1.6240}\right)^2}.$$

14. COMPARISON BETWEEN OBSERVED AND THEORETICAL VALUES. The next step is now to work out the numerical values of $F(x)$ for various values of $x$ and compare such values with the ones originally observed. This process is shown in detail in the following scheme.
Column (1) gives the values of the variate $x$ reckoned from the provisional origin, or the centre of the age interval 65-69. Column (2) is $x$ less the first semi-invariant, whereby the origin is shifted to the mean or $\lambda_1$. Column (3) represents the final linear transformation: $z = (x - \lambda_1) : \sigma$. Columns (4), (5) and (6) are copied directly from the standard tables of Jørgensen or Charlier. Column (7) is (5) multiplied by 0.0258, or the product $-[c_3\varphi_3(z)] : 3!$, while (8) is $[c_4\varphi_4(z)] : 4!$. Column (9) is the sum of (4), (7) and (8). If we now distribute the area $N = s_0$ or 1130 pro rata according to (9), we finally reach the theoretical frequency distribution expressed in 5-year age intervals and shown in column (10), alongside which we have inserted the originally observed values.

[Scheme of columns (1) to (10) comparing the theoretical and the observed frequencies: the tabulated figures are not legible in this copy.]

Evidently the fit is satisfactory. It will be noted that the final frequency series is expressed in units of 5-year age intervals. This, however, is only a formal representation. By subdividing the unit intervals of column (1) in 5 equal parts, and by computing all the other columns accordingly, we get the theoretical frequency series expressed in single year age intervals.

15.
THE PRINCIPLE OF METHOD OF LEAST SQUARES. The following paragraph purports to give a brief exposition of the determination of the coefficients in the Gram or Laplacean-Charlier series in the sense of the method of least squares as a strict problem of maxima and minima, wholly independent of the connection between the method of least squares and the error laws of precision measurements.¹

¹ In the following demonstration I am adhering to the brief and lucid exposition of the Argentinean actuary, U. Broggi, in his excellent Traité d'Assurances sur la Vie.

The simple problem in maxima and minima which forms the fundamental basis of the method of least squares is the following: Let $m$ unknown quantities be determined by observations in such a manner that they are not observed directly but enter into certain known functional relations, $f_i(x_1, x_2, x_3, \ldots, x_m)$, containing the unknown independent variables $x_1, x_2, x_3, \ldots, x_m$. Let furthermore the number of observations on such functional relations be $n$ (where $n$ is greater than $m$). The problem is then to determine the most plausible system of the values of the unknowns from the observed system

$$f_1(x_1, x_2, x_3, \ldots, x_m) = o_1$$
$$\ldots\ldots\ldots$$
$$f_n(x_1, x_2, x_3, \ldots, x_m) = o_n$$

where $f_1, f_2, \ldots, f_n$ are the known functional relations and $o_1, o_2, \ldots, o_n$ their observed values. Such equations are known as observation equations. In order to further simplify our problem we shall also assume that

1. All the equations of the system have the same weight, and
2. All the equations are reduced to linear form.

By these assumptions the problem is reduced to finding $m$ unknowns from $n$ linear equations:

$$a_1 x_1 + b_1 x_2 + \ldots = o_1$$
$$a_2 x_1 + b_2 x_2 + \ldots = o_2$$
$$a_3 x_1 + b_3 x_2 + \ldots = o_3$$
$$\ldots\ldots\ldots$$
$$a_n x_1 + b_n x_2 + \ldots = o_n$$

Since $n$ is greater than $m$ we find the problem over-determined, and we therefore seek to determine the unknown quantities $x_1$, $x_2$, .
. . $x_m$ in such a way that the sum of the squares of the differences between the functional relations and the observed values, $o$, becomes a minimum. This implies that the expression

$$\sum_{i=1}^{n} (a_i x_1 + b_i x_2 + \ldots - o_i)^2 = \Phi(x_1, x_2, \ldots, x_m)$$

must be a minimum, or the simultaneous existence of the equations

$$\frac{\partial\Phi}{\partial x_1} = 0, \quad \frac{\partial\Phi}{\partial x_2} = 0, \quad \ldots, \quad \frac{\partial\Phi}{\partial x_m} = 0. \qquad (I)$$

If we now introduce the following notation

$$a_i x_1 + b_i x_2 + \ldots - o_i = \lambda_i \qquad \text{for } i = 1, 2, 3, \ldots, n,$$

the $m$ equations in the above system (I) evidently take on the following form:

$$\lambda_1 a_1 + \lambda_2 a_2 + \ldots + \lambda_n a_n = 0$$
$$\lambda_1 b_1 + \lambda_2 b_2 + \ldots + \lambda_n b_n = 0$$
$$\ldots\ldots\ldots$$

If we now again re-substitute the expressions for $\lambda$ in terms of the linear relations

$$a_i x_1 + b_i x_2 + \ldots - o_i = \lambda_i \qquad \text{for } i = 1, 2, 3, \ldots, n,$$

and collect the coefficients of $x_1, x_2, \ldots, x_m$, these equations may be expressed in the following symbolical form:

$$[aa]x_1 + [ab]x_2 + \ldots - [ao] = 0$$
$$[ab]x_1 + [bb]x_2 + \ldots - [bo] = 0$$
$$\ldots\ldots\ldots$$
$$[ak]x_1 + [bk]x_2 + \ldots + [kk]x_m - [ko] = 0$$

where

$$[aa] = a_1^2 + a_2^2 + \ldots, \qquad [ab] = a_1 b_1 + a_2 b_2 + \ldots$$

is the Gaussian notation for the homogeneous sum products.

The above equations are known as normal equations, and it is readily seen that there is one normal equation corresponding to each unknown.

Our problem is therefore reduced to the solution of a system of simultaneous linear equations of $m$ unknowns. If $m$ is a small number, or, what amounts to the same thing, there are only two or three unknowns, the solution can be carried on by simple algebraic methods or determinants. If the number of unknowns is large these methods become very laborious and impractical. It is one of the achievements of the great German mathematician, Gauss, to have given us a method of solution which reduces this labor to a minimum and which proceeds along well defined systematic and practical lines. The method is known as the Gaussian algorithmus of successive elimination.

16.
GAUSS' SOLUTION OF NORMAL EQUATIONS. For the sake of simplicity we shall limit ourselves to a system of four normal equations of the form

$$[aa]x_1 + [ab]x_2 + [ac]x_3 + [ad]x_4 - [ao] = 0$$
$$[ab]x_1 + [bb]x_2 + [bc]x_3 + [bd]x_4 - [bo] = 0$$
$$[ac]x_1 + [bc]x_2 + [cc]x_3 + [cd]x_4 - [co] = 0$$
$$[ad]x_1 + [bd]x_2 + [cd]x_3 + [dd]x_4 - [do] = 0$$

The generalization to an arbitrary number of unknowns offers no difficulties, however.

On account of their symmetrical form the above equations may also be written in the more convenient form, viz.:

$$\begin{array}{rrrrl}
[aa]x_1 & + [ab]x_2 & + [ac]x_3 & + [ad]x_4 & - [ao] = 0 \\
        & [bb]x_2 & + [bc]x_3 & + [bd]x_4 & - [bo] = 0 \\
        &         & [cc]x_3 & + [cd]x_4 & - [co] = 0 \\
        &         &         & [dd]x_4 & - [do] = 0
\end{array}$$

From the first equation we find

$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4.$$

Substituting this value in the following equations, and by the introduction of the new symbol

$$[ik] - \frac{[ai]}{[aa]}[ak] = [ik.1],$$

we now obtain a new system of equations of a lower order and of the form

$$\begin{array}{rrrl}
[bb.1]x_2 & + [bc.1]x_3 & + [bd.1]x_4 & - [bo.1] = 0 \\
          & [cc.1]x_3 & + [cd.1]x_4 & - [co.1] = 0 \\
          &           & [dd.1]x_4 & - [do.1] = 0
\end{array}$$

Solving for $x_2$ we have

$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4.$$

Substituting in the following equations and writing

$$[ik.1] - \frac{[bi.1]}{[bb.1]}[bk.1] = [ik.2],$$

we have

$$[cc.2]x_3 + [cd.2]x_4 = [co.2]$$
$$[dd.2]x_4 = [do.2]$$

or

$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4.$$

Moreover, by writing

$$[ik.2] - \frac{[ci.2]}{[cc.2]}[ck.2] = [ik.3],$$

we have finally

$$[dd.3]x_4 = [do.3].$$

This gives us the final reduced normal equation of the lowest order. By successive substitution we therefore have:

$$x_4 = \frac{[do.3]}{[dd.3]}$$
$$x_3 = \frac{[co.2]}{[cc.2]} - \frac{[cd.2]}{[cc.2]}x_4$$
$$x_2 = \frac{[bo.1]}{[bb.1]} - \frac{[bc.1]}{[bb.1]}x_3 - \frac{[bd.1]}{[bb.1]}x_4$$
$$x_1 = \frac{[ao]}{[aa]} - \frac{[ab]}{[aa]}x_2 - \frac{[ac]}{[aa]}x_3 - \frac{[ad]}{[aa]}x_4$$

as the ultimate solution of the unknowns.

17. ARITHMETICAL APPLICATION OF METHOD. The example in paragraph 13 gave an illustration of the application of the method of moments.
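The successive elimination of paragraph 16 may be sketched in a few lines of Python. This is a minimal illustration rather than the author's own computing schedule: the function name `solve_normal_equations` and the small numerical system are invented for the example, and the routine assumes that no diagonal sum-product vanishes during the reduction, as is normally the case for normal equations with independent columns.

```python
def solve_normal_equations(a, rhs):
    """Solve the symmetric system a x = rhs by successive elimination.

    a   : square list of rows holding the sum-products [aa], [ab], ...
    rhs : list holding the sum-products [ao], [bo], ...
    """
    n = len(rhs)
    a = [row[:] for row in a]   # work on copies, leave the inputs intact
    rhs = rhs[:]
    # Forward reduction: after step p the rows below p hold [ik.p], [io.p].
    for p in range(n - 1):
        for i in range(p + 1, n):
            factor = a[i][p] / a[p][p]
            for k in range(p, n):
                a[i][k] -= factor * a[p][k]
            rhs[i] -= factor * rhs[p]
    # Back substitution, last unknown first.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - tail) / a[i][i]
    return x


# Small illustrative system (numbers invented for the example).
a = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
rhs = [8.0, 15.0, 11.0]
x = solve_normal_equations(a, rhs)
print(x)
```

The inner update `a[i][k] -= factor * a[p][k]` is the rule $[ik] - \frac{[ai]}{[aa]}[ak] = [ik.1]$ applied at each stage; by symmetry `a[i][p]` plays the part of $[ai]$.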
As previously stated this method works quite well in cases of moderate skewness, but is less successful in extremely skew curves and where the excess is large. We shall now give an illustration of the calculation of the parameters by the method of least squares. The example we choose is the well-known statistical series by the distinguished Dutch botanist, de Vries, on the number of petals in the flowers of Ranunculus bulbosus. This is also one of the classical examples of Karl Pearson in his celebrated original memoirs on skew variation. Although the observations of de Vries lend themselves more readily to the method of logarithmic transformation, which we shall discuss in a following chapter, we have deliberately chosen to use them here for two specific reasons. Firstly, it is a most striking illustration in refutation of the immature criticism of the Gram-Charlier series by a certain young and very incautious American actuary, Mr. M. Davis, who has gone on record with the positive statement "that the Charlier series fails completely in case of appreciable skewness". Secondly (and this is the more important reason), it offers an excellent drill for the student in the practical applications of the method of least squares because it gives in a very brief compass all the essential arithmetical details. The observations of de Vries are as follows:

No. of petals     x     F(x) = o
      5           0       133
      6           1        55
      7           2        23
      8           3         7
      9           4         2
     10           5         2

where $F(x)$ denotes the absolute frequencies. The observed frequency distribution is well nigh as skew as it can be and represents in fact a one-sided curve, and should therefore, if the statement by Mr. Davis is correct, show an absolute defiance to a graduation by the Gram-Charlier series.

The process we shall use in the attempted mathematical representation of the above series is a combination of the method of semi-invariants and the method of least squares.
Following Thiele's advice we determine the first two semi-invariants in the generating function directly from the observations, while the coefficients of this function and its derivatives are determined by the least square method.

Choosing the provisional origin at 5, we obtain the following values for the crude moments:

$$s_0 = 222,\; s_1 = 140,\; s_2 = 292,\; s_3 = 806,\; s_4 = 2{,}752,\; s_5 = 10{,}790,\; s_6 = 46{,}072,\; s_7 = 207{,}226,$$

from which we find that

$$\lambda_0 = 1,\; \lambda_1 = 0.631,\; \lambda_2 = 0.917,\; \lambda_3 = 1.644,\; \lambda_4 = 3.377,\; \lambda_5 = 5.972,\; \lambda_6 = -2.911,\; \lambda_7 = 122.638.$$

All these semi-invariants with the exception of the first two are, however, so greatly influenced by random sampling in the small observation series that it is hopeless to use them in the determination of the constants in the Gram-Charlier series. In fact an actual calculation does not give a very good result beyond that of a first rough approximation. The generating function, on the other hand, may be expressed by the aid of the first two semi-invariants as follows:

$$\varphi_0(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2 : 2},$$

where $z$ is given by the linear transformation

$$z = (x - 0.631) : 0.9576 \qquad (\sqrt{\lambda_2} = 0.9576).$$

We now propose to express the observed function $F(x)$ or $\varphi(z)$ by a Gram-Charlier series of the form:

$$F(x) = \varphi(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z).$$

In this equation we know the values of the generating function and its derivatives for various values of the variate $z$ as found in the tables of Jørgensen and Charlier, while the quantities $k$ are unknowns. On the other hand we know 6 specific values of $F(x)$ as directly observed in de Vries's observation series. We are thus dealing with a system of typical linear observation equations of the forms described in paragraphs 15 and 16, which lend themselves so admirably to the treatment by the method of least squares. From the above linear relation between $x$ and $z$ we can directly compute the following table for the transformed variate $z$.
x        z
0     -0.688
1     +0.402
2     +1.493
3     +2.583
4     +3.674
5     +4.764

The numerical values of $\varphi_0(z)$ and its derivatives corresponding to the above values of $z$ can be taken directly from the standard tables of Jørgensen and Charlier. We may therefore write down the following observation equations:

$$\begin{array}{rrrl}
\varphi_0 \quad & \varphi_3 \quad & \varphi_4 \quad & \\
.3148k_0 & - .5472k_3 & + .1207k_4 & - 133 = 0 \\
.3679k_0 & + .4198k_3 & + .7566k_4 & - 55 = 0 \\
.1308k_0 & + .1506k_3 & - .7073k_4 & - 23 = 0 \\
.0145k_0 & - .1346k_3 & + .1062k_4 & - 7 = 0 \\
.0005k_0 & - .0180k_3 & + .0486k_4 & - 2 = 0 \\
.0001k_0 & - .0005k_3 & + .0020k_4 & - 2 = 0
\end{array}$$

for which we now propose to determine the unknown values of $k$ by the least square method.

While this method may of course be applied directly to the above data, it will generally be found of advantage to start with some approximate values of the $k$'s. It is found in practice that this approximate step saves considerable labour in the formation and ultimate solution of the normal equations.

Although the first approximation in the case of numerous unknowns must be in the nature of a more or less shrewd guess, a facility which can only be attained by constant practice in routine mathematical computing, we are, however, in this specific instance able to tell something about the nature of the coefficients from purely a priori considerations. We know for instance from the form of the Gram-Charlier series that the coefficient $k_0$ of the generating function must be nearly equal to the area of the curve, which in this particular instance is 222. Moreover, a mere glance at the observed series tells us that it has a decidedly large skewness in negative direction from the mean coupled with a tendency of being "top heavy", indicating positive excess. We can therefore assume as a first approximation that the coefficients of the derivatives of uneven order are negative and the coefficients of derivatives of even order are positive.
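The coefficients of the observation equations need not be copied from printed tables; they can be recomputed from the closed forms of the derivatives of the normal function. The sketch below assumes the usual expressions $\varphi_3(z) = (3z - z^3)\varphi_0(z)$ and $\varphi_4(z) = (z^4 - 6z^2 + 3)\varphi_0(z)$, and the function names are chosen for the illustration only.

```python
import math


def phi0(z):
    """Normal generating function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def phi3(z):
    """Third derivative of phi0 with respect to z."""
    return (3.0 * z - z ** 3) * phi0(z)


def phi4(z):
    """Fourth derivative of phi0 with respect to z."""
    return (z ** 4 - 6.0 * z * z + 3.0) * phi0(z)


# Transformed variate and observed frequencies of the de Vries series.
z_values = [-0.688, 0.402, 1.493, 2.583, 3.674, 4.764]
observed = [133, 55, 23, 7, 2, 2]

# One observation equation per row: c0*k0 + c3*k3 + c4*k4 - o = 0.
rows = [(phi0(z), phi3(z), phi4(z), o) for z, o in zip(z_values, observed)]
for c0, c3, c4, o in rows:
    print(f"{c0:7.4f} k0 {c3:+8.4f} k3 {c4:+8.4f} k4 {-o:+5d} = 0")
```

The recomputed coefficients agree with the tabulated ones to about three decimals; small differences in the last place come from table rounding.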
From such purely common sense a priori considerations we therefore guess the following first approximations, viz.:

$$k_0' = 222, \qquad k_3' = -25, \qquad k_4' = 30.$$

The probable values of the various $k$'s may be written as $k_i = r_i k_i'$ for $i = 0, 3, 4$, and our problem is therefore to find the correction factor $r_i$ with which the approximate value $k_i'$ must be multiplied so as to give $k_i$.

Applying the various values of $k_i'$ to the original observation equations on page 64 we obtain the following schedule for the numerical factors of $r$:

    a        b        c         o         s
  69.9    +13.7     +3.6    -133.0     -45.8
  81.7    -10.5    +22.7     -55.0     +38.9
  29.1     -3.8    -21.2     -23.0     -18.9
   3.3     +3.4     +3.2      -7.0      +2.9
   0.1     +0.5     +1.5      -2.0      +0.1
   0.0     +0.0     +0.0      -2.0      -2.0
 184.1     +3.3     +9.8    -222.0     -24.8

where the additional control column $s$ serves as a check.

The subsequent formation of the various sum-products and normal equations is shown in the following schedules together with the $s$ columns as a check.

     aa        ab        ac         ao        as
  +4,886     +958      +252     -9,297    -3,201
  +6,675     -858    +1,855     -4,494    +3,178
    +847     -111      -617       -669      -550
     +11      +11       +11        -23       +10
      +0       +0        +0         -0        +0
      +0       +0        +0         -0        +0
 +12,419       +0    +1,501    -14,483      -563

     bb        bc        bo        bs
   +188       +49    -1,822      -628
   +110      -238      +578      -408
    +14       +81       +87       +72
    +12       +11       -24       +10
     +0        +0        -1        +0
     +0        +0        -0        +0
   +324       -96    -1,182      -954

     cc        co        cs
    +13      -479      -165
   +515    -1,249      +883
   +449      +488      +401
    +10       -22        +9
     +2        -3        +1
     +0        +0        +0
   +989    -1,265    +1,129

We may now write the normal equations in schedule form as follows:

ORIGINAL NORMAL EQUATIONS
(a)   +12,419        +0      +1501     -14483
(1)                  +0         +0         -0
(b)                +324        -96      -1182
(2)                           +181      -1750
(c)                           +989      -1265
(3)             +.00000    +.12086   -1.16617

The sum-products from the observation equations are shown in the rows marked (a), (b), (c). The row marked (3) and printed in italics is formed by dividing each of the figures in row (a) by 12,419. The row marked (1) contains the products of the figures in row (a) multiplied by the factor .00000.
All these products happen in this case to be equal to zero. Row (2) is the products of the factor 0.12086 and the figures in row (a). We next subtract row (1) from row (b), and row (2) from row (c), which results in the following schedule, known as the first reduction equations.

FIRST REDUCTION EQUATIONS
(a)      +324        -96      -1182
(1)                  +28       +350
(b)                 +808       +485
(2)              -.29626   -3.64814

The above equations are treated in a similar manner as the original normal equations, and we have therefore the second reduction equation of the form:

SECOND REDUCTION EQUATION
        +780       +135

The solution for the unknown $r$'s may now be shown as follows:

$$r_4 = -135 : 780 = -.17308$$
$$r_3 = 3.64814 - (-.29626)(-.17308) = 3.59637$$
$$r_0 = 1.16617 - (0.0)(3.59637) - (.12086)(-.17308) = 1.18709.$$

From which we find:

$$k_0 = 263.5, \qquad k_3 = -89.9, \qquad k_4 = -5.1.$$

Applying these factors to the values of $\varphi_0(z)$, $\varphi_3(z)$ and $\varphi_4(z)$ we obtain the following result:¹

 k0 φ0     k3 φ3     k4 φ4      Σ kφ      Obs.
  82.9     +49.2      -0.6     131.5      133
  96.9     -37.7      -3.9      55.3       55
  34.5     -13.5      +3.6      24.6       23
   3.8     +12.1      -0.5      15.4        7
   0.1      +1.0      -0.2       0.9        2
   0.0      +0.0      -0.0       0.0        2

¹ For a closer approximation see my Mathematical Theory of Probabilities (Second Edition, New York, 1921).

18. TRANSFORMATION OF THE VARIATE. While it is always possible to express all frequency curves by an expansion in Hermite polynomials, the numerical labor when carried on by the method of least squares often involves a large amount of arithmetical work if we wish to retain more than four or five terms of the series. Other methods lessening the arithmetical work and making the actual calculations comparatively simple have been offered by several authors, notably by Thiele, who in his works discusses several such methods. Among those we may mention the method of the so-called free functions and orthogonal substitution, the method of correlates and the adjustment by elements.
The chapters on these methods in Thiele's work are among some of the most important, but also some of the most difficult, in the whole theory of observations and have not always been understood and appreciated by the mathematicians, chiefly on account of Thiele's peculiar style of writing. A close study of the Danish scholar's investigations is, however, well worth while, and Thiele's work along these lines may still in the future become as epochmaking in the theory of probability as some of the researches of the great Laplace.

The theory of infinite determinants as used by M. Fredholm in the solution of integral equations is another powerful tool which offers great advantages in the way of rapid calculation. All these methods require, however, that the student be thoroughly familiar with the difficult theory upon which such methods rest, and they have for this reason been omitted in an elementary work such as the present treatise.

We wish, however, to mention another method which in the majority of cases will make it possible to employ the Gram or Laplacean-Charlier curves in cases with extreme skewness or excess. We have here reference to the method of logarithmic transformation of the variate, $x$.

19. THE GENERAL TRANSFORMATION. One of the simplest transformations is the previously mentioned linear transformation of the form

$$z = f(x) = ax + b,$$

by which we can make two constants, $c_1$ and $c_2$, vanish. Other transformations suggest themselves, however, such as $f(x) = ax^2 + bx + c$, $f(x) = \sqrt{x}$, $f(x) = \log x$ and so forth. For this reason I propose to give a brief development of the general method of transformations of the statistical variates, mainly following the methods of Charlier and Jørgensen.

Stated in its most general form our problem is: If a frequency curve of a certain variate is given by $F(x)$, what will be the frequency curve of a certain function of $x$, say $f(x)$?
The equation of the frequency curve is $y = F(x)$, which means that $F(x)\,dx$ is the probability that $x$ falls in the interval between $x - \frac{1}{2}dx$ and $x + \frac{1}{2}dx$. The probability that a new variate $z$ after the transformation $z = f(x)$, or $x = x(z)$, falls in the interval between $z - \frac{1}{2}dz$ and $z + \frac{1}{2}dz$ is therefore simply

$$F[x(z)]\,x'(z)\,dz = F(x)\,dx,$$

which gives in symbolic form the equation of the transformed frequency curve.

The frequency for $z = f(x)$ is of course the same as for $x$. The ordinates of the frequency curve, or rather the areas between corresponding ordinates, are therefore not changed, but the abscissa axis is replaced by $f(x)$. Equidistant intervals of $x$ will therefore not as a rule (except in the linear transformation) correspond to equidistant intervals of $f(x)$.

If, for instance, the frequency curve $F(x)$ is the Laplacean normal curve

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2 : 2\sigma^2},$$

and if we let $z = f(x) = x^2$ or $x = \sqrt{z}$, we have evidently

$$F(z) = \frac{1}{\sigma\sqrt{2\pi}} \cdot \frac{1}{2\sqrt{z}}\, e^{-z : 2\sigma^2}.$$

20. LOGARITHMIC TRANSFORMATION. Of the various transformations the logarithmic is of special importance. It happens that even if the variate $x$ forms an extremely skew frequency distribution its logarithms will be nearly normally distributed. This fact was already noted by the eminent German psychologist, Fechner, and also mentioned by Bruhns in his Kollektivmasslehre. But neither Fechner nor Bruhns has given a satisfactory theoretical explanation of the transformation, and both have limited themselves to using it as a practical rule of thumb.

Thiele discusses the method under his adjustment by elements, but in a rather brief manner. The first satisfactory theory of logarithmic transformation seems to have been given first by Jørgensen and later on by Wicksell.¹

¹ The law of errors, leading to the geometric mean as the most probable value of the variate, as discovered by Prof. Dr. Th. N. Thiele in 1867, may, however, be considered as a forerunner of Jørgensen's work.

Jørgensen
first begins with the transformation of the normal Laplacean frequency curve. Letting $z = \log x$ and bearing in mind that the frequency of $x$ equals that of $\log x$ we have

$$z = f(x) = \log x, \quad \text{or} \quad x = x(z) = e^z \quad \text{and} \quad dx = e^z\,dz.$$

The continuous power sums or moments of the $r$th order around the lower limit take on the form

$$M_r = \frac{N}{n\sqrt{2\pi}} \int_0^{\infty} x^r e^{-\frac{1}{2}\left(\frac{\log x - m}{n}\right)^2} dx = \frac{N}{n\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{rz}\, e^{-\frac{1}{2}\left(\frac{z - m}{n}\right)^2} e^z\,dz,$$

on the assumption that $\log x$ is normally distributed.

The lower limit is zero in the integral over $x$ and $-\infty$ in the integral over $z$ simply because the logarithm of zero equals minus infinity, so that the point $-\infty$ is by the transformation moved up to zero. By a straightforward transformation we may write the above integral as

$$M_r = \frac{N}{\sqrt{2\pi}}\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}t^2}\,dt = N e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}.$$

Changing from moments to semi-invariants by means of the well-known relations

$$\lambda_0 = M_0$$
$$\lambda_1 = M_1 : M_0$$
$$\lambda_2 = (M_2 M_0 - M_1^2) : M_0^2$$
$$\lambda_3 = (M_3 M_0^2 - 3M_2 M_1 M_0 + 2M_1^3) : M_0^3$$
$$\lambda_4 = (M_4 M_0^3 - 4M_3 M_1 M_0^2 - 3M_2^2 M_0^2 + 12 M_2 M_1^2 M_0 - 6M_1^4) : M_0^4$$

we have

$$\lambda_0 = N e^{m + \frac{1}{2}n^2}$$
$$\lambda_1 = e^{m + 1.5n^2}$$
$$\lambda_2 = e^{2m + 3n^2}(e^{n^2} - 1)$$
$$\lambda_3 = e^{3m + 4.5n^2}(e^{3n^2} - 3e^{n^2} + 2)$$
$$\lambda_4 = e^{4m + 6n^2}(e^{6n^2} - 4e^{3n^2} - 3e^{2n^2} + 12e^{n^2} - 6).$$

These equations give the semi-invariants expressed in terms of $m$ and $n$. On the other hand, if we know the semi-invariants from statistical data or are able to determine these semi-invariants by a priori reasoning, we may find the parameters $m$ and $n$.

21. THE MATHEMATICAL ZERO. A point which we must bear in mind is that the above semi-invariants, on account of the transformation, are calculated around a zero point which corresponds to a fixed lower limit of the observations.

Very often the observations themselves indicate such a lower limit beyond which the frequencies of the variate vanish. In the case of persons engaged in factory work there is in most countries a well-defined legal age limit below which it is illegal to employ persons for work.
Another example is offered in the number of alpha particles radiated from certain radioactive metals. Since the number of particles radiated in a certain interval of time must either be zero or a whole positive number, it is evident that $-1$ must be the lower limit because we can have no negative radiations. Analogous limits exist in the age limit for divorces and in the amount of moneys assessed in the way of income tax.

The lower limit allows, however, of a more exact mathematical determination by means of the following simple considerations. It is evident that this lower limit must fall below the mean value of the frequency curve. Let us suppose that it is located at a point, $\alpha$, situated say $\eta$ units in negative direction from the mean, $M = \lambda_1$, and let us to begin with select $\lambda_1$ as the origin of the coordinate system, in which case the first semi-invariant, $\lambda_1$, is equal to zero. Transferring the origin to $\alpha$ the first semi-invariant equals $\eta$, while the semi-invariants of higher order remain the same as before the transformation, and we have:

$$\lambda_1 - \alpha = \eta = e^{m + 1.5n^2}$$
$$\lambda_2 = \eta^2(e^{n^2} - 1) \quad \text{or} \quad e^{n^2} = 1 + \lambda_2 : \eta^2$$
$$\lambda_3 = \eta^3\left(\frac{\lambda_2^3}{\eta^6} + \frac{3\lambda_2^2}{\eta^4}\right),$$

which reduces to

$$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 - \lambda_2^3 = 0.$$

The solution of this cubic equation, which has one real and two imaginary roots, gives us the value of $\eta$ or $\lambda_1 - \alpha$ and thus determines the mathematical zero or lower limit. We have in fact:

$$n^2 = \log(1 + \lambda_2 : \eta^2) \quad \text{and} \quad m = \log\eta - 1.5n^2,$$

while

$$N = \lambda_0 : e^{m + \frac{1}{2}n^2}.$$

22. LOGARITHMICALLY TRANSFORMED FREQUENCY SERIES. We have already shown that the generalized frequency curve could be written as

$$F(x) = c_0\varphi_0(x) - \frac{c_1\varphi_1(x)}{1!} + \frac{c_2\varphi_2(x)}{2!} - \frac{c_3\varphi_3(x)}{3!} + \ldots$$

where the Laplacean probability function

$$\varphi_0(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - M)^2 : 2\sigma^2}$$

is the generating function with $M$ and $\sigma$ as its parameters.

The suggestion now immediately arises to use an analogous series in the case of the logarithmic transformation.
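The relations of paragraph 21 between the mathematical zero and the parameters $m$ and $n$ can be verified numerically. The sketch below is illustrative only: it starts from assumed values of $m$ and $n$, forms $\eta$, $\lambda_2$ and $\lambda_3$ (taking $\lambda_3 = \eta^3(e^{3n^2} - 3e^{n^2} + 2)$, which follows from the moment formula $M_r = N e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}$), and then checks that the cubic is satisfied and that $m$ and $n$ are recovered. All function names are hypothetical.

```python
import math


def semi_invariants_about_zero(m, n):
    """Return eta, lambda_2 and lambda_3 for given parameters m and n."""
    u = math.exp(n * n)
    eta = math.exp(m + 1.5 * n * n)
    lam2 = eta ** 2 * (u - 1.0)
    lam3 = eta ** 3 * (u ** 3 - 3.0 * u + 2.0)
    return eta, lam2, lam3


def parameters_from_zero(eta, lam2):
    """Invert the relations: n^2 = log(1 + lam2/eta^2), m = log(eta) - 1.5 n^2."""
    n2 = math.log(1.0 + lam2 / eta ** 2)
    m = math.log(eta) - 1.5 * n2
    return m, math.sqrt(n2)


m, n = 2.0, 0.25   # illustrative values, not taken from any table
eta, lam2, lam3 = semi_invariants_about_zero(m, n)

# Left-hand side of the cubic; it should vanish up to rounding error.
residual = lam3 * eta ** 3 - 3.0 * lam2 ** 2 * eta ** 2 - lam2 ** 3
m_back, n_back = parameters_from_zero(eta, lam2)
print(residual, m_back, n_back)
```

In practice the direction is reversed: $\lambda_2$ and $\lambda_3$ come from the data, the cubic is solved for $\eta$, and `parameters_from_zero` then yields $m$ and $n$.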
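In this case the frequency curve, $F(x)$, with a lower limit would be expressed as follows:

$$F(x) = k_0\Phi_0(x) - \frac{k_1\Phi_1(x)}{1!} + \frac{k_2\Phi_2(x)}{2!} - \frac{k_3\Phi_3(x)}{3!} + \ldots$$

while the generating function now is

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\log x - m}{n}\right)^2},$$

where $m$ and $n$ are the parameters.¹

¹ Here $n!$ denotes the factorial $1 \cdot 2 \cdot 3 \cdots n$.

Using the usual definition of semi-invariants we then have

$$\lambda_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \ldots = \int_0^{\infty} e^{x\omega} F(x)\,dx.$$

The general term in the integral on the right hand side is of the form

$$\frac{(-1)^s k_s}{s!} \int_0^{\infty} e^{x\omega}\Phi_s(x)\,dx,$$

where the integral may be evaluated by partial integration as follows:

$$\int_0^{\infty} e^{x\omega}\Phi_s(x)\,dx = \left[e^{x\omega}\Phi_{s-1}(x)\right]_0^{\infty} - \omega\int_0^{\infty} e^{x\omega}\Phi_{s-1}(x)\,dx.$$

Since both $\Phi_0(x)$ and all its derivatives are supposed to vanish for $x = 0$ and $x = \infty$, the first term to the right becomes zero and

$$\int_0^{\infty} e^{x\omega}\Phi_s(x)\,dx = -\omega\int_0^{\infty} e^{x\omega}\Phi_{s-1}(x)\,dx.$$

By successive integrations we then obtain the following recursion formula:

$$(-\omega)^1\int_0^{\infty} e^{x\omega}\Phi_{s-1}(x)\,dx = (-\omega)^2\int_0^{\infty} e^{x\omega}\Phi_{s-2}(x)\,dx$$
$$(-\omega)^2\int_0^{\infty} e^{x\omega}\Phi_{s-2}(x)\,dx = (-\omega)^3\int_0^{\infty} e^{x\omega}\Phi_{s-3}(x)\,dx$$
$$\ldots\ldots\ldots$$
$$(-\omega)^{s-1}\int_0^{\infty} e^{x\omega}\Phi_1(x)\,dx = (-\omega)^s\int_0^{\infty} e^{x\omega}\Phi_0(x)\,dx.$$

Or finally

$$\int_0^{\infty} e^{x\omega}\Phi_s(x)\,dx = (-\omega)^s\int_0^{\infty} e^{x\omega}\Phi_0(x)\,dx.$$

Expanding $e^{x\omega}$ in a power series we have

$$\int_0^{\infty} e^{x\omega}\Phi_s(x)\,dx = \frac{(-\omega)^s}{n\sqrt{2\pi}} \int_0^{\infty} \left[1 + x\omega + \frac{x^2\omega^2}{2!} + \frac{x^3\omega^3}{3!} + \ldots\right] e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2} dx.$$

The general term in this expansion is of the form

$$\frac{(-\omega)^s\,\omega^r}{n\sqrt{2\pi}\; r!} \int_0^{\infty} x^r e^{-\frac{1}{2}\left[\frac{\log x - m}{n}\right]^2} dx,$$

which according to the formulas given on page 74 reduces to

$$(-\omega)^s\, \frac{\omega^r}{r!}\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}.$$

Hence we may write

$$\int_0^{\infty} e^{x\omega}\Phi_s(x)\,dx = (-\omega)^s \sum_{r=0}^{\infty} \frac{\omega^r}{r!}\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}.$$

Consequently the relation between the semi-invariants and the frequency function

$$F(x) = k_0\Phi_0(x) - \frac{k_1}{1!}\Phi_1(x) + \frac{k_2}{2!}\Phi_2(x) - \frac{k_3}{3!}\Phi_3(x) + \ldots$$

can be expressed by the following recursion formula:

$$\lambda_0\, e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \ldots} = s_0 + \frac{s_1\omega}{1!} + \frac{s_2\omega^2}{2!} + \frac{s_3\omega^3}{3!} + \ldots = \sum_{v=0}^{\infty} \frac{s_v\,\omega^v}{v!} = \sum_{s=0}^{\infty}\sum_{r=0}^{\infty} \frac{k_s}{s!}\,\frac{\omega^{s+r}}{r!}\, e^{m(r+1) + \frac{1}{2}n^2(r+1)^2}.$$

The constants $k$ are here expressed in terms of the unadjusted moments or power sums, $s$.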
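Equating the coefficients of $\omega^v$ on both sides of the recursion formula gives $s_v = \sum_{j=0}^{v} \binom{v}{j} k_j\, e^{m(v-j+1) + \frac{1}{2}n^2(v-j+1)^2}$, a triangular system that can be solved for the $k$'s one after another. The Python sketch below illustrates this reading of the formula; the function names and the trial values of $k$, $m$ and $n$ are assumptions made for the example.

```python
import math


def e_term(r, m, n):
    """exp(m(r+1) + n^2 (r+1)^2 / 2): r-th power sum of the generating function."""
    return math.exp(m * (r + 1) + 0.5 * n * n * (r + 1) ** 2)


def power_sums_from_k(k, m, n):
    """Power sums s_0, s_1, ... implied by the coefficients k_0, k_1, ..."""
    return [sum(math.comb(v, j) * k[j] * e_term(v - j, m, n)
                for j in range(v + 1))
            for v in range(len(k))]


def k_from_power_sums(s, m, n):
    """Solve the triangular system for k_0, k_1, ... in turn."""
    k = []
    for v, s_v in enumerate(s):
        known = sum(math.comb(v, j) * k[j] * e_term(v - j, m, n)
                    for j in range(v))
        k.append((s_v - known) / e_term(0, m, n))
    return k


# Round trip with illustrative numbers (not taken from the text).
k_true = [2.0, 0.0, 0.0, -0.3, 0.1]
m, n = 0.2, 0.3
s = power_sums_from_k(k_true, m, n)
k_back = k_from_power_sums(s, m, n)
print(s)
print(k_back)
```

Because each $s_v$ involves only $k_0, \ldots, k_v$, no simultaneous solution is needed; the coefficients fall out by forward substitution.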
It is readily seen that the Sheppard corrections for adjusted moments, $M$, also apply in this case.

We are, therefore, able to write down the values of the $k$'s from the above recursion formula in the following manner:

$$M_0 = k_0 e^{m + \frac{1}{2}n^2}$$
$$M_1 = k_1 e^{m + \frac{1}{2}n^2} + k_0 e^{2m + 2n^2}$$
$$M_2 = k_2 e^{m + \frac{1}{2}n^2} + 2k_1 e^{2m + 2n^2} + k_0 e^{3m + 4.5n^2}$$
$$M_3 = k_3 e^{m + \frac{1}{2}n^2} + 3k_2 e^{2m + 2n^2} + 3k_1 e^{3m + 4.5n^2} + k_0 e^{4m + 8n^2}$$
$$M_4 = k_4 e^{m + \frac{1}{2}n^2} + 4k_3 e^{2m + 2n^2} + 6k_2 e^{3m + 4.5n^2} + 4k_1 e^{4m + 8n^2} + k_0 e^{5m + 12.5n^2}$$

It is easy to see that it is not possible to determine the generating function's parameters $m$ and $n$ from the observations. These parameters, like $M$ and $\sigma$ in the case of the Laplacean normal probability curve, must be chosen arbitrarily. If $m$ and $n$ are selected so as to make $k_1$ and $k_2$ vanish we have

$$M_0 = k_0 e^{m + \frac{1}{2}n^2}, \qquad M_1 = k_0 e^{2m + 2n^2}, \qquad M_2 = k_0 e^{3m + 4.5n^2},$$

the solution of which gives

$$e^{n^2} = \frac{M_0 M_2}{M_1^2}, \qquad e^{2m} = \frac{M_1^8}{M_0^5 M_2^3},$$

while

$$k_4 e^{m + \frac{1}{2}n^2} = M_4 - 4M_3 e^{m + 1.5n^2} - M_0 e^{4m + 9n^2}(e^{3n^2} - 4).$$

This theory requires the computation of a set of tables of the generating function

$$\Phi_0(x) = \frac{1}{n\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\log x - m}{n}\right)^2}$$

and its derivatives. For $\Phi_0(x)$ itself we may of course use the ordinary tables for the normal curve $\varphi_0(z)$ when we consider

$$z = \frac{\log x - m}{n}.$$

I have calculated a set of tables of the derivatives of $\Phi_0(x)$ and hope to be able to publish the manuscript thereof in the second volume of my treatise on "The Mathematical Theory of Probabilities".

23. PARAMETERS AND LEAST SQUARES. The above development is based upon the theory of functions and the theory of definite integrals. We shall now see how the same problem may be attacked by the method of least squares after we have determined by the usual method of moments the values of $m$ and $n$ in the generating function $\Phi_0(x)$.
Viewed from this point of vantage our problem may be stated as follows: Given an arbitrary frequency distribution of the variate $z$ with $z = (\log x - m) : n$, where $x$ is reckoned from a zero point or origin, $\alpha$, which is situated $\eta$ units below the mean and defined by the relation

$$\eta^3\lambda_3 - 3\eta^2\lambda_2^2 = \lambda_2^3, \qquad \text{where } \alpha = \lambda_1 - \eta;$$

to develop $F(z)$ into a frequency series of the form

$$F(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z) + \ldots + k_n\varphi_n(z),$$

where the $k$'s must be determined in such a way that the expression

$$\sum_{i} k_i\varphi_i(z)$$

gives the best approximation to $F(z)$ in the sense of the method of least squares.

Stated in this form the frequency function is reduced to the ordinary series of Gram or the A type of the Charlier series, already treated in the earlier chapters.

24. APPLICATION TO A MORTALITY TABLE. As an illustration of the theory to a practical problem we present the following frequency distribution by 5-year age intervals of the number of deaths (or $\Sigma d_x$ by quinquennial grouping) in the recently published American-Canadian Mortality of Healthy Males, based on a radix of 100,000 entrants at age 15.

Frequency Distribution of Deaths by Attained Ages in American-Canadian Mortality Table.

  Ages       Σdx      1st Component   2d Comp.
 15- 19     1,801          120          1,681
 20- 24     1,996          230          1,766
 25- 29     2,089          440          1,649
 30- 34     2,120          790          1,330
 35- 39     2,341        1,370            971
 40- 44     2,911        2,270            641
 45- 49     3,937        3,570            367
 50- 54     5,527        5,400            127
 55- 59     7,723        7,722              1
 60- 64    10,383       10,383
 65- 69    12,987       12,987
 70- 74    14,535       14,535
 75- 79    13,807       13,807
 80- 84    10,328       10,328
 85- 89     5,464        5,464
 90- 94     1,757        1,757
 95- 99       278          278
100-104        16           16
          100,000       91,467          8,533

The curve represented by the $d_x$ column is evidently a composite frequency function compounded of several series. From a purely mathematical point of view the compound curve may be considered as being generated in an infinite number of ways as the summation of separate component frequency curves.
From the point of view of a practical graduation it is, however, easy to break this compound death curve up into two separate components. A mere glance at the $d_x$ curve itself suggests a major skew frequency curve with a maximum point somewhere in the age interval from 70 to 75 and a minor curve (practically one-sided) for the younger ages. Let us therefore break the $\Sigma d_x$ column up into the two so far perfectly arbitrary parts as shown in the above table and then try to fit those two distributions to logarithmically transformed A curves.

Starting with the first component, the straightforward computation of the semi-invariants is given in the table below with the provisional mean chosen at age 67.

Frequency Distribution of Deaths in American Mortality Table, First Component.

Ages        x      F(x)      xF(x)     x²F(x)      x³F(x)
104-100    -7        16        112        784       5,488
99-95      -6       278      1,668     10,008      60,048
94-90      -5     1,757      8,785     43,925     219,625
89-85      -4     5,464     21,856     87,424     349,696
84-80      -3    10,328     30,984     92,952     278,856
79-75      -2    13,807     27,614     55,228     110,456
74-70      -1    14,535     14,535     14,535      14,535
69-65       0    12,987
(subtotal)       59,172    105,554    304,856   1,038,704
64-60      +1    10,383     10,383     10,383      10,383
59-55      +2     7,723     15,446     30,892      61,784
54-50      +3     5,400     16,200     48,600     145,800
49-45      +4     3,570     14,280     57,120     228,480
44-40      +5     2,270     11,350     56,750     283,750
39-35      +6     1,370      8,220     49,320     295,920
34-30      +7       790      5,530     38,710     270,970
29-25      +8       440      3,520     28,160     225,280
24-20      +9       230      2,070     18,630     167,670
19-15     +10       120      1,200     12,000     120,000
(subtotal)       32,296     88,199    350,565   1,810,037
Sum              91,468    -17,355    655,421     771,333

Computing the semi-invariants by means of the usual formulas in paragraph 13, we have:

$$\lambda_1 = -17355 : 91468 = -0.18974,$$

or mean at age 67 + 5(0.19), or at age 67.95;

$$\lambda_2 = 655421 : 91468 - \lambda_1^2 = 7.1296,$$
$$\lambda_3 = 771333 : 91468 - 3\lambda_1\,(655421 : 91468) + 2\lambda_1^3 = 12.4981.$$

In order to determine the mathematical zero or the origin we have to solve the following cubic:

$$\lambda_3\eta^3 - 3\lambda_2^2\eta^2 = \lambda_2^3, \quad \text{or} \quad 12.498\,\eta^3 - 152.511\,\eta^2 = 362.47,$$

the positive root of which is equal to 12.39. The zero point is therefore found to be situated 12.39 5-year units from the mean, or at age 67.95 + 5(12.39), i.e. very nearly at age 130, which we henceforth shall select as the origin of the coordinate system of the first component.

We have furthermore

$$12.39 = e^{m+1.5n^2}, \quad \text{and} \quad 7.1296 = e^{2m+3n^2}\left(e^{n^2} - 1\right) = (12.39)^2\left(e^{n^2} - 1\right),$$

the solution of which gives

$$n^2 = 0.04436, \qquad n = 0.2106, \qquad m = 2.4504,$$

all on the basis of a 5-year interval as unit. If we wish to change to a single calendar year unit we must add the natural logarithm of 5, or 1.6094, to the above value of m, which gives us m = 4.0598, while n remains the same.

The above computations furnish us with the necessary material for the logarithmic transformation of the variate x, which now may be written as

$$z = [\log(130 - x) - 4.0598] : 0.2106,$$

where x is the original variate or the age at death.

Having thus accomplished the logarithmic transformation we may henceforth write the generating function as

$$\Phi_0(x) = \frac{1}{0.2106\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{\log(130 - x) - 4.0598}{0.2106}\right]^2} = \varphi_0(z).$$

We now express F(x) by the following equation:

$$F(x) = k_0\Phi_0(x) + k_3\Phi_3(x) + k_4\Phi_4(x) + \dots$$

or in terms of the transformed z:

$$F(z) = k_0\varphi_0(z) + k_3\varphi_3(z) + k_4\varphi_4(z) + \dots,$$

and proceed to determine the numerical values of k by the method of least squares.

The numerical calculation required by this method follows precisely along the same lines as described in paragraph 17. I shall for this reason not reproduce these calculations but limit myself to quoting the final results for the various coefficients k, which are as follows:¹

$$k_0 = 7361.8; \qquad k_3 = -212.2; \qquad k_4 = -9.6.$$

¹ Interested readers may consult the detailed computations on pages 246-257 in my Mathematical Theory of Probabilities (2nd Edition, New York, 1921).
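The semi-invariants and the position of the origin obtained earlier in this paragraph can be re-derived from the first-component column alone. The following is only a minimal sketch in Python: the function names are illustrative and not from the text, bisection is a swapped-in root finder rather than whatever method was used to solve the cubic by hand, and x is taken to increase toward the younger ages as in the table.

```python
# Sketch: semi-invariants of the first component and the root of the cubic
# lam3*eta^3 - 3*lam2^2*eta^2 = lam2^3. All names are illustrative.

# Deviations x in 5-year units from the provisional mean (age 67), listed
# from ages 104-100 (x = -7) down to ages 19-15 (x = +10), with F(x).
xs = list(range(-7, 11))
freq = [16, 278, 1757, 5464, 10328, 13807, 14535, 12987,
        10383, 7723, 5400, 3570, 2270, 1370, 790, 440, 230, 120]


def semi_invariants(xs, freq):
    """Return the first three semi-invariants from the raw power sums."""
    total = sum(freq)
    s1 = sum(x * f for x, f in zip(xs, freq)) / total
    s2 = sum(x ** 2 * f for x, f in zip(xs, freq)) / total
    s3 = sum(x ** 3 * f for x, f in zip(xs, freq)) / total
    return s1, s2 - s1 ** 2, s3 - 3 * s1 * s2 + 2 * s1 ** 3


def positive_root(lam2, lam3, tol=1e-10):
    """Bisection for the positive root (assumes lam2 > 0 and lam3 > 0)."""
    def f(eta):
        return lam3 * eta ** 3 - 3 * lam2 ** 2 * eta ** 2 - lam2 ** 3

    lo = 3 * lam2 ** 2 / lam3      # here f(lo) = -lam2^3 < 0
    hi = 2 * lo
    while f(hi) <= 0:              # widen until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


lam1, lam2, lam3 = semi_invariants(xs, freq)
eta = positive_root(lam2, lam3)
mean_age = 67 - 5 * lam1           # negative x means older than age 67
origin_age = mean_age + 5 * eta    # origin lies eta units above the mean age
print(round(lam1, 5), round(lam2, 4), round(lam3, 4))
print(round(eta, 2), round(origin_age, 1))
```

The sketch stops at the origin and leaves the values of m and n as quoted in the text.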
The final equation of the frequency curve of the first component, $F_I(x)$, is therefore:

$$F_I(x) = 7361.8\,\varphi_0(z) - 212.2\,\varphi_3(z) - 9.6\,\varphi_4(z),$$

where the generating function, $\varphi_0(z)$, is of the form:

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.2106\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{\log(130 - x) - 4.0598}{0.2106}\right]^2}.$$

The second component, $F_{II}(x)$, can by means of a similar process be expressed by the equation:

$$F_{II}(x) = 947.4\,\varphi_0(z) - 63.4\,\varphi_3(z) - 30.0\,\varphi_4(z),$$

where

$$\varphi_0(z) = \Phi_0(x) = \frac{1}{0.12\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{\log(x + 68.8) - 4.532}{0.12}\right]^2}.$$

Addition of these two component curves gives us the ultimate compound frequency curve, representing the $d_x$ of the mortality table. A comparison between the observed values of $d_x$ and the values of $d_x$ as computed from the above equation is shown in graphical form in the attached diagram. Evidently the graduation leaves but little to be desired in the way of closeness of fit.

Figure 1. Diagram showing graduation of $d_x$ column in the AM(5) table by a compound frequency curve of the Gram-Charlier types.

25. BIOLOGICAL INTERPRETATION

It appears that the Italian statisticians were the first to break up the $d_x$ curve into a system of five or more component frequency curves, which, however, were all of the normal Laplacean type. Pearson, who in a brilliant essay entitled Chances of Death was the next to attack the problem, employed a system of five skew frequency curves. As early as 1914 I found that from ages above 10 the majority of $d_x$ curves in previously constructed mortality tables could be represented by not more than two skew frequency curves, as shown in the above example of the AM(5) table.
Although all such investigations may be very interesting and useful from the point of view of the actuary, we must, however, not overlook the fact that the breaking up of the compound $d_x$ curve in the manner just described is merely an empirical process pure and simple. While such processes undoubtedly represent very neat methods of graduation, a quite different and more important question is whether mathematical work of this kind allows of a biological interpretation. It is evident that from a mere mathematical point of view we may break up the $d_x$ curve into various component parts in an infinite number of ways. But while such breaking up processes may be extremely interesting as actuarial graduations and exercises in pure mathematics, they have evidently little connection with the underlying biological facts of a mortality table. This aspect of the question has been brought out in a very forcible manner by the eminent American biometrician, Raymond Pearl, in his 1920 Lowell Institute Lectures.

The whole subject would appear in a quite different light if it were possible to give a biological interpretation of the mathematical analysis and to show that the component frequency curves as derived from pure mathematics have a counterpart in actual life. This, I think, would be very difficult, if not impossible, to establish, because it is not mathematics which determines the conduct or behavior of living organisms. One might, however, view the whole problem from the standpoint of the biologist rather than from the standpoint of the mathematician. The problem then is to ascertain whether the observed biological facts as shown in the collected statistical data allow of a mathematical interpretation, rather than to find a biological interpretation and counterpart of previously established empirical formulae.

It is to this important question that I have devoted the entire discussion of the second chapter of this book.
I have proceeded from certain observed biological facts (in this particular instance the statistics on the number of deaths by sex and attained ages from more than 150 causes of death) which represent the natural phenomena under investigation. In order to offer a rational explanation of these facts and to interpret their quantitative relationships, I have adopted as a working hypothesis the supposition that the deaths according to attained age and sex among the survivors of a homogeneous cohort of say 1,000,000 entrants at age 10 tend to cluster around specific ages in such a manner that their frequency distribution by attained ages can be represented by a limited number of sets of Gram-Charlier or Poisson-Charlier frequency curves. On the basis of this hypothesis we can now by simple mathematical deductions construct a mortality table from deaths by sex, age and cause of death and without any information about the lives exposed to risk at various ages. Finally we can verify the ultimate results contained in this final mortality table by working back from the table to the data originally observed.

This procedure is in strict conformity with the model of modern science, which according to Jevons consists of the four processes of observation, hypothesis, deduction and verification. The important factor in this investigation, and one which most actuaries and statisticians fail to grasp, is that I have looked at the whole problem as a biometrician rather than as a mathematician. Mathematics has been employed only as a working tool in the whole process, and the reason that the method has met with success must be sought in concrete biological facts and not in the realm of mathematics.

26. POISSON'S PROBABILITY FUNCTION

In certain statistical series it frequently happens that the semi-invariants of higher order than zero all are equal, or that

$$\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_r = \lambda.$$
We shall for the present limit our discussion to homograde statistical series where the variates always are positive and integral, and where therefore the definition of the semi-invariants is of the form:

$$e^{\frac{\lambda\omega}{1!} + \frac{\lambda\omega^2}{2!} + \frac{\lambda\omega^3}{3!} + \dots}\,\sum\varphi(x) = \sum\varphi(x)e^{x\omega} = \varphi(0)e^{0\omega} + \varphi(1)e^{1\omega} + \varphi(2)e^{2\omega} + \varphi(3)e^{3\omega} + \dots,$$

or

$$e^{\frac{\lambda\omega}{1!} + \frac{\lambda\omega^2}{2!} + \frac{\lambda\omega^3}{3!} + \dots} = e^{-\lambda}e^{\lambda e^{\omega}} = \sum\varphi(x)e^{x\omega}$$

for x = 0, 1, 2, 3, ..., which also can be written as

$$e^{-\lambda}\left(1 + \frac{\lambda e^{\omega}}{1!} + \frac{\lambda^2 e^{2\omega}}{2!} + \dots\right) = \varphi(0) + \varphi(1)e^{\omega} + \varphi(2)e^{2\omega} + \dots$$

The coefficient of $e^{r\omega}$ gives the relative frequency or the probability for the occurrence of x = r, and we find therefore that

$$\varphi(x) = \psi(r) = \frac{e^{-\lambda}\lambda^r}{r!}.$$

This is the famous Poisson Exponential, so called after the French mathematician, Poisson, who first derived this expression in his Recherches sur la probabilité des jugements, but in an entirely different manner than the one we have indicated above.

The Poisson Exponential opens a new way for the treatment of statistical series which possess the attribute that all their semi-invariants of higher order than zero are equal, or nearly equal. It is readily seen that whereas the Laplacean probability function $\varphi(x)$ contains two parameters, $\lambda_1$ and $\sigma$, the probability function of Poisson contains only one parameter, $\lambda$.

27. POISSON-CHARLIER FREQUENCY CURVES

We have already seen in the previous chapters that the Gram-Charlier frequency curve could be written as

$$F(x) = \sum c_i\varphi_i(x) = \sum c_i H_i(x)\varphi_0(x) \quad \text{for } i = 0, 1, 2, 3, \dots,$$

where $\varphi_0(x)$ is the generating Laplacean probability function.

The idea now immediately suggests itself to use a similar method of expansion in the case of the Poisson probability function and to employ this exponential as a generating function in the same manner as the Laplacean function.
We are, however, in the present case of the Poisson exponential dealing with a generating function which so far has been defined for positive integral values only and, therefore, represents a discrete function. For this reason it will be impossible to express the series as the sum-products of the successive derivatives of the generating function and their correlated parameters c. We can, however, in the case of integral variates express the series by means of finite differences and write F(x) as follows:

$$F(x) = c_0\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \dots \qquad (I)$$

where

$$\psi(x) = e^{-m}m^x : x! \quad \text{for } x = 0, 1, 2, 3, \dots,$$

and

$$\Delta\psi(x) = \psi(x) - \psi(x-1),$$
$$\Delta^2\psi(x) = \Delta\psi(x) - \Delta\psi(x-1) = \psi(x) - 2\psi(x-1) + \psi(x-2).$$

The series (I) is known as the Poisson-Charlier frequency series or Charlier's B type of frequency curves.

The semi-invariants of these frequency series are given by the following relation:

$$e^{\frac{\lambda_1\omega}{1!} + \frac{\lambda_2\omega^2}{2!} + \frac{\lambda_3\omega^3}{3!} + \dots} = \sum_x e^{x\omega}F(x).$$

Expanding and equating the coefficients of equal powers of $\omega$ we have:

$$1 = c_0\sum\psi(x), \quad \text{or } c_0 = 1,$$
$$\lambda_1 = \sum x\left(\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \dots\right) \qquad (II)$$
$$\lambda_1^2 + \lambda_2 = \sum x^2\left(\psi(x) + c_1\Delta\psi(x) + c_2\Delta^2\psi(x) + \dots\right)$$

We now have $\sum\psi(x) = 1$, and

$$\sum x\psi(x) = \sum e^{-m}m^x : (x-1)! = m\sum\psi(x-1) = m.$$

We also find from well-known formulas of the calculus of finite differences that¹

$$\sum x^2\psi(x) = m^2 + m, \qquad \sum x\Delta\psi(x) = -1, \qquad \sum x\Delta^2\psi(x) = 0,$$
$$\sum x^2\Delta\psi(x) = -(2m+1), \qquad \sum x^2\Delta^2\psi(x) = 2.$$

¹ These formulas can also be derived from the definition of the semi-invariants and the well-known relations between moments and semi-invariants as given on page 74, when we remember that according to our definition all semi-invariants in the Poisson exponential are equal to m.

Substituting these values in (II) we obtain

$$\lambda_1 = m - c_1,$$
$$\lambda_1^2 + \lambda_2 = m^2 + m - (2m+1)c_1 + 2c_2.$$

By letting $m = \lambda_1$ we can make the coefficient $c_1$ vanish, which results in

$$\lambda_1 = m, \qquad c_2 = \tfrac{1}{2}[\lambda_2 - m],$$

where the two semi-invariants $\lambda_1$ and $\lambda_2$ are calculated around the natural zero of the number scale as origin.

For the above discussion we have limited ourselves to the determination of the three constants m, $c_1$ and $c_2$. It is easy, however, to find the higher parameters $c_3, c_4, c_5, \dots$ from the relations between the moments of the Poisson function and the semi-invariants of order 3, 4, 5, etc.

Charlier usually calls the parameter m the modulus and $c_2$ the eccentricity of the B curve.

28. NUMERICAL EXAMPLES

As an illustration of the application of the Poisson-Charlier series we select the following series of observations on alpha particles radiated from a bar of Polonium as determined by Rutherford and Geiger. The appended table states the number of times, F(x), that the number of particles given off in a long series of intervals, each lasting one-eighth of a minute, had a given value x:

x    F(x)      x    F(x)      x    F(x)
0      57      5     408     10      10
1     203      6     273     11       4
2     383      7     139     12       0
3     525      8      45     13       1
4     532      9      27     14       1

We are here dealing with integral variates which can assume positive values only, and the observations are therefore eminently adaptable to treatment by Poisson-Charlier curves. Selecting the natural zero as the origin of the coordinate system we find that the first two semi-invariants are

$$\lambda_1 = 3.8754, \qquad \lambda_2 = 3.6257,$$

and we therefore have:

$$m = \lambda_1 = 3.88; \qquad c_2 = \tfrac{1}{2}[\lambda_2 - m] = -0.125.$$

The equation for the frequency distribution of the total N = 2608 elements therefore becomes

$$F(x) = N\left[\psi_{3.88}(x) + (-0.125)\,\Delta^2\psi_{3.88}(x)\right].$$
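The two-term B series just written down is easy to evaluate directly. The following is a minimal sketch in Python, taking m = 3.88 and c2 = -0.125 as the constants of the fit; the function names are illustrative and not from the text, and values of the Poisson function at negative arguments are taken as zero so that the backward differences are defined at x = 0 and x = 1.

```python
import math


def poisson(x, m):
    """Poisson exponential e^(-m) m^x / x!, taken as zero for x < 0."""
    if x < 0:
        return 0.0
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))


def second_difference(x, m):
    """Backward second difference psi(x) - 2 psi(x-1) + psi(x-2)."""
    return poisson(x, m) - 2 * poisson(x - 1, m) + poisson(x - 2, m)


def charlier_b(x, n_total, m, c2):
    """Two-term Poisson-Charlier (type B) value N[psi + c2 * second difference]."""
    return n_total * (poisson(x, m) + c2 * second_difference(x, m))


N_TOTAL, M, C2 = 2608, 3.88, -0.125
fitted = [charlier_b(x, N_TOTAL, M, C2) for x in range(18)]
for x, value in enumerate(fitted):
    print(x, round(value, 1))
```

Because the second differences sum to zero over all x, the fitted values still add up to N whatever value c2 takes. Only the shape of the curve is altered by the correction term.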
The table below gives the values as fitted to the curve, F(x):

Alpha Particles Discharged from Film of Polonium (Rutherford and Geiger).
N = 2608, m = 3.88, c2 = -0.125

(1)    (2)        (3)         (4)        (5)           (6)
 x     ψ(x)       Δ²ψ(x)      N×(2)      N×(3)×c2      (4)+(5)
 0    .020668    +.020668      53.9       -6.7           47
 1    .080156    +.038820     209.0      -12.7          196
 2    .155455    +.015811     405.4       -5.2          400
 3    .201015    -.029793     524.2       +9.7          533
 4    .194967    -.051608     508.5      +16.8          525
 5    .151625    -.037654     394.5      +12.3          407
 6    .097850    -.009714     254.9       +3.2          258
 7    .054249    +.009814     141.2       -3.2          138
 8    .026316    +.015668      68.7       -5.1           64
 9    .011351    +.012968      29.6       -4.2           25
10    .004407    +.008021      11.5       -2.6            9
11    .001555    +.004092       4.1       -1.2            3
12    .000503    +.001800       1.3       -0.6            1
13    .000150    +.000699       0.4       -0.2
14    .000042    +.000245       0.1       -0.1
15    .000010    +.000076       0.0       -0.0
16    .000003    +.000025
17    .000001    +.000005

As a second example we offer our old friend, the distribution of flower petals in Ranunculus bulbosus. Selecting the zero point at x = 5 and computing the semi-invariants in the usual manner we obtain the following equation for the frequency curve:

$$F(x) = 222\,\psi(x) + 31.5\,\Delta^2\psi(x), \qquad m = 0.631.$$

A comparison between calculated and observed values follows:

 x     F(x)     Obs.
 5    134.9     133
 6     51.6      55
 7     22.5      23
 8      9.5       7
 9      2.9       2
10      0.6       2

29. TRANSFORMATION OF THE VARIATE

For integral variates we have shown that the Poisson frequency curve possesses the important property that all its semi-invariants are equal. Now while a frequency distribution of a certain integral variate, x, may perhaps not possess this property, it may, however, very well happen after a suitable linear transformation has been made, that the variate thus transformed will be subject to the laws of Poisson's function.

Let z = ax - b represent the linear transformation which is subject to the above laws with a series of semi-invariants all equal to m.
These semi-invariants according to the properties set forth in paragraph 5 are therefore

$$m = \lambda_1(z) = a\lambda_1(x) - b$$
$$m = \lambda_2(z) = a^2\lambda_2(x)$$
$$m = \lambda_3(z) = a^3\lambda_3(x)$$

and our problem is to find the unknown parameters a, b and m. Simple algebraic methods, which it will not be necessary to dwell upon, give the following results:

$$a = \lambda_2 : \lambda_3, \qquad m = \lambda_2^3 : \lambda_3^2, \qquad b = a\lambda_1 - m.$$

As a numerical illustration of this transformation we choose from Jørgensen a series of observations by Davenport on the frequency distribution of glands in the right foreleg of 2000 female swine.

No. of glands    0    1    2    3    4    5    6    7    8    9   10
Frequency       15  209  365  482  414  277  134   72   22    8    2

The values of the three first semi-invariants are

$$\lambda_1 = 3.501, \qquad \lambda_2 = 2.825, \qquad \lambda_3 = 2.417,$$

whence

$$a = 2.825 : 2.417 = 1.168,$$
$$m = 2.825^3 : 2.417^2 = 3.859,$$
$$b = (1.168)(3.501) - 3.859 = 0.230.$$

The new variable then becomes z = ax - b and the transformed Poisson probability function takes on the form:

$$\psi(z) = \frac{e^{-m}m^z}{z!}.$$

In general, however, we will find that z is not a whole number, and the expression z! therefore has no meaning, from the point of view of factorials at least. This difficulty may, however, be overcome through the introduction of the well-known Gamma Function, $\Gamma(z+1)$, which holds true for any positive or negative real value of z and which in the case of integral values of z reduces to $\Gamma(z+1) = z!$. Hence we can write the transformed Poisson probability function as

$$\psi(z) = \frac{e^{-m}m^z}{\Gamma(z+1)}.$$

Tables to 7 decimal places of the Gamma Function, or rather of the expression $-\log\Gamma(z+1)$, have been computed by Jørgensen in his Frekvensflader og Korrelation from z = -5 to z = 15, progressing by intervals of 0.01. By means of this table and the tables of ordinary logarithms it is now easy to find the values of $\psi(z)$ in the case of the example relating to the number of glands in female swine. The detailed computation is shown below.
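Before the tabulated logarithmic computation, the same steps can be sketched in Python, with the standard library's `math.lgamma` standing in for the printed tables of the Gamma Function. This is only an illustrative sketch: the function names are not from the text, b is computed as a times the first semi-invariant less m (as in the numerical work above), and the final step pro-rates the 2000 observations over the computed values of the transformed Poisson function.

```python
import math

glands = list(range(11))
freq = [15, 209, 365, 482, 414, 277, 134, 72, 22, 8, 2]


def semi_invariants(xs, fs):
    """First three semi-invariants about the natural zero."""
    total = sum(fs)
    s1 = sum(x * f for x, f in zip(xs, fs)) / total
    s2 = sum(x ** 2 * f for x, f in zip(xs, fs)) / total
    s3 = sum(x ** 3 * f for x, f in zip(xs, fs)) / total
    return s1, s2 - s1 ** 2, s3 - 3 * s1 * s2 + 2 * s1 ** 3


def psi(z, m):
    """Poisson function with z! replaced by Gamma(z + 1); needs z > -1."""
    return math.exp(-m + z * math.log(m) - math.lgamma(z + 1))


lam1, lam2, lam3 = semi_invariants(glands, freq)
a = lam2 / lam3                  # scale of the linear transformation
m = lam2 ** 3 / lam3 ** 2        # common semi-invariant of z
b = a * lam1 - m                 # shift, so that z = a*x - b has mean m

values = [psi(a * x - b, m) for x in glands]
scale = sum(freq) / sum(values)  # pro-rate the 2000 observations
fitted = [scale * v for v in values]

print(round(a, 3), round(m, 3), round(b, 3))
for x, v, f in zip(glands, values, fitted):
    print(x, round(a * x - b, 3), round(v, 4), round(f, 1))
```

Carrying a, m and b at full precision rather than to three decimals shifts the results slightly, so the printed figures need not agree with the table to the last digit.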
(1)    (2)         (3)¹          (4)         (5)                      (6)       (7)
 x      z      -log Γ(z+1)    z log m    (3)+(4)+log e^(-m)         ψ(z)      F(x)
 0    -.230      .9209         .8651       .1101-2                  .0129      30.1
 1    +.938      .0108         .5500       .8849-2                  .0767     179.2
 2    2.106      .6555         .2350       .2146-1                  .1639     382.9
 3    3.274      .0679         .9199       .3119-1                  .2051     479.1
 4    4.442      .3216         .6048       .2501-1                  .1780     415.8
 5    5.610      .4547         .2897       .0685-1                  .1171     273.6
 6    6.778      .4904         .9746       .7891-2                  .0615     143.7
 7    7.946      .4446         .6595       .4282-2                  .0268      62.6
 8    9.114      .3285         .3444       .9970-3                  .0099      23.1
 9   10.282      .1506         .0294       .5041-3                  .0032       7.5
10   11.450      .9177         .7143       .9561-4                  .0009       2.1

¹ The characteristics of the logarithms have been omitted in this table (except in column 5) and only the positive mantissas are shown. Column 7 represents the 2000 individual observations pro rated according to column 6.

CHAPTER II
(TRANSLATED BY MR. VIGFUSSON)

THE HUMAN DEATH CURVE

1. INTRODUCTORY REMARKS

In the following paragraphs I intend to discuss a method of constructing mortality tables from mortuary records by sex, age and cause of death, but without reference to or knowledge of the exposed to risk at various ages. This proposed method is indeed one which has been severely criticized in certain quarters, and several critics flatly deny that it is possible to construct mortality tables from such data without detailed information of the exposed to risk. It is, however, a very dangerous practice to say that a certain thing is impossible. The true scientist, least of all, should attempt to set limits for the extension of human knowledge. It is still remembered how the great Auguste Comte once denied that it ever would be possible to determine the chemical constituents of the celestial bodies. Only a few years after this emphatic denial by the brilliant Frenchman the spectroscope was discovered, by means of which we have been able to detect a number of chemical elements of other worlds than that of our own little earth.
It is but fair to say that the method which we here shall describe has met with rather determined opposition in certain actuarial quarters. Under such circumstances it is natural that the process will be viewed in a light of scepticism and criticism. I welcome such an attitude because it has been my purpose to present the following studies for further investigation and not to force them upon my readers as authoritative or as a kind of infallible dogma.

In presenting the outlines of the proposed method I wish to state that it has never been the intention to supplant the orthodox methods of constructing mortality tables where we have exact information of the so-called "exposed to risk" or number living at various ages. Numerous and very important examples, however, offer themselves in actuarial and statistical practice where such information is not available. Most of the greater American Life Insurance Companies, especially those writing the so-called industrial insurance, have on hand an enormous amount of information of deaths by sex, attained age and by cause of death among their policyholders. Even the mortuary records of certain occupations, as for instance metal and coal miners, among the
On account of the intense migration taking place in certain sections of the United States, especially in those of an industrial character, it is, however, impossible to know the exact population at various ages, except in the particular years in which the federal or state census has been taken. The fact that for all but a few states of this country the intercensal period is no less than ten years, the determination of the population composition by age and sex for a given locality and intercensal year, with any degree of accuracy, becomes a practical impossibility without a special count. Such a count or census of a specific locality or a single city is, however, a costly undertaking at its best, for which the nec- essary funds are rarely available. In all such instances the mortuary records are practically 108 Human Death Curves. worthless in so far as the construction and com- putation of death rates are concerned, if we are to rely solely upon the usual method of construct- ing mortality tables. It will therefore readily be seen that, apart from purely academic interests, the possibility of establishing a method of con- structing mortality tables without knowing the population exposed to risk at various ages would be of great practical value, and I deem no apology necessary to present the following method, which intends to overcome this very obstacle of having no information of the exposures. 2. empirical and In order to bring the method INDUCTIVE ME- ■ , ,1 .• -. thods of solu- mto the proper perspective it will be of value to contrast it with the ordinary methods followed in the con- struction of mortality tables. Let us therefore briefly review'those methods and principles com- monly employed by actuaries and statisticians. A certain number, say L persons at age x, are kept under observation for a full calendar year and the number, D T , who die among the original entrants during the same year are recorded. 
The ratio $D_x : L_x$ is then considered as the crude probability of dying at age x. Similar crude rates are obtained for all other ages and are then subjected to a more or less empirical process of graduation to smooth out the irregularities arising from what is considered as random sampling. One then chooses an arbitrary radix, say for instance 100,000 persons at age 10, which represents a hypothetical cohort of 10-year old children entering under our observation. This radix is then multiplied by the previously constructed value of $q_{10}$ and the product represents the number dying at age 10. This number, $d_{10}$, is subtracted from $l_{10}$ or 100,000 and the difference is the number living at age 11, or $l_{11}$. This latter number is then multiplied by $q_{11}$ and the result is $d_{11}$, or the number dying at age 11 out of the original cohort of 100,000. In this way one continues for all ages up to 105, or so. It is to be noted that the column of $q_x$ in this process represents the fundamental column while the columns of $l_x$ and $d_x$ are purely auxiliary columns.

Allow us here to ask a simple question. Do these empirically derived numbers of deaths at various ages out of an original cohort of 100,000 entrants at age 10 give us any insight or clue as to the exact nature of the biological phenomenon known as death, and are we by this method enabled to lift the veil and trace the numerous causes which must have been at work and served to produce the total effect, the $d_x$ curve, of which we by means of the usual methods have a purely empirical representation? I fear that this question will have to be answered in the negative. The usual actuarial methods do not give us a single glance into the relation between cause and effect, which after all is the ultimate object of investigation for all real science. Probably some critics would answer that they are not interested in investigating causal relations.
Such an attitude of indifference is, however, very dangerous for a statistician or an actuary whose very work rests upon the validity of the law of causality. We may, however, overlook this apparent inconsistency of the empiricists and turn our attention to the proposed methods of constructing mortality tables along inductive lines, or by the process which Jevons has termed a complete induction. Such a process we should find diametrically opposite to the methods of the empiricists, both in respect to points of attack and deduction. In the case of the empiricists the $q_x$ is the initial and fundamental function from which the $d_x$ column is computed as a mere by-product. The rationalistic method starts with the $d_x$ column and terminates with the $q_x$ as the by-product.

Being primarily interested in the absolute number of deaths and not in the relative frequencies of deaths at various ages, our first question is therefore, "What is the form of the frequency curve representing the deaths at various ages among the survivors of the original group of 100,000 entrants at age 10?" Right here we can, strange to say, apply some purely a priori knowledge. We know a priori that the curve must be finite in extent, because of the very fact that there is a definite limit to human life, and we also know that it assumes only positive values. There can be no negative numbers of deaths unless we were to regard the reported theological miracles of resurrections from the Jewish-Christian religion as such. This information about the death curve, or the curve of $d_x$, is, however, not sufficient for use as a basis for our deductions. We must therefore look about for additional information, whether of an a priori or an a posteriori nature and of such a general character that it can be adopted as a hypothesis.

3. GENERAL PROPERTIES OF THE "DEATH CURVE"

It was Poincaré who once said that every generalization is a hypothesis.
Hence we shall look for some general characteristics which all mortality tables have in common in the age interval under consideration (age 10 and upwards). Let us take any mortality table, I do not care from what part of the world, and examine the general trend of the curve traced by the values of $d_x$ for various ages. The curve rises gradually from the age of ten. The number of deaths among the survivors at various ages will increase, although not uniformly, until the ages around 70 or 75 are reached. At this age interval we generally encounter a maximum. From the ages between 70 and 75 and for higher ages the number of deaths among the survivors will decrease at a more rapid rate than at the earlier stages of life. After the age of 85 only a small number of the veteran cohort are still alive. After the age of 90 only a few centenarians struggle along, keeping up a hopeless fight with the grim reaper, Death, until eventually all are carried off between the ages of 110 and 115.

We can much better illustrate this process of the struggle between the surviving members at various ages of the cohort and the opposing forces as marshalled by the ultimate victor, Death, through a graphical representation. The chart on page 114 shows a mortality graph of the male population in Denmark (1906-1910) from ages 10 and upwards as constructed by the Royal Danish Statistical Bureau. The ordinates of the curve show the number of deaths at various ages among the survivors of the original cohort of 100,000 entrants at age 10. We notice a gradual increase from the younger ages until the age of 77, where a maximum or high crest is encountered. From that age a rapid decline takes place until the curve approaches the abscissa with a strongly marked asymptotic tendency after the age of 90. At the age of 110 all the members of the cohort have lost out and death stands as the undisputed victor, a victor among a mass of graves. The curve we thus have traced may properly be called "The Curve of Death".

On the same chart I have also shown a graphical representation of a comparison between the Danish death curve and the corresponding death curves of males for England and Wales in the period 1909-1911, Norway 1900-1910, France 1908-1913 and the United States in the period 1909-1911, all based upon an original radix of 1,000,000 entrants at age 10.

We will notice quite important variations in these curves. The curves for the Scandinavian countries show a relatively heavy clustering around the maximum point, which in the case of Denmark is reached at age 75, in England at age 73, and in France at age 72. The Danish curve is also more symmetrical and shows a more uniform clustering tendency around the maximum value than the other curves. The asymmetry or skewness is most pronounced in the American curve, due to the comparatively greater number of deaths at younger ages than in the other tables.

[Chart, page 114: death curves of males from age 10 upwards.]

In the curve for Norwegian males I might mention
The result is shown in the little peak in the curve of death among these sturdy Norwegian youths.¹

Despite all these smaller irregularities all the curves have, however, certain well defined characteristics, namely:
1) An initial increase with age.
2) A well defined maximum point around the age period 70-80.
3) A more rapid decline from that point until the ultimate end of the mortality table.

¹ Another factor is the high number of deaths from tuberculosis typical of youth. See in this connexion the discussion in paragraph 12a under the Japanese Table.

4. RELATION TO FREQUENCY CURVES

The most interesting of these common characteristics is the encountering of a maximum point in the neighborhood of 70, and the subsequent decline toward the higher ages. This fact has a very important biometric significance, which we shall discuss in a somewhat detailed manner.

Most of my readers are familiar with the so-called probability curve, expressed by the equation:

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-M)^2}{2\sigma^2}} \tag{1}$$

where $M$ denotes the mean and $\sigma$ the dispersion. This Laplacean or normal curve is represented in graphical form by the beautiful bell-shaped curve so well known to mathematical readers. Various approximations to this curve are continually encountered in numerous instances of observations relating to certain biological phenomena where certain measurable attributes of various sample populations tend to cluster around a certain norm, such as the measurements of heights of recruits, fin rays in fish, etc. We also know that where this tendency to cluster around the mean is asymmetrical or skew, it is in many cases possible to give a very close representation by the Laplacean-Charlier frequency curves.

Now let us return to our curves of death. It will be noted that all these curves for ages above the crest period 70 to 75 to a very marked degree approach the form of the normal probability curve and exhibit a marked clustering tendency around this particular period.
The ages around 70, the Bible's "three score and ten", can therefore be looked upon as a norm of life around which the deaths of the original cohort group themselves in more or less correspondence with the binomial probability law. This pronounced grouping tendency is a very significant biological phenomenon, which it might be of interest to dwell upon.

If all the members of our original cohort were identical as to physical constitution and characteristics, and if they all were exposed to identically the same outward influences acting upon their mode of life, it becomes evident from the law of causality, which is the basis and justification of every collection of statistical data, that all members would die at the same moment. We see, however, immediately that such hypothetical conditions are not present in human society. The paramount feature of our material world is variation. No two persons are alike in regard to physical constitution. Certain inherited characteristics, which are present in the individual in more or less pronounced form, make themselves felt. No two persons or groups of persons can be said to be exposed to the same outward influences. The clergyman and college professor living a sort of tranquil and sheltered life are not exposed to the same dangers as the working man or the man in business life. All these and other factors, almost infinite in number, tend to produce a decided variation in the actual duration of life. Of these influencing factors those relating to purely inherited or natural characteristics are without doubt the most powerful. If it were possible to eliminate certain forms of deaths due to infectious diseases, tuberculosis and accidents, causes more or less due to outward influences, we should have left a number of causes due to a gradual wearing out of the human system, similar in many respects to the deterioration of the mechanism in ordinary machinery.
The death curve from such causes of death would be more closely related to the normal curve than the death curve which includes causes of death from non-inherent or anterior causes as mentioned above. This statement is borne out by the shape of the Danish death curve. In Denmark, where a very determined and largely successful fight has been carried on against tuberculosis, and where the accident rate is very low, we also find that the curve is more symmetrical than for instance in this country or in England.

This tendency to an approach towards the binomial probability curve was already noted by Lexis, who from such considerations tried to determine what he called a "Normalalter" or normal age for various countries and sample populations. Speaking of this attempt the eminent Danish statistician, Harald Westergaard, says in his "Statistikens Teori i Grundrids" (Copenhagen 1916):

"An unusually interesting attempt has been made by Lexis to determine the normal age of man. A mortality table will, as a rule, have two strongly dominant maximum points for the number of deaths. During the first year of life there dies a comparatively large number. From the age of 1 the number of deaths decreases and reaches its lowest point in early youth. It then again begins to increase, at times in wavelike motions, until the maximum point is reached at the old age period".

"The clustering around the latter point has now a great likeness with the normal or Gaussian curve, and we might for this reason call this specific age the normal life age. For the calculation of such a normal age the argument may be put forth that experience shows that the great variations in mortality tend to disappear in old age. Let the rate of mortality in a certain generation at age $x$ be $\mu_x$ and the number of the corresponding survivors be $l_x$. The quantity $\mu_x l_x$ will
then increase from a certain point, while $l_x$ decreases, in the beginning slowly, but later on at a more rapid pace.

"During a long period of life the quantity $\mu_x l_x$, the number of deaths at a certain age, will increase with age. Later on a reversed motion takes place. But when this reversion will occur depends on many conditions: the successful fight against certain diseases, progress in economic conditions, or change in the mode of living. All this exercises an important influence, and the maximum point occurs therefore sometimes sooner and sometimes later. It is also important to investigate the natural selection in old age, which so to say divides the population in different strata, each with its own state of health. The healthiest of such groups will with the increase in age play a greater role. Here as everywhere it is the more important problem to study the clustering around the mean inside the special groups rather than to attempt to find a derived expression for the mortality. On the other hand, the correspondence with the normal curve, as established by Lexis, is another testimony to the fact that this curve or formula very often can be applied, even in complicated expressions".

5. THE "DEATH CURVE" AS A COMPOUND CURVE

Lexis was satisfied to determine the normal age. A more ambitious attempt to investigate the mortality by means of frequency curves throughout the whole period of life was made by the eminent English biometrician, Pearson, in a brilliant essay in his "Chances of Death". Pearson took the number of deaths in the English Life Table No. 4 (males) and succeeded in breaking up the compound curve into five component curves typical of old age, middle age, youth, childhood and infancy. I want to advise my readers to study this brilliant and illuminating essay, especially on account of its beautiful form of exposition, which makes the whole subject appear in a most interesting light.
Speaking of this attempt by Pearson, the American actuary, Henderson, is of the opinion that "the method has not, however, been applied to other tables and it is difficult to lay a firm foundation for it, because no analysis of the deaths into natural divisions by causes or otherwise has yet been made such that the totals in the various groups would conform to these (the Pearson) frequency curves".

We shall later on come back to this statement by Henderson, which we feel is a partial truth only. On the other hand, it must be admitted that the system of Pearson's types of skew frequency curves (by this time twelve in number) is by no means easy to handle in practical work and often requires a large amount of arithmetical calculation. Moreover, there seems to be no rigorous philosophical foundation for the Pearsonian types of curves, and they can at their best only be said to be exceedingly powerful and neat instruments of graduation or interpolation.

I am of the opinion that the goal can be reached more easily if we, instead of the Pearsonian curve types, make use of the Laplacean-Charlier and Poisson-Charlier frequency curves, which are expressed in infinite series of the form:

$$F(x) = \varphi(x) + \beta_3\,\varphi^{\mathrm{III}}(x) + \beta_4\,\varphi^{\mathrm{IV}}(x) + \dots \tag{2}$$

or

$$F(x) = \psi(x) + \gamma_2\,\Delta^2\psi(x) + \gamma_3\,\Delta^3\psi(x) + \dots \tag{3}$$

These two curve types have been treated elsewhere by Gram, Charlier, Thiele, Edgeworth, Jørgensen, Guldberg and other investigators, and it is therefore not necessary to dwell further upon their analytical properties, which were discussed in Chapter I.

Returning now to the general form of our $d_x$ curve of the mortality table which we discussed above, it is readily seen that this curve has all the properties of a compound frequency curve, that is, a curve which is composed of several minor or subsidiary frequency curves, generally skew in appearance.
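As a minimal sketch of how a curve of the form (2) can be evaluated, the following Python functions compute the first three terms of a Laplacean-Charlier series. The sketch assumes that the generating function is the normal curve with a given mean and dispersion, that the derivatives are taken with respect to the standardized variable z = (x - mean) / dispersion, and that the coefficients are supplied directly. The function and parameter names are illustrative and are not taken from the text, and the sign and scaling conventions of the coefficients differ between authors.

```python
import math


def normal_density(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def laplace_charlier(x, mean, dispersion, beta3=0.0, beta4=0.0):
    """Evaluate phi + beta3 * phi''' + beta4 * phi'''' at age x.

    Assumption: derivatives are taken with respect to the standardized
    variable z = (x - mean) / dispersion, and the result is divided by
    the dispersion so that the total area stays equal to 1.
    """
    z = (x - mean) / dispersion
    phi = normal_density(z)
    phi3 = (3.0 * z - z ** 3) * phi              # third derivative of phi
    phi4 = (z ** 4 - 6.0 * z ** 2 + 3.0) * phi   # fourth derivative of phi
    return (phi + beta3 * phi3 + beta4 * phi4) / dispersion


def area(mean, dispersion, beta3, beta4, lo=-200.0, hi=400.0, steps=60000):
    """Trapezoidal estimate of the area under the series between lo and hi."""
    h = (hi - lo) / steps
    total = 0.5 * (laplace_charlier(lo, mean, dispersion, beta3, beta4)
                   + laplace_charlier(hi, mean, dispersion, beta3, beta4))
    for i in range(1, steps):
        total += laplace_charlier(lo + i * h, mean, dispersion, beta3, beta4)
    return total * h
```

With both coefficients set to zero the function reduces to the normal curve. Because the derivative terms integrate to zero, the area under the series remains 1 whatever coefficients are chosen, so that multiplying by a total number of deaths gives a component curve of that area.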
As proven both by Charlier and by Jørgensen, any single valued and positive compound frequency curve vanishing at both $+\infty$ and $-\infty$ can be represented as the sum of Laplacean-Charlier and Poisson-Charlier frequency curves. We know thus a priori that the $d_x$ curve is compounded of the two types of frequency curves. But how are we to determine the separate component curves? It is readily admitted that no a priori reason will guide us here. The purely empirical observer might therefore abandon the project right here, because to all appearances it would seem hopeless to attempt a solution by purely empirical means. The positive rationalist does not despair so easily. "Very well", he says, "if we can not make further progress by purely empirical means, we are at least permitted to try deductive reasoning and attempt to bridge the gap by means of an hypothesis". The hypothesis I shall adopt is the following:

    The frequency distribution of deaths according to age from certain groups of causes of death among the survivors in a mortality table tends to cluster around certain ages in such a manner that the frequency distribution can be represented by either a Laplacean-Charlier or a Poisson-Charlier frequency curve.

A study of mortuary records by age and cause of death immediately supports this hypothesis. We notice, for instance, that diseases such as scarlet fever, measles, whooping cough and diphtheria often cause death among children, but rarely seem to affect older people. We know, for instance, that there is a much greater probability that a 5-year old boy will die from scarlet fever than that a man at the age of 40 will die from the same disease. On the other hand, there is quite a large probability that an old man at age 85 will die from diseases of the prostate gland, while such an occurrence is almost unheard of among boys.
Similarly deaths from cancer and Bright's disease are very rare in youth, but quite frequent in early old age. Tuberculosis, on the other hand, causes its greatest ravages in middle life, and has but little effect upon older ages.

6. MATHEMATICAL PROPERTIES OF COMPONENT FREQUENCY CURVES

Leaving, however, the question of the grouping of causes of death into a limited number of typical groups to a later discussion, we shall in the meantime see how the hypothesis can carry us over the difficulties. Let us for the moment assume that we are able to group the causes of death into say 7 or 8 groups. We shall also assume that we know the percentage frequency distribution of deaths according to age in each of the groups. This means in other words that we know the equations of the frequency curves giving the percentage distribution. Let the analytical expressions for these frequency curves be denoted by the symbols:

$$F_{I}(x),\ F_{II}(x),\ F_{III}(x),\ \dots,\ F_{VIII}(x). \tag{4}$$

Again, let the total number of deaths among the survivors in the mortality table from causes of death according to the above grouping be denoted by

$$N_{I},\ N_{II},\ N_{III},\ N_{IV},\ \dots,\ N_{VIII} \text{ respectively.} \tag{5}$$

The number of deaths in a certain age interval, say between 50-54, can then be expressed as follows:

$$\sum_{x=50}^{54} d_x = \sum_{50}^{54} N_{I} F_{I}(x) + \sum_{50}^{54} N_{II} F_{II}(x) + \dots + \sum_{50}^{54} N_{VIII} F_{VIII}(x). \tag{6}$$

In this relation the only known quantities are the equations for the frequency curves $F_{I}(x), F_{II}(x), \dots, F_{VIII}(x)$ of the percentage frequency distribution according to age in each of the eight groups. Neither $d_x$ nor any of the various $N$'s are known. The only relation we know a priori among the quantities $N$ is the following:

$$N_{I} + N_{II} + N_{III} + \dots + N_{VIII} = 1{,}000{,}000. \tag{7}$$

The latter equation is simply a mathematical expression for the simple fact that the sum total of the sub-totals of the various groups of causes of death, in other words the deaths from all causes among the survivors in the mortality table, must equal the radix of the entrants of our original cohort of 1,000,000 lives at age 10. Viewed strictly from the standpoint of frequency curves, we might express the same fact by saying that the sum of the areas of the various component curves must equal 1,000,000.

It is readily seen that on the assumption that the expressions of the different $F(x)$ conform to the above hypothesis it is possible to find $d_x$ for any age or age interval if we can determine the values of the different $N$'s. It is in this possibility that the importance of the proposed method lies, and we shall now show how it is possible to determine the $N$'s without knowing the exposed to risk.

7. OBSERVATION EQUATIONS

Consider for the moment the following expression:

$$R_{III}(x) = \frac{\sum_{50}^{54} N_{III} F_{III}(x)}{\sum_{50}^{54} N_{I} F_{I}(x) + \sum_{50}^{54} N_{II} F_{II}(x) + \sum_{50}^{54} N_{III} F_{III}(x) + \dots + \sum_{50}^{54} N_{VIII} F_{VIII}(x)} \tag{8}$$

What does this equation represent? Simply the proportionate ratio of deaths in group III to the total number of deaths in all type groups (in other words the deaths from all causes) in the age interval 50-54. Such ratios are usually known as proportional death ratios. It is readily seen that these proportionate death ratios are dependent on the deaths alone and absolutely independent of the number exposed to risk, provided the total number of deaths from all causes in a certain age group is large enough to eliminate variations due to random sampling.¹ In other words, we can find a numerical value for the term $R_{III}(x)$ on the left side of the equation from our death records alone without reference to the exposed to risk in this interval. Similar proportionate death ratios can of course without difficulty be determined for the other groups of causes of death and for arbitrary ages or age intervals. In this manner we can determine a system of observation equations with known numerical values of $R_k(x)$ $(k = I, II, III, \dots)$. The fact that the number of observation equations in this system is much larger than the number of the unknown $N$'s makes it possible to determine these unknowns by the method of least squares.

¹ Strictly speaking this statement is only true for an age interval of one year or less and may in the case of large perturbing influences in the population exposed to risk be subject to appreciable errors when we use large age intervals of 10 or more in our grouping for the computing of $R(x)$. When the age interval for the grouping of causes of deaths by attained ages is 5 years or less the error committed in assuming $R(x)$ as being independent of the number exposed to risk is in most cases negligible. One of the difficulties encountered in the construction of a mortality table for Massachusetts Males was that the age interval used for the grouping was 10 years instead of 5 years or less. See in this connection the remarks at the beginning of paragraph 11 and at the conclusion of paragraph 16 of the present chapter.

Probably the simplest manner is first to determine by simple approximation methods, or by mere inspection, approximate values for the various $N$'s and then make final adjustments by the method of least squares. Let, for instance, ${}'N_{I}, {}'N_{II}, \dots, {}'N_{VIII}$ be the first approximations of the areas of the various groups of frequency curves so that

$$N_{I} = \alpha_1\,{}'N_{I},\quad N_{II} = \alpha_2\,{}'N_{II},\quad \dots,\quad N_{VIII} = \alpha_8\,{}'N_{VIII}. \tag{9}$$

Let us furthermore introduce the following symbols:

$${}'N_{I} F_{I}(x) = \Phi_1(x),\quad {}'N_{II} F_{II}(x) = \Phi_2(x),\quad \dots,\quad {}'N_{VIII} F_{VIII}(x) = \Phi_8(x). \tag{10}$$

The different values of $\Phi_1(x), \Phi_2(x), \Phi_3(x), \dots, \Phi_8(x)$ may then be regarded as a system of component frequency curves to which we now must apply the different correction factors $\alpha_1, \alpha_2, \alpha_3, \dots, \alpha_8$ in order to fit the curves to the observed proportional death ratios, $R(x)$, for the various groups of typical causes of death. Let us for example assume that the observed death ratio at a certain age (or age group), $x$, under a certain group of causes of death, say group No. III, is $R_{III}(x)$. We have then the following observation equation:

$$R_{III}(x) = \alpha_3\Phi_3(x) : \big[\alpha_1\Phi_1(x) + \alpha_3\Phi_3(x) + \alpha_4\Phi_4(x) + \dots + \alpha_8\Phi_8(x) + \alpha_2\Phi_2(x)\big]. \tag{11}$$

Since the sum of the areas of the different component curves necessarily must equal 1,000,000 it is easy to see that we may write the factor $\alpha_2$ in the last term of the denominator in the following form:

$$\alpha_2 = \Big(1{,}000{,}000 - \big[\alpha_1\textstyle\sum\Phi_1(x) + \alpha_3\sum\Phi_3(x) + \dots + \alpha_8\sum\Phi_8(x)\big]\Big) : \textstyle\sum\Phi_2(x) = k_0 - \big[k_1\alpha_1 + k_3\alpha_3 + \dots + k_8\alpha_8\big],$$

where

$$k_0 = \frac{1{,}000{,}000}{\sum\Phi_2(x)},\quad k_1 = \frac{\sum\Phi_1(x)}{\sum\Phi_2(x)},\quad k_3 = \frac{\sum\Phi_3(x)}{\sum\Phi_2(x)},\quad \dots,\quad k_8 = \frac{\sum\Phi_8(x)}{\sum\Phi_2(x)}. \tag{12}$$

The expression for $R_{III}(x)$ can then be put in the following form:

$$R_{III}(x) = \alpha_3\Phi_3(x) : \big[\alpha_1\Phi_1(x) + \alpha_3\Phi_3(x) + \alpha_4\Phi_4(x) + \dots + \alpha_8\Phi_8(x) + (k_0 - k_1\alpha_1 - \dots - k_8\alpha_8)\,\Phi_2(x)\big]. \tag{13}$$

Similar observation equations for the other groups are derived without difficulty.

Once having formed the observation equations it is simply a matter of routine work to compute the normal equations from which the values of the unknown $N$'s can be found. We shall, however, not go into detail with the derivation of the necessary formulas, since this is a process which belongs wholly to the domain of the theory of least squares and which has received adequate treatment elsewhere.
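The passage from observation equations to normal equations can be sketched as follows. The snippet is a minimal illustration in Python, assuming that the observation equations have already been brought to a linear form with coefficient rows `rows` and right-hand sides `rhs`. The function names and the small numerical example are hypothetical and are not the Michigan data.

```python
def normal_equations(a, b):
    """Form the normal equations (A^T A) t = A^T b from the rows of a."""
    n = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)]
           for i in range(n)]
    atb = [sum(row[i] * value for row, value in zip(a, b)) for i in range(n)]
    return ata, atb


def solve(m, v):
    """Solve m t = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(m[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        # Bring the largest remaining entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    t = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * t[j] for j in range(i + 1, n))
        t[i] = (aug[i][n] - s) / aug[i][i]
    return t


# Hypothetical example: four observation equations in two unknowns.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
rhs = [1.1, 0.9, 2.0, 0.2]
ata, atb = normal_equations(rows, rhs)
alphas = solve(ata, atb)
```

Having more observation equations than unknowns is what makes the adjustment possible: the normal equations always number exactly as many as the unknowns, however many observations enter the sums.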
(See for instance Brunt's Combination of Observations.)

8. CLASSIFICATION OF CAUSES OF DEATH

We think it more advantageous to illustrate the method by a concrete example. As an illustration we may take the case of Michigan Males in the period 1909-1915.

The mortuary records of Males in Michigan are for that period given in the reports issued annually by the Secretary of State on "Registration of Births and Deaths, Marriages and Divorces in Michigan". The deaths by sex, age and cause of death are given in quinquennial age groups. A very serious drawback is the grouping of all ages above 80 into a single age group instead of in at least 4 or 5 quinquennial age groups. This makes it impossible to obtain good observation equations for ages above 80. When we consider that about one fifth of the original entrants at age 10 in the mortality table die after the age of 80, it is readily seen that this defect in the Michigan data is of a very serious character, which makes it out of the question to determine correctly the areas of the curves for middle old age and extreme old age. For ages below 70 these curves do not play so important a role, and the method ought therefore in these ages to yield satisfactory results.

We now make the assertion that the deaths among the survivors in the final life table can be grouped in the following typical groups.

Causes of Death typical of:
    Group I      Extreme Old Age.
    Group II     Middle Old Age.
    Group III    Early Old Age.
    Group IV     Middle Life.
    Group V      Early Middle Life.
    Group VI     Pulmonary Tuberculosis, Etc.
    Group VIIa   Early Life Occupational Hazard.
    Group VIIb   Middle Life Occupational Hazard.
    Group VIIIa  Childhood.

The classification of causes of death according to this scheme is given in the following table, marked Table A.

Table A. Michigan Males 1909-1915.
Classification of causes of death according to the chosen system of curves.

    No. in International
    Classification.        Cause of death

    GROUP I
    81.                    Diseases of the arteries.
    124.                   Diseases of the bladder.
    125-133.               Other diseases of the genito-urinary system.
    142.                   Gangrene.
    154.                   Old age.
    126.                   Diseases of the prostate.

    GROUP II
    10.                    Influenza.
    47-48.                 Rheumatism.
    64.                    Apoplexy.
    65.                    Softening of the brain.
    66.                    Paralysis.
    79.                    Heart disease.
    82.                    Embolism.
    89.                    Acute bronchitis.
    90.                    Chronic bronchitis.
    91.                    Broncho-pneumonia.
    94.                    Congestion of the lungs.
    96-97.                 Asthma and emphysema.
    103.                   Other diseases of the stomach.
    105.                   Diarrhea and enteritis (over 2 years).
    14.                    Dysentery.

    GROUP III
    39.                    Cancer of the mouth.
    40.                    Cancer of the stomach and liver.
    41.                    Cancer of the intestines.
    44.                    Cancer of the skin.
    45.                    Cancer of other organs.
    46.                    Tumors.
    50.                    Diabetes.
    53-54.                 Leukemia and anemia.
    63.                    Other diseases of the spinal cord.
    68.                    Other forms of mental diseases.
    80.                    Angina pectoris.
    109-110.               Hernia, intestinal obstruction, and other diseases of the intestines.
    120.                   Bright's disease.
    121.                   Other diseases of the kidneys.
    123.                   Calculi of urinary passages.

    GROUP IV
    56.                    Alcoholism.
    18.                    Erysipelas.
    62.                    Locomotor ataxia.
    73-76.                 Other diseases of the nervous system.
    77.                    Pericarditis.
    78.                    Endocarditis.
    83.                    Diseases of the veins.
    84.                    Diseases of the lymphatics.
    85-86.                 Other diseases of the circulatory system.
    87.                    Diseases of the larynx.
    88.                    Diseases of the thyroid body.
    92.                    Pneumonia.
    93.                    Pleurisy.
    95.                    Gangrene of the lungs.
    98.                    Other diseases of the respiratory system.
    99-101.                Diseases of the mouth, pharynx, and oesophagus.
    111.                   Acute yellow atrophy of the liver.
    113.                   Cirrhosis of the liver.
    114.                   Biliary calculi.
    115-116.               Diseases of the liver and spleen.
    118.                   Other diseases of the digestive system.
    143-145.               Furuncle, abscess, and other diseases of the skin.
    147-149.               Diseases of the joints, and locomotor system.

    GROUP V
    4.                     Malarial fever.
    13.                    Cholera nostras.
    20.                    Septicemia.
    24.                    Tetanus.
    32.                    Pott's disease.
    33.                    White swellings.
    34.                    Tuberculosis of other organs.
    35.                    Disseminated tuberculosis.
    55.                    Other general diseases.
    60.                    Encephalitis.
    70-71.                 Convulsions.
    102.                   Ulcer of the stomach.
    117.                   Peritonitis.
    119.                   Acute nephritis.
    164.                   Diseases of the bones.
    155.                   Suicide by poison.
    156.                   Suicide by asphyxia.
    157.                   Suicide by hanging.
    158.                   Suicide by drowning.
    159.                   Suicide by firearms.
    160.                   Suicide by cutting instruments.
    161.                   Suicide by jumping from high places.
    163.                   Suicide by other or unspecified means.
    164-165.               Accidental poisonings.
    166.                   Conflagration.
    167.                   Burns (conflagration excepted).
    168.                   Inhalation of noxious gases.
    172.                   Traumatism by fall.
    175-(2).               Traumatism by electric railway.
    175-(3).               Traumatism by automobiles.
    175-(4).               Traumatism by other vehicles.
    176.                   Traumatism by animals.
    178.                   Cold and freezing.
    179.                   Effects of heat.
    185.                   Fractures and dislocations (cause not specified).

    GROUP VI
    28.                    Tuberculosis of the lungs.
    29.                    Miliary tuberculosis.
    37-38.                 Venereal diseases.
    186.                   Other accidental traumatism.
    57-59.                 Chronic poisoning.
    67.                    General paralysis of the insane.
    31.                    Abdominal tuberculosis.

    GROUP VII
    1.                     Typhoid fever.
    69.                    Epilepsy.
    108.                   Appendicitis.
    182.                   Homicide.
    169.                   Accidental drowning.
    170.                   Traumatism by firearms.
    171.                   Traumatism by cutting instruments.
    173.                   Traumatism by mines and quarries.
    174.                   Traumatism by machinery.
    175-(1).               Traumatism by railroads.
    180.                   Lightning.
    61.                    Meningitis.

    GROUP VIII
    5.                     Smallpox.
    6.                     Measles.
    7.                     Scarlet fever.
    8.                     Whooping cough.
    9.                     Diphtheria and croup.
    30.                    Tubercular meningitis.
    150.                   Congenital malformations.

9.
OUTLINE OF COMPUTING SCHEME

The number of deaths in the various groups according to the above classification and arranged according to age during the period 1909-1915 is given in table B on page 140. From that table it is a simple matter to compute the proportionate death ratios of the separate groups of causes of death. Such a computation is shown in table C on page 141.

[Table B. Michigan Males 1909-1915: number of deaths by age and by group of causes of death. The figures are not legible in this copy.]

[Table C. Michigan Males 1909-1915: proportionate death ratios by age and by group of causes of death. The figures are not legible in this copy.]

It is readily seen that these death ratios are independent of the number exposed to risk. Moreover, the number of observations seems to be sufficiently large to eliminate serious variations due to random sampling. This might perhaps not hold true for the age intervals 10 to 14 and 15 to 19 where not alone random sampling is present, but a somewhat modified classification seems necessary. I have, however, not used the observed proportionate death ratios for the two younger age intervals in my computations, which only took into account the ratios above 20. For this reason I do not deem it necessary to go into a closer investigation of a re-classification of causes of death for these younger age groups. A more serious defect which cannot be overcome is presented in the ages above 80 where, as mentioned before, a classification according to age is absent in the original records for the state of Michigan. The fact that the highest number of deaths (12,473) occurred in ages above 80 makes this defect more serious than the omission of a re-classification of causes of death below 20.

So far we have only been concerned with the first step in the complete induction according to the model of Jevons, namely that of simple observation. The next step in the induction is the hypothesis. We present now the following working hypothesis.

The frequency distribution of deaths according to age of the above groups of causes of death among the survivors of an original cohort of 1,000,000 entrants at age 10 can be represented by a system of frequency curves determined by the following characteristic parameters:

    Parameters

    Group    Mean          Dispersion       Skewness    Excess
    I        79.6 years     9.5730 years    + .1066     + .0546
    II       70.5 years    12.8000 years    + .0967     + .0126
    III      65.5 years    13.6870 years    + .1248     + .0650
    IV       59.5 years    17.0890 years    + .1790     - .0106
    V        65.5 years    19.9411 years    + .0555     - .0367
    VI       44.5 years    16.0352 years    - .0124     - .0272
    VIIb     57.5 years    12.1552 years    + .0008     - .0005
    VIIa     Poisson-Charlier Curve: Modulus = 28.5 years, Eccentricity = 1.0001
    VIIIa    Poisson-Charlier Curve: Modulus = 13.5 years.

From these parameters and from well-known tables of the probability or normal frequency curve and its various derivatives it is easy to determine the frequency distribution for any desired interval. For this system of frequency curves we now shall try to find the various areas $N_{I}, N_{II}, N_{III}, \dots, N_{VIII}$ so as to conform to the observed values of $R(x)$ in Table C. As a first approach to the final values of $N$, we may by an inspection (which of course is improved upon by a long practice in curve fitting) choose the following approximations.¹

    Group          Approximate Value of 'N
    I               123000
    II              366000
    III             183000
    IV              105000
    V                75000
    VI               70000
    VIIa & VIIb      61000
    VIII             17000
                   -------
                   1000000

¹ These numbers represent as a matter of fact a first rough approximation of the areas of the different component curves by means of the method of point contours. Hence it is to be expected that the final adjustments will be comparatively small. This fact has, however, no influence upon the application of the method.

These preliminary numerical values represent the first approximations of the areas of the various frequency curves. The sequence represented by

$${}'N_{I} F_{I}(x),\quad {}'N_{II} F_{II}(x),\quad {}'N_{III} F_{III}(x),\quad \dots,\quad {}'N_{VIII} F_{VIII}(x) \tag{14}$$

gives the number of deaths at age $x$. We notice thus that by multiplying the various equations of frequency curves for arbitrary age intervals with their respective $'N$'s we can get a first approximation of the final death curve. I give on page 144 an approximate table arranged in 5 year intervals.

[Table, page 144. First approximation of the component death curves by group of causes in 5 year age intervals. The figures are not legible in this copy.]
We might now first compute the various factors $k_0, k_1, k_3, \ldots$ which will be common for all observation equations. We have, referring to the above formulas (11 and 12), for the various k's (15).

$$k_0 = \frac{1000000}{365995},\quad k_1 = \frac{123089}{365995},\quad k_3 = \frac{183045}{365995},\quad k_4 = \frac{104888}{365995},$$
$$k_5 = \frac{75030}{365995},\quad k_6 = \frac{69996}{365995},\quad k_7 = \frac{61003}{365995},\quad k_8 = \frac{17002}{365995} \tag{15}$$

Or $k_0 = 2.732$, $k_1 = 0.336$, $k_3 = 0.500$, $k_4 = 0.287$, $k_5 = 0.205$, $k_6 = 0.191$, $k_7 = 0.167$, $k_8 = 0.046$.

To illustrate the further process of the computation of the observation equations, let us take a certain age interval, say the interval between 50-54. The value of $\Phi_2$ taken from the above table is 163.39. The value of $R_{III}(x)$ for this interval is 0.234 (see table page 141). Hence we have the following observation equation (16).

$$0.234 = \frac{104.53\,a_3}{15.76\,a_1 + 104.53\,a_3 + 84.16\,a_4 + 64.52\,a_5 + 73.55\,a_6 + 35.01\,a_7 + 0.00\,a_8 + (2.732 - 0.336\,a_1 - 0.500\,a_3 - 0.287\,a_4 - 0.205\,a_5 - 0.191\,a_6 - 0.167\,a_7 - 0.046\,a_8)\,163.39} \tag{16}$$

After a few simple reductions this may be brought to the following form:

$$9.16\,a_1 + 99.19\,a_3 - 8.72\,a_4 - 7.26\,a_5 - 9.91\,a_6 - 1.81\,a_7 + 1.76\,a_8 - 104.45 = 0. \tag{17}$$

In the routine work I usually use a system of computing the various equations which is outlined in detail in the accompanying tabular scheme referring to all the groups in the age interval 50-54 and shown on pages 148-154. Similar observation equations are arrived at in exactly the same manner for other groups and other age intervals. For the whole interval from age 20 and upwards we get in this way 96 observation equations from which to determine the correction factors. The coefficients of these observational equations are then written down, and their various products formed in turn.
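The arithmetic leading from (15) and (16) to (17) can be checked mechanically. The following Python sketch is an illustration only and forms no part of the original computing scheme; the function and variable names are hypothetical. It assumes, as the common divisor 365995 in (15) suggests, that Group II serves as the base group whose correction factor is eliminated through the condition that the corrected areas add up to the radix of 1,000,000.

```python
# Sketch only: reproduces the reduction of observation equation (16) to (17).
# Numerical inputs are those printed in the text; all names are illustrative.

RADIX = 1_000_000
# Areas used in equation (15), keyed by group number (7 = VIIa and VIIb combined).
AREAS = {1: 123089, 2: 365995, 3: 183045, 4: 104888,
         5: 75030, 6: 69996, 7: 61003, 8: 17002}
BASE = 2  # assumption: Group II is the base group


def k_factors(areas, base=BASE, radix=RADIX):
    """Return k_0 = radix / 'N_base and k_i = 'N_i / 'N_base, to three decimals."""
    k = {0: round(radix / areas[base], 3)}
    for group, area in areas.items():
        if group != base:
            k[group] = round(area / areas[base], 3)
    return k


def reduce_observation(r_obs, group, phi, phi_base, k):
    """Bring R = phi_g a_g / (sum_i phi_i a_i + (k_0 - sum_i k_i a_i) phi_base)
    to the linear form sum_i c_i a_i + c_0 = 0, with the signs used in (17)."""
    coeffs = {}
    for i, p in phi.items():
        c = r_obs * (p - k[i] * phi_base)
        if i == group:
            c -= p            # the numerator term of the observed group
        coeffs[i] = -c
    const = -r_obs * k[0] * phi_base
    return coeffs, const


K = k_factors(AREAS)
# First-approximation values for the interval 50-54, as printed in (16).
PHI_50_54 = {1: 15.76, 3: 104.53, 4: 84.16, 5: 64.52,
             6: 73.55, 7: 35.01, 8: 0.00}
COEFFS, CONST = reduce_observation(0.234, 3, PHI_50_54, 163.39, K)

for i in sorted(COEFFS):
    print(f"a_{i}: {COEFFS[i]:+.2f}")
print(f"constant: {CONST:+.2f}")
```

To two decimals the printed coefficients agree with those of equation (17), which supports the reading of the subscripts adopted in (16).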
We deem it not necessary to give all these observational equations and their coefficients for all the 96 observations, but shall limit ourselves to give all the necessary computations for the interval from 50-54 as previously considered. With the usual system of notation employed in the method of least squares we get the scheme on pages 148-154.

Normal Equations, Michigan Males 1909-1915.

 723763   400750   218930   150776   135184   115318    30325   1801152
          877847   253187   176242   149858   129697    34600   2053941
                   237159    90440    72317    62110    16246    964843
                            105346    47022    39939    10576    628608
                                      76774    28909     8668    525295
                                               53378     7012    437390
                                                         2391    111625

The addition of the various columns of the sum products of the coefficients gives us finally the above set of normal equations of which we only submit the coefficients in the usual scheme employed in the method of least squares.

Solving the above system of normal equations by means of the well-known method devised by Gauss, we obtain finally the values on page 154 for the various a's by which the approximate values 'N must be multiplied in order to yield the probable values of N.
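For readers who wish to follow the least squares step numerically, a minimal Python sketch of the formation and solution of normal equations is given here. It is an illustration under stated assumptions, not the author's tabular scheme: the observation equations are written in the form A a = b rather than with the constant term carried on the left, the four toy equations are invented for the example, and the partial pivoting is a modern safeguard that the hand computation does not use.

```python
# Sketch only: normal equations and Gaussian elimination on invented toy data.

def normal_equations(rows, rhs):
    """Return (A^T A, A^T b) for observation equations rows[i] . a = rhs[i]."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    return ata, atb


def gauss_solve(matrix, vector):
    """Solve matrix . x = vector by elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system of normal equations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):          # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


# Hypothetical example: four observation equations in two correction factors.
TOY_ROWS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
TOY_RHS = [1.03, 0.97, 2.00, 1.09]
ATA, ATB = normal_equations(TOY_ROWS, TOY_RHS)
FACTORS = gauss_solve(ATA, ATB)
print(ATA, ATB, FACTORS)
```

The toy equations are mutually consistent, so the solution returns the two factors 1.03 and 0.97 from which they were built; with real observation equations the same two calls would yield the most probable values of the a's.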
[Computing scheme for the observation equations of the age interval 50-54 (pages 148-153): coefficients and their sum products. The tabular figures are not recoverable from the scan.]
[Computing scheme, concluded (page 154): sum products for the age interval 50-54, with the sums for all observation equations in the last row. Row labels and the signs of the individual entries are not legible in the scan.]

            gg         gh        gs          hh         hs
           0.0        0.8       0.5        72.3       45.9
           3.2      188.1      39.6     10920.3     2299.0
           1.4       88.3       6.1      5416.9      375.4
           0.8       47.8       0.3      2819.6       15.9
           0.8       46.2      10.3      2631.7      584.8
           0.3       15.7       1.5       979.7       93.9
          29.2     1740.4      69.7    103877.3     4157.7
Sum:    2391.0  -111625.0   -1807.0   6630212.0   107358.0

Correction Factors, a.
Group I       1.03284
  -   II      1.00017
  -   III     1.03635
  -   IV      1.03731
  -   V       1.00956
  -   VI      0.97334
  -   VIIa    0.90332
  -   VIIb    0.60565
  -   VIII    1.13743

Applying the above correction factors to the respective values of 'N, we get finally as the total areas of the respective component curves:

Group I         127,131
  -   II        366,059
  -   III       189,699
  -   IV        108,750
  -   V          75,747
  -   VI         68,130
  -   VIIa       33,032
  -   VIIb       12,133
  -   VIII       19,339
              1,000,000

Multiplying the equations of the various frequency curves, F(x), of the percentage distribution in each group with the above values of N we obtain finally the complete mortality table as will be given in the Appendix. The final graphical representation of the frequency curves is shown in Figure 2.

[Figure 2 (page 156): graphical representation of the component frequency curves. Not recoverable from the scan.]

10. GOODNESS OF FIT

This completes the third step in the inductive process. The fourth and final step is the verification of the results thus arrived at by a mere deductive process. Here it must be remembered that the condition which the final component frequency curves shall fulfill is the one that observed proportionate death ratios shall agree as closely as possible with the expected or theoretical proportionate death ratios as computed from the final table. In this connection it must be borne in mind that the observed proportionate death ratios are given in quinquennial age groups. Thus the observed proportionate death ratios in a certain age interval, as for example between 50-54, are really the average or "central" proportionate death ratios at age 52.
From the complete table it is, however, possible to compute the proportionate death ratios for each specific age. Graphically the expected proportionate death ratios will therefore represent a continuous curve, while the observed ratios will be represented by a rectangular shaped column diagram. Such a graphical representation is shown in Fig. 3 which simply represents the figures in Table C and Table E in graphical form. The "goodness of fit" of the "expected" or theoretical values to the "actual" or observed values is seen to be very close, especially in the largest and most important groups. It is only in the combined groups VIIa and VIIb that the "fit" might probably be open to criticism for higher ages, but even here the deviation is small between the actual and theoretical values. A very small increase in the area of the VIIb curve would easily adjust this difference. It is, however, doubtful if such a correction or adjustment would have any noteworthy effect upon the ultimate mortality rates $q_x$, and I do not consider it worth while to go to the additional trouble of recomputing the areas, especially in view of the fact that the observation data above the age of 80 are not exact and detailed enough to be used in this method of curve fitting. For ages up to 70 or 75 I consider, however, the table as thus constructed as sufficiently accurate for all practical purposes.

[The table on page 158 and the figure on page 160 (apparently Table E and Fig. 3) are not recoverable from the scan.]

11. MASSACHUSETTS 1914-1917

As another example of the method I take the construction of a mortality table for the State of Massachusetts from the mortuary records for the three years 1914, 1915 and 1916. The records as given by the Registration reports are better than the records for Michigan, inasmuch as they have avoided the deplorable practice of grouping all deaths above the age of 80 into a single age group. On the other hand, the classifications of cause of death in Massachusetts by attained age are given in ten year age groups only. Hence it is readily seen that we will only be able to secure half as many observation equations as in the case of the five year interval in Michigan. This rather large grouping puts the method to a severe test. In spite of this drawback I shall for the benefit of the readers briefly outline the results I have obtained from an analysis of the Massachusetts data.
While for the Michigan data I employed a system of frequency curves previously used with success for certain Scandinavian data, I found it was easier to fit the Massachusetts data to a system of frequency curves used in the construction of a mortality table for England and Wales for the years 1911 and 1912 from the mortuary records of deaths by age and cause among male lives. The classification by age of the causes of death in 8 groups is also different from that of Michigan, especially for middle life and younger ages. The parameters of the system of component frequency curves to which I fitted the Massachusetts data are shown in the following table F:

Table F.
Parameters of the System of Frequency Curves for Massachusetts Males 1914-1916.

Group    Mean           Dispersion       Skewness   Excess
I        78.70 years     7.9775 years    + .0920    + .0331
II       68.00  -       12.2051  -       + .1151    + .0234
III      63.05  -       13.0532  -       + .1210    + .0471
IV       60.45  -       17.8552  -       + .0983    - .0091
V        49.60  -       18.6100  -       + .0328    - .0309
VI       43.80  -       14.6750  -       - .0091    - .0272
VIIb     57.40  -       12.1550  -       + .0021    - .0026
VIIa and VIIIa constructed from Poisson-Charlier Curves.

The observed number of deaths according to the 8 groups of causes of death, and their corresponding proportionate death ratios are given in the following tables G and H.

[Tables G and H (page 163) and the figure on page 164 are not recoverable from the scan.]

By finding first approximate values and then by a further correction of these approximation areas by means of the factors a determined by the method of least squares in exactly the same manner as demonstrated in the case of Michigan, we finally arrive at the following areas of the various groups.

Areas of the component frequency curves in the Life Table for Massachusetts Males, 1914-1916.

                     Areas
Group I              90064
  -   II            281470
  -   III           207854
  -   IV            151316
  -   V              99543
  -   VI            107718
  -   VIIa & VIIb    40719
  -   VIIIa          21316
                   1000000

Forming the products N F(x) for the various groups and integral ages we obtain finally the life table as shown in the appendix. In order to test the "goodness of fit" of the curves it is necessary to compute the expected or theoretical proportional death ratios from this latter table and compare such ratios with the observed or actual proportionate death ratios as shown in Table H. The theoretical values are shown in Table I, and a graphical representation illustrating the "goodness of fit" between the observed and theoretical ratios is given in Fig. 5. I think it will be generally admitted that the fit is satisfactory for all practical purposes.
[Fig. 5 and Table I (pages 166 and 167) are not recoverable from the scan.]

The State of Massachusetts has always been the foremost state in the union for reliable and trustworthy statistical records, and in all probability it would be possible to secure the deaths by causes in 5-year age groups instead of ten-year groups. By taking the above table as a first approximation one should then obtain a very accurate table. On the other hand, it is possible to verify the final results in the above Life Table for Massachusetts by an entirely different process. It happens that the State of Massachusetts took a census in April 1915. This census for living males by attained ages could then be used as an approximation for the exposed to risk, while the deaths for the three years could be used as a basis for the number of deaths in a single year. A Life Table could then be constructed by means of the orthodox methods usually
employed by actuaries and statisticians in the construction of mortality tables from census returns.

12. LOCOMOTIVE ENGINEERS' TABLE 1913-17. OTHER TABLES

As a third illustration, I shall construct a table for American Locomotive Engineers for the period 1913-1917. The statistical data forming the basic table are the mortuary records by attained age and cause of death among the members of The Locomotive Engineers' Life and Accident Insurance Association, a large fraternal order of the American Locomotive Engineers. The total number of deaths in the five year period amounted to more than 4,000. When the deaths were distributed into separate groups of causes of death, it was found possible to use a system of frequency curves similar to that employed in the State of Massachusetts, except for Group No. IV, for which it was found exceedingly difficult to find a single curve which would fit the data, and there is much that points towards the actual presence of a compound curve for that group of causes of death among the Locomotive Engineers. The grouping of causes of death is also slightly different from that of Michigan and Massachusetts. I shall not go into further details as to the actual construction of this table, except to mention the areas of the various component frequency curves of which I present the following table.

[Figure (page 169): final graph of the component frequency curves for Locomotive Engineers. Not recoverable from the scan.]

                  Areas
Group I           44,857
  -   II         342,645
  -   III        226,022
  -   IV         147,420
  -   V           47,650
  -   VI          31,260
  -   VIIa        79,005
  -   VIIb        77,713
  -   VIII         3,428
               1,000,000

It must also be remembered that the radix of this table is taken at age 20, instead of at age 10 as is the case in the preceding tables. The final graph is shown on the preceding page. A number of diagrams illustrating the "goodness of fit" are also attached and need no further comment.
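Two points of this construction lend themselves to a small numerical sketch: the component areas must add up to the radix, and the column of deaths $d_x$ obtained from the summed products N F(x) determines the survivors $l_x$ and the mortality rates $q_x = d_x / l_x$. The Python lines below are an illustration only. The areas are those of the table above; the short column of deaths is invented for the example, and the function name is hypothetical.

```python
# Sketch only: from a d_x column to l_x and q_x, starting at the radix age.

LOCOMOTIVE_AREAS = {
    "I": 44857, "II": 342645, "III": 226022, "IV": 147420, "V": 47650,
    "VI": 31260, "VIIa": 79005, "VIIb": 77713, "VIII": 3428,
}


def life_table(deaths, radix_age):
    """Return rows (age, l_x, d_x, q_x); deaths[i] is d_x at age radix_age + i.

    The radix l at the first age is taken as the total of the d_x column,
    i.e. the whole cohort is assumed to die within the ages tabulated.
    """
    living = sum(deaths)
    rows = []
    for offset, d in enumerate(deaths):
        q = d / living if living > 0 else float("nan")
        rows.append((radix_age + offset, living, d, q))
        living -= d
    return rows


# The component areas of the table add up to the radix of 1,000,000.
TOTAL_AREA = sum(LOCOMOTIVE_AREAS.values())

# Invented toy column of deaths (not data from the text), radix at age 20.
TOY_ROWS = life_table([400, 300, 200, 100], radix_age=20)
for age, l_x, d_x, q_x in TOY_ROWS:
    print(age, l_x, d_x, round(q_x, 4))
```

In an actual table the list of deaths would hold the summed products N F(x) for every integral age from the radix age upwards.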
It might, however, be of interest to mention the fact that the American actuary, Moir, has recently constructed a mortality table for American Locomotive Engineers along the orthodox lines from the data contained in the Medico-Actuarial Mortality investigation. Moir's table, or at least the great bulk of the material from which it was derived, falls in the interval between 1900 and 1913. Owing to the energetic "safety first" movement which since 1912 has been actively pursued by most of the leading American railroads, it is, however, to be expected that the period 1913-1917 indicates a reduced mortality as compared with that of Moir's period. This fact is also shown in the diagrams in Fig. 7.¹ On the other hand, the almost parallel movement of Moir's table and the table of the frequency curve method of 1913-1917 seems to indicate the soundness of the proposed method.

[Fig. 7. Mortality rates (vertical scale .000 to .016) according to Locomotive Engineers' mortality tables for various periods. The curves are not recoverable from the scan.]

¹ Curves I, II and V are Locomotive Engineers' Mortality Tables for various periods.

12 a. ADDITIONAL MORTALITY TABLES

A similar table showing mortality conditions among a decidedly industrial or occupational group has been constructed for coal miners in the United States. The original data of the deaths by ages and specific causes were obtained from the records of several fraternal orders and a large industrial life assurance company and comprised nearly 1600 deaths. The number of deaths above the age of sixty was, however, too small to determine with any degree of exactitude the area of component curves for the older age groups.
For ages below sixty-five the table should on the other hand give a true representation of the mortality among coal miners in American collieries during the period under consideration.¹ A particular feature of this table is the comparatively low mortality in group VI, which contains primarily deaths from tuberculosis. Coal miners present in this respect different conditions from those usually prevailing in dusty trades where the death rate from tuberculosis is unusually high. The same feature is also borne out in previous investigations on the death rate of coal miners in England, and by the recent investigations by Mr. F. L. Hoffman on dusty trades in America.

¹ It was not possible to separate anthracite and bituminous coal miners. The data indicate that anthracite mine workers have a higher accident rate than workers in bituminous mines.

[Figure (page 173), under the running head "Coal Miners": not recoverable from the scan.]

In order to have a measure of the mortality prevailing among industrial workers in America, we submit a table derived from a very detailed collection of mortuary records by age, sex and cause of death as published by the Metropolitan Life Insurance Company of New York. A deplorable defect in this splendid collection of data is the grouping together of all ages above seventy in a single age group, which makes it almost impossible to determine the component curves for higher ages with any degree of trustworthiness.

The defect in the original Metropolitan data for older age groups made it necessary to modify the earlier sets or families of curves which were used on the Michigan and Massachusetts data and to combine several of the subsidiary component curves, especially those for the older age groups. Such modifications were, however, easily performed by means of simple logarithmic transformations.
I give below my grouping scheme for the Metropolitan data designated by the code numbers of the international list of causes of death. The actual cause of death corresponding to each code number is found under paragraph 8 of the present chapter.

GROUP I
10, 39 to 46, 48, 50, 54, 63 b, 64 to 66, 68, 79, 81, 82, 89 to 91, 94, 96, 97, 103, 105, 109 a, 120, 123, 124, 126, 127, 142, 154.

GROUP II
4, 13, 14, 18, 26, 27, 32 to 35, 47 (over age 20), 49, 51 to 53, 55, 60, 62, 70 to 72, 77, 78, 80, 83 to 88, 92, 95, 98 to 102, 106, 107, 109 b, 110 to 119, 122, 125, 143 to 145, 148, 149, 155 to 163.

GROUP III
28, 29, 31, 37, 38, 56 to 59, 67.

GROUP IV a AND IV b
1, 5 to 9, 17, 19, 20 to 25, 30, 61, 63 a, 73 to 76, 108, 146, 147, 150, 164 to 186, 47 (under age 20).

It will be noted that under this scheme Group I includes practically Groups I to III of the Michigan classification, Group II corresponds partly to IV and V for Michigan, Group III is practically Michigan's Group VI, while Group IV a and IV b takes in partly V, VII, and VIII in the Michigan experience. As a further correction I found it also advisable to transfer some of the deaths in the age intervals 10-14, 15-19, 20-24, and 25-29 in Groups I and II to Group IV a so as to avoid the long left tail ends in these older age curves.

[Fig. 9. Observed and calculated values of R(x), in per cent, by ages, for Groups I, II, III, and IV A and IV B. The curves are not recoverable from the scan.]

After grouping the deaths (more than 200,000) of the Metropolitan experience according to the above scheme, it is a simple matter to compute the various
values of R(x) of the four groups for quinquennial age intervals and use these values (altogether 52 in number) for finding the observation equations and in the subsequent determination of the component curves as shown in the final mortality table in the appendix to this chapter. A comparison between the observed values of R(x) by quinquennial ages and the continuous values of R(x) (indicated by dotted curves) as computed from the final mortality table is shown in Fig. 9. The "fit" between calculated and observed values is evidently satisfactory.

A most instructive and unique experience is offered in the table of Japanese Assured Males for the four year period 1914-1917, based upon the death records of more than a dozen of the leading Japanese Life Assurance Companies. About 35,000 deaths by cause and arranged in quinquennial age groups were available for this construction. The component curves for the older age groups were determined by a simple logarithmic transformation of the variates and offered no particular obstacles in the a priori determination of the parameters. The curves for middle and younger life were more difficult to handle, especially the curves typical of tuberculosis, spinal meningitis and the peculiar Oriental disease known as Kakke, arising from an excessive rice diet. A first attempt to use the same curve types as employed in some of the European and American data resulted in a very poor fit between the observed and calculated values of R(x) for the younger age intervals, clearly indicating that the clustering tendencies in the Japanese data were different from those in the other experiences I had previously dealt with.

The peculiar form of the observed values of R(x) for the tuberculosis group indicated beyond doubt that the frequency curve for this group itself was a compound curve.
I therefore decided to include both spinal meningitis and kakke with the tuberculosis group, and treat this new group as a compound frequency curve with two components. By successive trials I finally succeeded in establishing a complete curve system which satisfied the ultimate requirement of the fit between the observed and calculated values of R(x) for the various groups.¹

¹ See Addenda for the final table.

Grouping of Causes of Death in Japanese Assured Males 1914-1917.

GROUP I
Diseases of Arteries, Senility, Influenza, Cerebral Hemorrhage, Acute and Chronic Bronchitis, Bronchopneumonia.

GROUP II
Asthma and Pulmonary Emphysema, Cancer (all forms), Tumor, Diabetes, Other Diseases of Body, Paralytic Dementia, Tabes Dorsalis, Diseases of other organs for circulation of Blood, Chronic Nephritis, Other Diseases of Urinary Organs.

GROUP III
Mental Diseases, Other Diseases of Spine and Medulla Oblongata, Other Diseases of Nervous System, Diseases of Cardiac Valves, Pneumonia, Pleurisy, Other Respiratory Diseases, Gastric Catarrh, Ulcer of Stomach, Hernia, Other Diseases of Stomach, Diseases of Liver, Acute Nephritis, Diseases of Skin and Diseases of Motor Organs.

GROUP IV a AND IV b
Typhoid Fever, Malaria, Cholera, Acute Infectious Diseases, Peritonitis, Suicide, Dysentery, Tuberculosis (all forms), Syphilis, Kakke, Meningitis, Inflammation of the Caecum, Death by external causes (accidents, etc.).

Arranging the collected Japanese statistics on causes of death among assured males by attained age at death in accordance with the above scheme of grouping, using a 5 year interval as the unit, we obtain the following double entry table for the 35207 deaths as used in my computation for the various values of R(x).
Ages     Group I   Group II   Group III   Group IV   Total
10-14         3          4          37         79      123
15-19        17         23         216        714      970
20-24        37         65         181       1640     1923
25-29        62        109         324       1975     2470
30-34       124        257         800       1993     3174
35-39       278        480        1147       2065     3970
40-44       449        662        1299       1674     4084
45-49       701        957        1352       1482     4491
50-54       742        959        1115        990     3806
55-59       864       1045        1041        728     3678
60-64       865        847         874        482     3068
65-69       626        571         612        186     1995
70-74       399        268         347         80     1094
75-79       123         76         100         20      319
80-84        16         13          10          3       42

The observed values of R(x) as derived from the above table are shown in the staircase shaped histograph in Fig. 10. The correlated values of R(x) as calculated from the final mortality table are shown as dotted curves on the same diagram. The "fit" between observed and calculated values of R(x) is evidently satisfactory except for the youngest age intervals.

[Fig. 10. Observed and calculated values of R(x) by age at death, for Groups I, II, III, and IV A and IV B. The curves are not recoverable from the scan.]

The construction of the present Japanese table constitutes probably the most severe trial to which the proposed method has hitherto been put. We are here dealing with an entirely different race living under different economic conditions than the nations of Europe and America and afflicted with certain forms of diseases which are comparatively rare or unknown among the Western nations.

It is therefore gratifying to note that the eminent Japanese actuary, Mr. T. Yano, in comparing the above mentioned table with an investigation he made on the aggregate mortality in 1913-1917 of all the Japanese life assurance companies (about 45 in number) from the actual number of lives exposed to risk at various ages has been able to test independently the validity of the proposed method to complete satisfaction. (See remarks in preface).

13.
CRITICISMS AND SUMMARY

With these remarks I shall close the mere technical discussion of the proposed method and turn my attention to the arguments advanced by certain American critics against the possibility of constructing mortality tables from records of death alone. I deem no apology necessary to meet those critics and give a brief historical sketch of the origin of the proposed method, because remarks along this line will tend to accentuate the difficulties the mathematically trained biometrician has to contend with in obtaining a hearing among the present day school of actuaries and statisticians.

A good many critics, among whom I may mention Mr. John S. Thompson and Mr. J. P. Little, apparently have received an erroneous impression of the fundamental processes of the proposed method and its evident departure from the conventional methods. Mr. Thompson states "If we understand the process, the result is simply a graduation of "$d_x$", the "actual" deaths, and it is not apparent why a mortality table should not be formed from the unadjusted deaths and some other function of graduation with equally good results".¹ From this it would appear that Mr. Thompson is of the opinion that I have graduated the deaths as actually observed. As any one who will take the trouble to read the above article can see, this is not the case. The actually observed numbers of deaths have only been used to construct the observed proportionate death ratios.²

The whole process may be summarized as follows:

1) The choice (a priori) of a system of frequency curves based upon the hypothesis that the distribution of deaths according to age from typical causes of death can be made to conform to those postulated frequency curves whose parameters are known or chosen beforehand.

2) The grouping of causes of death so as to conform with the above mentioned system of frequency curves.
3) The computation for each age or age group of the proportionate death ratios of such groups 1 Proceedings of the Casualty Actuarial Statistical Society of America, Vol. IV, Pages 399—400. 2 These objections by Thompson and Little are shown in their full obscurity in the case of the tables for Lo- comotive Engineers, Coal Miners and Japanese Assured Males where the greatest number of observed deaths fell between ages 35 — 49. 184 Human Death Curves. from the oollected statistical data of deaths by age and by cause of death. 4) The choice of approximate values of the areas of the various component frequency curves. Such approximate values can be determined by inspection or by simple linear correlation methods. 5) The determination by means of the theory of least squares of the various correction factors a with which the approximate values of the areas must be multiplied in order that we may obtain the probable values of the areas of the component curves. The observation equations necessary for this computation are obtained from the observed proportionate death ratios, which are indepen- dent of the exposed to risk. 6) The subsequent calculation of the products NF(x) for all groups and for all integral ages. This gives us again the total number dying from all causes at integral ages among the original cohort of 1,000,000 entrants at age 10. In other words the d x column from which the final morta- lity table can be constructed. 7) The computation of the "expected" or theoretical proportionate death ratios from the final table and their subsequent comparison with the "actual" or observed proportionate death ra- tios to illustrate the "goodness of fit". It is this last step which constitutes the verifica- Criticism and Summary. 185 tion of the results derived by means of a purely deductive or mathematical process, and is a test of very stringent requirements. 
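Steps 6 and 7 of the summary above can be illustrated with a short Python sketch. It is not the computation of the text: plain normal curves stand in for the skew frequency curves, and the three groups, their areas, means and dispersions are invented purely for illustration.

```python
import math


def normal_density(x, mean, dispersion):
    """Normal frequency function; an assumed stand-in for the skew curves."""
    z = (x - mean) / dispersion
    return math.exp(-0.5 * z * z) / (dispersion * math.sqrt(2.0 * math.pi))


# Hypothetical component curves: group -> (area N, mean age, dispersion).
# None of these figures is taken from the tables of the text.
COMPONENTS = {
    "old age": (400_000, 72.0, 10.0),
    "middle life": (400_000, 55.0, 15.0),
    "early adult": (200_000, 35.0, 12.0),
}


def deaths_by_group(age):
    """Step 6: the products N * F(x) for every group at one integral age."""
    return {
        group: area * normal_density(age, mean, dispersion)
        for group, (area, mean, dispersion) in COMPONENTS.items()
    }


def expected_ratios(age):
    """Step 7: expected proportionate death ratios R_B(x) at one age."""
    parts = deaths_by_group(age)
    total = sum(parts.values())  # the d_x value implied at this age
    return {group: deaths / total for group, deaths in parts.items()}


ratios_at_30 = expected_ratios(30)
ratios_at_70 = expected_ratios(70)
```

In this toy system the "early adult" group supplies most of the deaths at age 30 and very few at age 70, which is the kind of pattern the observed ratios would be compared against in step 7.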
It is required, namely, that there be a simultaneous "fit", not only for all groups of causes of death but for all age intervals as well.

The sole justification of the proposed method hinges indeed upon the validity of the hypothesis. Is it indeed possible to choose a priori a system of frequency curves to which to fit our observed data? Theoretically speaking each population or sample population, as for instance certain occupational groups such as locomotive engineers, farmers, textile workers, miners, etc., will in all probability have its own particular system of frequency curves. From a purely practical point of view (and this is the one in which we are chiefly interested) we may, however, easily get along with a limited system of frequency curves for the various groups of causes of death and limit ourselves to comparatively few sets of frequency curves to which to fit our statistical data. The case is analogous to that confronting a manufacturer of shoes. Undoubtedly the foot of one individual is different in form from that of any other individual, and in order to get an absolutely faultlessly fitting boot we would all have to go to a custom boot maker. Practical experience shows, however, that it is possible to manufacture a few sizes of boots, say 6's, 7's, 8's and intermediate sizes in quarters and halves, so as to fit, to complete satisfaction, the feet of millions of people. Exactly in the same manner I have found from a long and varied experience in practical curve fitting that it is possible to fit the mortuary records of male deaths by attained age and cause of death to a comparatively limited number of sets of component curves, say not more than 5 or 6 sets. Moreover, if in a certain sample population a certain curve should not exhibit a satisfactory fit it is indeed a simple matter to change its parameters so as to improve the fit.
14. ADDITIONAL PRINCIPLES OF METHOD

In regard to the classification of the causes of death into a limited number of groups it seems that some of the critics of the method are of the opinion that this classification is ironclad and fixed. This, however, is not the case. While in a specific sample population a certain cause of death might fall in Group II, it is quite likely that the same cause of death would come under another group in another sample population. For instance, the deaths from asthma are in Michigan grouped under Group II. In the case of Coal Miners such deaths would, however, go into Group IV or Group V. If the classification of causes of death were fixed, the frequency curves for separate populations would show great variations, and it would be out of the question to limit ourselves to a small set of systems of component curves. By making the classification flexible we are, on the other hand, in a better position to proceed with a smaller number of curves. For instance, in order to use the postulated frequency curve for Group VI for Michigan it was necessary to place the cause of death listed as No. 186 (other accidental traumatism) of the International Classification of Causes of Death in that group instead of in Group V or VII, where most deaths of this type are ordinarily classed.

It would be interesting to see to what extent the proposed classification and the chosen system of frequency curves in Michigan deviate from the theoretically exact system of frequency curves. In the case of Michigan it would be impossible to test this. An approximate test might be obtained from the Michigan mortality data for the three-year period 1909–1911. Professor Glover has constructed a mortality table for males in the State of Michigan in this three-year period by means of the usual methods employed by actuaries, resorting to the exposed to risk.
Starting with a radix of 1,000,000 at age 10 it is possible to break up the deaths or the d_x column of the Glover table into a set of subsidiary columns of deaths from groups of causes of death in the same order as given in Table A on page 133 by means of a simple application of the observed proportionate mortality ratios as derived from the 1909–1911 period. On the basis of a radix of 1,000,000 survivors at age 10 we find that according to the Glover Table, 5016 will die in the interval from 50–54. Let us also suppose that the proportionate mortality ratio in Group III for ages 50–54 amounted to 0.23; then the number of deaths from Group III in that particular interval in the Glover table would be 5016 x 0.23 = 1154. Similar numbers could be found for the other groups and for arbitrary age intervals, and we would in this manner have an empirical representation of the frequency curves. This aspect of the matter is treated in brief form on another page.

Returning now to our original discussion, it will readily be admitted that the method of constructing mortality tables by means of compound frequency curves cannot be considered as absolutely rigorous from the standpoint of pure mathematics. But neither can the usual methods of constructing mortality tables by graduation processes, whether by analytical formulas, mechanical interpolation formulas or a simple graphical process, be considered as mathematically exact. All statistical methods are, in fact, approximation processes. In the greater part of the realm of applied mathematics we have to resort to such approximation processes. It is thus impossible to solve the general equation of higher degree than the fourth exactly by ordinary algebraic processes.
We encounter, however, in everyday practice innumerable instances in which an approximation process, as for instance Newton's or Horner's methods or the method of finite differences, is sufficiently close to determine the roots of any equation so as to satisfy all practical requirements.

From this point of view I claim that the proposed method in the hands of adequately trained statisticians will yield satisfactory results, and I am inclined to think that the results are probably as true as the ones obtained by means of the usual methods, which especially in the case of graduation by interpolation formulas often are affected with serious systematic errors. Moreover, there are sound philosophical and biological principles underlying the proposed method, which is perhaps more than can be said about the usual methods, purely empirical in scope and principle. On the other hand, I will readily admit that the proposed method is by no means a simple rule of thumb and it can under no circumstances be entrusted to the hands of amateurs. The whole process can in my opinion only be employed when placed in the hands of the adequately trained statistician who is thoroughly familiar with his mathematical tools, as provided in the formulas from the probability calculus. Such adequate training is not acquired overnight, but only through long and patient study. Meticulous and patient work is often required before one is finally brought upon the right track, especially in the classification of the causes of death. Failure upon failure is oftentimes encountered by the beginner in this work, and it is probably only through such failures that the investigator is enabled to avoid the pitfalls of the often treacherous facts as disclosed by statistical data and steer a clear course. Mathematical skill is only acquired through long and careful study.
The illustrious saying of the Greek geometer, Euclid, who once told the Ptolemaic king that "there is no royal road in mathematics", holds true today as it did in the days of antiquity.

The fact that the method is no simple mechanical rule, but one which can be entrusted to skillful hands only, is, moreover, in my opinion, one of its strong points, because it eliminates all attempts of dilettantes to make use of it. A large manufacturing plant would not, for instance, put an ordinary blacksmith or horseshoer to work on making the fine tools for certain parts of automatic machinery employed in the manufacture of staple articles. Only the most skilled and highly trained tool makers are able to produce machine parts, which often require precision measurements running into one thousandth part of an inch. Nor would a large contracting firm dream of putting a backwoods carpenter in charge of the construction of a skyscraper. Yet this case is absolutely analogous to that of letting the mere collector of crude statistical data make an analysis and draw conclusions from certain collected facts as expressed in statistical series of various sorts.

While some American critics to all appearances have misunderstood the principles underlying the method, several European reviewers of the short summary of the method as originally published in the "Proceedings of the Casualty Actuarial and Statistical Society of America" evidently have understood its fundamental principles completely. The European critics seem, however, to be of the opinion that there is a rather prohibitive amount of arithmetical work involved in the actual construction of the mortality table. Thus a review in the Journal of the Royal Statistical Society for May 1918 has this to say:

"Mr. Fisher's object is to construct a life table, being given only the deaths at ages and not the population at risk.
The hypothesis employed is that the total frequency of deaths can be resolved into specific groups of deaths, the frequencies of which cluster around certain ages. The parameters of these sub-frequencies having been determined, the areas are deduced from a system of frequency curves of the form:

$$R_B(x) = \frac{N_B F_B(x)}{N_B F_B(x) + N_C F_C(x) + N_D F_D(x) + \dots}$$

where R_B(x) is the proportional mortality at age x of deaths due to causes in group B, and F_B(x) is obtained from the equation of the sub-frequency curve for cause B, while N_B + N_C + N_D + ... + N_E = 1,000,000. The values of R(x) provide a system of observational equations from which (by least squares) the values of N_B, &c., can be obtained.

"Since particularly in industrial statistics, or in general statistical inquiries under war conditions, it is easier to obtain accurate data of deaths at ages than of exposed to risk, the success of the method is encouraging. It is, however, to be noted that the amount of arithmetical work involved is considerable. Quite apart from the determination of the parameters of the frequency curves, the formation and solution of the normal equations needed to compute the areas is a heavy piece of work. It would be of interest to see whether the resolution into but three components effected by Professor Karl Pearson in his well-known essay published in the "Chances of Death" could be made to describe with sufficient accuracy an ordinary tabulation of deaths from age 10 onwards to lead to approximately correct results for life table purposes. The test should, of course, be made with mortality data derived from a population very far from being stationary and the deductions compared with the results of standard methods. The subject is one of peculiar interest at the present time."

From the above quotation it is evident that this English reviewer has a clear conception of the fundamental principles upon which the method is based.
His criticism is mainly directed against the heavy piece of arithmetical work involved. This work can, however, not be compared with the much more difficult task of obtaining the exposed to risk at various ages, which under all circumstances would take much greater time and be infinitely more costly, in fact be absolutely prohibitive from a financial point of view. I wish in this connection to state that the whole arithmetical work involved in the construction of the Michigan table was done by two computers in less than 70 hours, while the corresponding table for Massachusetts took about 75 hours. I do not know if this can be called exactly prohibitive.

In regard to the remarks of my British critic concerning the Pearsonian method I might add that in my first attempt at an analysis of mortality conditions along the lines described above I tried to subdivide the causes of death into four groups. It was, however, found that this was not always sufficient to describe the frequency distribution of the number of deaths around certain ages. I doubt whether it is at all possible to describe the frequency distribution in the various sub-groups by a system of normal curves, which, of course, would somewhat lessen the work. I have made attempts to do this, but so far I have not been successful except in a few cases.¹ It might be possible that we should succeed in this if we first set up a hypothetically determined curve of the numbers exposed to risk. Such a curve might, for instance, be a normal curve. Personally, I believe that little would be gained by such a procedure. More fruitful appears an analysis by means of correlation surfaces.
The mortality table constructed by the process as I have described it constitutes in its final form a correlation surface, wherein the age at death and the group of causes of death are the independent variables, and the number of deaths at a certain age and from a certain group of causes of death is the numerical value of the correlation function of the two variates. Provided one could obtain an exact equation of such a correlation surface, it would be a simple matter to construct a mortality table, and I hope that some statistician may in the future be induced to attempt a solution of the problem in this light.

¹ See Addenda for the Metropolitan Table and the Japanese Table.

15. ANOTHER APPLICATION OF THE FREQUENCY CURVE METHOD

Before closing the discussion of this subject we shall, however, give a brief description of another application of compound frequency curves in the construction of mortality tables. We have here reference to the use of skew frequency curves in the graduation of crude mortality rates as computed in the usual empirical manner as the ratio of deaths to the number of lives exposed to risk at various ages.

On page 165 it was mentioned that the State of Massachusetts took a census in April 1915. This census together with the deaths for the triennial period 1914–1916 makes it an easy matter to construct a mortality table in the conventional manner. Moreover, such a table can be compared with the previously constructed table from mortuary records by sex, age and cause of death only and shown in the appendix.

In this connection it might be worth mentioning that my first table for Massachusetts as constructed by compound frequency curves was prepared during the summer of 1918 and first presented in a series of lectures delivered at the University of Michigan during the month of March 1919, while the final official report of the 1915 Massachusetts census did not come into the hands of the present writer before May 1919.

The official census of the population of Massachusetts by sex and single ages is given on page 478 in Vol. III of the Massachusetts report, from which Fig. 11 has been constructed. It is seen from a mere glance at this graph that there is an unduly high tendency among the figures to cluster around ages which are multiples of 5. This tendency is especially marked in the age interval 30–60 and presents a defect which is of no small importance in the construction of a mortality table by means of the conventional methods. It is indeed doubtful if a table constructed from data so greatly influenced by observation errors and misstatements of ages can be considered as absolutely trustworthy. On the other hand the data ought to be sufficiently exact to test the results arrived at by the proposed method of compound frequency curves.

We give below the male population in 5 year age groups for the middle census year of 1915 and the corresponding deaths from all causes during the triennial period 1914–1916.

MASSACHUSETTS 1915
Male Population and Number of Deaths among Males from 1914–1916.

Ages          Population, L_x    Deaths 1914–16, D_x
5–9               169010               1715
10–14             152419               1004
15–19             154773               1537
20–24             171961               2353
25–29             171017               2726
30–34             149294               2979
35–39             142617               3535
40–44             125462               4007
45–49             107909               4393
50–54              89490               5026
55–59              65133               5459
60–64              49079               5679
65–69              34790               6027
70–74              23638               5946
75–79              13724               4752
80–84               6494               3166
85–89               2479               1751
90–94                530                540
95–99                124                133
100 & over            12                 23

A few small discrepancies will be found to exist between this table and the table printed on page 163, giving the observed deaths from various causes in ten year age intervals. This arises solely from the fact that a number of deaths were recorded where the contributing cause was unknown and could, therefore, not be distributed in their proper groups. But this defect is of no influence in the construction of a mortality table by means of the method of compound frequency curves, unless all the causes reported as unknown should happen to belong to the same group, which hardly can be assumed to be the case. At any rate the proportionate death ratios, which are the keystone in this method of construction, are for practical purposes left unaltered whether we include or exclude these few numbers of unknown causes. In the usual way of constructing tables from exposures and number of deaths it is on the other hand absolutely essential to include all deaths, as otherwise the death rate will be underestimated.

Bearing these facts in mind we therefore refer to the above figures of L_x and D_x for Massachusetts Males, from which we without further difficulty can construct an empirical mortality table, either by graphic methods or by simple summation or interpolation formulas. There is indeed no dearth of such formulas, of which a large number have been devised by Milne, Wittstein, Woolhouse, Higham, Sprague, Hardy, King, Spencer, Henderson, Westergaard, Gram, Karup and several other investigators.
In the following computation I have used a formula originally devised by the Italian statistician, Novalis, and later on somewhat modified by the English actuary, King. The following schedule shows the actual process in detail.

MASSACHUSETTS MALES.
A. Population. Graduated Quinquennial Pivotal Values.

Ages        Population L_x     ΔL_x      Δ²L_x     Age    Graduated Population¹
5–9             169010        -16591
10–14           152419         +2354    +18945      12         29332
15–19           154773        +17188    +14834      17         30836
20–24           171961          -944    -18132      22         34537
25–29           171017        -21723    -20779      27         34369
30–34           149294         -6677    +15047      32         29739
35–39           142617        -17155    -10478      37         28607
40–44           125462        -17553      -398      42         25095
45–49           107909        -18419      -866      47         21587
50–54            89490        -24357     -5938      52         17946
55–59            65133        -16054     +8293      57         12961
60–64            49079        -14289     +1765      62          9802
65–69            34790        -11152     +3137      67          6933
70–74            23638         -9914     +1238      72          4717
75–79            13724         -8130     +1884      77          2731
80–84             6494         -4015     +4115      82          1265
85–89             2479         -1949     +2066      87           480
90–94              530          -406     +1543      92           104
95–99              124          -112      +294      97            23
100–104             12                             102             1

¹ Graduated Population:

$$u_{x+7} = 0.2\,L_{x+5} - 0.008\,\Delta^2 L_{x+5}$$

B. Deaths 1914–1916. Graduated Quinquennial Pivotal Values.

Ages        No. of Deaths D_x    ΔD_x     Δ²D_x    Age    Graduated Deaths
5–9              1715            -711
10–14            1004            +533     +1244     12         200.8
15–19            1537            +816      +283     17         307.4
20–24            2353            +373      -443     22         470.6
25–29            2726            +253      -120     27         545.2
30–34            2979            +556      +303     32         595.8
35–39            3535            +472       -84     37         707.0
40–44            4007            +386       -86     42         801.4
45–49            4393            +633      +247     47         878.6
50–54            5026            +433      -200     52        1005.2
55–59            5459            +220      -213     57        1091.8
60–64            5679            +348      +128     62        1125.8
65–69            6027             -81      -429     67        1205.4
70–74            5946           -1194     -1113     72        1189.2
75–79            4752           -1586      -392     77         950.4
80–84            3166           -1415      +171     82         633.2
85–89            1751           -1211      +204     87         350.2
90–94             540            -407      +804     92         108.0
95–99             133            -110      +297     97          26.6
100–104            23                               102           4.6

In this manner we obtain the graduated quinquennial pivotal values of the population and of the deaths for ages 12, 17, 22, 27, ... etc. Then by dividing one third of the graduated deaths by the population we have the graduated pivotal values of the so-called "central death rates", or m_x, for quinquennial ages from age 12 and up. From these values of m_x we easily find the corresponding values of q_x by means of the formula:

$$q_x = \frac{2\,m_x}{2 + m_x}$$

We give below the results of this computation.

Massachusetts Males 1914–1916.

Age    1000 q_x from Novalis' Formula
12            2.21
17            3.33
22            4.64
27            5.29
32            6.68
37            8.25
42           10.65
47           13.53
52           18.67
57           26.38
62           38.29
67           58.12
72           81.90
77          109.91
82          165.02
87          240.18
92          325.64

The intervening values of q_x are without difficulty derived by interpolation formulas or by a graphical process. Once having all the values of q_x for separate ages from age 10 and up it is a simple matter to form tables of l_x and d_x commencing with a radix of 1,000,000 at age 10. Without going into tedious details we present the following values of l_x for decimal ages.

Massachusetts Males 1914–1916.

Age        l_x        Ages         Σ d_x
10      1,000,000     10–19        27,700
20        972,300     20–29        47,330
30        924,970     30–39        66,750
40        858,220     40–49        98,650
50        759,570     50–59       153,900
60        605,670     60–69       233,150
70        372,520     70–79       237,130
80        135,390     80–89       124,760
90         10,640     90 & over    10,640
100            32

16. GRADUATION BY FREQUENCY CURVES

It is to this table that we now shall apply a process of re-graduation by means of the method of compound frequency curves.
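The pivotal-value formula and the conversion from m_x to q_x used for the empirical table can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are invented here, the three sample rows are copied from schedules A and B, and, following the "Graduated Deaths" column, the graduated deaths are taken as one fifth of D_x without a second-difference correction.

```python
def pivotal_value(group_total, second_difference):
    """Graduated quinquennial pivotal value: 0.2 * L - 0.008 * (second difference)."""
    return 0.2 * group_total - 0.008 * second_difference


def q_from_m(m):
    """Convert a central death rate m_x into a probability of dying q_x."""
    return 2.0 * m / (2.0 + m)


# central age -> (L_x, second difference of L_x, D_x), copied from the schedules
ROWS = {
    17: (154773, 14834, 1537),
    22: (171961, -18132, 2353),
    37: (142617, -10478, 3535),
}


def rate_per_thousand(age, years_observed=3):
    """1000 q_x at a pivotal age from the population and deaths of its age group."""
    population, second_diff, deaths = ROWS[age]
    graduated_population = pivotal_value(population, second_diff)
    graduated_deaths = 0.2 * deaths  # assumption: as listed in schedule B
    m = graduated_deaths / years_observed / graduated_population
    return 1000.0 * q_from_m(m)


graduated_17 = round(pivotal_value(154773, 14834))
```

For ages 17, 22 and 37 the rounded pivotal populations from `pivotal_value` agree with the graduated populations printed in schedule A; the rates returned by `rate_per_thousand` are a sketch only and need not reproduce the printed column to the last figure.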
Here we have already an empirical representation of the total compound curve of death or the d_x curve. This compound curve can now by simple and straightforward processes be broken up into its various component parts as to causes of death by means of the various observed proportionate mortality ratios, R_x, shown in Table H on page 163.

Let us for the sake of illustration take the age interval 40–49. According to our empirically constructed table as derived from the Massachusetts 1915 census we find that the number of deaths among the survivors in this age interval amounts to 98,650.

Applying to this number the observed proportionate death ratios, R_x, in Table H we are able to break this number up into its various component parts according to the groups of causes of death from which the numerical values of R_x were derived. These component parts are as follows:

Group          No. of Deaths
I                   1180
II                 18050
III                17170
IV                 17170
V                  14300
VI                 23970
VII a & b           5820
VIII                 990
Total:             98650

[Table: deaths by age interval and component group of causes of death, Massachusetts Males 1914–1916.]
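The apportionment of one age interval's deaths among groups of causes amounts to multiplying the interval total by each proportionate death ratio. A minimal Python sketch follows; the four ratios are hypothetical and are not the values of Table H, and the function name is an assumption made for illustration.

```python
def split_deaths(total_deaths, ratios):
    """Apportion the deaths of one age interval among groups of causes."""
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("proportionate death ratios must sum to 1")
    return {group: total_deaths * ratio for group, ratio in ratios.items()}


# Hypothetical ratios for a single age interval (not taken from Table H).
EXAMPLE_RATIOS = {"I": 0.05, "II": 0.20, "III": 0.25, "IV": 0.50}

# 98,650 is the number of deaths in the interval 40-49 quoted in the text.
parts = split_deaths(98_650, EXAMPLE_RATIOS)

# Summing each group's deaths over all age intervals would give the
# sub-totals, that is, the areas N of the component curves.
```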
In the same manner we can break up the compound curve (the d_x curve) into its eight component parts for all other age intervals, which finally gives us the table of component groups printed on the preceding page. Graphically this table will represent a series of frequency diagrams of the various groups of causes of death. It is an easy matter to fit such diagrams to a system of Laplacean-Charlier or Poisson-Charlier frequency curves, which symbolically may be represented as follows:

$$N_{I}F_{I}(x),\quad N_{II}F_{II}(x),\quad \dots,\quad N_{VIII}F_{VIII}(x)$$

where F(x) is the frequency function of the percentage distribution according to age of the various component groups or curves, while N stands for the areas of such curves.

These curve areas are simply the sub-totals of the respective groups in the above table. The parameters giving the equations of the curves F_I(x), F_II(x), F_III(x), ... are easily computed by the method of moments and are shown in the following table.

Once having determined the parameters of the various frequency curves it is a simple matter to construct the final mortality table, which is shown in the addenda.

Values of Parameters of Component Curves, Massachusetts, 1914–1916 Males.¹

Group     Mean    Dispersion    Skewness    Excess
I         75.0       9.78        +0.080     -0.005
II        67.5      13.65        +0.117     +0.017
III       64.0      14.12        +0.124     +0.030
IV        60.5      16.51        +0.089     -0.006
V         50.0      18.61        +0.026     -0.034
VI        43.5      15.57        -0.036     -0.023
VIIb      57.5      16.33        -0.027     -0.028

¹ In this grouping I have combined VIIa and VIII into a single group and roughly fitted this group to a truncated Poisson-Charlier curve. This, of course, is not exact and evidently introduces errors in the younger age interval from 10–19. For ages above 20 this curve is of no importance and the other curves should for the ages above 20 give a satisfactory fit. If absolute exactitude were required for younger ages it would indeed offer no difficulties to compute curves VIIa and VIII separately and thus obtain a much closer fit in the youngest age interval. In view of the fact that the present calculation is a test case only, it has not been thought necessary to go to these refinements. This defect will of course also affect to a slight extent group VIIb.

It now remains for us to compare the final values of q_x which we obtain from the three tables:

A) The values of q_x as computed in the usual way from the number of lives exposed to risk and the corresponding deaths at various ages.

B) The values of q_x as obtained by a re-graduation of the mortality table under A by means of compound frequency curves.

C) The values of q_x constructed from mortuary records by sex, age and cause of death, but without knowing the numbers of lives exposed to risk.

Massachusetts Males. 1914–1916.
Values of 1000 q_x by various methods.

Age        A          B          C
17        3.33       3.15       3.27
22        4.64       3.99       4.28
27        5.29       5.04       5.46
32        6.68       6.72       7.03
37        8.25       8.63       8.88
42       10.65      10.83      11.05
47       13.53      13.86      14.05
52       18.67      18.83      19.13
57       26.38      26.88      27.66
62       38.29      38.79      40.26
67       58.12      59.04      56.54
72       81.90      76.50      77.61
77      109.91     103.69     107.51
82      165.02     137.97     148.79

I think that every unbiased investigator will admit that there exists a close agreement between the three series. It is indeed difficult to say which one of the three is the most probable. We know that on account of the great perturbations due to misstatements of ages the values under A are affected with considerable errors. The usual interpolation or summation formulas do not suffice to remove these errors and tend often to increase them. A re-graduation by means of frequency curves as shown in series B will in all probability give better results, although on account of the large age interval (10 years) in which the causes of death are grouped in the Massachusetts reports this method does not come to its full right.¹
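The agreement between the three series can be put in figures by expressing each of B and C as a percentage departure from A. The following Python sketch does this with the values of 1000 q_x copied from the table above; the helper names are assumptions, and the choice of A as the reference is made only for convenience, not because A is the more trustworthy series.

```python
AGES = [17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82]
SERIES_A = [3.33, 4.64, 5.29, 6.68, 8.25, 10.65, 13.53,
            18.67, 26.38, 38.29, 58.12, 81.90, 109.91, 165.02]
SERIES_B = [3.15, 3.99, 5.04, 6.72, 8.63, 10.83, 13.86,
            18.83, 26.88, 38.79, 59.04, 76.50, 103.69, 137.97]
SERIES_C = [3.27, 4.28, 5.46, 7.03, 8.88, 11.05, 14.05,
            19.13, 27.66, 40.26, 56.54, 77.61, 107.51, 148.79]


def percent_difference(value, reference):
    """Signed departure of value from reference, in per cent of reference."""
    return 100.0 * (value - reference) / reference


def largest_gap(series, reference):
    """Return (age, per cent) where series departs most from reference."""
    gaps = [
        (age, percent_difference(s, r))
        for age, s, r in zip(AGES, series, reference)
    ]
    return max(gaps, key=lambda pair: abs(pair[1]))


gap_b = largest_gap(SERIES_B, SERIES_A)
gap_c = largest_gap(SERIES_C, SERIES_A)
```

With these figures both `gap_b` and `gap_c` fall at age 82, the oldest age tabulated, where the numbers observed are smallest.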
The values of q_x under A and B are naturally closely related to each other, and those in series B cannot be derived unless the values in series A are known beforehand. Series C on the other hand is independent of either A or B, having been derived by means of entirely different methods of construction.

¹ See footnote on page 127.

17. COMPARISON BETWEEN METHODS

A comparison between the parameters in the separate component curves in B and C gives us, however, a way of testing the validity of the hypothesis upon which the method of series C rests. In the case of the series C we started with the hypothesis of the existence of a set of frequency curves of the percentage distribution of the number of deaths according to age among the various groups. On the basis of this hypothesis and from the observed values of the proportionate death ratios, R_x, we determined by the method of least squares the areas of this postulated set of frequency curves. In the case of the B series we broke up the empirically constructed compound death curve (the d_x curve) into its various component parts according to a similar classification of causes of death as under C. We have therefore in this case an empirical determination of the areas of the component curves and all that we need to do is to graduate the rough frequency diagrams as represented by such areas to a system of frequency curves.

Let us now briefly examine how far the various skew frequency curves in series B and C differ from each other. In regard to the various statistical parameters of the separate groups we have the following results:

Means.

Group     Series C    Series B
I           78.5        75.0
II          68.0        67.5
III         63.0        64.0
IV          60.5        60.5
V           49.5        50.0
VI          44.0        43.5
VIIb        57.5        57.5

Dispersions.

Group     Series C    Series B
I           7.98        9.78
II         12.21       13.65
III        13.05       14.12
IV         17.86       16.51
V          18.51       18.61
VI         14.68       15.57
VIIb       12.16       16.33

Skewness.

Group     Series C    Series B
I          +0.092      +0.080
II         +0.115      +0.117
III        +0.121      +0.124
IV         +0.098      +0.089
V          +0.033      +0.026
VI         -0.010      -0.036
VIIb       -0.002      -0.027

Excess.

Group     Series C    Series B
I          -0.033      -0.005
II         +0.023      +0.017
III        +0.047      +0.030
IV         -0.009      -0.006
V          -0.031      -0.034
VI         -0.027      -0.023
VIIb       -0.003      -0.028

Taken all in all there is found to exist a satisfactory agreement between the hypothetical values in series C and the values derived by empirical methods. It is only in Group I that we find some important discrepancies. This group contains causes of death typical of extreme old age, where we naturally may expect great perturbations owing to large errors from random sampling, especially in series B. In this same connection we may also mention that the empirically determined values under series B are subject to a slight correction by means of the Sheppard formulas, which were not employed in my computations.

We have already mentioned that the system of frequency curves which we chose a priori for Massachusetts (Series C) was the same system which we had used on a previous occasion in the construction of a mortality table for English Males for the period 1911–1912.¹ This is a fact of no small importance. It will in general be found that the percentage distribution according to age in the various component curves differs little in different sample populations. Even in the case of American Locomotive Engineers it was found possible to use the same set of curves as in the case of Massachusetts and England and Wales. In the same way I have found that the set of curves used in the construction of the table of Michigan Males also can be used in the case of males in the urban population of Denmark. With a very few exceptions I have found it possible to get along with a limited number of sets of curves, say four or five sets.
Should it nevertheless prove impossible to fit the original data to any one of these particular curve systems, it will in most cases be found possible by means of successive approximations to reach a system of curves which may be made the a priori basis for the construction of the final table, as was the case in the table for Japanese assured males.

1) See "Proceedings of the Casualty Actuarial Society of America", Vol. IV, page 409.

Finally we come to the comparison of the various areas of the component curves. We have here:

Areas.

Group         Series C   Series B
I                90064     105000
II              281470     296190
III             207854     213010
IV              151316     144200
V                99543      87850
VI              107718     106260
VII & VIII       62035      47410
Total          1000000    1000000

Evidently the agreement is not so close in this case. But it would indeed be rather rash to assert that the values in series C are faulty. One must here bear in mind the diametrically opposite principles employed in the determination of these areas. In series B we have a direct determination by empirical methods. In this determination we shall, however, find reflected all the systematic and observational errors originally present in series A, from which the curves under B were computed. Every error due to misstatements of ages, and every systematic error introduced by the summation or interpolation formulas, will be directly reflected in the areas under series B, and such areas can therefore in a sense only be considered as a first approximation to the true or presumptive areas.

Another point well worth remembering is that no conditions are imposed upon the areas in series B.
In series C, where we work with mortuary records only, we have on the other hand the very important condition or restriction requiring that the areas of the component curves must be so determined that their ratios to the compound curve for various age intervals will conform as closely as possible with the observed proportionate death ratios, $R_x$, for those same age intervals. In order to test the influence of this additional requirement in respect to conformity to observed proportionate death ratios, we might use the values of the component curves under series B as a first approximation and then afterwards determine the correction factors $a$ for the areas in exactly the same way as in the case of series C. No doubt such a calculation would tend to improve the table. A difficulty occurs, however, in the case of the Massachusetts data, owing to the large interval of 10 years into which the causes of death by attained ages are grouped. As pointed out in the footnote on page 127, the quantity $R_B(x)$ ($x = 10, 11, 12, \ldots, 100$; $B = \mathrm{I}, \mathrm{II}, \mathrm{III}, \ldots$) can only be considered as being independent of the "exposed to risk" if the age interval into which the deaths fall is sufficiently small. If this is not the case, the "central" values of $R_B(x)$ are subject to certain corrections. In the case of the groups of causes of death typical of younger ages the observed "central" values of $R_{\mathrm{VII}}(x)$ and $R_{\mathrm{VIII}}(x)$ for the age intervals 10-19, 20-29, 30-39 are evidently too high, while on the other hand the values of $R_{\mathrm{I}}(x)$ and $R_{\mathrm{II}}(x)$ in the case of the age intervals 60-69, 70-79, 80-89, 90-100 are too low as compared with the true values of $R(x)$ at these "central" ages. I have, however, tacitly ignored this fact in my computations. The result is that the final values of $q_x$ for the younger ages in column C as shown on page 208 are in all probability a little too high, and the values of $q_x$ above 65 too low.
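To put rough numbers on the comparison of areas given earlier in this section, the differences between the two series can be tabulated group by group. The sketch below is only an illustration: the figures are copied from the table of areas as they appear there, and the function name `area_differences` and the choice of series C as the base for the relative difference are assumptions made here, not part of the author's method.

```python
# Areas of the component curves (per 1,000,000 deaths), copied from the
# table of areas in this section. Series C is the hypothetical determination,
# series B the empirical one.
areas_c = {"I": 90064, "II": 281470, "III": 207854, "IV": 151316,
           "V": 99543, "VI": 107718, "VII & VIII": 62035}
areas_b = {"I": 105000, "II": 296190, "III": 213010, "IV": 144200,
           "V": 87850, "VI": 106260, "VII & VIII": 47410}


def area_differences(c, b):
    """Return {group: (B - C, (B - C) / C)} for every group in series C.

    Taking C as the base of the relative difference is an assumption
    made for this illustration only.
    """
    return {g: (b[g] - c[g], (b[g] - c[g]) / c[g]) for g in c}


diffs = area_differences(areas_c, areas_b)
for group, (absolute, relative) in diffs.items():
    # Signed absolute difference and signed percentage difference.
    print(f"{group:>10}  {absolute:+8d}  {relative:+7.1%}")
```

On these figures the largest absolute difference falls in group I and the largest relative difference in groups VII & VIII, which is in line with the remark that the agreement of the areas is less close than that of the other parameters.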
In the other tables shown in the present book the age interval into which the causes of death were arranged was 5 years or less, and the error was thus reduced to such an extent that further corrections may be disregarded for all practical purposes.

ADDENDA I

Showing Detailed Mortality Tables and Death Curves for
1) Japanese Assured Males (1914-1917)
2) Metropolitan Life. White Males (1911-1916)
3) American Coal Miners (1913-1917)
4) American Locomotive Engineers (1913-1917)
5) Massachusetts Males (Series C) (1914-1916)
6) Michigan Males (1909-1915)
7) Massachusetts Males (Series B) (1914-1916).

Mortality Table: Japanese Assured Males, 1914-1917 (Aggregate Table)

Age      I     II    III    IVa    IVb     dx       lx   1000qx
 15     24     65    343      -   2379   2811  1000000     2.81
 16     39     74    360      -   3645   4118   997189     4.13
 17     43     84    388      -   4888   5403   993071     5.44
 18     48     93    415      -   5981   6557   987668     6.64
 19     54    107    446      -   6826   7433   981111     7.58
 20     60    120    478      -   7447   8105   973678     8.32
 21     68    135    513      -   7716   8432   965573     8.73
 22     77    153    550     12   7734   8526   957141     8.91
 23     87    171    591     27   7581   8457   948615     8.92
 24    101    195    633     50   7274   8253   940158     8.86
 25    111    218    678     77   6864   7948   931905     8.53
 26    126    246    729    112   6384   7597   923957     8.22
 27    140    278    780    153   5860   7211   916360     7.87
 28    160    315    838    206   5341   6860   909149     7.54
 29    178    353    899    268   4821   6519   902289     7.22
 30    198    395    963    341   4323   6220   895770     6.94
 31    227    446   1033    425   3853   5984   889550     6.73
 32    252    501   1109    521   3421   5804   883566     6.59
 33    286    557   1185    629   3021   5678   877762     6.46
 34    319    626   1273    751   2665   5633   872084     6.46
 35    358    700   1364    885   2336   5643   866451     6.51
 36    401    779   1460   1031   2048   5719   860808     6.64
 37    450    872   1564   1186   1797   5869   855089     6.86
 38    502    970   1671   1350   1566   6059   849220     7.13
 39    570   1081   1791   1524   1366   6332   843161     7.51
 40    638   1197   1916   1701   1191   6643   836829     7.94
 41    716   1332   2049   1883   1037   7017   830186     8.45
 42    802   1475   2193   2066    903   7439   823169     9.04
 43    899   1632   2341   2249    783   7904   815730     9.69
 44   1005   1799   2501   2428    680   8413   807826    10.41
 45   1126   1985   2671   2599    598   8979   799413    11.23
 46   1261   2180   2852   2764    514   9571   790434    12.10
 47   1406   2393   3042   2917    447  10205   780863    13.07
 48   1575   2611   3236   3061    395  10878   770658    14.12
 49   1764   2867   3459   3187    339  11606   759780    15.27
 50   1957   3122   3666   3298    295  12338   748174    16.49
 51   2180   3395   3892   3389    257  13113   735836    17.82
 52   2426   3679   4136   3473    224  13938   722723    19.29
 53   2692   3984   4380   3532    195  14783   708785    20.86
 54   2987   4285   4638   3576    172  15658   694002    22.56
 55   3306   4610   4922   3611    147  16596   678344    24.47
 56   3654   4940   5177   3612    130  17513   661748    26.46
 57   4026   5274   5456   3605    113  18474   644235    28.68
 58   4432   5603   5742   3581     97  19455   625761    31.09
 59   4857   5937   6025   3544     84  20447   606306    33.72
 60   5316   6257   6316   3498     74  21461   585859    36.63
 61   5795   6568   6604   3424     69  22460   564398    39.79
 62   6293   6860   6890   3345     59  23447   541938    43.27
 63   6805   7129   7162   3255     51  24402   518491    47.15
 64   7332   7361   7423   3150     43  25309   494089    51.22
 65   7854   7570   7672   3042     38  26176   468780    55.84
 66   8366   7727   7896   2919     36  26944   442604    60.88
 67   8863   7838   8089   2791     31  27612   415660    66.43
 68   9313   7894   8257   2655     28  28147   388048    72.53
 69   9719   7894   8385   2511     23  28532   359901    79.27
 70  10053   7829   8468   2362     20  28732   331369    86.71
 71  10294   7700   8503   2212     18  28727   302637    94.92
 72  10424   7496   8477   2067     15  28479   273910   103.97
 73  10424   7227   8389   1901     13  27954   245431   110.69
 74  10280   6897   8230   1746     13  27166   217477   124.91
 75   9970   6503   8002   1593     10  26078   190311   137.02
 76   9492   6057   7695   1444     10  24698   164233   150.38
 77   8834   5571   7313   1298      8  23024   139535   165.00
 78   8037   5047   6853   1159      7  21103   116511   181.12
 79   7086   4499   6314   1026      6  18931    95408   198.42
 80   6046   3943   5733    900      5  16621    76477   217.33
 81   4953   3400   5091    784      4  14232    59856   237.77
 82   3871   2862   4421    676      3  11833    45624   259.35
 83   2813   2365   3730    577      2   9487    33791   280.75
 84   1957   1907   3046    489      1   7400    24304   304.48
 85   1232   1498   2396    412      -   5538    16904   327.61
 86    701   1141   1797    340      -   3979    11366   350.08
 87    343    844   1275    277      -   2739     7387   370.76
 88    140    603    844    225      -   1812     4648   389.78
 89     48    408    516    179      -   1151     2836   405.85
 90     14    269    283    141      -    707     1685   419.58
 91      5    171    134    110      -    420      978   429.44
 92      -    111     53     83      -    247      558   442.65
 93      -     56     14     63      -    133      311   452.10
 94      -     28      4     44      -     76      178   457.05
 95      -     14      2     31      -     47      102   460.78
 96      -      5      1     22      -     28       55   509.01
 97      -      -      -     14      -     14       27   518.50
 98      -      -      -      9      -      9       13   692.30
 99      -      -      -      4      -      4        4  1000.00

Mortality Table: Metropolitan White Males, 1911-1916

Age      I     II    III    IVb    IVa     dx       lx   1000qx
 10     80    153    205     47   1720   2205  1000000     2.21
 11     95    179    274     61   1776   2385   997795     2.39
 12    118    210    350     77   1812   2567   995410     2.58
 13    141    244    444     96   1832   2757   992843     2.78
 14    168    282    550    116   1834   2950   990086     2.98
 15    202    327    671    140   1825   3165   987136     3.21
 16    240    373    810    171   1803   3397   983971     3.45
 17    282    427    960    199   1772   3640   980574     3.71
 18    336    483   1130    233   1733   3915   976934     4.01
 19    393    545   1315    274   1680   4207   973019     4.32
 20    454    611   1514    311   1612   4502   968812     4.65
 21    527    685   1728    358   1539   4837   964310     5.02
 22    599    765   1951    407   1449   5169   959473     5.39
 23    687    845   2184    459   1363   5538   954304     5.80
 24    775    932   2428    515   1279   5929   948766     6.25
 25    874   1024   2674    575   1190   6337   942837     6.72
 26    977   1120   2924    638   1107   6766   936500     7.32
 27   1088   1223   3173    703   1012   7199   929734     7.74
 28   1202   1328   3414    770    923   7637   922535     8.28
 29   1324   1436   3648    839    840   8087   914898     8.84
 30   1473   1549   3879    909    757   8567   906811     9.45
 31   1584   1662   4089    985    684   9004   898244    10.02
 32   1702   1779   4283   1052    614   9430   889240    10.60
 33   1863   1899   4459   1125    545   9891   879810    11.24
 34   2012   2015   4604   1196    485  10312   869919    11.85
 35   2160   2139   4740   1266    427  10732   859607    12.48
 36   2324   2259   4842   1332    378  11135   848875    13.12
 37   2485   2379   4919   1399    335  11517   837740    13.75
 38   2664   2501   4968   1462    296  11891   826223    14.39
 39   2847   2617   4989   1520    258  12231   814332    15.02
 40   3057   2734   4988   1577    226  12578   802101    15.68
 41   3272   2848   4953   1628    192  12893   789523    16.33
 42   3508   2960   4898   1675    163  13204   776630    17.00
 43   3767   3066   4821   1719    143  13516   763426    17.70
 44   4057   3170   4719   1757    120  13823   749910    18.43
 45   4389   3267   4604   1789    100  14149   736087    19.22
 46   4748   3358   4471   1816     90  14483   721938    20.06
 47   5153   3447   4320   1839     75  14834   707455    20.97
 48   5599   3526   4160   1855     61  15201   692621    21.95
 49   6064   3598   3991   1867     50  15590   677420    23.01
 50   6631   3663   3810   1872     42  16018   661830    24.20
 51   7198   3721   3630   1872     35  16456   645812    25.48
 52   7820   3769   3443   1867     30  16929   629356    26.90
 53   8492   3809   3254   1857     22  17434   612427    28.47
 54   9168   3839   3069   1840     10  17926   594993    30.13
 55   9897   3858   2876   1820      1  18452   577067    31.98
 56  10637   3868   2696   1793      -  18994   558615    34.00
 57  11378   3867   2519   1762      -  19526   539621    36.18
 58  12114   3853   2340   1726      -  20033   520095    38.52
 59  12847   3830   2169   1687      -  20533   500062    41.06
 60  13555   3794   2004   1640      -  20993   479529    43.77
 61  14217   3746   1844   1591      -  21396   458536    46.67
 62  14817   3685   1692   1541      -  21735   437140    49.72
 63  15359   3615   1547   1484      -  22005   415405    52.97
 64  15820   3535   1408   1425      -  22188   393400    56.40
 65  16179   3443   1277   1364      -  22263   371212    59.97
 66  16450   3340   1153   1299      -  22242   348949    63.74
 67  16610   3229   1037   1235      -  22111   326707    67.68
 68  16691   3109    930   1166      -  21896   304596    71.89
 69  16591   2981    828   1098      -  21498   282700    76.05
 70  16412   2851    736   1030      -  21029   261202    80.51
 71  16107   2711    649    955      -  20422   240173    85.03
 72  15721   2568    571    892      -  19752   219751    89.88
 73  15225   2423    500    825      -  18973   199999    94.87
 74  14629   2271    434    759      -  18093   181026    99.95
 75  13946   2126    377    695      -  17144   162933   105.22
 76  13225   1976    325    632      -  16158   145789   110.83
 77  12423   1828    278    572      -  15101   129631   116.49
 78  11580   1684    237    515      -  14016   114530   122.38
 79  10729   1543    200    461      -  12933   100514   128.67
 80   9840   1406    167    411      -  11824    87581   135.01
 81   8950   1272    138    363      -  10723    75757   141.54
 82   8092   1144    115    318      -   9669    65034   148.68
 83   7237   1024     98    282      -   8641    55365   156.07
 84   6420    911     79    247      -   7657    46724   163.88
 85   5645    806     65    208      -   6724    39067   172.11
 86   4920    707     53    181      -   5861    32343   181.21
 87   4240    615     43    150      -   5048    26482   190.62
 88   3622    531     34    126      -   4313    21434   201.22
 89   3065    457     27    106      -   3655    17121   213.48
 90   2550    387     22     87      -   3046    13466   226.20
 91   2099    327     16     70      -   2512    10420   241.07
 92   1698    270     14     56      -   2038     7908   257.71
 93   1355    222     11     45      -   1633     5870   278.19
 94   1053    179      8     35      -   1275     4237   300.92
 95    805    143      6     27      -    981     2962   331.20
 96    595    112      5     20      -    732     1981   369.51
 97    412     85      1     14      -    512     1249   409.93
 98    286     62      -     10      -    358      737   485.75
 99    198     27      -      6      -    231      379   609.50
100     95     15      -      4      -    114      148   770.27
101     27      5      -      2      -     34       34  1000.00

Mortality Table: American Coal Miners (1913-1917)

Age     I    II   III    IV    Va    Vb    VI     dx       lx   1000qx
 18     -    99   124   142  4566     7   366   5304  1000000     5.30
 19     -   114   144   164  4702    10   408   5542   994696     5.57
 20     -   140   168   187  4954    14   452   5915   989154     5.98
 21     -   162   194   214  5196    19   498   6283   983239     6.39
 22     -   190   223   243  5234    27   546   6463   976956     6.62
 23     -   223   250   272  5151    38   597   6531   970493     6.73
 24     -   256   282   307  5067    50   646   6608   963962     6.86
 25     -   298   315   341  4952    69   697   6672   957354     6.97
 26     -   341   349   379  4846    91   749   6755   950682     7.11
 27     -   390   386   421  4748   120   802   6867   943927     7.27
 28     -   440   424   465  4683   156   853   7021   937060     7.49
 29     -   498   461   508  4569   202   903   7141   930039     7.68
 30     -   557   500   560  4413   257   953   7240   922898     7.84
 31     -   622   538   609  4220   326  1002   7317   915658     7.99
 32     -   688   579   663  4000   408  1048   7386   908341     8.13
 33     -   761   618   718  3757   505  1093   7452   900955     8.27
 34     -   837   654   777  3500   618  1133   7519   893503     8.42
 35     -   915   693   840  3233   749  1175   7605   885984     8.58
 36     -   994   732   905  2963   898  1212   7704   878379     8.77
 37     -  1084   775   973  2697  1064  1246   7839   870675     9.00
 38     -  1171   818  1045  2435  1251  1277   7997   862836     9.27
 39     -  1267   867  1124  2184  1452  1305   8199   854839     9.59
 40     -  1364   920  1206  1946  1667  1329   8432   846640     9.96
 41     -  1471   978  1293  1723  1894  1352   8711   838208    10.39
 42     -  1581  1045  1386  1515  2131  1369   9027   829497    10.88
 43     -  1705  1125  1489  1325  2372  1383   9399   820470    11.46
 44     -  1835  1222  1585  1106  2609  1395   9752   811071    12.02
 45     1  1976  1322  1712   883  2841  1403  10133   801319    12.65
 46     6  2132  1444  1837   853  3063  1408  10743   791186    13.58
 47    10  2302  1584  1971   729  3265  1410  11271   780443    14.44
 48    21  2492  1741  2114   619  3443  1408  11838   769172    15.39
 49    32  2705  1918  2265   524  3595  1402  12441   757334    16.43
 50    42  2934  2118  2423   442  3706  1395  13060   744893    17.53
 51    54  3190  2337  2589   368  3790  1383  13711   731833    18.74
 52    73  3470  2567  2764   307  3832  1368  14380   718122    20.02
 53    94  3775  2820  2945   255  3832  1352  15073   703742    21.42
 54   123  4104  3086  3130   210  3790  1331  15774   688669    22.91
 55   153  4437  3355  3313   173  3706  1308  16445   672895    24.44
 56   185  4843  3637  3501   141  3595  1281  17183   656450    26.18
 57   225  5246  3922  3689   115  3443  1252  17892   639267    27.99
 58   268  5656  4192  3872    93  3265  1220  18566   621375    29.88
 59   310  6085  4454  4047    76  3063  1186  19221   602809    31.89
 60   354  6530  4703  4209    61  2841  1148  19846   583588    34.01
 61   402  6970  4936  4364    48  2609  1109  20438   563742    36.25
 62   450  7403  5133  4500    39  2372  1076  20964   543304    38.59
 63   508  7832  5305  4618    30  2131  1023  21447   522340    41.05
 64   573  8230  5438  4718    24  1894   978  21855   500893    43.63
 65   648  8615  5533  4795    19  1667   931  22208   479038    46.36
 66   746  8954  5581  4846    15  1452   884  22478   456830    49.20
 67   875  9255  5596  4871    13  1251   834  22695   434352    52.25
 68  1015  9507  5563  4871     9  1064   785  22814   411657    55.41
 69  1207  9704  5479  4841     6   898   736  22871   388843    58.81
 70  1437  9846  5358  4786     6   749   686  22868   365972    62.49
 71  1702  9917  5196  4701     4   618   637  22775   343104    66.38
 72  2008  9931  4999  4592     4   505   588  22627   320329    70.64
 73  2334  9871  4771  4460     2   408   540  22386   297702    75.20
 74  2677  9747  4513  4302     2   326   494  22061   275316    80.10
 75  3028  9557  4233  4125     2   257   449  21651   253255    85.49
 76  3332  9307  3941  3929     1   202   408  21120   231604    91.19
 77  3610  9001  3638  3722     1   156   366  20494   210484    97.37
 78  3827  8643  3322  3496     -   120   329  19737   189990   103.88
 79  3967  8237  3012  3267     -    91   293  18867   170253   110.82
 80  4020  7799  2704  3029     -    69   258  17879   151386   118.10
 81  3980  7327  2411  2788     -    50   226  16782   133507   125.70
 82  3916  6803  2123  2552     -    38   198  15630   116725   133.90
 83  3658  6315  1846  2313     -    27   171  14330   101095   141.75
 84  3370  5801  1596  2085     -    19   147  13018    86765   150.04
 85  3040  5286  1366  1862     -    14   125  11693    73747   158.56
 86  2684  4776  1151  1650     -    10   105  10376    62054   167.21
 87  2305  4281   957  1448     -     7    88   9086    51678   175.82
 88  1937  3809   789  1261     -     5    71   7872    42592   184.82
 89  1584  3353   640  1085     -     3    60   6725    34720   193.69
 90  1269  2924   513   927     -     2    48   5683    27995   203.00
 91   985  2535   404   784     -     2    38   4748    22312   212.80
 92   747  2168   310   650     -     1    29   3905    17564   222.33
 93   551  1845   231   531     -     -    22   3180    13659   232.81
 94   396  1545   170   428     -     -    17   2556    10479   243.92
 95   278  1279   119   338     -     -    12   2026     7923   255.71
 96   198  1050    79   261     -     -     7   1594     5897   270.31
 97   126   845    48   195     -     -     5   1219     4303   283.29
 98    85   672    26   140     -     -     2    925     3084   299.94
 99    70   525     9    96     -     -     -    701     2159   324.69
100    35   401     -    59     -     -     -    495     1458   339.51
101    24   298     -    29     -     -     -    351      963   364.48
102    19   217     -     4     -     -     -    240      612   392.16
103    14   149     -     -     -     -     -    163      372   438.17
104    10    97     -     -     -     -     -    107      209   511.96
105     8    55     -     -     -     -     -     63      102   727.65
106     6    25     -     -     -     -     -     31       39   794.87
107     3     2     -     -     -     -     -      5        8   625.00
108     2     -     -     -     -     -     -      2        3   666.67
109     1     -     -     -     -     -     -      1        1  1000.00
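The three right-hand columns of the mortality tables above are tied together by two simple relations: the rate per thousand is $1000\,q_x = 1000\,d_x / l_x$, and the survivors at the next age are $l_{x+1} = l_x - d_x$. The sketch below, offered only as an illustration, checks a few printed rows against these relations; the function name `check_life_table` and the tolerance of half a unit in the second decimal are assumptions made here, and the rows are the first five ages of the table for Japanese Assured Males.

```python
def check_life_table(rows, tol=0.005):
    """Check consecutive life-table rows given as (age, d_x, l_x, printed 1000 q_x).

    Returns a list of (age, column, recomputed value) for every entry that
    disagrees with one of the two identities
        1000 q_x = 1000 * d_x / l_x   (within `tol`, allowing for rounding)
        l_(x+1)  = l_x - d_x          (exactly)
    """
    problems = []
    for i, (age, d, l, q_printed) in enumerate(rows):
        q = 1000 * d / l
        if abs(q - q_printed) > tol:
            problems.append((age, "1000qx", round(q, 2)))
        if i + 1 < len(rows):
            next_age, _, next_l, _ = rows[i + 1]
            if l - d != next_l:
                # Report the age whose l_x does not follow from the row before.
                problems.append((next_age, "lx", l - d))
    return problems


# (age, d_x, l_x, 1000 q_x) for ages 15 to 19, Japanese Assured Males.
japanese_males = [
    (15, 2811, 1000000, 2.81),
    (16, 4118, 997189, 4.13),
    (17, 5403, 993071, 5.44),
    (18, 6557, 987668, 6.64),
    (19, 7433, 981111, 7.58),
]

print(check_life_table(japanese_males))  # prints [] for these five rows
```

A check of this kind is also a convenient way of locating misprinted digits in a long table, since a single wrong figure in the $l_x$ column breaks the chain at exactly one age.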
[Pages 224-233 of Addenda I: the remaining tables are not legible in this copy.]

ADDENDA II

In order to show a rapid application of frequency curve methods to the graduation of mortality tables when the number of lives exposed to risk at various ages is known, the following data, relating to applicants who had been rejected for life assurance on account of impaired health by Scandinavian assurance companies, are instructive. The original statistics, as collected by a committee of the insurance companies, were first published in the quinquennial report (1910-1915) of the Danish Government Life Assurance Institution (the Statsanstalt) for 1917.
The material related to Scandinavian and Finnish applicants who, prior to 1893 (and in the case of two Danish companies, prior to 1899), had been rejected for life assurance. By a special investigation the committee followed up these rejections and sought to establish whether the applicants were alive on July 1, 1899, or had died before that date. Detailed reports for the full period during which the risks were under observation were available for 8,208 individual applicants. For 2,023 applicants complete data were not available.

The final statistical results of the Statsanstalt's investigation are shown in the following summary table:

TABLE I.
Mortuary Experience of Rejected Risks of Scandinavian Life Companies.

Attained    No. Exposed   Number of
Age             to Risk      Deaths
15-19               434           6
20-24             3,831          28
25-29            11,405         145
30-34            17,644         233
35-39            19,442         318
40-44            17,600         324
45-49            13,971         296
50-54            10,179         295
55-59             6,640         264
60-64             3,927         194
65-69             1,995          96
70-74               836          71
75-79               306          32
80-84                98          20
85-89                12           3

The exposed to risk by separate ages and the corresponding deaths are shown in Columns 2 and 3 of Table II (Mortality Experience of Rejected Scandinavian Risks, Male), from which we obtain without difficulty the crude or ungraduated mortality rates shown in Column 4. We next assume a purely hypothetical frequency distribution of the exposed to risk according to age, represented by a Laplacean normal probability curve with its mean or origin at age fifty and a dispersion equal to 12.5 years, as shown in Column 5. The frequency distribution of the number of deaths, computed from the ungraduated mortality rates in Column 4 and the above-mentioned normal probability curve, is shown in Column 6, which may be considered as an ungraduated compound frequency curve.

Arranged in quinquennial age intervals, this latter frequency distribution is shown in the following summary table:

Ages        No. of Deaths
13-17                  51
18-22                  75
23-27                 329
28-32                 711
33-37               1,464
38-42               2,498
43-47               3,649
48-52               5,377
53-57               6,238
58-62               6,232
63-67               5,254
68-72               3,605 ¹
73-77               2,536
78-82               1,425
83-87               1,169
88-92                 351
93 or over             95
Total              41,059

¹ A slight adjustment was made in the figures in column (6) corresponding to age 70, and in the age groups above the age of 88.

The above frequency distribution is now subjected to a graduation by means of the Laplacean-Charlier or Gram-Charlier frequency function. The mathematical calculations give the following parameters:

Mean Age      57.75 years
Dispersion    13.32 years
Skewness      -0.0031
Excess        -0.0037

Applying these parameters to standard probability tables we obtain the usual Laplacean-Charlier frequency curve. Distributing the 41,059 individual deaths according to this frequency curve we obtain column (7), which is the graduated death curve corresponding to the hypothetical exposure as given by column (5). The final mortality rates per 1,000 of exposed to risk are then found by dividing (7) by (5) and are shown in column (8).

In order to show how closely the graduation by means of frequency curves agrees with the actual observations, I have made a calculation of the "actual" and the "expected" deaths by quinquennial age intervals, as shown in the following table:

TABLE III.
Comparison between "Actual" and "Expected" Deaths on the Basis of the Graduated Mortality Rates of the Scandinavian Mortality Table for Rejected Lives.

Ages      No. Exposed    Actual    Expected
              to Risk    Deaths      Deaths
15-19             434         6         3.4
20-24           3,831        28        37.6
25-29          11,405       145       133.4
30-34          17,644       233       242.2
35-39          19,442       318       314.3
40-44          17,600       324       336.8
45-49          13,971       296       321.8
50-54          10,179       295       287.2
55-59           6,640       264       234.8
60-64           3,927       194       178.6
65-69           1,995        96       119.5
70-74             836        71        67.4
75-79             306        32        33.8
80-84              98        20        15.1
85-89              12         3         2.5
Total         108,320     2,325     2,328.4

Considering the somewhat meager experience on which the graduation was based, I think it must be admitted that the method of frequency curves comes surprisingly close to the actual facts. In this connection it is of interest to note that the actuaries of the Danish Statsanstalt made a graduation of the above data on the basis of Makeham's method and obtained by least square methods the following values for the constants:²

A = 0.006
log B = 7.0566 - 10
log C = 0.025

² See formula (6), page 192, of Institute of Actuaries Text Book, Life Contingencies, by E. E. Spurgeon, London, 1922.

The "expected" deaths according to this latter graduation, and on the basis of the above experience, amount in total to 2,317, as against 2,325 "actual" deaths and 2,328 "expected" deaths according to the frequency curve method. Viewed from the standpoint of the principle of least squares, it is also found that the sum of the squares of the deviations is smaller under the frequency curve method than under the method of Makeham, which seems to be pretty good evidence of the soundness of the method, in spite of the fact that I have throughout worked with unweighted observations. If properly chosen weights were applied to the observations, even closer results could be obtained.
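The chain of columns (4) to (8) of Table II may be retraced for a single age by a short computation. The following Python sketch is an illustration only and is not offered as the author's working method. It rests on two assumptions which the text does not state: that the hypothetical exposure of column (5) totals 1,250,000 lives (a figure inferred here from the tabulated value 39,894 at age fifty), and that the skewness and excess enter the Laplacean-Charlier series as coefficients of the third and fourth Hermite terms with the signs shown. All function and constant names are hypothetical.

```python
import math

# Figures quoted in the text.
EXPOSURE_MEAN = 50.0         # origin of the hypothetical exposure curve
EXPOSURE_DISPERSION = 12.5   # its dispersion, in years
DEATH_MEAN = 57.75           # mean age of the graduated death curve
DEATH_DISPERSION = 13.32     # its dispersion, in years
SKEWNESS = -0.0031
EXCESS = -0.0037
TOTAL_DEATHS = 41_059

# ASSUMPTION: not stated in the text; inferred from column (5) at age 50.
TOTAL_HYPOTHETICAL_LIVES = 1_250_000


def normal_density(z):
    """Standard normal probability density at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def hypothetical_exposure(age):
    """Column (5): normal curve of lives exposed to risk."""
    z = (age - EXPOSURE_MEAN) / EXPOSURE_DISPERSION
    return TOTAL_HYPOTHETICAL_LIVES * normal_density(z) / EXPOSURE_DISPERSION


def crude_deaths(age, exposed, deaths):
    """Column (6) = column (5) x column (4), where (4) = deaths / exposed."""
    return hypothetical_exposure(age) * deaths / exposed


def graduated_deaths(age):
    """Column (7): normal curve corrected by third and fourth Hermite terms.

    ASSUMPTION: the sign and scaling of the two correction terms are a
    guess at the convention behind the published skewness and excess.
    """
    z = (age - DEATH_MEAN) / DEATH_DISPERSION
    h3 = z**3 - 3.0 * z               # third Hermite polynomial
    h4 = z**4 - 6.0 * z**2 + 3.0      # fourth Hermite polynomial
    correction = 1.0 - SKEWNESS * h3 + EXCESS * h4
    return TOTAL_DEATHS * normal_density(z) / DEATH_DISPERSION * correction


def graduated_rate_per_1000(age):
    """Column (8) = 1000 x column (7) / column (5)."""
    return 1000.0 * graduated_deaths(age) / hypothetical_exposure(age)


if __name__ == "__main__":
    # Observed figures for age 50 in Table II: 2348 exposed, 61 deaths.
    age, exposed, deaths = 50, 2348, 61
    print("column (5):", round(hypothetical_exposure(age)))
    print("column (6):", round(crude_deaths(age, exposed, deaths)))
    print("column (7):", round(graduated_deaths(age), 1))
    print("column (8):", round(graduated_rate_per_1000(age), 2))
```

For age fifty the sketch comes close to the tabulated entries 39,894, 1,036, 1,039.0 and 26.04. Equal agreement at other ages is not guaranteed, since the published parameters are rounded and the sign convention of the correction terms is assumed.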
TABLE II.
Mortality Experience of Rejected Scandinavian Risks (Male).

Columns: (1) Age; (2) Exposed to Risk; (3) No. of Deaths; (4) = (3) : (2); (5) Hypothetical Exposure; (6) = (5) x (4), Crude Death Curve; (7) Graduated Death Curve; (8) = (7) : (5), 1000 qx.

(1)    (2)  (3)      (4)    (5)   (6)     (7)     (8)
 15     11    -  0.00000    792     -     5.6    7.07
 16     31    1  0.03226    987    32     7.1    7.07
 17     64    1  0.01562   1223    19     9.2    7.52
 18    121    -  0.00000   1506     -    11.7    7.77
 19    207    4  0.01932   1842    36    15.4    8.36
 20    340    1  0.00294   2239     7    19.7    8.80
 21    501    1  0.00200   2705     5    25.0    9.24
 22    719    6  0.00834   3246    27    30.8    9.49
 23    982    6  0.00611   3871    24    38.8   10.02
 24   1289   14  0.01086   4586    50    47.8   10.42
 25   1619   22  0.01359   5399    73    58.2   10.78
 26   1986   23  0.01158   6316    73    70.6   11.18
 27   2287   34  0.01487   7341   109    85.0   11.58
 28   2597   29  0.01117   8478    95   101.7   12.00
 29   2916   37  0.01269   9728   123   120.5   12.39
 30   3180   38  0.01195  11092   133   142.0   12.80
 31   3395   50  0.01473  12566   185   166.4   13.24
 32   3564   44  0.01235  14146   175   193.5   13.68
 33   3700   46  0.01243  15822   197   223.4   14.12
 34   3806   55  0.01445  17585   254   257.0   14.61
 35   3882   48  0.01236  19419   240   293.3   15.10
 36   3943   64  0.01623  21307   346   332.8   15.62
 37   3921   72  0.01836  23230   427   375.3   16.16
 38   3880   66  0.01701  25164   428   420.0   16.69
 39   3816   68  0.01782  27086   483   467.7   17.27
 40   3737   66  0.01766  28969   512   517.6   17.87
 41   3637   63  0.01732  30785   533   566.9   18.41
 42   3539   59  0.01667  32506   542   623.3   19.17
 43   3426   62  0.01810  34105   617   678.2   19.89
 44   3261   74  0.02269  35553   807   732.7   20.61
 45   3079   67  0.02176  36827   801   787.8   21.39
 46   2941   61  0.02074  37903   786   842.4   22.23
 47   2793   46  0.01647  38762   638   895.1   22.97
 48   2653   61  0.02299  39387   906   945.9   24.02
 49   2505   61  0.02435  39767   968   994.3   25.00
 50   2348   61  0.02598  39894  1036  1039.0   26.04
 51   2184   65  0.02976  39767  1183  1079.9   27.16
 52   2024   66  0.03261  39387  1284  1116.0   28.33
 53   1882   59  0.03135  38762  1215  1147.4   29.53
 54   1741   44  0.02527  37903   958  1173.3   30.96
 55   1610   62  0.03851  36827  1418  1193.0   32.39
 56   1447   60  0.04147  35553  1474  1206.9   33.95
 57   1308   45  0.03440  34105  1173  1214.3   35.60
 58   1189   47  0.03953  32506  1285  1214.9   37.37
 59   1086   50  0.04604  30785  1417  1209.0   39.27
 60    966   44  0.04555  28969  1320  1197.0   41.32
 61    871   35  0.04019  27086  1089  1178.8   43.52
 62    786   35  0.04453  25164  1121  1154.2   45.87
 63    701   44  0.06277  23230  1458  1124.6   48.41
 64    603   36  0.05970  21307  1272  1090.1   51.16
 65    518   22  0.04247  19419   825  1050.7   54.11
 66    453   24  0.05298  17585   932  1006.3   57.22
 67    392   19  0.04847  15822   767   960.1   60.68
 68    340   16  0.04706  14146   666   909.6   64.30
 69    291   15  0.05155  12566   648   858.4   68.31
 70    244   25  0.10246  11092  1136   804.2   72.50
 71    193   17  0.08808   9728   857   750.9   77.19
 72    158   13  0.08228   8478   698   695.7   82.06
 73    132    9  0.06818   7341   501   642.4   87.51
 74    109    7  0.06422   6316   406   589.1   93.27
 75     91    8  0.08791   5399   475   537.7   99.59
 76     74   10  0.13514   4586   620   486.8  106.15
 77     58    8  0.13793   3871   534   440.3  113.74
 78     45    4  0.08889   3246   289   393.8  121.32
 79     37    2  0.05405   2705   146   351.9  130.09
 80     31    5  0.16129   2239   361   311.8  139.26
 81     24    6  0.25000   1842   461   274.5  149.02
 82     18    2  0.11112   1506   168   241.6  160.42
 83     15    4  0.26667   1223   326   209.5  171.30
 84      9    3  0.33334    987   329   181.5  183.89
 85      6    2  0.33334    792   264   155.9  196.84
 86      3    -  0.00000    631     0   133.4  211.41
 87      2    1  0.50000    499   250   113.4  227.26
 88      2    1  0.50000    393   197    95.5  243.00
 89    0.5    -  0.50000    307   154    79.2  257.98

Note: The observations above age 87 are not reliable.

TABLE OF CONTENTS

CHAPTER I. Introduction to the Theory of Frequency Curves.
1. Introduction
2. Frequency Distributions
3. Property of Parameters
4. Parameters as Symmetric Functions
5. Thiele's Semi-Invariants
6. Fourier's Integrals
7. Solution by Integral Equations
8. First Approximation
9. Hermite's Polynomials
10. Gram's Series
11. Co-efficients and Semi-Invariants
12. Linear Transformation
13. Charlier's Scheme of Computing
14. Observed and Theoretical Values
15. Principle of Least Squares
16. Gauss' Normal Equations
17. Application of Methods
18. Transformation of Variate
19. General Theory of Transformation
20. Logarithmic Transformation
21. The Mathematical Zero
22. Logarithmically Transformed Frequency Curves
23. Parameters Determined by Least Squares
24. Application to Graduation of a Mortality Table
25. Biological Interpretation
26. Poisson's Probability Function
27. Poisson-Charlier Curves
28. Numerical Examples
29. Transformation of Variate

CHAPTER II. The Human Death Curve.
1. Introductory Remarks
2. Empirical and Inductive Methods
3. General Properties of Death Curves
4. Relation of Frequency Curves
5. Death Curves as Compound Curves
6. Mathematical Properties
7. Observation Equations
8. Classification of Causes of Death
9. Outline of Computing Scheme
10. Goodness of Fit
11. Massachusetts Life Table
12. American Locomotive Engineers
12a. Additional Mortality Tables
13. Criticism and Summary
14. Additional Remarks
15. Another Application of Method
16. Graduation of dx Column
17. Comparisons of Methods

ADDENDA I. Mortality Tables for:
    Japanese Assured Males
    Metropolitan White Males
    American Coal Miners
    American Locomotive Engineers
    Massachusetts Males 1914-1916
    Michigan Males
    Massachusetts Males (Series B)

ADDENDA II. Mortality Experiences of Rejected Risks of Scandinavian Life Companies

INDEX

Archimedes, 4.
Biological Interpretation of mortality, 90-91.
Bi-orthogonal functions, 28, 30, 32.
Broggi, U., 53.
Bruhns, 72.
Brunt, 131.
Cauchy, Theorem of, 17.
Causality, Law of, 110, 117.
Charlier, 1, 2, 17, 19, 48, 51, 98, 122.
Charlier's A type series, 83.
    B type series, 96.
    Scheme of Computation, 47.
Charlier-Gram series, 60, 64, 90, 93, 95.
Charlier-Laplace series, 53, 70, 116, 122, 123, 206.
Charlier-Poisson series, 93, 96, 99, 122, 123, 206.
Coal Miners, American, 172, 174.
Component frequency curves, Mathematical properties of, 124.
Comte, August, 105.
Crum, F. S., 107.
Davenport, 102.
Davis, M., 60, 61.
da Vinci, Leonardo, 5.
Death curve, general properties of, 111-115.
Death curve as a compound curve, 121.
de Vries, 60, 63, 100.
Dispersion, or Standard Deviation, 40, 45.
Eccentricity, 98.
Edgeworth, 122.
Empirical and Inductive Method, 109, 110.
Error Laws of precision measurements, 53.
Euclid, 190.
Eulerean relation for complex quantities, 22.
Exposed to risk, 105, 106.
Fechner, 72.
Fourier, function, conjugated, 19, 21.
Fourier, integrals, 16.
Fourier, integral theorem, 17.
Fourier series, 32.
Fredholm, 4, 69.
Fredholm determinants, 4, 5.
Fredholm integral equations, 32.
Frequency Distribution, definition of, 6.
Gamma function, 103.
Gauss, 56, 147.
Gaussian algorithms of successive elimination, 57.
Gaussian curve, 119.
Gaussian solution of normal equations, 57.
Geiger, 99-100.
Glover, 187, 188.
Gram, 4, 5, 122.
Gram series, 33, 41, 53, 70, 83.
Gram-Charlier series, 60, 62, 64, 90, 93, 95.
Guldberg, 122.
Heiberg, 4.
Henderson, 121.
Hermite polynomials, 27, 33, 36, 38, 69.
Hoffman, F. L., 174.
Homogeneous Sum Products, 56.
Homograde Statistical Series, 94.
Horner, 189.
Integral Calculus, Foundation of, 4.
Integral equations, 4, 5.
    Frequency curve as solution of, 19.
    Fourier's, 17.
    Fredholmian, 32.
Japanese Assured Males, 176, 182.
Jevons, 93, 110, 139.
Jorgensen, 27, 51, 63, 70, 72, 102, 103, 122, 123.
King, 199.
Laplace, 1, 69.
Laplacean probability function, 24, 73, 77, 95, 96.
Laplacean Normal frequency curve, 24, 26, 71, 81, 90.
Laplacean-Charlier series, 53, 70, 116, 122, 123, 206.
Least squares, principles of the methods of, 53, 57.
Least squares, Parameters determined by, 82, 85.
Lexis, 119, 120.
Little, J. P., 182.
Locomotive Engineers, American, 168, 171.
Lowell Institute, 91.
MacLaurin Series, 2.
Massachusetts Males, 128, 159-167, 195-209.
Mathematical Zero, 75-76, 87.
Metropolitan Life Insurance Co., 174-176.
Michigan Males, 131-158.
Modulus, 98.
Moir, 170-171.
Moments, of a frequency function, 38, 47, 73.
    Sheppard corrections for adjusted moments, 80.
Mortality, biological interpretation of, 90-93.
Mortality Tables:
    American-Canadian, 84, 86.
    American Coal Miners, 172-174.
    American Locomotive Engineers, 168-171.
    Massachusetts Males, 159-167, 195-209.
    Metropolitan Life Ins. Co., 174-176.
    Michigan Males, 131-158.
    Japanese Assured Males, 176-182.
Myller-Lebedeff, Vera, 32.
Newton, 189.
"Normalalter", 119.
Normal equations, 56-59, 67.
    Gauss solution of, 57.
Novalis, 199.
Nucleus of an equation, 19.
Observation equations, 54, 127-131.
Orthogonal functions, 5, 36.
Orthogonal substitution, 69.
Parameters, determined by least squares, 82, 83.
Parameters, properties of, 8-10.
Parameters viewed as symmetric functions, 11.
Pearl, Raymond, 91.
Pearson, Karl, 1, 2, 25, 38, 60, 90, 121, 192.
Percentage frequency distribution, 125-126.
Poincaré, 111.
Poisson, Exponential Binomial Limit, 95-97.
Poisson, Probability function, 94, 98, 101, 103.
Poisson-Charlier series, 93, 96, 99, 122, 123, 206.
Power Sums, 11, 47, 73.
Probability function, definition of, 6.
Reduction equations, 67.
Relative frequency function, 6.
Rutherford, 99, 100.
Semi-invariants, definition of, 12.
    Computation of, 46-48.
    General properties of, 16.
Sheppard corrections for adjusted moments, 80.
Standard deviation, 40.
Statistical series, homograde, 94.
Sum products, homogeneous, 56.
Symmetric functions, 38.
    Parameters viewed as, 11.
Taylor series, 2, 3.
Thiele, 1, 12, 19, 38, 61, 69, 72, 122.
Thompson, John S., 182.
Transformation, General theory of, 70.
    Linear, 45, 62, 101.
    Logarithmic, 72, 74, 77, 82, 87.
    Of variates, 101-104.
Westergaard, 119.
Wicksell, 72.
Yano, T., 180.