BIOMETRIKA
A JOURNAL FOR THE STATISTICAL STUDY OF BIOLOGICAL PROBLEMS
FOUNDED BY W. F. R. WELDON, FRANCIS GALTON AND KARL PEARSON
EDITED BY KARL PEARSON

VOLUME XII
NOVEMBER 1918 TO DECEMBER 1919

CAMBRIDGE
AT THE UNIVERSITY PRESS
LONDON: FETTER LANE, E.C. 4 (C. F. CLAY, MANAGER)
AND H. K. LEWIS AND CO., LTD., 136, GOWER STREET, W.C. 1
WILLIAM WESLEY AND SON, 28, ESSEX STREET, STRAND, W.C. 2
CHICAGO: UNIVERSITY OF CHICAGO PRESS
BOMBAY, CALCUTTA AND MADRAS: MACMILLAN AND CO., LIMITED
TORONTO: J. M. DENT AND SONS, LIMITED
TOKYO: THE MARUZEN-KABUSHIKI-KAISHA

[All rights reserved]

CONTENTS OF VOLUME XII

Memoirs.

I. On the Standard Deviations of adjusted and interpolated Values of an observed Polynomial Function and its Constants and the Guidance they give towards a proper Choice of the Distribution of Observations. By KIRSTINE SMITH
II. On the Product-Moments of Various Orders of the Normal Correlation Surface of two Variates. By KARL PEARSON and A. W. YOUNG
III. The Correlation Coefficient of a Polychoric Table. By A. RITCHIE-SCOTT
IV. On a Formula for the Product-Moment Coefficient of any Order of a Normal Frequency Distribution in any number of Variables. By L. ISSERLIS
V. On the Mathematical Expectation of the Moments of Frequency Distributions. Part I. By AL. A. TCHOUPROFF
VI. On the Mathematical Expectation of the Moments of Frequency Distributions. Part II. By AL. A. TCHOUPROFF
VII. An Explanation of Deviations from Poisson's Law in Practice. By STUDENT
VIII. The Criterion of Goodness of Fit of Psychophysical Curves. By GODFREY H. THOMSON
IX. On Corrections for the Moment-Coefficients of Limited Range Frequency Distributions when there are finite or infinite Ordinates and any Slopes at the Terminals of the Range. By ELEANOR PAIRMAN and KARL PEARSON
X. Peccavimus! Editorial
XI. Quadrature Coefficients for Sheppard's Formula (c). Calculated by P. F. EVERITT
XII. On generalised Tchebycheff Theorems in the Mathematical Theory of Statistics. By KARL PEARSON
XIII. CHARLES B. GORING, 1870–1919. Obituary Notice and Appreciations
XIV. On the Nest and Eggs of the Common Tern (S. fluviatilis). A Second Cooperative Study.
XV. On the Degree of Perfection of Hierarchical Order among Correlation Coefficients. By GODFREY H. THOMSON

Miscellanea.

(i) Preliminary Note on the Association of Steadiness and Rapidity of Hand with Artistic Capacity. By M. L. TILDESLEY
(ii) Sur les Moments de la Fonction de Corrélation normale de n Variables [On the Moments of the Normal Correlation Function of n Variables]. By SVERKER BERGSTROM
(iii) Formulae for determining the Mean Values of Products of Deviations of mixed Moment Coefficients in two to eight Variables in Samples taken from a limited Population. By L. ISSERLIS
(iv) Inheritance of Psychical Characters. By KARL PEARSON
(v) Variation and Distribution of Leaves in Sassafras. By N. M. GRIER
(vi) Life-History Albums. By ETHEL M. ELDERTON
(vii) The Check to the Fall in the Phthisis Death-rate since the Discovery of the Tubercle Bacillus and the Adoption of Modern Treatment. By KARL PEARSON

Plates.

Plate I. CHARLES BUCKMAN GORING, M.D. From a sketch by R. Arar
Plate II. Nests of the Common Tern, indicating the range of Material used and the character of the Environment
Plate III. Eggs of the Common Tern painted by WILLIAM ROWAN
Plate IV. Common Tern alighting and the Four-Egg Clutch
Plate V. Common Tern arranging her eggs and settling down to sit
Plate VI. Common Tern sitting. The Bird angry
Plate VII.
Comparison of Association in Intelligence of Pairs of Siblings as determined by Broad Categories and Binet-Simon Tests. Folding Diagram
Plate VIII. The Check to the Fall in the Phthisis Death-rate. Folding Diagram

Vol. XII. Parts I and II. November, 1918. Price Twenty Shillings. [Issued November 28, 1918]
VOLUME XII NOVEMBER, 1918 Nos.
1 & 2

BIOMETRIKA

ON THE STANDARD DEVIATIONS OF ADJUSTED AND INTERPOLATED VALUES OF AN OBSERVED POLYNOMIAL FUNCTION AND ITS CONSTANTS AND THE GUIDANCE THEY GIVE TOWARDS A PROPER CHOICE OF THE DISTRIBUTION OF OBSERVATIONS.

By KIRSTINE SMITH, Copenhagen.

CONTENTS

Introduction
I. Adjustment of a polynomial function of one variable: general distribution of observations
II. The "best" grouping of observations with constant standard deviation
III. Uniform continuous distribution of observations with constant standard deviation. General formulae
IV. Uniform continuous distribution of observations with constant standard deviation. Special formulae
V. Uniform continuous distribution of observations with additional observations clustered at the ends of the range; constant standard deviation of observations. General formulae
VI. Uniform continuous distribution of observations with additional clusters at the ends of the range; constant standard deviation of observations. Special formulae
VII. Observations with varying standard deviation
VIII. Best distribution of observations for determining a certain constant of the function
IX. Adjustment with regard to both of two variates related by a linear relation

INTRODUCTION

In all sorts of experiments which are not simple repetitions, but have at least one varying essential circumstance or independent variate, the experimentalist is confronted with a choice in regard to the values of that variate. If the experiments be quite simple, the question may be without great importance. When their requirements as to time or expenditure come into account, however, the problem arises of how the observations should be chosen so that a limited number of them may give the maximum amount of knowledge.
It clearly depends upon the relationship between the observed quantity, which we shall name the primary variate, and its essential circumstances, the secondary variates, and upon the variation of the errors of the observations.

[…]

$$\cdots + a_1\,[a_0m_1 + a_1m_2 + \cdots + a_nm_{n+1}] + \cdots + a_n\,[a_0m_n + a_1m_{n+1} + \cdots + a_nm_{2n}]\Big\},$$

or, applying (4),

$$\sigma_{y_r}^2 = \frac{\sigma^2}{N}\,(a_0 + a_1x_r + a_2x_r^2 + \cdots + a_nx_r^n). \tag{5}$$

Hence $\sigma_{y_r}$ is found by eliminating the $a$'s between (4) and (5), which results in

$$\begin{vmatrix} \dfrac{N}{\sigma^2}\sigma_{y_r}^2 & 1 & x_r & x_r^2 & \cdots & x_r^n\\ 1 & m_0 & m_1 & m_2 & \cdots & m_n\\ x_r & m_1 & m_2 & m_3 & \cdots & m_{n+1}\\ \vdots & \vdots & \vdots & \vdots & & \vdots\\ x_r^n & m_n & m_{n+1} & m_{n+2} & \cdots & m_{2n}\end{vmatrix} = 0. \tag{6}$$

This determinant is of fundamental importance for all the following work, and it will be useful to examine it more closely at once.

(3) First, however, it may be pointed out that the standard deviation of any other linear function $b = b_0A_0 + b_1A_1 + b_2A_2 + \cdots + b_nA_n$ of the constants of the function $y$ may be determined in the same way, by

$$\begin{vmatrix} \dfrac{N}{\sigma^2}\sigma_b^2 & b_0 & b_1 & \cdots & b_n\\ b_0 & m_0 & m_1 & \cdots & m_n\\ b_1 & m_1 & m_2 & \cdots & m_{n+1}\\ \vdots & \vdots & \vdots & & \vdots\\ b_n & m_n & m_{n+1} & \cdots & m_{2n}\end{vmatrix} = 0. \tag{7}$$

In particular $\sigma_{A_s}$ is found from

$$\begin{vmatrix} \dfrac{N}{\sigma^2}\sigma_{A_s}^2 & 0 & \cdots & 1 & \cdots & 0\\ 0 & m_0 & \cdots & m_s & \cdots & m_n\\ \vdots & \vdots & & \vdots & & \vdots\\ 1 & m_s & \cdots & m_{2s} & \cdots & m_{n+s}\\ \vdots & \vdots & & \vdots & & \vdots\\ 0 & m_n & \cdots & m_{n+s} & \cdots & m_{2n}\end{vmatrix} = 0, \tag{8}$$

where the two 1's stand in the row and the column that contain $m_{2s}$.

(4) Let $\Delta$ denote a determinant identical with that of (6), except that it has 0 in place of the element $\frac{N}{\sigma^2}\sigma_{y_r}^2$. Let $\Delta_{r,s}$ be its minor not containing the $r$th row and $s$th column, and let $\Delta_{r,s,p,q}$ be the minor of this not containing the $p$th row and $q$th column of $\Delta$. We then find from (8)

$$\sigma_{A_s}^2 = \frac{\sigma^2}{N}\,\frac{\Delta_{s+2,s+2,1,1}}{\Delta_{1,1}}. \tag{9}$$

With this notation we obtain from (6)

$$\sigma_{y_r}^2 = -\frac{\sigma^2}{N}\,\frac{\Delta}{\Delta_{1,1}}. \tag{10}$$

In the following we shall drop the index $r$ and write ${}_n\sigma_y$ for the standard deviation of a $y$ adjusted by means of a function of the $n$th degree.
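As a numerical check on (6) and (10), the sketch below computes the squared standard deviation of an adjusted $y$ from the bordered determinant and compares it with ordinary weighted least squares. It assumes the observed $y$ at $x_p$ has variance $\sigma^2 f(x_p)$, with the places given as `xs` and the factors $f(x_p)$ as `f`. The function names are our own and are not the author's.

```python
import numpy as np

def moment_coefficients(xs, f, n):
    """m_k = (1/N) * sum_p x_p**k / f(x_p), for k = 0, ..., 2n."""
    xs = np.asarray(xs, dtype=float)
    w = 1.0 / np.asarray(f, dtype=float)
    return np.array([np.sum(w * xs**k) for k in range(2 * n + 1)]) / len(xs)

def var_bordered(x, xs, f, n, sigma2=1.0):
    """Squared s.d. of the adjusted y at x from (10): -(sigma^2/N) * Delta / Delta_{1,1}."""
    m = moment_coefficients(xs, f, n)
    M = np.array([[m[i + j] for j in range(n + 1)] for i in range(n + 1)])  # Delta_{1,1}
    v = float(x) ** np.arange(n + 1)          # 1, x, ..., x^n
    delta = np.zeros((n + 2, n + 2))          # determinant of (6) with 0 in the corner
    delta[0, 1:] = v
    delta[1:, 0] = v
    delta[1:, 1:] = M
    return -sigma2 / len(xs) * np.linalg.det(delta) / np.linalg.det(M)

def var_least_squares(x, xs, f, n, sigma2=1.0):
    """Same quantity from weighted least squares: sigma^2 * v' (X' W X)^{-1} v."""
    X = np.vander(np.asarray(xs, dtype=float), n + 1, increasing=True)
    W = np.diag(1.0 / np.asarray(f, dtype=float))
    v = float(x) ** np.arange(n + 1)
    return sigma2 * v @ np.linalg.solve(X.T @ W @ X, v)

# Two equal groups at -1 and +1 with constant s.d.: (sigma^2/N)(1 + x^2).
xs = [-1.0] * 5 + [1.0] * 5
f = [1.0] * 10
print(var_bordered(0.3, xs, f, 1), var_least_squares(0.3, xs, f, 1))  # both about 0.109
```

The two routes agree because eliminating the $a$'s in (6) is the same as evaluating $v^{\mathsf T}M^{-1}v$ for the moment matrix $M$.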
If we were dealing with a function of the $(n-1)$st degree and kept the observations distributed as before, we should find

$${}_{n-1}\sigma_y^2 = -\frac{\sigma^2}{N}\,\frac{\Delta_{n+2,n+2}}{\Delta_{n+2,n+2,1,1}},$$

and therefore

$${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2 = \frac{\sigma^2}{N}\,\frac{\Delta_{n+2,n+2}\,\Delta_{1,1} - \Delta\,\Delta_{n+2,n+2,1,1}}{\Delta_{1,1}\,\Delta_{n+2,n+2,1,1}}.$$

But $\Delta$ is orthosymmetrical, so the numerator of this fraction equals $\Delta_{n+2,1}^2$. By (9),

$$\sigma_{A_n}^2 = \frac{\sigma^2}{N}\,\frac{\Delta_{n+2,n+2,1,1}}{\Delta_{1,1}},$$

so $\Delta_{1,1}$ and $\Delta_{n+2,n+2,1,1}$ have the same sign, and ${}_n\sigma_y^2 - {}_{n-1}\sigma_y^2$ is therefore a positive multiple of the square of a function of $x$. In the same way we can express ${}_{n-1}\sigma_y^2 - {}_{n-2}\sigma_y^2$, and so on down through all the differences until ${}_0\sigma_y^2 = \dfrac{\sigma^2}{N m_0}$. In this way ${}_n\sigma_y^2$ is developed as a sum of squares and takes the form

$${}_n\sigma_y^2 = \frac{\sigma^2}{N}\left\{\frac{1}{m_0} + \frac{\begin{vmatrix}1 & x\\ m_0 & m_1\end{vmatrix}^2}{m_0\begin{vmatrix}m_0 & m_1\\ m_1 & m_2\end{vmatrix}} + \frac{\begin{vmatrix}1 & x & x^2\\ m_0 & m_1 & m_2\\ m_1 & m_2 & m_3\end{vmatrix}^2}{\begin{vmatrix}m_0 & m_1\\ m_1 & m_2\end{vmatrix}\begin{vmatrix}m_0 & m_1 & m_2\\ m_1 & m_2 & m_3\\ m_2 & m_3 & m_4\end{vmatrix}} + \cdots + \frac{\begin{vmatrix}1 & x & \cdots & x^n\\ m_0 & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ m_{n-1} & m_n & \cdots & m_{2n-1}\end{vmatrix}^2}{\begin{vmatrix}m_0 & \cdots & m_{n-1}\\ \vdots & & \vdots\\ m_{n-1} & \cdots & m_{2n-2}\end{vmatrix}\begin{vmatrix}m_0 & \cdots & m_n\\ \vdots & & \vdots\\ m_n & \cdots & m_{2n}\end{vmatrix}}\right\}. \tag{11}$$

The squared standard deviation of an adjusted $y$ is therefore a function of $x$ of degree $2n$. Within the braces, the coefficient of $x^{2n}$ is $\Delta_{n+2,n+2,1,1}/\Delta_{1,1}$. As just shown, this is the factor by which $\sigma^2/N$ must be multiplied to give $\sigma_{A_n}^2$, so it is positive and can never vanish.

(5) If all the $m$'s with odd indices are zero, (6) shows that $\sigma_y^2$ is a function of $x^2$. At least in theory this is a natural thing to aim at, since our general purpose is to find a curve for $\sigma_y$ that keeps $\sigma_y$ as nearly constant as possible throughout the range. Rearranging the order of rows and columns in (6), we get, when all $m_{2s+1} = 0$ and $n = 2p$,

$$\begin{vmatrix}
\dfrac{N}{\sigma^2}\sigma_y^2 & 1 & x^2 & \cdots & x^{2p} & x & x^3 & \cdots & x^{2p-1}\\
1 & m_0 & m_2 & \cdots & m_{2p} & 0 & 0 & \cdots & 0\\
x^2 & m_2 & m_4 & \cdots & m_{2p+2} & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
x^{2p} & m_{2p} & m_{2p+2} & \cdots & m_{4p} & 0 & 0 & \cdots & 0\\
x & 0 & 0 & \cdots & 0 & m_2 & m_4 & \cdots & m_{2p}\\
x^3 & 0 & 0 & \cdots & 0 & m_4 & m_6 & \cdots & m_{2p+2}\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
x^{2p-1} & 0 & 0 & \cdots & 0 & m_{2p} & m_{2p+2} & \cdots & m_{4p-2}
\end{vmatrix} = 0, \tag{12}$$

from which we find

$${}_{2p}\sigma_y^2 = -\frac{\sigma^2}{N}\left\{\frac{\begin{vmatrix}0 & 1 & x^2 & \cdots & x^{2p}\\ 1 & m_0 & m_2 & \cdots & m_{2p}\\ x^2 & m_2 & m_4 & \cdots & m_{2p+2}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p} & m_{2p} & m_{2p+2} & \cdots & m_{4p}\end{vmatrix}}{\begin{vmatrix}m_0 & m_2 & \cdots & m_{2p}\\ m_2 & m_4 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p}\end{vmatrix}} + \frac{\begin{vmatrix}0 & x & x^3 & \cdots & x^{2p-1}\\ x & m_2 & m_4 & \cdots & m_{2p}\\ x^3 & m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p-1} & m_{2p} & m_{2p+2} & \cdots & m_{4p-2}\end{vmatrix}}{\begin{vmatrix}m_2 & m_4 & \cdots & m_{2p}\\ m_4 & m_6 & \cdots & m_{2p+2}\\ \vdots & \vdots & & \vdots\\ m_{2p} & m_{2p+2} & \cdots & m_{4p-2}\end{vmatrix}}\right\}. \tag{13}$$

For a function of degree $2p-1$ we get the same determinant as in (12), except that it does not contain the row and column in which $x^{2p}$ is found. Hence we find

$${}_{2p-1}\sigma_y^2 = -\frac{\sigma^2}{N}\left\{\frac{\begin{vmatrix}0 & 1 & x^2 & \cdots & x^{2p-2}\\ 1 & m_0 & m_2 & \cdots & m_{2p-2}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p-2} & m_{2p-2} & m_{2p} & \cdots & m_{4p-4}\end{vmatrix}}{\begin{vmatrix}m_0 & \cdots & m_{2p-2}\\ \vdots & & \vdots\\ m_{2p-2} & \cdots & m_{4p-4}\end{vmatrix}} + \frac{\begin{vmatrix}0 & x & x^3 & \cdots & x^{2p-1}\\ x & m_2 & m_4 & \cdots & m_{2p}\\ \vdots & \vdots & \vdots & & \vdots\\ x^{2p-1} & m_{2p} & m_{2p+2} & \cdots & m_{4p-2}\end{vmatrix}}{\begin{vmatrix}m_2 & \cdots & m_{2p}\\ \vdots & & \vdots\\ m_{2p} & \cdots & m_{4p-2}\end{vmatrix}}\right\}. \tag{14}$$

(6) The last determinant ratios of (13) and (14) are identical. Writing $\delta$ for the numerator of the first fraction of (13), we therefore find

$${}_{2p}\sigma_y^2 - {}_{2p-1}\sigma_y^2 = -\frac{\sigma^2}{N}\left(\frac{\delta}{\delta_{1,1}} - \frac{\delta_{p+2,p+2}}{\delta_{1,1,p+2,p+2}}\right),$$

or, since $\delta$ is orthosymmetrical and therefore $\delta_{p+2,p+2}\,\delta_{1,1} - \delta\cdot\delta_{1,1,p+2,p+2} = \delta_{p+2,1}^2$,
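$${}_{2p}\sigma_y^2 - {}_{2p-1}\sigma_y^2 = \frac{\sigma^2}{N}\,\frac{\delta_{p+2,1}^2}{\delta_{1,1}\,\delta_{p+2,p+2,1,1}}.$$

Comparing ${}_{2p-1}\sigma_y^2$ and ${}_{2p-2}\sigma_y^2$, we see that they have the first determinant ratio in common. Writing $\gamma$ for the numerator of the other fraction of ${}_{2p-1}\sigma_y^2$, we have

$${}_{2p-1}\sigma_y^2 - {}_{2p-2}\sigma_y^2 = -\frac{\sigma^2}{N}\left(\frac{\gamma}{\gamma_{1,1}} - \frac{\gamma_{p+1,p+1}}{\gamma_{p+1,p+1,1,1}}\right),$$

or again, since $\gamma$ is orthosymmetrical,

$${}_{2p-1}\sigma_y^2 - {}_{2p-2}\sigma_y^2 = \frac{\sigma^2}{N}\,\frac{\gamma_{p+1,1}^2}{\gamma_{1,1}\,\gamma_{p+1,p+1,1,1}}.$$

When all $m_{2s+1} = 0$, the general formula (11) therefore takes the form

$${}_{2p}\sigma_y^2 = \frac{\sigma^2}{N}\left\{\frac{1}{m_0} + \frac{x^2}{m_2} + \frac{\begin{vmatrix}1 & x^2\\ m_0 & m_2\end{vmatrix}^2}{m_0\begin{vmatrix}m_0 & m_2\\ m_2 & m_4\end{vmatrix}} + \frac{\begin{vmatrix}x & x^3\\ m_2 & m_4\end{vmatrix}^2}{m_2\begin{vmatrix}m_2 & m_4\\ m_4 & m_6\end{vmatrix}} + \cdots + \frac{\begin{vmatrix}x & x^3 & \cdots & x^{2p-1}\\ m_2 & m_4 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ m_{2p-2} & m_{2p} & \cdots & m_{4p-4}\end{vmatrix}^2}{\begin{vmatrix}m_2 & \cdots & m_{2p-2}\\ \vdots & & \vdots\\ m_{2p-2} & \cdots & m_{4p-6}\end{vmatrix}\begin{vmatrix}m_2 & \cdots & m_{2p}\\ \vdots & & \vdots\\ m_{2p} & \cdots & m_{4p-2}\end{vmatrix}} + \frac{\begin{vmatrix}1 & x^2 & \cdots & x^{2p}\\ m_0 & m_2 & \cdots & m_{2p}\\ \vdots & \vdots & & \vdots\\ m_{2p-2} & m_{2p} & \cdots & m_{4p-2}\end{vmatrix}^2}{\begin{vmatrix}m_0 & \cdots & m_{2p-2}\\ \vdots & & \vdots\\ m_{2p-2} & \cdots & m_{4p-4}\end{vmatrix}\begin{vmatrix}m_0 & \cdots & m_{2p}\\ \vdots & & \vdots\\ m_{2p} & \cdots & m_{4p}\end{vmatrix}}\right\}. \tag{15}$$

(7) Before leaving the general case and treating special distributions of observations, we shall prove three auxiliary propositions. First we shall prove that the curve of ${}_n\sigma_y^2$ can never lie entirely below $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$. To that end, ${}_n\sigma_y^2$ will be summed over all the places of observation with the weight $1/f(x)$. For a continuous distribution of observations, this means integrating the expression

$$\int \frac{\varphi(x)}{f(x)}\,{}_n\sigma_y^2\,dx$$

over the range of observations, where $\varphi(x)\,dx$ is the number of observations between $x$ and $x + dx$.

Looking first at the numerator of the last term of (11), we find that it can be expanded into

$$\begin{vmatrix}1 & x & \cdots & x^n\\ m_0 & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ m_{n-1} & m_n & \cdots & m_{2n-1}\end{vmatrix}^2 = \sum_{j=0}^{n} c_j \begin{vmatrix}x^j & x^{j+1} & \cdots & x^{j+n}\\ m_0 & m_1 & \cdots & m_n\\ \vdots & \vdots & & \vdots\\ m_{n-1} & m_n & \cdots & m_{2n-1}\end{vmatrix},$$

where $c_j$ is the cofactor of $x^j$ in the first row of the determinant on the left. Now $\int \frac{\varphi(x)}{f(x)}\,x^k\,dx$, taken over all the observations, is what we have called $N m_k$.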
When the determinants are integrated, the first $n$ of them therefore vanish, because the integrated first row becomes $N$ times one of the other rows. The integral of the last determinant is

$$\int \frac{\varphi(x)}{f(x)}\begin{vmatrix}x^n & \cdots & x^{2n}\\ m_0 & \cdots & m_n\\ \vdots & & \vdots\\ m_{n-1} & \cdots & m_{2n-1}\end{vmatrix}dx = N\begin{vmatrix}m_n & \cdots & m_{2n}\\ m_0 & \cdots & m_n\\ \vdots & & \vdots\\ m_{n-1} & \cdots & m_{2n-1}\end{vmatrix} = (-1)^n N\,\Delta_{1,1}.$$

Since $c_n = (-1)^n \Delta_{n+2,n+2,1,1}$, the integral of the last term of (11) equals $N$. Integrating each of the other terms, including the first, gives the same result, so that

$$\int \frac{\varphi(x)}{f(x)}\,{}_n\sigma_y^2\,dx = \sigma^2(n+1).$$

Since $\int \frac{\varphi(x)}{f(x)}\,dx = N m_0$, the mean value of ${}_n\sigma_y^2$ calculated in this special way is $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$. It follows that either ${}_n\sigma_y^2$ equals $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$ at every place of observation, or it is greater at some of these places. The first case cannot be realised by a distribution of which any part is continuous, since ${}_n\sigma_y^2$ has been shown to be of degree $2n$ in $x$.

Suppose, then, that we could find a distribution consisting of groups of observations for which ${}_n\sigma_y^2$ equals $\dfrac{\sigma^2}{N}\cdot\dfrac{n+1}{m_0}$ at every place of observation. Suppose further that we could choose the places of observation so that ${}_n\sigma_y^2$ is smaller than that value everywhere else within the range of observations. We should then know that no other distribution of observations with that value of $m_0$ could give a curve of standard deviation with a lower maximum. If the standard deviation of the observations is constant and equal to $\sigma$, then $f(x) = 1$ and also $m_0 = 1$. By what we have just proved, the maximum of the ${}_n\sigma_y^2$ curve cannot then be lower than $\dfrac{\sigma^2}{N}(n+1)$. Now suppose we distribute our $N$ observations in $(n+1)$ equally big groups. At each of these $(n+1)$ places the adjusted $y$ is the mean of the observations, and its squared standard deviation is $\dfrac{\sigma^2}{N}(n+1)$.
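The sum of squares (11) and the mean-value result just proved can both be checked numerically. The sketch below evaluates (11) term by term and then forms the weighted sum $\sum_p {}_n\sigma_y^2(x_p)/f(x_p)$. By the argument above, that sum should equal $\sigma^2(n+1)$. The places `xs` and variance factors `f` are randomly generated and purely illustrative, and the helper names are ours.

```python
import numpy as np

def moment_coefficients(xs, f, n):
    """m_k = (1/N) * sum_p x_p**k / f(x_p), for k = 0, ..., 2n."""
    xs = np.asarray(xs, dtype=float)
    w = 1.0 / np.asarray(f, dtype=float)
    return np.array([np.sum(w * xs**k) for k in range(2 * n + 1)]) / len(xs)

def hankel_det(m, k):
    """Determinant |m_{i+j}|, i, j = 0..k (the denominators in (11))."""
    return np.linalg.det(np.array([[m[i + j] for j in range(k + 1)] for i in range(k + 1)]))

def var_sum_of_squares(x, m, n, N, sigma2=1.0):
    """_n sigma_y^2 at x from (11): 1/m_0 plus n squared-determinant terms."""
    total = 1.0 / m[0]
    for k in range(1, n + 1):
        top = float(x) ** np.arange(k + 1)               # 1, x, ..., x^k
        rows = [m[i:i + k + 1] for i in range(k)]       # m_i, ..., m_{i+k} for i = 0..k-1
        numerator = np.linalg.det(np.vstack([top] + rows)) ** 2
        total += numerator / (hankel_det(m, k - 1) * hankel_det(m, k))
    return sigma2 / N * total

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, 40)   # illustrative places of observation
f = rng.uniform(0.5, 2.0, 40)     # illustrative variance factors f(x_p)
n, N = 3, len(xs)
m = moment_coefficients(xs, f, n)
weighted_sum = sum(var_sum_of_squares(xp, m, n, N) / fp for xp, fp in zip(xs, f))
print(weighted_sum)  # should be n + 1 = 4 up to rounding (sigma^2 = 1)
```

With constant standard deviation ($f = 1$, $m_0 = 1$), the average of ${}_n\sigma_y^2$ over the observations is therefore $(n+1)\sigma^2/N$. That is why this value is the lowest possible maximum.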
Our problem is therefore reduced to finding how to arrange a table of $(n+1)$ values of a function of the $n$th degree so that the squared standard deviation of any interpolated result inside the range is smaller than the squared standard deviation of the tabulated values. It will be seen below that, up to $n = 6$ (as far as the problem has been investigated here), this can be achieved by one and only one form of grouping.

When the standard deviation of the observations varies over the range, $m_0$ varies with the distribution, and we cannot use the same method to find the best distribution. It even appears that the best distribution does not always have its maxima at the places of observation.

(8) A second problem we want to consider here is the condition for two adjusted $y$'s to be uncorrelated. At the beginning of this section it was shown that the adjusted

$$y_r = \frac{1}{N}\sum_p\left\{\frac{y_p}{f(x_p)}\,[a_0 + a_1x_p + a_2x_p^2 + \cdots + a_nx_p^n]\right\},$$

where

$$\left.\begin{aligned}
a_0m_0 + a_1m_1 + a_2m_2 + \cdots + a_nm_n &= 1\\
a_0m_1 + a_1m_2 + a_2m_3 + \cdots + a_nm_{n+1} &= x_r\\
a_0m_2 + a_1m_3 + a_2m_4 + \cdots + a_nm_{n+2} &= x_r^2\\
&\;\;\vdots\\
a_0m_n + a_1m_{n+1} + a_2m_{n+2} + \cdots + a_nm_{2n} &= x_r^n
\end{aligned}\right\} \tag{16}$$

Let $y_s$ be another adjusted value. Then

$$y_s = \frac{1}{N}\sum_p\left\{\frac{y_p}{f(x_p)}\,[\gamma_0 + \gamma_1x_p + \gamma_2x_p^2 + \cdots + \gamma_nx_p^n]\right\},$$

where

$$\left.\begin{aligned}
\gamma_0m_0 + \gamma_1m_1 + \gamma_2m_2 + \cdots + \gamma_nm_n &= 1\\
\gamma_0m_1 + \gamma_1m_2 + \gamma_2m_3 + \cdots + \gamma_nm_{n+1} &= x_s\\
\gamma_0m_2 + \gamma_1m_3 + \gamma_2m_4 + \cdots + \gamma_nm_{n+2} &= x_s^2\\
&\;\;\vdots\\
\gamma_0m_n + \gamma_1m_{n+1} + \gamma_2m_{n+2} + \cdots + \gamma_nm_{2n} &= x_s^n
\end{aligned}\right\} \tag{17}$$

The squared standard deviation of the observed $y_p$ equals $\sigma^2 f(x_p)$. Hence the condition that $y_r$ and $y_s$ are uncorrelated is

$$\frac{\sigma^2}{N^2}\sum_p \frac{1}{f(x_p)}\,[a_0 + a_1x_p + \cdots + a_nx_p^n]\,[\gamma_0 + \gamma_1x_p + \cdots + \gamma_nx_p^n] = 0,$$

or

$$\gamma_0\sum_p\frac{a_0 + a_1x_p + \cdots + a_nx_p^n}{f(x_p)} + \gamma_1\sum_p\frac{x_p\,(a_0 + a_1x_p + \cdots + a_nx_p^n)}{f(x_p)} + \cdots + \gamma_n\sum_p\frac{x_p^n\,(a_0 + a_1x_p + \cdots + a_nx_p^n)}{f(x_p)} = 0.$$

Recalling that $\sum_p x_p^k/f(x_p) = N m_k$ and applying the relations (16), this reduces to

$$\gamma_0 + \gamma_1x_r + \gamma_2x_r^2 + \cdots + \gamma_nx_r^n = 0,$$

from which the $\gamma$'s are eliminated by means of (17). Thus

$$\begin{vmatrix}0 & 1 & x_r & \cdots & x_r^n\\ 1 & m_0 & m_1 & \cdots & m_n\\ x_s & m_1 & m_2 & \cdots & m_{n+1}\\ \vdots & \vdots & \vdots & & \vdots\\ x_s^n & m_n & m_{n+1} & \cdots & m_{2n}\end{vmatrix} = 0 \tag{18}$$

is the condition that $y_r$ and $y_s$ are uncorrelated.

(9) Returning to formula (11), which writes ${}_n\sigma_y^2$ as a sum of squares, we shall now prove the following. Setting the $(p+1)$st term equal to zero determines a set of $p$ abscissae whose adjusted $y$'s are mutually uncorrelated, both for a function of the $p$th degree and for one of the $(p-1)$st degree.

For a function of the $(p-1)$st degree, the condition that $y_r$ and $y_s$, corresponding to the arguments $x_r$ and $x_s$, are uncorrelated is

$$\begin{vmatrix}0 & 1 & x_r & \cdots & x_r^{p-1}\\ 1 & m_0 & m_1 & \cdots & m_{p-1}\\ x_s & m_1 & m_2 & \cdots & m_p\\ \vdots & \vdots & \vdots & & \vdots\\ x_s^{p-1} & m_{p-1} & m_p & \cdots & m_{2p-2}\end{vmatrix} = 0.$$

For the same distribution of observations and a function of the $p$th degree, the condition is

$$\begin{vmatrix}0 & 1 & x_r & \cdots & x_r^p\\ 1 & m_0 & m_1 & \cdots & m_p\\ x_s & m_1 & m_2 & \cdots & m_{p+1}\\ \vdots & \vdots & \vdots & & \vdots\\ x_s^p & m_p & m_{p+1} & \cdots & m_{2p}\end{vmatrix} = 0.$$

Putting

$$D = \begin{vmatrix}m_0 & m_1 & \cdots & m_p\\ m_1 & m_2 & \cdots & m_{p+1}\\ \vdots & \vdots & & \vdots\\ m_p & m_{p+1} & \cdots & m_{2p}\end{vmatrix},$$

these conditions may be written
$$\sum_i\sum_j x_r^{\,i}\,x_s^{\,j}\,D_{p+1,p+1,i+1,j+1} = 0 \tag{19}$$

and

$$\sum_i\sum_j x_r^{\,i}\,x_s^{\,j}\,D_{i+1,j+1} = 0, \tag{20}$$

where the sums run over all combinations of powers with $i$ and $j$ between 0 and $p-1$ in (19), and between 0 and $p$ in (20). The minors of $D$ are here taken with their proper signs, i.e. as cofactors. For an orthosymmetrical determinant $\Delta$ we have

$$\Delta_{s,s}\,\Delta_{s',s''} - \Delta\cdot\Delta_{s,s,s',s''} = \Delta_{s',s}\,\Delta_{s,s''}.$$

Multiply (19) by $D$ and subtract it from (20) multiplied by $D_{p+1,p+1}$. As long as both $i$ and $j$ are smaller than $p$, the coefficient of $x_r^{\,i}x_s^{\,j}$ becomes

$$D_{p+1,p+1}\,D_{i+1,j+1} - D\cdot D_{p+1,p+1,i+1,j+1} = D_{i+1,p+1}\,D_{p+1,j+1}.$$

When one of them equals $p$, say $j = p$, the term is $D_{p+1,p+1}\,D_{i+1,p+1}$, which has the same form. This also holds for $i = j = p$, where the term is $D_{p+1,p+1}^2$. The total result is thus

$$\sum_i\sum_j x_r^{\,i}\,x_s^{\,j}\,D_{i+1,p+1}\,D_{p+1,j+1} = \Big(\sum_i x_r^{\,i}D_{i+1,p+1}\Big)\Big(\sum_j x_s^{\,j}D_{p+1,j+1}\Big) = 0,$$

or, in the form of determinants,

$$\begin{vmatrix}1 & x_r & \cdots & x_r^p\\ m_0 & m_1 & \cdots & m_p\\ \vdots & \vdots & & \vdots\\ m_{p-1} & m_p & \cdots & m_{2p-1}\end{vmatrix}\cdot\begin{vmatrix}1 & x_s & \cdots & x_s^p\\ m_0 & m_1 & \cdots & m_p\\ \vdots & \vdots & & \vdots\\ m_{p-1} & m_p & \cdots & m_{2p-1}\end{vmatrix} = 0.$$

Hence $x_r$ or $x_s$ must be a root of

$$\begin{vmatrix}1 & x & x^2 & \cdots & x^p\\ m_0 & m_1 & m_2 & \cdots & m_p\\ m_1 & m_2 & m_3 & \cdots & m_{p+1}\\ \vdots & \vdots & \vdots & & \vdots\\ m_{p-1} & m_p & m_{p+1} & \cdots & m_{2p-1}\end{vmatrix} = 0. \tag{21}$$

Suppose $x_r$ is found from this and substituted in (19) or (20). Since the coefficient of $x_s^{\,p}$ in (20) is then zero, we get an equation of the $(p-1)$st degree that determines $x_s$. Any pair of roots of (21) therefore determines a pair of uncorrelated $y$'s.

II. The "best" grouping of observations with constant standard deviation.

(1) It was shown under (7) in the last section that the mean of the squared standard deviations of the adjusted $y$ equals $\dfrac{\sigma^2}{N}(n+1)$, the mean being taken over the places of observation and weighted with the number of observations at each place. The curve of squared standard deviation can therefore never lie entirely below that value. Further, $(n+1)$ equally big groups of observations give this minimum squared standard deviation at the places of observation. Since ${}_n\sigma_y^2$ is of degree $2n$ in $x$, it may then be possible, by placing the groups at special positions, to make the values $\dfrac{\sigma^2}{N}(n+1)$ the maxima of the curve within the range of observations.

Let $x_1, x_2, \ldots, x_p, \ldots, x_{n+1}$ be the places of observation and $Y_p$ the mean of the observations at $x_p$. Lagrange's interpolation formula is then

$$y = \sum_p \frac{(x-x_1)\cdots(x-x_{p-1})(x-x_{p+1})\cdots(x-x_{n+1})}{(x_p-x_1)\cdots(x_p-x_{p-1})(x_p-x_{p+1})\cdots(x_p-x_{n+1})}\;Y_p,$$

the sum being taken over all the places of observation. From this we find

$${}_n\sigma_y^2 = \frac{\sigma^2}{N}(n+1)\sum_p\left\{\frac{(x-x_1)\cdots(x-x_{p-1})(x-x_{p+1})\cdots(x-x_{n+1})}{(x_p-x_1)\cdots(x_p-x_{p-1})(x_p-x_{p+1})\cdots(x_p-x_{n+1})}\right\}^2. \tag{22}$$

For $x = x_p$ this gives $\dfrac{\sigma^2}{N}(n+1)$, as it should: $n$ terms of the sum are zero and the remaining one equals 1. Let $x_1$ be the greatest of the $x$'s. For $x > x_1$ we have $(x - x_q)^2 > (x_1 - x_q)^2$ for every $q \neq 1$, so the first term of the sum exceeds 1 and ${}_n\sigma_y^2 > \dfrac{\sigma^2}{N}(n+1)$. The same applies to any $x$ smaller than the smallest place of observation. Since we want ${}_n\sigma_y^2$ to equal $\dfrac{\sigma^2}{N}(n+1)$ at the ends of the range, we have to place two of our groups of observations there. We take half the range within which observations are possible as the unit of $x$, so that the range runs from $-1$ to $1$.

(2) For a linear function there is therefore no choice: the two groups of observations must be at $-1$ and $1$. According to (22),

$${}_1\sigma_y^2 = \frac{2\sigma^2}{N}\left\{\frac{(x+1)^2}{4} + \frac{(x-1)^2}{4}\right\},$$

or

$${}_1\sigma_y^2 = \frac{\sigma^2}{N}\left\{2 - (1 - x^2)\right\}.$$

This illustrates the well-known fact that simple interpolation between two equally good values of a table gives interpolated values with a smaller probable error than those of the table.
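Formula (22) is easy to evaluate directly. The sketch below assumes $N$ observations split into equal groups at the given places (`nodes`, our name). It computes ${}_n\sigma_y^2$ from the squared Lagrange coefficients. For two groups at $-1$ and $1$ it reproduces ${}_1\sigma_y^2 = (\sigma^2/N)(1 + x^2)$.

```python
import numpy as np

def lagrange_basis(x, nodes):
    """Lagrange coefficients of Y_1, ..., Y_{n+1} in the interpolated y at x (x may be an array)."""
    nodes = np.asarray(nodes, dtype=float)
    x = np.asarray(x, dtype=float)
    L = np.ones(x.shape + nodes.shape)           # last axis indexes the groups
    for p, xp in enumerate(nodes):
        for q, xq in enumerate(nodes):
            if q != p:
                L[..., p] *= (x - xq) / (xp - xq)
    return L

def var_equal_groups(x, nodes, N, sigma2=1.0):
    """(22): each of the n+1 group means has variance sigma^2 (n+1) / N."""
    k = len(nodes)                               # k = n + 1 groups
    return sigma2 * k / N * np.sum(lagrange_basis(x, nodes) ** 2, axis=-1)

print(var_equal_groups(0.3, [-1.0, 1.0], N=10))  # about 0.109 = (1 + 0.3**2) / 10
```

The same function is used below, unchanged, for the higher-degree groupings. Only the list of places changes.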
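(3) For a function of the second degree we have a third group to place besides the two at $-1$ and $1$, if we do not assume beforehand that the distribution is symmetrical. Let the third group be at $a$. The interpolation then gives

$${}_2\sigma_y^2 = \frac{3\sigma^2}{N}\left\{\left[\frac{(x-1)(x-a)}{2(1+a)}\right]^2 + \left[\frac{(x+1)(x-a)}{2(1-a)}\right]^2 + \left[\frac{x^2-1}{a^2-1}\right]^2\right\},$$

from which

$$\left(\frac{\partial\,{}_2\sigma_y^2}{\partial x}\right)_{x=a} = \frac{3\sigma^2}{N}\cdot\frac{4a}{a^2-1}.$$

We want ${}_2\sigma_y^2$ to have a maximum at $x = a$, but this derivative can vanish only for $a = 0$. In that case ${}_2\sigma_y^2$ reduces to

$${}_2\sigma_y^2 = \frac{3\sigma^2}{N}\left\{1 - \frac{3}{2}\,x^2(1 - x^2)\right\}.$$

This shows that we have succeeded in making $\sigma_y$ a maximum at $x = 0$, with the desired maximum value $3\sigma^2/N$ for the squared standard deviation.

(4) For a function of the third degree, four groups of observations at $-1$, $1$, $a$ and $\gamma$ give

$${}_3\sigma_y^2 = \frac{4\sigma^2}{N}\left\{\left[\frac{(x-1)(x-a)(x-\gamma)}{2(1+a)(1+\gamma)}\right]^2 + \left[\frac{(x+1)(x-a)(x-\gamma)}{2(1-a)(1-\gamma)}\right]^2 + \left[\frac{(x^2-1)(x-\gamma)}{(a^2-1)(a-\gamma)}\right]^2 + \left[\frac{(x^2-1)(x-a)}{(\gamma^2-1)(\gamma-a)}\right]^2\right\}.$$

The condition $\left(\partial\,{}_3\sigma_y^2/\partial x\right)_{x=a} = 0$ requires

$$3a^2 - 2a\gamma - 1 = 0,$$

and $\left(\partial\,{}_3\sigma_y^2/\partial x\right)_{x=\gamma} = 0$ requires

$$3\gamma^2 - 2a\gamma - 1 = 0.$$

From these we get $a^2 = \gamma^2$, and since $a \neq \gamma$, $a = -\gamma = \sqrt{1/5}$. Introducing this value of $a^2$ and $\gamma^2$ into ${}_3\sigma_y^2$, we find

$${}_3\sigma_y^2 = \frac{4\sigma^2}{N}\left\{1 - \frac{3\cdot5^2}{2^4}\,(a^2 - x^2)^2\,(1 - x^2)\right\}, \qquad a^2 = \tfrac{1}{5},$$

which has the required maxima at $\pm\sqrt{1/5}$.

(5) For functions of higher degree we shall assume from the outset that the distributions sought are symmetrical. The symmetry of $y$ and $\sigma_y$ with respect to the sought positions makes it fairly clear that this must be so. To determine a function of the fourth degree, we put groups of observations at $\pm1$, $\pm a$ and $0$. The expression for ${}_4\sigma_y^2$ can be written down at once. In it, the terms from the groups at $+1$ and $-1$ can be combined, and so can the terms from $+a$ and $-a$:

$${}_4\sigma_y^2 = \frac{5\sigma^2}{N}\left\{\left[\frac{(x^2-1)(x^2-a^2)}{a^2}\right]^2 + \frac{1+x^2}{2}\left[\frac{x(x^2-a^2)}{1-a^2}\right]^2 + \frac{a^2+x^2}{2a^2}\left[\frac{x(x^2-1)}{a(1-a^2)}\right]^2\right\}.$$

The condition $\left(\partial\,{}_4\sigma_y^2/\partial x\right)_{x=a} = 0$ gives $7a^2 - 3 = 0$, or $a = \sqrt{3/7}$. With this value the squared standard deviation becomes

$${}_4\sigma_y^2 = \frac{5\sigma^2}{N}\left\{1 - \frac{5\cdot7^2}{2^4}\,x^2(a^2 - x^2)^2(1 - x^2)\right\}, \qquad a^2 = \tfrac{3}{7},$$

which has the required characteristics.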
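The closed forms in (3) to (5) can be checked against (22). The sketch below assumes equal groups at the places derived in the text: $0$; $\pm\sqrt{1/5}$; and $\pm\sqrt{3/7}$ with $0$; each set together with $\pm1$. For each case it compares the bracketed factor, ${}_n\sigma_y^2$ divided by $(n+1)\sigma^2/N$, with the sum of squared Lagrange coefficients on a grid. It also confirms that this factor never exceeds 1 inside the range. The names `relative_variance` and `cases` are ours.

```python
import numpy as np

def lagrange_basis(x, nodes):
    """Lagrange coefficients at x for the given places (x may be an array)."""
    nodes = np.asarray(nodes, dtype=float)
    x = np.asarray(x, dtype=float)
    L = np.ones(x.shape + nodes.shape)
    for p, xp in enumerate(nodes):
        for q, xq in enumerate(nodes):
            if q != p:
                L[..., p] *= (x - xq) / (xp - xq)
    return L

def relative_variance(x, nodes):
    """_n sigma_y^2 divided by (n+1) sigma^2 / N: the sum of squared coefficients in (22)."""
    return np.sum(lagrange_basis(x, nodes) ** 2, axis=-1)

a3 = np.sqrt(1 / 5)   # third degree: a = -gamma = sqrt(1/5)
a4 = np.sqrt(3 / 7)   # fourth degree: a = sqrt(3/7)
cases = {
    2: ([-1.0, 0.0, 1.0], lambda x: 1 - 1.5 * x**2 * (1 - x**2)),
    3: ([-1.0, -a3, a3, 1.0],
        lambda x: 1 - (3 * 5**2 / 2**4) * (a3**2 - x**2) ** 2 * (1 - x**2)),
    4: ([-1.0, -a4, 0.0, a4, 1.0],
        lambda x: 1 - (5 * 7**2 / 2**4) * x**2 * (a4**2 - x**2) ** 2 * (1 - x**2)),
}
grid = np.linspace(-1.0, 1.0, 2001)
for n, (nodes, closed_form) in cases.items():
    r = relative_variance(grid, nodes)
    print(n, np.max(np.abs(r - closed_form(grid))), r.max())
```

In each case the discrepancy should be at rounding level, and the maximum should be 1, attained at the places of observation.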
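(6) Adjusting by a function of the fifth degree, with six equally big groups of observations at the arguments $\pm1$, $\pm a$ and $\pm\gamma$, the squared standard deviation of the adjusted $y$ is

$${}_5\sigma_y^2 = \frac{6\sigma^2}{N}\left\{\frac{1+x^2}{2}\left[\frac{(x^2-a^2)(x^2-\gamma^2)}{(1-a^2)(1-\gamma^2)}\right]^2 + \frac{a^2+x^2}{2a^2}\left[\frac{(x^2-1)(x^2-\gamma^2)}{(1-a^2)(a^2-\gamma^2)}\right]^2 + \frac{\gamma^2+x^2}{2\gamma^2}\left[\frac{(x^2-1)(x^2-a^2)}{(1-\gamma^2)(\gamma^2-a^2)}\right]^2\right\}.$$

The condition for a maximum at $x = \pm a$ is

$$9a^4 - 5a^2\gamma^2 - 5a^2 + \gamma^2 = 0,$$

and the condition for a maximum at $x = \pm\gamma$ is

$$9\gamma^4 - 5a^2\gamma^2 - 5\gamma^2 + a^2 = 0.$$

Since $a^2$ must differ from $\gamma^2$, these together give

$$a^2 + \gamma^2 = \tfrac{2}{3} \quad\text{and}\quad a^2\gamma^2 = \tfrac{1}{21},$$

or

$$a^2 = \frac{7 - 2\sqrt{7}}{21}, \qquad \gamma^2 = \frac{7 + 2\sqrt{7}}{21}.$$

Substituting these values into the expression above for ${}_5\sigma_y^2$, somewhat lengthy algebra brings it into the form

$${}_5\sigma_y^2 = \frac{6\sigma^2}{N}\left\{1 - \frac{3^3\cdot5\cdot7^2}{2^7}\,(a^2 - x^2)^2(\gamma^2 - x^2)^2(1 - x^2)\right\}.$$

(7) For a function of the sixth degree, the observations may be supposed to be at $\pm1$, $\pm a$, $\pm\gamma$ and $0$. The squared standard deviation of an adjusted $y$ becomes

$${}_6\sigma_y^2 = \frac{7\sigma^2}{N}\left\{\left[\frac{(x^2-1)(x^2-a^2)(x^2-\gamma^2)}{a^2\gamma^2}\right]^2 + \frac{1+x^2}{2}\left[\frac{x(x^2-a^2)(x^2-\gamma^2)}{(1-a^2)(1-\gamma^2)}\right]^2 + \frac{a^2+x^2}{2a^2}\left[\frac{x(x^2-1)(x^2-\gamma^2)}{a(1-a^2)(a^2-\gamma^2)}\right]^2 + \frac{\gamma^2+x^2}{2\gamma^2}\left[\frac{x(x^2-1)(x^2-a^2)}{\gamma(1-\gamma^2)(\gamma^2-a^2)}\right]^2\right\}.$$

A maximum at $x = \pm a$ requires

$$11a^4 - 7a^2\gamma^2 - 7a^2 + 3\gamma^2 = 0,$$

and a maximum at $x = \pm\gamma$ requires

$$11\gamma^4 - 7a^2\gamma^2 - 7\gamma^2 + 3a^2 = 0.$$

Adding and subtracting these gives

$$11(a^2 + \gamma^2)^2 - 36a^2\gamma^2 - 4(a^2 + \gamma^2) = 0 \quad\text{and}\quad (a^2 - \gamma^2)\{11(a^2 + \gamma^2) - 10\} = 0.$$

Since we must have $a^2 < \gamma^2$, it follows that

$$a^2 + \gamma^2 = \tfrac{10}{11} \quad\text{and}\quad a^2\gamma^2 = \tfrac{5}{33},$$

or

$$a^2 = \frac{15 - 2\sqrt{15}}{33}, \qquad \gamma^2 = \frac{15 + 2\sqrt{15}}{33}.$$

After rather laborious algebra, the expression for ${}_6\sigma_y^2$ can be brought into the form

$${}_6\sigma_y^2 = \frac{7\sigma^2}{N}\left\{1 - \frac{3^3\cdot7\cdot11^2}{2^7}\,x^2(a^2 - x^2)^2(\gamma^2 - x^2)^2(1 - x^2)\right\}.$$

(8) We have thus shown, as intended, that for functions up to the sixth degree we can choose the observations as follows: distribute them in $(n+1)$ equally big groups and place these groups in one special way. The standard deviation of any adjusted $y$ within the possible range of observations is then less than the standard deviation at the places of observation.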
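The places found above can be reproduced numerically. The sketch below takes the interior places to be the zeros of the derivative of the Legendre polynomial $P_n$, that is, the abscissae of Gauss–Lobatto quadrature. This is our own identification, suggested by the values, and is not a statement made in the text. The code checks that these zeros match the values of $a^2$ and $\gamma^2$ derived in (4) to (7). It also checks that the sum of squared Lagrange coefficients in (22) never exceeds 1 on $[-1, 1]$ for degrees 1 to 6. `Legendre` is NumPy's Legendre-series class; `candidate_places` is a hypothetical helper name.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def lagrange_basis(x, nodes):
    """Lagrange coefficients at x for the given places (x may be an array)."""
    nodes = np.asarray(nodes, dtype=float)
    x = np.asarray(x, dtype=float)
    L = np.ones(x.shape + nodes.shape)
    for p, xp in enumerate(nodes):
        for q, xq in enumerate(nodes):
            if q != p:
                L[..., p] *= (x - xq) / (xp - xq)
    return L

def candidate_places(n):
    """-1, +1 and the zeros of P_n'(x) (assumed identification); for n = 1 only the ends."""
    interior = Legendre.basis(n).deriv().roots() if n >= 2 else np.array([])
    return np.sort(np.concatenate(([-1.0, 1.0], np.real(interior))))

grid = np.linspace(-1.0, 1.0, 4001)
for n in range(1, 7):
    nodes = candidate_places(n)
    peak = np.sum(lagrange_basis(grid, nodes) ** 2, axis=-1).max()
    print(n, np.round(nodes[nodes >= 0], 4), peak)
```

A peak of 1 (up to rounding) means that ${}_n\sigma_y^2$ never rises above $(n+1)\sigma^2/N$ inside the range, which is the property established analytically above.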
There is every reason to believe that the rule holds for functions of any degree. However, the general proof would be very complicated, and almost all practical cases are covered by functions up to the sixth degree, so the problem may be left at this stage. As we have proved, any other distribution of observations leads to a curve of squared standard deviation with a higher maximum value within the range. This special set of $(n+1)$ groups therefore has a very conspicuous advantage over all other distributions of observations. Its application is limited, however, because the degree of the function must be known beforehand, and the observations then provide no justification for the form of function chosen. If the function has been fully investigated beforehand and there is no doubt about its form, $(n+1)$ equally big groups of observations placed as indicated are the most desirable set of observations possible. The approximate places of the groups are given in the table below.

TABLE I. Places of observation (half the range taken as unit)

Degree of function    Places of observation
1st                   ±1.0000
2nd                   ±1.0000,  0.0000
3rd                   ±1.0000, ±0.4472
4th                   ±1.0000, ±0.6547,  0.0000
5th                   ±1.0000, ±0.7651, ±0.2852
6th                   ±1.0000, ±0.8302, ±0.4689,  0.0000

To a rougher approximation, the intervals between the observations, again with half the range as unit, are as follows:

1st degree of function: 2
2nd degree of function: 1, 1
3rd degree of function: 0.55, 0.89, 0.55
4th degree of function: 0.35, 0.65, 0.65, 0.35
5th degree of function: 0.23, 0.48, 0.57, 0.48, 0.23
6th degree of function: 0.17, 0.36, 0.47, 0.47, 0.36, 0.17

The six curves of standard deviation are shown in Diagram I. Where a curve has more than two minima, the minima are lower the farther they lie from the middle of the range. The variation of the standard deviation is therefore greatest in the outermost intervals of the range.

III. Uniform continuous distribution of observations with constant standard deviation. General formulae.
(1) As was pointed out in the last section, lumping the observations into just the number of groups needed to determine the constants of the function has some drawbacks and cannot be recommended as a universal rule. In many cases it is through the observations themselves that we first learn the form of the function, so a full investigation may require more groups of observations than the assumed number of constants in the formula. Besides, even when we believe we know the nature of the function a priori, on theoretical or other grounds, we may consider it prudent to distribute the observations so that they supply data for testing the hypothesis that the assumed function is the right one. It is therefore desirable to find other forms of distribution that are spread more uniformly over the range, while still making the standard deviation of the adjusted function vary little inside the range of observations.

[Diagram I. Curves of Standard Deviations. Equally big clusters at definite points; functions of the first to the sixth degree. Abscissa: range of observations (−1 to 1); ordinate: unit of standard deviation.]

(2) A uniform continuous distribution at once recommends itself as the simplest assumption. Since we suppose the observations to have constant standard deviation, the elements of the determinants of (15) are the moment coefficients of the $x$'s at the places of observation. When the $N$ observations are uniformly spread between $x = -1$ and $x = 1$, $\mu_{2r+1} = 0$ and $\mu_{2r} = \dfrac{1}{2r+1}$. According to (15), the expression for ${}_{2p}\sigma_y^2$ is then

$${}_{2p}\sigma_y^2 = \frac{\sigma^2}{N}\left\{\frac{1}{\mu_0} + \frac{x^2}{\mu_2} + \cdots + \frac{\begin{vmatrix}x & x^3 & \cdots & x^{2p-1}\\ \mu_2 & \mu_4 & \cdots & \mu_{2p}\\ \vdots & \vdots & & \vdots\\ \mu_{2p-2} & \mu_{2p} & \cdots & \mu_{4p-4}\end{vmatrix}^2}{\begin{vmatrix}\mu_2 & \cdots & \mu_{2p-2}\\ \vdots & & \vdots\\ \mu_{2p-2} & \cdots & \mu_{4p-6}\end{vmatrix}\begin{vmatrix}\mu_2 & \cdots & \mu_{2p}\\ \vdots & & \vdots\\ \mu_{2p} & \cdots & \mu_{4p-2}\end{vmatrix}} + \frac{\begin{vmatrix}1 & x^2 & \cdots & x^{2p}\\ \mu_0 & \mu_2 & \cdots & \mu_{2p}\\ \vdots & \vdots & & \vdots\\ \mu_{2p-2} & \mu_{2p} & \cdots & \mu_{4p-2}\end{vmatrix}^2}{\begin{vmatrix}\mu_0 & \cdots & \mu_{2p-2}\\ \vdots & & \vdots\\ \mu_{2p-2} & \cdots & \mu_{4p-4}\end{vmatrix}\begin{vmatrix}\mu_0 & \cdots & \mu_{2p}\\ \vdots & & \vdots\\ \mu_{2p} & \cdots & \mu_{4p}\end{vmatrix}}\right\}. \tag{23}$$

Once we know the two general terms of which the sum consists, this formula lets us evaluate ${}_1\sigma_y, {}_2\sigma_y, \ldots$ in succession.

(3) We first evaluate the determinant of order $p$,

$${}^q_pA = \begin{vmatrix}\frac{1}{2q-1} & \frac{1}{2q+1} & \cdots & \frac{1}{2q+2p-3}\\ \frac{1}{2q+1} & \frac{1}{2q+3} & \cdots & \frac{1}{2q+2p-1}\\ \vdots & \vdots & & \vdots\\ \frac{1}{2q+2p-3} & \frac{1}{2q+2p-1} & \cdots & \frac{1}{2q+4p-5}\end{vmatrix},$$

which covers both types of denominator in (23): $q = 1$ for the first type and $q = 2$ for the second. We find

$${}^q_1A = \frac{1}{2q-1} \quad\text{and}\quad {}^q_2A = \frac{2^2}{(2q-1)(2q+1)^2(2q+3)}.$$

Suppose that, up to order $p$,

$${}^q_pA = \{1^{p-1}\,2^{p-2}\cdots(p-2)^2\,(p-1)\}^2\;2^{p(p-1)}\;{}^q_p\Pi, \tag{24}$$

where ${}^q_p\Pi$ is the product of the elements of ${}^q_pA$. We shall prove that the rule then holds for determinants of any order. Clearly

$${}^q_{p+1}A_{1,1} = {}^{q+2}_pA, \quad {}^q_{p+1}A_{p+1,p+1} = {}^q_pA, \quad {}^q_{p+1}A_{1,p+1} = {}^q_{p+1}A_{p+1,1} = {}^{q+1}_pA,$$

and

$${}^q_{p+1}A_{1,1,p+1,p+1} = {}^{q+2}_{p-1}A.$$

For an orthosymmetrical determinant there is the general relation

$$\Delta_{s,s}\,\Delta_{s',s'} - \Delta_{s,s'}^2 = \Delta\cdot\Delta_{s,s,s',s'}.$$

Putting $s = 1$, $s' = p+1$ and $\Delta = {}^q_{p+1}A$ in this relation, we find

$${}^q_{p+1}A = \frac{{}^{q+2}_pA\cdot{}^q_pA - ({}^{q+1}_pA)^2}{{}^{q+2}_{p-1}A}$$

and, using (24),

$${}^q_{p+1}A = \frac{\{1^{p-1}\,2^{p-2}\cdots(p-1)\}^4\;2^{2p(p-1)}\;\big[{}^{q+2}_p\Pi\cdot{}^q_p\Pi - ({}^{q+1}_p\Pi)^2\big]}{\{1^{p-2}\,2^{p-3}\cdots(p-3)^2\,(p-2)\}^2\;2^{(p-1)(p-2)}\;{}^{q+2}_{p-1}\Pi}.$$

By the definition of $\Pi$,

$$({}^{q+1}_p\Pi)^2 = {}^q_p\Pi\cdot{}^{q+2}_p\Pi\cdot\frac{(2q-1)(2q+4p-1)}{(2q+2p-1)^2} \quad\text{and}\quad {}^q_{p+1}\Pi\cdot{}^{q+2}_{p-1}\Pi = \frac{{}^q_p\Pi\cdot{}^{q+2}_p\Pi}{(2q+2p-1)^2}.$$

Hence

$${}^{q+2}_p\Pi\cdot{}^q_p\Pi - ({}^{q+1}_p\Pi)^2 = {}^q_p\Pi\cdot{}^{q+2}_p\Pi\;\frac{(2q+2p-1)^2 - (2q-1)(2q+4p-1)}{(2q+2p-1)^2} = 2^2p^2\;{}^q_{p+1}\Pi\cdot{}^{q+2}_{p-1}\Pi,$$

and

$${}^q_{p+1}A = \{1^p\,2^{p-1}\cdots(p-1)^2\,p\}^2\;2^{p(p+1)}\;{}^q_{p+1}\Pi,$$

which agrees with (24).

(4) Next we have to evaluate the minors of ${}^q_pA$ needed for calculating the numerators in (23).
For this purpose we only need the minors , A, ,, but to carry qd through the proof by induction ,A,,, for any values of s and r is needed. For ie 3 we directly find, q a ey, s23= Og — 1) Qq + 1) (29 +3) 2g +5) 4 22, 2? and sAo,2 (2g — 1) (2¢ + 3)? (2g + 7)’ these both agree with the following formula which will be proved by induction, q q ots p= (SH Le se ea) oye eh 0 an | are (9 —3)*(p.— 2)}7..2'2 VO es OR snot (25) er (posh a By-1,s-1 18 the binomial coefficient s—1 Eee and ,II,,, the product of all the elements of BAe. The relation has to be proved first for 7 = s, then for r = p and finally for any combination s and r. For the first two proofs we use the relation between the minors of an ortho- symmetrical determinant (NGS Aegis! . Ages" 5" = Noe S. (26) =. — arn Seema re ee ee a . Ags" NS Sasosh Nene aig Ay" 535! : Nard This is found from two relations given by Professor Pearson* by dividing one of them by the other. q (oye thet A be 5, A, s’ = 1 and s’’ = » +1, then q qd q qd 2 pitAss = v+tAss11 : pris, 8,p+1, 9+1 Sori AG ease ee (27) qa 7] a qd qa wool ad . +1 A, pt+1 +14), 1, 9+1, 9+1° pgs $51, 94-1 on pe Sh p+1,s,1° pit sa 1,5, p+1 a q+2 Now Pea Soucy pyle reer ’ qd q pas, 8, D+1, p+1 — pANGs , q —— 1 Dei AG 1 o41, pe = ( Laue aE p? a q+1 — D aye igo = (= iy paws * Biometrika, Vol. x1, pp. 232-3. 22 Choice in the Distribution of Observations 1 p,S? Co] a a+ Sl p pit Ani, o4ie4 oa ( 1) pA qa q+1 pe eve ot = (— 1) Ps a Nea so that all the determinants on the right side in (27) can be evaluated by (25). They all have the factor {Ul age yo Bee ae (p — 3)? (p — 2)}2 . 
2(P-2) (2) in common, when that is divided out there remains q ; z q+2 q atl ptt Ass (= 1)PB i, 5-2 - Boa, s-1 (ost, 5-4 + ples = pLTs—1, 5) a = q+1 q+1 = q+1 a q+1 = sa0c00d60 ptl Aion ferrari 7 fel aa (ues C Ally Te Di h0) Now indicating by C, the product of the elements of the rth column or rth q q row in ,,,A and by e,, the element of the ,,,A common for the rth row and sth column we find iy a OF p+ 1ttss = ptts—1,s—1° Or é14 . C4 q qa 2 ott pt+l1 Pallet a pul. ot sae OT car po) Cp+1, p+1 + &p+i,s 1 at CiC a4 qa mails aa Bll vecns . : 1, p41 -&s,1+ 6s, p41 Hence the factor of the numerator in (28) is reduced to a] 2 2 Tp, 12 Ss 944 p+1 9 ss C2 12 {eu p41, p+ — 41, pair Uist For the IT’s of the denominator we find q a4 CAG: per lla oar = p**s—1,1 - Prag , €51 +s, n41+ 1,1 " Ga Coy - Os pt+itti,pt+i ™ p DS? G e en? ‘s, D+1 ° “+1, p+1° “1,5 q q+1 C OC pt+1'V1 p+1 I, DLS pr D1 a = h 1, m1 > P11: en4i, p41 1 2 é as a+ C? p+1 1, 9+1 > 9 s—1,s° ’ C55 es1 G es, p+1 the factor containing II’s of the denominator of (28) is therefore equal to 7 Cra e é 2 1s°* ‘%p4+i,s* %1,1 ° p+), 94+1 p+1 IT, p+1 ; C,.0 C2 {ers + €n41,s — 1, p41: Css} ab evoapal OA Introducing these two expressions in (28) and substituting for the one factor q II OF als e : ae “_ the value —1—#+1 ,_—**_ we hence find II 8 1, p41 pt+1**1, p+1 1 1 q Bh cares bs oe q p+1 Ns a. (—1)"B B ei, p+ C11 - Cn 41, p+1 p+l1 ts qd an p—l, s—2/* p—1, s—1 ¥ ‘A 1 1 q p+1“1, p+1 a p+1 I, pt+l Css-€1, p41 1, +P nti, s ; KIRSTINE SMITH 23 The fraction containing e’s equals (2g + 2p—1P—(2g—MWQg+4p— ly) (Se (2q-+ 4s — 5) (29 + 2p — 1) — (2g + 2s — 3) (2g + 28+ 2p—3) (s—1)(p—s +1)’ g qa hence gui Bss (See osillss p+1 A,, p+1 relly p+1 As qa q+1 _ qd pasa (— )?,A —(—1)?{1P>. PUT ee (D228 (pee a Nope we therefore find q q p+ Ags = Bp, 5-1 {19 . 2? (=e) (p Aye oP ee alles, agreeing with (25). q q (6) To evaluate ,,,A, 51, we shall in (26) put A=,,,A,s=1, s’=s and s’’=p-+1. 
Reversing the fractions we then get qd qd qd q pt+1 A, p+1 _ pt+l A, 8, D+1, D+1* p+1 Ai, 1, 8, p+] si p+l Apes, p+1,1, 8° +1 A, s1, +1 A A A ee pri “11 pt+1 1,1, 8,8 *° +1 "1,1, 9+1, 9+1 p+] 1,1, 8, p+1 Pe (29) q qa As pt1 A, 8, D+1, D+1 — pees qd qt+2 qg+1 p+1 Ai, 1,8, p+1 — rien, AS es Ay pn qd q+1 pt1 Apu, pt+1,1,s — (ee ae ps, p>? q qt+1 p+1 A,, 3,1, p+1 — (= ib) PNT, 8? q q+2 p+1 Ay, MSS) aa BAe s-1? qa q p+1 Ai, 1, p+1, p+1 — DP Ay, 1> the right side of (29) can be evaluated by (25). We thus get q q+1 q+1 q q+2 ots pia (= 1)? Boa s-1 + By-a,s—u + Pot. s-1 (Tolls sa - ols, » + pHs » plea, ») q cs q+2 a q+) ot Ani B;, 1, s—2 (pulls Ls—1<" polly cs ple ot) MM ASR Le ie Te ee Solis al oa (30). qd We want here to express the II’s of the numerator by ,,,H,,,, and those of q i : the denominator by ,,,11,,; and we find the following relations ti an Chae: ptitts, p41 — ptts,s-1*) ’ €1s-©1, n+1- Css 1 Y tess: Cugstien ia iY apt at Dane 2 ie 1, p+1° “+1, 9+1°%1,8 1, ii ~ i Ope Oras p+l1 S370 toll Discs Si aa 7 ? Cs, pti + Css + Op+i, p+ 24 Choice in the Distribution of Observations 2 Re CAC, DELS Ss Pl oe De Sipe e e ? €1,1- 1s + 1, p41 | qd q+2 C; f CG, anc — ess Bags amb aill OSes — il 2? Css . Crs q qd 2 "| = pt+l1 pt+1 Thy #"* by oa lly ‘ pills pia Cs, pat ; : ‘ il Cx Crem p+ittyy we get - es yee! Drei 2 ; fF pris, p+1 (i ee peice €1,5-€1, pin 1,1 + &s, vt rrenllil p+1 = = ———————— ee . > a Braisee 1 1 I Url et Bk 2 5 aie p+1**1,1 : f ' spt+1 &ss+@ pti, p+ or introducing the values of the e’s qa nts, p41 q ott A1,1 : 4 (— ID) pean yee ie (29+ 2s — 3) (29¢-+2p— 1)—(2¢—1) q+ 2p +28 — 3)- spl Gru ois (24-+ 2p -+ 2s —3)2— (2q-+ 48 —5) (2q-+ 4p — 1) a p+1**1,1 I pt+1 S, p+1 = (al) Pee ae < p+ iH, 1 Now A oe 1 9p-2 2 2 1) ' pit Ar1= pA={1?*. 272... (p12)? (pW) ee ten ells and hence q q — cs -1 ane pdt Ce DER Oe dl oe en eee (p — 2)? (p — 1)}?. 279-9 | Ws par in agreement with (25). 
(7) It now remains to prove that (25) holds for Bs when both s and r are different from 1 and p+ 1, and r different from s. For this shall be used the relation ING ING sate? = ON gee Nea a Nee Ne between an orthosymmetrical determinant and its minors. Putting A= ,,,A, s= +1, s’ =r and s’”’ =s and solving the equation with regard to »,,A,,, we have 1 p+ Ay, 1, p+1 where prion, p+1,7,s — pAr, s mee a (ny1A y pti, p+1,7,s 3 PNET (pO ana saity a; bo Or KIRSTINE SMITH Evaluating this by (24) and (25) we get q p+1 INS Shwe ey i 50) cae (p — 2)? (p — 1)}2. 27?» q pul q a q q x [478 p41, Ss Bip, r—1° pall piles a5 Bo. Sige. Bo, peo pyle mee panna alee otl): wi g e ey e But Il ae Il p+i, p+1° %7, 9+1° Vs, +1 prrr,s pt+1~* r,s 2 Cran I eat i Cs en p41 p+1**p+1,7 ~~ n+l ESS Gl < e ee eee pt+1 ers q ie (G2 =" pt+l1 and Pec, Ul Bea ae p+l1, p+1 il Eo ll (Coke Bhiras : pt+1**p+1,s ~~ C ‘aa 7 S C+, +1 Substituting these values in (31) we find q q pee — (tts 12—* Q-2. 2.(p —2)2(p —1)}2. 2? e—™ Sy Pray oe aes Py dea ae 1 2 es ie 4 I I x Cr pt1 €s, p41 and as the last fraction equals il (p—s+1)(p—7+ 1)’ a ( pet, = (— MB y--1-Bor1l?4. 27-2... (p—2)*(p— P29? Tl, with which the proof by induction for (25) is carried through. (8) We shall now return to (23). It consists of 2p + 1 terms of which the (2r + 1)st originally was found as (5,0; — »,10;) so that 1 1 Pe Aes, peal Me MI x 1 arene ! 3 2 or+1 as ees a eal : go! Oph Onagno dy — 1 Bry ee yer 1 Ou — ar i . 7! 
aN 1 , il i Nata atl i ene aren ‘ i als 1 3 5 Co ME Se or ii il 3 4 re op mie 3 1 1 mies Ae l 1 Fi aoa] SOE a IN| OVE 2, gs Ap + 1 | and 1 A 1 2 4 RE ys | il 2 1 Al — x 3 amo gies Joao Mere 1 1 1 ° - rosa ; 2r+1 24307" 4r — 3 Qr-14 yor -24y = AT ’ N | i ¥ 1 [eal i fi ae | 3 Bl scat a : | ees: pel 1 Ls ae ee at 4 7) DGuuene : 2 2rd 2 Wr +3 1 il 1 Il } il 1 Wige2Y pee Lig amine peor 4r — 5 | Ort 1 Dips By 4r — 1 With the notations later adopted we therefore find s=Pr 1 2 2 2 S ae eerarae (AV rosea s= o ot PACA ie PY 5 REY ae 1 1 s aN : ri A s=r-1 2 2 p o7 22 | 8 [a Ase at and 2r—-1 Fy — ar-2Fy = N- ae ) 2 NAAN Substituting the values for A’s from (24) and (25) we get : r il ayy) 9 9 o san s+1, arFy — gr—19y = N. 2?" (ir? oc | (— LS omer. aia ae ait aa | ea eee, is le E pit + rt and e 2 ane eh Cnr s=r-1 | ry ae s=0 a oe L LY feed MW or, as ee . i eae a ey (2s + 1) (28 + 3)...:.. (2s + Qr — 1) Jia Sat Cost - Viena) rt1 and 2 abiens eG aa —— = V 47 = 7 (28 43)(25 45)... (Qs +271) yt Ul +1 Cr, r i Pie oie Care al Be ie : ‘ 2 arFy — 2r-1 Fy = ines | Sl 1s a(2s-— Dyi(2s Sys. (2s + 27 — } IR me Fs 5, (32), * The e’s and C do not of course have the same value in the two equations as they represent columns and elements in two different determinants. KIRsTINE SMITH Dr and : o? (47 — 1) 2? {is ic 1)°B,-4, «2? (2s + 3) (2s + 5)...... 2r-ay — 27-200 (jy — 1)? 227-2 s=0 (Decor Wy} ee a (33), 2 4 . F . . ‘ oO which enables us to form ,,o” by successive summations from go; = = N° Before investigating the curve for ,o;, for a special n we shall first look at ,,o° for «=0 and «= +1. (9) From (33) we see that when « = 0 2r-1%y = 2 rm »po, is for = 0 most easily evaluated from the formula (13). 1 : Remembering that in our case m,, = yaaa we find from this 1 ssa Ys CA ies 9°, ee pA and hence by (24) and (25) va eee as Nee Coa Oe BOE al 2p + 1)2 2p419%y = 2pFy = N (2 Begins Qp ; sjeiaraisverciavs,e ele! s) oveseie's (34) (10) To evaluate ,,o? for x= -+ 1 we use (32) and (33). 
The sum in (32) may be considered as Galed 1 d {a2"-1 (x2 — 1) han dane. Coe dee) with a number r of differentiations. If these operations are undertaken directly upon x?"-1 (x2 — 1)" the result is One wl) a, (@" = 1)?-t sss a, (w— 1)+ ap, of which only Gy Oh ore 2) ceases Sl ge A remains for x= + 1. 7 Corresponding to this the sum in (33) comes out from ae dld GaN drla (a7 ail) eda ean) | dx x dx by taking (r — 1) differentiations and therefore S5,-1 equals, for = +1, (2r—2)(2r—4)......4.2= |r —1. 27-1, xv=1 v=1 oc Hence ory — or-19) = N (47 + 1) z?=1 e=1 o2 and Ct ga 0, = N (47 — 1), or since eC > 097 = N > aa =1 o2 ny = il + 35+ direst (2n + 1)}, xz?=1 o2 por N (Heist eee pera ater taint te eetheak Cok hs Sct nes deve es (35) 28 Choice in the Distribution of Observations (11) In Section I under (7) it was found that ¥ (2) ode =o? Ve no dz = o7 (n+ 1) when the integration was taken over the places of observation. For the present distribution f(a) is 1, % (x) constant and fy (~) dx = N, hence the mean of ,o? in the range of observations is for a uniform continuous distribution For the grouped observations in Section II we find by integration of the formulae for functions from the first to the sixth degree that [eth Se cee 1 5] noid wet (l oct) IV. Uniform continuous distribution of observations with constant standard deviation. Special formulae. (1) Let ,o% — ,_,0% be indicated by S,, then the formulae (32) and (33) give us 2 : See 3a oe 15) Sy = Nos (1 — 327)? 2 S,= 5.1 a2 (8 — 5a) 2 9 bg actosk asia (36), = 2 (8 8072 Baan Sa=q- gg 3 30x AE ee) (eZ eam) Esc aes Ley ro Ss= 7° 64 (15 — 70x? + 632%) g oo 1S (153 lbaty 94ba8 69808) Bs = 9x 056 | 3) ODA = ox IID ) from which we form ,,o? 
beginning with Pf as o ‘\ 0Fy — N | 2 104 = 5 (1+ 304) 9 oe 9 5 2\2 o 9 1 SY) 21 Ard ay = yt Oe id (te) yan x2 + Bat) and further in the same way 2 32 = = ; ; (9 + 4522 — 16524 + 17528) » (37) ae ae 2 4 6440° + 44108 4, = N 64 ( — 3622 + 29424 — eyo + av ) | Aor = = : = (25 + 17502 — 1750a4 + 6510a8 — 955528 + 4851219) | 6%, c = (175 — 105022 + 1732524 — 9366028 + 22522528 + | — 2453220" + 99099212) / KIRSTINE SMITH 29 (2) Since ,o7 = ,-,0,+S, the curve for ,o; is entirely above the ,_,0;, curve except where S, = 0. Solving the equations S, = 0 the following roots are found: Hors; — 0 |) » S2=0 a= +Vi= 4 -5773 » 8,=0 a= t= +V2= 4 -7746 } 15+2V30 — (-8611 pies hae J 35. «(+3400 35-42V70_——-(-9030 ees oe va arl® 63. «(+5438 ane eS 0 x= + }-6612 9325 Since all the roots are rational and all le between — 1 and + 1,,,0; therefore equals ,_,0, for n values of x all of which are inside the range of the observations. The adjusted values of the functions at these abscissae appear to be of special interest since they are uncorrelated as was shown in Section I under (9). (3) Looking at Diagram 2, representing the curves of ,,o, up to n = 6, it is seen, na) v=1 as was also clear from the formula for o% and o% given in the last section, that while the standard deviation in the middle of the range increases slowly with the degree of function it increases very rapidly at the ends of the range. At x= 0 the curve has a minimum when the degree of function is odd and a maximum when it is even. Besides that the curve has (2n — 2) maxima and minima between —land1. As the curve for ,o? is of the 2nth degree, ,,a; is therefore increasing for x increasing above | or for x decreasing below — 1. The abscissae of the maxima and minima are given in the following table. Deg f RR PtOA Abscissae of maxima Abscissae of minima 1 0 2 0 » EVE= + 4472 : 0 3 +Vi= + 4472 eater (hat [TET , (-7651 . 
l4Ve = + -6547 NE = een = |-2859 aoe [ oe d fs dt LO (ise | [5 +2715 (-8302 1 -2852 |= 3300—~Ci«<“‘<«é‘C#&L 468899 fegree eri . 6 | Je +2VI5__ |-8302 eee = 33 ~ ~ 14689 5 |.2093 tons of Observati ton O but the Distiriv Choice in } 30 O-L lesa “UOINGIUYSIG WIOFIUL ‘“SUOTYVIAIG prepuvyg Jo saaAmMO ‘Z WVUDVIG *suonnaiasqo fo abuny v: €- G: L oO [Se On ao re a yom Oa 2 Oo One ab — vt TAXIS UL }AN0,7 pUgy puosag aaIseq 4st Jo uoyoun sy IN Unit of Standard Deviation KUIRSTINE SMITH 31 9 Hence the curve for ,,,,0, has a maximum for the abscissae at which 4,0; has aminimum. A comparison with the results in Section II shows that the abscissae of the maxima found here are the same as those of the best places of observation for (n+ 1) equally big groups of observations of a function of the nth degree. These places tally with the places where ,o%; was a maximum. Thus if we imagine that we had started the investigations with a uniform distribution of observations, and to lower the maxima of the curve of standard deviation had put clusters of observations at those maxima and at the ends of the range we should not get the best curve of standard deviation till all the observations of the continuous distribution had been distributed at the n — 1 places of maxima and at 1 and — 1. The minima of the standard deviations obtained from a uniform continuous distribution and the (n + 1) best groups of observations do not fall at the same . * abscissae. (4) The curves are very far from our ideal of a constant standard deviation throughout the range. 
To obtain the same maximum of standard deviation as (n + 1) groups could give us we should have to limit the part of the range used to the following fractions of the range: for Ist degree +58 ONG s::s, 73 POLO ares -80 petth =, “84 PSOne ts 83 » Oth ,, ‘73 It is not likely that the range of values of the function which we investigate would only be of interest inside a range so much smaller than that within which we might actually observe; further it seems likely that observations all of which were taken inside the smaller part of the range would give better information for that special interval. I shall therefore examine in the following sections if a uniform distribution of observations to which is added clusters of observations at the ends of the range will not possibly give a more satisfactory curve of standard deviations. V. Umiform continuous distribution of observations with additional observations clustered at the ends of the range; constant standard deviation of observations. General formulae. (1) Suppose we have V observations uniformly distributed from — 1 piesa to 1 and besides N i = 2 l+a then have 1 i Nx" Na | observations at — 1 and the same number at 1. We ee NM ea) oe Lene 32 or and - oleate) ele wei Ose Tl, Margi = 0. According to (13) and (14) we find, a o° (1 + a) 2pVy N i + ae ie 1 ai | 1 l+a gta | 2 tta 3+4 ep A pall: _t + Qa pel’? Ww+3 l+a se @ see i+a t+a er oe a ae 2p +1 idee 0 1 a 1 y+ a 5 ta x t++a rata Oe 1 1 2p—2 | ‘ Spee aae 2pewOw t+a t#+a ee iets tra 1t¢ aoood | | 1 i 1 i one a Ip +3 D woeee Choice in the Distribution of Observations eeeece a eeeee COCHOn Aa joo (88) and Biometrika x11 KirstInE SMITH 33 0 I seem ARE y2p—2 1 il lt+a ek ene Fees 2 1 1 1 | x x +a 5s +a GORtGaO Seu Il 1 Il | n2p—2 2 : ea? ii aks a sme erg S 1 il soya eee". 
aN Sec es esis e 1 1 1 | gta 5 +a ddéacas 2p +1 Qa | | 1 1 1 phe Spam o Apooop pes aee & 0 1 Ce qpo 1 1 I i 3 +a i pes acctece page 2 1 1 1 aw 5 +a 7 +o SOBASO m+3°% Deve att ane : +a 2p+1 2p 3 4p —1 } ae 1 1 j gta ig ar Gk Sasonc Srpmatenuae | tra t+a : roy) | 4 Parishes acdsee op +3 | mat mala : IP : =e 2p +1 a 2p +3 sevens 4p — 1 sabretencmesaloo) 5 34 Choice in the Distribution of Observations and, according to I under (6), eerece ‘ Brees 205, — 2p-19y = N (1 ale a) x il 2 1 l+a Taw Seed ae 1 +a x tra t+a : +a 1 BG tu erence Pee x tig Liat cOemn ibe! : ae i yi ‘ 2p +3 p2p 1 I | 1 + 2 pea yeas vee nih ee l+a Eich. y Yuviaee pa l+a t+a k+a 1 Leas! Lita L+a 3 5 (OB “aeGno0 2p +1 Qa 3 5 bids 1 ligt 1 Ian pe ss Hop ela pe ee iy 28) |) so pe mores and gens hasiie)en 20417 — apFy uae 1 1 if 1 t+a tta oo... paar 1 t+a ¢++a : als z teh, adedos Ip 43 a x4 aL JE. Gi EE ohy anny. d +a ; : 29 + 5 ye 1 | z ~ | 1 eS pee 6 oe foe ek 1 t+a Tod, wees amen ata t+a 1 tt+a Laat Sn yee t+ta t+ta he we Dee eee Loe Qp+1° 2p +3 aa pe io 2p +5 (2) For the reduction of these formulae we have to evaluate the determinant of pth order KIRSTINE SMITH 35 later, _1 I EY pan Screens Ne 2q-+ 2p—3 ; cara 1 i be —— i Se a Oe Boeood Rs a ;o= 2¢+] 2¢+3 2qg+2p—1 a ee = tai Tees ae aS I ah 2g + 2p —3 eerie Seetpepek By subtracting from the elements of each row the elements of the proceeding and leaving the first row as it is, it is transformed to qd po — Le x aes Ls ae eee Anil Qa opeedl OC aoccno 2¢-+ 2p —3 Qa | 2 2 2 | (2¢ — 1) (2g + 1) (Qg-+1)2g+3) 077" (29 + 2p -- 3) (2g + 2p — 1) | 5 ; 9 | =" | (q+ 2p —5) q+ 2p—3) (q+ 2p—3) q+ 2p—1) Bq + 4p —7) qt 4p—5) which when the columns undergo the same process takes the form qa po a | Le 2 2 2 2g—-1 °° (2q — 1) (29 + 1) (2¢+1)@q+3) *(2q+2p—5) (2+ 2p—3) 2 2.4 2 2.4 (29—1)(2g+1) (2q — 1) (2g + 1) (2g + 8) (2¢ + 1) (2¢ + 3) (2g + 5) (2g+2p—5)...(29+2p—1) : 2 ae Ph geen a ae q+ 1) (2¢+3) (2g +1) (2¢ +3) (2¢ + 5) (2¢ + 3) (2g + 5) (2¢ + 7)" 
(2g-+2p—3)...(2g+2p+1) 2 al 24). ; DA a 2.4 (2g+2p—5)(2q+2p—3) (2g+2p—5)...(2g+2p—1) (2g+2p—3)...(2g+2p+1) (2q+4p—9)...(2g—4p—5) Let us introduce the notation i 1 | Qg—=1) q+ Dq+3) — Cg F N+ HRTHS) q+ 2p— 3)... q+ 2p +1) 1 1 1 wD=| (2¢+1) 2g +3) 2q+ 5) * (2q + 2p —1)... (2g + 2p + 3)] ] (2q + 3) (2q + 5) (2¢ + 7) 1 1 (2g + 2p — 3)... (2+ 2p+1) (Qq¢+2p—1)... Qq¢+ 2p+ 3) (2g + 4p—5)... (2¢+4p—1) q q Then, since for a= 0 ,6 equals the determinant ,A, we have q q oe BO = Pye aor ee tee qd and the problem is reduced to the evaluation of , D. 36 Choice in the Distribution of Observations (3) It shall be proved by induction that q 2D {1P, 2P-1_,, (p —1)?. pt? . 2P(P—2) (p +1) ; (2q = 1) (2g + 1)?(2y +8)... (2q + Bp — 5) (2q + Bp — 3)? (2g + Bp — 1) Bq + Bp +1)? Bq + Bp + 3)P—... (2q + sp —3)*(2g + Ap — 1) It contains the 2p + 1 different factors of the elements with indices increasing from | at the extreme to p in the middle so that the three factors of which the one diagonal line of the determinant consists occur with the index p. For p= 1 the formula gives B 1 1 (2g = 1) (2¢ + 1) (2¢ + 3) as it ought to. As the determinant is orthosymmetrical the relation a ee a vied SSS A holds: gd Applied on ,,,D for s= 1 and s’ = p+ 1 it may be written PR Chet Girl DEO ae aD = (44). yO=Al Looking first at the numerator of (43) we see that it has the same value for the two terms of the numerator of (44), and divided by the corresponding factor of q+? D it becomes ei ets sete (p — 2)§(p— aga) 2(p+ 1)? gap (p-2)—(-1) (2-3) ae Oma a meee ((=2)2iG 1s eee oe Ee Ee Ue tener (p — 2)4(p—1)3 p? (pt Dg 2 pi qd 3 To evaluate the factor in ,,,D arising from the denominator of (43) we shall give a table of the indices with which the different factors occur in the D’s and their ratios. 2q—-1L2a+l 2q+3 ... 2g+2p-5 2q+2p—-3 2q4+2p—-1 2q+2p41 2+2p+3 2q+2p+5 2a+2p+7 ... 
2qt4y—1 2a+4p+1 2a+4p43 5 1 2 Bh ee 9 Di—l Pp p p p-] (Da 7 Di tonmace 1 ay ria q+2 pD dE bog GSS) p-2 p-1 Pp Pp p p= Tee 3 2 I q+2 pat) =) i 9 Na ep 8 p-2 p-1l p-1 p-l jas Zs Dp 13) Bes. 1 ae a, q+1 ee — 2 4.,...2(p-2) 2(p-1) 2p 2p 2(p-1) 2(p- 2)... 4 2 sae qd q72 D.,D : P12 8B pol p pb. pee. ip p ia a 2 : p-1 q+1 pi? ¢ ‘ . —2- am 2 8 ue pel Pp p+l ptl ptl p pads ess Z ae KIRSTINE SMITH 37 Hence the factor arising from the denominator of (43) is (2q + 2p — 1) (2g +2p+3) — (2g —1) (2¢g+4p+3) ; (2g —1) (2g +1)?... (2g + 2p —3)” (2q + 2p —1)P*1(2¢ +29 +.1)Pt1 (2g +2p +3)? (2+ 2p +5)... (2g +4p +1)? (2g +4p4+3) ° The numerator of this equals 4p (p+ 2), multiplying with the factor previously found we therefore get piD {]ptl QP _.. (p—1)8 p? (p+1)} \h2) Q(t) (p—1) (pts 2) ~ (2g —1)(2g41)2... (2q-+ 2p — 3)” (2q + 2p — DPF (ag. + 2p+1)P*1( 2g + 2p +3) (2q +2p +5)... (2g +4p +1)? (2q+4p4+3)’ which is what we wanted to prove. (4) When the values of A and D are introduced in (42) we get = {1P=2 , 2-2... (p— 2)? (p—1)}2. 220) Y (2q=1) (2g 41)... (2g + 2p —5)P— (2g + 2p — 3)? (2q + 2p — 1)? ... (29 +4p —7)? (2g + 4p —5) +a .23(P-1) x {[P-t , Qp-2.... (p —2)? (p —1)}?2. 20-YP3S) (2q —1) (2g +1)? ... (2g +2p — 7)P-2 (2g + 2p —5)P-1 (2g + 2p —3)P-1 (2g + 2p —1)P-4 (2g +2p4+1)P?... (2g +4p —7)? (2¢ +4p —- 5) or r= pw ale ak Ce a he a dnadigensiiaee ss (45). (2q —1) (2g +1)? ... (2g + 2p —5)?-1 (2¢ 4 3)? (2g +2p —1)"-1 ... (2g +4p—7)? (2¢ +4p — 5) The denominators of oe eae (38)—(41) for ,o% are now known since they 2 2 only consist of the factors 3 and,6. Tobe able to write down the general expression for ,o, we should have to mealies the minors of 6, but their form is so complicated that a direct calculation of the determinants for the degrees of function in question appears to be simpler. With the material in hand we are however able to deter- mine ,o7 for = 0 and 2? = 1. (5) From (38) and (39) we see that x=0 x=0 o2 5 . 
2 ‘ ao, = ope WV ~— (1+ a), and with the 6’s as given by (45) p+19 Hest) x=0 29Fy = 2041%y = pea eee 2S) se i 08 Cm NP pee (2p 48)? (p45)? (4p =1)* (4p +1) N{1.2.3... p}?. 2” .[l+a(pt+l) (2p+1)]5. 72.98... (2p —1)?-2 (2p + 1)?! (2p £3)? (2p +5)... (4p - 22 (4p41) o* (1 +a) 3?..5?... (2p — 1)? (29+ 1)? .[1+ ap oa WN {1.2.3....p}?. 2?” [1+a(p+ 1) (2p+ 1)] se Paleo? (3.5 — 2p + ti? (eco alle ap (2p + 3)] (46. 2p" apt 1oy = Ny 9° 4° .2p 2p j (1+ a (p- is 1) (2 2p ny -D] eialalerevs =] (6) To find ,.o? we have to evaluate the determinant of (p+ 1)st order, or 0 iL Leg ae dae e Scr 1 ib ae ead eee rs, 2q —1 2q+1 Whore 2q¢ + 2p — 3 (ee ee fae : 7 Datel a 2q+3 On saiereiarais 2q +: 29+ p— 1 a 1 1 1 - a =e Se eh Hoppus ; see! 2q + 2p—3 2q + 2p—1 2q+4p—5 38 Choice in the Distribution of Observations q . Treating it as ,0 was treated under (2) of this section, except that now two rows or columns are left unaltered, it takes the form 1 0) Ov” | a. eee 0 ia 2 2 2 ne a —_— a ——_ id eoniunneee 2q-1 (2q -1) (2¢+1) (2q +1) (2¢ +3) (2g + 2p —5) (2q + 2p -3) 2 2.4 2.4 2.4 (2q-1) (2¢ +1) (2q — 1) (2q¢-+1) (2¢ +3) (2q¢ +1) (29 +3) (29+5) 0 (2q + 2p —5) ... (2g+ 2p - 1) 2 yee) 2.4 2.4 i (2q +1) (2q¢+3) (2g +1) (2q +3) (2q¢ +5) (29 +3) (29 +5) (Qq+7) (2q + 2p —3) ... (2¢+2p +1) 2 2.4 2.4 2.4 (2q + 2p —5) (2¢ + 2p —3) (2g+2p-3) (2¢+2p—5)...(2¢+2p—1) (2¢4+2p-3)...(2g+2p+1) 77°" (2¢+4p -9) ... (2¢+4p — 5) q — — 93(p-1) = po Hence we find from (38), 1 1 62 3 Q3(P- 2) wl 6 ane) ? ,D Dy 278) = N a 2 oro po Now from (43) and (45) we get q 2°” D (par 2g ot De (47), g [1 +a(p+1)(2¢+2p—1)] p19 and therefore ey + a piensa ! Sue aN, lta(pt+1)(Qp+1)° l+ap(2p41) x=] o2 p+ 1 p l ee aya ate aie ...(48). 
or 2pFy Wau a) (2p + 1) ti +a(p aes 1) (Qp41)~ ] + ap (2p + 1)) ( ) In the same way we get from (39), 1 2 w=1 G2 (OEM =) yp) ORCA) 70) ap-19) = 77 (1 t a) +- = jp ey 2 po po which by the relation between »D and ,, Lp just found is reduced to eee p (2p — 1) p(2p + 1) ; aa Se ae -| eee rereee 49 reat Oc ea cneae 1 + ap (2p + 1) 2 Both (48) and (49) are covered by the formula x= o2 Nn nN + » pecs mn Baise sent 2200): poe Wl +a) (041) fe 1) sae SI ee) (7) The evaluation of ,,o7 for special values of n can be made easier by a trans- formation of the determinant 0 —— ve e a St Et a NE, r KIRSTINE SMITH 39 1 gee ee et fg | hee Doe 2¢+2p—3 | x eee get oe +a) : 29-1 2g+ 300 0" 2g + 2p —1 pid = 4 1 1 1 Me ig 1 dg45°% SeSOr Iqaaap Pal | : | SA ae sell cling Ae 3) “ 2q4+2p-1 2g+2p+1 0°00" 2¢+4p—3 Leaving the first row unaltered and subtracting from each of the others the proceeding we get a determinant the first column of which is Wert Ia (a — 1) 1.02? (a? — 1)s while the other columns are identical with those of the determinant 6 previously treated in the same way. When next the two first rows are left as they are and from each of the others is subtracted the proceeding one the result is Ba = (= L)2P~t x 1 ar eS te: ae sie ea 2q—1 2q+1 2g 4- 2p — 3 ee on See 2 i De esta (2g — 1) (2¢ + 1) (2¢ + 1) (2¢ + 3) (2¢ + 2p — 38) (2q + 2p -- 1) (1 — a2) 2.4 ee Raley | 0h Bee: (2¢—1) Q¢+1)(2¢+3) (29+ 12 are Bas 2 (2¢+ 2p—3)...(2q+2p+ 1) 2p-4/ Dea: A 2.4 g2P-4( 1 —a?)2 ee ee (2g+ 2p—5)...(2g+2p—1) (2g+2p—-: = (2g+2p+ 1)" (2g +4p—7)...(2¢+4p—3) Leaving now three rows unaltered, next time four and so on, it is clear that we shall at last after p of these sets of operations get p(p+1) q oti aa (— 1) cae x 1 il 1 ae Sealy ee yaa | 99+ 2p—3 | : 2 2 2 eee (2q — 1) (2¢ + 1) (2q¢ + 1) (2q + 3) = (7 lp 3p = 1) (1 —22)2 ee 2.40 44 eee on 2.4 | (2g — 1) (2g + 1) (2¢+38) (2¢+ 1) (2¢ + 3) (29+ 5) “(2g +2p—8)...... (2g +2p +1) (1-22) 2.4...2p 2.4 ...29 2.4... 2p | (2qg—1)...... (2g+2p—1) (2q+1)...... 
(2g+2p+1)° (2¢+2p—3)...... (2¢+ 4p --3) By treating the columns in the same way, leaving first two then three and so on unaltered, we find after the first set of operations 40 Choice in the Distribution of Observations ae Fe (e iam ne 1 : ae = ee @i— Det 0 Oe eae eee 12 - DE ee 2.4 2s. . (2q — 1) (2¢ + 1) (2q — 1) (2¢ + 1) (2¢ + 3) (2q+ 2p—5).. ae 1) (eee 2.4 2.4.6 2.4.6 (2g — 1) (2¢ + 1) (2¢ 4+ 3) GG lee EE) (2qg+ 2p —5)...(2g+ 29 +1) (1—22) 2A etaip 2.4... 2p (2p + 2) 2.4... 2p (2p + 2) p E (2qg—1) (2g+1)...(2¢+ 2p—1) (2g—1)(2q+1)...2q+ 2p+1) ~ (29+ 2p—5)...(2q¢+ 4p—3) and after (p — 1) sets of operations 1 1 i 20 reo 2.4... (2p — 2) 2q—1 (2¢ — 1) (2¢ + 1) ‘"" (2q—1)(2¢+1)...(2¢+ 2p— 3) L— 2 2 2.4 ‘ 2) A ret) (2q — 1) (2q + 1) (2g — 1) (2g + 1)(2qg+ 3)” (2q--1)(2g+])...(2q+ 2p—1) | (22)? 2.4 2 eae e 2.4... 2p (2p + 2) | (2g — 1) (29+ 1) (2¢+ 3) = (2¢—1)(2¢+1)(2¢ +38)(2¢ 4-5) (2qg—1)(2g +1)...(29+ 2+ 1) | (122)? 2.4...2p 2.4...2p (2p + 2) Pewee idee 2) Oi) (2g—1)(2¢4+1)...(2g+2p—1) (2g—1)(2q4+1)...(2g+ 2p+1) (2¢—1)(2¢+1)...(2¢+ 4p— 3) : p(p+1), (p= =e since (eae =(-— 1)”. Here the first element of the last » — 1 columns is seen to occur as factor for the whole column so that we can put outside the factor ieee (CAD PAD 72) Qq— IP Rq + DP Oy + 82 Aq + BP... q+ Bp — Hq + 2p — 3) Dat) aes 20st cA Dig 2) me) cae ~ Qq— 19g + Rg + BP-* q+ DPF... Bg + Bp — 5) Og + Bp — 8)’ the resulting expression being x p(p-1) o (— 1)" .19-1. 29-2... (p—2)2(p—1)2 ? pen’ (2g — 1)P-3 (2g + 1)?-3 (2 7+ 3) “3 (2q + 2p — 5)® (q+ 2p — 3) | pies | It aT +a 1 I Pater ms as bse ee (2¢ — 1) (2¢+ 1) 2¢+3 a 2¢+2p—1 Spygate ae Peas 2p (2p + 2) (2g — 1) (2¢ + 1) (2¢ + 3) (2q + 3) (2¢ + 5) "(2g + 2p — 1) (2g + 2p + 1) (1 — a2)? Digan 20 AUS tan (PDS 74) 2p... 
(4 — 2) (2g —1) (2g +1)...(2g+ 2p—1) (2q¢4+ 38)...(2g+ 2p +1)" (Qq + 2p — 1)...(2q + 4p — 3) KIRSTINE SMITH 41 In our formulae the two cases g = 1 or g = 2 only occur for which according to this we find : ; p(p-1) Perse pe a 3P-1 5e-2 77-3 |, (2p — 3)? (2p — 1) : 1 l+a 1 1 a 1 (a 2 4 6 2p ae 3, 5 7 tai 2p +1 (12)? oA 4.6 6.8 2p (2p +2) — ie WSS es DT 7.9 "(2p +1) (p43) teas 2-4-6 468 6.8.10 2p (2p + 2) (2p +4) _ ee ie 5.7.9 7.9.11 ~ Qp+1)2p+3)2p +5) jae 2.4...2p 4.6...2p+2) 6.8...(2p+4) 2p (2p +2)... (4p — 2) “ 1.3...2p+1) 5.7...(2p+ 3) 7.9...(2p4+5) (2p+1)(29+ 3)...(4p9—1) cae (51) and p(p-1) Pe ol 22 tp = 22 (p= 1)2\ ate Dee (2p 12 (2p 1) 1 sta 1 I she 1 i ee ss £ 6 2p 31,5 if 9 2p+3 (1—22)2 2.40 4:6 poe) EPP aie / Dey 7.9 9.11 (2p + 3) (2p +5) fea Bel 2p (2p + 2) (2p+ 4) ee® oe Omieats (2p + 3) (2p +5) (2p +7) (22)? 2.4.6...29 4.6...(29+2) 6. 8...(2p +4) 2p (2p + 2) ... (4p — 2) ~’ 3.5.7...(Qp+3) 7.9...(2p+5) 9.11...(2p+7)"" Qp+3)Qp+5)...(4p4+1) VI. Uniform continuous distribution of observations with additional clusters at the ends of the range; constant standard deviation of observations. Special formulae. (1) Our first task shall be to work out the formulae for ,,o% — ,,_,02 for values of n up to 6, the next to find what values should be given to a in order to make nO, as flat a curve as possible within the range of observations. With the notations just introduced (40) and (41) take the form ee 2 es Soy = enFy — an—-19%y N (1 +a) i=: pO : p+10 42 Choice in the Distribution of Observations 2 2 2 o? 2 pid” and Sop ti = ep419y — aFy = ay (1 + a) @ 2 2° p+ 9119 From these formulae we find, after applying (45), (51) and (52), o? 3(1+a)2? Ses ap ag nereesreereeerreceeteser eet senses cotecnseeeeuigrstere ea (53), 2 1 32 5 I+a : (oy 0 a So wld ora la 22(1+4 2 3a) eee : o? 5 [2+ 3(1+a)(«2—1)P = oa eu vee (54), 1 2 aco! janet 3.52.7 (late Barra C1 ecaang 22 (1 + 2. Ba) 2 1— 2? 3.5 o? 7 (1+a)a?(2+5 (1+ 8a) (2*—1)P (55) SS 3a) {le10e) ; 1 1-ao = eee 2 4 o? 
1.37.5 1.32.53.72.9 /2\2| 1-2 = = Bes iy cue ee 6) Be N 22(1+2.3a)° 22. 28(1 +3. 5a) \3 Ree — 2)\2 peta i (ie) 35 _o? 9 (1+a)[8 + 20 (2 + 9a) (2% — 1) + 35 (1 + 6a) (a? — 1)°P (56) iN? Gta (2th) SS d 1 sta 1 _@ Bi GPT 3 50878, OF a 62 ee Ss WO mea oe al leet Bn5 7 2 Aaa — p2)\2 poeeeree RGtaeecwiny 7.6 _o 11 (1 +a)a*[8 + 28 (2+ 18a) (e?— 1) + 63 (1+ 10a) (@*@—1)P (57), N° 28 (1 + 10a) (1 + 21a) te 1 1 6 oe g, 814.4) LP. Lee ee Ps M6 NS T7982 98 (143 Ba)’ (22.3)2, 22 (144.70) \82.5/ | (y—g22 2 8 a) EE Te) OL Se) ae 2.4.6 4.6.8 6.8.10 2\3 Me (12) 355.7 bem 9 Tagua _o® 13 (1+a)[16 +168 (1+ 10a)(a?—1) +126 (3+40a)(a?—1)?+231(1415a)(2*—1)P N° 28 rae (1 + 15a) (1 + 28a) KIRSTINE SMITH 43 (2) We shall now look at ,o;, for special values of m and as a first attempt at z=0 g=1 finding a flat curve for ,o;, try to make ,o', = ,07,. For a linear function we find, since je AD 19y = 0Fy + S,, 2 o 3 (1+ a) 2 = ‘=> = tl esta et sr are ee ee rere 9). 19y (i+ 1+ 3a 2°) (5 ) z=0 w=] As a is positive it is obvious that we cannot make ,o7 = ,o, which indeed we knew beforehand. This follows because we have proved that ,,;, is of 2nth degree and never lower. For « = 0 we find BAU oe which holds for any symmetrical distribution of observations with constant standard deviation. a is the ratio between the number of observations at the ends of the range and the number uniformly distributed through the range, it may 3 (1+) ‘14 3a flattest possible curve when a = ©, that is when the distribution of observations consists of two groups at the ends of the range. Then the curve is, as already shown in Section IT, therefore vary from 0 to 0. As decreases when a increases we get the o2 N To get a check on the degree of the function and at the same time a flatter curve of o? than that obtained from a uniform distribution we may choose something between the two extreme cases and take for example +N observations at each end of the range and 3N uniformly distributed through the range. (1 + 2). 
9 19y — Then a = 1 and, according to (59), o,= (lt ge NN with the maximum cats Salas i d Cy TNE e Lis (3) For a function of the second degree we find, from (46), Ce eer eco let) OO) se meeNaareclisGie so w=1 G2 ‘I 2 and from (50), 207 = $73 (1 +4) { (1+ 3a" 1+ 6a. We want to make these equal and this requires 3 (1 + 5a) (1 + 3a) = 4 {1 + 6a + 2 (1 + 3a)} or 15a* — 8a —3= 0. This has only one positive root a = -7873500. 44 Choice tin the Distribution of Observations For this value 2 ae a)’ which is the ratio between the number of observations at one end of the range and the total number of observations, is -2202562. As .0° = ,07 + 8, we find, from (59) and (54), ete USEC) Seca a) Ge )F 20) = (1+ ae Tab Ge ) for a = :7873500 the curve is 2 .o = a {3-46837 — 6-2786222 + 6-278622%%, : ee 1 which has minima at = + V2" al The extreme values in the range of observations are therefore a. SEER G9 pan y= Sry: 18624 for 2 He gel iN (4) For a function of the third degree we have, from (46), in are 9 (1+ a)(1+ 5a) 30 1319) ton a — pen (lel an d Oy YIN Ae Ge : = 5, 2 (I a) {i oe | asl ae (61). Hence the condition that they are equal is 9 (1 + 5a) (1 + 10a) = 32 (2 + 15a) or 90a? — 69a —11=0, with one positive root a = -9021461. From (60) and (55) we find 1=F (1 ; 3(1+a) 1. and from (50), f 5 [2+ 3 (1+) (e—1)}? Ca ee 1+ 6a ff (1+ a) 2 [2 al 5 (1+ 3a) (2?— as : (1 + 3a) (I + 10a) which for a = -9021461 becomes 2 — 3 {3-67775 + 17-78799x? — 48-56651a24 + 30-778522}. Besides the minimum for z = 0 this curve has other minima for x? = -815820 and maxima for x? = -2361366. 30 The maxima and minima are as follows: +1 o H — lee ee | For x 1 6 a IN LES ILTER Co » cSt *48594. Oy = /N ry 2 3612, KuirstinE SIT 45 By choosing a = -9021461, that is by taking -237139 x N observations at each end of the range, we seem therefore to have overshot our aim since the result is that we have got inside the range a maximum for o, greater than the value obtained for 2=+1. 
(5) Our next attempt shall be to make x=1 x=0 30, = 2 30 ye? It requires 9 (1+ 5a) (1 + 10a) = 16 (2 + 15a) or 450a? — 105a — 23 = 0. The only positive root is a = -3710723 which gives the curve 2 30, = a {2-730117 + 12-89741 2? — 37-0761224 + 26-9088225. The maxima and minima are: vee : ae For z 0000 Oy JN° 1-652, Shi iy rasa gets » £©=+ 4828 oy NGA 016, = -89 Eas » G=+ -8279 oe YN 1-678, (oy = Me ORE » ©= +1:0000 Oy NE Dat This distribution of observations makes o, for x= +1 greater than the maximum at «= +°4828. By interpolation between these two cases we shall now try to find an a, lying between those of our two trials, for which o, for x = +1 equals the maximum value of o, which still may be expected at about eo -48. x=1 (6) In our first attempt we found o, = 1-918 and its difference from the vw -444, in the second attempt « “a —,- . 2-337 and its difference from UN maximum UN serene ) the maximum UN" 321, If the relation were linear this difference would be zero for “Oy= yy 2161. The a for which o. 3G, “ess this value is found by (61) which leads to 8 (1 + a) (2 + 15a) = 2-1612 (1 + 6a) (1 + 10a) or 160-20? — 61-28a — 11-330 = 0, with the positive root a = -519. For this value (62) becomes 2 32 = wy (279866 + 14-2364a2 — 40-0058x + 27-4521}. 46 Choice in the Distribution of Observations The maxima and minima are: For z= —_-0000 Oy = Tay TRB, » =+ 4843 oy Ty? 116, » ©=+ +8585 oy = Fy > 1-855, » ©=+1-0000 oy ry 2161, and this distribution which has -1708 x N observations at each end of the range may be considered satisfactory. (7) From (46) and (50) we find, for a function of the fourth degree, z=0 g2 295 (1+a)(1+ 14a) RS OE ee ie 2 a | 2 q 1 2 5 1 es eae Nee anc 49 wv? 6 + a) tetas + ete which are equal when 9(1 + 14a) (1 + ae = 64 (1+ 12a) or 1260a? — 552a — 55 = 0, that is when = -5217564. 
The formula for ,o?, found from (62) and (56), is o fy 3(1+a ) a 22+3( (lta)(@—-DP T(1+a)e#[2+5(1+ 8a)(e—)P 3 Ni Nala ese 4 1+ 6a a7 (1+ 3a) (1+ 10a) _ 9 (L+e4) [8+ 20 (2 + 9a) (2? 1) + 35 (1 + 6a) (2? — 1)? P 63) ' 64 (6a) aaa) | ook (63). For a = -5217564 it becomes 2 10 — wy £2°03367 — 19-727722? + 133:01711x2* — 235-96817x5 + 122-6786825}. The maxima and minima are as follows: 0 o = See eS) 2) For « a 7 Cy va 44, » &= +-3130 oy Ty 2-041, » b= +-6844 oy TN 2-575, oO » D= +:9361 Oy = py: 1-856. e=1 We have again as for the function of the third degree brought o, down below one of the maxima of ,o,, although since ,o, has a maximum at « = 0 the demand Ome ool that o, =, is not so exacting as for 50, which has a minimum at z = 0. KiIRSTINE SMITH 47 (8) We shall next she wee = 1-2671861 ee The condition obtained from (46) and (50) is 9 x 1-2671861 (1 + 10a) (1 + 14a) = 64 (1 + 12a) or a® — -3095773a — -032940969 = 0, with the only positive root a = 3933269. Introducing this value of a in (63) we get 2 40, = a {461918 — 18-02388a? + 122-71833x4 — 220-34099x5 + 116-880728}. The maxima and minima for this curve are: Atc= 0 C= TN 2-149, (oy » B= + “BLLG oy = ay - 1-958, » b=+ +6839 Oy = Fy 2407, per 10014 a, oy 193 » £=+1-0000 o, yy 29. We have thus for a = :3933269, that is by taking -141147 x N observations at xv=1 each end of the range, succeeded in bringing ,o, down to be approximately equal to the highest of the maxima of the curve, thus fulfilling our purpose. (9) After our experiences in the cases of the functions of the third and fourth degree we cannot expect for a function of the fifth degree by making xw=1 x=0 aes 2 5 Oy = 59y to find a curve which has not a greater maximum than that value. We shall therefore start with the attempt 5o The condition found from (46) and (50) is 25 (1+ 14a) (1 + 21a) = 64 (2 + 35a) or 7350a? — 1365a — 103 = 0, with the only positive root a = :2433100. * The ratio 1-:2671861 results from consideration of a special eon curve. 
It was determined as that curve obtained from three groups of observations for which the standard deviation of o7’s within the range of observations was a minimum. It is not mentioned elsewhere in this memoir as it does not seem to have the interest I at first assumed it to have. 48 Choice in the Distribution of Observations For ,o° we find, from (63) and (57), o {poe Zee 2, 5[(2+3( (1+a)(#—1)? 7(1+a)a?[245(1+ 3a)(a?—1)]* soy NY eer ag 1+ 6a va (1+ 3a) (1+ 10a) _ 9 (L+a) [8 + 20 (2 + 9a) (a* — 1) + 35 (1 + 6a) (a? -- 1)?/? GE (1 + 6a) (1 + 15a) 4 iS 11 (1 + a) a [8 + 28 (2 + 15a) (a* — aS 63 (1 + 10a) = (64) 64 (1 + 10a) (1 + 21a) Goh : Introducing a = :2433100 we get 2 32 = wy {#14228 28-47030a2 — 258-05238a4 + 853-0448x — 1095-92128 + 476:599027}, from which we find the maxima and minima: At C= 0 Cus 7iNic oa ¢= 4 -9958.o,= 2-273 ” y 4/N ’ , = + 5004 Oy = Fy: 2155, o= + --7853 oy ay? 762, yes AVY pe OUT 99 ts Oy /N ’ , = +1-0000 Oy = Fy 2878. x?=1 o, does not differ much from the greatest maximum and we may thus consider the distribution with -097848 « N observations at each end of the range for which a = *2433100 as satisfying fairly well our aim. (10) Considering our previous results we must assume that for a function of ; v= /x=0 ; the sixth degree ai, | o, ought to be made somewhat smaller than 2 which was the value that gave a satisfying result for a function of the fifth degree. x ol x= 0 Let us assume o, = 1-75 0;, or, substituting from (46) and (50), 256 (1 + 24a) = 1-75 x 25 (1+ 21a) (1 + 27a) from which 567a2 — 92-43430a — 4:851429 = 0 and a = +2048019 are found. KIRSTINE SMITH 49 For go, we get, from (64) and (58), _3(1 #4) 2 01243 (1 + a) Ga)? 5 =F {1 1+ 3a 4 1 + 6a 7 (1-+a) 22[2 +5 (1 + 3a) (a2 — )P 4 (essayen0e) 9 l+a nod GaGa iakeabay 11 l+a : 28 (2 + 15a) («® — 1) + 63 (1 + 10a) (a? — 1)22 Se amet ee OE aye 1 13 l+a ! 16 + 168 (1 + 10a) (a? — 1 OMMMInaeBay er Okt SAE 1) + 126 (3 + 40a) (a? — 1)? + 231 (1 + 15a) (a? 
— Ff , which for a = -2048019 becomes + (2 + 9a) (a? — 1) + 35 (1 + 6a) (a — 1)2]2 so = © (558984 — 33:142342? + 504:452324 — 2512-67328 + 5524-18628 + — 5452-650x1° + 1974-020z12}. The maxima and minima are: Atz= 0 oO, = JN 2-364, r= 2216 o, = oN 2-216, » ®=+ +4826 oy = oy 2515, »t=+ 6194 o, 7 2-427, » C=+ 8445 a, =gy 3149, »t=+ 9615 o, ai 2-485, ,», © = £1-0000 a, = ON 3128. It thus appears that this distribution which has -08499 x N observations at v=1 ; each end of the range fulfils our demand that o, shall be approximately equal to the greatest of the maxima. (11) We bring together our final results in the following table. It gives the distribution of observations, the maximum of o, within the range, the value of Vn-+ 1 or the lowest maximum of riven possible, which can only be obtained Oo by distributing the observations of the function of the nth degree into (n + 1) ; ., /N : groups, and the value of n+ 1 which is the maximum of Py wv — for a uniform distribution. Biometrika x1 4 50 Choice in the Distribution of Observations TABLE II. Ratio of number of | ; Degree of observations at each | Maximum of let function end of the range to | oy JN m+ n+ the total number o ms | -2500 1-581 1-414 2 2 | 2203 1-862 1-732 3 3 -1708 2-161 2-000 4 4 | ‘1411 2-467 2-236 5 5 ‘0978 2-878 2-449 6 6 “0850 3-149 2-646 U A comparison between our maximum and Vn + 1 shows the price we have to pay for information about the degree of the function. For lower degrees the maximum only differs quite insignificantly from Vn-+ 1, but with increasing degree the difference grows relatively greater for the sixth degree, being about one-fifth of Vn-+ 1. The curves of standard deviation for the three sets of distributions are given in Diagrams 3—8, while Diagram 9 represents the six curves just reached. 
It seems likely from the form of the o, curves that two clusters of observations placed at the outermost of the maxima besides the two clusters at the ends of the range would produce a o, curve with a lower maximum than the one we have succeeded in getting for the functions from the fourth to the sixth degree. But then again the position of these new clusters would depend on the degree of the function and thus make the proceedings more complicated; and what is more at the same time as the maximum of the curve approached Vn + 1 the distribution of observations would incur the disadvantages of the grouping in (n + 1) clusters. On the whole the distribution arrived at seems to be satisfactory and certainly marks a great progress from the uniform distribution. VII. (1) In Section I we have already given the formula for the standard deviation o, of an adjusted y when the standard deviation s, of an observation is o V f (a). It is Observations with varying standard deviation. | N Oy-—5 | x Ce ata xn Veo Fite. 
ibs LEY > eopeene Mn Ho ils Eidos Wise se: Mati |=0, BoE ey ies LO cE Mn+-2 BO ine Ube WOE conoo6 Mon KIRSTINE SMITH ‘uOTOUN, IvaUIT + *E WVAOVIG suoynadsasqgo fo abuny asUVBI JO Spua atT} 4B SUOTPVAIISGO JO staqysnpo stq AT;enby (stovarosqg Jo Aouanberg Jo uorynqiysiq l SUOT}VIAVG Prvpuvyg Jo sang asuvI JO spua ayy 4B SI9}SN]O YIM UONGAysIp wIoFIUQ ° fstioyvarasqg Jo Kouenbary Jo uoNnqmysiq l SUOIVIAAT pABpULYG Jo sang GOUCIM STDIN UNE = coe ee fsaorearasqg yo Aouanbarg Jo uorNgiysiq yet satis marec l SUOT}VIAA pIVpuvyg Jo VAIND ta) o Unit of Standard Deviation a 4—2 ‘g01deq puooseg jo uoTjOUN ‘PF NVYOVIC syutod 901} 4B fstioyearesqo jo Aouenbarg Jo UoyNqiysiq ‘AL SUOTYBAIASYO JO SAoysnpo Stq AT;wnb| L SUOTIVIAA prBpuByg Jo eaAINg "ND SIOISNIO YA UONGIAYsIp UAIOFIUL) *—*+-—*— SUOTWBIAEG prlepuryg Joeampg “qq f SUOTBAIESGO JO Xouenberg Jo UotNgiysiq “29 FORA ELD LOU areas l SUOTFVIAGG, plvpuvyg Joong "VV suouvasasgo fo abuny L-- o- e-- v- ge: 9-- ———— Choice in the Distribution of Observations 52 o VN Unit of Standard Deviation KIRSTINE SMITH ‘aaSeq piuyy, Jo uoyoung °G NyuDvIGg squtod moy 4 (suoyrarasqg Jo Aouenbary jo uoynqiysiq “At SUONBAIOSGO Jo SAoysnpo S1q A[TeubyY —— —s [ SUOYVIAIG prBpuRyg Jo eaAmMy "OD asuvt JO Spud oy] 4V fsuonvatesqg yo Aouonbery Fo wornqraysiq ‘dd ‘dd SIO4SN]O UIA UWOYNQIyStIp ULIOJIUQ *—+—-> l suOlVIAGd prepuryg joaamg “qd SUOT}LATOS o Aouanbarg Jo uoTyNqIys a) UOMNGLYSIP ULIOJIUG) ---------- Py 10% ee pg tL SUOT}VIAGG pIVpuRg JooaAmNg “VV suoynadasqo fo abuny O:l 65 8: 1B 9: Gg: v: €: Go: ie O Oe Ges BV eee Cs Oo 72 oO ue Ole —L — - 4 — —L 4 —— ep Eo eae ee ed So ap pe ere NS re oe nem cere ee RSs As a Oe ge en Pe ee gi A | ees ee rn re ar ce |e | ee mm SSE rae woe rem seen wees ee reen sea nerd eters eaeeae oe see rae oa ooo eeceeecceca (ea L L ‘= een veers oo a: = Sea a ee assess = AS Sit Soe i Ax. Pe a is bq - Sse = g y x 5 ae Z hi Ie 74. .-£y Myr? 
o JN Unit of Standard Deviation -g019eq WANog Jo uoyouNnT °9 NVASVIG sjutod aay 4% SUOT}BALESGO JO sLaysnypo S1q ATpeuby asuvi JO spuse 9} 4e SIa}SN[D YJIA UVOTNQLAYSIp WLAOFIU UOTINGIAYSIp WAOFIUL) we cee ce eee ee onan oe eee seceen =e eee Choice in the Distribution of Observations 4 Ten) ena jo fouenborg Jo uonqiysiq “At SUOMVIAA™ pPAVpuRyg Jo eAInD “OD tate jo Kouanbeag yo uoynqiysiq ‘dd ‘dé to SUONBIAE prepueyg Jooamp ‘qd rye suolvaresqgQ jo Aouonboeag Jo uotnqiystg “2P SUONBIANG plVpuURy, JO exaAIND “VV suoynawasqo fo abuny G: L O ibs fps bs aS tea. Gea a= ERS Se Ohl pen ae 4 i i 1 rey SN nt rT] ard Deviation Unit of Stand 1 re) KIRSTINE SMITH ‘aatsaq YY Jo uoyoung *y WvUsVIC suownalasgo fo abuny Ee 0 Vasey sjurod xIs 4% SUOVIAIG PlVpURy Fo saAIng suoladasqg jo Aouenbaty Jo UOINGLysi SUOTYVAIOSGO FO Sx9}snypo Siq ATTenbay suolyVAtasqg jo Aouenberg Jo uoMNgIysiqd asUVI JO Spud oq} 4B { SIo4SN[D YIM UOTINGIAYSIp uIoFIUQ *—+—- SUOTYVIANG pAVpURZG JO xAInD { suolearasqg Jo ouenborg Jo uotNqiaystq et rgmea te wee Gieh \ SUOTPVIAVG prvpurxyg Jo oamng "KA 00 ‘dd ‘de ‘da “pp "VV Choice in the Distribution of Observations ‘ga180q] UJXIG JO UONOUNT “8 NVvUoVIG L: suouna.iasgo fo abuny 0 -- @- 8= -7- sjutod Wades 4e SUOTFVATESGO JO Staysnjo Stq Aypendy aBUvI JO Spud oq} 4B SIO}SN[O TILA UOTJNGIYSTp WAOsIU—Q * UONLYSTP WHOJIUE = ------+--- {gee jo Aouanbary Jo uoyNgiysiqd SUOTVIAI, PAVpUR}G JO aaIng (suoyearesqgQ yo Louenbarg Jo WONNGIASIC ( SUOTIVIAG, prepurRy, JO eamng { suolyeatesqg jo Aouenbarg Jo uoyNqiystg t SUOT}VIAVG PAVPU}Y JO dAIND i. o JN Unit of Standard Deviation KIRSTINE SMITH ‘aSULI JO SpUA oT} YR SIOISN]D TPIM UOYNGLYSICE UMOFUA “SUOIAG PIVPUR}G JO SOATNQ — °G KVAEVIT “ Uaxts - Oded “Uy i “aa “YAN e O Aaa “pu , aed. 
“€ puodag oe “Bqig aorseg ysl Jo uoyoung ‘g'q suonnasasqo fo abuny O-L 6: 8: L 9: Ss v € G L O L-- or (ES v- G: —————EE_E r Q- 2- B- B- OL- $f l4 o JN Unit of Standard Deviation ~ 58 Choice in the Distribution of Observations ae dx, (x) dx being the number of observations between x and x + dx and the integration being extended over the range of observations. it where m, = wif? It is clear that if we have found a suitable curve of squared standard deviation for adjusted y by taking a distribution ¢ (a) of observations with constant standard deviations a corresponding curve can be derived for observations with varying standard deviations by using the distribution ff (2) = h(a). f(Z)- anes eee tene ee ee (65). As fkd (x) .f (x) dz = N the constant k must be jetta ae [6 (@) f@) de’ { we pi(a) da. Nu» Hence we find y= (CGE » On aes: where p, 1s the pth moment coefficient for the distribution ¢ (x), and as My N aa by FG (e).f (a) da for any p the determinant may be written opie Pian eM BE! 2 acai an 1 ee a ie Lamesa [Di x fea. es [ag soo sh ends |= OG ose eon eeeneee (66) Dee Doe alls [i geeenroee Pye x” Bn Pnti Panta: Man We thus find the same determinant as the distribution ¢ (x) would give for observations with constant error of observation except that the factor / has come in, that is to say the expression for oj has been multiplied by ; = ac Gide ae (67). The goodness of the distribution therefore will partly depend on the value of Ey ; and because we have found ¢ () the best distribution for observations with constant standard deviation it does not follow that wb (x) = kd (x) .f (2) is the best distribution for observations with the standard deviation oV f (2). But the deriving of (x) from ¢ (x) is nevertheless useful as a means of simplifying the investigations and will be applied in the following special inquiries. We shall consider two forms of f(x) and try to find the best distributions for functions of the first and of the second degree. 
on KIRstTINnE SMITH (a) f(x) =(1+ az?)?, where a > —1, for errors of observation increasing or decreasing in both directions from the middle of the range. (b) f(x) =(1+az)?, where 1 >a 20, for error of observations increasing in one direction. These two forms will roughly cover two distinct and important types of cases, such as occur in practice. (2) When f(x) = (1 + ax)? we find, according to (67), 1 k = 1+ 2apy + a*p4, and as (66) for n = 1 gives o, ae: k= 5 te 2px + @}, we have for a function of the first degree 5 O28 1+ ap, + a7, =F Et (ey [ei This curve has a minimum for 2 = p, and the maximum in the range is, if fy > 0, at c= — 1, and if p, <0, at x= 1; 1t equals in both cases 2 ( (he [H4])?) 2a, f Qa 4 aT Sete Oar ya an on ear C2 [/4] being the numerical value of p1,. Now (69) is a minimum for pp, = 0; we therefore ought to choose that value for w, and we then get, from (68), 9 o ae y= ay (L + 2ape + apa) f+ et aieieieieiein(s\eze\sielelajeievess eis oles) (70), 2 2 fd, and 4 may vary between 0 and 1 independently of each other and are only bound by the conditions that Lg = po and bo — ae For any set of values which satisfies these conditions we may determine a distribution consisting of YN observations at «= + v and (l—y)N at «a= 0, since from any two such values we could determine Pees and aa By introducing v? and y for ys and py we get two quite independent variables and (70) then takes the form (Ue Daya? + oP yo) ( 1 5) o N yu eee, Oy = 60 Choice in the Distribution of Observations x=1 We now have to determine y and v? so that the maximum value a; is as small as possible. We find doy]. 28 fo. a naan! Fal N (2ae + atvt — =) SoqnHOdsdsOb000000090009 (71) do? o2 1 i Bl N (2ay + 2a®yv? + a? — al PR cco (72). do’, F ih Fraley 22 SS 3). Clearly Ee 0 leads to y a Ha) (73) Introducing this value into (72) we obtain 2 dv? jz-1 N- Va (2+ av?)/’ which is > 0. Hence the minimum for constant v? determined by decreases with v?. But when v? 
decreases, y?, as given by (73), increases and the lowest value of v?, for which it is real, is that determined by 1 ins $e TL Y~ avi (2-4 av?) For v? smaller than this (73) gives y?> 1, and as long as y2=1 we therefore do? y = (9) Es | x=1 r=1 Hence the minimum of o;, is to be found for y? = 1. have For this value (72) may be written as do™ o 1 f ae lat an NL (1. av) (2a0* av? — 1) rasa eee (74), ate which is zero for v= —d4+ ae sibountle nace s tee Sete eee (75) and > 0 for v? greater than this value. When the v? found les between 0 and 1, that is when a > 4, we have thus found 9 the minimum sought. When a = 2, then Fe as given by (74) is <0 and the xr=1 x=1 minimum of o;, is found by giving v? its maximum value, that is 1. Returning to the variates jz, and py we see that in all cases ae ee ik ee Kirstine Siti 61 from which it follows that no distribution of observations other than those arrived at consisting of two equally big groups can give py, fy and py the values required. We accordingly reach the result that: when observing a function of the first degree for which the standard deviation of the observations is o (1 + ax*), symmetrical about the middle of the range, we get the best function for o% by taking two equally big ig Hee, of observations, at the ends of the range if a =4 and atv=+4 pe Ye 1 ee a a>. (3) According to (70) the maximum of oj, for this distribution is xz=1 aE on rs 1 of = Fy (1+ av? (1+ 5), v being equal to | for a = 4 and v being determined by (75) fora > 4. We shall next consider the distributions (i) for which ¢ (x) is constant from — 1 to 1 and (ii) for which ¢ (z) consists of | observations uniformly distributed from —1toland ii into two clusters. 2 (i) For a uniform distribution from —1 to 1 we have p,=4, y= and, according to (67), it ke => il + 20 -- ta? ae N the actual distribution is hence, as ¢ (x) = oe (1 + az?)? (x) aes eet and the maximum o%, as given by (70) for = + 1, x= + 1 g2 on =Wl+3 2a + ta”). 4. 
(1) When ¢ (2) = - with the additional clusters i at + u we have Mo=§ +3 and py= ot du'. According to (70) the maximum o;, is then x=tl Gg 6 a =Glitah tu) tay +s) (1+ 5074) We shall now determine wu so as to make this a minimum. We find that “L= do? 9 y_o 1 ae s requires 45a?u® + 15a (3 + 5a) ut + 5a (6 + Ta) u? — (90 — 5a + 9a?) = 0 ...(76), the root w? of which is > 1 for a <-5576. For a = 5576 we hence get the satin: by taking the clusters at wu = + 1 and for a > -5576 at the places + u determined ay (76). 62 Choice in the Distribution of Observations Table III contains for a series of values of a the values of v, (1+ av?) and wu of the two distributions above and the maximum o, for the three distributions. TABLE III. . /N | Maximum of aaa of pane of wy Soa | a JN pom a N: rk ree from See iien a v 1 + av? eee nice distribution u for which ¢ QF bution ad ytuce and clusters of $(x)=5 Nese 2 4 0 | 1:0000 | 1-000 1-414 2-000 1-0000 1-581 4 | 1:0000 | 1-167 1-650 2-113 1:0000 1-760 x | 1:0000 | 1-333 | 1-886 2-231 1-0000 1-944 3 8836 | 1-390 | 2-100 2-352 1-0000 2-131 . 3 “8071 1-434 | 2-284 | 2-477 9289 2-316 3 ‘7510 | 1-470 | 2-448 2-603 8502 2-483 1 ‘7071 1-500 2-598 | 2-733 STUN 2-637 2 5559 | 1-618 | 3-330 3-540 -5762 3-438 3 4782 | 1-686 | 3-908 4-382 -4925 4:173 b= + -4278 | 1:732 4-404 5-241 4612 4-899 The difference between the maxima from the two first distributions taken as a proportion of the maximum of the first decreases from 41 per cent. at a = 0 to the minimum 5 per cent. at a = 1, and then again increases to 19 per cent. ata =4. For small a, that is in practice a = 0, and again for a > 3, for which the difference is greater than 12 per cent., the third distribution may therefore be useful as giving a much smaller maximum value than the purely continuous distribution and at the same time offering some justification for the form of the function. 
(4) We shall next, still assuming that f(x) = (1 + ax?)*, consider the choice of observations for a function of the second degree. According to (66) and (67) we find pearl Cy= WT x (Maple — 3 +2 (abs — a Ha) © + (oa = Byes + py oes) 2? + 2 (My bea = Beg) ©? + (Meg = oa) ©) Mebha — pe: + 24 Me bs — Mi ba — BS See (77), and ; = 1+ ap, + aru,, where the p’s are the moment coefficients about «= 0 of the distribution ¢ (2) which is connected with the actual distribution y (a) by the relation ip (a) = ko (a) . f (2). From any distribution ¢ (x) which has 1, and pz; 2 0 we can form a symmetrical 1 {4 (x) + 6 (— x)} which has the same py and py as (x). We shall prove that the maximum o? obtained from the symmetrical distribution is always lower than that obtained from the skew. KURSTINE SMITII 63 Let the factor in curled brackets in (77) be F, for a skew distribution ¢ (z) and F, for the corresponding symmetrical distribution. We then have Tyee Popa + (Ma — 3p5) te aH fiyDt ba (Ha — 2) The condition for a maximum or minimum other than that at «= 0 is 3p — pla > 9, or Boao: and as the denominator is positive we have in that case the maximum at # = 0. It is thus clear that the maxima of F', between — 1 and | must be either at 2 = 0 oratea=—+1. We shall show that [Fsle=o > [Pole=o; and that either Aa ora Maley Py leas According to what has been proved in Section I (4) the coefficient of x* in (77) is positive, the denominator of (77) is therefore positive and we have 9 ne 2 a LP; —Fole=0 = 5 (Hats Hap) : aS - (lg — M2) (Haft — pea + 2pey blades — Mia — 5) We shall next compare F, and F, forz= +1. Putting [Fole-1= _ N—68 we have Fay = [ alan D €E? where 8 = ps — 2py pg + pt & 2 {pry (1 — py) — oy (ue — pa)} and € = fs — 2py fofs + Mipa- For - we find O _ (Hs ~ Ma)” + 2 {ys (1 — pa) = pr (Ho — Ha)} € (Hs — Papa)? + pei (fea — fo) , Looking first at the case “ = 0, we have M3 (Hs — My)? 
< (Mg — py fs)”, and if we choose the value for which the other term of the numerator is < 0, ) ar ls € When = < 0 we see, from considering the form 3 5 = oe pi (1 = pg) — 2pty ps (1 — fog) + 2 {Hs (1 = bea) — Hy (Me — Pads € (es — Habe)” + pei (Ha — p13) that for either x = 1 or z= — 1 oS € 64 Choice in the Distribution of Observations : Nig As « > 0 we have hence for any w, and ws, remembering that = being a squared : D standard deviation multiplied by the number of observations is = 1, Ne ee D—« D ‘ that is, for either x = 1 or — 1, des ID We have thus proved that the maxima of F, are below those of F,. (5) Our problem is hence reduced to finding the best curve among those repre- sented by , _ (1 + apy + apg Oe = ao "IN pg (fa — 12) As was stated in (2) of this section we get all sets of possible values for py - and jy from three groups of observations symmetrical about x = 0, and we may therefore limit our search of the best distribution to these. ) sisntta + (ey = Spt)? ee (78). Let the observations be TN at c=+2, at (l—y)N at «=0. The inter- polation formula of Lagrange gives, when 7, represents the mean of the observations at r= p, a Us aC) x (% +) _ Yy mo = vy Yo T Dy Yo ar We Yo> from which we find o2 1 ((x? = vu)? : a? (x? stk v?) (1 | at (79) Cee Toe | ea yon te aaa It is obvious that if for a certain distribution we have z=0 2=1 Cy = Gy we can get a better distribution by taking more observations at 0. If on the other hand x=0 a= et REI fa x=1 the curve cannot be the best unless o} is a minimum for the present values of v and y. From (79) we find f do. ope il \ —wvy? (1+7)(1+ oom (80) (@] oe awe ee 1 do’, ao 1 As) (se v2—avt)(1+ a ee Eile ~ Nv eae y from which we obtain the conditions for maximum or minimum »_ 1+2Va y= a y 2Va(1 +Va)? and f= N=" ey ee OLV on Tl KIRSTINE SMITH 65 The lower sign requires 3— 2V2=2a=Z} and the upper sign a>3+4+2/2 to make 02v?21. 
The case a < } has no interest, as we have seen that when a < }$ extrapolation is not even for a ices function advantageous. We have therefore — =) seen that for a<3+4+2V2* o% has no minimum and we have thus proved that ; : : ; x= 0 x=1 the best distribution requires o%, = ;,, that is Qe®—1 (1+ 0%) (1+ av)? = y ‘ i poles O>) (eas )e or en 1 a | aieroiefersy seis ersvelsierelsiaiaieiaisisieis (81) The maximum of the curve is - z—0 o2 i Oy N < ] er y To find the minimum of this value we differentiate (81) and get do* 1+ av? F ae ie Cee alte) which is zero for and positive for greater v?, so that we have found a minimum. 9 For a = 3 we find from (82) v? = 1, hence for a = 3 we have to choose v? = 1, from which, according to (81), follows =14+2(1+a)® or = Tea 7 When 34+2V2>a>3, eas (1 ab / 38 ae =) is < 1, and for the corresponding Qa ee y we have ts aay (lL+av)? 1+ —5a+4+Va(38a+ 48) (83) ty, a.) (Qu? — 1) 8 (a + 2) sewer eta eoee . Returning to the ¢ (xz) distribution, which is found from this distribution by dividing the frequencies by k:. (1 + ax?)?, we therefore find, when - 5 Ni is the number of observations at « = + v and (1 — «) N, that, at x = 0, € 2 1+? T-e 2(%—1) w=] . . roe 2 . . . . * A further examination shows that for a>3+2V2 OF has a minimum but this is smaller than “7=0 w=1 x=0 © Fj 9 fd a when a<6-7. Up to this value we therefore have a, =o, for the best curves. For a>6-7 the v=1 . - 9 . . . . minimum of Gy determines the best distribution. ou Biometrika xm 66 Choice in the Distribution of Observations or cas ery Hence Pf, = 4 (1 + 2) NERD cerme nnn OR Rae Gahdnsooscas 84 and a= ag (Lt) (84) For a = 3 we have found v? = 1 which according to (84) involves pz = fy, SO that only the distribution above consisting of three groups can realise the requisite conditions. When a > 3 we have v <1 and therefore p, < py, so that it must be possible to satisfy the equation (84) by a continuous distribution of observations. 
However v® is decreasing so slowly for increasing a that practically the distribution deter- mined by (84) cannot differ much from three groups of observations. Our results are accordingly that for a function of the second degree, of which the standard deviation of the observations is o (1+ ax®), we get the best function for a, when a = 3 by taking three groups of observations at the middle and the ends of the range, each group proportional to the squared standard deviation at the place, and when 34+2V2>a> 3 by taking three groups of observations determined by (82) and (83). (6) From (78) we find z= 0 o~ L4 a, = 57 (14+ 2ap, + a%u,) — ! N Be Ma [a — which, when py, and p, are found in accordance with (82) and (84), determines the maximum a, arrived at from our special three groups of observations. Besides the’ numerical evaluation of this standard deviation, we give in Table IV below the maximum of o, obtained from a distribution for which ¢ (x) is constant from — 1 to 1, that is, since, according to (67), the distribution pb (x) = That maximum is determined by 2 Oo a 2 ee alle Pip Q Oo, N (1 | 30 1 5) 5 a) Gn : : 5 : : aa 3 ; : N° 9 being the maximum o;, obtained from a rectangular distribution of observations with the standard deviation o. The last column of the same table gives the maximum oa, arrived at when ¢ (2) z=0 @=1 is the rectangular distribution with clusters at — 1 and 1 for which o, = oj. For this distribution consisting of -22026 N observations at +1 and at —1 and -5595 N = 2cN uniformly distributed from — 1 to 1, we have found as given in KIRSTINE SMITH 6 “Tr Table II (p. 50) the maximum UN . 1-862. Hence when pz, and py are the moment coefficient of this ¢ (x) the maximum is found from 2 9 (oy oy = N ( We find pr. = -6270, py = °5524 and The actual distribution is hence -27975 (1 + az)? 2) aes oe mm we Ee together with the clusters 1 + Zap, + a?u,) . 1-862. 1 is 1 + 1-2540a + -5524a?. N, _+22026 (1 + a)? 1+ 1-2540a-+ -5524a2**’ at — 1 and 1. 
TABLE IV. “3 : F N neue of | Maximum of a Maximum of ae for a roe eon for aes distribution with the best P ey (x) =c and clusters distribution RAE (2) 2 at +1 0 1-732 3-000 1-862 1 3-000 4-099 3-120 2 4-359 5:310 4-453 3 5-745 6-573 5°810 + 7-135 7-861 7-178 | 5 8-522 9-165 8-551 The difference between the first and second maxima taken as a proportion of the first varies from 79 per cent. at a = 0 to 8 per cent. at a = 5, while the difference between the first and the third maxima varies from 8 per cent. at a = 0 to 0-4 per cent. at a=5. The continuous distribution with clusters is therefore especially useful for smaller a. For a = 4 we find from (82) v = -9816 and for a = 5, v = -9700, both of these values of v are so close to 1 that if instead of using them we take the observations at 1 and — 1 and let the numbers of the three groups of observations be proportional to the squared standard deviations we get the maxima 7-141 and 8-544 which only differ quite insignificantly from the corresponding values of Table IV. (7) For a function. of the first degree, of which the standard deviation of the observations is o (1 + ax), where 0 2a <1, we have, according to (66) and (67), » oF 1+ Zap, + aus 5 2S lat DE OMEN Pian Meee coon aces : Oy = one {fg — 2p av + 27} (85) For p, = — c? the maximum of this function is at x = 1, and for p, = c? at — 1, As the maximum of (12 — 24,” + «?) has the same value in both cases it is clear o—2 68 Choice in the Distribution of Observations that the negative , gives the lower maximum for o;. We therefore only have to find the conditions for [o;],, beg a minimum when p, < 0. We have (peg nee leah Laan bs — fy and differentiating with regard to ps, Fal 0? of (us = wi)? = (1 = pa) (LE pea)? dps Ja=1 N (He — fi)? 
Asa<1, we have (1 — p,) (1 + ap,) > 0 and @ (pe -- fy) ~ (1 — py) (1 + opty) = (opty — 1) + py (1 — 2) < 0, from which it follows that Fa = <0 Apr cI The greatest value p, can take for our range — | to + 1 is 1, the minimum of wal Se) ee (86), for any pt, = 0. “ot must therefore be found for uw. = 1, for which value (86) passes into "Ge Lowi 1= 2 {204 foe o2 which, since px, = 0, is a minimum and equals 2 — .2(1+ a?) when p, = 0. N The ¢ (x) distribution ought accordingly to consist of two equally big groups at the ends of the range and the actual distribution to be chosen for a function of the Jirst degree, the standard deviation of which is a linear function of the variable, should be two groups at the ends of the working range with numbers proportional to the squared standard deviations at these places. (8) For a continuous distribution from — 1 to 1 with frequencies proportional to the squared standard deviations we have Hy =0 and py = 5, v=1 2 2 and the maximum = = (1 ats 3) 4, n\2 the actual distribution is (x) = (isan) 2 ae en = ‘ Table V contains besides the maxima of o, from these two distributions those obtained from a distribution for which ¢ (a) is constant with two additional clusters at — 1 and | each consisting of : of the observations. The actual distribution is, since Mg =3+$=3 _,_ (L+azx)? N POS are a7 ’ KURSTINE SMITH 69 d (l—a)? N ; . — observations at — | with Taeeat a 2 . and pals ; a at + 1 in addition. 1+ 202° 4 The maximum of o%, is 2 oO = (1+ 2a*)&. wy (i+ se) 2 TABLE V. 
TABLE V.

a | Maximum of σ_x√N/σ for the best distribution | for a distribution with φ(x) = N/2 | for a distribution with φ(x) = N/4 and clusters at ±1
·0 | 1·414 | 2·000 | 1·581
·1 | 1·421 | 2·003 | 1·587
·2 | 1·442 | 2·013 | 1·602
·3 | 1·477 | 2·030 | 1·628
·4 | 1·523 | 2·053 | 1·663
·5 | 1·581 | 2·082 | 1·708
·6 | 1·649 | 2·117 | 1·761
·7 | 1·726 | 2·157 | 1·821
·8 | 1·811 | 2·203 | 1·889
·9 | 1·903 | 2·254 | 1·962

(9) For a function of the second degree we found in (5) that when the standard deviation of the observations was s_x = σ(1 + ax²) and a ≥ 0 it was advantageous to use the whole working range of observations; much more must this be the case when s_x = σ(1 + ax) and 0 ≤ a < 1. We shall therefore try to find the three best groups of observations taken at −1, v and 1, supposing v unknown. We do not venture to assert that another form of distribution might not lead to a curve of standard deviation with a lower maximum, but the solution of the general problem would involve a more elaborate investigation into the possible variations of μ₁, μ₂, μ₃ and μ₄ for distributions with limited range than seems desirable in this connection. We shall further limit our problem by assuming that the best distribution will be found among those which make [σ_x²]_{x=1} = [σ_x²]_{x=−1}, and both also equal to a maximum situated between x = −1 and x = 1. This would obviously be right if the maximum were found at x = v; this in fact is not the case, but still the maximum value is likely to be chiefly determined by the number of observations at x = v, and there is therefore every reason to believe that our assumption is justifiable.

Let there be Nδ observations at −1, Nγ at 1 and N(1 − δ − γ) at v. The interpolation formula of Lagrange then gives

$$y_x = \frac{(x - v)(x - 1)}{2(1 + v)}\,y_{-1} + \frac{(x + 1)(x - 1)}{(v + 1)(v - 1)}\,y_v + \frac{(x + 1)(x - v)}{2(1 - v)}\,y_1,$$

from which we find

$$\sigma_x^2 = \frac{\sigma^2}{N}\Bigl\{\frac{(x - v)^2(x - 1)^2(1 - a)^2}{4(1 + v)^2\delta} + \frac{(x^2 - 1)^2(1 + av)^2}{(1 - v^2)^2(1 - \delta - \gamma)} + \frac{(x + 1)^2(x - v)^2(1 + a)^2}{4(1 - v)^2\gamma}\Bigr\}. \qquad (87)$$

The condition for [σ_x²]_{x=1} = [σ_x²]_{x=−1} is

$$\frac{(1 + a)^2}{\gamma} = \frac{(1 - a)^2}{\delta}.$$

Eliminating δ we obtain for σ_x² − σ_1², where σ_1² = [σ_x²]_{x=1} = σ²(1 + a)²/(Nγ), the value

$$\sigma_x^2 - \sigma_1^2 = \frac{\sigma^2(1 + a)^2(x^2 - 1)}{2N\gamma(1 - v^2)^2[(1 + a)^2 - 2\gamma(1 + a^2)]}\Bigl\{\bigl([(1 + a)^2 - 2\gamma(1 + a^2)](1 + v^2) + 2\gamma(1 + av)^2\bigr)x^2 + 2v(1 - v^2)[(1 + a)^2 - 2\gamma(1 + a^2)]\,x + (1 + a)^2(2 - 5v^2 + v^4) - 2\gamma[(1 + a^2)(2 - 5v^2 + v^4) + (1 + av)^2]\Bigr\}.$$

Our assumption that the maximum σ_x² shall be equal to σ_1² requires that the expression in curled brackets shall be a perfect square, for which the condition is

$$2\Bigl(\frac{\gamma}{(1+a)^2}\Bigr)^2\{a^2(1+a^2)v^6 + 2a(1+a^2)v^5 + (3 - a^2 - 3a^4)v^4 - 4a(3 + 2a^2)v^3 + (-2 + 9a^2 + 5a^4)v^2 + 2a(3 + a^2)v - a^2(3 + 2a^2)\} + \frac{\gamma}{(1+a)^2}\{-a^2v^6 - 2av^5 + (-5 + 2a^2)v^4 + 12av^3 - (2 + 9a^2)v^2 - 2av + 3 + 4a^2\} + v^4 + 2v^2 - 1 = 0. \qquad (88)$$

Now σ_1² = σ²(1 + a)²/(Nγ) is the maximum which we want to make as low as possible; hence we have for a certain a to find the v for which γ/(1 + a)², as given by (88), is a maximum. We shall examine the cases a = ·5 and a = ·9.

(10) For a = ·5, (88) takes the form

$$\Bigl(\frac{\gamma}{(1+a)^2}\Bigr)^2\{{\cdot}625v^6 + 2{\cdot}5v^5 + 5{\cdot}125v^4 - 14v^3 + 1{\cdot}125v^2 + 6{\cdot}5v - 1{\cdot}75\} + \frac{\gamma}{(1+a)^2}\{-{\cdot}25v^6 - v^5 - 4{\cdot}5v^4 + 6v^3 - 4{\cdot}25v^2 - v + 4\} + v^4 + 2v^2 - 1 = 0,$$

which, differentiated with regard to v, gives

$$\Bigl(\frac{\gamma}{(1+a)^2}\Bigr)^2\{3{\cdot}75v^5 + 12{\cdot}5v^4 + 20{\cdot}5v^3 - 42v^2 + 2{\cdot}25v + 6{\cdot}5\} + \frac{\gamma}{(1+a)^2}\{-1{\cdot}5v^5 - 5v^4 - 18v^3 + 18v^2 - 8{\cdot}5v - 1\} + 4v(v^2 + 1) = 0.$$

We find that these two equations have for v = −·190 the root γ/(1 + a)² = ·2936 in common, which represents a maximum. The maximum of the curve σ_x² is hence 3·405 σ²/N, which value occurs for x = ±1 and for x = ·064, determined by (87). The distribution of observations is ·6607N at 1, ·0734N at −1, and ·2659N at −·190. For comparison we shall consider what would result from taking for the φ(x) distribution three equally big groups of observations at −1, 0 and 1.
This would, for observations with the constant error σ, make the maximum of the curve equal to 3σ²/N, and that multiplied by k = 1 + 2aμ₁ + a²μ₂ = 1 + ⅔a² gives 3·5 σ²/N. The actual distribution ψ(x) would be ·6429N at 1, ·0714N at −1, and ·2857N at 0. This last distribution makes the maximum σ_x² only about 3 per cent. greater than the value which we obtained by our special distribution, and it will therefore for most practical cases be as useful.

(11) When a = ·9 we find for (88)

$$\Bigl(\frac{\gamma}{(1+a)^2}\Bigr)^2\{2{\cdot}9322v^6 + 6{\cdot}516v^5 + {\cdot}4434v^4 - 33{\cdot}264v^3 + 17{\cdot}141v^2 + 13{\cdot}716v - 7{\cdot}4844\} + \frac{\gamma}{(1+a)^2}\{-{\cdot}81v^6 - 1{\cdot}8v^5 - 3{\cdot}38v^4 + 10{\cdot}8v^3 - 9{\cdot}29v^2 - 1{\cdot}8v + 6{\cdot}24\} + v^4 + 2v^2 - 1 = 0,$$

which, differentiated with regard to v, gives

$$\Bigl(\frac{\gamma}{(1+a)^2}\Bigr)^2\{17{\cdot}5932v^5 + 32{\cdot}58v^4 + 1{\cdot}7736v^3 - 99{\cdot}792v^2 + 34{\cdot}282v + 13{\cdot}716\} + \frac{\gamma}{(1+a)^2}\{-4{\cdot}86v^5 - 9v^4 - 13{\cdot}52v^3 + 32{\cdot}4v^2 - 18{\cdot}58v - 1{\cdot}8\} + 4v(v^2 + 1) = 0.$$

For v = −·354 these two equations have the root γ/(1 + a)² = ·23214 in common, which is therefore the maximum of γ/(1 + a)². The maximum of the corresponding σ_x² is hence

$$\frac{\sigma^2}{N}\,\frac{(1 + a)^2}{\gamma} = 4{\cdot}308\,\frac{\sigma^2}{N}.$$

From (87) we find that it occurs at x = ·125 as well as at x = ±1. The distribution of observations is then ·8380N at 1, ·0023N at −1, and ·1597N at −·354.

Comparing again with a distribution consisting of three groups of observations at −1, 0 and 1 with frequencies proportional to the squared standard deviations at these places, we find that the distribution would be ·7814N at 1, ·0022N at −1, and ·2164N at 0, and the maximum of σ_x² would be 4·62 σ²/N. We thus find that by our special distribution the maximum of σ_x² was 7 per cent. lower; the choice of that distribution would thus permit us to reduce the total number of observations at the same rate without raising the maximum of σ_x².
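For fixed v, condition (88) is a quadratic in g = γ/(1 + a)², so instead of solving (88) together with its v-derivative, as in (10) and (11), one may solve the quadratic on a grid of v and keep the v giving the largest admissible g. The Python sketch below does this under two assumptions of our own: admissible roots satisfy 0 < g ≤ 1/(2(1 + a²)), which keeps the middle group N(1 − δ − γ) non-negative, and the sign condition on the perfect square is not checked. It has been compared with the text only for a = ·5; the helper names are hypothetical.

```python
import math

def poly_88(a):
    """Coefficients (highest power of v first) of the three polynomials in (88)."""
    a2, a4 = a * a, a ** 4
    P = [2 * a2 * (1 + a2), 4 * a * (1 + a2), 2 * (3 - a2 - 3 * a4),
         -8 * a * (3 + 2 * a2), 2 * (-2 + 9 * a2 + 5 * a4),
         4 * a * (3 + a2), -2 * a2 * (3 + 2 * a2)]
    Q = [-a2, -2 * a, -5 + 2 * a2, 12 * a, -(2 + 9 * a2), -2 * a, 3 + 4 * a2]
    R = [1.0, 0.0, 2.0, 0.0, -1.0]  # v^4 + 2v^2 - 1
    return P, Q, R

def horner(coeffs, v):
    """Evaluate a polynomial given highest-power-first coefficients."""
    value = 0.0
    for c in coeffs:
        value = value * v + c
    return value

def best_three_groups(a, step=1e-3):
    """Scan v in (-1, 1); return (v, g) with the largest admissible root g of (88)."""
    P, Q, R = poly_88(a)
    g_cap = 1 / (2 * (1 + a * a))  # assumption: middle group must be non-negative
    best_v, best_g = None, 0.0
    for i in range(1, int(round(2 / step))):
        v = -1 + i * step
        p, q, r = horner(P, v), horner(Q, v), horner(R, v)
        disc = q * q - 4 * p * r
        if abs(p) < 1e-12 or disc < 0:
            continue
        for g in ((-q - math.sqrt(disc)) / (2 * p), (-q + math.sqrt(disc)) / (2 * p)):
            if 0 < g <= g_cap and g > best_g:
                best_v, best_g = v, g
    return best_v, best_g

a = 0.5
v, g = best_three_groups(a)
gamma, delta = g * (1 + a) ** 2, g * (1 - a) ** 2
print("v =", round(v, 3), " max sigma_x^2 * N / sigma^2 =", round(1 / g, 3))
print("groups:", round(gamma, 4), "at 1,", round(delta, 4), "at -1,", round(1 - gamma - delta, 4), "at v")
print("three groups at -1, 0, 1:", 3 * (1 + 2 * a * a / 3))
```

The last line evaluates the comparison design of (10), three equal φ-groups at −1, 0 and 1, whose maximum is 3k σ²/N with k = 1 + ⅔a².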
(12) The result of these investigations is that the maximum σ_x obtained from the best three groups of observations differs so little from that obtained from three groups at −1, 0 and 1 that the first grouping would be preferred only in quite exceptional practice. We shall therefore in Table VI give the maximum σ_x arrived at from the following three distributions: (1) three groups of observations at −1, 0 and 1 in numbers proportional to the squared standard deviations at these places, (2) a distribution for which φ(x) = N/2, and (3) a distribution for which φ(x) = ·2797N with additional clusters of ·2203N at ±1 (see Table II, p. 50). Both in Table V and in Table VI the difference between the first two maxima, as a proportion of the first, decreases with increasing a, so that the distribution with uniform φ(x) is more profitable for a > 0 than for observations with constant errors.

TABLE VI.

a | Maximum of σ_x√N/σ from three groups at 0 and ±1 | from a distribution for which φ(x) = N/2 | from a distribution for which φ(x) = ·2797N and clusters at ±1 | from the best three groups
·0 | 1·732 | 3·000 | 1·862 | –
·1 | 1·738 | 3·005 | 1·868 | –
·2 | 1·755 | 3·020 | 1·886 | –
·3 | 1·783 | 3·045 | 1·914 | –
·4 | 1·822 | 3·079 | 1·954 | –
·5 | 1·871 | 3·122 | 2·003 | 1·845
·6 | 1·929 | 3·175 | 2·062 | –
·7 | 1·995 | 3·236 | 2·129 | –
·8 | 2·069 | 3·304 | 2·205 | –
·9 | 2·149 | 3·381 | 2·287 | 2·076

VIII. Best distribution of observations for determining a single constant of the function.

(1) Our choice of observations has hitherto aimed at giving, within the working range of observations, a determination of the function as accurate and uniform as possible. We shall now consider what is the best choice of observations for determining a single constant of the function.
The investigations will be carried out for functions of the first and of the second degree for which the standard deviations of the observations are s_x = σ(1 + ax²), a > −1, and s_x = σ(1 + ax), 0 ≤ a < 1.

We have in (3) of Section I given the formula (8) for σ²_{a_p}, and shall here give only the form into which it is transformed by putting

$$k\,\psi(x) = \phi(x)\,f(x), \qquad k = \frac{1}{N}\int \phi(x)\,f(x)\,dx.$$

The formula analogous to that given for σ_x² in (66) is

$$\begin{vmatrix} \dfrac{N\sigma_{a_p}^2}{k\sigma^2} & 0 & \cdots & 1 & \cdots & 0 \\ 0 & 1 & \cdots & \mu_p & \cdots & \mu_n \\ \vdots & \vdots & & \vdots & & \vdots \\ 1 & \mu_p & \cdots & \mu_{2p} & \cdots & \mu_{n+p} \\ \vdots & \vdots & & \vdots & & \vdots \\ 0 & \mu_n & \cdots & \mu_{n+p} & \cdots & \mu_{2n} \end{vmatrix} = 0. \qquad (89)$$

(2) For a function of the first degree Y = a₀ + a₁x, for which the standard deviation of an observation is s_x = σ(1 + ax²), and therefore k = 1 + 2aμ₂ + a²μ₄, a > −1, we find, according to (89),

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_2}{\mu_2 - \mu_1^2} \qquad (90)$$

and

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{1}{\mu_2 - \mu_1^2}. \qquad (91)$$

As for any skew distribution of observations we can find a corresponding symmetrical distribution with the same μ₂ and μ₄, both these expressions are a minimum for μ₁ = 0. We have already shown in (2) of Section VII that any possible values of μ₂ and μ₄ can be produced by three symmetrical groups of observations, so that by introducing the variables v and y determined by μ₂ = yv² and μ₄ = yv⁴, and limited by 0 ≤ y ≤ 1 and 0 ≤ v² ≤ 1, we do not leave out any possibilities. From (90) we then get

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}(1 + 2ayv^2 + a^2yv^4),$$

which for a > 0 is a minimum when y = v² = 0, and for a = 0 is σ²/N for any y and v². For a < 0 we find, since

$$\frac{\partial\sigma_{a_0}^2}{\partial(v^2)} = \frac{\sigma^2}{N}\,2ay(1 + av^2) \quad\text{and}\quad v^2 \le 1 < -\frac{1}{a},$$

that for a constant y, σ_{a0}² has the least value when v² is as great as possible, that is for v² = 1. The minimum of σ_{a0}² is then

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\{1 + a(2 + a)y\},$$

which, since a(2 + a) < 0, is a minimum when y takes its greatest possible value 1.
The minimum is thus σ_{a0}² = (σ²/N)(1 + a)². Hence we conclude that:

when a > 0, σ_{a0}² is a minimum and equal to σ²/N for N observations at x = 0;
when a = 0, σ_{a0}² is a minimum and equal to σ²/N for any distribution for which μ₁ = 0; and
when a < 0, σ_{a0}² is a minimum and equal to (σ²/N)(1 + a)² for two equally big groups of observations at ±1.

(3) When we introduce μ₁ = 0, μ₂ = yv² and μ₄ = yv⁴ in (91) we get

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,\frac{1 + 2ayv^2 + a^2yv^4}{yv^2}.$$

This for constant v² is a minimum when y = 1, and is then equal to

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\,\frac{1 + 2av^2 + a^2v^4}{v^2}. \qquad (92)$$

As

$$\frac{d\sigma_{a_1}^2}{d(v^2)} = \frac{\sigma^2}{N}\Bigl(a^2 - \frac{1}{v^4}\Bigr),$$

v² = 1/a, when possible, that is for a ≥ 1, determines a minimum, while for a < 1, σ_{a1}² reaches its lowest value for v² = 1. From (92) we find for a ≥ 1 the minimum σ_{a1}² = (σ²/N)·4a, and for a ≤ 1 the minimum σ_{a1}² = (σ²/N)(1 + a)², both formulae giving σ_{a1}² = 4σ²/N for a = 1.

Our results are accordingly:

when a > 1, σ_{a1}² is a minimum and equal to (σ²/N)·4a for two equally big groups of observations at x = ±1/√a, or for any distribution with the same μ₂ and μ₄; and
when a ≤ 1, σ_{a1}² is a minimum and equal to (σ²/N)(1 + a)² for two equally big groups of observations at x = ±1.

We see that for a = 0 two equally big groups of observations at ±1 make both σ_{a0}² and σ_{a1}² minima, and these groups in addition form the distribution for which σ_x² has the lowest maximum within the possible range of observations.

(4) For a function of the second degree Y = a₀ + a₁x + a₂x², with the standard deviations of observations s_x = σ(1 + ax²), a > −1, and therefore k = 1 + 2aμ₂ + a²μ₄, we find from (89), writing

$$D = \mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4,$$

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_2\mu_4 - \mu_3^2}{D}, \qquad (93)$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_4 - \mu_2^2}{D}, \qquad (94)$$

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_2 - \mu_1^2}{D}. \qquad (95)$$

We shall prove that the last factor of each of these formulae is a minimum for μ₁ = μ₃ = 0. To prove this for (93) we consider the difference

$$\frac{\mu_2\mu_4 - \mu_3^2}{D} - \frac{\mu_4}{\mu_4 - \mu_2^2} = \frac{(\mu_1\mu_4 - \mu_2\mu_3)^2}{(\mu_4 - \mu_2^2)\,D},$$

from which follows, D and μ₄ − μ₂² being positive,

$$\frac{\mu_2\mu_4 - \mu_3^2}{D} \ge \frac{\mu_2\mu_4}{\mu_2\mu_4 - \mu_2^3}.$$

For (94) it is at once clear, since D = μ₂μ₄ − μ₂³ − [(μ₃ − μ₁μ₂)² + μ₁²(μ₄ − μ₂²)], that

$$\frac{\mu_4 - \mu_2^2}{D} \ge \frac{\mu_4 - \mu_2^2}{\mu_2\mu_4 - \mu_2^3}.$$

For the case of (95) we compare (μ₂ − μ₁²)/D with μ₂/(μ₂μ₄ − μ₂³) = 1/(μ₄ − μ₂²), and we find the difference

$$\frac{\mu_2 - \mu_1^2}{D} - \frac{1}{\mu_4 - \mu_2^2} = \frac{(\mu_3 - \mu_1\mu_2)^2}{(\mu_4 - \mu_2^2)\,D},$$

and hence

$$\frac{\mu_2 - \mu_1^2}{D} \ge \frac{\mu_2}{\mu_2\mu_4 - \mu_2^3}.$$

It is thus proved for the three formulae that a distribution of observations for which μ₁ = μ₃ = 0 gives lower values than any distribution with the same μ₂ and μ₄ as the former and with μ₁ ≠ 0, μ₃ ≠ 0. Hence our problem is reduced to finding the μ₂ and μ₄ which make the following expressions minima:

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{\mu_4}{\mu_4 - \mu_2^2}, \qquad (96)$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{1}{\mu_2}, \qquad (97)$$

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_2 + a^2\mu_4)\,\frac{1}{\mu_4 - \mu_2^2}. \qquad (98)$$

(5) Introducing μ₂ = yv² and μ₄ = yv⁴ in (96) we get

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\,\frac{1 + 2ayv^2 + a^2yv^4}{1 - y},$$

which is seen to be greater than σ²/N except when y = 0. Hence the minimum value σ_{a0}² = σ²/N can only be obtained by taking all the observations at x = 0.

(97) is identical with (91) for μ₁ = 0. The conditions for a minimum of σ_{a1}² are therefore the same for a function of the second degree as for a function of the first degree. That is, when a > 1, σ_{a1}² is a minimum and equal to (σ²/N)·4a for two equally big groups of observations at x = ±1/√a, or for any distribution with the same μ₂ and μ₄; and when a ≤ 1, σ_{a1}² is a minimum and equal to (σ²/N)(1 + a)² for two equally big groups of observations at x = ±1.

With the variates y and v, (98) takes the form

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,\frac{1 + 2ayv^2 + a^2yv^4}{y(1 - y)v^4}.$$

By differentiating with regard to v² we get

$$\frac{d\sigma_{a_2}^2}{d(v^2)} = -\frac{\sigma^2}{N}\,\frac{2(1 + ayv^2)}{y(1 - y)v^6},$$

which is negative for any a, v and y within our limits. For constant y, σ_{a2}² is therefore least when v² = 1, and the minimum value is

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,\frac{1 + 2ay + a^2y}{y(1 - y)}. \qquad (99)$$

This is again a minimum when

$$\frac{d\sigma_{a_2}^2}{dy} = \frac{\sigma^2}{N}\,\frac{a(2 + a)y^2 + 2y - 1}{y^2(1 - y)^2} = 0,$$

that is for y = 1/(2 + a), which gives a minimum both for positive and negative a.

Thus the distribution that makes σ_{a2}² a minimum has a φ(x)-distribution consisting of N/(2(2 + a)) observations at each of −1 and 1 and (1 + a)N/(2 + a) observations at 0. We have μ₂ = μ₄ = 1/(2 + a) and k = 1 + a. The relation kψ(x) = φ(x)f(x) then gives us ψ(0) = N/(2 + a) and ψ(±1) = (1 + a)N/(2(2 + a)). From (99) we find the minimum value σ_{a2}² = (σ²/N)(2 + a)².

Our result is thus that σ_{a2}² is a minimum and equal to (σ²/N)(2 + a)² for a distribution consisting of N/(2 + a) observations at 0 and (1 + a)N/(2(2 + a)) at each of ±1.

(6) When the standard deviation of an observation is s_x = σ(1 + ax) and 0 ≤ a < 1, we have for a function of the first degree Y = a₀ + a₁x, k = 1 + 2aμ₁ + a²μ₂, and from (89)

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_2}{\mu_2 - \mu_1^2}, \qquad (100)$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_1 + a^2\mu_2)\,\frac{1}{\mu_2 - \mu_1^2}. \qquad (101)$$

By differentiating (100) we find

$$\frac{\partial\sigma_{a_0}^2}{\partial\mu_1} = \frac{\sigma^2}{N}\,\frac{2\mu_2(1 + a\mu_1)(\mu_1 + a\mu_2)}{(\mu_2 - \mu_1^2)^2}, \qquad \frac{\partial\sigma_{a_0}^2}{\partial\mu_2} = \frac{\sigma^2}{N}\,\frac{(a\mu_2 - 2a\mu_1^2 - \mu_1)(\mu_1 + a\mu_2)}{(\mu_2 - \mu_1^2)^2}.$$

Both of these can only be zero when

$$\mu_1 = -a\mu_2, \qquad (102)$$

which is seen to determine a minimum of σ_{a0}², the value of which is σ²/N. The condition μ₁ = −aμ₂ can be fulfilled by an infinity of different distributions. From 0 ≤ μ₂ ≤ 1 follows the condition 0 ≤ −μ₁ ≤ a. We shall confine our attention to those distributions which consist of two groups of observations.
Let there be Nγ observations at v₁ and (1 − γ)N at v₂; we then have

$$\mu_1 = v_2 + \gamma(v_1 - v_2), \qquad \mu_2 = v_2^2 + \gamma(v_1^2 - v_2^2),$$

from which, by means of (102), is found

$$\gamma = -\frac{v_2(1 + av_2)}{(v_1 - v_2)\{1 + a(v_1 + v_2)\}}, \qquad 1 - \gamma = \frac{v_1(1 + av_1)}{(v_1 - v_2)\{1 + a(v_1 + v_2)\}},$$

and

$$k = 1 + a\mu_1 = \frac{(1 + av_1)(1 + av_2)}{1 + a(v_1 + v_2)}.$$

Thus we find that the φ(x)-distribution consists of

$$-\frac{v_2(1 + av_2)}{(v_1 - v_2)\{1 + a(v_1 + v_2)\}}\,N \text{ observations at } v_1 \quad\text{and}\quad \frac{v_1(1 + av_1)}{(v_1 - v_2)\{1 + a(v_1 + v_2)\}}\,N \text{ at } v_2,$$

while the actual distribution ψ(x) = φ(x)(1 + ax)²/k consists of

$$-\frac{v_2(1 + av_1)}{v_1 - v_2}\,N \text{ observations at } v_1 \quad\text{and}\quad \frac{v_1(1 + av_2)}{v_1 - v_2}\,N \text{ at } v_2. \qquad (103)$$

We thus see that for any two points v₁ and v₂, of which one is negative and the other positive, we can choose the numbers of observations so as to make σ_{a0}² = σ²/N, as it of course would be by taking a single group of observations at x = 0.

(7) By differentiating (101) we get

$$\frac{\partial\sigma_{a_1}^2}{\partial\mu_1} = \frac{\sigma^2}{N}\,\frac{2(1 + a\mu_1)(\mu_1 + a\mu_2)}{(\mu_2 - \mu_1^2)^2} \qquad (104)$$

and

$$\frac{\partial\sigma_{a_1}^2}{\partial\mu_2} = -\frac{\sigma^2}{N}\,\frac{(1 + a\mu_1)^2}{(\mu_2 - \mu_1^2)^2}.$$

As the latter is always negative, σ_{a1}² is for constant μ₁ least when μ₂ has its greatest value, that is 1. Introducing this in (104) we get as condition for a minimum μ₁ + a = 0. There is only one distribution for which μ₂ = 1 and μ₁ = −a, and it is that consisting of two groups of observations at −1 and 1, included in the distributions examined in (6). From (103) we find that the actual distribution consists of ((1 − a)/2)N observations at −1 and ((1 + a)/2)N at 1. The minimum of σ_{a1}² is from (101) found to be σ²/N. The minimum σ²/N of σ_{a1}² can thus only be obtained by taking two groups of observations at the limits of the range with numbers proportional to the standard deviations of observations at these places. This distribution makes also σ_{a0}² a minimum, but it is not, except when a = 0, the distribution which gives σ_x the lowest maximum value within the possible range of observations.

(8) For a function of the second degree, Y = a₀ + a₁x + a₂x², with the standard deviation s_x = σ(1 + ax), where 0 ≤ a < 1, we have k = 1 + 2aμ₁ + a²μ₂, and from (89), with D = μ₂μ₄ − μ₂³ − μ₃² + 2μ₁μ₂μ₃ − μ₁²μ₄,

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_2\mu_4 - \mu_3^2}{D}, \qquad (105)$$

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_4 - \mu_2^2}{D}, \qquad (106)$$

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}(1 + 2a\mu_1 + a^2\mu_2)\,\frac{\mu_2 - \mu_1^2}{D}. \qquad (107)$$

(105) may be brought into the form

$$\sigma_{a_0}^2 = \frac{\sigma^2}{N}\Bigl\{1 + \frac{(\mu_2\mu_4 - \mu_3^2)(\mu_1 + a\mu_2)^2 + (\mu_2^2 - \mu_1\mu_3)^2}{\mu_2\,(\mu_2\mu_4 - \mu_2^3 - \mu_3^2 + 2\mu_1\mu_2\mu_3 - \mu_1^2\mu_4)}\Bigr\},$$

where the denominator and μ₂μ₄ − μ₃² are always positive. Hence the condition for σ_{a0}² taking its minimum value σ²/N is μ₁ + aμ₂ = 0 and μ₂² − μ₁μ₃ = 0, or

$$\frac{\mu_1}{\mu_2} = \frac{\mu_2}{\mu_3} = -a. \qquad (108)$$

We shall examine the possible distributions consisting of three groups of observations with the frequencies γ₁, γ₂ and γ₃ at v₁, v₂ and v₃. The conditions (108) require

$$\frac{\gamma_1v_1 + \gamma_2v_2 + \gamma_3v_3}{\gamma_1v_1^2 + \gamma_2v_2^2 + \gamma_3v_3^2} = \frac{\gamma_1v_1^2 + \gamma_2v_2^2 + \gamma_3v_3^2}{\gamma_1v_1^3 + \gamma_2v_2^3 + \gamma_3v_3^3} = -a,$$

or

$$\frac{\gamma_1v_1(1 + av_1)}{v_2 - v_3} = \frac{\gamma_2v_2(1 + av_2)}{v_3 - v_1} = \frac{\gamma_3v_3(1 + av_3)}{v_1 - v_2}. \qquad (109)$$

Now v₁/(v₂ − v₃), v₂/(v₃ − v₁) and v₃/(v₁ − v₂) can never all have the same sign, and (1 + av) is for any v ≥ −1 positive, from which it follows that (109) leads to negative frequencies. Nor can (109) be satisfied by two groups of observations, as γ₁ = 0 requires v₂ = v₃ = 0, that is one group of observations at x = 0, which of course gives σ_{a0}² = σ²/N.

(9) We may write (106)

$$\sigma_{a_1}^2 = \frac{\sigma^2}{N}\Bigl\{\frac{1}{\mu_2} + \frac{(\mu_4 - \mu_2^2)(\mu_1 + a\mu_2)^2 + (\mu_3 - \mu_1\mu_2)^2}{\mu_2\,[(\mu_2 - \mu_1^2)(\mu_4 - \mu_2^2) - (\mu_3 - \mu_1\mu_2)^2]}\Bigr\},$$

where the last ratio is seen to be positive unless

$$\mu_1 + a\mu_2 = 0 \quad\text{and}\quad \mu_3 - \mu_1\mu_2 = 0. \qquad (110)$$
If therefore any distribution of observations can give 1/μ₂ its minimum value 1 and at the same time fulfil those conditions, it will make σ_{a1}² a minimum and equal to σ²/N. But μ₂ = 1 together with (110) lead to μ₁ = μ₃ = −a, which require, in the φ(x)-distribution, ((1 + a)/2)N observations at −1 and ((1 − a)/2)N at 1, whereas the actual distribution must consist of ((1 − a)/2)N observations at −1 and ((1 + a)/2)N at 1. Thus the only distribution which makes σ_{a1}² a minimum and equal to σ²/N is that consisting of ((1 − a)/2)N observations at −1 and ((1 + a)/2)N at 1.

(10) The general minimum conditions for σ_{a2}² cannot be found without more elaborate investigations into the possible variations of the moment coefficients than are at present available, and we shall limit our research to the case of three groups of observations. Let us suppose γ₁N, γ₂N and (1 − γ₁ − γ₂)N observations taken at x₁, x₂ and x₃, and let the corresponding means be ȳ₁, ȳ₂ and ȳ₃. We then find, when Δ = (x₁ − x₂)(x₂ − x₃)(x₃ − x₁),

$$a_2 = \frac{1}{\Delta}\{\bar y_1(x_3 - x_2) + \bar y_2(x_1 - x_3) + \bar y_3(x_2 - x_1)\},$$

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N\Delta^2}\Bigl\{\frac{(x_3 - x_2)^2(1 + ax_1)^2}{\gamma_1} + \frac{(x_1 - x_3)^2(1 + ax_2)^2}{\gamma_2} + \frac{(x_2 - x_1)^2(1 + ax_3)^2}{1 - \gamma_1 - \gamma_2}\Bigr\}. \qquad (111)$$

Differentiation first with regard to γ₁ and then with regard to γ₂ gives the minimum conditions

$$\frac{\gamma_1^2}{(x_3 - x_2)^2(1 + ax_1)^2} = \frac{\gamma_2^2}{(x_1 - x_3)^2(1 + ax_2)^2} = \frac{(1 - \gamma_1 - \gamma_2)^2}{(x_2 - x_1)^2(1 + ax_3)^2},$$

or, when we suppose x₁ < x₂ < x₃,

$$\frac{\gamma_1}{(x_3 - x_2)(1 + ax_1)} = \frac{\gamma_2}{(x_3 - x_1)(1 + ax_2)} = \frac{1 - \gamma_1 - \gamma_2}{(x_2 - x_1)(1 + ax_3)} = \frac{1}{2(x_3 - x_1)(1 + ax_2)}. \qquad (112)$$

With these values for γ₁ and γ₂ we get from (111)

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\Bigl\{\frac{2(1 + ax_2)}{(x_2 - x_1)(x_3 - x_2)}\Bigr\}^2.$$

This for constant x₂ is obviously a minimum for x₁ = −1 and x₃ = 1, and is then equal to

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\,\frac{4(1 + ax_2)^2}{(1 - x_2^2)^2}.$$

From this we find

$$\frac{d\sigma_{a_2}}{dx_2} = \frac{\sigma}{\sqrt N}\,\frac{2(ax_2^2 + 2x_2 + a)}{(1 - x_2^2)^2},$$

which shows that x₂ = −(1 − √(1 − a²))/a determines a minimum. The minimum value is

$$\sigma_{a_2}^2 = \frac{\sigma^2}{N}\bigl(1 + \sqrt{1 - a^2}\bigr)^2,$$

and the frequencies found from (112) are

$$\frac{1}{4a}\sqrt{1 - a}\,\bigl(\sqrt{1 + a} - \sqrt{1 - a}\bigr)N \text{ at } -1,\qquad \tfrac12 N \text{ at } x_2 = -\frac{1 - \sqrt{1 - a^2}}{a},\qquad \frac{1}{4a}\sqrt{1 + a}\,\bigl(\sqrt{1 + a} - \sqrt{1 - a}\bigr)N \text{ at } 1.$$

IX. Adjustment with regard to both of two variates connected by a linear relation.

(1) The case often occurs when both of the variates observed have errors of observation of the same order, so that adjustment of only one of them is unsatisfactory. We shall therefore in this section consider adjustment with regard to both of the variates, and give the adjusted relation between them and the standard deviations of the constants.

Let x′ be observed with the standard deviation √α σ and y′ with the standard deviation √γ σ; we shall then, for the sake of greater perspicuity, exchange the variates for x = x′/√α and y = y′/√γ, so that both of our variates have the same standard deviation σ. Let (1/N)Σ{xˢyᵗ}, taken over the N pairs of observations, be denoted by μ_{st}; we then find, by adjusting only the y's according to (3),

$$\begin{vmatrix} y & x & 1 \\ \mu_{01} & \mu_{10} & 1 \\ \mu_{11} & \mu_{20} & \mu_{10} \end{vmatrix} = 0, \quad\text{or}\quad y - \mu_{01} = \frac{\mu_{11} - \mu_{01}\mu_{10}}{\mu_{20} - \mu_{10}^2}\,(x - \mu_{10}). \qquad (113)$$

By adjusting only the x's we get

$$x - \mu_{10} = \frac{\mu_{11} - \mu_{01}\mu_{10}}{\mu_{02} - \mu_{01}^2}\,(y - \mu_{01}), \qquad (114)$$

which only coincide with (113) when (μ₂₀ − μ₁₀²)(μ₀₂ − μ₀₁²) = (μ₁₁ − μ₀₁μ₁₀)², that is, when there is perfect correlation between x and y and no casual errors of observation.

(2) Adjusting at the same time with regard to x and y may be transformed into the problem of finding the straight line for which the sum of the squared distances of the observed points (x, y) is a minimum. Let the line sought be x cos v + y sin v + p = 0. The sum which we want to make a minimum is then

$$S = \mu_{20}\cos^2v + \mu_{02}\sin^2v + 2\mu_{11}\cos v\sin v + 2p\mu_{10}\cos v + 2p\mu_{01}\sin v + p^2.$$

∂S/∂p = 0 requires p = −μ₁₀ cos v − μ₀₁ sin v, indicating that the line passes through the mean (μ₁₀, μ₀₁); this determines a minimum for constant v. The corresponding S is

$$S = (\mu_{20} - \mu_{10}^2)\cos^2v + (\mu_{02} - \mu_{01}^2)\sin^2v + 2(\mu_{11} - \mu_{01}\mu_{10})\cos v\sin v, \qquad (115)$$

which, differentiated with regard to v, gives

$$\frac{dS}{dv} = -\{\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)\}\sin 2v + 2(\mu_{11} - \mu_{01}\mu_{10})\cos 2v.$$

It thus follows that

$$\tan 2v = \frac{2(\mu_{11} - \mu_{01}\mu_{10})}{\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)} = \frac{2\tan v}{1 - \tan^2v},$$

or

$$\tan v = \frac{\mu_{02} - \mu_{01}^2 - (\mu_{20} - \mu_{10}^2) \pm \sqrt{[\mu_{02} - \mu_{01}^2 - (\mu_{20} - \mu_{10}^2)]^2 + 4[\mu_{11} - \mu_{01}\mu_{10}]^2}}{2(\mu_{11} - \mu_{01}\mu_{10})}, \qquad (116)$$

which determine a maximum and a minimum of S. Substituting in (115) we find

$$S = \tfrac12\Bigl\{\mu_{20} - \mu_{10}^2 + \mu_{02} - \mu_{01}^2 \pm \sqrt{[\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)]^2 + 4[\mu_{11} - \mu_{01}\mu_{10}]^2}\Bigr\},$$

so that the minimum corresponds to the negative sign of the root in (116). The adjusted function connecting x and y is hence a line through the general mean forming an angle u with the x-axis, which is determined by

$$\tan u = -\cot v = \frac{\mu_{02} - \mu_{01}^2 - (\mu_{20} - \mu_{10}^2) + \sqrt{[\mu_{20} - \mu_{10}^2 - (\mu_{02} - \mu_{01}^2)]^2 + 4[\mu_{11} - \mu_{01}\mu_{10}]^2}}{2(\mu_{11} - \mu_{01}\mu_{10})}.$$

For the variates x′ and y′ there must to this value of the tangent be added the factor √(γ/α); expressed by the moment coefficients μ′ of x′ and y′ we therefore find

$$\tan u = \frac{\alpha(\mu'_{02} - \mu'^2_{01}) - \gamma(\mu'_{20} - \mu'^2_{10}) + \sqrt{[\gamma(\mu'_{20} - \mu'^2_{10}) - \alpha(\mu'_{02} - \mu'^2_{01})]^2 + 4\alpha\gamma[\mu'_{11} - \mu'_{01}\mu'_{10}]^2}}{2\alpha(\mu'_{11} - \mu'_{01}\mu'_{10})}.$$

(3) We shall prove that the line is situated between the two regression curves (113) and (114). Making (μ₁₀, μ₀₁) the zero point of the coordinates, the three tangents to be compared are

$$\frac{\mu_{11}}{\mu_{20}}, \qquad \frac{\mu_{02}}{\mu_{11}} \qquad\text{and}\qquad \tan u = \frac{1}{2\mu_{11}}\Bigl\{\mu_{02} - \mu_{20} + \sqrt{(\mu_{02} - \mu_{20})^2 + 4\mu_{11}^2}\Bigr\},$$

where the μ's now are the moment coefficients about the mean. According as μ₁₁ ≷ 0 we have μ₁₁/μ₂₀ ≶ μ₀₂/μ₁₁, since μ₁₁² < μ₂₀μ₀₂. As √((μ₀₂ − μ₂₀)² + 4μ₁₁²) < μ₀₂ + μ₂₀, we have tan u ≶ μ₀₂/μ₁₁ according as μ₁₁ ≷ 0. It remains to compare tan u and μ₁₁/μ₂₀; we find

$$\tan u - \frac{\mu_{11}}{\mu_{20}} = \frac{1}{2\mu_{11}}\Bigl\{\mu_{02} - \mu_{20} - \frac{2\mu_{11}^2}{\mu_{20}} + \sqrt{(\mu_{02} - \mu_{20})^2 + 4\mu_{11}^2}\Bigr\}.$$

The square of the root exceeds the square of the remaining terms in the curled brackets by 4μ₁₁²(μ₂₀μ₀₂ − μ₁₁²)/μ₂₀² > 0, so the factor in curled brackets is positive, and we have tan u ≷ μ₁₁/μ₂₀ according as μ₁₁ ≷ 0. We have thus proved that

$$\frac{\mu_{11}}{\mu_{20}} \lessgtr \tan u \lessgtr \frac{\mu_{02}}{\mu_{11}} \quad\text{according as } \mu_{11} \gtrless 0.$$
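The inequality can be checked numerically. The Python sketch below uses hypothetical data; the helper names `moments`, `line_slopes` and `perp_ss` are ours. It rescales by √α and √γ as in (1), computes the slopes of (113) and (114) and tan u, and returns all three on the original scale. It also confirms that the angle u minimises the mean squared perpendicular distance among nearby angles, and that the rescaled computation agrees with the closed form in μ′, α and γ.

```python
import math

def moments(xs, ys):
    """Means and central moments mu20, mu02, mu11 (divisor N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m20 = sum((x - mx) ** 2 for x in xs) / n
    m02 = sum((y - my) ** 2 for y in ys) / n
    m11 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return mx, my, m20, m02, m11

def line_slopes(xs, ys, alpha=1.0, gamma=1.0):
    """Slopes dy'/dx' of (113), (114) and of the line of Section IX (2)."""
    # rescale so that both variates have the same error s.d., as in the text
    x = [xi / math.sqrt(alpha) for xi in xs]
    y = [yi / math.sqrt(gamma) for yi in ys]
    _, _, m20, m02, m11 = moments(x, y)
    tan_u = (m02 - m20 + math.sqrt((m02 - m20) ** 2 + 4 * m11 ** 2)) / (2 * m11)
    back = math.sqrt(gamma / alpha)  # returns slopes to the x', y' scale
    return back * m11 / m20, back * m02 / m11, back * tan_u

def perp_ss(xs, ys, theta):
    """Mean squared perpendicular distance to the line through the mean at angle theta."""
    mx, my, *_ = moments(xs, ys)
    s, c = math.sin(theta), math.cos(theta)
    return sum(((y - my) * c - (x - mx) * s) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical observations
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
print(line_slopes(xs, ys))
```

With positive μ₁₁, as here, the three returned slopes come out in increasing order, in agreement with the result of (3).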
(4) In order to find the standard deviations of the constants of the line, we shall express the observations, the standard deviations of which are √α σ and √γ σ, by a parameter r, so as to get an equation for each observation. Suppose

$$x_i = a + r_i\cos u, \qquad y_i = b + r_i\sin u,$$

and suppose we have a good approximation for a, b, u, r₁, r₂, …, r_N, from which are calculated the x and y corresponding to the observations. The differences between observed and calculated x and y can then be expressed by

$$\Delta x_i = \Delta a - r_i\sin u\,\Delta u + \cos u\,\Delta r_i, \qquad \Delta y_i = \Delta b + r_i\cos u\,\Delta u + \sin u\,\Delta r_i, \qquad (119)$$

and we can carry out an adjustment, Δa, Δb, Δu, Δr₁, …, Δr_N being the elements. The normal equations are:

$$\frac{N}{\alpha}\Delta a - \frac{\sin u}{\alpha}\Sigma\{r_i\}\Delta u + \frac{\cos u}{\alpha}(\Delta r_1 + \dots + \Delta r_N) - \frac{1}{\alpha}\Sigma\{\Delta x_i\} = 0,$$

$$\frac{N}{\gamma}\Delta b + \frac{\cos u}{\gamma}\Sigma\{r_i\}\Delta u + \frac{\sin u}{\gamma}(\Delta r_1 + \dots + \Delta r_N) - \frac{1}{\gamma}\Sigma\{\Delta y_i\} = 0,$$

$$-\frac{\sin u}{\alpha}\Sigma\{r_i\}\Delta a + \frac{\cos u}{\gamma}\Sigma\{r_i\}\Delta b + \Sigma\{r_i^2\}\Bigl(\frac{\sin^2u}{\alpha} + \frac{\cos^2u}{\gamma}\Bigr)\Delta u + \Bigl(\frac{1}{\gamma} - \frac{1}{\alpha}\Bigr)\cos u\sin u\,\Sigma\{r_i\Delta r_i\} - \Sigma\Bigl\{r_i\Bigl(-\frac{\sin u}{\alpha}\Delta x_i + \frac{\cos u}{\gamma}\Delta y_i\Bigr)\Bigr\} = 0,$$

$$\frac{\cos u}{\alpha}\Delta a + \frac{\sin u}{\gamma}\Delta b + \Bigl(\frac{1}{\gamma} - \frac{1}{\alpha}\Bigr)r_i\cos u\sin u\,\Delta u + \Bigl(\frac{\cos^2u}{\alpha} + \frac{\sin^2u}{\gamma}\Bigr)\Delta r_i - \Bigl(\frac{\cos u}{\alpha}\Delta x_i + \frac{\sin u}{\gamma}\Delta y_i\Bigr) = 0 \qquad (i = 1, \dots, N).$$

Eliminating Δr₁, Δr₂, …, Δr_N from the first and the third of these equations by means of the last N equations, we obtain

$$\Sigma\{\sin u\,\Delta x_i - \cos u\,\Delta y_i\} = N\sin u\,\Delta a - N\cos u\,\Delta b - \Sigma\{r_i\}\Delta u \qquad (120)$$

and

$$\Sigma\{r_i[\sin u\,\Delta x_i - \cos u\,\Delta y_i]\} = \Sigma\{r_i\}\sin u\,\Delta a - \Sigma\{r_i\}\cos u\,\Delta b - \Sigma\{r_i^2\}\Delta u. \qquad (121)$$

By eliminating the r's from the second of the normal equations we get an equation identical with (120), which shows that we have one more element than we can determine. From (120) and (121) we are however able to find (sin u Δa − cos u Δb) and Δu; we get

$$\sin u\,\Delta a - \cos u\,\Delta b = \frac{1}{N(m_2 - m_1^2)}\,\Sigma\{(m_2 - m_1r_i)(\sin u\,\Delta x_i - \cos u\,\Delta y_i)\}$$

and

$$\Delta u = \frac{1}{N(m_2 - m_1^2)}\,\Sigma\{(m_1 - r_i)(\sin u\,\Delta x_i - \cos u\,\Delta y_i)\},$$

where m₁ = (1/N)Σ{r_i} and m₂ = (1/N)Σ{r_i²}.
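The two final formulae amount to solving the 2 × 2 system (120), (121) in the unknowns (sin u Δa − cos u Δb) and Δu. The Python sketch below, with hypothetical residuals and helper names of our own, evaluates them, and also evaluates σ_u² = σ²(α sin²u + γ cos²u)/(N(m₂ − m₁²)). That expression is the sum of the squared coefficients of the sin u Δx_i − cos u Δy_i in Δu, each term having variance σ²(α sin²u + γ cos²u).

```python
import math

def solve_120_121(r, u, dx, dy):
    """Return (sin u*Da - cos u*Db, Du) from (120)-(121).

    r: current parameter values r_i; dx, dy: observed minus calculated x_i, y_i.
    """
    n = len(r)
    m1 = sum(r) / n
    m2 = sum(ri * ri for ri in r) / n
    e = [math.sin(u) * a - math.cos(u) * b for a, b in zip(dx, dy)]  # sin u Dx_i - cos u Dy_i
    denom = n * (m2 - m1 * m1)
    shift = sum((m2 - m1 * ri) * ei for ri, ei in zip(r, e)) / denom
    du = sum((m1 - ri) * ei for ri, ei in zip(r, e)) / denom
    return shift, du

def sigma_u_squared(sigma, alpha, gamma, u, r):
    """sigma_u^2 = sigma^2 (alpha sin^2 u + gamma cos^2 u) / (N (m2 - m1^2))."""
    n = len(r)
    m1 = sum(r) / n
    m2 = sum(ri * ri for ri in r) / n
    return sigma ** 2 * (alpha * math.sin(u) ** 2 + gamma * math.cos(u) ** 2) / (n * (m2 - m1 * m1))

r = [-2.0, -1.0, 0.0, 1.5, 3.0]         # hypothetical parameters r_i
dx = [0.10, -0.05, 0.02, 0.00, -0.03]   # hypothetical residuals in x
dy = [-0.02, 0.04, 0.01, -0.06, 0.05]   # hypothetical residuals in y
print(solve_120_121(r, 0.6, dx, dy))
print(sigma_u_squared(1.0, 1.0, 1.0, 0.6, r))
```

The denominator N(m₂ − m₁²) is N times the variance of the r's, which is why a wide spread of observations along the line improves the determination of u.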
For a point of the adjusted line corresponding to r_p we find, according to (119),

$$\rho_p = \sin u\,\Delta x_p - \cos u\,\Delta y_p = \sin u\,\Delta a - \cos u\,\Delta b - r_p\,\Delta u.$$

The standard deviation of ρ_p is seen to be the standard deviation of the position of the adjusted point (x_p, y_p) in the direction at right angles to the line. We find

$$\rho_p = \frac{1}{N(m_2 - m_1^2)}\,\Sigma\{[m_2 - m_1r_i - r_p(m_1 - r_i)](\sin u\,\Delta x_i - \cos u\,\Delta y_i)\}$$

and

$$\sigma_{\rho_p}^2 = \frac{\sigma^2(\alpha\sin^2u + \gamma\cos^2u)}{N}\Bigl\{1 + \frac{(r_p - m_1)^2}{m_2 - m_1^2}\Bigr\}.$$

This standard deviation is quite analogous to that obtained for an adjusted ordinate when the abscissa is errorless, and gives the same indications for the distribution of the observations. For σ_u we find

$$\sigma_u^2 = \frac{\sigma^2(\alpha\sin^2u + \gamma\cos^2u)}{N(m_2 - m_1^2)},$$

again emphasising that the standard deviation of the r's ought to be a maximum to give the best determination of the line.

In conclusion I should like to express my thanks to Miss H. Gertrude Jones for the care she has devoted to the preparation of the diagrams in this paper.

ON THE PRODUCT-MOMENTS OF VARIOUS ORDERS OF THE NORMAL CORRELATION SURFACE OF TWO VARIATES.

By K. PEARSON and A. W. YOUNG.

(1) In several recent investigations we have found it desirable to have the values of product-moment coefficients about the mean of the normal correlation surface. The present paper deals with the case of two variates. If the correlation surface be

$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\exp\Bigl\{-\frac{1}{2(1 - r^2)}\Bigl(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\Bigr)\Bigr\}, \qquad (i)$$

where σ₁ and σ₂ are the standard deviations of the two variates x and y and r their correlation, then we define the s,t-th product-moment coefficient to be

$$q_{s,t} = \frac{1}{N}\iint z\,x^s y^t\,dx\,dy. \qquad (ii)$$

Further we write

$$p_{s,t} = \frac{q_{s,t}}{\sigma_1^s\sigma_2^t}, \qquad (iii)$$

so that p_{s,t} is a purely numerical quantity and a function of the variable r only. Clearly, from the symmetry of the surface, p_{2s,2t+1} = p_{2s+1,2t} = 0. We are accordingly only concerned with cases in which s + t is even.
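For checking individual values of p_{s,t}, a short recurrence is convenient. Gaussian integration by parts (Stein's lemma) gives, for the standardised surface, E[X g(X, Y)] = E[∂g/∂x] + r E[∂g/∂y]. Taking g = x^{s−1}yᵗ yields p_{s,t} = (s − 1)p_{s−2,t} + t r p_{s−1,t−1}. The Python sketch below is a minimal implementation of that recurrence, using exact fractions for r; the function name `p` is ours.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def p(s, t, r):
    """p_{s,t} = E[X^s Y^t] for a standardised normal surface with correlation r."""
    if s < 0 or t < 0:
        return 0
    if s == 0 and t == 0:
        return 1
    if s > 0:
        # Stein's lemma with g = x^(s-1) y^t: E[X g] = E[dg/dx] + r E[dg/dy]
        return (s - 1) * p(s - 2, t, r) + t * r * p(s - 1, t - 1, r)
    # s == 0: ordinary moments of a unit normal curve
    return (t - 1) * p(0, t - 2, r)

r = Fraction(3, 10)
print(p(4, 4, r), p(5, 5, r), p(3, 2, r))
```

Working in exact fractions lets the results be compared term for term with polynomial expressions in r, and odd orders such as p_{3,2} come out exactly zero.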
We propose therefore, first, to give the general algebraical expressions for the lower values of p_{s,t}, and secondly to provide tables for the numerical values of these product-moments proceeding by increments of ·05 in r. Since s + t must be even if p_{s,t} is not zero, it follows that s and t must either both be even or both be odd. In the former case p_{s,t} does not alter when r changes sign; in the latter case p_{s,t} for negative r is simply p_{s,t} for positive r with the sign changed. It is accordingly only needful to table p_{s,t} for positive values of r.

For the purpose of testing computations the following formulae are of value:

$$p_{s,t} = (s + t - 1)\,r\,p_{s-1,t-1} + (s - 1)(t - 1)(1 - r^2)\,p_{s-2,t-2}, \qquad (iv)$$

$$p_{s,t} = (t - 1)\,p_{s,t-2} + s\,r\,p_{s-1,t-1}. \qquad (v)$$

Or, again, we may write … (vi), and we have … (vii), which is capable of numerical evaluation in a single machine operation.

The general values for any normal product-moment coefficient are

$$p_{2s,2t} = \frac{(2s)!\,(2t)!}{2^{s+t}}\sum_u \frac{(2r)^{2u}}{(2u)!\,(s - u)!\,(t - u)!}, \qquad (viii)$$

$$p_{2s+1,2t+1} = \frac{(2s + 1)!\,(2t + 1)!}{2^{s+t+1}}\sum_u \frac{(2r)^{2u+1}}{(2u + 1)!\,(s - u)!\,(t - u)!}, \qquad (ix)$$

where u runs from 0 to the smaller of s and t.

(2) We are now in a position to set down the algebraical values of the product-moment coefficients.

(a) s or t = 0.
p_{2t,0} = p_{0,2t} = 1·3·5 … (2t − 1),
p_{0,0} = 1, p_{2,0} = 1, p_{4,0} = 3, p_{6,0} = 15, p_{8,0} = 105, p_{10,0} = 945.
These are of course the simple moment coefficients of the normal curve when the unit of abscissal length is the standard deviation.

(b) s or t = 1.
p_{2t+1,1} = p_{1,2t+1} = 1·3·5 … (2t + 1) r,
p_{1,1} = r, p_{3,1} = 3r, p_{5,1} = 15r, p_{7,1} = 105r, p_{9,1} = 945r.
Generally p_{2t+1,1}/p_{2t+2,0} = r, which provides a means of finding r and testing how far the correlation of two variates is normal.

(c) s or t = 2.
p_{2t,2} = p_{2,2t} = 1·3·5 … (2t − 1){1 + 2tr²},
p_{2,2} = 1 + 2r², p_{4,2} = 3(1 + 4r²), p_{6,2} = 15(1 + 6r²), p_{8,2} = 105(1 + 8r²), p_{10,2} = 945(1 + 10r²).

(d) s or t = 3.
p_{2t+1,3} = p_{3,2t+1} = 1·3·5 … (2t + 1) r{3 + 2tr²},
p_{3,3} = 3r(3 + 2r²), p_{5,3} = 15r(3 + 4r²), p_{7,3} = 105r(3 + 6r²), p_{9,3} = 945r(3 + 8r²).

(e) s or t = 4.
p_{2t,4} = p_{4,2t} = 1·3·5 … (2t − 1){3 + 12tr² + 4t(t − 1)r⁴},
p_{4,4} = 3(3 + 24r² + 8r⁴), p_{6,4} = 15(3 + 36r² + 24r⁴), p_{8,4} = 105(3 + 48r² + 48r⁴), p_{10,4} = 945(3 + 60r² + 80r⁴).

(f) s or t = 5.
p_{2t+1,5} = p_{5,2t+1} = 1·3·5 … (2t + 1) r{15 + 20tr² + 4t(t − 1)r⁴},
p_{5,5} = 15r(15 + 40r² + 8r⁴), p_{7,5} = 105r(15 + 60r² + 24r⁴), p_{9,5} = 945r(15 + 80r² + 48r⁴).

(g) s or t = 6.
p_{2t,6} = p_{6,2t} = 1·3·5 … (2t − 1){15 + 90tr² + 60t(t − 1)r⁴ + 8t(t − 1)(t − 2)r⁶},
p_{6,6} = 15(15 + 270r² + 360r⁴ + 48r⁶), p_{8,6} = 105(15 + 360r² + 720r⁴ + 192r⁶), p_{10,6} = 945(15 + 450r² + 1200r⁴ + 480r⁶).

(h) s or t = 7.
p_{2t+1,7} = p_{7,2t+1} = 1·3·5 … (2t + 1) r{105 + 210tr² + 84t(t − 1)r⁴ + 8t(t − 1)(t − 2)r⁶},
p_{7,7} = 105r(105 + 630r² + 504r⁴ + 48r⁶), p_{9,7} = 945r(105 + 840r² + 1008r⁴ + 192r⁶).

(i) s or t = 8.
p_{2t,8} = p_{8,2t} = 1·3·5 … (2t − 1){105 + 840tr² + 840t(t − 1)r⁴ + 224t(t − 1)(t − 2)r⁶ + 16t(t − 1)(t − 2)(t − 3)r⁸},
p_{8,8} = 105(105 + 3360r² + 10080r⁴ + 5376r⁶ + 384r⁸), p_{10,8} = 945(105 + 4200r² + 16800r⁴ + 13440r⁶ + 1920r⁸).

(j) s or t = 9.
p_{2t+1,9} = p_{9,2t+1} = 1·3·5 … (2t + 1) r{945 + 2520tr² + 1512t(t − 1)r⁴ + 288t(t − 1)(t − 2)r⁶ + 16t(t − 1)(t − 2)(t − 3)r⁸},
p_{9,9} = 945r(945 + 10080r² + 18144r⁴ + 6912r⁶ + 384r⁸).

(k) s or t = 10.
p_{10,2t} = p_{2t,10} = 1·3·5 … (2t − 1){945 + 9450tr² + 12600t(t − 1)r⁴ + 5040t(t − 1)(t − 2)r⁶ + 720t(t − 1)(t − 2)(t − 3)r⁸ + 32t(t − 1)(t − 2)(t − 3)(t − 4)r¹⁰},
p_{10,10} = 945(945 + 47250r² + 252000r⁴ + 302400r⁶ + 86400r⁸ + 3840r¹⁰).

The table on pp. 90–1 gives the numerical values of these coefficients. We proceed to illustrate their use.

Illustration I. In discussing the relation of auricular height (y) with age (x) of a girl's head, a sample of 2272 individuals was found to provide the following product-moment coefficients:

q_{1,1} = …, q_{3,1} = 74·447,616, q_{2,1} = −1·957,022, q_{4,1} = −108·701,559.
Are these incompatible with normal correlation? (See K. Pearson, On the General Theory of Skew Correlation and non-linear Regression, Drapers’ Company Research Memoirs, Biometric Series II, p. 35.) We have o, = 3:064,819, Oy = 3:454,125, and y = -294,128, and the leading subscript above corresponds to the x coordinate. We need first the values of qo1, 93,1 and q4, on the hypothesis of normality. Clearly q., and qq, will be zero, and using linear interpolation: 93,1 = Fx Cy P's1 = 99-437,979 x -88256 = 87-759,983 = 87-7600, say. In the next place we require the probable errors of these q’s. The general expression for the probable error of a product-moment about the mean is given in Biometrika, Vol. tx, p. 38. In our present notation it is No? 4 = 920,0t — Ve 1 3 do,0Psa,t +P G0, 29s, ta + 28t 94,1951, Ys, t-1 — 284541, tYs—1, 1 — 2EYs, 1419s, t-1- Now remembering that for a normal distribution q vanishes when s + ¢ is odd, and that q45 = 30,*, while éant me ds, i= 10 2 Mey 6 P S; tox Os we have No*,,,, = G4,2 + 492,0971,1 + qo, 2 772,0 + 4971,1 92,0 — 493,191,1 — 292, 2.%2,0 = On iOyt (LOD ag + Of tl ane dee iows: (oe Oy , 9 , ¢ , 4 Gp eS VN {10p'..9: 4 87? + 1 = 4r pe — 2p ooh seeeasne een ee ee es (x) K. Pearson AND A. W. Youna 89 No?,,,, = 6,2 — 3,1 = Fa Oy (10096, 2 — p's,1)» Oy, ! 19 4 1) = 4/N {100p 6, 2 = Pp 3,13 iolslarstakeletctcleievarelecelciele/sicle:eisi=icle'aiacenicleleis7s.4)p\0:e-0:0.0\s'eis) aisle (@ . No®,,,, = 3,2 — 14,1 + 1692,0973,1 + 2, 2974,0 + 841,193,194,0 — 895,193,1 — 244,914,0 = o,80,7 {1000 p's 5 + 16p"34 +9 4+ 247’, 1 — 80P'5,1P'3,1 — 50P's of, One oN {L000p'g » ar 16p"54 at p 5 80p'5.1P's1 a 60D", a} PNticecn te (xii). 9a, 1 Gq, 1 = We require accordingly to determine the following p’s: p's,2, P's,15 P's,2> P’s,1> p's. and p's» by aid of our table with second differences or direct calculation from the algebraic values in terms of r. We have P's, = 1:173,0226, p's, = °782,3840, p's. 
= *403,8135, p's1= 4411920, p’g,9 = -227,86002, p's 2 = +177, 6695. Also -6744898/V N = -014,1505. Substituting we find the following probable errors: Pais of | — -O12,926; PWR Ob gs — -1-20,651, P.E. of qs,1 = 6°625,903, 121 De Ole Ufo a DIPANT(olsio We can now sum up our results for these data: yr = -2941 + -0129, 9o,1 = 0+ -7206, 93,1 = 87-7600 + 6-6259, da1 = 0 + 51-2677. The probable errors would have been to some extent modified had we been able to calculate them on the true and not the observed r. We have AQ, 1/P.E. of go. = — 2-716, Aqs,3/P.H. of 93,4 = — 2-009, Ag, ,/P-E. of Gaq= — 2°120. Thus none of the deviations are excessive in terms of their probable errors. The system accordingly does not diverge very widely from the normal. At the same time the deviations are all in one sense, i.e. in defect of the normal value. and are all greater than twice the probable error. It appears therefore probable that there is some significant if slight deviation from normal correlation in the growth of the auricular height. Illustration II. For the correlation of the contemporaneous barometric heights at Laudale and Southampton the following values have been found: : Southampton (z) o, = 3:250,067) r = -780,225, Laudale (¥) = nein = 2922. 
Higher Order Normal Product-Moments 90 00-T | +000‘00¢‘6€0-T +000‘0SETSE-1 x000‘00¢‘6€0-T +00‘000°SF6- 000‘0S8 ‘TSE: 00°00¢‘680-T 00‘000‘SF6- 0000°0¢0-T | 00-T | G6: | L6FSLLO88- TEeo'SsseEL-I L9F0F9L‘F88- 41L°860°1S8- GSS‘ FES ‘SSL-I CT‘TL8‘968- GS‘GLG‘GB- GI83°SE6- C6: 06: OGL LOL EPL: 4 PIG CPS OFG- xO8F'8Z8‘6FL: x08°8SL0LL- 009‘°TE9°E86- OF FIP OLL: 00°96¢‘8TL- 0F99‘0E8- 06: GB: lOF9OST FZ9- TOEL‘O80'98L- L68F°L89°ZE9- 4F9°696°CL9- GCS FEY CES: GL‘TEL‘8¢9- GS‘SLO‘SZ9- GI8P SEL: gg: | 08: ¥89€ SSL‘1ZE- 8F9 LOL ‘6F9- 098 SEL ‘TES- x09 T3S‘9E- 009°L88‘00L- OF86F ‘09S: 00‘9¢0‘8E¢- OFOL679- 08: GL: ISZE‘986‘EEF- L90F168‘S8¢- LT80°LOL‘SFF- L9c‘TGe0SF- GZ9‘06F'98¢- GL‘S9FFLF- GS‘9G9°ZOF- GLE6‘OLG- GL: OL: 861 FL0 698: x69 69L PEF: +079 °€69°89E- x0P 89F E8&- 009°G69°L8F- OF ‘OLF‘66E- 00°980‘96E- OFZP‘00E- OL: | G9: \lSE0‘98F‘e6z- LESLF8e‘sGE- 1#10‘8Z9‘F08- L6F'SE6 FE: CGS‘898°Z0F- GL‘LOF‘ FEE: CS‘CLF‘LES- GIFOLEF- cg: 09: *G6G EF9 TFS: 986 °LI9°E8¢- *069°GLT0SE- x06 L&6°EL¢- 009‘LFF‘0ES- OF'8ES'8LzZ- | 00°9G0°98<- OFOE ‘08E- 09: cc. 
lI [88°STF96T- I6L6‘OLS‘9GE- L910°FSL FOE: LTFF19‘6se- GES ‘9F0'69G- ST6LOOSS- = SSSES1FS- GIOL‘6SE- | gc: og: +000°GZ9‘8ST- «000°0SS‘6L1- x000°SLE‘S9T- 00°02‘ I6T- 000°0¢E*L1Z- 00000‘68T- 00‘00¢‘Z0- 0000°G8Z- | Og: Cr: lOFL°ESS°LEL- LO6L‘SFOIFL- 1698°€E6‘ZET- LF8‘6S1‘8ST- CZSS9LFLI- CL°LZS FST: CS‘CIL 69L- CIPI ‘CFS: cP | OF | x&ZI6°SIFIOT- «P98 '8EL‘601- *08F006°C0T- «08°8Z96ZI- 009°SZF'SEL- OF GFO'SSL- 00°9T9‘OFT- OFFE TIE: OF: cE: |lO€Z‘SFE‘O80- IZ6E‘SOFF80- 4108‘6SF‘E80- 49%°COL'SOT- C6S‘CET ‘GOL: CL‘E08‘00T- CS‘SSS‘OLT- CIOS‘ISI: | ge: 0g: x88F 9L8°890- «8h 6£0'F90- x09E°ZL8'F90- 09° 166°€80- 009°E0¢‘S80- OF‘CF6‘080- 00‘°91¢‘960- OFFL‘9ST- | O08: Gs: lGSF°6866F0- 4696‘Z69‘LF0- LFE8 ‘FOF 6FO- I6L‘FL‘S90- Z9'OFL°990- GL‘896 F90- GZ‘9ST ‘080: CLE6 GET: cs: 02: +809°89¢'680- *OST EFS FEO: x0F90Z9'9E0- x0F'8E8‘6F0- 009°6ES‘ZE0- OF‘99F'ZCO- 00‘9L1‘L90- OF8I‘6IT- 0Z: CT: IS69‘°988‘L80- LOPPSF8‘SZ0- 4988°OLL‘SZO- +ITF8L°C80- | S2e‘06FTFO- CT‘S60°SF0- CZSEELSO- | GIZe‘90T- CT: OL: *6L0°F09°9Z0- x96S9°SE6'F10- x09 'Z8E'9T0- x0G LOL'SZ0- = 009°E60'FE0- OF‘06S9E0- | O00‘9EF‘OCO- OFZS‘L60- Or: co. | llgL8‘ers‘szo- 4EFL‘ZS8T‘L00- 46Z8°E°6°L00- LFO‘CZETIO: | SGZ'ZLL‘6Z0- ST‘S9L'ce0- | Se‘eee’or0- | STO8‘I60- | O0- 00- +000°009°ZZ +000°000'000- +000°000‘000- +00°000°000- 000‘0¢8‘8z0- 00°00S*TEO: —-_-00°000°EFO- 0000060: 00: | = ee a — pees oe J ell | = 7) 99 ff 6's J As ff $2 ¢ oF ta peated) eo ih ai vd KH | | | | 00-1 | 000¢‘6E0-T | 00‘000°CF6 | 000" 0G0-L 000°00E-T | 000E“6E0-L | O'EFG: | 00'0GO-L | 00G-T | 060° | OO*EFG — O'OGO-E | 00ST | 00-€ | 00'T | 00-1 | G6- | GOOSLTG- | SZ‘968'6E8- | G61F6- | GZF'G9E-T | GZ9ELFG- | I°E98- | GZ'S96- | E8E-T | GO8-S | SL°L68- | $266: | S2F-T | $8-% | 96. | G6. | 06- | OFLZ'908- OO'OLL'CFL: | OOF'ZF8- | OOF'LFZ-1 | OOG6'6ES- | F°C8L- | 0O'6LE- GLE | 029:S | | OS0G8- O'GFG- | OSe-1 | OL: | 06: | 06 G8- | SESe‘GOL- | SL‘8F9°FS9- | SL6‘0SL- | GLEEST-L | SZ9S*LLL- | 6IIL: | GZ‘008- | LOT-I | ShP-S | GS‘E08- | S68: | GLS-T | G-% | G8. 
| SB: O8- | OZL8°ET9- | 00‘O9S*FLE-.| 00'L99- | 00Z"LZO-1 | 0008669: | 9°ZF9- | OO'9TL- | 890-L | 08Z-% | OO'9SL- | O'OFS | 00Z-T | OF: | O8: | O8: GL: | GZ9C‘TEe- | SSTEO"ZOS: | SZ9‘O6S- | EZI°8Z6- | GB90‘9T9- | G*LLG- | GEOG. | GLE | ESTE GL‘SOL- | S*LBL GZI-T | G6:-3 | GL: | SL: | OL: | O8EL°LCF | 00O6S‘9EF- 008/088: | OOS*GES- | OOSSLGS- | 9'OTE- | OO'I6GG- | 888- | O86-T OS‘T99: O'GEL- | OGO-T | OL-% | OL OL: GQ- | SI68‘°I6E- | GL°EOL*LLE-: | GLOLGF- | GLL‘GEL- | SCOL‘SGF- | 6°6SF- | GZ‘OEg- | LO8- | SFS8-1 | SZFI9- | GZ89- | SL6- | S6-T | G9- | Go- 09: | 0968‘EEE- | OO‘O8O°EZE- 009° ae | 009°699- | OOOL'FSF: — - F.LOF- OO'FLF- GEL OZL:L | OOL9G- O'0E9- | 006- | O8-T | O9- | 09: | Gc. | SFOL‘I8Z- | GZ‘990°8LZ- | SZE“LFE- | GEB‘FES- | Gz9E‘O8E- | T‘°6SE- | CoccF- | £99 | SO9T | SL's: GLLG- | GZ8- | §9-T | Ge. | ge. OS | OOEZ9EZ- | OOOEZT‘9EZ- 000"0DE- 000‘EzE OOGLOSE: | O'STE: | OO'GLE- | 009- | O0S-T | OS ‘CLF | O'GZE- | OSL- | OS-T |. 09- | 0S: Gh | SCOF9GI- | GL°SST°6GI- | SLT"L9¢- | GLO'EST- | GzOB"S8e- | I°GLs- | GB‘ses- | SFS- | SOFT | Go‘GS zF- GGLF- | GLO- | G&L | Gh | GF OF: | OFSL°I9L- | OOOZE'DOL: | OOF'SIZ | OOF'S6E- | OOOL‘*SFS- | F6EZ- ie 00°r6z- | z6r- | OZE-T | OO'8Le- | OOZ | O09- | Ost | OF | OF Gg: | S8S9IST- | Sa°19S‘LET- GGE'E8I- GCL‘OPE- | SZ9Z‘OIS: | 6°LOZ- | Gu‘O9T- | LFF | GEST | GL‘OSS- | GL9E- | GZS. | GO-T | ge: | CE: Of | OZ9FCOT- | OOOISTIT: | OOSTST- | 00Z‘98Z- | OOSSBLI- | 9°O8T- | OOTES SOF. 
| OSI-T | OS*Esz- | O'STE- | OSF- | 06- | OS: | 08: | GZ- | GL89°Z80- | SL‘E6S‘°880- | SL8IZI- | GLE FES | GzOG‘ECT G°LGT- | G6°90Z- | SLE- | SSI | GZ‘9Es- | S°Z9Z- | GLE GL: | GZ | Ss | 0Z- | O8FL°Z90- | 00‘0FO'890- | O08'F60- | OOS FSI- | OODE'ZEL- | O°8EI- | OO'O8T- | 8FE- | O80-T | OO'6ST- | O'OTZ | OOE- | 09- | 0 Oz: GI- | S9LO‘SFO- | SZ‘OLE‘6FO- | SZS“690- | SZO‘LET- | GZ9L‘SII- | 6°ESI- | SSOLT- | LZE- | SFO-T | SL TFI- | SLGT- | G3Z- | Sh | ST- | ST- OI: | 0901°6ZO- OOOET*ZEO- | OON*EFO- | 009060 OOS6*EOT- | F'ETI- | OOGSI- | SIE- | OZO-T | OS‘FGEO- | O'SOT- | OST- | O& | OT- | OT- GO- | S69C°FI0- | SL°8z8°STO- | SLS*ezO- | SLO‘SFO- | 9c98‘960- | T“LOT- | GZ‘SeT- | €08- | S00-T | S3°LP0- | 9°6S0- | S10 cT- | ¢0 CO): 00- | 0000000: | 00°000°000- 000°000- | 000000: | O0OS*FGO- —-O°EOT- | 00‘0ST- | D0 000-1 00°000- | 0°000- | 000- | 00: | 00-008 —————— | 7 se | ee ~— = I | UL 0 8 | eae | ae 8 | xe 8 mOraG d | oe + MSG | ue OG) Cnc | Pare) | eae Ol | eid a Ud | eed: dL | | | | | | | | xx TuaIfoog Uoynjarwog ey) fo swsa, Ur sarin A on) sof aovfing yousoxy ay) fo sjuaif[o0g juawopy-pnporg ayy fo 29D], 91 GELS oansy [euy puoAodq yovayqns onpea yoexe IOV K. Pearson anp A. W. Youne i D ‘On[VA JOVXe oY} JO oINSy yeuy oy} ureyqo 03 Gf ppy $& ‘onTVA JOVX oY} JO soInsy peuy oy} urezqo 03 EzTt ppy || "GZ°6=6 ‘GL‘'G=9 ‘Ge‘E=e ‘Ga'c=ZG ‘Sc‘O- =O Indy [eur ft "son[eA 4oexo Joy GL‘g=6G ‘Gz‘9=9 ‘Gz‘'G=G ‘GL’g=F ‘“GL‘Z=E ‘GL°T=Z ‘SZ‘I=1 ommsy yeurg ft “Sonjea qQoexXa * 2 OL/ “d=? 
* a ogee siya UL xx | 00-1 x008L'066'LFS-9 0008 CFE SFF-E x000SSFE'SFF-E — OGOO"EZO'LZ0-Z_|_ 40000°9C0" ‘LEO | x0000‘°0SETEE-L «0000°EZ0°LZ0-% «0000°098 TSE | G6 | F8E8‘9Z9'CIO-S | GEZLZ8SLOL-S | SESE TESEIL-G ILELF6T'EE9-T LELE'LE9'SE9-L | SFELSLEOIT-T in LF68‘0S0'CC9-T I | S1OGS‘9E8“OZI-T | ¢ | 06- NZFOOCSSEIS-E | GFIGIZLEIT-Z | SSIFSIL‘'9ZT-Z 9E6FESS*8OE-T GSFO'ZEGLIE-1 | SILGTSO'SI6: — x9LES*9OL'EFE-T | x9S0T‘080‘°EZ6- C8- | 6FSOIFO'SSS-Z | GECFCLE'GED-T | LOT9OL8‘ES9-T + SI6S‘F6%‘CF0-1 EFITES9'SCO-L | LPZLZEGOGL: ~— PLZETSFFFBO-1 | SEL90‘EZE“6EL O8- -—EE9E'ZZT‘O9T-Z | 9ELe‘sts‘c 696-1 ZBE8‘EONLLGT 98ES*LLO“GSS- LOST‘ L66°SE8- 1Z06'C98‘O19- x F8I66ZL°698- | xFOES*ELT“0Z9- CL: TLIL‘ZL6°C09-T | LZEs*9E9°E96 6ZGLOTL6L6- ZOFZLOS‘'SF9O: | - SF6E69S6CO- | OTFO'LOO'FGF- — |bZL98‘ETE'Z69- | SLEBE*SEFE0: OL: | SZOZC6L‘EST-L | F6E8‘OL9‘ Ze: | 69SS‘69F CFL: ESTL‘SEL‘90¢- FEZOOLL‘OTS: LOOSZOLL6E- — xFOSF'OST‘SFS- | xF8E9°EOS‘9OF- C9. BLOBS FOLF9S- SCE6 GEE TSS: | +E88L‘SLe‘coe: OFS TLS Z6E- OLZ8‘Z16‘LOF: ZOFS 00ST LIE: (DETCC'GLEOEF- SEELL‘OL9'SZE- 09- | + G9gs‘egG*GZo- ZOFO'SIE TIF: | 8L08°E98‘0GF- LOPS ‘SIS ‘OE: FPZO OFL OLE: FEOF COG TSS: | x9T9S'FEE'GEE- | x96FFETS6SS- ce. PCOS FEG LFF: F609"E00'F0E- 1696°GE6'LLE: 7866699 OE: GL8ESFE ‘LES 06L9°60'S61T- | bLISU'LEL‘6SS: | STZFTFE8"FOR- OS: | ¥GLE6TLZLIE: | xO0SZ9'OLFZES- | xO0EST9ZE"8E- «0008'S CEFLI- «000¢°620"08T- xU0GLSFL'FST- | x0000‘0SF'S6I- x0000‘0S9‘09T- CF | OLGOOLL'ZEe- FZOL‘SGO'T9T- | 8ZIS8°LFZ‘99T- GEST 'EZ9‘OET- GILO'TFS'SEL- ESFS'CES ‘GIT: | b9008"SSS‘OST- | STFOL‘69G6‘FZI- | CF: OF: LOFO‘O8S‘ESST- SCL Cee‘CIL- C9IS 06E6IT- ZFLE‘906'960- LOS9'FFS‘OOL- FESCLEGT6O- | x9CFEEOLELT: | 29ESE‘GOF'960- | OF- ce. 
0Z°6‘108‘FOL- LIFP'SE C180: | 6FLZCFL‘F80- L8GESLITLO- 6989'Z88'EL0- IL6L‘L9L‘°690- | b86FO°GTI‘F80- | SS61E‘OLL‘SLO- | oe: 0g: GLEL CFS ‘OLO- OS86‘618'9S0- | 9ZES8‘OFF6EO: SI6L‘FOL TSO: LIOG'FSS'ECO- | — ESLE'GETTCO- x FFLE‘E9G‘T90- | xF99S‘OFO'9SO- OS: CS: 6E6L°6FS‘9FO- |— GZO8‘EZE'SEO: GBS T“EFS1FO- L@OS‘LEE‘LEO- 9E89‘OSL‘SEO- | +GOIESIFSE0O- | bLFSS*E6z‘GFO- | SLEFELLEZFO- | Gz 02- FCL6°C8L‘0E0- 6EEO'8EN'9ZO- —- 9GOF‘OZF'8Z0- 6699°998'9Z0- 9TFS°ZOG*9ZO- 1686‘ T1¢‘LZ0- | xFOEF'8Z0'EE0- xFCOS'Z60°SE0: | 08- CT- GEST ‘SIZ ‘0ZO- FOSS TFL'OLO: | 0986°0L9‘610- [SCs°COS'6L0- OOLFSE9‘L10- LEEEOLS'SLO- | BEBLEZEEFZO- | SETZOCOFO'FZO- | ST- OL: 608E‘9E9‘EL0- F1Z0‘006°600: | 61&S°1S0°FTO- 6FOF6S9'FLO- LEFS8‘°SZLOLO- OL6L‘T69‘T10- | x9€S E1FS'SLO- | x9108°C09°610- | OL- co: | 96°F‘I90‘OI0- SIELF8EF00- €269°FZ6‘010- 8&69'E16'T10- 8ZLL°090°S00- 6ZEE‘°S6S‘CO0- | b9G1Z°SF E10: | ST8ZL‘*669°9T0- | SO- 00: 00&Z‘0E6‘800- | +«0000°000"000- | x0009°76"600- x0000°EZ0‘T 10: +0000°000°000- | «0000°000‘000- | #0000°ELTFI0- x0000°0SL°E10- | 00- | | L | OL‘or gf 66 O18 8'8 62 | be | ors gf 89g dL (-quoo)—sjuaraiffaog juawopy-nporg ay? fo aQu], 92 Higher Order Normal Product-Moments The f’s of the marginal distributions show a markedly skew and non-normal system. The regression is, however, closely linear. Discuss the values of the product-moment coefficients : Jo,1 = 11-919,404, 91,2 = 15-598,613, Yo, 2 = 401-523,496. For a normal system with the above correlation coefficient we should have: J21= %1,2= 0 and g.. = o,7%0,7 p's 5 = 362-192,761. Thus Aqe,1 = 11-919,404, Aqi,2 = 15-598,613, Aqs,5 = 397330, 730. We require to consider the probable errors of the q’s, which are given by -6744898 times the following standard deviations: Coa oN RO are el a el a — oe a _ FO," ' .2 ’ Dn OC Sam VINE ee aa tO fa at li aI Dieter 2 aoa ae 2 oa TN {1009's p— pa a. ce (xiii) We determine for the above value of r: Do y — 2217, 0021, Dig =P a =e oA OOO! P's, 4 = P's,2 = 1-030,50246, P's4 = °617,239437. 
Our results for a normal distribution are: Pan eot i) se O04. S525 Pot ¢,7— 21-0914 73; P.E, of q;,9 = 1-320,585, P.E. of goo = 15-360,681. Hence NGp a) bets OMG —l0s920% AQs) 5) PBs Ot g455 1 1-82 Ago, o/P.H. of G2. = 2-560. The deviations in the higher moment coefficients are at once seen to be markedly significant. But it will be noted that g,, as in the previous case does not differ so markedly in value from the normal as the odd moment coefficients. It seems there- fore likely, when a distribution is markedly skew, but the regression linear, that the even-even product-moment coefficients will not differ widely from the normal values, but that the even-odd ones will do so. It is possible that this is related to the fact that in distributions (such as 3 x 3 tables) which can be reduced in various ways to a tetrachoric table, correlation calculated from regression line diagonal cells is usually far more accurate than correlation calculated from non-regression line diagonal cells. Equations (x) to (xii) are of value beyond the present illustrations. Further uses of the above formulae and tables are provided in a memoir on ‘Generalised Tchebycheff Theorems”? which will shortly be published. We have to thank Dr Kirstine Smith for much help in the preparation of this paper. THE CORRELATION COEFFICIENT OF A POLYCHORIC TABLE. By A. RITCHIE-SCOTT, B.Sc. : $1. InTRODUCTION. We have at our disposal a considerable number of methods for finding the coefficient of correlation between two characters from a table of frequencies. These methods may be summarily named and classified as follows: 1. Product Moment. Tetrachoric r. Marginal centroids. Biserial r. Three Row 7. Variate difference methods including the correlation of grades and ranks. Equiprobable tetrachoric r. Mean contingency. 9. Mean square contingency. 
CoS Oi Co te Each of these methods has its own specially appropriate field of usefulness, but there still remains one class of table for which no entirely satisfactory methods have been devised, namely those which contain more than 2 x 2 cells and fewer than 4 x 4, to which the tetrachoric and mean square contingency methods respectively may be applied. It was with a view to investigating satisfactory methods for such tables that the following work was undertaken. Such tables arise under many circumstances, particularly when we can, as in many psychological investigations which depend upon the instinctive judgment of some character, definitely assign individuals with pronounced characters to either end of a scale, but ave compelled to relegate doubtful cases to an intermediate but somewhat indefinite category. We have, in a word, good, indifferent, bad; present, doubtful, absent—classifications resulting in a frequency table with three categories for one or both characters. In the present memoir a normal distribution has been assumed as it has been found to be not infrequently applicable and its assumption has given fairly satis- factory results even with distributions which are not strictly normal. §2. Notation. Let the normal surface (when standard deviations are used as units) x+y" — 2ray Nz (a, y) = ee. 2(1—71°) 94. The Correlation Coefficient of a Polychoric Table be divided as in the diagram by planes drawn parallel to the yz plane at the points - v=h,, x=h,... and by planes parallel to the zz plane at the points y = k,, y = ky .... Let the planes intersect in lines whose projections are the points (A, *,), (hg, ky), etc., contracted to 11, 12, etce., where the first figure is the suffix of h and the second figure is the suffix of k. h, hg hs hy her ans n : | ) n n n n u 21 31 pl 1 lm, | I 1 21 31 20) | { | 1 eS N42 Noe Ngo pe Neg | | rM.g 12 42 |i hy 2 so] a2} | M3 es Ns Wis | 13| 238 83/1. 
e243): | F | SS ee - | 2a | mit Kg Nig | Neg Na n ies = | Tae) ties Ny. eee Ny- N. My. Sys Sallie, OR A -—qc—,-—-—_— qr Msg. The frequencies 1n the cells and the marginal totals are indicated by 1,1, 149, etc., and 71., Ng. ... 0.1, Ng, etc., as shown in the diagram. The surface may also be regarded as divided at each point 11, 12, ..., into four quadrants. One of these quadrants is shown by the dotted lines in the diagram. The quadrant in the position shown, viz. the left upper or (--) quadrant will be A. Rircuik-Scorr 95 regarded in what follows as the leading quadrant and its frequency denoted by m. Thus the quadrant shown will have the frequency m3. In the ordinary scheme for a tetrachoric table, the quadrants are denoted by a, b, c, d, and when necessary these letters will be used with the appropriate suffix. Thus the division at the point s . ¢ would be represented as in the diagram, h, (14) as, he eta? M., st | | k, | ¥ | Cn dss | Meg ‘ eee | Ms. Ms. || N 1 The marginal totals corresponding to the leading quadrant are denoted by m;., m., and the complementary totals by m,’., 1.4. Clearly any cell frequency may be expressed in terms of quadrant frequencies since ze eg) Ui — besiege ee ten os] a pen ele § 3. HENnneAcHoric Mretuop. In order to determine 7, since we assume the distribution to be normal and we know the marginal totals, only one more datum is required. This for example may be a frequency block (or the total frequency on a continuous system of cells). As special cases we have the “ briquette’’ or frequency on a rectangle of cells or again the quadrant frequency. The block may be the frequency contents of a group of corner, marginal or internal cells*. Consider (for future use) the general case of a quadrant frequency. Let My. Met meh where ,7 and ,@ are the tetrachoric coefficients. 
Then = 190, Met N 7 TOO Cie GTAR CMRI) Marden sue net aedahastuebezantubtaas (le) In using Everitt’s Tables of the tetrachoric functions in which 7, and 0, must be less than } we must either rearrange the table or adjust the above formula for the position of the mean with regard to the quadrants. It is more convenient to adjust the formula as follows, dropping the suffixes as we are dealing with any point of division. lator sti wir + ete Wal +... * A “cell” is the least element for which the frequency is provided in the original data. Cells grouped together for any working purposes are collectively termed a ‘ block.” 96 The Correlation Coefficient of a Polychoric Table Let Ween WS oh thy Mean in a, ‘d az = T) + O) — 1 aP N =T +0,—1L+7) 0) +716, 7 + 72O5'7? + ... = Tp tO Earn Oar Cae iceny) = T Op BiG (7, 0), 10 pecan ccm ce et cues dee eee ee eee (2). Mean in 6, m Cc Nee Oi = Og c= Oe (g20 7), “ates bree vosnes aes cnt eee (3). Mean in c¢, ie 9 b yao = Oy — (Ty Gy + 710, (— 7) + 72°03 (— 7)? + «.-) = —(l—7™)%— O (7, 6, — fr) = T9 05 — Or 0, — 71) ve cackon sae nk seen ee eee ee REE (4). Mean in d, W = T G5 + OL (005 7) ew haces pee eeReReeeee (5). In place of taking a quadrant we may take a marginal or internal block. I shal] only consider the latter as the case of a marginal block may be deduced from that of an internal block by removing one of the bounding planes to an infinite distance. In discussing the central block we in effect reduce any table to a 3 x 3 (ennea- choric) table as we consider it to be constituted of a central block (or group of cells) and 4 marginal and 4 corner blocks. I may therefore use the nomenclature for a 3 x 8 table without any loss of generality. ic aE a9 9, + © (a7, 29, 7) — 17 oy — (ur (17, 99, 7) = tig Op (ur (oiy40 17) ie aio Conte (ul (7, 39, 7) — (eT i 170) (605 Aah) +| (cars 14) (0, = AU) 7 + (s73 — aT) (0a — 392)'77 + ote. io ot Ly) N + (971 — 473) (08, — 191) & + (aT2 — 172) (292 — 192) 7? + ete. 
As an example of the rearrangement of the formula for computation consider the case when the mean is in Mp: ; Mo, ; (reel Ta) ia = 97 009 + at1 001 1 + ute o0o 7° + oTs os 7° + --:, A. Rivcutre-Scorr 97 : mm (mean imv0) => = 47009 + 171 201 T— 172 20o 12 + 173 203 7? — . Moy f , Q, 72 ' 6. 73 (mean in ¢) -so* = 99199 + Tr 191 7 — oT 192 7? + ats 103 7° — 2 n (mean im @)) 2 = 475189 + a7 10, 1 ata 185 7? + 7s 103 7? +... Moo - WV aN (gg — My2 — Moy + My3) = ee "= 494) 7 + (a2! + 172) (382' + 192) 7? Mei Ta sUs 10a) Tr A CUC, Sastuninsceeucsanet esses (7). It will be noted that when one set of categories is symmetrical about the mean, i.e. when say 57,’ = ,7, all the terms of odd degree in r vanish. This corresponds to the fact that symmetrical categories may be reversed without altering the numerical value of the marginal totals and their relation to the central frequency ; but such reversal will change the sign of r. a (ary’ aay 171) (00; § 4. Stranparp DeviaTION oF + BY ENNEACHORIC METHOD. We have now to determine the probable error of r found in this manner. Throughout what follows differentials will be used to indicate random sample variations, i.e. it is always supposed that the variations are small as compared to any quantity varied so that all the dn’s are small, or all the n’s are large quantities. /e it fa |S | 2 Of ii)) CHB CL) Sa One Fr enone ors ee (8). Since the variations of the means and the standard deviations are, in this form © of m,,, involved in the variations of h, and k,, we have of of dh, near oh, Evaluating the differential coefficients, dM, = k,—rh, e7 32/2 a : ow [ 2( (h,, y, 7) dy=N oe ie Bt over dyis, ae (10). This is the area of that portion of the dichotomic plane «= h, which bounds the quadrant m,,. 
But the area of the whole dichotomic plane is +0 N [ z(h,, y, 7) dy sme TOE SIE) 2 ak oe ee (a, so that if we write £ See IN 2 BA Ores ees UNE eS ae ee (12), ‘ k, —ths where Ay = roe ee Bie 2 hype ays eee hes (13), the factor A,, will be that fraction of the whole dichotomic plane section, which bounds the quadrant m,, and will have no dimensions. The value of A,, may be taken directly from Sheppard’s tables of the probability integral entering with Biometrika x11 7 98 The Correlation Coefficient of a Polychoric Table the argument ee It will be convenient to refer to this tabled integral as &, = p so that k, — rh, Ay © (a) i a ee 14). Var? a) It is convenient to note that with this notation 0 N S —1_2 7 Further since a alee ert ada, ia AN ee The gd Was ee and dh, = aig 2 RN ieee Hence aie AN Abie 2 Beg 15 n ah, = eis: Nay GIN gs a dcaae eee eee (15). Sbe of Similarly ah, dk = BGO. sah ees ee Eee (16), where By, = & he — hy (17) he = at os Lilie ; , HR SD oh Lastly ie PE la (x, y, v) dudy he ple =WN ie in dp” (x, y, 7) dady he phe qe = WN ie ae ech z (a, y, 7) dxdy ma Nz (hs, ky, r) HEY ag cule ake dad nseadceeed Pumenb en As Seles aut cso cee eae (18), which is the length of the ordinate at the point (s, ¢.). We may now write dm = Adm, + Beams Net Ote ete: eee (19), and =y1d" = Adm, + Budm.,— Wins) se. oee ee (20). Considerable use will be made of this formula later and the following abbreviated notation will be used: A,dm,. + Bydm., — dm, = 0P p= — Ya dln eee (21), and Ags - Bane Ne Payton eee (22). The reader must be careful to note that 5P,, is not dP,, but only a part of it, and this symbol is used here as at once a conventional abbreviation and a memoria technica. . Since Noo = Msn — Mg — May Maas *. ANg, = AMgg — Uy — AMy, + dingy = Ayding. + Bodin. + Xodr — Aypdmy. — Byydin.y — Xi247 — Ay dm,. — Bydm., — xy dr + Aydmy. + Bydm., + xd... (23), A. RitcHiE-Scorr 99 = (Xo2 — X12 — X21 + X11) Et = (Ag, — Agy) dmg. — (Aq, — Ay) dm. 
+ (Boy — By) ditt.g — (Boy — By) dim.y — Uttgg wreceececeeeees (24), Reference to the diagram shows that A,. — A,, and the other coefficients are the proportions of the areas of the trapezettes bounding the briquette of volume Ng. These area smay be systematically named for the whole table thus, that is, the areas of the planes meeting in the line of which the point s, ¢ is the projection, in the direction shown, are named from the point so that A, — As, +1 = Ost, Be tm B,_1, oe Bet. Hence we may write As, — Asa — Gey Aj, — Ay = O42, By — By = Boo, By, — By = Bo- If now we notice that since m,. = N — ng. etc., and m,. = ,., dM. — — ONaa, dm.g = — dN. 3; die die, Wii Wap we then have = (X11 — X12 — Xo1 + X22) dy = — (aygdng. + ay2dn,. + Boodn.3 + Bo dn. + ditzy) cena (25). Expanding this in terms of frequency volumes this becomes (X11 — X12 — Xo1 + Xe2) Ur = (ayg + Bor) dq + Boy Any + (429 + Bor) drs; + dyzdnyy + Mirzy + AyodNg2 + (42 + Boe) Uys + Bodie, + (ag2 + Boe) dts It has already been shown by Pearson (Biometrika, vol. 1x, p. 1) that when random samples are taken from a population so large that its composition is not appreciably affected by removing the samples we have the following relations: Ong = Mas (1 = “*) errant ho p. (27), Mean (dn. dnyy) = — Aare eco let ean ness anne (28a), Mean (dn, .dn,.) = — ie chee ea Eikeeeaan (285), Mean (dn,. dn.) = ngy ue a SE ay eet, eS (28c), Mean (dn,, dn. +) = — eet Pea een Crees (28d), 100 The Correlation Coefficient of a Polychoric Table Ngsi Nz No iiss ( Mean (dn,, dn, .) = gg (2 — “) AERA RI IO (28 f). Mean (dn,,dn;.) = — Hence squaring both sides, summing for all possible samples and dividing by the number of samples we have, (X11 — X12 — Xe1 + X29)? OF? = (42 + Bo1)? Nyy + Boy? M1 + (ee + Box)? Nga + G42? qq + Nyy + g9?%g9 + (aye + Bop)? Mg + Boe? Mag + (dee + Boo)? Mos l (a2 + Bor) M13 + Bortar + (agy + Box) N31 je Nr + GigM1o 1 Noo ti Goo Man) eee (29). 
+ (a42 + Bos) M13 + Bor Meg + (a2 + Bog) Noe The expression within the large brackets = OygMy- + AggNg. + Bo N.1 + ByyN.2 + Noo. Calling this Nm we have (X11 — X12 — Xer + X20) Or? = (a4 + Boy — M)? 2, + (Bo, — m)? Noy + (ag9 + Boy — m)? M3, + (ayy — M)? My + (L — Mm)? Nyy + (agg — M)? Ngo + (a2 + Boe — m)? M3 + (Boe — mM)? Neg + (Go2 + Bog — m)? Ngg (30). The following form for m is instructive although giving an apparently less symmetrical form than the above, 1 m= 75 (Gist. goa Oa a ei ee Wes) 1 = NT | (Ay — Ayy) my. + (Boz — By) (N — m.,) + (By — By) my + (Bog — By) (N — mug) + M22 — Mg, — My + ma} = e | Aven. + Byym.g — My — AqymM,. — Byym., + M4, — AggMy. — Boy M.y + Mg + Ap, Mg. + Byym.1 — Mg, + N (Ag, — Agi + Boo — By} =a, Re 11 12 21 22 We may then write (X11 — Xie — Xa + Xv)? Gr 3B? BY? = (a1. — O99 + Bor — Bos + 4 M1 + (0 = Ge5 + Bo: — Boo + ¥) Noy 2 2 ae (ie Bo + » M31 1 (a1. =033 10 Bae 1 ee) N12 1 1 2 2 a (5 — Gee 5 — Bos + 3) Nee + (0 Sf) N30 2 2 2 al (a1 — gg + -) M43 1 (0 — ge + =) Neg + (*) Lng) doce: (32). A. RivtcHiE-Scorr 101 It will be shown in a later paper that the coefficients of the cell frequencies are functions of moments of the frequencies about the mean. If any relation between h,, h,, k,, k, makes Mine ais Xen ree = 0: while the right side of the o,? equation remains finite the value of the expression for o,2 will become infinite. x1; — X12 — X21 + X22 Obviously vanishes (1) when h, = h, or k, = ky, i.e. when either of the central categories becomes vanishingly small, and (2) when h, = —h,, ky = — ky and r=0, i.e. when both sets of the extreme categories have equal frequencies and the correlation is very small. (1) When h,=h,. Then x1;=X12, Xo1= X22» 12 = Gog =a Say, Bo, = P,.= 0, Ng = Nop = Nog = 0, Ng. = 0. a (ny. + 73.) Then m= N (0. and the right side of the o,? equation reduces to zero giving the indeterminate form : for o,”. (2) When h, = —h,, kj = — ky and r= 0. This case will be discussed in the next section. §5. 
Sranparp Deviation or ENNEACHORIC r IN SPECIAL Cases. Two particular cases are of interest, (1) when 7 = 0, (2) when the table de- generates into a 2 x 2 table. m. (1) Whens=0 A, = & (k,) = We? Mg. By, ms € (h,) mz N ? Py = Agms. + BygM.g — Mee Noe M 4M Mg eM 4 ve == a. N Mse = ies Mig— My 1. Hence Ct Ane A a aN’ ane Me— My Nz dy = Ay — Ay = ai Paar eat Ng» Similarly Bos = Boy = Nie M = Ago + Boy ssi ease eae! = ae W = iN (M41 — Myy — My, + Ms) 102 The Correlation Coefficient of a Polychorie Table X11 — X12 — X21 + Xoo = N (HK, — HK, — H,K, — A, K,) = N (H, — H,) (K,— &,), and the right-hand side of the equation reduces to WENe Pe a Bees (7) (M41 + M43 + gq + M33) + (ma Nog — Ng.\? Hy) Wns Wae\* Eis (i ) (yp + M3) + (1 =f — NT | | Nop: = Nog this resolves into (Ree), (N — ng.) (N — 1.9) ; No Ne; Remembering that 2” (N= Mee\? (M2)? Ne: N N (Sr) Gye) Oe ONS oe as PY ao war)! ee nal) Ns.N ey Fe) (yt) — meat (Sy) NW N — ny.) (N — 0.5) Ng. Neg ei aates) _ ea) isa {Ng.N.g + (N — Ng.) Neg + (N — Ney) Ny + (N — ng.) (N — n.2)} Ny. (N — ng.) Nog (N — n. Se eee (33). Hence aK 2 ee re hase N—n,.) n..(N—n. tl taneeGrens INCE Ci ae Phe ( W 2) .— N 2) a ee yy a = re Dear nopy 34 and om “JN Nh A) (Kak os (34). This may also be expressed as follows: 2 9 _ Mae (N — Mg.) Np (N — 2-9) Ong? Ong = N N NzMeg Nye + Ng. Ney + Neg Nie ON) N = Noo (My Nag Nay te Mgg) cee even a eeelectentene (35) i A Ve Ngo (My + Ng + Nyy + Ngs) an oo 2 Se gs ce nbs ona Cae 36). NH, — Hy) (hi — Ba) When the extreme categories are nearly equal and 7 is zero H, = H, and K, = K, and the value for o,? becomes infinite. It is necessary then to keep second powers of differences to determine o,2. Keeping second order terms we have: sco 0 C dm = al dh +; ary Ola alaiGen orf sch + ak +L ay +54 (anya + 2 =, (dk)? + om (dr)? 2 (dh? Oe Or? o2 a2 q2 +2 ah d (dhdk) + 2 ae (dhdr) + ou (akan) ARB GHIEaS 06 (37). C2 To find aye ete. we may proceed as follows. A. 
Rircntrk-Scorr 103 By straightforward differentiation we find that OP st A? — 2rhy k orf ) | N iP é 21-7?) Ohm Ch \oen/pa lew ayy 24h? —Qrhq Pye} hN fc mY a Sell rN _W +k? — 2rhk = — —— é 27) dy -+ ee 2(1—ry2 QnV 1 — 72 J -2 : nV 1 — 7 ee "™ ay, (- Wil = *) Pee ered oe en, oie. (38), a SN ( = *) ee ee (39), o*f ax INE _, (k= kr) (k= rh) ay? = a = —_ pe | T 1 = PD) XG oiptalietslersiisverevsfstulsieiniate (40), CjmeeOXn = he Oh Or Oh, ce ro » (41) Oy oy = k=th 5 anor Ai ] oF Pe) » (42), 2 of ne == af eset cam te ima: t ouiat ooua nouns tet nde ses (43). Hence summing for all samples and dividing by the number of samples and denoting this operation by %, we have, since quantities of the first order dis- appear, an equation involving % (dh)?, & (dhdk), etc. We may determine these values thus: din. Won = WH,’ m ms. (1 — => , dm,.\* o( x) ~ S(dh,r=% cae) = ee (44). Meg (gee: - Similar! & (dk)? = — ( w) 45 y (Uy)® = a oon eereerteeeeetin (45), mM... Sida) = s (oe Z lee aN : (46) Wits vt — wae) NAT a sen een eer e ee oes To find (dhdr) we have — ydr = A,,dm,. + By,dm., — dmg, eis eee eee xdrdh, = NA. . — NH, & (drdh,) = Ag, & (dm,.)?+ By & (dm,.dm.,) — S (dm,,dm.,) Mg. eo Meg. 2 —PA ie: (1 — 7) + By, (an ane ret) — Msg, (1 — *) echoes (47). After simplification this becomes (1 = *) 1 eae = Bi (Wet =. Mst), 104 The Correlation Coefficient of a Polychoric Table ee na eee ™;. 4 ce & (drdh) = — a \( ) Py — Bim md} aie (48). Similarly S& (drdk) = — : (i *) P,,— A(m,. —m \\ (49) ~e, 2 eo, ‘ NKy N st se st J eee ees ees . Equation (37) then becomes 0=N (- hi *) % (dh)? + N (- kKB — ¥4 % (dh)? x { , (h— kr) (k— rh) jes ‘reals ear LS @yr25 —5 xX + & (dhdr) ge 4 & (dkdr) + 2yS5\(dhdh)......1 ss ee (50). .. Substituting the values of the $’s, and transposing o,? we find mM,. = m,. (1 —-= x {a Gee my a ( x) : ( 7) peers | ap cee o2=—-N(hHA+?r v) eR Met (1 = 7) } y Meg. +2 aE Pe ‘NHy (1 7 iT) ler mn Bs, (m.; a) mao} ok — rh 1 i mM. 25 may” Wit (i _ ae Ps — Ag: (ms. 
— sdf M,.Mag = Ey) meme Gi, sie,(o:0'n vsa‘a pie, ote aveielececeye ebetelorere aieie’e eia¥afetats ehele oe EOE O1). + OY ER ree (51) When r = 0 and hy = — hy, ky = — kg, x reduces to NHK, etc., as above and the equation reduces to NE Khe (1 - ") tas (1 aa ie. (52), aN («KB +r x) Met — NH N NK If we take this equation for each of the four points 11, 12, 21, 22 and combine them according to the scheme mg. = 14; — Mg — Mg, + Moo as before, the left member of the equation becomes — NHK (hyk, — hykty — hgh, + highs) 0,2 = — NHK (hy ky + hy ky + hyky + lyk) 07? = —4ANE Kh kyos eee (53). In a similar manner the right side reduces to h my. k m. WH, (ms ot N Nan) + WK, (1 | W 7) nda fae eee (54), which may be further simplified as follows m My. Ma + ae Noo = No 4 WT (m.. — 249) A. RirentE-Scorr 105 _ 9 Mae Mey (1 ee 8 ) rs N ep + "2 oa WR a een (55) Similarly Noa + i Nog = 2 a re it ets Saeceeemionce (56) Hence substituting we have No v C. SAN Kaka oe — 2 We ie my. + K, mas} Deeb oe ce fi (57), dn a Eh and G2 = INHKD K, = M,.+ EK, ma| ene eect ee (58) (h, and k, are of course negative numbers) which remains finite when the extreme categories are equal to r = 0. (2) When we put h. =k, = © the enneachoric table degenerates into a tetra- choric table and we have Ts Ss Ning = 05 N31 = Nga = gg = O, (Oo = Mei Uy X12 = Xa = X22 = 9, GyoMy- + Boy -1 + Nop and hee N But Ay = Ay — Ayy = wo — rh, a= 74) mm oe ( |. 8 ae a Similarly Bx = 1 — By. Hence m= (1 — Ay) mi. + = Bu) M1 + N = my. — Ma + M1 N IN = (A, %: aF Beet = N41) = N al Pu =] Ne P : , “X16? = ea Ay— By + 1) oe Ge iz Bu) a P : Pav? a (3? ae Ans) Nie + (=) OO hg ee Meee (59). This form of the standard deviation of a tetrachoric correlation will be referred to again later, and will then be reduced to a still more symmetrical form. 
From the above it will be seen that we can determine the correlation coefficient from any frequency block, assuming a normal distribution, but the accuracy of this determination varies with the position and size of the frequency block. The- probable error (-67449c.) varies from cell to cell, and an unlucky choice of the work- ing cell may lead to a correlation coefficient with large probable error. A correction 106 The Correlation Coefficient of a Polychoric Table of this latitude might in some cases be obtained by using another cell. But as the r from this cell would probably differ from that previously found, and as neither of them would be identical with that of the normal surface from which they are sup- posed to be sampled, we must find some means of approximating to the “ best” 7. The most general method of doing this, following on the above, would seem to be to weight each of the frequency blocks and determine the weights so that the re- sulting probable error of the weighted 7 is a minimum. In doing this we must have regard to the fact that the variates we are dealing with are not independent but correlated. We must consider this method. Let the polychoric table have p rows and q columns so that the indices of the last row and column are lq, 2q, ... pg and pl, p2, ... py respectively, and let each of the frequency volumes be weighted by an arbitrary weight w,,, W , ... dicated by the same suffix as their respective cells. Then WN + Wiese + -- Wy My + Wyy (Myg — 41) + Wey (Mgy — My) + Wye (Mag — My —— Ma, + My) +... i Wee (Mise — Mey gd — Mg poy len ow) alae = (Wy, — Wyy — Woy + Woe) My +... + (Wee — Wert, t — Ws, t41 + Wot, t+) Met + «+ I = 4 Nag Hb Wye Mye to oc. Weg Mean F nate vhs hoe ce nace ee ee eee EERE eRe (60). i ] Then N (14 %y7 + Wiis + -.-) = W (044 M41 + WyeMy2 + «.-) = Wi (47 ACh + i) + Wi2 (47 305 —- ) = woe wee cere er eeevee (61), which is an equation to find 7, using @,, as an abbreviation for sO + gh 2-100?" + o7g-0g7? + ete. 
(Compare the usage in equations (2) to (5).) Myqr Moy) +++ Mp1, Mp, --. are complete segments of the normal solid and are inde- ad = .T pq, and these terms disappear from both sides of the equation. The w’s of the cells in the last row and last column of the tablemay therefore be dropped and we have (p — 1) (¢— 1) w’s to determine. The probable error may be written down in the manner already shown and as there are (p — 1) (¢ — 1) independent frequencies there is sufficient data to determine the w’s so that the probable error is a mimimum. pendent of 7, Le. There is no essential difficulty in carrying this out except that the coefficients rapidly become very cumbersome as tables increase beyond 3 x 3 for then the simplification dm,. = — dn. is no longer available. It will be found however that the same result. may be derived more simply from the method discussed below. § 6. Potycnoric MErnop. The frequency surface divided into p columns and q rows is divided at each point 11, 12 ... into four quadrants, and for each of these divisions a value for 7, ViZ., Ty, Tyo ---, may be found by the tetrachoric method. These may be regarded as approximations to the true value of 7, and their weighted mean found, the weights A. RircutE-Scorr 107 being determined so that the probable error of the mean 7 so found shall be a minimum. Cnt + Crt t .- Let r= Boe eee tae MOET tbs aoeaek 62). CROs hah ey) Then (Gite Crete Oi Oa grt Cel po Ames 2s ee cesdeenes sues (63). Squaring, summing for all possible values and dividing by the number of samples, (GR ot = (CC act\P 12>) (OC pores) Wa su)’ asin. (64), where Ost = Ore Bes, ft’ = rg rey Let S== 2 (Cyto)? + 2% (Cet Cee Oscoer Ree, st’) CnC): Then for a minimum os os LS = ~~ dC iG. = g aC, dC, + aC» Che 0, OKO dCi, + dCi, pire Ob aN os . (= =(-z~—-—A)dC,,=...=0. (; = d) dC, (; = ) dC, Coy? 
+ Cy2071072 Ruy, 12 + Cy3011013 Ry 15 + oe = r, ORomGre Matas + On ster er + Cor Onettra a5 + 5, (COs) Se Ike lp = CAntan + Calas + ee wer meee ree nv ccc ccserreccene (73), s dr, == COT + Ore Olas + doh —— Ts Ch oe a Cis ee eee eeeres (74), N11 X12 — (Cyydryy + Cyydrys + «..) == (A,,dm,. + By, dm., — dm) + oH (A,.dm,. + By.dm.. — dims) + ete. 11 Transposing and rearranging we find CO 11 Xu dm, + Cis diy, + X12 Crs 270 (A,,dm,. + Bydm..+ x47) + oe al Byodm.. + X424742) + ete. 12 11 A Cu (e dh, + Chi dk, + in Ap a) +5 Cie oe dh, + Chie dity + fe dyy) + X12 2 5eR oh, Ok, Ory; oh Ok, Oi. fase RE (76), h, ky ; where fa=N | z (uv, y, 7) dxdy = mgt; 6! (G} C Gi u My 41 2 My +. = a (179 199 + Gro) + —? (179 299 + Dye) + Xu X12 Xu X12 (77) eda ii); using ©; as in equation (61), page 106, which is an equation of which 7, is a root. If in this equation we put Cy == x, Cis = — X12, Cor = — X01, Coo = Xap, and the remaining C’s=0 we have the equation for the enneachoric coefficient and r, now appears to be the weighted mean of four tetrachoric 7’s, tae = Niele sXe ot ae Xeohae | So My hin am Bon A. Rrivcute-Scorr 109 Comparing this with the generalized equation (61) for enneachoric 7 previously found in which all the frequency volumes were weighted, we have Y m Cy O11 Sa X11 Y Cb Wi2 haar ’ X12 oy Ost Ws = > Xst and since Wet = Wet — Wyit,t — Ws, t41 + Ws, t415 it can be easily shown* if there are p rows and q columns that Crt Oni Os tay tun gig + +1: Og oi + Wear gt sia gia t +++ Weia,g—4 + Wy-1,t + Wy—2, t4+1 + tote Wp.a Stole dete lercieiéve; steve terete e lexersie Bc ite.) 5 that is the sum of all the weights having the same suffixes as the points contained within the d quadrant of which n,, occupies the corner. * Consider a two-fold extension ruled and named in a manner similar to the polychoric scheme on page 94. Then My = Nyy Met = Nyy + NyQ + Nyy veeeee + M4; + Nq1 + Nog + Nog «00s «+ Maz + gy + sg + Ngg voeeee + se Hence 0447741 + oy +... 
= Wy Nyy + Wy (Myq + MyQ) + Hyg (My + M2 + M43) + Wey (Nyy + qq Heroes Ny + Noy + Nog + ovevee Noy + Ney + gq + eveeee Net): Tf we rearrange this in terms of 7,1, 2,2 we shall have Nyy (Wyy HOy9 +erseve Wee) + Nyy (Wy2 + Wyg + ever st) SW Ny + WyQNyg + veeeee WetNgt + sevens WpqNpq+ It is clear that w,, will be the sum of the w’s belonging to all the m’s of which n,, is a constituent. that is, from the figure, all the m’s whose boundary lines lie beyond the lines h=s —1, k=t—-1, i.e. st Wp =Z Vw 11 The relation Ng = Mop — Mg_y 4 — Mg» t-1 + Mg-4° 1-4 may be compared to the partial finite difference in two variables A,Ay U;; a Uz41, deol a Uy41, Po U;, Y+1 ar ies y> which may help to make the above relation clearer. 110 The Correlation Coefficient of a Polychoric Table § 8. CoMPUTATION OF f’S AND o's. As the computation of 7, even for an enneachoric table is somewhat lengthy, it is necessary to have a definite scheme to work to. In addition to this the values of the R’s when resolved into their constituents present some interesting features. A new expression for tetrachoric 7 has already been deduced from the degenera- tion of an enneachoric table. The following is a derivation of it directly and in a more symmetrical form. Consider the tetrachoric table D a b Ney F E L c d Was G Ny. Nos N Let A and B have the same significance as before, 1.e. A is the fraction which the area of the plane DE is of the whole dichotomic plane and B the same for FEZ, and write gP = Any. 4 Bie 20 eee (79), where the a suffix is used to indicate that it is the P of the leading or a quadrant. Then since the fractional area of HG will be 1— A and of HL, 1— B, the corresponding P for 6 quadrant will be A ke — Ang. +- (1 aro B) Ney a b =A(N—n,.)+ (1 — B)n., -— (n.. — a) = AN — (An,. + Bn., — a) = AN og cca von ctneauk taut ston oe Rea ee cee eee (80). Similarly wb = BN = 4 Pee eee ee (81), aP = — N(A BR) ee Pie oie see eee ee eene (82) Hence we have ae phe ek EGP HEN ene (82) bis. 
We have already seen (20) that — yd1 = Adm,.+ Bdm:, — din, — 0.0 Pi cade eee (83). Using the symbol % as before to denote the operation “sum for all possible samples and divide by the number of samples” * we have ; * Tt would be useful to have a distinctive name for this operation, verb as well as substantive. A. Rivcute-Scorr Sate a2 = & (5,P) = S (Adm,. + Bdm., — dm)? 2 = A’m,. + Bem., + my + 2ABm, — 2Amy, — 2Bm, — oe 2 = At(a +0) + Be(a+ 6) + a+ AB 24 — 2B)a— 4 2 =(4+ B—1)?a+ A’c+ BO - oe a Ror RIE Orr ere cm Ree (84). But w=A(at+c)+ Blia+b)—a=(A+B-—1)a+4+ Ac+ Bb+0.4d...... (85) and a+b+ct+td=N, Rese) (One)? wP\? we\? ; we\? wP\? =(44+B-1-%)) a: (3-4) b+(4—-%) c+ (0-*) d Hye Goes ae =) ui Es {Pra sh cel AT) cea ae ees a (86). Further (— gP)a+(.P)b+(%P)c+(—,P)a =a{N (A+ B—1)—,P}4+ (BN —,P)+c¢(AN —,P)+4+d(—-,P) =N{(4+ B—1)a+ Bb+ Ack}— NP =N{A(a+c)+ B(a+b)—a}—N,P = NIP = IN IB =O camedeie cooge tadoc Cen eRDn Con aS Een OCa eee ener reer ee (87). The above form (86) of the square of the standard deviation (omitting factor) is interesting as involving only the squares of the P’s. Since the P’s are connected by the relation (82) bis and (87) their values may be determined from any two of them. The R’s. Since — Xt Ose = OPs: and — yy Or yy = OPyy, XstXs't Ost Ol se’ = OP OP yy and _XstXs' Ot Fs Ree. y = BD (OP 28P ov) = Soe. ee’ (SAY): In conformity with this notation 2 Ci Ase ae = evecr: It is useful to have a verbal rule for writing down such mean products as > (OPP .4). The following will serve. Multiply the detached coefficients of the differentials in the 6P’s as in ordinary multiplication; strike out the products in which the related frequencies have no common frequency and insert the common part of the frequencies after the related coefficients. From the whole subtract the full products of the P’s divided by the total frequency. 
This may be proved as follows: Te The Correlation Coefficient of a Polychoric Table Let p and q be any frequencies in a given distribution in which the population N is so large that sampling does not alter its composition; then we have the well- known results (p. 99) S (dpdy) = — 7 Now let p and g have a common part c so that p=p te, q=4 +. Then % (dpdq) = Sd (p' +c). d(q' +0) = & (dp'dy’ + dp'de + dq'dc + dc*) _ Pq pe re(l 4 i N N WN N __Wt+OW'+9,, N =e-f. Now the mean product of any two linear functions of p’s and q’s, S= Dd (4Pit tePe t+ -..)-d (hig + kage + --), will consist of the sum of the mean products of terms such as i,dp, . k,dqy. But S (i,dp, . k,dq;) = 14; (dp, . dq) = Us ky (ct aa Pel) > where c is the common part of p, and q;. Therefore Se3 fi k, (cs = aa SE pees Jee Hence the rule. As an example consider S,,.9,, Syy-21 = > (8P1,5P21) = 9 (A,,dm,. + Bydm., — dm) (Ag, dm,. + By dm., — dmg) = Ay Ay. + Ay Bay my — Ay im + By An Mg + By Bom, — By my — Ag my — Byymy + My — sme = (Ay + By — 1) (dor + Boy — 1) my + By (Aor + Bor — 1) rq + By Borns + Ay, Aoi Me + 0. Asses + 0.0. Mop + Ay, Agi M3 + 0. Agy.Mo3 + 0.0. Mg. A. RircHir-Scorr 113 1 ( (Aq, + By — 1) m4 + By M1 + By N31 Sa + Ay M2 +0. Moe +0. Ngo IE Ai Mi3 + 0. Mog + 0. Nag (Ag, + By — 1) my + (Ay + By, — 1) Ney + BoyNs1 ae Mes Agy M19 + Agi Noo + 0. Np a Agi M43 + Agi No3 + 0. N33 12 P aE (4n ar Ekteee te =) (4n Ba a) M1 IP 12 P q P at (21 i *) (41 ied Oi (eet = Noq + (Bn a -) | Ba a =) N31 12 Jes P. E ar (41 = 2) (An rT =) Ny. 1 ( a =) (4m = =) Noo P P Iz 12 ae (0 W) (0 7") N32 + (An ai =) (An = ) N13 Iz B PL P. ails (0 a =) (401 a) Nog 4 (0 =) 0 — 7) Ng aieestes soe SG (88). In the above the P’s are ,P’s and remembering that oP. ne Ay,+By,-1- a= = We? ete., we have ie P, Ales Pe JB oe Sioa = ( oH) ( ) My + Ga ( Tt) Noy + (=e) (Se) Nas ‘BP PEN P. oP P * (a) Goat (ey) a) net Oop?) 1e. ; + CP) EP) ne (8) Er) net i = yr taPrraP ata — ¢PiyaP artes + oP irc? 
e131 + pPyyePor (M12 + M413) — oP irePo1 (Moe + Nez) + oP iraP or (M32 + M33)} The relation between the coefficients in the above expression is very simple. We have already seen that New? ei (— gh)? 4220 (6)? 0 GP Ad (= oP)? cnsicrece ts (90). In the quadrants of a tetrachoric table write the P coefficients. Thus a aP ae ae ++ »P a ak The a frequency is related to — {P, etc. Consider now the empty scheme of an enneachoric table regarded as a tetrachorie table with the point of division first at, say, 11 and second at 12, and write in the P coefficients as above. Biometrika xm 8 114 The Correlation Coefficient of a Polychoric Table Divided at 11. =e ay ame Pu pPu ara an ce leat Pu ny wu ae oat Divided at 21. doy Ni meea lon iiherelesr pPo Pox a Pon pPo pPoy = Py apt Now if we superpose these two schemes upon an enneachoric table with a frequency in each cell, each cell will then contain the P coefficient and frequency of each term of the expansion of S,,. (with the omission of the factor x) thus (— aPu) ~ Pa) Mn (Py1) (— aPor) Mer (cP a1) (cPo1) Mo1 (Pir) Por) M2 (— oP) (o Por) ee (— oP) (— aPex) Moe (0P31) (oPo1) M13 (= aPi) (oPo1) Mos (— aP a) (~— aPar) Mop When 11 coincides with 21, R becomes = 1 and the mean product degenerates: into the square of the standard deviation. This may be summarised in the following table in which the letter, a, b, etc. gives the suffix and the sign gives the sign of the P required. Py P,, Psy Py» Ny -d -—d -d -d N19 +b -d +b -d N13 +b +b +b +b “er +¢ +¢ -d -d Nes -a +¢ +b -d Nog -a -a +b +b ee +¢ +¢ +¢ +¢ Ngo -a +¢ -a +¢ Nga -a -a -a -a Thus the coefficient of 39 in Syo.9, 18 (+ ¢P 42) (— oP 21). This table is sufficient for a polychoric table of any size since any two cross points st. s’t’ in the table, with the planes through them divide it into nine portions or groups of cells, each of which is represented by one of the above cells. 
A, RitcHir-Scorr 115 The relations between two superimposed tetrachoric divisions involve the deter- mination of ten constants, four o°8, 041, O12, 21; So, and six R’s, Ryz.42, Ryy-01; Ryy.99, Ry2-21, Ry3-22- Roy.22. The o’s follow the example already given, the proper suffixes being attached. The value of S,,.,, has already been given. The remaining five S’s are as follows: 1 Siw = W2 (ePrraP eM + cPrrceP rz (Mar + M31) — wPiraP 12 M2 — oP rircP 12 (M22 + M2) rod Ceres ers OPE Wem ed ae Pe 2s | COPE Ree C9 eee ROR Ee rE (OL): 1 Sir. = ya i1aP 22 41 — cP yaP 22 Mor + ePrreP 22 M31 — oPrraP 22 M12 + aPrraP 22 M22 6 = aP yy ¢P 2232 + oPrreP22 M13 — aPuvP 22 M23 + aP ral o2 Mash verre (92). 1 Sj2-21 = Ne {aP iz aPo1 M1 — cP rzaP a1 Mar + cPrzePo1 M31 — aP 126P 21 M12 + ePizeP a1 Moe ‘ » = ePyeaPo1 M32 + oPr2vP 21 M13 — aPi2oP 21 M23 + aPr2zaP 21 Maa} +++ (93). 1 Sy2-22 = V2 teP al 22 (M41 + M2) — cPizaPo2 (Mar + M22) + ePrecPo2 (M31 + Ms2) + 5PizoP 22 M13 — aPr20P 22 Mo3 + aPrzaP 22 (Nag)}-veeeereereeees (94). if So1-29 = 2 tePoraP oe (241 + M1) + cParcP 22 M31 — wPoraP 22 (Miz + M22) — aPoicP22 Noe 5 steep atglaal asics Moai cia gai al oa lagre xovcaveseeseeecont (95). A more convenient form of the above for actual computation purposes will be found on page 120. We may now by means of the P’s express the standard deviation 7, in a form consisting of sums of squares. Y —(3C,) dr = 5 ( SP.) Cee (96), 1 2 ’ 2 1 ine, . (BCulPot = {E (SLSP a) = E (CH) Sener + 2B By Xst Xst XstXs't’ Cu (x Cr. Cis Peel pelt dE + — Siig +. Sane WAGE. 0 or as pe Oe emmy cea ) CafC. C;: C1: + a = Sui, 12min a Si, 12 + ate Si: 13 ale +] + etc. eee e rere reeeee (97). 1 Now consider the S’s to be expanded in terms of the frequencies and pick out all the coefficients of the frequency n,, say. The coefficient of the n,, taken from Sim, vm Say Will be : Pyy.: Pym in which the quadrant suffixes will be determined by the relative position of n,, to lm and I’m’. 
Let these undetermined P’s be de- noted by p. We shall then have as the complete coefficient of n,, Ci (C vf Go (G} ee ee ae a 047-9) ey “p +...) Xu = Pu-Pu X10 Piu-Piez ee Pu-Pis3 SP OO ae —— Dobos = Pia» Pia +.) + ete, X12 \X11 a X12 Lele X13 gee Since we are dealing throughout with the cell n,, the quadrantal suffix (i.e. the 8—2 116 The Correlation Coefficient of a Polychoric Table a, b, ete.) for any J, m will be the same throughout. Hence we may write the complete coefficient of n,, as Cu ee 12 13 = ream Dan at: Sma eee te ) Xie ee va X12 a X13 si C1» ( u C1. 13 ) 011 aE + ete X12 a x a X12 ig X13 0 =( D9 + ot pa tae) (98) Ser a cr vie Pree y In the case of an enneachoric table for example we have, XC,, being = 1, C C C C 2 oP = (-=" Pa Sa ee Pr) Nyy X11 X12 X13 X22 C C C C 2 + (0 <8,Py— <2 P 2p % Py) Ca? ot ee ee ee C C C C x az ( tar = ae == pligae = Poo) Ny3 + ete. ...... (99), Xu X12 X21 X22 the P’s being at once written down from the table on page 114. Or more generally thus: Since the P of any cell ,,, with reference to any cross point (st) is invariable it may be written generally as ,,P,,. This notation gives up the recognition of the equality of the P’s in any given quadrant but gains in generality. The quadrantal suffix and sign may be supplied by inspection. We have then the following lemma: Ssv' = B (8P5P ye) = PsP oe Mar + oP ster se Mar + + 12Pster2P 9 Mie + = >; (es Pasi Nim) Ce i ee (100). lm The standard deviation of 7, may then be developed as follows: (UC1) dr =X (Cor) Y Ay sf SP.) +25 Gee SPP.) XstXs't' C54\2 a Oban = [{s (=) Unessie alg 2m Cs Cow al ee Peet Nim ln Xst st Xst Xs't! More fully written this is C (2C,1)?o,? ae (= eden x1 eee oP a nleae + ) Nyy 1 Ca 5, Cla Coenen " ai waa + rf yo + TE aay ae Goal ecoreate o50 ooo (018), Xx X12 X13 11 A. RitcHte-Scorr iLalive §9. Tue Stanparp DeviaTIoN or POLYCHORIC 7 IN SPECIAL CASES. The value of the standard deviation when r = 0 is of interest and may be got as follows. 
Assuming + = 0 throughout: As aa € (ky) = WV? By, ae =€& (hs )= oF 2 and writing m’,. = N — m,. and m’., = N — m.,, then M,.M. Gigs -F, ie ‘ , ite Pst = Ya "Se a m;.m’. cleat = 119). ee alae M'1.M’ 44Mg.Meg ( ) ee) With these values we have 1 dy he G1 G2 Gr IE Be. |. on €€' € ena Aooon is symmetrical with respect to the centre of the square, hence* lte!’ ete | |l—ece’ e—€ ; N2 Noo = ie =<) (15 9)" eae ete oe bet cell ile hen le ee” ae a); 1’ 4Mg.Meg Pees, (121). The remaining minors are easily reduced and we have after reduction and sub- stitution don ay ene (n, — ,) St (a — ah) °2 on mM,.M'y. My. m.yM’ 4 BPR eo udiso (122), Aooie ea <2) N5 Eee (H, ak, H.) Sy (Ce K,- K,) O12 M,.M'y. Mg. Mgt" oo \M oy woe nine aces (123), Ao (1 — 62) (1 — 9) 2, & jee i1,) se (K, - 7 Ks) O21 Ms.M'5. \M 4. MyM 4 Mag eth emcee (124), Aooee = eee) Hs) ae H, H,) Gy (A IK K,) Sop M,.m',. \M'. Mol’ co NNW 2 deseswereen (125) * See Scott and Mathews, Theory of Determinants (2nd ed.), p. 89. A. Rirconte-Scorr 119 Summing these four latter expressions we have ide 2H,H, | H,? (N= Neogg = (= <2) = 2) NO 0000 ( )( ) Ms. Ms.1,. MyiM’s. RN en Oa Bae ie = l Mciibioee (eT te DSA anos : ries Aoooo Bae A AN 0000 — OPE LOG TELM LLL 3 He Nin eee |. | Keaqm ok Ghar is 7 ey, 7 boas, 7 MGM ag? MM A 8 Msg oo sah i Wane Masa © My... My. Mey Ny \He LSS Het Hoe x ky? Ee eR Re art ieee mM’ 5 bee ite ) When the table is symmetrically divided in both categories and + = 0 we have M,.=m',., H, = H,, K, = Kg, etc. and the above reduces to Neo Neo M1 oS Se NeH?K? . 4 (“2 — 1) (2-1) 4N2 2 Kee: M2 4N°H* K Ny. M4 M1. a * VN 9 and oS a rae con rede eed a caeenee (128) § 10. CoMPARISON OF THE STANDARD DEviaTIONS OF PoLycHoRIC r AND ENNEACHORIC fr. We may now compare the standard deviations of 7, and 7p. Se Nagi Xo X20) O% 6 — OL ay = OP tg — OP ay + OP op. coecesssecosees (129), C I C \ C C 5 Cu 5P Che § Cay Cx (Cy + Che + Coy 22) Cas 11 Pr» OP o, + — SP o. 11 X12 X21 X22 from which it appears as before (p. 
108) that the enneachoric r is equivalent to a polychoric r in which the weights of the 7’s are x41, — X12, — Xo15 X25 Le. — X1u%11 — X12%12 — Xo1% 21 + X221 22 Xia > Nazis Net + Nee Hence also the standard deviation of the enneachoric r may be written OP = (= aPu t+ aPie + aPan — oP og)? My + Cte. ei... (132). Upon expansion this reduces to o on le ale ele N (Ay al By oo 1) ale a 12 a N (Aqs als By» = i)| ae ete i aaleon a V (Aan | Ba — t) = oho, +N (A, + Bs — 1)) ™ , ue Le {(Ags a Ay) (Ajp A,,) (By, ay le a etc as (ee ere hy (Oeil sey ee or P,,—Py,—P jee) ae = lw {a — O42 + Boo — Bar a ue N ai a Ny, + ete. ...(133) (using a, 6 in the sense of p. 99), which is identical with the corresponding coefficient in o,? as given in equation (30). Ve 120 The Correlation Coefficient of a Polychoric Table § 11. FormMuLAE FoR CoMPUTATION. The forms found for S are not convenient for computation. The following have been found more expeditious. Various other formulae are also collected for reference : N he+k? —2rhyky = Nz (h,, k, 1) = ——— . 202) 3 a 134), Xst ( s t ) ov 1 rake ( ) k,— rh Ab oe e ( t : :) ; V1 — h, — rk Bee (=) ‘ V1 — 9? I. = Ag, oF Be = le Pi. = Aun. + Ban. — Me when no quadrantal suffix 1s used @ is understood. P 2 a 2 2 2 11 FR Sain a, Ts Ay ++ Ay Cy + By Or ee . eee (135), and similarly for S,..32, etc., Syew= Wa Wis + By Bye (Me + M1) + Ay yn Pu Pr. eR ee AnHcdt 6.000503 136), + Ay, Ay2N45 N (138) Syy-o1 = TT, 1%) + By We) %1 + By Buy N31 + Ay Tyg. 
a Py Po eb ee No cesses buona) (137), S11 +99 = Th, Toga + By Wop %1 + By BooM31 + 4 Toate ii Bee oo (138) Te ee tien | 3 Syo-01 = Th yo Tey M4, + By W111 + By Boy N33 + The Ante + By Aoi tee = Py2Po es ae (139) ee No cites ; Sie-2.2= — The les (a + M42) + Byy Tog (M21 + Me2) + Bye Bog (M31 + N32) {Pres Ph + Ay, Aos%13 — aa iene eiecceteoes oa ae epee (140), Soy +99 = TT 5; Los (141 + Mex) + By ByyNzy + Ag, Hos (Mia + Mae) Tals) (141), + Ay Age (M3 + M28) N Xst° Fst” a Spans | VSie oy = Xst XstX se Fst Feit Retest! => Sgtes't! erololeleteteletetervicreletoisveleteteteterels (142), Sh site A eet Te XstFstXs't' Fs't! Doineies Ue ena a ae (143) VSien Sy os't' A. RitcHIE-Scorr 121 In place of calculating o and R it will be found easier to employ the S’s directly by writing the equations for C in the form Geen LO Sire ny! Sirs aie =} 3 12 13 a 7 ee a er eae (144). @,, pha gy, Piao, ee ze X11X12 xa X12” 8 xieX18 Eliminate the A by subtracting one equation from each of the others; put Cj, = 1 and solve by successive elimination for the remaining C’s. This is preferable to using the determinant as it is at least no more laborious and lends itself to various checks for accuracy. The A should be determined from each of the equations as a further check. Then we have Xr By putting Ch, = Xn, Cre = — X12» Cor = — X21, Co2 = X22, We may derive the enneachoric standard deviation from the polychoric 7 in a form convenient for computation in terms of the S’s, (X11 — X12 — X21 + a2)? Ore? = Surear + Syo-12 + Sor-01 + Soa-02 + 2 (Sir-22 + Sio-21) — 2 (Surerz + Sir-or + Sia+22 + Soi-22)-++ (146). § 12. CoMPARATIVE RESULTS OF VARIOUS METHODS OF FINDING 7 FROM A 3 xX 3 TABLE. In testing the methods developed in this paper upon actual material it was thought desirable to try them side by side with all the other methods of finding the correlation coefficient so that some indication could be got of their comparative accuracy. 
Each of the tables was therefore dealt with by nine methods which are indicated in § 13. These tables were selected at the beginning of the investigation, and had the course which the research has taken been foreseen probably a different selection might have been made. Two of them, I and III, are normal tables with an arbitrary population of 1000. In Table I the frequencies have been taken to the nearest integer and in III to the nearest two places of decimals, so that any irregu- larity in them is due to the roughness of the approximation to the true figures. In the 7,, we have an additional lack of approximation in taking 7, from the curve* for determining 7, and also in r,, ry and r, from finding the class index correlation from a small number of marginal groups. In IT and IV we have actual samples. A rough test of the value of the various methods may be made by finding the mean square deviation of the calculated from the “observed” value of r, each constituent being merely weighted with its total frequency, regarding the product moment values of 7 as the “observed” value. Thus let ,, v2 = total frequency in Tables I, II; R,, R, = product moment value of the correlation coefficient in Tables I, II; 7,, 7, = correlation coefficient calculated by one of the methods, then writing (my + Ng + ...) 2? = my (Ry — 174)? + Ny (Ry — 12)? + «.., we shall have in &? a measure of the goodness of the various methods. This gives the following values of &?. * Tables for Statisticians and Biometricians, p. lvii and p. 65. 122 The Correlation Coefficient of a Polychoric Table Mean weighted square deviation of calculated from “observed” or product moment values of r. = >2 (omitting H) Mean contingency Bee see eet ty -00138 “00102 Mean square contingency ute nao. 2 -00089 -00060 Enneachoricr ... 
as ale aon eae 00364 -00036 Polychoric r ds ste Ane sod Ry “00004 -00002 Tetrachoric r aoe as ace sone oe -00018 -00016 Mean tetrachoric r ae Sd eo MeaT -00005 -00003 Mean weighted tetrachoric 7... So) PH “00002 -00002 Three row n from mean dispersion* ... nm -00020 -00019 Three rowy from “individual” dispersion 7, ‘00151 00144 Marginal centroids a a oe 13 -00215 00255 I have given the value of X? including and omitting Table H, which gives very anomalous results, as yet unexplained. Broadly the best results are given by 7,, 7m and r,,, and, Table H aside, the best result is by r,. In the case of r, the results are not quite satisfactory. The figure given was arrived at by taking the mean of the raw figure from the curve and the same corrected for broad categories as suggested in Tables for Statisticians and Biometricians. An attempt was made to find an empirical formula which would give better results with the tables here de- scribed, but the result was not worthy of record. With three row y, although strictly the method is quite inapplicable to 3x3 tables, it may be useful to notice that when so applied the best results on the whole were got from assuming the distribution to be homoscedastic and using the mean dispersion of the arrays. This was largely due to several of the tables being divided so that some of the arrays contained very small frequencies which had therefore large probable errors, giving an undue effect on the result when squared. When such small frequencies are avoided the results appear to be about equally good. Of course our theory fails, as we have already pointed out, when any cell frequency is of the same order as its variation. Comparing the probable errors of 7,,, 7,, and 7, (tabulated for convenience in the Appendix on page 133) it will be seen that on the whole they are in descending order of magnitude. 
They differ very little from each other and, considering the labour involved in finding 7,, 7,,, would in most cases give a result with a sufficiently low probable error. The method of marginal centroids as already known is unsuited for tables with so few categories. An interesting and important relation which is not shown in the tables of numerical results (§ 13) is the degree of correlation between 71,, 712, et¢., Viz. Riz. w, Ry - 21, ete. These are collected in the table on p. 123. All the enneachoric tables are arranged so that reading from 7,, to the right, and downwards, r is positive so that the values R may be compared among each other. It will be seen on examination that Ry, . 42, Ry. 21, Rig. 22, Rey. 22 are on the whole ereater than Ry, . 9, and Ry. 9; and of the two latter R,, . 9, is usually the greater. * See § 13, 8. A. RivrcHiE-Scorr With regard to the computation of r, it will be seen from example appended to Table A that the amount of labour involved in dealing even with a 3 x 3 table is considerable and will rapidly increase with the number of cells, and it is very desirable that some short method of approximating to the weights (C’s) of the 7’s be devised. For the present it may be of interest to give here the C’s for the various tables used. Ry 1. Ry +o Ry. 28 A +3418 3041 -3196 B ‘6977 -6040 -6132 C “4880 *4550 4751 D -5100 -5678 -6038 E -2180 4953 -5378 F 4252 -4422 -4133 G 6633 6723 -6565 H 4395 +6592 +6567 K +3842 +3488 +3656 L 5014 -4831 -4798 M -5282 -2820 -2961 Pairs of brothers -8203 +8203 °8732 Ray. 22 Ry 22 Ryy-a 3450 -0608 -1466 -7050 4048 5231 4821 -1872 2614 4907 2620 +3221 1813 0830 1217 +4307 -1209 +2392 “6701 3414 -4769 4367 -2391 +3219 3882 -1124 -1632 +5162 -2276 2611 +5213 0865 -2128 8732 *8837 *8756 Cy Cy. Cx Or. 
A I -51333 -38587 ‘71892 B 1 -26872 *19654 55240 C 1 27000 26993 -34266 D 1 *65737 -15786 85445 E 1 3°79397 -16186 3°81496 F 1 56452 -43667 *72594 G 1 —-02973 —+13835 -69769 H 1 -72919 33838 -79708 K 1 -59665 -48857 -66355 L 1 -39951 34626 -27119 M 1 -36146 -38718 -74467 Pairs of brothers 1 — -26399 —+26399 -82793 The case of Table G, with negative weight for 7,, and r,, is suggestive and needs further study. The table has the characteristic that the mean is in n,, and the marginal frequencies are decreasing in magnitude and nearly equal in both sets of categories. The table ‘ Pairs of brothers” which is accompanied by similar weights is taken from Biometrika, vol. 111, 1904, p. 182, and is given below. It compares the athletic capacities of pairs of brothers. Second Brother First Brother Athletic Betwixt | Non-athletic Total Athletic 906 20 140 1066 Betwixt 20 76 9 105 Non-athletic 140 9 370 519 Total 1066 105 519 1690 124 The Correlation Coefficient of a Polychoric Table 11 = °8046 + -0126, Ty = °7190 + -0162, 1, = °7190 + -0162, To» = 8028 + -0132. Cy=1, Cy = — -26399, C,, = — -26399, C,,'— -82793. tf, = °8382 + -0122. These negative weights require further investigation, particularly the conditions for the existence of zero weights, but it is clear that certain divisions are to be avoided in determining 7 from a 3 x 3 table. | On the whole C,,, Cy, Cy,, Cy, are in descending order of magnitude. § 13. PricIS OF THE METHODS OF FINDING THE COEFFICIENT OF CORRELATION. 1. vy. Mean contingency, corrected for class index correlation. 2. ry. Mean square contingency, corrected for class index correlation and where necessary for the number of cells. 3. 1,. By selecting the central cell, the method first described in this paper. As its use treats any table as virtually 3 x 3, it may be called enneachorie r. 4. r,. By weighting the 7’s so that the p.z. shall be a minimum, the second method described in this paper. 
As it is applicable to tables of any size it may be called polychoric r. 5. 415712) %21, 22. Tetrachoric r of the various quadrants. The probable errors were calculated by the complete formula (p.z.) and also by the approximate method (a.P.E.). (Lables for Statisticians and Biometricians, p. xl.) 6. %m- The unweighted mean of 73,, 712, 721; 729: 7. % . The mean of the 7,,, etc., weighted by the reciprocals of the squares of their standard deviation. CFs Heb Mesctinpene Three row 7 calculated from each of the dividing planes as planes of reference with a class index correction on the foot of the columns. Since the standard deviation may be found in this case from the individual arrays or, assuming the distribution sufficiently homoscedastic, may be given the. mean value o V1 — 72, I have used both methods for the purpose of comparison. These are distinguished by the headings “individual dispersion” and “mean dispersion” respectively *. 9. +,. By marginal centroids. The probable error of 7,, and 7,, was obtained as follows: Let the correlation coefficients 741, 742, -.- have the s.D.’s as Ghep oo and the weights eas Ging a * The probable error of Biserial (or three row 7) has now been given (Biometrika, Vol. 1x, part Iv), but too late for use in the present paper. A. RitcHte-Scort 125 Then r,= 2hirin (dr)? = Datyy” (dry)? + 2De yy tye dry, dry. Lt, ‘ (Xt11)" ; ES Lt? oy" + 223 l12011 012 Ry ELI ot Piel 2 ao (147). 4 (ty)? When t,, = ty). = ... = 1 we have the mean 7, 7,,, and if there are / 7’s 242 5 Gee aa oe gD mee Meee Laat (148), which for convenient computation may be written Sy Sie 1 9d Sut a Xu" X11 X12 = In finding the mean weighted r we may regard 7,, as the mean of ¢;, uncorrelated . 2 i) ; a ey values of r of equal weight each having the s.p. 09. Hence oy, = Wa and 14; = 11 ie. the weights are proportional to the reciprocals of the squares of the s.D.’s. _ Putting this value in (147) we have Cie) § 14. DETAILS OF TABLES AND SUMMARY OF NUMERICAL RESULTS. I. 
The first table examined was taken from Pearson and Heron’s paper “On Theories of Association,” Biometrika, vol. 1x, p. 220, Table XIV, and is a Gaussian surface for r = -5 adjusted to give whole units in the cells. l l 1 2 Sue eo 4 De One teh 8 | Total 1 i 20 5 2 2 — 34 | 2 21 145 79 36 10 9 1 301 | 3 6 94 85 D4 19 22 4 | 284 4 2 32 39 31 12 17 4 137 546 | — 18 28 oor ie pall 18 5 105 7 = i 22 24 12 22 q 98 8 = 2 Gapiposse oe |= 13 7 41 | Total 36 | 322 | 264 180 69 | 101 28 | 1000 | | | I | The frequency in heavy type contains the mean of the surface. A. Table I divided so that the mean falls in cell n,.. 1+2 34+4 54+6+4+748 Total 1+2 193 122 20 330) 3+4 134 209 78 421 5+64+7+4+8 31 1183 100 244. Total 358 444. 198 1000 126 The Correlation Coefficient of a Polychoric Table ty = 482 r, = 4840 + -0170 A.P.E. Ty = 50346 + -02094 \ 74, = -498 + 02872 (-0290) r, = 48594 + -04918 To = -510 + -03210 (-0303) Tm = 5050 + +0246 [ 1, = -508 + 03505 (-0321) %, = 5045 + -0211 Too = °504 + -03259 (-0340) r, = °5145 Mean Individual dispersion dispersion hy 5031 4950 Ns “5057 4975 hy 5045 -4955 Nhe 5058 4949 I here insert as an illustration of the new method the constants required in finding r, for the above table, and the calculation of S..;, Py, and II,,, and the equations to find the C’s. 
Table A hyk, lyk, hyky hake , 498 510 | +508 504 h —-36381 ~-36381 -84879 84879 rh ~-18118 _-18554 43117 42779 Eh 3733945 3733945 2782707 2782707 k _-42615 69349 — 42615 -69349 rh ~.21222 35368 ~ 21648 34952 Ek 3643145 3136735 3643145 3136735 Reva ~-15159 ~-71749 1-06527 49927 h—rk Ss _ ~-1748086 ~ 8341218 1-236735 -5780570 h-rk Ey | 3928931 2817266 1856865 3375594 h-rk B=€5—— 4306151 -2021063 -8919072 -7183872 ese ~-24497 -87903 ~-85732 -26570 k-rh aa ~ -2824913 1-021920 ~ -9953132 3076287 k—-rh i, 3833377 -2366676 2431048 3805048 k—-rh A=€7—, 3887835 8465906 -1597920 -6208174 X- 165-0602 102-7353 78-53720 122-5923 I —-1806014 0486969 0516992 3392046 P 90-44050 128-8716 111-9422 383-0020 ee -001812696 000692554 -0006728030 -0001333663 SyalX 2X - 002265285 (003626002 0007349840 Ribose = a 002700292 0008662932 So2/X xX = = — 002335123 C: 1 51333 3859 71892 * The suffix : indicates that appropriate suffix is to be taken from the column, A. Rrreui-Scorr 127 Calculation of Sy2.01- Tl, 0486969 Bis -2021063 TI,, -0516992 An -1597920 1 485895 Me «209 -74.9650 Bes -2021063 Ay -8465906 TI,, 0516992 A -1597920 Mm 122 1-274745 a ae Ga 4-193630 By 2021063 17-351825 By "8919072 P,, 128-8716 me 20 Secon Py 111-9422 a eo 14-426170 IL. 0486969 2.925655 An _—_*1597920 ag 107353 My, 134 1-042704 Xen 78-5370 + 8068-543 = +()003626002 Calculation of Py. Calculation of Ih,. A,, _°3887835 Ay, 8887835 m,. 358 139-1845 By *4306151 B,, 4306151 -8193986 m.1 335 144-2560 1 283°4405 IT,, = — :1806014 My, 193 Pu = 90-4405 Equations to find C, °001812696C,, + -000692554C,, + -000672803C,, 0006925540), + 0022652850, + -000362600C,, -++ --000672803C;,, + -0003626000,, + -0027002920,, + -000866293C'x. = A, :000133366C,, + -000734984C,, + -000866293C,, + -002335123C,, = A. The solution of these equations gives the first row of figures in the C table on page 123. -000133363Co. = A, 0007349840 = A, B. Table I divided so that the mean falls in cell n,,. 
14243 4 5+64+7+4+8 Total 14243 462 92 65 619 4 73 31 33 137 5+6+7+4+8 87 57 100 244 Total 622 180 198 1000 128 The Correlation Coefficient of a Polychoric Table ly = 533 rg =°510 + -0161 A.P.E. ty = 5007 + -0250 ) 71, = -499 + -028 (-028) r, = 5073 + -1487 | 7. = -501 + -030 = (-030) %m = °5010 + -0254 | 72, = -500 + -031 = (-032) Ty = °5008 + -0253 } eo. = -504 + -033 = (-034) vr, == °5445 Mean Individual dispersion dispersion Ne, 4921 4873 Nea “4917 -4763 Th, -4858 -4802 nh, 4881 -4671 C. Table I divided so that the mean falls in cell »,,. 14+2+3 44+5+6 7+8 Total 14+2+3 462 121 36 619 44+5+6 119 79 44 242 7+8 4] 49 49 139 roca ; 622 249 129 1000 ly = 537 ry = °5183 + -0202 A.P.E. Tp = 4988 + -0241 \ 71, = -499 + -028 (-028) r, = °5055 + -0652 | 7,, = -490 + 035 (-035) Tm = *4985 + -0253 | 7, = -505 + -035 _ (-035) Ty = °4973 + -0311 Too = 500 + 039 = (-043) 7, = °5480 Mean Individual dispersion dispersion Ney 4895 -5041 "ky 4934 4615 "hn, -4846 4730 ns -4947 4496 D. Table I divided so that the mean falls in cell 14». 14243 4 5+64+7+8 Total eo 277 38 20 335 3 185 54 45 284 4+54+6+7+8 160 88 133 381 Total 622 ; 180 198 1000 EH. The mean is in cell ny. | A. RitcHIk-Scorr Mey Nhe hy Nhe 0631 *0235 2330 + :0239 0236 741 = ‘501 + -030 Te = -499 + -028 Loy = -508 + *035 Mean Individual dispersion dispersion -§211 -4798 *4885 -4907 -4956 -4709 “5115 +4553 Table I divided so that the frequency of ,, differs very little from the frequency of a table with the same marginal frequencies but of zero correlation. r he Here 142 3 44+5+6+7+8 Total 1 27 5 34 2 166 79 é 301 34+44+546+7+8 165 180 320 665 Total 358 264 | 378 1000 301 x 264 : ; é 000 79-464 so that the constant term in the equation for r is a small quantity and any error of sampling will have an excessive weight. It will be found as one might expect that the p.n. of 7, is very large. 
The very large value of 7, is due to the column having the marginal total 378, for the frequency 2 in it is the nearest whole number to a true value and being so small, a small absolute difference makes a large fractional value resulting in a large difference between the true and apparent standard deviations of this par- ticular array. Actually the method applied is inapplicable to a frequency of this order. Vy Vg _ Biometrika x11 502 -4827 + -0246 4991 + -0245 > 4658 + -4371 -4995 + -0327 To = -4995 + -0247 } ry = 5065 Ny UL) hy Nh Usk UG} Mean dispersion -4862 -5169 -4950 -4966 500 + 056 -498 + -029 500 + -072 500 + -030 Individual dispersion -4906 -4991 +5022 ‘7150 130 The Correlation Coefficient of a Polychoric Table II. The second table examined was taken from Macdonell’s paper “ On Criminal Anthropometry,” Biometrika, vol. 1, p. 216. The original table is too extensive to be given here, but may be found in loc. cit. The horizontal categories are the heights of 3000 criminals in feet and inches, and the vertical categories the lengths of their left middle fingers in millimetres. The correlation coefficient found by the product moment method is -6608 + -0069. F. Table II divided so that the mean falls in cell nyo. | 55,%,”-64,%.”” 64,9,/"-66,9,” 66,5,”-77” | Total * | 9-4-11-3 mm. 682 270 101 1053 11-4-11-:7 mm. 282 351 286 919 11-8-13-5 mm. 90 299 639 1028 | Total 1054. 920 1026 3000 ry = °6635 t, = 0170 4-000 A.P.E. r, = 6544 + -O101 \ ry, = 667 + 013 (-014) r, = 6316 + -0301 | 7,. = -670 + -014 (013) Tm = 6530 + -0101 [ 7, = -644 + -015 (-014) r,, = 6538 + -O101 } rag = 631 + -014 (-014) Ooi Mean Individual dispersion dispersion "hy -6477 -6295 Nky -6548 -6306 nk, -6647 6151 ho 6345 -6510 G. Table II divided so that the mean falls in cell n,,. 55%,”-65,%," | 65,%,"”-66,%," | 66.9,”-77” Total oe el = a - r os = 9-4-11-5 mm. 1122 176 216 1514 11-6-11-7 mm. 191 96 171 458 11-8-13-5 mm. 
203 186 639 1028 Total Tst6n a das 1026 3000 Vy = “731 ry = 6426 + -0077 A.P.E. t, = °6613 + -0108 \ 7,, = 680 + -012 (-013) r, = °6808 + “Ne | 11. = 668 + 013 (-013) + -0111 Lee 642 + -014 (-014) ry = 6573 + -0112 } 9, = -631 4-014 (-014) 7, = (182 = 3 I a Or OO eo ! A. Rircut-Scorr Mean Individual dispersion dispersion 6553 642] 6473 6546 “6657 -6669 6277 “7113 H. Table II divided so that the mean falls in cell 15. 131 55 yp/-654%;"" | 65;%4"-66;5" | 66y4"-77" | Total | 9-4-11-3 mm. 840 112 101 1053 11-4-11-7 mm. 473 160 286 919 | 11-8-13-5 mm. 203 186 639 1028 Total 1516 458 1026 3000 ry = -669 re = 6172 + -0088 A.P.E. ry = 6479 + -0107 \ ry = -648 + -014 (-014) r, = -5162 + -0438 | 7, = -668 4-013 (-013) 1m = °6478 + -0108 [ ro, = -644 + -015 (-014) ty = °6591 + -0107 } rap = 631 + -014 (-014) r, = °7920 Mean Individual dispersion dispersion Ny 6539 6279 No 6472 “6156 Mh, -6479 6103 Nho -6345 6418 Ill. The third table examined was taken from Pearson and Heron’s paper “On Theories of Association,” Biometrika, vol. 1x, p. 219, Table XIII, and is a normal surface having r= °:3. decimals were used. The values of the frequencies to two places of 1 2 3 4 5+6 7 8 | Total 1 4-04 17-16 7:55 3°30 0-91 0-92 0-12 34 2 17-41 123-59 79-76 46-64 14-61 17-67 3:32 301 3 8:86 93-00 78:31 52-04 19-20 26-40 6-19 284 4 2°83 37-73 37-24 27-51 10-95 16:31 4:43 137 5+6 1-62 25-21 27-75 22-09 9-26 14-64 4-43 105 Z 1-02 19-50 24:47 21-39 9-58 16-36 5:68 98 8 0-22 581 | 8-92 9-03 4-49 8-70 3°83 4] Total | 36 | 322 264 180 69 101 28 1000 | 132 The Correlation Coefficient of a Polychorie K. Table III divided so that the mean falls in cell np. Table 142 3+4 5+6+7+4+8 Total 1 49) 162-20 | 135-25 | 37-55 | 835 34+4 142-42 195-10 | 83-48 42] 5+64+748 53°38 113-65 | 76:97 | 244 Total 358 | 444 198 | 1000 ry = +3088 Tg = °2960 + -0199 A.P.E. 
Tp = 3000 + -0246 )\ 7,, = -300 + -033 (-034) 7, = °3000 + -0991 | 72 = 300 + -036 (-035) 'm = °3000 + -0249 } 11 = °300 + 038 (-037) Ty = 3000 + -0247 J r.. = -800 + -038 (-039) fo — 3025 Mean Individual Mean Individual dispersion dispersion dispersion dispersion A “3011 +2988 7h +2998 +2997 Nie -2999 -3003 ie -2989 2989, L. Table III divided so that the mean falls in cell n,;. | 14+2+3 44+5+6 7+8 Total 14+2+3 429-68 134-70 54:62 619 44546 132-38 69-81 39-81 242 7+8 59-94 44-49 34:57 139 Total 622 249 129 1000 Py => °330 | ry = °3095 + -0225 A.P.E. ry = *8000 + -0282 ) 7,, = -300 + 032 (-033 r, = 3000 + -1031 Tyo = -300 + 039 ( = -3000 + -0326 15, = °300 + -040 (-041 Fy = +3000 + -0285 } reo = -300 + 046 ( r, = “8152 Mean Individual dispersion dispersion Nk, +2965 -2975 Ne -2910 +2898 Mh, +2953 +2975 Nhe +2909 2871 IV. The fourth table examined was from Pearson and Lee’s paper “On the Distribution of Frequency (Variation and Correlation) of Barometric Heights at Divers Stations,” Phil. Trans. A, 1897, vol. 190, p. 453, Table IX. The original A. Rircute-Scorr 133 table is too extensive for reproduction and may be found in loc. cit. A condensed form of it will be found in Biometrika, vol. 1x, 1913, p. 223, Table XVIII. This was selected as an example of a very skew distribution. The correlation coefficient found by Product Moments is :780 (Biometrika, vo]. 1X, p. 223). M. Table IV divided so as to give a reasonably large frequency in the cell gp. The mean falls in the cell ny,. | 30-1” and over | 30’’-29-8’” 29-7’ and under | Total 29:9’ andover | 1086-5 | 412 43 1541-5 29-8”-29-7”” | 144-5 275 103 522°5 29-6” and under | 56-5 323 478-5 858-0 Total | 1287-5 | 1010 624-5 2922 fy = 189 Tg = -7504 + -0210 A.P.E. ty = *7864 + -0077 \ 14, = +780 4-010 (-019) r, =*7745 + -0151 | 74. 
=-787 4-011 (-011) im = *T8TT + 0078 f 75, = -795 + 012 (-011) ly = *7858 + -O0077 To = -785 + -O11 ( 012) r, = 8770 Mean Individual Mean Individual dispersion dispersion dispersion dispersion Ne, *7857 ‘7116 Nhy -7962 “7417 Nhs -7812 -6951 Nhe “8065 6841 Appendix. Probable errors of 7m, 7», p- P.E. of P.E. of | P.E. of | arithmetic weighted polychoric r | mean (7) mean (7,,) (7) ie A | -0246 0211 0209 B 0254 0253 -0250 | C 0253 ‘0311 0241 | D 0239 | -0236 0235 E 0327 0247 0245 iF ‘0101 0101 ‘0101 G ‘0111 “0112 ‘0108 | | H -0108 ‘0107 -0107 | K -0249 0247 0246 L 0326 0285 0282 M -0078 0077 0077 My thanks are due to Professor Pearson, who suggested the enquiry, for his ever ready help and advice throughout the work. I have also to thank Miss Alison Robertson for assistance in reading the proofs. ON A FORMULA FOR THE PRODUCT-MOMENT COEFFICIENT OF ANY ORDER OF A NORMAL FREQUENCY DISTRIBUTION IN ANY NUMBER OF VARIABLES. By L. ISSERLIS, D.Sc. 1. In Biometrika, Vol. XI, Part III, I have shown that for a normal frequency distribution in four variables, if Pxyet = SSSS {Newt xyz} /N zye2et denotes the product-moment coefficient of the distribution about the means of the four variables and q,,,; 1s the reduced moment, 1.e. Vayet = Diyat| Ga Oy Or One then Veiyet = Tay Vee Vie lnk oe Te tae oe enaenc onsen ote eee (1). In this result any two or more variables may be made identical leading to a variety of results for moment coefficients of distributions containing fewer than four variables but of total order four, for example identifying ¢ with « we obtain Gsejg Vga DT ey Neg 0 Soke Tocca eee (2), and putting y=z=t=¢@ we find g,:= 3; of course q,, = 1,, and q,: is merely By. I suggested that (1) was probably capable of generalisation, and I now propose to prove a general theorem which gives immediately the value of the mixed moment coefficient of any order in each variable for a normal frequency distribution in any number of variables. 2. 
Consider a normal distribution, total population N. Let Ny... denote the frequency of the group in which the characters differ by 7,, %, ... 2, from the mean values for the whole population and let Pit ole pln = S (Nya. nha 2” --- 0) [Nooo nce soe ate (3), denote the moment coefficient of the most general kind about the mean values of the characters. The corresponding reduced moment will be =e; l 1, l G5, la ete = Palisa (ita) Ta Gees ny, etree ae (4). Then for normal distributions, It: beodds: gait, On ca nea ase ts eee a eee (5), andi if.m beeven,. “Gist n= 0S Canlica cena) eee eee (6), where the summation on the right-hand side extends to every possible selection of n/2 pairs ab, cd, ... hk, that can be formed out of the n suffixes 1, 2, 3, ... n; equa- tion (1) is thus a particular case of (6). Equation (6) is the theorem it is proposed to prove. The value of qyhol.., nln is at once found for given numerical values of the indices 1,, l,, ... |, by writing down (5) for 1, + 1,+...+1, variables and identifying the values of 1, of them with that of the first and so on. L. ISsERLIS 135 For example if we require the value of 9,252.2 we commence with 12223 ; qizsase = S (Tap cal es) = 112 (1'3a1's6 + 135746 + 136745) + 113 (724756 + 1257 46 + 12675) + Tq (193756 + 1957'36 + 26735) + 115 (723746 + 724136 + 126734) + "16 (Tos Tas ++ Toq l35, + Tos tsa) Cece recor cre rcccee sve rer erereeseseseeeeese Gi: Identifying 4 with 1, 5 with 2 and 6 with 3 we find at once Gig? tei te aege oe OTigghe 1 Ol plage «ve es enen dss saaus (8). 3. We note first that q. which in the more usual notation for distributions in one variable is p/p.” is known to have the value 1.3.5... (m — 1) when n is even. As regards 8 (rgy?cq--- Tnx), Hf all the n variables are made identical, each term becomes unity and the number of terms is the same as the number of ways of break- ing up an even number (n) of objects into (v/2) pairs. This last number is clearly n! n—2! 4! so Be rey OPA 2!n—2!2!n—4! 
°° 2!2! which also reduces to 1.3.5... (n — 1); thus equation (6) is correct for this par- ticular case. Secondly let us consider the value of qn-1.. The mean value of 7, for a given value of 21 18 1y902%,/0;, let o Ly = Typ at, + Xp. O71 Then the distribution of X, fora given value of 7, is itself normal and its kth moment is zero for an odd & and 1.3.5... (k—1) (05) for an even & where ,o, is the standard deviation of 2 within the 7, array so that paras ae 1027 = (1 — 742°) 07. i Qx"-1g = —y-y7~ Mean value (2,""1 2) Cy Op 1 H Gp 2 = = Mean <7," Mean (7,,—@, + X, oy" 02 oO; aT a sea leteoie ares (20 — Bl Vine eatec 20 UC ci wineras cee veces ote (9); The method employed in the original proof of equation (1) is not convenient for generalisation and we will now prove the equation Yio3a = 112734 1 13% 2a + yal 23 by the method that leads to the general case. O oO r Putting as above Le = Tig — yt Xe, Oy op. fy = 113 —~ 11 + Xz, Onl 136 High Product-Moment Coefficients we have 594 — Mean of (012,25 %,) = Mean of {x, (Mean of x, 7,2, for a given value of 2,)} = Mean of E {Mean of (1 ee X,) (m1 Stee X,) (ru S42, + x,)t|. rl O71 O7 Now for normal distributions (and if the original distribution is normal, so is that within the x, array), Mean X,=0, Mean X,X,X, = 0, while Mean X,X3 = (109) (103) 1723 — 93 (123 — 112118) i ‘ wana =V1 = 12 0,V1 — 32 : 12 92 18) j= Vise N/a 113° = (Tog. — 19715) CoOg. + 07 or dividing by o,020304. Yiosa = M127 1371409 + Ge (12 ("3a — 113714) + 713 (12a — VM 127'14)} + 11a (723 — 112718) = 110134 + To3?1q + 1147 28> since gz = 1 and qsi=3. Thus our formula is established for the case of four variables. 4. We will establish the case for n variables by induction, and it will be con- venient to denote by 44o34..., the value of the reduced product-moment coefficient for the variables 2, 3, 4, ... n within the x, array so that Mean value of (X,X,... X,) (102) (193) «++ (an) where X,, X3,... 
X, denote as before the deviations of the variables from their means within the z, array. Of course when v is even, 19234 259 19234...n 18 Zero since n — | is now odd. Let n be even and assume that our formula has been proved true for all even values of n up to n — 2 inclusive, then aon = eam (rds Vane) = Mean {ry (m1 s a X,) (r1005 z ap X,) ee (ring a x,)| = 115715 -.- Tin 0905 ... 0, Mean (@;")/o,"=" + 8 {(ryaTwl 10 ---) (GaOpOe ---) Mean (X, Xg)} Mean (2,"-*)/o,"-* +S {(rig%wTie --+) (Cg 0pG, ---) Mean (X, XpX,X5)} Mean (a,"~)/oy"— piaeee +S {71,0 Mean (X,X, ....X,)} Mean (a12)/Gq252-3-- sues sorcese eee (11), L. ISSERLIS 137 the summations in each line extending to all possible permutations of the suffixes 2,3,4,...”. The last line for example being Mean (7,") O71 {74909 Mean (X,Xq... Xn) + 11303 Mean (X,X,Xg... Xn) + «.- 4-47.76, Mean (X_X, ... Xp-1)}. Now we have seen that Mean (X,X3) = (93 — 719713) @203- Similarly, Mean (X,X3X4X5) = (192) (193) (104) (15) (172305) = (402) (103) (104) (105) [(a'23) (245) + (435) (2a) + (a"28) (7°34) = ("93 — 112713) ("45 — Mra 15) + (35 — 713715) (124 — T12714) + (125 — T1215) (73a — 713%); and our assumption of the truth of equation (6) up to (n — 2) variables will enable us to write down the mean value of every product of X’s occurring in (11). Dividing by 0,0, ... 0, we have, remembering that Mean 7,"/o,"is 1.3.5...(m—1) Gree elas. Ty) leo O ... (7 — L) +S {fallic «++ (fap — Nie te)} 1.3.5... (m — 3) Setahitie ee tas — als) (ys —tislis) |} lsd Or..6 (m — D) oe hee ES ftiaS (Gap — tials) (ys — TiyT8) (Ten — Tietip) ---]} 1.205 (12), where S’ refers to permutations of aBy ... only, and S to permutations of all the suffixes a, b, c,... a, 8, y..., 1.¢e. all the suffixes 2, 3, 4, ... n. It is clear that when the right-hand member of (12) is completely expanded no terms can survive which contain as a factor more than one correlation coefficient with suffix unity. 
This is easily verified in simple cases, and if in the general case a term 7015-7 ao --. Survived, this term would reduce to 7,.\ when we identified the characters a, 2, 3,...”, which contradicts the value 1.3.5...(n— 1)”, we have already found for it (equation (9)). The value of the right-hand member is therefore easily found by neglecting all terms containing more than one such factor. Hence on the assumption that (5) is true for all values of m up to (n — 2) we find 9rog..n = S {tiaS” (Tap Tyaep ---)hs but this is exactly the formula we wished to establish for it is obvious that S (apt ea ++» Tnx) Where abe ... k is a permutation of 12 ... n is equivalent to Sag) (aptys «--)} where a, a, P, y ... isa permutation of 2,3, 4, ... 2. Thus our formula which has been proved true for 4 variables is seen by induction to be true in general. 5. Formula (6) can be exhibited as a multiple definite integral: Let A denote the determinant whose Ath row consists of the elements (Tks Yon <=- Th—1, «+ Met, b> -+- tnt) and let A, denote the cofactor of the element in the Ath row and kth column. 138 High Product-Moment Coefficients Aap, Bre Lp, 2 a ah no np Let x >) ( A One T 2D Anz meal and Sa Ter mae a SS (21)? oy 05 ... Gp VA +n fn a then | | | Lu Bohs. By ZAG Ae... 00a — ON tan eae eae) eee (13), where a, b, c, d, ... u, v are the suffixes 1, 2, 3, ... . in any possible order. It is clear that (13) will enable us to write down the value of the multiple in- tegral | Pe- dz, ... dz, where P is any polynomial in 21, 25, ... Z, on Q a positive a quadratic form. In fact, let La, n%_? + 2UA pL pVq, (Aq = Uqy) be a positive, definite, quadratic form, then ee) DQ me) W= we | 4% Po ... Ly MEXP — £ (Uy Ly? + 2UGp 9% pL,) dU dty... OL, 1 fe ie I ANGRY op HOD ae ae Pca ee na nt a Y ane | Nees =O Ops? <2 Gn (227)? Gy 9.25 ..0y VA I L Fae eX — 54 ( EA gy to + OSA, ma) dy. a 2A \ On Oy =D [rev%ea «+» Tar] Where abc... 
hk is any ee of the a, + a+ ...+ @, suffixes of which a are equal to 1, @ are equal to 2 and so on. Let D denote the determinant of the quadratic form and D,, the cofactor of a,, the two multiple integrals will be identical if 22 2 Oe Crear ont CHP AUD) oc ] = O7 Tae f = Gq Oats On Ora ND pee Hence rp.2 = [DyqP/DppDoq and o,2—= D,,/D while A = D°/D,,Dy ... D n sosthiat W- aia “3D, Die. Dean (13), nn? where a, b, ...h,k is a permutation as above, and m=a,+ a,+ ... + @, is even. W =0 when m is odd. As an illustration of this result: ie fe i (Ma? y?z? + Na*yz) f —Ol —O! —O exp — 4 (aa + by? + cz4+ eke + 2gzx + 2hay) dxdy dz 3 / _ (27 Oy gran + 2AF? + 2BG2 + 20H) +27" y eH + AP), A7/2 ~ A512 where A, B, C, F, G, H are the cofactors of a, b, ¢, f, g, h in Perea ee load 20) N= she Bose L. ISSERLIS 139 A cognate result is discussed by Mr Arthur Black in the Transactions of the Cambridge Philosophical Society*. Black’s integral is | Ve-Ydz, ... dz, where V “nn and U are any quadratic functions, the only restriction on U being that it should be essentially positive. Other particular cases have been dealt with in the paper previously quoted, and for the case of two variables several results are given by Mr H. E. Sopert. For reference we add a table of values of the reduced product-moment coeffi- cients that occur frequently in formulae for probable errors and similar work. qui = 3. G2 = 312. Que = 1 + 274)". Ge03 = 23 + 212113. Qus = 15. sg = 1572. Guioe = 3 + 127457. Guia: = I yo + 679°. yes = 3 (M3 + 223712 + 2713712"). Gitog = 3 (P93 + 12749743). Grzo2g2 = 1 + 2rg3? + Qrgy” + 2ryo? + Bri27 23751. Gs = 105, dire = 1051, dysg2 = 15 (6ryQ? + 1). Gaso2 = 15 (414. + 3749). dutos = 3 (8r yt + 2477p? + 38). Ptrog = 1.3 ...A — 1 (199 + AT i217 15): A even. Qyrorg = 1.3.5... (A — 1) 15? P 43 + 113 + W979]. A odd. For the case of two variables we add the following formula which is easily proved by the methods employed in this paper. 
duvar = ap (w+ 0) 7° + (5) (2) ob (w+ 0 — 2) 2 (1 — 28) + (Cb Mb u to sy rr — +t the series terminating. Here b (2m) = 1.3.5... (2m —1) is) Oe) Ol) and m = m! * Vol. xvi, 1898, pp. 219—227. t Biometrika, vol. 1x, p. 101. t This is virtually the formula (xxxii) employed by H. E. Soper, l.c.a. corrected for some misprints, ON THE MATHEMATICAL EXPECTATION OF THE MOMENTS OF FREQUENCY DISTRIBUTIONS. By PROFESSOR AL. A. TCHOUPROFF of Petrograd. INTRODUCTION I (1) One of my pupils, O. Anderson, in a brief exposition* of his researches on- the Variate Difference Correlation Method in Biometrika (1914), draws attention to the superiority of the method of mathematical expectation over the methods usually employed by English statisticians. The small popularity enjoyed by the method of mathematical expectation in England is not of course accidental. ” English scientific tradition rejects the concept of “mathematical probability. From the time of R. L. Ellis and of the first edition of John Stuart Mill’s System of Logic, the logician’s basis of probability has, in England, been the notion of empirical frequency. English mathematicians have followed the lead of the writers on logic in their preference for the idea of statistical frequency, and the method of mathematical expectation has naturally shared the fate of the concept of mathematical probability on which it rests. Notwithstanding its deep-rooted historical basis, English statisticians should break with this tradition. The substitution of statistical frequency for mathe- matical probability does not obviate the logical difficulties in laying the foundations for a statistical study of Causation, but merely shifts them elsewhere. The gain from the point of view of philosophical representation is sufficiently doubtful, while from the purely mathematical point of view the rejection of the ideas of mathematical probability and mathematical expectation is accompanied by very substantial disadvantages. 
Verbal formulation becomes very complicated, leading to loss of economy of attention: it 1s continually necessary to speak of “the statis- tical frequencies which would become established if the number of occurrences were infinitely great.” The absence of a sharp distinction in terminology between statistical frequency in the exact meaning of the term and those quasi-empirical * Anderson’s research was carried out under my supervision in the statistical seminary attached to the Economics Department of the Petrograd Polytechnic Institute; the results he obtained were to have been published in extenso in the Proceedings (Students’ Section) of the Economics Department, but the War drew Mr Anderson away from his scientific pursuits to other work of a more practical character and the complete publication of his researches had to be postponed. Au. A. TcHOUPROFF 141 “frequencies which would become established in an-indefinitely great number of occurrences” often fails to make the very statement of the problem clear to the reader, and occasionally it would appear, to the author: when reading published papers one not infrequently feels that the author does not give himself a full account as to what it is he is really calculating. Little harm follows so long as the problems dealt with are comparatively simple. But at the present time there are problems waiting for solution which are so complex that the slightest obscurity in their formulation threatens to become a source of error in the final deductions. When we start with “mathematical probability ” and “mathematical expecta- tion” as a foundation we substantially simplify the mathematical exposition. The logical analysis of the conclusions to which we are led is not injuriously attected by the substitution of one set of terms for the other during the calculations. (2) If the variable magnitude XY can take the values &,, &,... & with proba- bilities p,, p2, ... px, I call the system of values &, &,... 
& and the values Pr» Po ++» Pe associated with them “the law of distribution of the values of the variable X.” The law of distribution of values lies at the base of empirical “ frequency curves,” just as the mathematical probability of an event les at the base of its statistically established frequency. Denoting by the symbol HX the mathematical expectation of the variable magnitude X, we have as is well known: k EX => ;&:; j=1 z where p=. Me I call the variable magnitudes X, Y, Z,... mutually independent, if the law of distribution of each of them remains one and the same whatever values are given to the others. In this case HX remains constant for all possible values of the variables Y, Z, .... If the law of distribution of X does not remain the same for different values of Y, Z,..., the variables X, Y, Z,... are mutually dependent. The mathematical expectation of the variable Y on the supposition that Y has received the value n;, Z the value €, etc., I denote by E(t S -)X and call it the “conditional mathe- matical expectation of X on the supposition that the remaining variables have received definite values.” It follows from the definitions that B(X4+V+Z+...)=EX+ EV + EZ+... both in the case when the variables are mutually independent, and when they are correlated, and that Ha EXOV7 a...) — (EX CRY). :, if the variables are mutually independent. 142 = Expectation of Moments of Frequency Distributions Lt Y e In the case in which XY and Y are correlated we have: k - EXY = p,£, EY, j=] Je k bs EY = p,E®Y. t=1 1M (1) In investigations in the theory of probability we frequently have to deal with expressions of the type: N(N—1)(N-2)...(N—k+1). Following the example of Capelli*, I use the shghtly modified notation : Pa Agee py a N(N-4Y)\(N 42)... MN +h-1) = No) ee k Let Ne= > y,NC4 i=1 hep fete Shenae octe carmen (2). 
NE = & (-1) J The coefficients I have denoted by a, 8 are beginning to play an important part in the theory of finite differences+ and are of the first importance in all investigations into the law of large numbers. Their properties were first studied systematically in Chapter III of Cramp’s well-known work, Analyse des réfractions astronomiques et terrestres; some of their properties were discovered by investi- gators studying Bernoulli’s numbers; recently they have received the attention of the Italian mathematical school associated with Césaro and Capelli. The methods I employ to solve fundamental problems of mathematical statistics are directly founded on certain properties of the a, 8 coefficients. In view of the fact that I shall later on frequently make use of these methods, I state here, without proof, those properties of the coefficients a and 8 that I shall have to quote in the present paper?. (2) We have: a1. = 1 | Ce tot Gl Cen ln nossa nnhhccnodammnonnisadoaautroonatnpaeBandden jodgb0 5c 200002758 (3), On i= Ope a + Okara | Fe Os eas ChOSN Paes DiGi & Oe ay : Sere a Wer Aa * Vide Capelli: ‘‘ Instituzione di analisi” and the same author’s ‘‘L’ analisi algebrica e |’ inter: pretazione fattoriale delle potenza.” (Giornale di matematica di Battaglini, Vol. xxxt.) t+ Cf. A. A. Markoff, Calculus of Finite Differences (2nd edition). + Readers interested in the proofs of these properties, many of them established for the first time by myself, will find a complete analysis in my paper in the Proceedings of the Petrograd Polytechnic Institute. Au. A. TCHOUPROFF 143 Putting es ee She ert ates i.e. denoting by C;’ the number of combinations of /: clements / at a time, we may express 4%, ;—, 1n the form : n—-1 Ob, k—n = = A Of eater irr ever vayerelafeisceseve\eis/a\9.si01s\a\eio\s (elaie:s (5), 7=0 where the coefficients A,,; are independent of /: and are defined by the relations : An o=(2n—1) Ana App (i=) Agere (21g aL) | reat acelin (G6). 
Annet Ln aa Hence alo... (2% — 1) \ Hee bd. >...(2n—1)4[n—1] | A= 1.3.5...(2n—8) {k[n— 9 +2[n —1]-3} | Ans=1.3.5...(2n—8) {A [n— 2-41 4+ 4g [n — 2] + i [n -— 2IP} " ea edD)...(20— 9) (aeq(n — 2] 4+ [nm — 2) 9) + le [nn — rote | oe +o a2) Ans=1.3.5...(2n—5) {rpg [n — 8 + ghey [n — 8]-9 +44, [mn - 8] +745 [n—3] 14+ hy [vn - 3] The coefficient A,,,-; can easily be expressed in an independent form. Putting 1A Sy Sans Os we shall have : ee CR OE Th, ‘aie SS eee ia (9), where the summation extends to all possible positive integer values of 7, ., ... 7, satistying the relation: 4,+7%,+...+%,=1, and to all integer values of h,, hy, ... he, satisfying the conditions DM i eee Ny hy thet +... thpy ante. Introducing the notation 7 inf (@) = VW fw —j)= = Ely CF (@=0) hie eee: (9), and noting that yh h— He =C a we find from (5): (k) on k—n n-1 ? og a A: Vv! Al On j=9 He k-j 144 Ezapectation of Moments of Frequency Distributions When h >0 and 2n—h > By Ae ~ bcvscauless ssiiqdeeiiieelaa see eee EER (12), where the coefficients B; ; are independent of / and are determined by the relations Iii SURE oo (CH) 1) Be; = (2) —1 =1) (Bas Baal eee eee (13). ipods BrP) son i) Hence we have: Biol ope e) =) Bj=1.3-5..2—-)sy—U QG- i. 15 lle ae eC2j 2) eer) oe en tj 25 ss ees = Or & ©. Il CSF 209 Or : \* 2(14): By, =1:3.5... (3 —5) ete GV — 209 4 4 — 21 + ely — 2 +4 21 Byg= 1.3.5... 2) —5) (alls Li BI + 3 + EG 38 | + 197-3) + 8-3) From (12) we find, when h > 0, Tex) y2j —i—f \ Vin By, oe = 2 Bi, ene fe jth ae Vo Fr,5=9 Vee h S h-i Vi By, 3= Res rely (tt I oB8oDaNDOI O00 0e20 (15). tia = B,,=1.3.54.2)-2) va Bij = Bi ask wR ie vi ita =0 ; Au. A. 
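The relations (2), (3) and (5) lend themselves to a mechanical check. The Python sketch below is our own illustration, not the author's procedure: the names `falling`, `rising`, `alpha`, `beta`, `A_coeffs` and `odd_product` are hypothetical, $\alpha$ and $\beta$ are generated only from the recurrences (3), and the $A_{n,j}$ of (5) are recovered as forward differences of $\alpha_{k,k-n}$ in $k$ (a technique substituted here for the author's relations (6)), after which the first relation of (6) and $A_{n,0} = 1\cdot3\cdot5\cdots(2n-1)$ can be compared.

```python
from functools import lru_cache
from math import comb, prod


def falling(n: int, k: int) -> int:
    """Capelli's N^{[-k]} = N(N-1)...(N-k+1)."""
    return prod(n - t for t in range(k))


def rising(n: int, k: int) -> int:
    """Capelli's N^{[k]} = N(N+1)...(N+k-1)."""
    return prod(n + t for t in range(k))


@lru_cache(maxsize=None)
def alpha(k: int, i: int) -> int:
    """alpha_{k,i} from recurrence (3), starting from alpha_{0,0} = 1."""
    if i < 0 or i > k:
        return 0
    if k == 0:
        return 1
    return alpha(k - 1, i - 1) + i * alpha(k - 1, i)


@lru_cache(maxsize=None)
def beta(k: int, i: int) -> int:
    """beta_{k,i} from recurrence (3), starting from beta_{0,0} = 1."""
    if i < 0 or i > k:
        return 0
    if k == 0:
        return 1
    return beta(k - 1, i - 1) + (k - 1) * beta(k - 1, i)


def A_coeffs(n: int) -> list:
    """[A_{n,0}, ..., A_{n,n-1}] with alpha_{k,k-n} = sum_j A_{n,j} C_k^{2n-j}, as in (5)."""
    # alpha_{k,k-n} is a polynomial of degree 2n in k that vanishes for k <= n, so its
    # coefficient on C_k^m is the m-th forward difference of the sequence at k = 0.
    f = [alpha(k, k - n) for k in range(2 * n + 1)]
    c = [sum((-1) ** (m - t) * comb(m, t) * f[t] for t in range(m + 1))
         for m in range(2 * n + 1)]
    assert all(c[m] == 0 for m in range(n + 1))  # only C_k^{n+1}, ..., C_k^{2n} occur
    return [c[2 * n - j] for j in range(n)]


def odd_product(n: int) -> int:
    """1 * 3 * 5 * ... * (2n - 1)."""
    return prod(range(1, 2 * n, 2))


print(A_coeffs(3), A_coeffs(4))  # [15, 10, 1] [105, 105, 25, 1]
```

Read against (5), the printed lists say, for example, that $\alpha_{k,k-3} = 15C_k^6 + 10C_k^5 + C_k^4$.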
TCHOUPROFF 145 (4) Further, it is important to note certain relations connecting the coefti- cients a and B: n—-Ie a= Oni ORIG O's menos tase « Da Lay thee: (16), O= k+m-1 Ope Rx, THM) isie's\sisielove\sis @lsla\eseiels.« (17), On, n—k Brisk m= iM t= where the coefficients R,,,,; are independent of m and are determined by the relations : Bom, i a Die \ Ru, o,: — dlyoe | Ry, mo = (2k + 2m — 1) [Ry-a,m,o + Re, tll | Bee (alice): Ry, m8 > (2h ap i = dl t) pe nes aE Ti meal + (k + 2m — t) Pieaa micas + Rx, tae a Ty m—1,i—1 From (18) we find apie ig — Wee Oi. (2Si—> L) CF | peat ies a ain a er a e(L9)) By sta = a -1.8.5... 28-1) {OF + C4} | and, in general, Tipp poise oh HH OL a 1 Sean an Se eRe (20), j=0 where the coefficients 7,;,; are independent of k and are determined by the relations: Ts,h,o = Are h h = 754,59 = Ber AR 7=0 ge WS Veg (St 9 — ht) Te,h-a,5+ (SJ) Te,taja Putting U 1-j l—j-1 ) a : Mer So . ) t,. ite ae {Oo 7-3 pte OS ei epm l 8, 2h, 7) Pe oe ha ee. (22), , Bs < l-j on Dies TU anaes Ce Ur 2+, 7 j=0 we find further: h es s-k-l Ry, s—k,2h Sah t. ht Cer | cee (23), h ae OS s-k-l ys—k—-I-1 k, s—k,2h+1 nae or hl [C.on-+2 WF Cay Sa Biometrika xu 10 146 Hapectation of Moments of Frequency Distributions where tso,0%1.3 om ee eT fa i=1.8.5...(28— 8) (élis= 11 is— l= te,2,0= 1.3.5 --.(28—5) {aby [s—2]-9+2,[8—2]- ne J [s—-2]- 45 [s—2]? ts 1=1.3.5...(28—5) {o4y [s—2] 1+ sh [s— QI) + 42 [s— 2] 4-4 [s—2]} ts00=1.3.5...(2s—5) [,85[s—2]-94+4 [s—2]-14 47 [s—2]-14 4[s - 2-4] UPR vi demes eres Seis nee 49 ol WYP ie t's, o= 1.81 Dee. (2s — 34a, fs — Ie es = ee Eis soil | f’gy1= 1.3.5... (28 —8) {ot [s — 2-41-44 [s — 2] 4 2 fs — 2] | | A Or = bo H | Et — \ | ,e J fien9=1.3.5...(2s—5) (een [is = Slat te [ig = 8 la ig allel totes Bj te is— olla eee 45))) fo1= 1.8.5 ...(28—5) Geter (8 = BIg [SOI ele! 
+48 [s— 3-94 & [s- 39 | (25 — 5) {yds [— 3] + fr[s— 3] + 8h [s—3I9| + 48[s—3]-44[s— 3] pope ee Or From (17) we find, when h > 0, vi B ns ya Me+2m—i—-j \ (n) a, n—-k “n-k,m 7 Aa k,m,i ~ n—j | QBk+2m+h | (2) n,n—-k ery as = 0 7 2K-F 2 —h a B = ae h-i -kln-k,m = any t ~ w-2k—Qm (n) n,n-k'!“n-k, im Solar SEO ED) k,m,ti ~ w-2k-—2m+h iv ...(26), | 2k+2m ¢ 7 = 9 Im, — vi hs ea Boe m a hk, m,0 7 1 3. 5. (2 k+ ZIM. 1)C Be. n) | = 1 a 2 (n) Bon n-k [chee m ms R, m, n-2k—-—2m nth | Me Oo =e Bran m saa (5) In Chapter IV we shall have to deal with more complicated expressions of type: : Views oa. Ge ry hy Orgy raahig «2 Orgs ry he rg bret techy Wao he ee In my previously quoted paper they are not considered as I met them for the first time in connection with the problems considered in the fourth Chapter of the present paper. My discussion of these expressions has not so far led me to results which may be considered final, and I shall merely indicate the method by which their fundamental properties may be established. Putting n+r.t+!...¢7,=R, h+h,+...+hp=H, let us replace a and 8 by their values from (5) and (12). Noting that, as is well known, [et yJoms S Cyi ali yeni, Au. A. Touourrorr 147 ry Pae—W [r, — hy + 2 — hg to FE H My JOV ON ¢ Se Fe Uh gly ES y : S - ay ay ep =, (2h, — 1,)! (Qh, — 4)! ... (2he— be)! g Gi! Go! -- Gra! (2f— 2H —j—g)! vr; [-(@2h,-1,)] i [-a,] ie} ([-—@hy—1,)) ra [- 92] Re AO SESE [72 — he] 4, we find: ’ Tee Bae Wes Ce Crier het Or rene Raa fee | 3 i ms Cee i Aye pines eon Byun, j - | FEO =O pa 0 (2h, — t,)!(2h.—1,)!... (2hy—h) 1 (QF -— 2H—J)! | SV 3: Me py @hy I] ply b)) | f | ie —(2hp—y—lp— (9h . — (2hy—lk) —(2f-2H-j- k-1 1 [—@he-1—Ue-1)] ([-(9r-1)] Vv" rl (2hx JU aera (2f-2H—-j—g)) (rp-1) | k-1 rea o. hy] (1%) ‘ke | hy-1 hy-1 hy-1 SoS a where S denotes Dee ceo We 1 TP=00,=0)) We=0 ; 2f—-2H—j 2f-2H—-j—g, 2f-2H-j- Gy = 2-0 Thm S denotes > > ee > g q=0 92.=9 9k-\=9 and J=N"+t+G. 
+--+ Gea- If we note that Mies rea [r, — h,]i-1 = 0, when 7, > 2h, -—1,+ 91, ete., we see without difficulty that, when 2/< R, the sum we are discussing is equal to zero. If 2f=R, then the only non-vanishing term in the sum is the one corre- sponding to ;=/,=...==7 =0 and g,=7— 2h, go=Ve— 2ho, »-+; Pea=Th-1— 2hnr, 2f— 2H —g=7;,— 2h, and the sum reduces to Cree Grae dar C,,2h% A}, ,0 AG uae Aliin.0 Bie NAndoUcHobonAdE (28). If 2f=R+1, there are three types of non-vanishing terms: (1) terms for which 1, =1,=...=),=j=0, and for which, of the quantities 9,, 9, .-. Gra, 2f — 2H —g, one, e.g. g;, is equal to r;—2h;+1, and each of the rest is equal to r—2h; (2) terms, for which 1, =1,=...=],=0, 7=1 and all the quantities 9: = 7; — 2h;; (8) terms, for which 7=0, one of the quantities /, e.g. J;=1, and the other quantities / vanish, g;=7;—2h;+1, and the remaining quantities g are each equal to r — 2h. Noting that Vr Xl X — hi-# becomes r!h (7 — 2h+1) when k=r—2h+41 and X =r, we can without difficulty reduce the sum we are considering, for the case 2f= R +1 to: 21, 12h y2hK / ) : Jal 6 (Op {Cre nome Ajo Ap,,o ++» Anzo Bry ; \ = ths 0 2h, py2h, 2hy, i MC ae On Any An Ang Bey oo =e ) 2h,-1 py2h, Rh , + On OF nie Ce Anan ges Aap Bru, ‘i (29). =e , -H, 0 12h, cy2h, Wry y2hk-1 ao Ce neice Ci, AtyoAnso ++» Atg-o An Beir 2 2h, y2h.-1 y2hh ' sae aU Aj, ,o Ani.» Anjo Bras stewart | 2 rk-1 "ke H,0 ) 10—2 148 Expectation of Moments of Frequency Distributions CHAPTER I Consider a variable magnitude X, admitting the values &, &,...& with probabilities p,. po, -.. py. Let us make NV experiments, and suppose that the law of distribution of the values of the variable remains unaltered, and that the separate experiments are independent of one another. 
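For small $N$ the scheme just described can be enumerated exactly: each of the $k^N$ sequences of outcomes has the product of its probabilities as weight. The Python sketch below does this for the arithmetic mean of the $N$ values, whose moments are the subject of what follows. It is an illustration only: the function names `law_of_mean` and `moment` and the three-point law `XI`, `P` are hypothetical choices, and exact rational arithmetic is used so that moment identities can be compared without rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import prod


def law_of_mean(values, probs, N):
    """Exact law of (X_1 + ... + X_N) / N when each X_i independently takes
    values[j] with probability probs[j] (the law stays unaltered between trials)."""
    law = defaultdict(Fraction)
    for outcome in product(range(len(values)), repeat=N):  # all k**N sequences
        weight = prod((probs[j] for j in outcome), start=Fraction(1))
        law[Fraction(sum(values[j] for j in outcome), N)] += weight
    return dict(law)


def moment(law, r, about=0):
    """E (Y - about)^r for a finite law given as {value: probability}."""
    return sum(p * (y - about) ** r for y, p in law.items())


# Hypothetical three-point law, chosen only for illustration.
XI = [0, 1, 3]
P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
law2 = law_of_mean(XI, P, 2)
```

Because the trials are independent and identically distributed, the variance of the mean computed this way is the variance of a single value divided by $N$, which is one of the relations the enumeration lets us confirm.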
Denoting by $X_i$ the value taken by the variable in the $i$th experiment and by $n_j$ the number of times, out of the $N$ experiments, that the variable $X$ takes the value $\xi_j$, let
$$m_r = EX^r = \sum_{j=1}^{k}p_j\xi_j^r, \qquad \mu_r = E(X - m_1)^r = \sum_{j=1}^{k}p_j(\xi_j - m_1)^r,$$
$$X_{(N)} = \frac{1}{N}\sum_{i=1}^{N}X_i = \frac{1}{N}\sum_{j=1}^{k}n_j\xi_j, \qquad m_{r,(N)} = EX_{(N)}^r, \qquad \mu_{r,(N)} = E\left[X_{(N)} - m_1\right]^r \qquad (1).$$
We have, whatever be the law of distribution of the variable $X$:
$$m_0 = \mu_0 = \sum_{j=1}^{k}p_j = 1, \qquad \mu_1 = \sum_{j=1}^{k}p_j(\xi_j - m_1) = 0,$$
$$m_{1,(N)} = m_1, \qquad \mu_{1,(N)} = \mu_1 = 0, \qquad \mu_{0,(N)} = \mu_0 = 1.$$
We find further, without difficulty:
$$\mu_2 = m_2 - m_1^2,$$
$$\mu_3 = m_3 - 3m_2m_1 + 2m_1^3,$$
$$\mu_4 = m_4 - 4m_3m_1 + 6m_2m_1^2 - 3m_1^4,$$
$$\mu_r = m_r - C_r^1m_{r-1}m_1 + \cdots + (-1)^hC_r^hm_{r-h}m_1^h + \cdots + (-1)^{r-2}C_r^{r-2}m_2m_1^{r-2} + (-1)^{r-1}(r-1)m_1^r,$$
$$\mu_{r,(N)} = m_{r,(N)} - C_r^1m_{r-1,(N)}m_1 + \cdots + (-1)^hC_r^hm_{r-h,(N)}m_1^h + \cdots + (-1)^{r-2}C_r^{r-2}m_{2,(N)}m_1^{r-2} + (-1)^{r-1}(r-1)m_1^r \qquad (2).$$
Conversely, expressing the quantities $m$ in terms of the quantities $\mu$, we find:
$$m_r = E\{(X - m_1) + m_1\}^r = m_1^r + \sum_{h=2}^{r-1}C_r^hm_1^{r-h}\mu_h + \mu_r,$$
$$m_{r,(N)} = m_1^r + \sum_{h=2}^{r-1}C_r^hm_1^{r-h}\mu_{h,(N)} + \mu_{r,(N)} \qquad (3),$$
or, more compactly,
$$\mu_{r,(N)} = \sum_{h=0}^{r}(-1)^hC_r^h\,m_1^h\,m_{r-h,(N)}, \qquad m_{r,(N)} = \sum_{h=0}^{r}C_r^h\,m_1^{r-h}\,\mu_{h,(N)}.$$

II

(1) Noting that
$$X_{(N)}^r = \frac{1}{N^r}\Big[\sum_{i=1}^{N}X_i\Big]^r,$$
we write $\big[\sum_{i=1}^{N}X_i\big]^r$ in the form
$$\sum_j\sum_{i_1,i_2,\dots,i_j}\sum_{r_1,r_2,\dots,r_j}\frac{r!}{r_1!\,r_2!\cdots r_j!}X_{i_1}^{r_1}X_{i_2}^{r_2}\cdots X_{i_j}^{r_j},$$
where, as is known, the summation with regard to $j$ extends from 1 to the smaller of the two integers $r$ and $N$; the summation with regard to $i_1, i_2, \dots, i_j$ extends to all integral and unequal values of $i_1, i_2, \dots, i_j$ from 1 to $N$; and the last summation extends to all positive integer or zero values of $r_1, r_2, \dots, r_j$ satisfying the relation $r_1 + r_2 + \cdots + r_j = r$.

Passing to mathematical expectations we find hence
$$m_{r,(N)} = \frac{1}{N^r}\sum_j\sum_{i_1,\dots,i_j}\sum_{r_1,\dots,r_j}\frac{r!}{r_1!\cdots r_j!}EX_{i_1}^{r_1}EX_{i_2}^{r_2}\cdots EX_{i_j}^{r_j} = \frac{1}{N^r}\sum_j\frac{N^{[-j]}}{j!}\sum\frac{r!}{r_1!\,r_2!\cdots r_j!}\,m_{r_1}m_{r_2}\cdots m_{r_j},$$
where the summation with regard to $j$ extends to all positive integer values from 1 to the smaller of the two numbers $r$ and $N$, while the second sum extends to all positive integer values of $r_1, r_2, \dots, r_j$ satisfying the relation $r_1 + r_2 + \cdots + r_j = r$.

If $r \le N$, we have consequently:
$$m_{r,(N)} = \frac{1}{N^r}\sum_{j=1}^{r}R_{r,j}\,N^{[-j]} \qquad (4),$$
where the coefficients $R_{r,j}$ are independent of $N$ and are defined by the relation
$$R_{r,j} = \frac{1}{j!}\sum\frac{r!}{r_1!\,r_2!\cdots r_j!}\,m_{r_1}m_{r_2}\cdots m_{r_j},$$
the summation extending over the ranges specified above.

Hence we find:
$$R_{r,r-h} = \sum_i r^{[-(h+i)]}\,\frac{m_1^{r-h-i}}{i!}\sum\frac{m_{h_1}m_{h_2}\cdots m_{h_i}}{h_1!\,h_2!\cdots h_i!},$$
where the summation with regard to $i$ extends to all integral values from 1 to the smaller of the two numbers $h$ and $r - h$, and the second summation extends to all integer values not less than 2 of $h_1, h_2, \dots, h_i$ satisfying the condition $h_1 + h_2 + \cdots + h_i = h + i$; or
$$R_{r,r-h} = \sum_i r^{[-(h+i)]}\,m_1^{r-h-i}\sum\frac{m_{h_1}^{j_1}m_{h_2}^{j_2}\cdots m_{h_l}^{j_l}}{j_1!\,j_2!\cdots j_l!\,[h_1!]^{j_1}[h_2!]^{j_2}\cdots[h_l!]^{j_l}} \qquad (5),$$
where the summation for $i$ extends to all integer values from 1 to the smaller of the numbers $h$ and $r - h$, and the second summation extends to all positive integer values of $j_1, j_2, \dots, j_l$ satisfying the equation $j_1 + j_2 + \cdots + j_l = i$, and to all positive integer values of $h_1, h_2, \dots, h_l$ satisfying the conditions
$$2 \le h_1 < h_2 < \cdots < h_l, \qquad h_1j_1 + h_2j_2 + \cdots + h_lj_l = h + i.$$

(2) Since $E\big[\sum_{i=1}^{N}X_i\big]^{r+1} = N\,E\big[X_1\big(X_1 + \sum_{i=2}^{N}X_i\big)^r\big]$, we have
$$N^r m_{r+1,(N)} = \sum_{h=0}^{r}C_r^h(N-1)^h\,m_{r+1-h}\,m_{h,(N-1)}.$$
Expressing $(N-1)^h m_{h,(N-1)}$ by (4) as $\sum_j R_{h,j}[N-1]^{[-j]}$ (with $R_{0,0} = 1$) and noting that $N\,[N-1]^{[-j]} = N^{[-(j+1)]}$, we find:
$$m_{r+1,(N)} = \frac{1}{N^{r+1}}\sum_{j=0}^{r}N^{[-(j+1)]}\sum_{h=j}^{r}C_r^h\,m_{r+1-h}\,R_{h,j}.$$
On the other hand
$$m_{r+1,(N)} = \frac{1}{N^{r+1}}\sum_{i=1}^{r+1}R_{r+1,i}\,N^{[-i]}.$$
Hence
$$R_{r+1,i} = \sum_{h=i-1}^{r}C_r^h\,m_{r+1-h}\,R_{h,i-1} \qquad (9).$$
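$$R_{r,1} = m_r, \qquad R_{r,r} = m_1^r.$$

(3) Putting (see Introduction (2))
$$N^{[-j]} = \sum_{i=0}^{j-1}(-1)^i\beta_{j,\,j-i}\,N^{j-i},$$
we find from (4):
$$m_{r,(N)} = \frac{1}{N^r}\sum_{j=1}^{r}R_{r,j}\sum_{i=0}^{j-1}(-1)^i\beta_{j,\,j-i}\,N^{j-i} = m_1^r + \sum_{i=1}^{r-1}\frac{1}{N^i}\sum_{h=0}^{i}(-1)^h\beta_{r-i+h,\,r-i}\,R_{r,\,r-i+h} \qquad (10)$$
$$= m_1^r + \frac{1}{N}C_r^2m_1^{r-2}\mu_2 + \frac{1}{N^2}\left[C_r^3m_1^{r-3}\mu_3 + 3C_r^4m_1^{r-4}\mu_2^2\right] + \cdots$$
When $r = 2, 3, 4$ we find:
$$m_{2,(N)} = m_1^2 + \frac{1}{N}\mu_2,$$
$$m_{3,(N)} = m_1^3 + \frac{3}{N}m_1\mu_2 + \frac{1}{N^2}\mu_3,$$
$$m_{4,(N)} = m_1^4 + \frac{6}{N}m_1^2\mu_2 + \frac{1}{N^2}\left[4m_1\mu_3 + 3\mu_2^2\right] + \frac{1}{N^3}\left[\mu_4 - 3\mu_2^2\right] \qquad (11).$$

III

(1) If $m_1 = 0$, then $\mu_r = E(X - m_1)^r = EX^r = m_r$ and $\mu_{r,(N)} = m_{r,(N)}$. Putting $m_1 = 0$ in the formulae of §II, we may, consequently, replace in them the quantities $m$ by the corresponding quantities $\mu$. But when $m_1 = 0$, $R_{r,r-h} = 0$ if $h < r/2$; if $h \ge r/2$, then $R_{r,r-h}$ reduces (cf. (5)) to
$$R_{r,r-h} = r!\sum\frac{\mu_{h_1}^{j_1}\mu_{h_2}^{j_2}\cdots\mu_{h_l}^{j_l}}{j_1!\,j_2!\cdots j_l!\,[h_1!]^{j_1}[h_2!]^{j_2}\cdots[h_l!]^{j_l}},$$
where the summation extends to all positive integer values of $j_1, j_2, \dots, j_l$ satisfying the condition $j_1 + j_2 + \cdots + j_l = r - h$, and to all positive integer values of $h_1, h_2, \dots, h_l$ satisfying the conditions
$$2 \le h_1 < h_2 < \cdots < h_l, \qquad h_1j_1 + h_2j_2 + \cdots + h_lj_l = r.$$
Hence
$$\mu_{r,(N)} = \frac{1}{N^r}\sum_{h=r-\mathrm{Ent}(r/2)}^{r-1}R_{r,r-h}\,N^{[-(r-h)]} \qquad (14),$$
where $\mathrm{Ent}(r/2)$ denotes the greatest integer in $r/2$.

If we now put
$$N^{[-(r-h)]} = \sum_{i=0}^{r-h-1}(-1)^i\beta_{r-h,\,r-h-i}\,N^{r-h-i},$$
we find after some transformations:
$$\mu_{2r,(N)} = \sum_{i=0}^{r-1}\frac{1}{N^{r+i}}\sum_{h=0}^{i}(-1)^h\beta_{r-i+h,\,r-i}\,R_{2r,\,r-i+h},$$
$$\mu_{2r+1,(N)} = \sum_{i=0}^{r-1}\frac{1}{N^{r+1+i}}\sum_{h=0}^{i}(-1)^h\beta_{r-i+h,\,r-i}\,R_{2r+1,\,r-i+h} \qquad (15),$$
and in general
$$\mu_{r,(N)} = \sum_{i=0}^{\mathrm{Ent}(r/2)-1}\frac{1}{N^{\,r-\mathrm{Ent}(r/2)+i}}\sum_{h=0}^{i}(-1)^h\beta_{\mathrm{Ent}(r/2)-i+h,\,\mathrm{Ent}(r/2)-i}\,R_{r,\,\mathrm{Ent}(r/2)-i+h} \qquad (16).$$

When $r$ is even,
$$R_{r,\,r/2} = 1\cdot3\cdot5\cdots(r-1)\,\mu_2^{r/2},$$
$$R_{r,\,r/2-h} = \sum_i\frac{r!\,\mu_2^{r/2-h-i}}{2^{r/2-h-i}\,(r/2-h-i)!}\sum\frac{\mu_{h_1}^{j_1}\mu_{h_2}^{j_2}\cdots\mu_{h_l}^{j_l}}{j_1!\,j_2!\cdots j_l!\,[h_1!]^{j_1}[h_2!]^{j_2}\cdots[h_l!]^{j_l}} \qquad (17)$$
$$= 1\cdot3\cdot5\cdots(r-1)\sum_i 2^{h+i}\left(\tfrac{r}{2}\right)^{[-(h+i)]}\mu_2^{r/2-h-i}\sum\frac{\mu_{h_1}^{j_1}\mu_{h_2}^{j_2}\cdots\mu_{h_l}^{j_l}}{j_1!\,j_2!\cdots j_l!\,[h_1!]^{j_1}[h_2!]^{j_2}\cdots[h_l!]^{j_l}},$$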
[hp te where the summation for ¢ extends to all integer values from 1 to the smaller of the numbers 2h and r—h, and the last summation to all positive integer values of j,, Jo, «+» jp, Satisfying the condition : Atjo te typ =t and to all positive integer values of h,, ho, ... hy, satisfying the conditions : ISI allen (23). (PII try IO al ast os — EG Ma st be TT Mal Hobe Behe ere te st yl—6] a Ls fs — (7 Ae Be 3] 1 ita 3 Us sMatrs ghar nat 7 pes r—1 —2 —4 r r 1 8 | ¥i ea ) a fis at es te a 8 mee y) On the other hand, dle 1 Poy y-1 5 ...(24), (ieee) +% Bh Merl h BS (ole 7 ee (Oe 2 ar, 2 = (3 2r)! Q tr op = hi(2r =)! pmo or Mh Por—h an—1 Mi ne = Portia , h N flies a >» Cr, Ph Per+i-h h=2 Au. A. TCHOUPROFF For r = 2, 3, 4, 5, 6, 7, 8 we find hence: Il M2, (N) = 7 Be 1 B8,(N) = Hypo Ms : [21 3 1 : aw) = 77a Ns + BNE pst] = a9 Ma? + are [Hts — Bye] [2] 10 Hs, (y) = Fy Nes + LON pos bo} = 73 Mae + wall — 10 p13 2] Me, (vy = a (Ne + I [15 pts flo + 10 p47] + 15 NO p,"} ey a : i Pet, mere eye Lateetts © tts — Spee | + ap lite atts — TOS 20s] 1 - Y 9 M1, = WW {Ne + NON QT ps be + 35 ps Ma] + LOB NT poo") _ 105 a i : ‘ = We Hs be + Ne [Bhs Me + O fly, — 63s 1427] 1 + ape Loe — 21 soe — B85 pfs + 210jo4]} Bs, (N) = = (Wg + NO [28 pHe + 5645 Me + 35p,"] + NI) [2104 p.? + 280pu.2 42] + 105N I p,4} 105 70 s z a Ge Pa! + N? ys T As? be a Dpis4] ae 4 [pls flo + Sps ply + Spey? — VO py pe? — 1203" po + 165 p0*] 1 on N? [os — 2805p. —56u5m;— 85 pu2+ 420 spo? + 5605? He — 63004] J (2) Noting that 1 N 2 1 ( hy N a Xw)- Mm == & (Xi-—m) =—1 sd (X;-—m)+ =F (Xi-—m,) N y24 N ee t=h+1 = e {a (Xa — m]+(N —h) [X~w-n)— ml}, we find [Aww — mM} = re, Che (N hy LX ay — may LX we ny — ma], br,(N) = = = OF ht * (N — hy)! pra, ay Mir—ny, Tr—2 NV? by.) =" wr, wy + me CR poy, ay (NV — bh)! pi, ey + (WN — BY" per ny: 156 Hapectation of Moments of Frequency Distributions Putting h=1, we obtain r—2 NV" py, (yy = br + = CN — 1)! 
$\mu_{r-h}\,\mu_{h,(N-1)} + (N-1)^r\mu_{r,(N-1)}$. Hence:
$$N^r\mu_{r,(N)} - (N-1)^r\mu_{r,(N-1)} = \mu_r + \sum_{h=2}^{r-2}C_r^h(N-1)^h\,\mu_{r-h}\,\mu_{h,(N-1)},$$
and
$$N^r\mu_{r,(N)} = N\mu_r + \sum_{j=1}^{N-1}\sum_{h=2}^{r-2}C_r^h\,j^h\,\mu_{r-h}\,\mu_{h,(j)} \qquad (25).$$
If we give $r$ in turn the values 2, 3, 4, ..., we find the relations (26).

IV

(1) From the relations (15) and (17) we find:
$$\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}} = 1\cdot3\cdot5\cdots(2r-1) + \sum_{i=1}^{r-1}\frac{1}{N^i}\sum_{h=0}^{i}(-1)^h\beta_{r-i+h,\,r-i}\,\frac{R_{2r,\,r-i+h}}{\mu_2^r}.$$
As $N$ increases the ratio $\mu_{2r,(N)}/\mu_{2,(N)}^r$ consequently tends to the limit $1\cdot3\cdot5\cdots(2r-1)$ if
$$\frac{1}{N^i}\sum_{h=0}^{i}(-1)^h\beta_{r-i+h,\,r-i}\,\frac{R_{2r,\,r-i+h}}{\mu_2^r}$$
tends to zero. But if this last expression tends to a limit different from zero as $N$ increases, then the limit to which $\mu_{2r,(N)}/\mu_{2,(N)}^r$ tends cannot be equal to $1\cdot3\cdot5\cdots(2r-1)$.

The quantity $\beta_{r-i+h,\,r-i}/[1\cdot3\cdot5\cdots(2r-1)]$ does not become infinitely great for any value of $r$ and is independent both of the value of $N$ and of the law of distribution of values of $X$. In order that $\mu_{2r,(N)}/\mu_{2,(N)}^r$ should tend to $1\cdot3\cdot5\cdots(2r-1)$, it appears then to be a sufficient condition that $R_{2r,\,r-i+h}/(N^i\mu_2^r)$ should tend to zero for $i = 1, 2, 3, \dots, r-1$. A sufficient condition for this, in its turn, is that expressions of type
$$\frac{\mu_{h_1}^{j_1}\mu_{h_2}^{j_2}\cdots\mu_{h_\lambda}^{j_\lambda}}{N^i\,\mu_2^{\,i-h+l}}$$
should tend to zero, when the quantities $j_1, j_2, \dots, j_\lambda$ are connected by the relation $j_1 + j_2 + \cdots + j_\lambda = l$, the quantities $h_1, h_2, \dots, h_\lambda$ satisfy the relation $h_1j_1 + h_2j_2 + \cdots + h_\lambda j_\lambda = 2(i - h + l)$, and $l$ can take all integral values between 1 and $2(i - h)$. Finally, this condition is satisfied if expressions of type
$$\frac{N\mu_i}{[N\mu_2]^{i/2}}$$
tend to zero as $N$ increases, when $i = 3, 4, 5, \dots, 2r-1, 2r$. Noting that when these conditions are satisfied the fraction $\mu_{2r-1,(N)}/[\mu_{2,(N)}]^{(2r-1)/2}$ tends to zero, we arrive at the well-known result (cf. A. A. Markoff, *Theory of Probability*, 3rd Edition, pp. 329-330):
The probability of the fulfilment of the inequality
$$t_1 < \frac{1}{\sqrt{2N\mu_2}}\sum_{i=1}^{N}(X_i - m_1) < t_2$$
tends with increasing $N$ to the limit $\frac{1}{\sqrt{\pi}}\int_{t_1}^{t_2}e^{-t^2}\,dt$, whatever be the law of distribution of values of the variable $X$, if only it satisfies the condition that $N\mu_i/[N\mu_2]^{i/2}$ should tend with increasing $N$ to zero when $i = 3, 4, 5, \dots, \infty$, and if at the same time the law of distribution of values remains unaltered and the separate experiments are mutually independent*.

(2) From (22) we find:
$$\frac{\mu_{4,(N)}}{\mu_{2,(N)}^2} = 3 + \frac{\mu_4 - 3\mu_2^2}{N\mu_2^2},$$
$$\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}} = 1\cdot3\cdot5\cdots(2r-1)\left\{1 + \frac{1}{N}\left[\frac{C_r^2}{3}\cdot\frac{\mu_4 - 3\mu_2^2}{\mu_2^2} + \frac{2C_r^3}{3}\cdot\frac{\mu_3^2}{\mu_2^3}\right] + \cdots\right\}.$$
Thus we see that $\mu_{2r,(N)}/\mu_{2,(N)}^r$ tends the more slowly to $1\cdot3\cdot5\cdots(2r-1)$ the more the law of distribution of values deviates from the Gauss-Laplace law, and the greater $r$ is. If $\mu_4 > 3\mu_2^2$, then for sufficiently large values of $N$,
$$\frac{\mu_{2r,(N)}}{\mu_{2,(N)}^{\,r}} > 1\cdot3\cdot5\cdots(2r-1).$$

\* The condition "$N\mu_i/[N\mu_2]^{i/2}$ tends, with increasing $N$, to zero for $i = 3, 4, 5, \dots, \infty$," while sufficient, is, as is well known, not necessary. From the form to which Liapounoff succeeded in reducing the condition (see *Proc. Imp. Acad. Sci.*, viii series, vol. xii) it follows, among other consequences, that the law of distribution of values of $X_{(N)}$ tends with increasing $N$ to the Gaussian if $N\mu_4/[N\mu_2]^2$ tends to zero. Noting that
$$\frac{\mu_{4,(N)}}{\mu_{2,(N)}^2} = 3 + \frac{1}{N}\left[\frac{\mu_4}{\mu_2^2} - 3\right],$$
we see that this condition is satisfied if $\mu_{4,(N)}/\mu_{2,(N)}^2$ tends with increasing $N$ to 3. It is in this way that Liapounoff's results justify arguments based on the examination of the two moments $\mu_{2,(N)}$ and $\mu_{4,(N)}$ only, in deciding whether the law of distribution of the values of $X_{(N)}$ tends to the Gaussian with increasing $N$ or not. The assumption usually underlying such a procedure, viz. that the law of distribution is the Gauss-Laplace law if $\mu_3^2/\mu_2^3 = 0$ and $\mu_4/\mu_2^2 = 3$,
is clearly inexact: the coincidence 2" Me" of the values of two or even more moments does not guarantee the identity of the laws, but merely compresses the possible divergence between them into limits which become narrower as the number of coinciding moments increases (cf. the investigation by Chebysheff ‘‘On the Integral..., forming Approxi- mations to the Value of an Integral” in Ocuvres, t. 11, and the related papers by A. A. Markoff ; cf. also T. T. Stieltjes, ‘“‘ Recherches sur les fractions continues” in Annales de la faculté de Toulouse (1894). 158 jxpectation of Moments of Frequency Distributions V Equations (15), (16), (17) and (18) hold for all laws of distribution of the variable X. If the law fulfils the conditions fis 108 LO 10) leeway, then poyi3(vy = 9, for all the coefficients 7,,4;,,., then vanish. As regards the coefficients 7',,.,-,, they take the form: h Ton ph = V.B.5 ee (Qr—1) EY rita Qeté yr hi ' j=1 Wass Moh; Pong! tee Poon lf Dil Jo! --- Je! [(2hy) ff [(2he) 2... [(2hg) ]2r where the last summation extends to all positive integer values j,, jz, ... jy, Satis- fying the condition: 7; +j. +... +jp=%, and to all integer values of hy, he, ... hy, satisfying the conditions : 2 Sliy align alles ijt hojot no thyjp = hte If the law of distribution of values of X also satisfies the conditions: pg = 1.3.5... (26-1) gy’ for ¢=1, 2, 3,...7, then (cf. Introduction (5), (8) and (16)) h Pay rh = 1.3.5... (2r—1) B WH BME ar x i= [1-8-5 Qh — DIF (1.8.5 Qh, DM [1 See Cp ag Dil ja! oo Jp! [(2Ax) !]4 [(2ke) 1]... [(2h-) Pr h 1 =U35 (Gr Dw DE ( ee Fil jal oo jpllPal]* [hal]... [ple ili3 he pd unas and eae eee el Ee Mar, (N)= 1 . 3 MO! cies (27 — Dias 2 NEA > (- 1) ay y—i+h rocentn =a t= v=0 =1 3 = Dy 1 r 1 q + r tal era ee een Ye ll) We 1-8-5 rier Lies hin. Thus in the case when the law of distribution of the values of X for i Rim,m NEMS Nm > inne Li eh n=l h=0 9 iene ee. 
(2), a Ne = N’ 2 (= UD aie Siemens eae = 1=0 where h j E if Bim, ae NS mle eto pn—h-i S) aa (ay 1 ata ae eae if (3), i=1 Jil jo! +s dp! [hal [hol] ... [Ag] where the last summation extends to all positive integer values of j,, jo, ... jp satisfying the condition: j, +jo+... +. jr =, and to all integer values of h,, hy, ... hy, satisfying the conditions : BN Nats lbp, lijithejot oo thyjp=h+i. Hence : me Rim, mi = Py ———- Rim, m—1) — (Or Bn? Mor Leena =ai ocr, [i boy te (Ope Lae sr (4), eens = 15,8 [oy cme peor? oF 10¢7 [pg Moy M3. + Og [ie May Rim, m—4] = 105 Cre fy ploy + 105C,,’ fy boy” May + LOC; pee Mer bar a 10g; Fp Ss bay te Cre fy bsp and, on the other hand Ron =. mr \ m-1 | Rem, 21 aa 4 > Cra Hhr F (m—h) + \ ia MMs oases 0) De (5). m-1 Rimsii a, > (Oh a M(m—hi Bia 1 h=i- 160 Hapectation of Moments of Frequency Distributions Substituting in (2), we find: ; Lan ae es Ew es rial a ie D epee [ Moy — feel 1 {mi ml] 4] ‘ a ie 1 a fy—4 [a= jee? ns cE fiyt3 Wiese = Ofloy by + 2 +... (6). (2) Hp’, may also be calculated by means of recurrence formulae. Put EN” w= E | = > (X;- my |" = BA Fe Pn sac cose (7). i= From relation (8) of the first chapter we find: (m) i yt ) : Ne WW) Shy {fae qF aC 1 Him—h) r¥ Oe Sietalevatetnelereectetteterenere (8). Noting that ae = Ny,, we find hence : ¢ a = Nite + NIA P= NV [par — po? | + NV? pe? » (NV) oe vy) N fis, + NI) 3 poy by + ayia! by? Soon) 4 2 2 — 2 = o, (vy) = Nyy + NO [bps py + 8 ploy? | + NO! 6 poy by? + Jal ry 0 Mb and, in general, yh? S93 NAD se eee (10), 15 1CN)- rt where (m) \ Dy 18 Son Linr | (m) _se am | mo Mr | Sl (m) mM h (h) Dy tan > CG. kinane 2, Asal | (11) 4 h=t-1 ; (m) (m—2) (2 - =) aes m—s (m—1) | = _ a m—1 Bate (m 1) for D iB ees by Dn =(( ) Pr br ae Ds | i = Cn? Por by | 2-3 F 10¢=2 bed : v (ole aL 3c" 2 LL fee | ee ae Mer on vate b! | pms v ny Mor } and so on. Substituting in (10), we obtain y ™ = NE u ym = NI-™] poy + NE mv) (Ghee pear Teg DN) m—-s + Nim jie 3 = Ge 1 _j Br! 
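Relation (8) is a recurrence in $N$ for $y^{(m)}_{r,(N)} = E\big[\sum_{i=1}^{N}(X_i - m_1)^r\big]^m$; in the form we implement it, $y^{(m)}_{r,(N)} = N\sum_{h=0}^{m-1}C_{m-1}^h\,\mu_{r(m-h)}\,y^{(h)}_{r,(N-1)}$ with $y^{(0)}_{r,(N)} = 1$, which is our reading of (8). The Python sketch below is illustrative only: the names `y`, `y_brute`, `y_closed` and the three-point law `XI`, `P` are hypothetical. It compares the recurrence with brute-force enumeration and with the closed forms (9). Since $y^{(m)}_{1,(N)} = N^m\mu_{m,(N)}$, the case $r = 1$ also reproduces the moments of $X_{(N)}$ found in Chapter I.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb, prod

# Hypothetical three-point law, chosen only for illustration.
XI = [0, 1, 3]
P = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
M1 = sum(p * x for x, p in zip(XI, P))


def mu(s):
    """Central moment mu_s = E(X - m_1)^s of the example law."""
    return sum(p * (x - M1) ** s for x, p in zip(XI, P))


@lru_cache(maxsize=None)
def y(m, r, N):
    """y^{(m)}_{r,(N)} = E[sum_i (X_i - m_1)^r]^m via
    y^{(m)}_{r,(N)} = N * sum_{h=0}^{m-1} C_{m-1}^h mu_{r(m-h)} y^{(h)}_{r,(N-1)}."""
    if m == 0:
        return Fraction(1)
    if N == 0:
        return Fraction(0)  # an empty sum raised to a positive power
    return N * sum(comb(m - 1, h) * mu(r * (m - h)) * y(h, r, N - 1) for h in range(m))


def y_brute(m, r, N):
    """The same expectation by enumerating all outcomes of N trials."""
    total = Fraction(0)
    for outcome in product(range(len(XI)), repeat=N):
        weight = prod((P[j] for j in outcome), start=Fraction(1))
        total += weight * sum((XI[j] - M1) ** r for j in outcome) ** m
    return total


def y_closed(m, r, N):
    """Closed forms (9) for m = 2, 3, 4, with N^{[-k]} = N(N-1)...(N-k+1)."""
    def ff(k):
        return prod(N - t for t in range(k))
    a, b, c, d = mu(r), mu(2 * r), mu(3 * r), mu(4 * r)
    if m == 2:
        return N * b + ff(2) * a ** 2
    if m == 3:
        return N * c + 3 * ff(2) * b * a + ff(3) * a ** 3
    if m == 4:
        return (N * d + ff(2) * (4 * c * a + 3 * b ** 2)
                + 6 * ff(3) * b * a ** 2 + ff(4) * a ** 4)
    raise ValueError("closed forms are written out only for m = 2, 3, 4")
```

Memoisation keeps the recurrence cheap: each $y^{(h)}_{r,(n)}$ with $h < m$ and $n \le N$ is computed once.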
I thim—s 3) y+ 30,3 Pay pay ; Po ri (Ihe) . — m—-1 : yh ar NES D2, Cae: Mim—nh) rv Par aP IMT n= Au. A. TcHOUPROFF 161 (3) When r= 2, in the case when the law of distribution is Gaussian, we find from (8): me (C2), Ve « = m S fh Paty Pee m—h y ik oy) =k .9-0...(2m—1) fo a me eel ..(2m —2h—1)p, ,(v=nf* Hence noting that yo ae ay (NV —1) ps, we find: yp Na Nise 2 Uo (Y) N (N + 2) pe’, 1 y= MT +N + #) wh Oey = NN + 2) (WV + 4) CY + 6) pet Let us assume that for all values 7 < m, y® -=N(N+2)(N 44)... (NW + 20= 2) pol. 2 y N) In this case : (m+1) __ m+1 9 > i oe = Np) {1.3.5 ... (2m+ 1) 163 Oy. 8.5 oo. (2m = BM #1)(N =I) M4 1). (W +2), h=1 (m) ie {1. 3.5... (2m —1) Sl 1.8.5 m= Bh = UV = IW 1). V+ 2h 3h, ee m— and, consequently, (N+ 2m) p00", = Nuge 3.5...(2m —1)(2m+ 1) +(N—-1)1.3.5...(2m—1) m—1 15m Oly 1.8.5... (2m — 2h — 1) (2m — 2h + 1)(N-1)(V 41)... (NV + 2h —38) h=1 m— m— = = 1.3.5... (2m —2h—1)(N —-1)(N +1)... (N+ 2h- vy} m—1 Bis. Sonal) +2 O,71.3.5...(2m—2h41)(N—1)(N41)...(N + 2h —3)| ho (n+) 2,(N)" Thus when the law of distribution of values of X is Gaussian, Lt Ny ya NV +2)(N +4)... (N+ 2m — 2). for m= 0,1, 2,3,... 20 (18). Biometrika x11 ; 11 162. Hapectation of Moments of Frequency Distributions (4) When 7 = 3, in the case of a Gaussian distribution of the values of X, Sul Coe ck mad ae Ys cy =A ee pitas Cy Hs (mh) "3, (N=1){ * Noting that WN) = 0, if the distribution of the values of X be Gaussian, we tind: Vs) = NGS Ie Bie Win Oe Sea ae = bak 8 Penne rt(2d toys Ys «vy = 9 and, in general, UN eae 0, (4) tf 3 9 U3 y= Nue1.3.5.9[5N +72), (6) OF ay = Nu? 3.5. 
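Result (18) can be checked by running the same recurrence (8) with $r = 2$ and the Gaussian values $\mu_{2k} = 1\cdot3\cdot5\cdots(2k-1)\,\mu_2^k$. The Python sketch below does this in exact arithmetic. It is illustrative only; the function names `gaussian_even_moment`, `y2` and `closed_form` and the value chosen for $\mu_2$ are hypothetical. Equivalently, it confirms that $\sum_{i=1}^{N}(X_i - m_1)^2/\mu_2$ then has the moments $N(N+2)\cdots(N+2m-2)$.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, prod


def gaussian_even_moment(k, mu2):
    """mu_{2k} = 1.3.5...(2k-1) mu_2^k for a Gaussian law."""
    return prod(range(1, 2 * k, 2)) * mu2 ** k


@lru_cache(maxsize=None)
def y2(m, N, mu2):
    """y^{(m)}_{2,(N)} = E[sum_i (X_i - m_1)^2]^m by recurrence (8) with r = 2."""
    if m == 0:
        return Fraction(1)
    if N == 0:
        return Fraction(0)
    return N * sum(comb(m - 1, h) * gaussian_even_moment(m - h, mu2) * y2(h, N - 1, mu2)
                   for h in range(m))


def closed_form(m, N, mu2):
    """Formula (18): N(N+2)(N+4)...(N+2m-2) mu_2^m."""
    return prod(N + 2 * t for t in range(m)) * mu2 ** m
```

For instance, `y2(2, N, mu2)` returns $N(N+2)\mu_2^2$, the second line of the list above.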
9.15 [252 + 1080N + 15912], When r=4, for the case of a Gaussian distribution of the values of X, we have: (in) ‘ Te Pe AL 2(m—h) (A) U4 aN) =N{1.3.5...4m—1) 43 Bees (Ole Lska isa) Shel Gia 0=— IL), COWEN () Y4,v-1) (VN —1) 4, =3 (NV —- 1) pe’, Oem = Nut 3[3N +32], (3) ne = Sar Vey = Nu 27 [N° + 32N + 352), iG Replacing r by m and py) by H(w,— p,)™ in formulae (22) and (28) of h Chapter IJ, and replacing w;, by } (— 1)* Oh" w* wu, and putting ) r= h = ia Ly ChE joe KMih-k) r = v.G%, k=0 we find: ver ; one ole as : \ Ep = = 1.8 De ante XG Pia 2 i a E Bape eS 5 at Nm (Sank Xa 2 Xt nl XX — pale Xe | 1 Spey cies Se : : , + Vine A (ee Gime, Gee COPA GH Ce ¢ seal isl ibe ACP . eee ar ante tL sip eee ae Poa wl + Ami Xm XX, — aa Xi" X,+qheml-1 Xm" X,! ae =) G a \ = vee zy ml-3| Xia aes ae a ; : mi) xs] aF oe | 24 Au. A. TCHOUPROFF 163 1 DY a Y fe E (ey — py PPO = 1.3.5... (2m $1) aa lm Xn X, =e ee EF (Se. Gi Gas thom Gey OD. + gy ml- aXe Xe — mi 8 Xe | 1 Fe + GEE gag XX, + gto m4 A XX, Nuss + hqml-4 Xm *§ XX, > (15). ai. ; Fi Wl XP gt em Xo XX “ai (m—1)(m — 2) ap rae m6) MME xX; .o 36 SSeS A) Gee AG AG, fe gig Xs m—5 XN. oxi a aagp me! {-7] AG m—7 Xs (m—1)(m—2)(m—4) [-3] Yim: m(3m—-1) yma y . 162 le eee ecerny =F tof Hence: Bgl pn) = [i — fu N B aoe. ? P : E (uw are fr) = NE [ds> ie D ploy by a Z| (16). Di 3 16 « QB © ; 2 . EB (mw r— Br = NW? [ier — pe P ats a Aut, [Ly a Dfly)" al 2 bay fly? =: 6u,*| III (1) Noting that N N / 1p 5) ¢ - - SES. ve s . , 0 Eby by = ae 2 (a — my) + pet Ct my A 1 2 Noir, ee ee! 
Bry bry} = Bry ry, + N [Misys ~ Mrs), we find: 77 , / Pi / ec ] FE (Wor, = Mary) (Mir Brg) = ey Miry = Bey Pr = [ ryt, — fry bere |: Similarly we find: ae ens 1 es Kp ry Big Borg = N3 Vir tretry sae [Mrptrs fers © Mrytrs Pry + Mret+rs br,] ; + NI) py, pry Mag} s Bry My Br AP N eas Kis ar Pry+r3 bry a Pry+r3 Bry aa Buy, Mis Hrs] it ar N?2 [renters po (Hrytr, Mrs ats Bry +73 Mi» ate Miytrs My) ar 2 fy, Mis ir, |, 11—2 164 Haxpectation of Moments of Frequency Distributions and iE (u,, a Hr,) (ax a: ry) (ir, ee Hrs) 1 6 = ne [aaeiar — Prytr, Pry > Brytr3 Bry BPretrs Bry es 2 py, Pr, fr, |- 1 if y , ia (7 Ku big bry Kiry = N: ON Tis te 8 2 te a Vig [Mrytretrs Pory O Prytrstry Mrs + Mrytrytra Bre + Pretistrs bry] eae [Moy try Pistr4 a Brytr3 Protra ae Britis Porytrs] a Ni [eee Br Bi, ate Mrytrs Bre Bry ar BMry+ry Bry brs ar Brytrg Bry Bry at Prytiy Bry Mrs aE Brs+14 Bry Pre | tN Bry Pry brs Pry} I = ry Pos Pag Brg TH [ors ry Mors Mory F Por try Mery Mir F Pry try Pre Pers F Mary try Moy Pry N ar Bretry bry Bry + Prstry Pry Pry|| 1 ie N?2 [Mrptretrs Mary F Pry prety Mir © Marytrstrg Pry + Mrotrstrg Mry 1 Prytre Mrstra FH Mrytrs Mrytry Herp try Mrytrs — 3 (Mr +1, Mrs Mary Prytiy Pry Mry © Mrytry Pre Pry cs Brotrs Bry bry als Mrstry Bry Pry tr Brstry Pry Pry) a 1p, Bry brs bry] 1 : a N3 [erytretrstrs ae (Mrytretrs Bry aE Pry tretry brg ar Piytigtra Pre ar Protrgt+ry bry ar Biy+r, Migtiy ae Mryt13 Mrotig ar Brytrs bre+ts) ar 2 (Hees brs byy =P biry+rg Pr bry “Te Miytry Bry Hrs ah Piy+r3 Boy Bry Si Mry+tirg bry bys sl Mrstiy br, ber) a 6, bry Mrs br, |, and E(w’, ae My) (ie a Mry) (ies sae fers) (M4 = (Pp) ie | = WN? 
(Mi +19 rstis oe Piy+rs Mrstig ae Prytra Hratrs) ra (Miry try bys Big te brytrs Br, brs b prs try Py Baty + Para tity bry Pry F Mrgtry bry Marg + Merge Pry Pers) + Oly Pry Para era U f a5 WN3 Mr tirgtrytra — (Porytre tis Brg FH My tretre Mig tb Mrytrytry Brg + Mrotratre br,) ra (Lt, Mrgtiy + Miytrs Mrytry + Mrytrg fre-trs) +2 (Mr try Perg Marg F Mrytrs Pry Mry 1 Mrytry Bry Mary 1 retry Bry Pry 1 Protry Pry Mrs th fbrgctrg Mery Pry) — Spr, tery Pry bora} The coefficient of 1/N? in EE (ny = fry) (My = rg) (rg = Hg) (He rg = Ha) Au. A. TCHOUPROFF 165 can be expressed in the form [ery try — Mary Mery] [Magny — Pry Pra) + [Mrytrg — Mary Mag] [retry — fry Mery] + [Mrptry — Mery Hrg] (Hratrs — Bra Brg | = [E (wr, — br.) (Hg — Pre) ] « LE (H'rg = ry) (eng — Hr) | +B (wn, = Mey) (Mig — Mrs) LE (Hong = Mg) (Hg = Mey) LE (i , = br,) (Hirg — Hr) LE ng — bre) (lrg — pers) |*- (2) In the general case, putting and agreeing to denote t—-h+1 ¢-h+2 i-h+3 G > > >) Aes > bv. Diss f,=1 Srx=f\t1 Ss=f.+1 Sh=Sfn- +1 Wie (A) we find: y) Is heals ey () EM ine el NEN | a N aa “7 Nt j=0 1, 4-9 i,t inl J e. : () qr OS, agen Gall) ota a > nian g=1 NI p= eed a BT eh (2) = Here, ee = TT, and ie may, when 27 <7, be written in the form Pape es eee, so =F. G4D) Brg Mery oes Prp and, when 2) > 7, in the form ixp BG ee i Sou LK Ww r(?) where Kk ji Prptrpt trp and fie denotes the sum of all products of / factors of type pa, Ming «+ May possible, subject to the condition that /,, h., ... h; appear in their correct order, in sums of not less than two terms taken from the numbers 7y,, 7y,,... 77,,,, and that the number of summands composing /;, is not greater than the number composing hys,. We have in this manner: . 
V1 7 P qH® Ss sy ; Ti ioe me eae nae ea yy i= ao — aa flare Myre A 7-2 =i-1 i - He = v Ss He 5, G0 Rea ca ere are ha hut ge Ai=1 fo=f,tl1 S3=fo+1 < [iy feed edsf eee SD [Me sire berg tire bh Marg trp, Mery pr cine ee a POSEN: ee UAE Ske TARE + lod ] a ne ty "re +r p,- ke , 0 eat eae Se erp, Borg, Borg, Mop, * Cf. Soper, “‘On the Probable Error of the Correlation Coefficient to a Second Approximation,” p. 97 (Biometrika, Vol. rx). 166 Hxepectation of Moments of Frequency Distributions @ IT fe j zs = Pope me gk LL trp +77 +47 OS.) big, Her, Hong Bing, 8 Is Ny 2 z [Mr trp, Bry trp te a+ ey trp, Mr ptr g trp 1 yee FON re a nn EERIE (6 Bi FY ae Sa ly ; =| H : My p try, Hep try, bry +rp, Py pt pe Tape Mrp trp, Saal SI, (6) Bre, eae Mrz. a (ips bre, and, on the other hand @) Js 1 Brytret.. try = bre (21) 2 wed 2% 21) aac = Fy y= prs Prong berg try, Mrmtny tng + oe ; j=l A=l fo=ji+l o a Qi-k+1 2i-—k4+2 2i a 2 x meee > Mog tng tetry, Mralrj try be try) be - ee y . 1 ae ho Jk Jape? Jk A=1 p=fitl Ik=JIk—\ +1 i+1 7+2 22 Ss Ss S: an Ee aes a SMG ATG HANG; Br—lr5.t9j, +4754) A=1 jfo=fjtl Wi=hi-1 +t . +1 Qi B41 (2¢+1) s S Ss Ay y= > Pep brarg i 2s SH ce seri j=l A=) h=it “s i+2 i+8 21 ee = aut ee Mrgciig eet g eta lngeh Macher tha le A= do jl Ra=hi-1+1 : ae ; ‘i F A 28 (2) In the case in which 7,=7,=7,=...=7;, the coefficients fel ‘ become Ry, and we get again formulae (2), (4) and (5) of the present chapter. (3) Let us agree to denote Bi; — ie) (Hie, Se) ee are We have: Ey dp=EV')—-— = Hes, F(T oli fry Hey, 2 (Uh eng j. J, (2) (= 1) 5 = Marg erg. = Beary, bh (WL [ny a vee bn | +... +(- le ea : Pr, Marj, sre Brgy, K (Holes, ay, ode os +(-1) 7 ((-1) Ty. Using here the values found above for HTI’(, F(T yf ngs and so on, and denoting by Kr" "a the sum of all products of J factors of type yp, Oe o/h possible, subject to the conditions that /,, h., ... hy appear in order, that the sums contain not less than two summands chosen from the numbers 7), 7y, ... 
%4_,,,, and that the number of summands in /; is not less than in /i;,,, we find that the coefficient of 1/N* in the development of 4.) du equals: Au. A. TCHOUPROFF 167 2i-t-1 (1) Me & C1) Co" Baa: +3 (- Loo part oa x —k (or %4-t+h) x ( "S > I] (20) Ys SE-k+ Bry, ge eileen Bers. = (— ye Boi-t4b~1,% t—k (or 24-1-t+ k) Tei [ty Benton: a x S Dy GN as a) 1=1 St-k+) Mr pd eae i= +2 bry Par, = ey Pattee k jf, (2) “Po 2 h= > KS he) ee ee t—k (or 2i-2-t+h) TI (a) [My Lip. eee le x = 2, - REG a ise) T= (ES ICEEL ay | Ue rer 0 heee i ( ar ) Bry Bry, dh feoraie t-1 + (=) OS py bry ee My 2% (— 1) Beitr, Guia alee <2 wh e=0 t—k (or 2i-h-¢ +k) Tl es 1 Heng Peng dy eee ' x > Ge ak a hy) eas t=1 f.(t-k+0) Mop uae eon a (= 1 eae D> aa cee ae Sy) (= 1 ye Pru, k Jj, (2i—¢-1) 2i-t— ey t—k (or k+1) i Ne Use ; Sere (rp. Vy. Ve x > > " pone 1K} Tigwdas ns Tren) t=1 ff, (t-kt+l) By, & An Op lelips k+l ) where the summation for / extends to all positive integer values of J from 1 to the least of the upper limits. By the aid of simple transformations the coefficient of 1/N* in the development of H., can be expressed in the form: 2i-t-1 \ Y B(-1)' ley 2 (—1)* Cy Boia, t n= ts1 t-k : ee (1) Me KAT hay) = Ent. (¢/2) fa JAt= ht) Popp ite ee iyi 2i-t-] h Own a EO ee 1 Patek nk h=0 ant. (¢/2)—1 k+1 Tl : | ay = (2 yr (re . aa a Se (— Ft o = : - KA Ma pay) te. (18) k=0 l=1 f, (t- etl) Bry, Pig Era | co ; 2i-t-1 0 ee al Cee oeue eg | =0 Ent. (¢/2)—1 t-2h-2 in | i ae — 2i) r (pp a a5 aP mS, (= 2 Ds > 2 kK Wh Migr + "oy_oy 3) k=0 j=0 f, (2t-2k-7) fry iy eee i eye | Qi- D+ 2h+j x S (-—1) Gin. ; Boi H os 2i—at+ek+j Poei-t+k—h,k | L=0 y) 168 Huxpectation of Moments of Frequency Distributions Noticing that (cf. Introduction, IT, 3) r—k-1 Ps (—1)* C,," 6,2, =0 when m> 2k, n=0 ; r—k-1 (HU Cut Braye = 1.8.5... 2k— I), 1=0 = all ; h nN er d . 
io ibs > (—1)'C,,_, Brae = > Bing Cates n=0 g=0 where B,,, has the value given in the Introduction (12), (13) and (14), we see that the coefficient of 1/N‘ in the development of £; du is zero, if t<7. The coefficient of 1/N* equals i-1 (<1)? gg 15345 Go — le ea) k=1 IT ei 2k) laa Bry, One By, 2i-2k x S; (2 7M i-k AG Me? + MO, eecaloy where by Wes is denoted the sum of all the different products of type My Hhy «+» Hy, possible, subject to the condition that 4,, ho, ...h;~ appear as sums of pairs of the numbers ry, f,, «++ Myeg_ on: . ik : When 7,=7",=...=7, the aggregate of terms denoted by M « = ) x reduces to (2i a a} a 1) (22 = Ik — 3) se) A Bic Tsar and sige OPS ys 3S Wey gt S, (2i-2h) Bry, Mrp, BOS LARP hs a ia =1.3.5...(2k—1) 08 7™ wp Qi — 2h — 1)(21 = 2k — 3)... 5.3. ee = 103 (Den (2) 1) Orme aee a The coefticient of 1/N* consequently becomes (cf. (14) above): $2355... (2U— D(a) Sunilarly, we find that the coefficient of 1/N‘ in the development of Hei, du reduces to 2i-¢ ) t : hoo (= 1) IT (27-41) ree (= Ie C ‘st Bois, t 1=0 é Gea t-k lle: e S (21-41) r (rp tpt 1s eke V(r i Katee) k= Ent. (¢/2) l=1 f, (¢-k4+1) Bry Pip Aap Aa 2i-¢ S ; S h (th x he (SC area ee kent | n=0 Ent. (¢/2) -1 k iD bas 4 by A 2 : 4 , ! fo GS. G1) ae iinet KM the Ha See: k=0 ; l=1 f,(¢-kt+l) Mr, ty, eee als Pe 2 hc OR LC een cermer mae | n=0 Ent. (¢/2)-1 t-2k-1 lee A ~ = (2¢41 b . “ oie ead atlyta > 2 —— ED 9 Mg Toy op 9) k=0 j=0 F,(2t-2k—/j) Bry Pry, ree LAE atch 9-24 2k+G41 ; e ay x > (= iy C “of —ot--ok-+j-1 Bor tyk—-Ha, k A=0' y) Au. A. TCHOUPROFF 169 When ¢ (- ye Oe Pik = NH 8h4 5) she (2k — ils); =0 i-1 1 = : 2 (uC) Ge p= SB, Or = 185k 1G ka 1) eh =1)) 2hk—-1 g=0 2-1 3 = Ble WCC Bienes = we find: (- De Sie we (27+ 1) 24 HT era ies cri se ye Lo ly. ese tysety Sg | fe (27. 
[Formula illegible in the source], where $M$ has the value given above, and $M'$ denotes an analogous sum of terms of type $\mu_{h_1}\mu_{h_2}\cdots\mu_{h_{i-k}}$, with the single difference that in the aggregate $h_{i-k}$ there occur not two but three of the numbers $r_1, r_2, \ldots, r_{2i-2k+1}$, while the remaining $2i-2k-2$ numbers of this series are distributed in pairs among $h_1, h_2, \ldots, h_{i-k-1}$.

In the case $r_1 = r_2 = \cdots = r_{2i+1}$, the coefficient of $1/N^{i+1}$ in the development of $E_{2i+1}$ becomes: [formula illegible in the source]. (Cf. (15) above.)

(To be continued.)

MISCELLANEA.

I. Preliminary Note on the Association of Steadiness and Rapidity of Hand with Artistic Capacity.

By M. L. TILDESLEY, Crewdson-Benington Student, University College, London.

(1) This preliminary note is based on observations made by Miss M. Dalgliesh at the request of Professor K. Pearson. As a teacher of drawing in a large school, she had long experience of the amount of artistic imagination and artistic craft in her pupils, and was able to obtain appreciations of their other abilities. The categories she supplied for about 60 pupils were: (a) their ages*, (b) the number of years during which they had learnt drawing†, (c) their artistic or non-artistic capacity, the former being subdivided into imagination and craft, (d) their mathematical ability and (e) their musical ability. Steadiness and rapidity of hand were to be tested by the well-known "maze" problem. Three mazes, I, II and III, of varying degrees of tortuosity, were prepared, and the nature of the problem was explained to the pupils.
They were to enter the maze at A and leave at Q, drawing a continuous pencil track from the point of entry to the point of exit. The performance was to be considered the more excellent the fewer the occasions on which the pencil track touched the boundaries of the maze path, such a touching being termed a "bump." The ideal pencil track would keep steadily to the mid-path and parallel to its borders. In this preliminary experiment, however, no distinction was made between a non-bumping wavy line and an ideal track: the efficiency due to keeping clear of the boundaries was determined simply by the number of bumps. The minimum number of bumps made by any girl in any one maze was one and the maximum 72. Further, the performance was to be considered the more satisfactory the greater the celerity with which the track was completed. The minimum time (taken with a stop-watch) of any pupil in any one of the three mazes was 18 secs. and the maximum practically 3 mins. Contrary to what some might anticipate, there was not a high negative correlation between the number of bumps and the time taken. Although on the one hand an over-hasty temperament might lead to many bumps, on the other a certain celerity tends to straightness of path, while hesitation leads to bumping. These points will be more easily grasped from the correlation results provided below.

(2) A question arises as to the relative difficulty of the three mazes. Measured in time and in number of bumps, the average values are:

Measure | Maze I | Maze II | Maze III
Average number of minutes taken | 2.002 ± .043 | 1.208 ± .031 | 1.391 ± .037
Average number of bumps made | 20.68 ± 1.297 | 15.39 ± .927 | 31.82 ± 1.392

These numbers, however, can hardly be taken as measuring the absolute difficulty of the three mazes, for (i) the mazes are not of equal length, and (ii) they have not the same number of changes of direction. Approximately the following hold:

Measure | Maze I | Maze II | Maze III
Length of mid-path | 1025 mm. | 700 mm. | 730 mm.
Number of changes of direction | 128 | 94 | 84

* Their mean age was 14.43 years with a standard deviation of 2.07 years, the actual range being from 10 to 18 years.
† The mean number of years during which the pupils had learnt drawing was 3.84, with a standard deviation of 2.20, the actual range being from 1 to 8 years.

Thus we have:

Measure | Maze I | Maze II | Maze III
Number of cms. described per minute | 51.20 | 57.95 | 52.48
Number of bumps per ten changes of direction | 1.616 | 1.637 | 3.788

Judged by time, therefore, whether absolutely or relatively to its length, Maze II was the easiest and Maze I the hardest. Against this, however, must be set the fact that Maze I was taken first, and the pupils probably proceeded with more caution in this case. Judged by bumps, absolutely and relatively to the number of changes of direction, Maze III was the hardest. Maze I does not seem to have been harder than Maze II if we judge its mean number of bumps relative to the number of changes of direction.

Our total numbers being few, it seemed at first desirable, in order to reduce the probable errors of our results, to treat each trial as an independent event and thus reach a total of over 150 cases. This possibility is, however, excluded by the difference in difficulty of the mazes: pooling would have produced spurious correlation. We were thus compelled either to work out correlations for each maze or to pool the total achievement of each pupil, and we have sometimes adopted the one and sometimes the other method.

To obtain a single general standard of efficiency in maze description we have taken as a combined measure the inverse of the product of the time taken and the number of bumps made. We shall speak of this simply as the "inverse product." It receives some justification when we note that the two factors are not highly correlated, and further that we desire a measure which shall increase with efficiency.
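The derived rates in the last table follow from the mean times and bumps by simple division. The sketch below uses only the figures printed in the note; the function and constant names are illustrative, not the author's. The inverse product is defined per pupil, and since the individual records are not given here, it is shown only as a function.

```python
# Mean figures for Mazes I, II and III as printed in the note.
MEAN_MINUTES = (2.002, 1.208, 1.391)   # average minutes taken
MEAN_BUMPS = (20.68, 15.39, 31.82)     # average number of bumps
PATH_MM = (1025, 700, 730)             # approximate length of mid-path (mm)
DIRECTION_CHANGES = (128, 94, 84)      # approximate number of changes of direction


def cms_per_minute(path_mm: float, minutes: float) -> float:
    """Centimetres of mid-path described per minute."""
    return (path_mm / 10) / minutes


def bumps_per_ten_changes(bumps: float, changes: float) -> float:
    """Bumps made per ten changes of direction."""
    return 10 * bumps / changes


def inverse_product(minutes: float, bumps: float) -> float:
    """Combined efficiency for one pupil: 1 / (time taken x number of bumps).

    It rises when either the time or the number of bumps falls.
    """
    return 1 / (minutes * bumps)


for name, t, b, length, changes in zip(("I", "II", "III"), MEAN_MINUTES,
                                      MEAN_BUMPS, PATH_MM, DIRECTION_CHANGES):
    print(f"Maze {name}: {cms_per_minute(length, t):.2f} cm per minute, "
          f"{bumps_per_ten_changes(b, changes):.3f} bumps per ten changes")
```

Run as written, the loop reproduces the printed 51.20, 57.95 and 52.48 cms. per minute and 1.616, 1.637 and 3.788 bumps per ten changes of direction.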
The fundamental problem we had in view is the following: To what extent are steadiness and rapidity of hand, as exhibited in maze-tracing, the result of training? To what extent are they innate? Before proceeding to the discussion of this problem we may note the variability in time taken and in number of bumps for the three mazes:

Measure | Maze I S.D. | Maze I C. of V. | Maze II S.D. | Maze II C. of V. | Maze III S.D. | Maze III C. of V.
Time in maze | .479 ± .03 | 23.91 ± 1.61 | .340 ± .022 | 28.15 ± 1.93 | .410 ± .026 | 29.48 ± 2.04
No. of bumps | 14.39 ± .92 | 69.59 ± 6.22 | 10.29 ± .66 | 66.85 ± 5.86 | 15.44 ± .98 | 48.54 ± 3.75

These results seem to indicate conformity with the general law that the harder the test, the greater the scatter*; i.e. the weak fail more conspicuously and the able succeed more markedly. This law is manifested in most stiff competitive examinations, and conversely in the difficulty of making marked distinctions in the case of easy papers. We are speaking here of the scatter, or variability, as measured absolutely by the standard deviation. It is noteworthy that the relative variability (the variability as a percentage of the mean value), as measured by the coefficient of variation, appears in the case of the bumps to be less for the harder maze. We are thus driven to the conclusion that the emphasis of the difference between the ineffectual and the effectual in a given task, while increasing with the stiffness of the task, does not increase proportionally to that stiffness, but probably at some lesser rate.

(3) The first problem to be answered is: How far is steadiness of hand an individual characteristic at all? Will the same individual do well in one maze and badly in another? The answer lies in the correlation between each pupil's efficiency in the different mazes. Now, whether the characteristic be acquired by training or be innate, we should anticipate a change with age.
Most innate characteristics grow stronger or weaker with age, and this must be taken into account. The following table gives the chief age correlations:

TABLE I. Correlations with Age.

Characters | Maze I | Maze II | Maze III
No. of bumps and age | −.587 ± .059 | −.532 ± .065 | −.549 ± .063
Time taken and age | −.241 ± .085 | +.003 ± .090 | +.152 ± .088
Inverse product and age | +.402 ± .076 | +.317 ± .081 | +.467 ± .071
No. of bumps and age, for constant time taken | −.607 ± .057 | −.539 ± .064 | −.542 ± .064
Time taken and age, for constant no. of bumps | −.336 ± .080 | −.101 ± .089 | −.112 ± .089

This table shows at once a very considerable relationship between age and the number of bumps made: steadiness of hand increases with age. On the other hand, in the case of Mazes II and III no relationship between age and time taken was demonstrated. In Maze I there was possibly a slight relationship between time taken and age, of the opposite sense, however, to the insignificant values in Mazes II and III, i.e. the lower the age, the longer the time taken; this is not improbably due to the novelty factor involved in Maze I. On the whole we may reasonably conclude that the relation between time taken and age is not important. Confirmation of this arises in the correlation of the inverse-product measure of efficiency with age: efficiency increases with age, but, because the measure includes time, the relation is not so marked as for bumps alone. The two remaining correlations indicate what happens if we eliminate respectively the influence of time taken and of number of bumps. We hardly improve the relation between the number of bumps and age if we make the time taken constant.

* The high variabilities in the case of Maze I are, we think, due to the manner in which different individuals attempted a novel task.
On the other hand, if we measure the relation between time taken and age for a constant number of bumps, we get one significant but small correlation and two insignificant ones, but all three are now of the same sign. It is thus possible that there is a very slight relation between time taken and age, rapidity increasing slightly with age for a given degree of steadiness of hand. This leads us to the direct problem of the relationship between rapidity and steadiness of hand.

TABLE II. Correlations of Rapidity and Steadiness.

Characters | Maze I | Maze II | Maze III
No. of bumps and time taken | −.052 ± .090 | −.165 ± .088 | −.432 ± .073
No. of bumps and time taken, for constant age | −.246 ± .085 | −.193 ± .087 | −.422 ± .074
No. of bumps and time taken, for constant time learnt | −.070 ± .090 | −.164 ± .088 | −.444 ± .072

Without regard to age, it is only in the case of Maze III that we can assert that the number of bumps varies inversely with the time taken. Allowing for age, the associations are more marked, but by no means as intense as we had anticipated. Further, they seem to depend on the difficulty of the maze: the harder the maze, the closer the relationship. A priori one might imagine that a slow transit would escape bumps; it is so, but not in a very emphatic manner. We suggest that a certain degree of rapidity is really helpful in avoiding bumps, since it keeps a straight course in the straighter parts of the mazes, while it is rapidity at the angles which is calculated to produce bumps. There are probably, therefore, two factors at work.

We can now turn to the question of individuality in maze-tracing. We find the following correlations:

TABLE III. Individuality in Maze-Description.

Correlations | Mazes I and II | Mazes I and III | Mazes II and III
No. of bumps | (illegible in the source) | (illegible in the source) | +.876 ± .021
Time taken | +.594 ± .058 | +.436 ± .073 | +.807 ± .030
Inverse product | +.708 ± .046 | +.661 ± .051 | +.695 ± .047
No. of bumps, age constant | +.644 ± .053 | +.648 ± .052 | +.826 ± .029
Time taken, age constant | +.613 ± .056 | +.493 ± .068 | +.816 ± .030
Inverse product, age constant | +.663 ± .051 | +.585 ± .059 | +.652 ± .052

These are very noteworthy correlations, and it is evident that there is a very marked degree of individuality in maze performance, whether we judge steadiness, rapidity, or the combination of both involved in the inverse-product measure of efficiency. These high correlations, it is true, are lessened, though they remain large, if we allow for age*. The existence of this marked individuality brings us, then, to our main problem: are steadiness and rapidity of hand, which to an appreciable extent increase during childhood, products of training, i.e. of environment, or are they innate characters developing with age?

(4) How far does the length of time during which drawing has been learnt influence rapidity and steadiness of hand in maze-tracing? The correlations are provided in the accompanying table.

TABLE IV. Influence of Time Learnt on Steadiness and Rapidity of Hand.

Correlations | Maze I | Maze II | Maze III
No. of bumps and time learnt | −.216 ± .086 | −.286 ± .083 | −.209 ± .086
Time taken and time learnt | −.077 ± .090 | +.029 ± .090 | −.010 ± .090
Inverse product and time learnt | +.151 ± .088 | +.272 ± .084 | +.287 ± .083
No. of bumps and time learnt, for constant age | +.132 ± .089 | −.008 ± .090 | +.089 ± .089
Time taken and time learnt, for constant age | +.053 ± .090 | +.032 ± .090 | −.102 ± .089
Inverse product and time learnt, for constant age | −.066 ± .090 | +.137 ± .088 | +.067 ± .090

Thus, while there is no sensible correlation between rapidity of hand and the time during which drawing has been learnt, the small correlation between steadiness of hand and time learnt disappears when we take these correlations for constant age.
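Two summary statistics recur in the tables above: the coefficient of variation (the S.D. as a percentage of the mean) and the probable error attached to each mean and correlation. The sketch below assumes the usual large-sample formulae of the period, P.E. of a mean = 0.6745 σ/√n and P.E. of r = 0.6745 (1 − r²)/√n. The note says only "about 60 pupils", so the n used here is back-calculated from the printed P.E. of the mean number of bumps in Maze I; that figure is an inference, not one given by the author, and the function names are illustrative.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def coefficient_of_variation(sd: float, mean: float) -> float:
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean


def probable_error_mean(sd: float, n: float) -> float:
    """Large-sample probable error of a mean."""
    return PE_FACTOR * sd / math.sqrt(n)


def probable_error_r(r: float, n: float) -> float:
    """Large-sample probable error of a product-moment correlation."""
    return PE_FACTOR * (1 - r * r) / math.sqrt(n)


def implied_n(sd: float, pe_of_mean: float) -> float:
    """Invert the P.E. of a mean to recover the number of cases."""
    return (PE_FACTOR * sd / pe_of_mean) ** 2


# Maze I, number of bumps: mean 20.68 ± 1.297, S.D. 14.39
n = implied_n(14.39, 1.297)
print(round(n))                                           # 56
print(round(coefficient_of_variation(0.340, 1.208), 2))   # Maze II time: 28.15
print(round(probable_error_r(-0.532, round(n)), 3))       # Table I, Maze II: 0.065
```

With n = 56 the formula also gives the printed ± .085 for the Maze I correlation of time taken and age (−.241); coefficients of variation recomputed from the rounded means and S.D.s agree with the printed ones to within about 0.02.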
As far as this material is concerned, we see that steadiness and rapidity of hand are not the result of drawing practice, but are probably innate characteristics developing with age. This result is so important that it of course needs independent verification, but if true its suggestiveness is great: for crafts in which these characteristics are essential, they can better be obtained by selection than by training.

We have endeavoured to throw further light on this point by approaching the subject from other standpoints. The correlation between age and time learnt is + .5051 ± .0671, and it may be argued that time learnt in the early years of life is not of very great importance, and that this possibly accounts for the correlation being rather low. We accordingly confined our attention to the 33 children who did not learn drawing before 10 years of age. The correlation of age and time learnt now rises to + .8555 ± .0315†. We then dealt only with steadiness of hand, and took as our measure the inverse of the number of bumps made on the three mazes added together. We found that steadiness of hand (i.e. the inverse of total bumps) and time learnt now had the significant correlation of + .4030 ± .0755, while age and steadiness of hand gave a correlation of + .4877 ± .0895. We then corrected these last two correlations for age and time learnt respectively, and found: steadiness of hand and time learnt for constant age, − .0034 ± .1174; age and steadiness of hand for time learnt constant, + .3015 ± .1067.

* The correlations of times taken are increased, not lessened, but we have already drawn attention to some irregularity in the relationship of age and time taken.
† This is only very slightly lowered if we confine our attention to children who make the same total number of bumps, i.e. the correlation of age and time learnt for constant steadiness of hand is + .8254 ± .0374.
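The "corrected" correlations in this paragraph are partial correlations of the first order. As a sketch, the standard formula r₁₂.₃ = (r₁₂ − r₁₃r₂₃)/√((1 − r₁₃²)(1 − r₂₃²)) can be applied to the three correlations quoted for the 33 children; the function name is illustrative.

```python
import math


def partial_r(r12: float, r13: float, r23: float) -> float:
    """Correlation of variables 1 and 2 when variable 3 is held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# The 33 children who did not begin drawing before 10 years of age.
r_age_steadiness = 0.4877      # age and steadiness of hand
r_age_time_learnt = 0.8555     # age and time learnt
r_steadiness_learnt = 0.4030   # steadiness of hand and time learnt

# Age and steadiness of hand, for time learnt constant.
print(round(partial_r(r_age_steadiness, r_age_time_learnt, r_steadiness_learnt), 4))
```

It prints 0.3016, in agreement with the printed + .3015 to within rounding of the component correlations.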
There is thus: (a) no relation between steadiness of hand and the time during which drawing has been learnt, if we correct for age; (b) a definite relation between steadiness of hand and age, if we correct for the length of time drawing has been learnt. The corresponding correlations for the whole population under discussion were:

Steadiness of hand and time learnt, for constant age | + .0334 ± .0900
Age and steadiness of hand, for time learnt constant | + .3731 ± .0776

These are in sensible agreement with those for the special population who had not begun drawing before 10 years, although in the latter case the correlation of age and time learnt was + .8555 as against + .5051 for the general population. As far as these results go, they confirm the view that steadiness of hand is an innate character developing with age, but having little association with training in drawing.

(5) We now turn to "craft" and "imagination" as factors in drawing ability. If these are correlated with efficiency in maze-tracing, it will not necessarily follow that efficiency in maze-tracing is associated with effective drawing training, as distinct from length of training. Possibly craft and imagination in drawing are themselves in the first place innate characters, developing no doubt with age, but not necessarily intimately associated with the time during which training has been given. In dealing with "imagination" and "craft" the method of "biserial r" was adopted. Poor craft contains the classes "minus," very bad and bad, and good craft the classes medium, good and very good. Poor imagination contains the classes minus, very bad, bad and medium, and good imagination the classes good and very good*. We have first to note the influence of age and time learnt on craft and imagination.

TABLE V. Influence of Time Learnt and Age on Craft and Imagination.

Character pair | Value of correlation
Good imagination and age | −.479 ± .094
Good imagination and time learnt | +.033 ± .090
Good craft and age | −.096 ± .112
Good craft and time learnt | +.166 ± .114
Good imagination and age, for constant time learnt | −.675 ± [.060]†
Good imagination and time learnt, for constant age | +.363 ± [.078]
Good craft and age, for constant time learnt | −.217 ± [.086]
Good craft and time learnt, for constant age | +.261 ± [.083]

* The choice of series was made solely with a view to obtaining a not too small frequency in the smaller series.
† Probable errors of these partial correlations are given only as rough estimates, for they are calculated on the basis of all the component correlations having been found by the product-moment method.

Now the absolute correlations are extremely interesting. There is no relation between time learnt and either imagination or craft; these factors of drawing capacity appear, like steadiness and rapidity of hand, to be innate. The influence of age on craft is not significant, but age seems to weaken imagination, i.e. the younger children are more imaginative in their drawing work. If we turn to the partial correlations, however, we see the bearing of these results. For a constant age, imagination is moderately influenced by the time learnt; but for a constant amount of training it depreciates more markedly with age. The result is that there is no apparent relation of imagination to training: the diminution of the innate character with age is really more influential than its growth with training. Again, for constant age there is a very moderate influence of training on craftsmanship, but for a constant time learnt craft diminishes with age*. The result again is that innate change with age masters training, and unless training is persistent, good craft will lessen, so that the correlation with age is either negative or insensible.
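Tables V to VII rest on Pearson's biserial r, which treats a two-class grading (for example good versus poor craft) as a cut through an underlying normal distribution. The sketch below gives the usual textbook form, r = (M₁ − M₀)/σ × pq/z, where z is the ordinate of the unit normal curve at the point of section. The class means needed to apply it are not printed in the note, so the usage line uses illustrative values only, and the parameter names are assumptions.

```python
from statistics import NormalDist


def biserial_r(mean_upper: float, mean_lower: float, sd_all: float, p_upper: float) -> float:
    """Pearson's biserial correlation.

    mean_upper, mean_lower: means of the measured variable (e.g. bumps) in the
        upper and lower classes of the two-class character
    sd_all: standard deviation of the measured variable over both classes
    p_upper: proportion of cases in the upper class (0 < p_upper < 1)
    """
    if not 0 < p_upper < 1:
        raise ValueError("p_upper must lie strictly between 0 and 1")
    q = 1 - p_upper
    unit_normal = NormalDist()
    # ordinate of the unit normal curve at the point cutting off area p_upper
    z = unit_normal.pdf(unit_normal.inv_cdf(q))
    return (mean_upper - mean_lower) / sd_all * p_upper * q / z


# Illustrative values only: class means one S.D. apart, equal classes.
print(round(biserial_r(1.0, 0.0, 1.0, 0.5), 4))  # 0.6267
```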
The small influence of time learnt on these factors of drawing efficiency is remarkable, and it is highly suggestive that in certain characters training may only suffice to prevent deterioration, without providing a marked expansion of efficiency. We may now turn to the influence of imagination and craft on the steadiness and rapidity of hand exhibited in maze-tracing.

TABLE VI. Influence of Craft and Imagination on Steadiness and Rapidity of Hand.

Characters | Maze I | Maze II | Maze III
Good imagination and no. of bumps | +.239 ± .109 | +.187 ± .109 | +.252 ± .109
Good imagination and time taken | +.005 ± .115 | +.029 ± .114 | +.167 ± .112
Good craft and no. of bumps | −.163 ± .113 | −.109 ± .114 | −.039 ± .115
Good craft and time taken | +.500 ± .093 | +.104 ± .114 | −.308 ± .107
Good imagination and no. of bumps, for constant age | −.059 ± [.090]† | −.092 ± [.089] | −.014 ± [.090]
Good imagination and time taken, for constant age | −.130 ± [.089] | +.035 ± [.090] | +.277 ± [.083]
Good craft and no. of bumps, for constant age | −.272 ± [.084] | −.189 ± [.087] | −.110 ± [.089]
Good craft and time taken, for constant age | +.494 ± [.068] | +.104 ± [.089] | −.298 ± [.082]

These results are of much interest, and suggestive for further inquiry as soon as we are able to deal with much larger numbers. In the correlations uncorrected for age there would appear to be a slight relation between good imagination and a large number of bumps, but only because the younger students are more imaginative. Corrected for age, the correlations are all reversed in sign, but are seen to be of no significance. Good imagination and time taken cannot be considered to have a significant relationship either before or after correction for age. Thus the imaginative factor in drawing skill is not sensibly associated with rapidity or steadiness of hand.
With regard to craft there do appear to be significant associations, but they are clearly changing as experience in maze-tracing grows. Uncorrected for age, there is no really significant association between good craft and steadiness of hand, although the constancy of sign is to be noted. After correction for age, it would appear probable that a small association exists, those with good craft having the steadier hand. But the relationship appears to weaken with experience, and is hardly significant in the third maze. The same change shows itself in the time taken. Those with good craft took the longer time in the first maze, and there is quite a sensible degree of correlation; in the second maze the correlation is insensible, while in the third it has become negative. In other words, in the third maze good craft has begun to tell. It would require far more material and prolonged experiment to be certain how far it is the hesitation of the good craftsman over a novel task (Maze I), or the greater difficulty of the third maze, which has told in favour of good craft in that case. All we can assert is that, within the small range of our experiment, the slight relation of good craft to steadiness of hand appears to decrease, while the relationship of good craft to rapidity of hand only begins to develop in the third experiment, and is then not of any substantial intensity.

* Miss Dalgliesh reports that she judged the craft capacity of her pupils quite apart from their age or technical ability. Thus, given two children of 9 and 16 years of age whom she had rated with the same grade of craft capacity, the elder child would (if teachable) be doing the better drawing work, having had as a rule longer training. But this increased technical power was not regarded in the craft grading.
† See second footnote to Table V (p. 174).
We have already seen that there is only a small association of good craft with time learnt, about enough to allow for the deterioration of craft with age. Hence the slight relationship suggested between good craft and rapidity of hand is not necessarily an argument that such rapidity arises from training; it may well be the result of an association of the innate characters.

(6) The remark at the end of the last section leads us directly to the problem of whether qualities other than those of draughtsmanship can be directly associated with steadiness and rapidity of hand. A priori we think there is much to be said for both mathematical and musical capacity being innate*. Mathematics, except in the case of geometrical drawing, gives little training to the hand, but it does enable the possessor of the capacity to realise more or less vividly a conception of the desired perfect maze description. Music, on the other hand, not only gives much finger practice, but in the case of special ability probably signifies an inherited flexibility of hand. In our division of the material we have made only two classes: the students who possessed marked ability in music, and those with marked ability in mathematics, were separated as small classes from the remainder, i.e. the mediocre, the non-mathematical and the non-musical. We then applied the biserial method.

TABLE VII. Association of Mathematical and Musical Capacity with Steadiness and Rapidity of Hand.

Characteristics | Maze I | Maze II | Maze III
Mathematical capacity and no. of bumps | −.112 ± .139 | −.216 ± .136 | −.136 ± .138
Musical capacity and no. of bumps | −.091 ± .170 | −.042 ± .170 | −.088 ± .170
Mathematical capacity and time taken | −.608 ± .106 | −.390 ± .126 | −.214 ± .135
Musical capacity and time taken | −.303 ± .162 | −.233 ± .165 | −.029 ± .170
Mathematical capacity and no. of bumps, for constant age | −.012 ± [.090] | −.148 ± [.088] | −.049 ± [.090]
Musical capacity and no. of bumps, for constant age | −.042 ± [.090] | +.012 ± [.090] | −.042 ± [.090]
Mathematical capacity and time taken, for constant age | −.592 ± [.059] | −.396 ± [.076] | −.237 ± [.085]
Musical capacity and time taken, for constant age | −.290 ± [.083] | −.234 ± [.085] | −.046 ± [.090]

None of the correlations of mathematical or musical capacity with steadiness of hand is in itself significant, but as all six of them are of the same sign, we may possibly assert a slender absolute relation between both capacities and steadiness of hand above the average. Rapidity of hand, on the other hand, shows at first a definite relationship with both mathematical and musical capacity, and in the case of mathematics a marked one. But this relationship seems to diminish rapidly with experience of maze-tracing. It is probable that the mathematicians had at first an advantage which was not maintained. It may be suspected that they realised better at the outset what was required; an appreciation of time taken may also have been a factor at the outset with the musicians. But both these advantages seem to diminish as the non-mathematical and non-musical gain experience.

* The correlation of mathematical capacity with age was + .175 ± .137 and of musical capacity with age + .098 ± .169. These are satisfactory as showing that the teachers really judged capacity and not knowledge; as far as they go, they are also some evidence for mathematical and musical capacity being innate characteristics. Direct evidence for the hereditary character of musical capacity may of course be found in the pedigree of the Bach family. It is less demonstrated in the case of mathematics, but the Gregories might be cited, and possibly one or two recent instances will occur to those familiar with the Cambridge Tripos Lists.
The above remarks refer to the total correlations. When these are corrected for age, the conclusions for steadiness of hand are confirmed: even for constant age it has no sensible relationship to either mathematical or musical capacity. Rapidity of hand, when we correct for age, does show a relationship with mathematics in a more marked, and with music in a less marked, degree. But again both associations lessen with experience: for the third maze the superiority of the good mathematicians is only very moderate, and that of the good musicians has become insensible. There can be little doubt that the more marked superiority in Maze I was due to a better appreciation of what was needed in a novel task.

(7) Conclusions. The material dealt with is admittedly slender, and was analysed only as a step towards more elaborate returns, in order to determine what additional experiments should be tried. Our conclusions are therefore suggestive rather than dogmatic. We have not been able to associate rapidity and steadiness of hand, as exhibited in maze-tracing, in any marked degree with training; we more than suspect them to be innate characteristics*. Good craft, mathematical ability and musical capacity seem to be associated, to some not very marked extent, with rapidity of hand, but it is noteworthy that even in these cases the advantage was rather an initial one and tended to weaken with experience. An apparently noteworthy point, well worth confirmation or contradiction, is that continued training may only just suffice to maintain a grade of efficiency which would otherwise deteriorate with age. It would be of much interest to demonstrate that training in some cases does not create or even develop a faculty, but maintains it at the higher range of efficiency which belongs to an earlier age.
It is possible that the teacher cannot develop imagination in the later stages of youthful growth, but may be able, by proper training, to preserve the greater imagination of the child. Certain faculties may be most intense at certain stages of growth. If education seizes upon them at this age and maintains their then intensity, we may be apt to overlook their history and suppose them created by the educational process. The point is worth a direct and more intensive investigation; here it is only a suggestion.

I have to thank Professor Pearson for his assistance during the preparation of this paper.

[* After thirty years' experience in teaching in a drawing office I think it safe to say that within a fortnight it is possible to assert of the bulk of freshmen engineers whether or no they will be good draughtsmen at the end of their two to three years' course. The power of rapid, steady, uniform bold work is there in germ or it is not. Knowledge of method and accuracy of result may be acquired, but only to a minor extent can anyone acquire that which distinguishes a good from a mediocre draughtsman. K. P.]

II. On the Moments of the Normal Correlation Function of n Variables†.

By SVERKER BERGSTRÖM, Stockholm.

1. By Pearson's celebrated theorem‡, the normal correlation function of n variables can be written

$$z = \frac{1}{(2\pi)^{n/2}\,\sigma_1\sigma_2\cdots\sigma_n\sqrt{R}}\,\exp\left(-\frac{1}{2}\sum_{p=1}^{n}\sum_{q=1}^{n}\frac{R_{pq}}{R}\,\frac{X_pX_q}{\sigma_p\sigma_q}\right) \qquad (1)$$

where

$$R = \begin{vmatrix} 1 & r_{12} & \cdots & r_{1n}\\ r_{21} & 1 & \cdots & r_{2n}\\ \vdots & \vdots & & \vdots\\ r_{n1} & r_{n2} & \cdots & 1 \end{vmatrix} \qquad (2)$$

[† The present paper reached the editor later than the three other memoirs dealing with allied topics published in this part of the Journal, but the methods adopted are of sufficient interest to justify its appearance in association with those papers.]
‡ See K. Pearson, Phil. Trans. vol. 187 (1896) and vol. 200 (1903).
law, Unies Bedoooel | et R,, est le complément algébrique ou le mineur de l’élément dans Ja ligne pieme et la colonne gieme, Les variables _X,, sont mesurées & partir de ses valeurs moyennes et o, en sont les dispersions; rq désigne le coefficient de corrélation entre X, et X_, ot "pq = Taps Tipp le Le moment Paras sah est, par définition, ies [X,%X.0... Xj" 2dX, dX, dX, tv Cest la valeur moyenne de DG AGL ane Gets Pour simplifier ?écriture nous emploierons des coordonnées normales oO p Introduisons, de plus, les notations il aS ye JR La fonction de corrélation devient LESS z — S$ DDapgUpx pees eMart ticornoscococod= (EI: w (27)? Notre but sera a calculer la valeur moyenne | Oy Og ap AVENE ads,“ o00 Cam. Ib 2. Considérons Pintégrale ie eel a r= [Jo fa Lola ea ec dc Aton Aan. Ll est facile de vérifier que cette intégrale est uniformément convergente, pourvu que | Ang |>k>0, k étant un nombre fixe. En dérivant Vintégrale J par rapport a @,,*, on aura f Cay 2 Cp Ap+ Oq+1 eo J\= | oe fa CE RP ae Ca FRR Aen Petas hin e Goonoudcaoocud (lc Done, si J est une fonction comme de a,,, on pourra obtenir, par une dérivation, J,. Il s’ensuit que notre question se raméne au calcul des dérivées par rapport aux coefficients ap, de Vintégrale = ce | devas BAEC he ee iicctcenonnednanonanacrcasaonsocd ilo dont la valeur est, @aprés M. Pearson, ed I ee EP PUR Nn OM feisissoocol LU, ou D=||Apq | * Tl faut se rappeler que @pq=Aqp- ~T Ne] Miscellanea iI 3. Introduisons maintenant les notations A ee g He Re oa tate, Docawencauantevesncacsee (EL); ited Ott, a, a, me D, iia = g g Pr Pr eileto me NGG a One x 3 4 sae Lee, Y% PyIy “WpPy (hr) HN 0 0 Prepay nytg= HE (ga — bag —) cocvenscesevesscesstenseeenee( 2) Ey WZ 1 Dl Nous dirons que l’on calecule, par ce procédé, la dérivée symetrique de F. La définition sétend au cas oll p = g, pourvu que lon convienne que / OF ee Opp Dérivons ainsi successivement (8) et (9). 
, oe re ee Ga py: 5 la = M [%p, %a,] Ane D1 Mascaeeraeseovassteer eee Loy 77 3 Zo , , C4 7 IE Pyy? Pp, = [%p,a, Rial" Sins D194 P ry ~ Saaae Dy dy? Poi rreeereeee (LA) On est conduit a poser, tout généralement, 7 sk-1 M [Xp 4, ++ ® py q,| = UE. {(2k ~1)113F — (2k-3)N2E +... DQ acpi) NS Oo DMT eh DEMS cssnees (15), ou, pour abréger, (29+ 1j!!= (29+ 1) (2p - 1) - 3.1......... Ren Haase src on ..(16), 4 2 R ay A iii et ott 2), désigne une somme de termes contenant 7 dérivées symétriques divisées par D et dont la P ; ‘ : 4 somme des exposants svmboliques est égale 4 k; il y a dans >, autant de tels termes que l’on puisse grouper en 7 groupes k éléments, en ne tenant compte de l’ordre des groupes égaux. Ce nombre est i! S i, = 2 eagle as Neat seein cee cise coterie Seewente ites ou Qpriage eee | Mya, + Mgag+...=k eatece mene eure esis Rewsiosete season LS))s My HM, +...=4 | La formule (15) peut se vérifier par la démonstration par récurrence. Nous n’y insistons pas ; elle ne présente aucune difficulte. 4. Il faut considérer une dérivée partielle dont se composent, en premier lieu, les dérivées Avena O =) symétriques et puis les sommes 2h . i) A okD Pods La dérivée z Op, a, 0a - On Tele doit étre, évidemment, au signe prés peut-étre, égale au complément algébrique du terme Apa, Upyay ++ Up ay,* Le signe s’obtient par ’évaluation du nombre des inversions dans p,, py, «+», Py—Soit 7—et dans q,, Jo, +++» U-—Soit 0’. Or, en vertu du théoréme bien connu sur la connexité entre un mineur d’un déterminant et celui de son réciproque, et & cause de Videntité 1 D=5, 180 Miscellanea ce complément algébrique peut s’écrire W192 +++ Afr 1 pad... aK Rurk-1 a Pa Pk Rr-k Did. Dye R é ot le déterminant au numérateur est formé par les éléments communs aux lignes 1, Po, ... Dg et aux colonnes g,, do, --. gq, du déterminant R (2). oD Dido ++ Wh Done PR oe SRL eS nniaiobdobenoondovcosceal I). 
Ody a4 mae Op a4, R ) On pourra d’ailleurs éviter la considération du nombre des inversions en substituant a P Wido+++ Vk Di D2 Dk un déterminant dont la diagonale principale consiste des éléments , Py? DyIy > aro) Poly? et que nous désignerons en échangeant la lettre # contre Y. Permutons, en effet, dans le déter- minant Q des lignes ou des colonnes: on ne change que le signe. Les indices du déterminant R navant pas d’inversion, on voit que NGa++ Ip ay) WNd2++- Vk pe ies 1) Bn, py ; okD 1 yee Uk On aura done enfin Sane ee a ee [ Escntslgne ceeeionaen eee (OU CET py DOGG yr EDs aetna, — Hinetie, Wo Ua ahy LG iiss, PO aie "Dd Pydg 1° PLE Si deux des nombres p (ou des qg) sont égaux, le déterminant (21) contiendra deux lignes (ou colonnes) égales: il s’évanouira; et, en effet, la dérivée seconde d’un déterminant par rapport a deux éléments appartenant a la méme ligne (ou & la méme colonne) est nulle. La dérivée symétrique du kiéme ordre consistera d’aprés (12) de 2* dérivées simples, telles que (20). On les trouvera en faisant changer de place les p et les ¢g 4 méme indice de toutes les maniéres possibles. En désignant ces changements par S, on aura k 1 dove Me ps ) os (on% qk 2 P\P2-+6 Dk hee ie \ lieu ee eee I faut remarquer que le terme formé de la diagonale principale de (21) se reproduira une fois et une seule dans chaque dérivée simple; il y en aura 2* dans la dérivée (22). 5. Revenons a lexpression (15). Elle va se simplifier. Tout d’abord 20 = JD. Puis, considérons un terme dans 2s soit par exemple D™ pm — pli) D Dee apa BA RSSEBSHOR Besbpdndnohonbsecodnooccsasend. (2S) M+ Nat... + = k. (n) eS a Ga, Fa, *** Lay ) 3 D ‘Da,Pay** Pay Cest un polynéme en 7, homogéne et de l’ordre n. D’aprés (22) Cela étant, on voit aisément que l’éxpression (23), homogéne aussi, est de l’ordre Nyt Ny +... + N=k. Miscellanea 181 Le terme Tpit? 
"Pally 17) MDpdy, TIT Tt titers tte etesteeneeetecees ees y aura le coefficient gmtet Mi — ok, Done le terme (24) obtiendra dans ey le coefficient 2” 1, , Généralement, en mettant dans l’expression (15), on voit qu’elle est une fonction homogéne de lordre k en r. La nature du probléme exige que cette fonction est aussi symétrique. Permutons, en effet, les indices du premier membre de (15); le résultat ne devant pas changer pour cela, le terme (24), y permutés les indices de la méme manieére, conserve le méme coefficient. 6. Tl ne nous faut donc considérer qu’un seul terme de (15), soit (24). Ce coefficient est (Qe Ill g’ = (2k—3) (ea 1+ (2h 5) 112g" 7... a 9"-1 gl k k k k t=k ane eS (Ihe Moron yar ge cas (25). i=1 Je dis que celte expression est égale a Punité, quel que soit k. Je vais employer le raisonnement par récurrence. L’assertion étant évidente pour k = 1, je la suppose démontrée pour /, et nous allons voir qu’elle subsistera encore pour k + 1, c’est-’-dire que expression i-k+1 . A Be 23 (-1) @k-2i +3) 1133 geen ere (26) sera égale a lunité. . i Envisageons d’abord les nombres ¢, . a 1 Cherchons In en supposant connu g) . Ou Pélément (n + 1)iéme doit étre placé dans l'un des 7 groupes formés de » éléments: il y en a ct ig. éventualités. Ou cet élément seul doit faire un groupe: les autres éléments formant alors (¢- 1) groupes: on aura ainsi i-1 . In éventualités. pS i sags i ta On D’ot la formule 1 al (27). En convenant que ga. = 0, 0, cette formule subsiste encore pour 1=n+1 et ale L’expression (26) peut s’écrire maintenant, en vertu de (27), i=k+1 ee -_ BS (1)? (2k - 2643) 1122 (k— 4 4.2) gh TF 4 gh i=1 c é t=-kt+1 ; ; ; k-i+2 = S&S (-1) 1 (2k -214+3)!1!2'-! (k-1+2) 7, t= 2 i=k : aoe + & (-I)*? (2h -— 20+3)!! 2-2 9 ; i=1 Changeant dans la premiére somme 7 en (7 + 1), on trouvera enfin pour (26) izk ee Z (-1)1 (Qk-2i 41) 1121 {2 (k- 641) + (2k-2i+3)} gO t=1 fe Ce n’est autre chose que (25); done la somme (26) est bien égale & Punité. ©¢.Q.F.D. 
182 Miscellanea 7. En tenant compte de ceci nous écrivons pour (15) la formule générale M [tp %p, vee UK 1 Xx] = Strap 2 ay "Pasa, Pay, Pay. On obtient les indices des 7 dans cette somme en groupant les nombres p en k groupes deux a deux sans tenir compte de ordre ni des groupes ni des éléments d’un méme groupe. La somme (28) portera done en tout (2k)! oF i! =(2k-1)!! termes. Si quelques p sont égaux, on doit les imaginer affectés des indices; au résultat final on se rappelle que Soit, par exemple, Di—h)s —wec— sors On retrouve Ja formule connue fol —lg? — tke 2" dr =(2Qk—1)!!. | 20 8. Envisageons le cas général IM, [ecto n Pe D’aprés la formule (28) cette valeur moyenne peut s’écrire i=pj=p (n) S-Ay, TN Tbe iy aj one accuse oc eee senete nee eeemeentteceeen (2c) n t=17>% ou ae sont les solutions du systeme C43 + Cyo + ee + Cy = AY Cyo + Coq + eee + Can = Ag (30) lip + lop + os + Opp = ap e,; devant, de plus, étre des nombres pairs. Considérons A,,. On doit ordonner de toutes les maniéres possibles a; éléments en p groupes contenant respec- tivement e,;, 29), «-+) pj éléments, en tenant compte de ordre des groupes; il y en aura a;! toa a C17! Cog l -.- Cpy! maniéres. Puis on doit accoupler e;; éléments avec autant d’éléments; c’est ei; ! manicres. Enfin on doit ordonner e;; éléments en $e, groupes deux a deux, en ne tenant pas compte de Vordre des groupes. On en aura (ey = 1) it possibilités. D’ot, aprés des réductions, ee NE oe ciara eee Ode sayy el ai a) M [xy Lo” ... ty PJ=S vat a0 =1j> ot e;; sont les solutions du systéme (30) et ott ej !t= ey; (e;- 2)... 4. 2, Ove Traitons comme application le cas p = 2. On voit que ey. est au plus égal au plus petit des nombres a, et ag, soit ay. Il fait Miscellanea 183 ! ! ell : , a, a, ay: a, a: ag (ag ) acs o if fa. 2?) = r seo eee ot (Gracy oe (ay —ag+2)!! 21! z ! » (ag —1) (ag—2) (a. -3 = ieee AN Ga UNG 2) ae 9) ye Geese Sn, aM Or (32) + - = = - ——s (a; —ag+4)!! 4!! 
Pécris enfin les moments jusquau sixiéme ordre, en employant la notation igs a : (Cape tree! a Mey con hill Bg = L; By = Nie. Ba 3: Bs, = Big = 872, Bog = 277, + 1; Bory = 279713 + M235 Briar = "12% + Ty3%24 + Til o3- Bg = 15; Moe Sra 2n a tos, Sag — OF 15 + O%aes @ ox g | Bay = 12713 + 8723, Bye. = 67" y2e713 + Gryer23 + 3713, Bove = 8713723 + 27742 + 2771, + 27%. +15 Bit Or yislia + BPP 34 + BP y3%o4 + BP yyh235 Poors = 277 12% 2a + AM eT aon + AM ean + 2 ae 27 o5%o4 + 345 Bora = 20 yo 345 + 2 po a'35 + 2 oP isl34 + Wy alos + 2Wisisoa + 2 aM soa + Voalas + Coal’ss + 95"343 Braga = 12273456 + 719724756 + --- (15 termes). Les moments jusqu’a Pordre 4 a deux variables ont été calculés par M. Pearson et puis par M. Soper qui a donné une formule générale pour les moments & deux variables; Biometrika, t. 1X, 1913, p. 101. (On doit remarquer cependant que sa formule (xxxIt) est atteinte (une erreur typographique.) Le cas de trois variables a été traité par M. Wicksell dans ‘‘The general characteristics of the frequency function of stellar movements, etc.”’ Lund 1915, p. 11. Enfin M. Isserlis a déduit notre formule (28) pour le cas 2 = 4 ou de quatre variables dans Biometrika, t. x1, 1916, p. 189. III. Formulae for determining the Mean Values of Products of Devia- tions of mixed Moment Coefficients in two to eight Variables in Samples taken from a limited Population. By L. ISSERLIS, D.Sc.* A. Let py, p3, denote the product moment coefficients, referred to a fixed origin in a sample of size n extracted out of a population of size N, there being four independent variables 2,, xy, %3, %,. The mean value of p,, in many samples is Pj), the corresponding product moment coefficient for the sampled population. Let dp, = py, — Pj,, denote the deviation of the moment coefficient of the sample from its mean value, then Mean value of dp,. dp, in many samples = X (press — P12P3a)s N-yn ee Ne and pjo34 is the product moment coefficient with respect to the four variables. 
where [* Dr Isserlis sent me a paper containing the results of the present note with others accompanied by proofs in the course of 1916. It has been impossible to publish that paper so far, but it is only fair to him, having regard to the fact that other investigators are now entering this field, to publish his formulae in association with the memoirs printed in this part of Biometrika. Eprror.] 184 Miscellanea This result gives many particular cases if we identify two or more of the variables. For instance Mean value of dp,2 dp22= (pys2 — P12 P2) é E n where p, is the second moment coefficient with regard to x, ie. p’, in the usual notation for one variable and so forth. Also Mean value of (dp,z)* = . (p11 — py"). B. Similarly if there are 6 independent variables x,, %, ... %5, U- The mean value of dpy,y dpz4 IPs6 , XX ' = f ; ¢ Pe ee =p (P123456 ~ Pi234P56 — Psas6eP12 — Pirs6P2a + 2D 12 P31 P565> , , N - 2n where ==> N-2 Particular cases are , Mean value of (dpy)® = os (ps — 3p)4 Dy + 2p), or Mean value of (dj’,)* = os (ug — Bug's + Qy’.°), and Mean value of dpy.dp25¢Pa = 1x [P122232 — P1223 Pai ~ Pes Piz — Ps1%2 P22 + 2719 P23 Pai l- C. Let there be 8 independent variables x1, v2, ... %7, Xs. The mean value of dp, dpo5 UPs5 P75 = we [ P1234 Pse7s + P1256 Psazs + Piz7s Passe — Pizea Ps6 Pas — Psozs Pi2 Pa — Prose Poa Ps ~— Psa7s Piz Pss — Pr27s Psa Pss — Psass Piz P7a + 3P12 Psa Pso Pr) ; xp ar oh [ Prosasezs + Pisa Poors + P1256 P3a7s + P1278 P3456 — Pi2zas6e Pos — Piraazs Poe — P125678 P3a — P345678 Pre); where gh = 2 - 4y’ + 3x” + 4y’/n - 6y”/n, e —1+ 3y’ - 2y" — By’/n + 4y”’/n, and x” = (N - 3n)(N - 3). When the sampled population is infinitely great oe =x=xX =X =L GH=1- Ain. As a particular case, / Mean value of (dpy'=*$ (3p,2 — 6pys py? + Spy4) + ae (pis + Spy? — 4py5 py), a a or in the notation usual for a singie variable, Mean value of (dp’s)* = we (By’,2 — 6’, pe’? + 3p’y4) + Me (we + Bp’y? 
— 45 H’o)- When the sampled population is normal the results of (A), (B) and (() can be immediately ex- pressed in terms of the correlation coefficients 715, 713, «+773, by means of the formulae established on p. 138 in the current issue of this journal. + J UNIVERSITY OF LONDON, UNIVERSITY COLLEGE | . The Biometric Laboratory (assisted by a grant from the Worshipful Company ‘of Drapers) Until the phenomena of any branch of knowledge have been submitted to measurement and number it cannot assume the status and dignity of @ science. FRANCIS GALTON. Under ‘the direction of Professor KARL PEARSON, F.R.S._ Assis- tant: Ereanor ParrMAN, M.A.; Crewdson Benington Student in Anthropometry : Miriam L. TILDESLEY. ‘This laboratory provides a complete training in modern statistical methods and is especially arranged so as to assist research workers engaged on biometric eS The Francis Galton Eugenics Laboratory National Eugenics is the study of agencies under social control, that may wmprove ov impair the racial qualities of future generations, either physically or mentally. The Laboratory was founded by Sir FRANCIS GALTON and is under the supervision of Professor KARL PEARSON, F.R.S. Assistants: Galton Research Fellow: Etuet M. ELDERTON ; Mary SEEGAR, B.Sc., M. Nog Karn. Secretary: E. AuGustTA JONES. It was the intention of the Founder, that the Laboratory should serve (i) as a storehouse of statistical material bearing on the mental and physical conditions in man, and the relation of these conditions to inheritante and environment; (ii) as a centre for the publication or other form bf distribution of information concerning National Eugenics; (ili) as a school for training and assisting research-workers in special problems in. Eugenics. | Short courses are provided for those who are engaged in, social, medical, or anthropometric work. At press and to appear shortly : A Monogtaph on the English Long Bones: By Kari PEARSON, F.R.S. and Juria Bert, M.A. Pat I. The Femur. 
Text Part I and Atlas of Plates Part I. (All Rights reserved). BIOMETRIKA. Vol. XII, Parts I and Il CONTENTS. TI. On the Standard Deviations of Adjusted and Interpolated Values of an — Observed Polynomial Function and its Constants and the Guidance they give towards a Proper Choice of the Distribution of Observations. x By Krrsting Smits, Corenbeera ; : we ze Z PSU Nad = y ‘ ES : in ee "3 emeee re El dee TEs Bes enn ee Boe i MeN yh bret Sie MR se a a Se eae ae Il. On the Product-Moments of Various Orders of che Normal : Gorrelatiba’ ” Surface of Two Variates. By K. Pearson and A. W. Youna.. . 8 III. The Correlation, Coefficient of a eee Table. ee A. Rrneute-Scom, oe BSe. : s : ; : : : mu TV. On a Formula for the Product-Moment Coefficient of any Order of ee : Normal Frequency Distribution in any Number of Variables, By: -L. Isseruis, D.Sc. . Sy rue : 4 i ’ : V. On the Mathematical Expectation of the Moments: of i recraeauy bee tributions. By Professor Ar. A. TcHouprorr of Petrograd. Part reel Miscellanea : a a ee (i) Preliminary Note on the AYR ue le of Steadiness and aii of ae ¥ Fi Gavehaty Stay Rees * with Artistic Capacity. By M. LL. ‘TILDESLEY . : .f Be kine oa | (ii) - Sur les moments de la’ fonction de corrélation normale de 7 variables. Wei ee cans Par SvERKER BerestROM, Stockholm ~ . é : : ey ele (iii) Formulae for determining the Mean Values of Products of Deviations pee Pibee mixed Moment Coefficients in two to eight Variables in Samples taken / from a limited Population. By L. Isseruts, D.Sc. . ‘ ieee BS. x yaghaca eit: The publication of a paper in Biometrika marks that in the Editor’s opinion it dackains either in ue method or material something of interest to biometricians. But the Editor desires it to be distinctly understood that such publication does not mark assent to the arguments used or to the conclusions drawn in the paper. Biometrika appears about four times a year. 
, A volume containing about-500 pages, with pe and | tables, is issued annually.’ Papers for publication and books and offprints for notice should be pene to Professor Karn Prarson, he ox University College, London. It is @ condition of publication in Biometrika that the yaper shall not per i already have been issued elsewhere, and will not be reprinted without leave of the Editor. It is a very desirable that a copy of all measurements made, not necessarily for publication, Youle accom- pany each manuscript. In all cases the papers themselves should contain not only/the calculated constants, but the distributions from which they have been deduced. Diagrams and drawings should be sent in a state suitable for direct photographic reproduction, and if on decimal paper it should be blue ruled, and the lettering only pencilled. Papers will be accepted in French, Italian or German, In the last case the msnuseript should be Z in Roman not German characters. Russian contributors may use Russian bat) their papers will be | translated into English before publication. eat | Contributors receive 25 copies of their papers free. Fifty additional copies may be had on Vy payment of 7/- per sheet of eight pages, or part of a sheet of eight pages, with an’ extra charge fori fabian ae Plates; these should be ordered when the final proof is returned, eet q The subscription price, payable in advance, is 30s. net per volume (post free): single nu mbers eater: 10s. net. Volumes I—XI (1902—17) complete, 30s. net per volume. Bound in Buckram 37/6 a) A : volume. Index to:Volumes I to V, 2s. net. Subseriptions may be sent to C.F. Clay, Cambrid versity Press, Fetter Lane, London, E.C; 4, either direct or through any bookseller, and PRE ah Gs respecting advertisements should also be addressed to'C. F. Clay. Cases for binding volumes can be, supplied at 3s. 6d. net each. | Till further notice, new subscribers to Biometrika may obtain Vols. 1—XT together, in paper-covered “parts ox £13 net. 
Subgeribers who prefer it, can have these volumes hound i in buckram at 78. ‘y net per volume. The Cambridge University Press has appointed the University of thibade Press Agents for the sale of Biometrika in the United States of America, and has authorised them to charget the following prices: —_— $7.50 net per volume; single parts $2.50 net each. . : PS Ng CAMBRIDGE: PRINTED BY J. B. PHACK, M.A:, AT THE UNIVERSITY PRESS. : Bi) ts “FOUNDED HY. Pe SO ar ani FRANCIS. GALTON anp KARL PEARSON EDITED 1 By pine hand a | ae a Seis ARES . FE ASOMAT bic ym KARL ‘PEARSON ee ee Le aw ae Te shes a 1919 " own DEC 23 1913 CHAPTER Il National Mt Ce I (1) Let us put 12 Ey = [X; o X (yy |" = Vy, (N)crerreserscreesceecsseccoens (1). We have Y4,(N) = 0. Noting that Ge aX = a ett, we find vw) = B[X;-Xw) | = L[X,- XI. Replacing X,— X,y, by (X,—m,) —(X (yw) — m,), we find rol . - 7 . = . . Vr, (N) = Me + = (— 1) C0 B(Xy — my) (X wy — mY + (— LY Be, cy: Jj= ; 1 ee, But [X (w) — mf = Wi (eos m,) + (N — 1) (X 1) — mm), h x Se where (N-1) = Noma os i Hence | “eave ve h h | Vy, (N) = Pe + = Oe = Cf (NM 1) Mya oa, —1) + (— DY br, — | =1(— 1)! rej N= mit SS int Ss & 1p! ea hess (iy Nr Fe | + CP (Nt ten Pn, (y-1) + (N — 1)" p,, wo} ie N-1y S = a |b te (— 1)’ (Ore Hr-h Fh, w-yt(- 1) Hrcr-o} N-lv ¢ = (AGA) EUR OE pra mi arn J On the other hand, replacing X,—X(y) by J Bet Ni N [CN = TX CV ety = ars © [X,— X~w-1], we find: N-ly UN (Az) 2 (a OP pee aL ee) a ea (3). Biometrika xu 186 Expectation of Moments of Frequency Distributions Replacing in (2) quantities of type j,(y-1) by their values in terms of the quantities « (cf. Chapter I, formulae (15), (17) and (18)) we obtain : Ne Ne ah "SI 1 rn, ay=( N ) {be +r S Cs . Horan Ci Z1y4, | (- 1) Br—isy, j Tn, jeg i=0 2h+1 Lae ot = sc. C; b2 he, > Sh(N = 1H = ee 1) Brin Pons, h-i+j (4), r-1 j a6 ek (N—1y lyr = (= ly Brij / a iNe= 1 or+1 \ Vor+i, (N) = ( —z-) |b | ee Qh 5 1 - pa Ord tat ah WSR? 
= (— 1) Braisi,j Pon, nmis 5), r—-1 Dh+1 h-1 1 ce ~ = Csi Meor—2h = (N— — 1) , S (= Ey Le) —itj,j Pops, ves r—-1 1 & ; | = 2 Wop *, (—1) Bp -igs,j Porta, it } and, hence, Vy, (N) = hr — FF lr a trl] Mr-2 He] f 1 1 ©). a We jae Br — 118) ppg fg — $7) ppg s+ FTA ppg us| +... As N increases the ratio 2“ tends in this manner to the limit 1. iP (2) Putting r= 2, 3, 4, 5, 6, in (2) we find without difficulty N= 1 M2, (N) = yy Ha = Baan (N —1)(N—2) 3 2 V3, (N) = NW: Bs = Ms — 77 Ms = Ws aes 2 ) 34 2|* V4, (N) = bs — a by — Bo? ] + Hye a — Spe ile = [Hs =< ~ 3p; | V5, (N) = ie 2 Ms Me] ras Ne Die Bis Mo| — De 8h3 Me] (7). + = [os — 10s fe] = 3 5 2 3 ¥6, (N) = Hs — 75 [2Hs — 5 pts Me] + qy2 L8He — 15 os He — Aes? + Opa ] 5 re — py l4Hs — BB pts oe — 16 pug + 42 p08] + aa [Bp5 — 36, me? — 225° + 63 p10" Seat SS See ee eee 5 — ays Les — 15 pes fe — 10 pus? + 800") * For footnote see page 187. Au. A. TCHOUPROFF 187 (3) If the law of distribution of the values of X satisfies the condition : fa — 0; ati Oh Dee Op, then, as we saw above, feisi(wy=0, 7=0, 1, 2,...00 and, as appears from (5), Voigs,(y) = 0, 1 = 0, 1, 2,... 00. When the law of distribution of the values of X is Gaussian and p,=1.3.5...(20—1)p.', c=1, 2,... 00, then, as we have seen Poiwy= 1.3.5... (20-1) pa" * and, consequently, N—1\% 2 oh V9i, (N) = (=) = On Moi—eh Meh, (N—1) _(M HU" & pu 18.5... @i-2h-1)1.8.5...(h-1) =( nao 2 Oe) - N—-1\" : il =( ) 18.5 20 1) ay S = eae \ | | | oo CF i Poi = (N — 1)! wai, cn : II (1) Let us put , Le Yr) = We [Xi - Xo] o ap Thnnee senses seeeeneenes (9). Ww”, = 2 (Xi-Xwnl'}"| | We have: vs 1 (1) ; EV’, on = Yr.) = W Wow) | , Mm | Ely’, al” = ali : eee (10). y ? sig se i oe Ely. ay — ran]? 
= ae 1 amb Om We oxy Yan | (2) When m=2 we find: ® _»s Sry i Wan HIS = [xX og Beet) =E2 2 [X; -Xoyh + BS 2 2 [4i~ Xonl [X;—-X i] | jt = Nv2p, (yy + nn by) 2 | xX Xonl [X.-— Xi] * The quantity v4 (x) can also be written in the form N —-1) (N-2) N-1)(N-3 a(t = Deg Oe ) ta Buy") +3 ro Mees ¥4, (N)= R. Henderson (‘‘Frequency Curves and Moments,” J. I. 4., 1907), while giving correctly the values B2,(N)s #3,(N)s M4,(N)y ¥2,(N) aNd v3, (1), erroneously gives (p. 435) : N-1)(N-2 N-1)(N-3 2 V4, m= Ft Te - t a aT 3 p12”). 3(N-1)- , The true value of v4.) exceeds the value obtained by Henderson b —a 13—2 188 Hapectation of Moments of Frequency Distributions reviiare Niji oe we have | | [X,— Xun] X2— Xen] = ya LV 1)(X,—m) — (X= m) — (N= 2)(X (yay = 1m) [CV = 1) (Xa = ma) — (Xa — ma) — (N= 2) (Kava = m,)]} =/(%,X)- (Py = [(X1 —m,) + (X2—m)] |X ww y=) (ee = [X (v-2) -—m P, where F(X, X2) =[Xr— 7m] [X. —m,] — SSS Xe —m,)+ (X,—m,)}. Hence: BX, — Xi) [Xe- Xo] bee N-2 =8 (ft, Xo 3 Cp cher. xar Ge) x {[((X — my) + (Xe — m)] [X w-2) — 2m] — [Xw—2) — ml?) : N—-2\? - Tate = E (F(X. Xl +0 (ee) oa,ar-n BK XO a OR ee LN Ne eeu 4 es 43-3 OO (= =) sin, ev -2y EL f (X,,Xo)}""* [(Xp— my) +(Xa— m4) PE. h=2 k=h N But (see Chapter I (15) and (16)) Ent. (2)-1 1 fa AT => sy s 1 k sted oh k A +9 Bay? = (N — DA Ey Ey eye Ge y Ban, GF 5) ith k, Ent. (E) i+, 5 ee & 2 Nee Eif (X,, X,)}* => > (= yr! Ne y ct ap Hosta Ms—f+g> S=0 g=0 E{f(X,, X2)}* ee —m,)+(X,— m)Pr* Sh 2h+2f-k GQaaly ail \f 7 => py (— WOO: h Con -497-k Wile Msthaf—k—g Ps—h-ftg- f=0 g=0 : Hence: 7 Te r N- N(N-1) BX, — Xen (X, —- Xin = Z xe vy pO Cyt trts rel (N 1)" (N=2) + O, py rae (j ee C1 Oop Mrmr feg Mra ift9 * The development of E {f (X,X»)}"—*[(X,— my) + (X2—- my) *-* in powers of N does not contain any terms of higher degree than N° while at the same time the development of uj, (y_») contains no terms of higher degree than To obtain Ely, (Ny) 7, (wy)? correct to 1/N# it is con- 1 ‘ k+1) * N kat. 
(=) sequently sufficient to carry our calculations as far as k=2t. Au. A. TCHOUPROFF 189 r—2 2f4+2 ; N pee i ae, (N— 2) 2 uP af OF Ke e i (- ye Ces Cas ( Nort braf—g br-2-f+9 S=0 g= k(orr) r—h 2h+2f-k h pyk-h vf g Ss we k+f+j S Bs bee CO, Cr 3 Os op 3 z=0 )=0 h=Ent k+1 f=0 g=0 a : k+l\_. (N— 1H (N — 2) Ent. (F*)-# a eS Neth Brant. (5)-é+i, j +k, Ent. (E) iti Mr+h+f—k—g Pr-h—f+g € = N? pw? — (per? + 203" [Pings Pra + Hy] ae apes be — 20? [Mr rae + H,-1] fo} + {OP [Apa Bra + Ayr) +e [2brss bro + Sérgr Mra + 6p,’ | — 80,7 pyr pa 207) Oya [Hr More + Mya] Me — 2077 One [Mts Mr—s + Aor Meme + 3 py 7-1] Me — LAC? pp Myo Me — 140? 7,1 pe — 4c? Pr-i Pr—2 B3 — 28 [My Mrs + B pra Mya] ls + BCP WP pig Me? + 18C,9 [Mya Mra + Pe pao] fo” + 608 [Mp Bria + Aya yes + 3p’ ,..2] pee} +.... Substituting in (11) and replacing v,,(y) in (11) by its value in terms of the ws by formula (4), we find after reduction : W = N22 + N {pop — (27 +1) we — QP png pa ) , (N) | + pg fy +P (1 = 1) py fps Ha} iz {2a — 7 (2r— 1) papa ba — 49? fpr ra — 7(897 + 1) py? — 17 — 1) Mp ys Mrs | + 37? wp Mote (7 — 1) (40 +1) py Myo Ho FO fp ros Mo | (12), +9? (7-1) pra roe Ms rl} 9 9.9 9 9 9 a eon Mp rs ls Br? (r — LP pepe oe? — 9° (7 — 1) (7 — 2) php Mrs Me” Ste yl] z eee hrs ut +... and hence P Lee) 2 \ E [vs 7) — OP = ae We, ay — >, ay | 1 7 | = ay (Mor = Me? — 20 pers Mra + 7° Wa Ma} | 1 és — 7 27 pop — 7 (27 — 1) pope Me — 7 (7 — 1) rye Mpg — AP? gr Mp (13). +15) pas Mpms Mo i —7 (+2) wP+2 (P+ 1) 7 (9 1) Be Mpae b+ 30 Wp ho F.°(9 1) opr bya Ms | r(r-1ly , = 18 (P= 1) (P= 2) pop prs pat — Pe, pt ie 190 He«pectation of Moments of Frequency Distributions (3) When n=83 and n =4, we have similarly : We ay = Nv, a) + 8N (M1) ELX, — Xe [Xe— Xe +N(N-1)(N — 2). 4[X,- Xl [X2- Xl [4s - Xo! Wm Nou, + ANN — 1) BUX Xen’ a Xen + ce 1) LX, — Xwwy}" [X2- Xo ]” +6N (N —1)(N— 2) B[X, — Xe)" [Xe — Xe [Xs— Xl +N(N-1) (NV - ave 3) a — X wy" [X.— Xi) ]" [X3 — X wy ]" [Xa - X yy |". 
Determining we and we ee from these relations and substituting the a values found in (10), we obtain 'E [vw — Yew) and £ [v’,, «wy — vp, an] expression exhibits no special difficulties, but is so unwieldy that I do not give them here, contenting myself with the deduction of £[v’,, (w) — y», E [v’,,«v) — Yr, xy |t which will be shown below (see Chapter IV, § mt). — Il (1) Noting that cy) > and N N 3 S [X¥;-X (iy P= = eee —m,) “Vl £an —m P= = Kimmy FE (X;- m) |. 1=1 t= and putting Ty Ns (X;- m)| | > (X;- m)| i=1 rif 8 Xin Xu} | [= i m) | Z s) a (nN) — Oe s) r a we find Gn) ‘a ~ m Wea -E| = ee (m) ee h 1 Gt aa h, 2h) 1" Nm — Vi, (N) + ee (- 1) Nk m 2, (N) +(- ) 2m, ( N) = (m) ag im, Vom = Al > (X;—m} | a i=1 (m) moi h (m-—h, 2h) =W, , (N) ae CG, as (N) + N™ M2m, (N) (m) (m) ee h 1 7 Va s ae or We m= sy ee Ce Wh (m—h, 2h) 4, (Y) and, on the other hand, ys) St Qh 1 po-hs+2h) e Us. (NY) i C. Ni Z,. (N) ove * Popts,(N)) (r, 8) "Ss (rh, Ca Brit amy (N) = 2 ve 1)! Cr ait (N) + (— 1) NTF poras, (NY » l= —N™ Mom,(N) srreeeeee | won | Their Au. A. TCHOUPROFF 191 whence: os Oh ry aan + (yt OE | + We stores, an [1+ (= 1Y]=0...(07). (2) Noting that N r N 8 Oy | = mye > (X;—m)| [X= m)+ = (Xi—m)| i=2 i=2 2, (N) Sal. . = Ports + = C; Poprs—j (NM — 1) py, (—1) + Mar (N — 1) ps, w-1) fe TaD Ai (h) eee J (h, j) ap > Cc. Mor+s--2h Vs ,(N— 1) + > > Ge C Mor+s—2h—j Ue (N-1) h=1 h=1 j=1 i. ue (r) s (7,3) (7, 8) += Gs Moar—2h us ya Ms Ves (N- yt RS C Bs-j Us (N-1) + Ue , (N- 1)’ h=1 ~ j=l we find : (7, 8) (7, 8 J a i h 7 (h) Usa — Ug ix = ECF tenon OF —1Y ty, v1) + Or Maresh Vaw-p j=0 i= Se : ‘ r—-1 = h j (h, j) h (h,s) « a = CO, Morssonj Uy "yy + a C,, Meraat Us waa) b=1 j= n= and hence: gy” Ss) 2 C = N j > a s y 2, (N) — Mor+s—j = ( —f). Bj,(N-f) ae ee r Mor+s—2h = 2,(N—f) | Sys (h, 8) Soh had te) + = Citeeean S Us n-. pt 2 SSO, CO) borzsonji & Uo w_py h=1 0 =1 j= 1 fal ane here (see Chapter I (14) and Chapter IT (10)) Fe | Ent. (4)-1 oe (2)-#+1] i NF) np > j Tj, unt. 
(£)-i fo Sy () Sc 9) Sy $ NEON TO Dies, oped 2087? and noting that (0, 8) To an = N* Ms, u” 0) = y RCN) ae. 2)? 1, S h Une = W luoiet 3 CO, Bevan (NV — 1) pao} 2,(N) =Nisyet & NEV 3, pb; m= = pF, j=l j=l k m ; SI n q . / — q 2 4 Pe = 2 pi [G— ms br = aad [&;— ma)’, = I= Xx) = & pj & = my. But En; = Np; | En? = Np; + NU p? = N2p? + Np; (1 — pi) | En? = Np; + 3NW™! p2+ NW p? | = N*pé+3N* p? (1 —pi) + Np: (1 — pi) 1 — 2pi) + (1), Ens = Np; + 7N |p? + 6N p3 + NO ps | = N*p#+ 6N*p3 (1 —p;) + N? p2 (1 —pi)(7 — 11p,) + Np;(1 — pi) A — 6p; + 6p?) ) and in general, as is not difficult to see*, r oe r—h ; ' Eng = = NIV Gs 9 Dil ae N* pi 2 Cl a, ngBisy7 Pe os.00: (2). si h=1 : =0 Further, denoting by P) the probability of n; taking the value h, and by hae the conditional mathematical expectation of n; on the assumption that n; J takes the value h, we find: | Os pi (1 —p)*-*, (h) ais ms Pj Ez, =(N h)y — pi’ N iy ote! ce ae P Enyn; = > PLE = % (N-A)AC, p? A — pi)’ hn, = N(N—1) pip;. h=0 ee: =0 Similarly we obtain : En, n,n, =N(N—-1)N=2).pi, pin Pig | En; 1, ... ui, = N(N—-1)...(N—k+1) py, pi, -.. Diz J * See my paper, ‘‘ On the Mathematical Expectation of a Positive Integral Power of the Difference between the Frequency and the Probability of an Event,” in the Proceedings of the Petrograd Poly- technic Institute. 196 Hapectation of Moments of Frequency Distributions En 2 ny = VI] Pi Pj db Nt) PEP; \ = N* p? pj + N°? pip; 1 — 8p;) — Npip; (1 — 2p;) En2n? = NU) p,p; + NM! p; p; (pit pj) + NW p? p? | = N'p? py + N°pip;( pit pj— Spipj) +N *p; pj (1 — 3p;—3p; +11 p;p;) — Np; pj (1 — 2p; — 2p; + 6pip;) En?n, = NO lp, pj+3NO! p2 p+ NU pe p; » (4), = N*p? pj +3? p? p; (1 — 2p,) + N? pi pj — 9p; + 1p?) — Npi pj 1 — 6p: + 6p?) En? nj ry = NO pipypr + N™ pe? pj pa = N+ p? pj pa + N* pip; pr — 6pi) — N? pip; pr (8 — 11 pi) + 2N pip; Pr (1 = Spi) a Vo manna S&S > Ni-Gt highs EAE gs 2p a Metra: es ngs Ore: ba De! 
Dj teaas oaceuaeet ee (5), y= bg = and in the general case : ry up Vk epee rs Sess = (Ity-Higb. thy) uhpiy ks I Ein,” 1," Se Pea arcs Wve #1 Wee hy Ores fig o> Org hy Dig Die enue i= t= j= If the numbers 7,;,, ”;,,... %i,, referred to k series of independent experiments, then we should have: Eni” nf 0. Mre= Hn” Eng... Eng" Tr; 12 rk = a Ly a hy = ‘ = SB eee EMO NEM WOM iy ty Oey hg oo Seas te Bid® Pil «-- Bight hi=1 ho= hke=1 (2) Passing from the mathematical expectations of the numbers of repetitions to the mathematical expectations of frequencies, we find : Epi = pi \ Pe en ek ; Epi? = pi + pil — pi) | Ep? = pe + Spe (1 —p;) + = pi(1 — pi) al — 2pi) r (@) 6 1 Eps =pi+ wee (P67) Ne pi (1 — pi) (7 — 11p:) 1 : + 573 Pi (1 — pi) (1 — 6p; + 6p?’) SS r-1 Ep;* = = — h=0 1 h é yn di os (= Oy, nty Ben tf Dil corres (8), Au. A. TCHOUPROFF ay 1 Epi pj = PiPj— Hp PiPy Ip Y 9 1 . i Epi pj = pi pj + 7 Pivi (1 — 3p) — Fe Pi (L — 2pi) ig 7 y 3 2 EDi Pj Pr = Pi Pj Pra— Pi) Pr + ys PiPi Ph er eee 1 Epi? py = pi py t+ ape Dy — 2pi) + 779 Pips CL — Opi + 11p?) 1 ~ V3 pi pj (1 — 6p; + 6p,?) Epi" pj? = pipe + VPP; (pit pj — Spi pi) + x x pipj (1 — 3p; — 8p; + 11p; pj) — - pi pj (1 — 2pi — 2p; + 6p; pj) Epi pj Pu = Pe Pj Pa + yi i Pj Pa (L — ie )— WR v3 Dis Pa (3 11%) ai = G 1)f- Ons, ry—-hy Ure, re—he sparred vee fr-hy—hy Pi. * P5* : (7) where the summation for h, extends to all positive integer values from 0 to the smaller of the numbers f and r, — 1, and the summation for A, to all integer values from 0 to the smaller of the numbers 7r,—1 and f—h,. Substituting the values of Hp/—4 pj” in the development of ri 1, E (pi — pil (pj — pj)" = = 3. (—1)ht4 0,4 C,,2 pb pj Epi) p/m, we find after some rather tedious transformations : Pi+12-2 1 form-) \ Ul / ~ E (pi — pi)” (pj — pi = > Ne = f=Ent. (nee) : h,=0 S—h, (or 2-1) Ss (—1)f-h-’e Noh pr-by" VY" — Di "DP; (")) (75) Ari, 71~ha Ary, ro—he me h.=0 - (16). 
| Cy Sera Ns sak a a ee 1 —1 ™-1 +r,-h,-h.-1 ogre = a (Le ata ry hy Ore, rahe | 2 chy yy tons Bry tre—hy—he,f-hy—hg pir pyr In the general case we have: ts ee 4 Y - E (p's — Dis)” (Pig — Pig)!” ++ (Dig — Dig)” } Mtr+.. +72 pf orn) f-h (or re “VY fom hy Hy Cr DY = > = eS | ane > ae NTT ANAL WN h,=0 hy=0 h,=0 i Mit i) x (— 1 fam ham hy Bar ee lke Tee gy” He ee a >(17) tthe CR re eg (rn) ¥ Oy). a eee | | Arp, Th—-hh Br,+.. Are —hy—.. hy, Fahy. — he 1 rm—-1 r,—-1 r-1 r1—h, r,—h | ee > fs (- iat Sd aaa aay pe rytret... r= i Nr Ey, =0.n,=0 “Iy=0 ” % x Te ee Any, re—hh Brit ary —tn= iets Mt AT hy 200 Kapectation of Moments of Frequency Distributions If we agree to put A,,,=1 and C,-1=0, then if »,+7,+...+7, = 2r, the first term can be brought to the form (cf. Introduction (28)): 1 r(or 7,-1) r—h, (or 7-1) r—hy—hy— ...— hy) (or 7, - 1) ; ie hy p: - eS Se ars S (—1y-¥p pe aN h,=0 hy=0 hy=0 (18) Qh, hy 2h ‘ gi Oa Oi Oe Aig, Ate, * Ai, Br ve and when 7, + 72+... + 7% = 27 +1, to the form (cf. Introduction (29)): iI r+l(or 7,-1) r+1—A, (or 7-1) r+1—h,—h,—...—he-, (or ry) \ a. Sa > (<1)tnE pe aN Fhe hy=0 ~ hp=0 : | ry—h, hy, 2h 2h x p;, ape iY NH. C. 5 OB ry Atay + An, pel oP RY | > (19) 2h, 2h, 2h,-1 py2h; 2h + OF Ot Ang + Ate Breit O,” OC," 0, * Anns | x Aj, 0° one 0 ee ote. | y2h, as 1 py2he-1 | ace C nee C. Ay he, Ae ro Aye, pags 2. rH, 0f | Noting that, in sd aghite ann (3), m ped , , 1 rYaVeor) Eps, Dig ++ Pig = EN ™ Pig Pin ++ Pi = Pi Pig * Diy = (- De a Bn n (20), we find, on the other hand, easily : E (De =e) a — Pi,) + Die — Piz) ke 1 k-1 h 3 (— 1) (ell 21 = Pi, Pig ++ Baa pa Wh V* Bin = Diy Dig ++ Pit > a By, sn—k ep =0 k=Ent. @) Hence: AL / , , 20 94, » E (p's, — Dis) (Pig — Pia) +++ (P'ig — Dig) = Pin Pig +++ Pig [ net wh | er : 15 130 120 | LD) eas (p is — Pig) = Pig ++ Dig \- Wet N34 me vf f (22). 
$$E(p'_{i_1}-p_{i_1})\cdots(p'_{i_7}-p_{i_7})=p_{i_1}\cdots p_{i_7}\Big[\frac{210}{N^4}-\frac{924}{N^5}+\frac{720}{N^6}\Big].$$
II. (1) Noting that
$$m'_r=\sum_{j=1}^{k}p'_j\,\xi_j^r,$$
we find
$$\begin{aligned}
E\,m'^2_r&=E\Big[\sum_{j=1}^{k}p'_j\xi_j^r\Big]^2=\sum_{j=1}^{k}\xi_j^{2r}\,Ep'^2_j+\sum_{j\ne h}\xi_j^r\xi_h^r\,Ep'_jp'_h\\
&=\sum_{j=1}^{k}\xi_j^{2r}\Big[p_j^2+\frac1Np_j(1-p_j)\Big]+\sum_{j\ne h}\xi_j^r\xi_h^r\Big[p_jp_h-\frac1Np_jp_h\Big]\\
&=m_r^2+\frac1N\big[m_{2r}-m_r^2\big].
\end{aligned}$$
In a similar way, the other formulae deduced above may be obtained (Chapter I, § II).
(2) Noting that
$$m'_r-m_r=\sum_{j=1}^{k}(p'_j-p_j)\,\xi_j^r,$$
we find:
$$\begin{aligned}
E(m'_{r_1}-m_{r_1})(m'_{r_2}-m_{r_2})&=E\Big[\sum_{j=1}^{k}(p'_j-p_j)^2\xi_j^{r_1+r_2}+\sum_{j\ne h}(p'_j-p_j)(p'_h-p_h)\,\xi_j^{r_1}\xi_h^{r_2}\Big]\\
&=\frac1N\sum_{j=1}^{k}p_j(1-p_j)\,\xi_j^{r_1+r_2}-\frac1N\sum_{j\ne h}p_jp_h\,\xi_j^{r_1}\xi_h^{r_2}\\
&=\frac1N\big[m_{r_1+r_2}-m_{r_1}m_{r_2}\big].
\end{aligned}\tag{23}$$
Similarly we find:
$$E(m'_{r_1}-m_{r_1})(m'_{r_2}-m_{r_2})(m'_{r_3}-m_{r_3})=\frac1{N^2}\big[m_{r_1+r_2+r_3}-m_{r_1+r_2}m_{r_3}-m_{r_1+r_3}m_{r_2}-m_{r_2+r_3}m_{r_1}+2m_{r_1}m_{r_2}m_{r_3}\big], \tag{24}$$
$$\begin{aligned}
E(m'_{r_1}-m_{r_1})(m'_{r_2}-m_{r_2})(m'_{r_3}-m_{r_3})(m'_{r_4}-m_{r_4})
&=\Big(\frac1{N^2}-\frac2{N^3}\Big)\Big\{[m_{r_1+r_2}-m_{r_1}m_{r_2}][m_{r_3+r_4}-m_{r_3}m_{r_4}]\\
&\qquad+[m_{r_1+r_3}-m_{r_1}m_{r_3}][m_{r_2+r_4}-m_{r_2}m_{r_4}]+[m_{r_1+r_4}-m_{r_1}m_{r_4}][m_{r_2+r_3}-m_{r_2}m_{r_3}]\Big\}\\
&\quad+\frac1{N^3}\Big\{m_{r_1+r_2+r_3+r_4}-\big[m_{r_1+r_2+r_3}m_{r_4}+m_{r_1+r_2+r_4}m_{r_3}+m_{r_1+r_3+r_4}m_{r_2}+m_{r_2+r_3+r_4}m_{r_1}\big]\\
&\qquad+m_{r_1+r_2}m_{r_3+r_4}+m_{r_1+r_3}m_{r_2+r_4}+m_{r_1+r_4}m_{r_2+r_3}\Big\},
\end{aligned}\tag{25}$$
and hence, or directly from the formulae of § I:
$$\begin{aligned}
E(m'_r-m_r)^2&=\frac1N\big[m_{2r}-m_r^2\big],\\
E(m'_r-m_r)^3&=\frac1{N^2}\big[m_{3r}-3m_{2r}m_r+2m_r^3\big],\\
E(m'_r-m_r)^4&=\frac{3(N-2)}{N^3}\big[m_{2r}-m_r^2\big]^2+\frac1{N^3}\big[m_{4r}-4m_{3r}m_r+3m_{2r}^2\big].
\end{aligned}\tag{26}$$
III. (1) Denoting by $\omega'$ the difference $m'_1-m_1$, and by $d\mu'_r$ the difference $\mu'_r-\mu_r$, we find
$$\begin{aligned}
\nu'_r&=\sum_{j=1}^{k}p'_j\,[\xi_j-m'_1]^r=\sum_{j=1}^{k}p'_j\sum_{h=0}^{r}(-1)^hC_r^h(\xi_j-m_1)^{r-h}\omega'^h\\
&=\mu'_r-C_r^1\omega'\mu'_{r-1}+C_r^2\omega'^2\mu'_{r-2}-C_r^3\omega'^3\mu'_{r-3}+\dots\\
&=\mu_r+d\mu'_r-C_r^1\omega'\mu_{r-1}-C_r^1\omega'd\mu'_{r-1}+C_r^2\omega'^2\mu_{r-2}+C_r^2\omega'^2d\mu'_{r-2}-C_r^3\omega'^3\mu_{r-3}-C_r^3\omega'^3d\mu'_{r-3}+\dots
\end{aligned}\tag{27}$$
W. F.
Sheppard, in his well-known investigation "On the Application of the Theory of Error to Cases of Normal Distribution and Normal Correlation*," terminates the development at the third term, taking
$$\nu'_r=\mu_r+d\mu'_r-r\mu_{r-1}\omega'. \tag{28}$$
Hence:
$$\nu_r=E\nu'_r=\mu_r,\qquad \nu'_r-\nu_r=\nu'_r-\mu_r=d\mu'_r-r\mu_{r-1}\omega',$$
$$\begin{aligned}
E(\nu'_r-\nu_r)^2&=E(\nu'_r-\mu_r)^2=E(d\mu'_r)^2-2r\mu_{r-1}E\,\omega'd\mu'_r+r^2\mu_{r-1}^2E\,\omega'^2\\
&=\frac1N(\mu_{2r}-\mu_r^2)-\frac{2r}{N}\mu_{r-1}\mu_{r+1}+\frac{r^2}{N}\mu_{r-1}^2\mu_2\\
&=\frac1N\big(\mu_{2r}-\mu_r^2-2r\mu_{r+1}\mu_{r-1}+r^2\mu_{r-1}^2\mu_2\big).
\end{aligned}$$
We thus obtain, with full accuracy (cf. Chapter III (13)), the first term of the development of $E(\nu'_r-\nu_r)^2$ in powers of $1/N$. This is explained by the fact that the terms rejected by Sheppard in formula (27) do not yield terms of order $1/N$ in $E(\nu'_r-\nu_r)^2$. Owing to the same circumstance, we also get accurately the first term in the development of $E(\nu'_{r_1}-\nu_{r_1})(\nu'_{r_2}-\nu_{r_2})$, starting with (28):
$$\begin{aligned}
E(\nu'_{r_1}-\nu_{r_1})(\nu'_{r_2}-\nu_{r_2})&=E(\nu'_{r_1}-\mu_{r_1})(\nu'_{r_2}-\mu_{r_2})=E\,[d\mu'_{r_1}-r_1\mu_{r_1-1}\omega'][d\mu'_{r_2}-r_2\mu_{r_2-1}\omega']\\
&=\frac1N\big(\mu_{r_1+r_2}-\mu_{r_1}\mu_{r_2}-r_1\mu_{r_1-1}\mu_{r_2+1}-r_2\mu_{r_2-1}\mu_{r_1+1}+r_1r_2\mu_{r_1-1}\mu_{r_2-1}\mu_2\big).
\end{aligned}\tag{29}$$
* Phil. Trans. A, Vol. 192.
But we cannot start with (28) in the calculation of the mathematical expectation of $(\nu'_r-\nu_r)^3$, just as we cannot obtain the further terms in the development of $E(\nu'_r-\nu_r)^2$ in powers of $1/N$. For these purposes Sheppard's method must be put into a slightly changed form: more than three terms must be retained in (27). In the calculation of the terms of order $1/N^2$, we must at the same time rely on the formulae of § III of the second Chapter and on the following relations easily deduced from them:
$$E\,\omega'd\mu'_r=\frac1N\mu_{r+1},\qquad E\,\omega'^2=\frac1N\mu_2, \tag{30}$$
$$E\,\omega'd\mu'_{r_1}d\mu'_{r_2}=\frac1{N^2}\big[\mu_{r_1+r_2+1}-\mu_{r_1}\mu_{r_2+1}-\mu_{r_2}\mu_{r_1+1}\big],\qquad E\,\omega'^2d\mu'_r=\frac1{N^2}\big[\mu_{r+2}-\mu_r\mu_2\big],\qquad E\,\omega'^3=\frac1{N^2}\mu_3, \tag{31}$$
$E\,\omega'd\mu'_{r_1}d\mu'_{r_2}d\mu'_{r_3}=\dfrac1{N^2}$
[oryta Mates + Maret Meyers + Brgti Mr, T Pay Pre Mrs 7 Pry Mreti Pers 7 Bry bre Prt] | 1 Ip NB [Mrtretrett ioe (Mrytry+1 Brg + Mrytrget bry + Prgtrgtl re) | faa (Mrytrs Mrs+1 a5 Prytrs Prot mF Pry+i Mrstrg) +2 (Mrs Mrs Mrs + Pry Prot Mrs + bry Mee Mrg41) | jv "9 / / I ¢ ° ody ry dey, = WN? [ery try My + Ura Mryti — Mery Mery po] 32). 1 ar N3 oraress a (Mr te Pre te Bry Mry+2) | ( — (Cry trg Ho + 2pryss Mrs) + Zor, Mr, Me] | ip 3 I Eo* dp, = Ne Mert Be aF W3 [Mrs — Mr Ms — Borys Me] | J : > ae | : Kw" = ye be a Ww: [Hs — 3p."] 72 (2) To get the exact value of terms of order 1/N? in the development of E07, — br.) (Ur, — ry) IN powers of 1/N, it is necessary to start from tf Ve yf f / if dle <—_ 1) 19 Ve — by = (py — Ppp, @) — (re dp’ — set My—2 @ ? PT ee, r(r—l1)(r—-2 ” +(” 5) 2 dag OE) og ct). 14—2 204 Hapectation of Moments of Frequency Distributions After some simple transformations we find: AW fal : ' 1 \ rp, (v Tite Hr) (v yee Mrs) = N {Prytrs ae Br, Pr area Pry Mri mee lae Pry Mr-1 | FT Ys Pry Pri be} 1 (1+) (1+12—1) ar N2 \- (71 + 12) Mrytr, + nl D) ~ Prytre—2 Me 2 Maryse Prg—2 + ee Mry—2 Prg+2 1 [-3] ree o(7: ‘laa ail) ry 1 i ee +7 +12) Myr Mra FP (11 + 12) Mya Bret — “Mt Mro~3 Me (33). = tr bry Prot By, ty (7, ste Y2 alg 2ry7 2) Bry Pr, Ty ee 1 (7 —1) = ner) (3r, + 2) Pr, Pr —2 be — 1) a (Br, + 2) bry» Mor Me | —3nr, (7+ 12) Mri Pr Me — 41 (od te 1) Pry bre—2 M3 aa ee (7 — 1) 12 Pro Marga a + en" rfl Pry Pre—3 pe + tr r, Mry—3 Pra He” + 3n(m-I rir 1) fry -2 Brg—2 ue Slaels | Noting that i Ce = Vr,) (x, ay a) jal E(v',, a i) (es ae bry) _ (vy, =; br) (Vr, - ee) j ) 1 =H 1 Pr) (Vr, — Hr) — ig Lit. Pry Mr, — 21102 (Ge) Pry Pre—2 Me —4r, Gee) Mr —2 Pry M2 + ree (ee Wie Cay 1) Mr —2 Pr--2 ue] "OG we find hence : 7Y / / 1 \ E (v 1 Vy) (v tT. Vy.) = N ereers —12 Pry Prg-1 rami Pry Myo+i | = bry Pry F172 Pry Pra fo} I (7, +72) (+72, —-1 + ie \- (71+ 12) Mrytr, + (n 2 ve!) 
Mry+r2—2 M2 + $12 (12 — 1) Prt Pergo t $11 (7 — 1) Mra Prete +1 (Ti+ 1) Mrysa Pry F 11 (M1 + 12) Pry Prt. — E71) Mgt Mry—s Me | Sala! bry-3 Prt fot (i +Tet+ 112) Mr, Pre = +1)a,¢,>1) ee Py,—2 ha 11 (7, — 1) (2+ 1) Mry-2 Mery Me | Bi. (Ty + 12) Mra Mri He — 2 BMT. (1% — 1) Myr Mrg—2 Ms (34). Bea WW (ert + Be Alek 2 ani” 1) 13 Mry-2 Mpa Hes + Err Pry-1 Pro—3 pe +47! 11s bry—s Pry-1 2 +47 (71-1) 72 (72-1) Mee Mrye ust te cies | Au. A. TCHOUPROFF 205 Putting 7, = 7,=7 in this, we find (ef. (13) of Chapter ITT): 7 2 1 9 9 } Ev, — vy, = WV (Mop — 20 pps pa — fl? + 7? pra fa} | : | Hapa (= 20 Mor Fo (20 = 1) pope Ho + PP — 1) Mrgs Mone + (35). + ir? pgs ra — TO) ppg fps Me +7 (7 + 2) pb = 20 (7° = 1) pp Mpa Me — 89° pp flo — 7? (7 — 1) fra fro Bs | +79 (7 —1) (7 — 2) ppg Myais a? + 30° (7 — 1)? ppna pee} + .s- (3) To find accurately the first term (of order 1/N*) in the development of TE Oe, = fer) Y rg = Pag) vg = Pra) we must start with , , , , hh r(r—1 ie Vy — Py = (Apr =r fy") = (re Opt p — 9 =) jty-a00). Using the relations (30), (31) and (32), we find without difficulty : Ki, cy Hr,) (V'r, =a oe) (Vp, oa. ) 1 7 Ne Jensen a [Hrtre Mrs + Mrytrs Pro + ry Pro+ry| a5 2Mr, rz rs Taz ["; Bry (Mrytrst1 ~ Prov brs — Pre Lrs+1) Fy by (Mrytrgtt ~ Pry brs — Bry Mryti) +3 Mri (Mrytrgt — Pry Pry, 7 Pry Mro+1) | + P72 Mr Mir (Hg 42 — Pry Po) + T47"3 Pry Mrg-1 (Mirgt2 — Herp He) + 127s Pry Parga (Mryg2 — Pry Me)| = 1172's Mya erga Mery Bs | — Us (Mast Bretng aH Mires Prze rg + Pry Myre — Peryta Pre Mery = Mery Prat Perg—a — Mery Mery Mrs) ar Ue) (Mrs Pro+r3-1 a0 Brg Prytre—-1 2 Pr. Mry+r3 in Pry Pry brs = Pry Marga ryt — Pry Mery Pry) Ey (Maser Mrytrea + Meret Mrytry—a + Mery Mrgtrs — Bry Berg Brgts ; = flry—1 Pret Pry — Pr, berg Hers) | FPS Mra (Mrstrg—a Ma + fora ir erg — Pir Pry—1 Pe) + oP 5 Prs—a (Mrytrg—1 Pa + 2Mry ts brs — Pry Merg—i Me) + (36). 
FP Ts bry (Marg pry—a My + 2 pry Margi — Mrg—1 Pry He) F123 Mga (Mery trea Mo + 2 pry borg — Pry Pry—i Mo) ETT: My Mary try Pe + fly Merge — Pry Pirg M2) FT 3 arg a (Par pry 1 Mo + pry Mrgta — Pry Pry M2) | — [87 rer; (Mry—1 Mers—1 Mrs Ma Mra Pre Mrs—1 Pe + br, Pro rs Hy) | + [E13 (7s — 1) pyyo (Mryery Mat pr yaa Pret — Pry Pre Hs) FST (%2— 1) Mya (Mrysirg Ma + Qtr 41 Mga — Mr, bry M2) + 511M = 1) pa Meets Mo + Derg ts Mrgta — Pers Hr, He) 206 Kapectation of Moments of Frequency Distributions — 3 [drirs (%3— 1) bra Merger Berge Ha + $7273 (13 — 1) Pry Pr. Pr3—2 Ke | ar r T2 (7 7 1) Pry Pre-2 Pr3ti M2 ats a1 ("5 aa 1) r, May Mry—2 Prs—1 Ke +én(m—-1)” Mr,—2 Mrg—1 Mgt Hot 31% (1 — I) 75 Br —2 Past Mr3-1 He] + 3[$rrers(7s — 1) pra Pere Prs-2 Me + $1717 (%2— 1) | X 13 Mra Pergo Pre He + OP) (CRS nae Mry—2 Pry—1 Mr3-1 ra +... | Noting that EE (Up, — Yq) (U'rg — Yrg) (Y'rg — Yrg) = EY ry = bry) (Org — ber) (V'rg — Mg) — {(Yp, = pry) EB (0'rg — bre) (U'rg — bra) + (Ung — Mra) x E (COPS = re) an ae Mrs) aF (oy = brs) BK OT a ) (DG. oat H+,)} +2 (Cs a Mr) (op, ral ry) (Vr, ai brs) (37), X [reves — Pre Pry — V2 Prg—t rg — 1s Mire Mrs + T27°3 Mry—1 Prg—1 M2] + [12 Mr — $72 (%2 — 1) fyg—o Me] [Mears — Mr, Pry — 12 Pry Brg 13 Pyysa Mga + 117s Mp, Pry He] +([r; brs — $73 (73-1) Mry—2 po | [Hotes 7 Pry Pry 7 17) Pry Pret = 1 ory Mg + 112 Mri Mry—1 Mel} +... we find hence the first term in the development of E Ga a Vr) (Chies, i Vee) (vr, Si, Vrs) \ | =i in) (1, — Pra) (55 airs a (lr Mr, — $71 (7% — 1) Hr,—2 Me] | | | ) in powers of 1/N. Putting 7; = 7, =7;=7, we find: , i ; : yr Hy) al Ww {Har — 39 for Bra — 8 (7 +1) Ber by +3r(r — 1) poy Mrs Pa = GF Popa Pergr + 69? papa Mya fo + 37? Ppgo Mpa + Q0 (+1) Mpg fr Mra (38) +86 (7 — 1) pegs fre — 992 (7 — 1) ppg pa bya Mle + (3r at 2) (OR = ar (r ar 1) ber” My—2 be — 97? ET) Me Bip Bo 8 pa Ms $7? (PD) pep Mpa Me} + oe ¥ r / 5 3 E (ug — vy) = EB (vg — fy) + NM: [rp —4r (7 — 1) pp fe] X [Mor — 2P fra Mya — ee +e? Mra fa] +. | 1 7 WN? 
{Mor — 8F Parsi Mra — Slop bp — OF Mop Mrz | (39) +69 (7 + 2) frga Me Mra + 87 (7 — 1) wy bye — 6r? (7 = 1) Myr Mra Mra Mo + 2p? — 3r? (2r a 3) Br pa fe — 79 pa Ps + 3r° (r —1) ra Pyr-2 p."} +... ) r + 67? plop Mra Me + 89? ppye pa | Au. A. TCHOUPROFF 207 (4) The first term (of order 1/N*) in the development of E(u’, 5 br,) (Cae = bry) Cie = fu) (Ce a br,) in powers of 1/N is obtained exactly from (24). We find: E Ge ag br,) (Ye ba b+.) Oz, a brs) Ca aa br) \ | ar WN liner ar Bry eA Deretrs a brs je ar [eepecers age Bry Ts) reer am Hrs, | | ar [Prins — Bry Lr, | [Mro+rrs — Pre brs] = [Ty hry Pret Mestre F Pry Pre tiry + ryt Meets — Meret Pry Mery | — Pre Prati kry — Pre brs Prt) He Mry—a (Maryn Barges + Pry Prytry + Moret Mrytrs — Bryti Pers ery — Bry Pr Bry — Pry Pry Prgti) + 1s flrg—a Maya Mretre + Prats Mryery + Bren Payers — Pry Pry Pre = bry Mery Mery — Mery Mery Brat) | Hs Pog (Met retry + Pret Prytrs + Mrs Pree — Prytt Pre Peg . (40). = Pry Berga Miry — Pry rg ryt) | +[rire Pera Pera (Margery Ba + 2Mrgsr Morya — Mery Pry He) FLT 3 Mery Mga Merry Me + 2biry ta ryt — Pry Mery M2) FT Parga Mergaa (Margy Ho + pry a Margi — Mere Mery Pe) F273 Perea Pers (Pry tre Me F 2p Marya — bry Mery M2) F127 5 Parga Marga (ry srg Be + 2s Pry — Pry Perg M2) | FST Pag Morya (Mary try Pa + 2M ryt Pret — bry Pry Po) | | — 8 [7 172% 3 Mry—i Pry—i Pry Mryt BeF M124 ry Pra Mery Mry-1 Me | FPS 4 Mrmr Prt Mrg—a Mra Ma F234 Mery Mrgi Pry Pry fe] | + B72 374 fry —1 Pry —a Herg—1 Pry He ]} +... 
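Formulae (23)–(25) give the product moments of the sample moments $m'_r$, and the expansions of this section rest on them. They can be checked exactly in a small case. The Python sketch below illustrates such a check; it is not Tchouproff's own procedure. It enumerates every outcome of $N$ multinomial trials using exact fractions.

Several things in it are hypothetical choices made for the illustration:
- the chances $p$ and the values $\xi$;
- the sample sizes $N=4$ and $N=5$;
- every function name.

The coded right-hand sides, in particular the form $(1/N^2-2/N^3)\{\dots\}+(1/N^3)\{\dots\}$ for (25), are a reconstruction of the damaged print and are stated in the docstrings.

```python
from fractions import Fraction
from math import factorial


def compositions(n, parts):
    """Every tuple of `parts` non-negative integers summing to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def multinomial_expectation(f, N, p):
    """Exact E f(n_1, ..., n_k) when (n_1, ..., n_k) is multinomial with N trials and chances p."""
    total = Fraction(0)
    for counts in compositions(N, len(p)):
        weight = Fraction(factorial(N))
        for c, pj in zip(counts, p):
            weight *= pj ** c / factorial(c)
        total += weight * f(counts)
    return total


def m(r, p, xi):
    """Population moment about zero: m_r = sum_j p_j xi_j^r."""
    return sum(pj * x ** r for pj, x in zip(p, xi))


def product_expectation(rs, N, p, xi):
    """Exact E prod_s (m'_{r_s} - m_{r_s}), where m'_r = sum_j (n_j / N) xi_j^r."""
    targets = [m(r, p, xi) for r in rs]

    def f(counts):
        out = Fraction(1)
        for r, mr in zip(rs, targets):
            out *= sum(Fraction(c, N) * x ** r for c, x in zip(counts, xi)) - mr
        return out

    return multinomial_expectation(f, N, p)


def rhs_23(a, b, N, p, xi):
    """(1/N)[m_{a+b} - m_a m_b]."""
    return (m(a + b, p, xi) - m(a, p, xi) * m(b, p, xi)) / Fraction(N)


def rhs_24(a, b, c, N, p, xi):
    """(1/N^2)[m_{a+b+c} - m_{a+b} m_c - m_{a+c} m_b - m_{b+c} m_a + 2 m_a m_b m_c]."""
    def mm(r):
        return m(r, p, xi)
    return (mm(a + b + c) - mm(a + b) * mm(c) - mm(a + c) * mm(b)
            - mm(b + c) * mm(a) + 2 * mm(a) * mm(b) * mm(c)) / Fraction(N) ** 2


def rhs_25(a, b, c, d, N, p, xi):
    """(1/N^2 - 2/N^3) * {sum of three paired brackets} + (1/N^3) * {m_{a+b+c+d} - triples + doubles}."""
    def mm(r):
        return m(r, p, xi)

    def bracket(u, v):
        return mm(u + v) - mm(u) * mm(v)

    pairs = bracket(a, b) * bracket(c, d) + bracket(a, c) * bracket(b, d) + bracket(a, d) * bracket(b, c)
    triples = (mm(a + b + c) * mm(d) + mm(a + b + d) * mm(c)
               + mm(a + c + d) * mm(b) + mm(b + c + d) * mm(a))
    doubles = mm(a + b) * mm(c + d) + mm(a + c) * mm(b + d) + mm(a + d) * mm(b + c)
    n = Fraction(N)
    return (1 / n ** 2 - 2 / n ** 3) * pairs + (mm(a + b + c + d) - triples + doubles) / n ** 3


p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]   # hypothetical chances
xi = [-1, 0, 2]                                         # hypothetical values of X
N = 4
print(product_expectation([1, 2], N, p, xi) == rhs_23(1, 2, N, p, xi))
print(product_expectation([1, 2, 3], N, p, xi) == rhs_24(1, 2, 3, N, p, xi))
print(product_expectation([1, 2, 1, 3], N, p, xi) == rhs_25(1, 2, 1, 3, N, p, xi))
```

Putting all four orders equal to $r$ in `rhs_25` gives the coefficient $3(N-2)/N^3$ that appears in the fourth formula of (26).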
Noting that EE Up, — Vp) (Org — Yr) (W'rg — Yrg) Org — Pre) = EB (V'p, = Bary) (Ur = Berg) (rg = berg) (Ung — Pera) Oy te) Bs tO IO ee teed ; Ug re) Uy [he De [lg CU oy — fixe) + (pg — Pry) EE (V'ny — Bry) Org — Per) O'rg — Ber) + (Peg — Pry) Evy = fry) rg = Bere) (rg — brs) } + (Ye, — fry) (Yrs — Bry) E (rg — Pers) ng — Bry) + (Vig = bry) rg — Berg) B (Ung = berg) (Ung = bers) siz (vy, — Pr,) (Vr, — r,) E CH — fry) Ce aa berg) +r, = pera) Wry — berg) 2 U'ey = Bey) Ug — Be) + (Pp, — fre) (rg — Berg) E oy — por) Org — Her) + (rg = Hrs) rg — Berg) 2" = bry) (Ur — Here) = 3 (Vp, = pry) (Ura Mra) Yrg — Mirs) rg — berg) 208 Kapectation of Moments of Frequency Distributions we see that the terms of order 1/N? in the developments of i O., i i) (ie. 3 fire) (he a fry) (15, os ty) and Ee (vp, — Up )U pe Vp) ig, Ve) VG — I) coincide. Putting 7, =7,.=7;=7=7, we find: yi / 1 « 9]9 6 ; wD) (v, a v,)* a WN? {3 [Har ms Pek ae 12r -r-i [ or Pra 7 Pro eel + 69? pep [or He + 2p ngs = Per? Ho] — 120% pong Wea Pe + B74 why 3 po} +o. \ (41). 3 9 9 9 9 2 = We [Mon — Par — 29 peygs fra +7? wpa PoP + «-- The same formula (41) gives also the first term (of order 1/N*) in the develop- ment of H' (v,' — p,)'. (5) In the general case if we agree to denote Veg) (UY ope rs) oe Yee re) OY es and (o'r, — Pry) ("rg oi br) set (V'n, a 7) by 8p, we have: : ae i , op Edy = KdOy — & (Vr, — bry) Dae. xe hel V rn — Brn 6 +> (Urn, os Pry,) (Yr, — Pry,) B | a) Ge Hives h=1 hyg=h, +1 3 (y th, AT tin remains We see that, in the developments of Hd™y and Hé*y in powers of 1/N, the first terms (of order 1/N*) coincide, since g 6p > 7 — (re, = ay) E , n=l U th erp ; 1 : oi contains no terms of order Ne On the contrary, in the developments of Hd?) 
y and #é"'* » the first terms (of order 1/N**) are different, for t 6 Git) yp 7 > (Un, ~ Pr,) LS = i= 1 LE tea contains terms of order 7 One SEs eens WNT Formulae (18) and (19) of Chapter IT permit us to calculate Hy and Hd" y in general, to an arbitrary degree of accuracy, in the same way as Hd"y, Ed®v, Ed v were found above. The actual expression, however, is of so un- wieldy a character that I shall limit myself to the calculation of the first term in the development of /'(v,’ — v,)*, coinciding with the first term in the develop- ment of (v,’ —p,)*. In the calculation of the first term of the development of # (v,'—j,)” we may take Ve — Pe = du,’ mem ao, Au. A. TCHOUPROFF 209 and limit ourselves to the calculation of the terms of order 1/N*‘ in the development 27 E (dp, — rp, o' ! = & (— 1) Ci, 75 pi, E (du, YI 04 .0...... (43). j=0 In formula (19) of the second Chapter put 7; =r, =... =7;=1, and UGE | 1 iy eee =H y= 5 . we find for 7=0: B (dpeg P= pV. B 5 oo. QA) [pap — pe] one for 7 — ll c Roane a eee Z E (dp, ) ae oO = Ni oe Oban (21 — NY tary Fes —p,’]! AE de (44), When j = 2h: E (C/T wa wl : ‘S : , 20 ast —2h—-2] l (or h) Ses oS 18d. (2) = 2h — DE —1) Ce y aN Ps Da yy af (} ) Rae 7 Ie Oe Ole 2e— 1). 13-5 2.20 — 2h — 1) vet ORY 2 iy = vi Se ae 1=0. f=0 27 [4a —Ah]-Ahl j_-¢ of -Mh—-na (—f)iQfyi or Pry h, Be Ni ag h(ori-h) 2°9 ines hi-a [i —h Fp, oe x >: —— op — fe eel eet eae g=0 (29)! [Har — Hr] ) When j= 2h +1: vi (Gf Near @ ti \ af =~ ¢ Q = 5 ¢ l-f of h—-f x OF [Qh]-°1.8.5... (Q0-2f—1).1.3.5... (2h 2f-1) py a | J i-h-1 21-41 2i-2h-21-2 =a 2 (11.8.5... Qi 2-2-3) OS, \ | Etoun) ft ets , ; | fae a (Qh e Ne rLh 865 e227 Val .B25...(2h—2f—1))| 0} oar . t— fe ole lis to, Peay fo 1.3.5...(2h+1).1.3.5...(Q0—-2h-1) HEP EM ( N' i=0 f=0 27 [a —h—1-AhCA yp optt 2i-2n—v-91 n= Cereus a i Wes 2h gal) 153 be..(4— k=) Ni he(or i= h=1) 2 pO AM [i —h — VM 4 x = 7 ed pee 2 i-h-1-g 4 ae 2 (2g E 1) ! [Mor Lr ] \. (46). 
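At leading order, everything in this calculation is governed by a single bracket:

$$\mu_{2r}-\mu_r^2-2r\mu_{r+1}\mu_{r-1}+r^2\mu_{r-1}^2\mu_2 .$$

This is $N$ times the first term of (29) with $r_1=r_2=r$. The Python sketch below computes that bracket from a list of central moments. It assumes that the first term of $E(\nu'_r-\nu_r)^{2t}$ has the Gaussian form $1\cdot3\cdot5\cdots(2t-1)\,(\text{bracket}/N)^t$. The normal-law moments serve only as a check: they give $2\sigma^4/N$ for $r=2$ and $6\sigma^6/N$ for $r=3$. The function names are hypothetical.

```python
from fractions import Fraction


def normal_central_moments(kmax, sigma2=Fraction(1)):
    """Central moments mu_0..mu_kmax of a normal law with variance sigma2:
    zero for odd k, 1*3*5*...*(k-1) * sigma2**(k/2) for even k."""
    mu = [Fraction(1)]
    for k in range(1, kmax + 1):
        mu.append(Fraction(0) if k % 2 else (k - 1) * mu[k - 2] * sigma2)
    return mu


def variance_bracket(mu, r):
    """mu_2r - mu_r^2 - 2r mu_{r+1} mu_{r-1} + r^2 mu_{r-1}^2 mu_2,
    i.e. N times the first term of E(nu'_r - nu_r)^2 from (29) with r1 = r2 = r."""
    return (mu[2 * r] - mu[r] ** 2 - 2 * r * mu[r + 1] * mu[r - 1]
            + r ** 2 * mu[r - 1] ** 2 * mu[2])


def odd_product(t):
    """1 * 3 * 5 * ... * (2t - 1); equals 1 when t = 0."""
    result = 1
    for j in range(1, 2 * t, 2):
        result *= j
    return result


def leading_even_moment(mu, r, t, N):
    """Assumed first term of E(nu'_r - nu_r)^(2t): 1*3*...*(2t-1) * (bracket / N)^t."""
    return odd_product(t) * (variance_bracket(mu, r) / Fraction(N)) ** t


mu = normal_central_moments(12)
print(variance_bracket(mu, 2), variance_bracket(mu, 3))   # 2 6
print(leading_even_moment(mu, 2, 2, 100))                 # 3/2500
```

Exact fractions are used so that the normal-law check is an identity rather than a floating-point comparison.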
1 eg —————— 210 Hapectation of Moments of Frequency Distributions Substituting in (43) we find, after suitable transformations : 173.5 22(26—1) E (vy oa Dy) a Wi [peop lig ee Te, rel 2r Bra Heel Sige (47). : E ni pee Noting that ———7— [E@,—r,y] tends with increasing N to the limit 1.3.5...(27—1), Vp — Vy [# Ca = vey [E (v,’ —v,)? 2t-+1 that the law of distribution of the values of V,’ tends with increasing N to the Gauss-Laplace law. and the ratio tends with increasing WV to the limit zero, we see lies ee E (us ,; tends with Comparing (47) with (14) of Chapter II, we see thi increasing WV to the limit 1. In the case in which the law of distribution of the values of the variable X i (v' or oe : SEG or Por)? ’ tends with increasing N to the limit 1, for every positive integer r. But BY ots Pata tends even for a Gaussian distribution of the values of XY to E(w or-+l Porsi)” the limit ditferent from unity: follows the Gauss-Laplace law and po. = 0, for r= 0, 1, 2, 3,... VB 256 (27 1) ~ (Qr +3) (2 aD). . (47 +1)" Corrigenda to Part I, Biometrika, xu, pp. 140—169. p. 142, Eqn (2) for az; NI-*) read a,,;NI- 4. p. 142, Eqn (4) for A‘0* read A‘Or. p. 147, 1. 19 for gy41 read gx_1- p. 151, last line Eqn (11) for m,.(w) read my (x). p- 156, Eqn ( mete fp—1 Pea py_j. p. 157, 1.6 for = cep) read v, = (X;) — my. p. 157, nas re Proc. Imp. Renae Had Mém. Acad. and under Chebysheff refer to t. 11, p. 478, especially of ae ae edition of his works. p. 160, 1. 8 for Me (ny 7 Ms Se p- 160, last Eqn of (11) for D”® read Dae r, (m) r, m—2° ud v! p. 162, throughout this section y of author’s MS. has been printed YX. ple nied) 72 p. 163, last line of Eqn (15) insert ml—*! after ‘, and in 6th line of Eqn (15) for ml-?] = read mt~ 3], after ~ a5 p. 167, 8th line from bottom of page for En; read Hai dp. p- 167, 2nd line from bottom of page for Ay'"fv "fy °"ht-2%-3) read aa a oJ p. 
168, 2nd line from bottom of page for A") read again K eg
AN EXPLANATION OF DEVIATIONS FROM POISSON'S LAW IN PRACTICE.
By "STUDENT."
In her paper on the Poisson Law of Small Numbers, Biometrika, X, p. 36 et seq., Miss Whitaker, after a very interesting analysis of the various attempts which have been made to test Poisson's Law on actual statistics, concludes that "A general interpretation based on a very simple conception seems needed for those demographic cases, in which the law of small numbers appears far more often to correspond to a negative than to a positive binomial."
The following is an attempt to explore the general question of what effect various departures from the conditions which lead to Poisson's Law have on the resulting statistics, and especially which conditions lead to positive and which to negative binomials when the exponential might at first sight be expected.
Poisson's Law has been applied to the occurrence of different numbers of individuals in divisions of space or time: thus of yeast cells in squares of a haemacytometer, or of deaths from the kick of a horse in Prussian Army Corps, which may be taken as individuals occurring in divisions of space; or of suicides of children per year in Prussia, which are individuals occurring in divisions of time.
In such cases it has been asserted that if the chance of an individual being found in a given division is so small that, when multiplied by the very large number of individuals, the product is still a reasonably small number, then the frequency of divisions containing 0, 1, 2, …, r individuals will be given by the terms of the exponential
$$Ne^{-m}\Big(1,\ m,\ \frac{m^2}{2!},\ \dots,\ \frac{m^r}{r!},\ \dots\Big),$$
where $N$ is the number of divisions and $m$ the mean number of individuals occurring in a division.
For the above to be true it is necessary
(1) That the chance of falling in a division is the same for each individual.
(2) That the chance of an individual falling in it is the same for each division.
(3) That the fact that an individual has fallen in a division does not affect the chance of other individuals falling therein.
As to these three conditions, (1) is seldom or never true. I propose to show that this is generally unimportant: unless the chances of some individuals falling in a particular division are relatively high, the Poisson law holds; the tendency, however, is towards a positive binomial.
Next, (2) is comparatively seldom true except in the case of artificial divisions. The result of this, as Pearson has shown, is that a negative binomial fits the results better than the exponential.
Lastly, (3) is often untrue. It will be shown that if the presence of an individual makes another less likely to fall into a division the positive binomial, but if more likely the negative binomial, will fit the figures best.
We may start from the fact that if the chance of an event happening be $q$ and of its not happening $p$, then the chances of its happening 0, 1, 2, etc. times in $n$ trials are given by the terms of the expansion of $(p+q)^n$, viz.
$$p^n,\quad np^{n-1}q,\quad \frac{n(n-1)}{1\cdot2}p^{n-2}q^2,\quad \text{etc.}$$
As the moment coefficients of this series about the zero end of the range are $\nu_1=nq$, $\nu_2=npq+n^2q^2$, whence $\mu_2=npq$, the binomial is completely determined if we know $\nu_1$ and $\mu_2$, for
$$p=\frac{\mu_2}{\nu_1},\qquad q=1-p=1-\frac{\mu_2}{\nu_1},\qquad n=\frac{\nu_1}{q}=\frac{\nu_1^2}{\nu_1-\mu_2},$$
and in particular the binomial is positive (i.e. $n$ and $q$ are positive) if $\mu_2/\nu_1<1$, and negative if $\mu_2/\nu_1>1$. In the particular case when $\mu_2/\nu_1=1$ the binomial becomes the Poisson exponential. It is therefore unnecessary to deal with higher moments than the second for the purpose in hand.
Let us first consider the result of each individual having a different chance of falling in a given division.
Let the chances of $n$ individuals falling in a given division be $q_1, q_2, q_3, \dots, q_n$. The chances of their not doing so are therefore $(1-q_1), (1-q_2), (1-q_3), \dots, (1-q_n)$; and the chances that 0, 1, 2, …
of them will fall in that division are given by the various terms of the expansion of
$$\{(1-q_1)+q_1\}\{(1-q_2)+q_2\}\{(1-q_3)+q_3\}\cdots\{(1-q_n)+q_n\},$$
i.e. by
$$(1-q_1)(1-q_2)\cdots(1-q_n)+S\{q_1(1-q_2)\cdots(1-q_n)\}+S\{q_1q_2(1-q_3)\cdots(1-q_n)\}+\dots+S\{q_1q_2\cdots q_r(1-q_{r+1})\cdots(1-q_n)\}+\dots+q_1q_2q_3\cdots q_n,$$
the term $S\{q_1q_2\cdots q_r(1-q_{r+1})\cdots(1-q_n)\}$ giving the chance that exactly $r$ individuals will fall in the division.
The sum of the above series is clearly unity, so that the 1st and 2nd moment coefficients about the zero end of the series are given by two series of which the $r$th terms are
$$r\,S\{q_1q_2\cdots q_r(1-q_{r+1})\cdots(1-q_n)\}\quad\text{and}\quad r^2S\{q_1q_2\cdots q_r(1-q_{r+1})\cdots(1-q_n)\}$$
respectively. These series may be summed by rearranging them in the ascending order of the $q$ products, thus:
$$\begin{aligned}
S\{q_1(1-q_2)(1-q_3)\cdots(1-q_n)\}&=S(q_1)-2S(q_1q_2)+\dots+(-1)^{r-1}r\,S(q_1q_2\cdots q_r)+\dots\\
2S\{q_1q_2(1-q_3)\cdots(1-q_n)\}&=2S(q_1q_2)-\dots+(-1)^{r-2}r(r-1)\,S(q_1q_2\cdots q_r)+\dots\\
&\ \ \vdots\\
tS\{q_1q_2\cdots q_t(1-q_{t+1})\cdots(1-q_n)\}&=tS(q_1q_2\cdots q_t)-\dots+(-1)^{r-t}r\binom{r-1}{t-1}S(q_1q_2\cdots q_r)+\dots\\
&\ \ \vdots\\
rS\{q_1q_2\cdots q_r(1-q_{r+1})\cdots(1-q_n)\}&=rS(q_1q_2\cdots q_r)-\dots
\end{aligned}$$
Adding these we get on the left $\nu_1$, and on the right $S(q_1)$ + a number of terms of the form $r(1-1)^{r-1}S(q_1q_2\cdots q_r)$, which accordingly vanish, and we get $\nu_1=S(q_1)$.
In a similar manner it can be shown that $\nu_2=S(q_1)+2S(q_1q_2)$, and other moment coefficients about zero can be found in the same way, but we are not here concerned with them*.
If $\bar q$, $\overline{q^2}$ are the mean values of $q$ and $q^2$, obviously
$$\nu_1=S(q_1)=n\bar q, \tag{1}$$
and
$$\nu_2=S(q_1)+2S(q_1q_2)=S(q_1)+\{S(q_1)\}^2-S(q_1^2)=n\bar q+n^2\bar q^2-n\overline{q^2}, \tag{2}$$
$$\nu_2=n\bar q+n^2\bar q^2-n\bar q^2-n\sigma_q^2, \tag{3}$$
$$\mu_2=\nu_2-\nu_1^2=n\bar q-n\bar q^2-n\sigma_q^2=n\bar q\Big(1-\bar q-\frac{\sigma_q^2}{\bar q}\Big). \tag{4}$$
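For any particular set of chances, exact arithmetic can confirm three things: that the terms of the product sum to unity, and results (1) and (4). The Python sketch below builds the chances of exactly $r$ individuals by multiplying in one factor $\{(1-q)+q\}$ at a time, using exact fractions. The four chances $q_1,\dots,q_4$ are hypothetical values, not data from the paper, and the function name is illustrative.

```python
from fractions import Fraction


def chances_of_r(qs):
    """Terms of {(1 - q_1) + q_1}{(1 - q_2) + q_2}...{(1 - q_n) + q_n}:
    element r is the chance that exactly r individuals fall in the division."""
    dist = [Fraction(1)]
    for q in qs:
        new = [Fraction(0)] * (len(dist) + 1)
        for r, prob in enumerate(dist):
            new[r] += prob * (1 - q)   # this individual does not fall in the division
            new[r + 1] += prob * q     # this individual does
        dist = new
    return dist


qs = [Fraction(1, 10), Fraction(1, 20), Fraction(3, 10), Fraction(1, 5)]   # hypothetical chances
n = len(qs)
q_bar = sum(qs) / n
var_q = sum(q * q for q in qs) / n - q_bar ** 2

dist = chances_of_r(qs)
nu1 = sum(r * pr for r, pr in enumerate(dist))
nu2 = sum(r * r * pr for r, pr in enumerate(dist))
mu2 = nu2 - nu1 ** 2

print(sum(dist) == 1)                                   # True: the terms sum to unity
print(nu1 == n * q_bar)                                 # True: equation (1)
print(mu2 == n * q_bar * (1 - q_bar - var_q / q_bar))   # True: equation (4)
print(1 - mu2 / nu1 > 0)                                # True: the fitted binomial is positive
```

The last line reflects the fact that $1-\mu_2/\nu_1$ reduces to $\bar q+\sigma_q^2/\bar q$, which cannot be negative.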
* The moment coefficients are:
$$\mu_2=n\bar p\bar q-n\,{}_q\mu_2,\qquad \mu_3=n\bar p\bar q(\bar p-\bar q)-3n(\bar p-\bar q)\,{}_q\mu_2+2n\,{}_q\mu_3,$$
$$\mu_4=n\bar p\bar q\{1+3(n-2)\bar p\bar q\}-n\{7+6(n-6)\bar p\bar q\}\,{}_q\mu_2+12n(\bar p-\bar q)\,{}_q\mu_3-6n\,{}_q\mu_4+3n^2\,{}_q\mu_2^2,$$
where ${}_q\mu_2$, etc. are the moment coefficients of the $q$ distribution and $\bar p=1-\bar q$.
If now the distribution of chances is to be represented by the binomial $(P+Q)$, then
$$Q=1-\frac{\mu_2}{\nu_1}=1-\frac{n\bar q\,(1-\bar q-\sigma_q^2/\bar q)}{n\bar q}=\bar q+\frac{\sigma_q^2}{\bar q}. \tag{5}$$
Since the original $q$'s are the chances of events happening they are always positive, so that the above expression must be positive and the binomial positive.
If now we introduce the Poisson condition that $\bar q$, though positive, is negligibly small, (5) becomes in general zero, for $\sigma_q$ is usually of the same order as $\bar q$, and in that case Poisson's law holds in spite of the inequality of the original $q$'s. If however $\sigma_q^2/\bar q$ is appreciably greater than zero (as in the extreme case …), the distribution of chances is to be represented by a positive binomial.
Next we have to consider the effect of disregarding condition (2), namely that the chance of an individual falling into it must be the same for each division.
Let us suppose then that the $q$'s are all different for each division, so that $n\bar q$ is also different. Then, writing $m$ for $n\bar q$, and $\bar m$, $\overline{m^2}$, $n\overline{q^2}$ for the means of $m$, $m^2$ and $nq^2$ taken over all the divisions, we get
from (1)
$$\nu_1=\bar m, \tag{6}$$
from (2)
$$\nu_2=\bar m+\overline{m^2}-n\overline{q^2}=\bar m+\bar m^2+\sigma_m^2-n\overline{q^2}, \tag{7}$$
$$\mu_2=\nu_2-\nu_1^2=\bar m+\sigma_m^2-n\overline{q^2}. \tag{8}$$
As before, if $(P+Q)$ is the best fitting binomial,
$$Q=1-\frac{\mu_2}{\nu_1}=\frac{n\overline{q^2}-\sigma_m^2}{\bar m}.$$
Hence if $\sigma_m^2>n\overline{q^2}$, which is probable if there is any appreciable variation in $m$, since, as explained above, $n\overline{q^2}$ is generally negligible, a negative binomial will be found to fit better than the exponential.
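The effect of disregarding condition (2) can be checked in the same exact way. This Python sketch rests on the following assumptions:
- within each division all $n$ individuals share one chance $q$, so condition (1) holds inside a division while $m=nq$ varies between divisions;
- the divisions are pooled with equal weight;
- $n=50$ and the three chances are hypothetical values, not data from the paper.

Under these assumptions the sketch confirms (6) and (8) and shows $Q<0$ once $\sigma_m^2$ exceeds $n\overline{q^2}$.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, q):
    """Chances of 0, 1, ..., n individuals in a division when each of n has chance q."""
    return [comb(n, r) * q ** r * (1 - q) ** (n - r) for r in range(n + 1)]


n = 50
q_by_division = [Fraction(1, 100), Fraction(2, 100), Fraction(6, 100)]   # hypothetical divisions
D = len(q_by_division)

# Pool the divisions, each equally represented.
pooled = [Fraction(0)] * (n + 1)
for q in q_by_division:
    for r, pr in enumerate(binomial_terms(n, q)):
        pooled[r] += pr / D

nu1 = sum(r * pr for r, pr in enumerate(pooled))
mu2 = sum(r * r * pr for r, pr in enumerate(pooled)) - nu1 ** 2

m = [n * q for q in q_by_division]
m_bar = sum(m) / D
var_m = sum(x * x for x in m) / D - m_bar ** 2
n_mean_q2 = n * sum(q * q for q in q_by_division) / D

Q = 1 - mu2 / nu1
print(nu1 == m_bar)                        # True: equation (6)
print(mu2 == m_bar + var_m - n_mean_q2)    # True: equation (8)
print(Q == (n_mean_q2 - var_m) / m_bar)    # True
print(Q < 0)                               # True: a negative binomial fits better
```

Here $\sigma_m^2=7/6$ while $n\overline{q^2}=41/600$. This is the situation the text describes: the variation of $m$ between divisions outweighs the small term $n\overline{q^2}$.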
Clearly condition (2) is usually not fulfilled in vital and demographic statistics; divisions, either of space or time, are generally governed by different environments which will vary the chances of an individual falling into them, and so we may expect that as a rule negative binomials will occur in place of the exponential.
* If we suppose that $q$ does not vary with the individual but that $n\bar q\ (=m)$ varies with the division, the moment coefficients of the $m$ distribution being written ${}_m\mu$, then the moment coefficients of the resulting distribution of divisions are as follows:
$$\mu_2=\bar m+{}_m\mu_2,\qquad \mu_3=\bar m+3\,{}_m\mu_2+{}_m\mu_3,\qquad \mu_4=\bar m+3\bar m^2+(7+6\bar m)\,{}_m\mu_2+6\,{}_m\mu_3+{}_m\mu_4.$$
Finally, suppose that the presence of an individual in a division influences the chance of other individuals falling in that division. Clearly it may do so either by way of increasing the chance or diminishing it.
If the chance be increased, it is clear that we shall get, for the same mean number of individuals per division, a larger number of divisions containing high numbers of individuals and a larger number of zero divisions. In other words, for the same mean we shall get a larger standard deviation, so that $\mu_2/\nu_1$ will be greater than 1 and a negative binomial will fit better than the exponential.
On the other hand, if the chance of other individuals is decreased by the presence of one already in a division, $\mu_2/\nu_1$ will become less than unity and the best fitting binomial will be positive.
The first of these two cases includes linking or clumping of events or bacteria, the second such a thing as the counting of large cells on a haemacytometer whose divisions are comparable in size with them.
We have now shown that a population which might be expected at first sight to follow Poisson's law
(1) Will do so if the only deviation from the ideal conditions is that the chances of different individuals falling into the same division are not equal, as long as these chances are all small.
(2) If in addition to this the chances of some individuals are large, a positive binomial will fit the results better than the exponential.
(3) If the different divisions have different chances of containing individuals, as is usual, a negative binomial will fit the results better than the exponential, except in so far as (2) may interfere.
(4) If the presence of one individual in a division increases the chance of other individuals falling into that division, a negative binomial will fit best, but if it decreases the chance, a positive binomial.
Generally speaking, (3) is the operating deviation from Poisson's conditions, and accordingly most statistics give negative binomials.
Finally, I should like to point out that the object of my original paper (Biometrika, Vol. V) was to give the user of the haemacytometer a guide to the error which he may expect from its use. The net result was that the probable error of his count was $\cdot6745\sqrt N$, where $N$ was the total number counted*; if $N$ be a reasonably large number, tables of the probability integral may be used, otherwise the exponential (or, better still, go on counting). This result is not affected by slight deviations from the Poisson law, any more than slight deviations from the normal law affect our use of the probability integral tables.
* Biometrika, Vol. V, p. 355. The probable error of the mean is $\cdot6745\sqrt{m/M}$, where $m$ is the mean and $M$ the number of unit areas counted. If in this we put $M=1$, then $m=N$ and the total count is $N\pm\cdot6745\sqrt N$ as above.
THE CRITERION OF GOODNESS OF FIT OF PSYCHOPHYSICAL CURVES.
By GODFREY H. THOMSON, D.Sc., Armstrong College, in the University of Durham.
CONTENTS
PAGE
(1) Introduction . . . 216
(2) Peculiarities of Psychophysical Data . . . 218
(3) The Constant Process, a Process for fitting a Normal Curve to Data with Uncentred Tails . . . 220
(4) The Probability of a certain … . . . 221
(5) Pearson's Criterion of Goodness of Fit . . .
223
(6) Numerical Example . . . 226
(7) Urban's Incorrect Method of comparing Goodness of Fit . . . 226
(8) The Probable Errors of the Criteria of Fit . . . 228
(9) Summary of the Rules for testing Goodness of Fit of Psychophysical Curves . . . 229
Appendix . . . 230
(1) Introduction.
The object of the present paper is to inquire what is the proper method of examining psychophysical curves as to their goodness of fit. In psychophysics various mathematical processes are employed for fitting theoretical curves of "ogive" form* (known to psychologists as psychometric functions, but really error functions) to data of a certain kind, usually threshold measurements collected by the "Method of Right and Wrong Cases." The best known of these mathematical processes is the Müller "Constant Process," using the probability integral†. To make the material in which we are about to work understandable, it is necessary first to go into some detail as to the nature of the experiments which supply the data to be fitted, and as to the theories which have led to such mathematical curves being drawn through these data.
Most of the experiments in question have for their object the determination of the conditions of our experiences of equality and difference. For example, suppose we compare two weights, one of which is 100 grams, by lifting them in succession by the right hand with a number of experimental precautions, into which we need
* The term in this connexion is Galton's.
† G. T. Fechner, Elemente der Psychophysik, 1860; G. E. Müller, "Ueber die Maassbestimmungen des Ortsinnes der Haut mittels der Methode der richtigen und falschen Fälle," Pflügers Archiv für die ges. Physiologie, 1879, XIX, pp. 191–235, especially par. 5 et seq.; G. E. Müller, Die Gesichtspunkte und die Thatsachen der psychophysischen Methodik, Wiesbaden, 1904, par. 11; F. M. Urban, "Die psychophysischen Massmethoden," Archiv für die ges.
Psychologie, 1909, XV, p. 287; G. H. Thomson, "The Accuracy of the φ(γ) Process," Brit. Journal of Psychol., 1914, VII, p. 46; and in various text-books, e.g. Titchener's Experimental Psychology, and W. Brown's Essentials of Mental Measurement.
not here enter. We wish to know under what conditions the unknown weight appears lighter than, equal to, or heavier than the standard weight. An important condition is of course the "actual" weight of the unknown weight, as measured in the usual manner. But this is by no means the only important condition. The order in which the weights are lifted (whether standard first or unknown first); the number of categories into which our judgments have to be classified; the order of succession of the several unknown weights, whether rising or falling or at random; the range over which the succession of unknown weights stretches, whether or no it contains any which are quite easily distinguished from the standard: all these and many other conditions are of great importance. Steps can however be taken to eliminate some of these factors by means of judicious experimental precautions, and the attempt can be made to keep the others as constant as possible during a series of trials. The judgments which are given by the subjects then depend mainly on the difference between the standard stimulus and the variable stimulus; in the case of our example, on the difference between the standard weight and the variable weight.
Among other points of importance in the fitting of the curves is the possibility of deciding, by means of the goodness of fit, whether the experimental conditions have really been kept as constant as has been hoped; for lack of constancy in this respect will lead to heterogeneity, which will show itself by the necessity of using a compound curve to obtain a good fit.
To fix ideas, it is desirable at this point to have an actual set of data to refer to.
In some very carefully conducted experiments on weight-lifting, Professor F. M. Urban (op. cit.) found that, with one of his subjects, under certain experimental conditions, the standard weight being 100 grams, the following numbers of answers heavier were returned out of 300 trials with each of the several unknown weights. It should be mentioned that the experimental method used required the unknown weights to be presented to the subject in random sequence, each accompanied by the standard, so that the 300 trials referred to were not one after the other, but were separated from each other by trials with the other unknowns. Otherwise expectation and other psychological factors produce a considerable correlation between one judgment and the next; this correlation is reduced to a minimum by Urban's procedure. Moreover, precautions against fatigue and several other factors were taken. For the details the reader is referred to Urban's memoir, with the warning that much of the mathematical part thereof is incorrect.

Grams    Answers heavier    Proportion p
  84        7 out of 300       .0233
  88        8 out of 300       .0267
  92       35 out of 300       .1167
  96      107 out of 300       .3567
 100      183 out of 300       .6100
 104      265 out of 300       .8833
 108      279 out of 300       .9300

It is to numbers such as these that the curves to be considered are fitted.

Any suitable curve which happened to occur to one might of course be employed. For example, a parabola of higher order can be used, and the curve tan⁻¹ θ has also been used. But clearly the whole experiment suggests that an error function of some sort is wanted, and as early as 1860 G. T. Fechner (op. cit.) suggested that such numbers formed the integral of a normal or Gaussian curve. One usual argument is somewhat as follows, using for clearness terms applying directly to the above example.
The existence of a hypothetical point is postulated, called the limen or threshold for the judgment heavier, such that above this point the subject always returns the answer heavier, and below it he always returns some other answer, not heavier. But this limen is supposed to be fluctuating from moment to moment, either really or apparently, owing to changes in the physical, physiological and psychological conditions of the experiment. If at one moment the answer heavier is returned for the variable 96 grams, then at that moment the limen is below 96 grams. Later the answer lighter, or the answer equal, may be returned for 96 grams, and at that moment the limen is above 96 grams. The values p in the above table will then be integrals of the distribution curve of this limen.

(2) Peculiarities of Psychophysical Data from the Point of View of Curve Fitting.

The problem of fitting a distribution curve integral to such data, say in the first place the probability integral, has certain peculiarities which differentiate it from many biometric curve-fitting problems. Usually, when we are required to fit a normal curve, we are given the data in histogram form: that is, a number M of direct measurements is made, and $m_1$ are found to fall into a certain short range, $m_2$ into another adjacent range, and so on. To fit a Gauss curve requires the mean and the standard deviation, and these quantities can be directly found from such a histogram, Sheppard's adjustments being used if necessary. Quantities analogous to our proportions p can be formed from such a biometric histogram, viz.

$$p_1 = \frac{m_1}{M}, \quad p_2 = \frac{m_1 + m_2}{M}, \quad p_3 = \frac{m_1 + m_2 + m_3}{M}, \quad \dots,$$

and vice versa, quantities analogous to the m's can be formed from the proportions p of the psychometric experiment, viz.
$$m_1 = p_1 M, \quad m_2 = (p_2 - p_1) M, \quad m_3 = (p_3 - p_2) M, \quad \dots$$

In the case of our example we should have:

Range (grams)    Cases    m/300
Below 84            7     .0233
84–88               1     .0034
88–92              27     .0900
92–96              72     .2400
96–100             76     .2533
100–104            82     .2733
104–108            14     .0467
Above 108          21     .0700
Totals            300    1.0000

There are however important differences which make the analogy inexact from the curve-fitting point of view. In the biometric histogram, if any one of the cells $m_k$ is larger than it ought to be, then any other must have a tendency to be smaller than it ought to be: there is a strong negative correlation between the numbers in the cells, a correlation, that is, from trial to trial. In the psychometric pseudohistogram, however, formed from the proportions p, this is otherwise, because the p's are measured quite separately from one another. In the biometric histogram the m's, the numbers in each cell, are necessarily positive quantities. In the psychometric pseudohistogram they may be negative, if the p's do not rise steadily. In the biometric histogram the actual range found in a trial is as a rule known, that is, the points where p is zero and p is unity are known. In the psychometric case these points are as a rule not known, and there may be psychological reasons why extreme stimuli (such as would be required to find these points) should not be used. In our example we do not know whether the subject would have given no answers heavier at 80 grams, or whether at the other end he would have given only answers heavier at 112 grams. When we do know these points, or can assume them, in the psychometric case, we can fit our probability integral by forming the pseudohistogram and calculating the mean and the standard deviation as though it were a real histogram*. This has been suggested by more than one writer, in England by Professor C. Spearman†, who does not however point out the difficulty that it cannot as a rule be done, because the points p = 0 and p = 1 are not known.
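As a sketch of the pseudohistogram construction and of the moment fit just described, the Python below rebuilds the cells from Urban's counts and then, given end points at which p is assumed to be 0 and 1, computes the mean and standard deviation from the cell midpoints. The function names are hypothetical, and so are the end points 80 and 112 grams: the text stresses that these points are not actually known, so they are chosen purely for illustration. No Sheppard adjustment is applied.

```python
from typing import List, Sequence, Tuple


def pseudohistogram(heavier: Sequence[int], trials: int) -> List[int]:
    """Cell counts from cumulative 'heavier' counts (stimuli in rising order).

    The first cell lies below the lowest stimulus and the last above the
    highest. A negative cell means the observed p's did not rise steadily.
    """
    cells = [heavier[0]]
    cells += [b - a for a, b in zip(heavier, heavier[1:])]
    cells.append(trials - heavier[-1])
    return cells


def pseudohistogram_moments(
    cells: Sequence[int], stimuli: Sequence[float], lower: float, upper: float
) -> Tuple[float, float]:
    """Mean and s.d. obtained by treating the pseudohistogram as a real one.

    `lower` and `upper` are ASSUMED points where p = 0 and p = 1. Each
    cell's frequency is concentrated at its midpoint (no Sheppard adjustment).
    """
    edges = [lower, *stimuli, upper]
    mids = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    total = sum(cells)
    mean = sum(m * x for m, x in zip(cells, mids)) / total
    var = sum(m * (x - mean) ** 2 for m, x in zip(cells, mids)) / total
    return mean, var ** 0.5


stimuli = [84, 88, 92, 96, 100, 104, 108]
heavier = [7, 8, 35, 107, 183, 265, 279]  # Urban's data, 300 trials each
cells = pseudohistogram(heavier, 300)
print(cells)  # [7, 1, 27, 72, 76, 82, 14, 21]
mean, sd = pseudohistogram_moments(cells, stimuli, lower=80, upper=112)
print(round(mean, 2), round(sd, 2))
```

The result depends directly on the assumed end points. That dependence is precisely the difficulty the text raises about this method.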
* The actual arithmetical formation of the histogram is unnecessary if a summation method of finding moments is employed.
† Brit. Journ. Psychol., 1908, ii, p. 227.

In biometric language, the problem is to fit a normal curve to data for which the "tails" are undefined as to range, although their areas are known. This problem was solved by Müller (op. cit.) as follows.

(3) The Constant Process.

Call the stimuli $s_1, s_2, s_3, \dots, s_n$ and the proportions $p_1, p_2, p_3, \dots, p_n$; then we have n equations

$$p_k - \frac{1}{2} - \frac{1}{\sqrt{\pi}}\int_0^{h(s_k - S)} e^{-t^2}\,dt = 0, \qquad k = 1, 2, \dots, n, \qquad (1)$$

to find the mean S and the precision h. We retain for the present this form of the integral as being more familiar to psychologists. The more modern form would have the standard deviation instead of the precision as the second unknown (for the normal curve, $\sigma = 1/(h\sqrt{2})$).

These equations are slightly inconsistent with one another. No pair of values S and h will exactly satisfy all n equations; instead of giving zero they leave small residuals $v_p$. Müller assumed tacitly that these n equations, if based on the same number of experiments each, are of equal importance or weight*. We shall allow this assumption to pass for the present but shall return to it later. If we now make the usual assumptions of the Method of Least Squares, we can take as the best values of S and h those which make $\Sigma(v_p^2)$ a minimum, where the summation is over the n stimuli or n equations. The conditions that this should be so are

$$\frac{\partial}{\partial h}\Sigma(v_p^2) = 0 \ \text{for constant } S, \qquad \frac{\partial}{\partial S}\Sigma(v_p^2) = 0 \ \text{for constant } h. \qquad (2)$$

Unfortunately, however, the n equations are very far from being simple and linear as in the usual applications of Least Squares. To avoid this difficulty we look up in tables of the Probability Integral (which psychologists call Fechner's Fundamental Table) those n values of γ which correspond exactly to our n values of p.
We then have

$$\gamma_k = h(s_k - S), \qquad k = 1, 2, \dots, n. \qquad (3)$$

These equations are not yet linear in S and h, but if we write

$$c = hS \qquad (4)$$

they become

$$\gamma_k = h s_k - c, \qquad (5)$$

and are now linear in h and c.

* There is unfortunately a possibility of ambiguity of language here, as the word weight also occurs in the particular example we are using as illustration, where weights of 84 grams etc. are employed.

We now have n linear equations in which γ and s are known, and h and c are required. If we insert any pair of values h and c into these n equations they will leave residuals $v_\gamma$. If we were now to proceed to make $\Sigma(v_\gamma^2)$ a minimum, this would not effect our purpose. It is $\Sigma(v_p^2)$ we wish to make a minimum, not $\Sigma(v_\gamma^2)$. If however we can find multipliers or weights M such that each

$$v_p^2 = M v_\gamma^2, \qquad (6)$$

we can then make $\Sigma(M v_\gamma^2)$ a minimum. That is, we can apply Least Squares to the equations (3), weighted with certain artificial weights M. The use of this device is Müller's particular credit in this connexion. Clearly the residuals $v_p$, which may be regarded as errors in p, are connected with the residuals $v_\gamma$, which may be regarded as errors in γ, by the equation

$$v_p = \frac{1}{\sqrt{\pi}}\, e^{-\gamma^2}\, v_\gamma,$$

from equations (1) and (3). Therefore $M = e^{-2\gamma^2}/\pi$. Herein we can omit the π, since it is only the relative values of the Müller weights which are of importance. These weights are given in most works on psychophysics, e.g. W. Brown, or Titchener, op. cit.

The condition that $\Sigma(v_p^2)$ should be a minimum has now become that $\Sigma(M v_\gamma^2)$ should be a minimum. With this substitution, the Normal Equations (2) give

$$[Ms^2]\,h - [Ms]\,c = [Ms\gamma],$$
$$-[Ms]\,h + [M]\,c = -[M\gamma], \qquad (7)$$

the square brackets being the sign of summation used by Gauss, and still persisting in psychophysics in this connexion. The summation here is over the n equations. Thence we have

$$c = \frac{[Ms][Ms\gamma] - [M\gamma][Ms^2]}{[M][Ms^2] - [Ms]^2}, \qquad h = \frac{[M][Ms\gamma] - [Ms][M\gamma]}{[M][Ms^2] - [Ms]^2}. \qquad (8)$$
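The weighted least-squares solution (7)–(8), with S then taken as c/h, is short enough to sketch directly. The Python below is an illustrative reconstruction, not Müller's or Urban's own computing scheme. It obtains γ from p by inverting p = ½(1 + erf γ), using the standard library's normal quantile (γ = z/√2), and it uses the Müller weights $e^{-2\gamma^2}$ with π dropped. The optional `urban` flag anticipates the extra factor 1/4pq discussed in section (4). That factor is computed from the observed p, which is our assumption about which p enters it. The function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def gamma_from_p(p: float) -> float:
    """Invert p = 1/2 + (1/sqrt(pi)) * integral_0^gamma exp(-t^2) dt.

    That integral form equals (1 + erf(gamma)) / 2 = Phi(gamma * sqrt(2)),
    so gamma is the standard normal quantile of p divided by sqrt(2).
    """
    return NormalDist().inv_cdf(p) / math.sqrt(2)


def constant_process(
    s: Sequence[float], p: Sequence[float], urban: bool = False
) -> Tuple[float, float]:
    """Weighted least squares for gamma_k = h * s_k - c; returns (S, h)."""
    g = [gamma_from_p(pk) for pk in p]
    w = [math.exp(-2 * gk * gk) for gk in g]  # Müller weights, pi omitted
    if urban:
        # Assumption: the extra 1/(4pq) factor uses the observed proportions.
        w = [wk / (4 * pk * (1 - pk)) for wk, pk in zip(w, p)]
    M = sum(w)
    Ms = sum(wk * sk for wk, sk in zip(w, s))
    Ms2 = sum(wk * sk * sk for wk, sk in zip(w, s))
    Mg = sum(wk * gk for wk, gk in zip(w, g))
    Msg = sum(wk * sk * gk for wk, sk, gk in zip(w, s, g))
    den = M * Ms2 - Ms ** 2
    h = (M * Msg - Ms * Mg) / den    # equation (8)
    c = (Ms * Msg - Mg * Ms2) / den  # equation (8)
    return c / h, h                  # S = c / h


s = [84, 88, 92, 96, 100, 104, 108]
p = [0.0233, 0.0267, 0.1167, 0.3567, 0.6100, 0.8833, 0.9300]
S_m, h_m = constant_process(s, p)              # Müller weights only
S_u, h_u = constant_process(s, p, urban=True)  # with the extra 1/4pq factor
```

On Urban's data the `urban=True` variant should land close to the values S = 98.24 grams and h = .118 quoted in section (4). The plain Müller weights should give a somewhat larger h.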
$$S = \frac{c}{h} = \frac{[Ms][Ms\gamma] - [M\gamma][Ms^2]}{[M][Ms\gamma] - [Ms][M\gamma]}. \qquad (9)$$

(4) The Probability of a Certain Category of Judgment.

The Constant Process remained in this form from 1879 to 1909. It is very much mixed up with the psychological method of experimenting and collecting the data, so that frequently the name "Method of Right and Wrong Cases," really the name of a certain method of collecting data, has been used to include this mathematical process. To avoid this mental confusion I have elsewhere* suggested that the two words Method and Process should in psychophysics be consistently used in the way in which they are employed in the above sentence, viz. Method of collecting data, and Process of calculating. Frequently the Constant Process has been called the phi-gamma method, from the use of the name phi-gamma for the probability function.

In 1909, F. M. Urban (op. cit.) suggested alterations to the Müller weights, or rather suggested the necessity of another set of weights in addition†. These alterations arise from the notion of comparing the judgments heavier with the drawing of black balls from a bag containing black balls and white balls. The analogy is in detail as follows.

(1) From a bag containing black balls and white balls 300 drawings are made, one at a time, the ball being returned each time before the next drawing is made. 107 black balls are observed out of the 300.

(2) A subject performing a certain experiment with weights sometimes gives the answer heavier, sometimes some other answer. On one occasion, when the weights were 100 grams standard and 96 grams unknown, this experiment was repeated 300 times, with due precautions against fatigue, etc., and the answer heavier was returned 107 times out of the 300.

Now if p is the observed proportion (here 107/300) of black balls in a bag, then the probable error of p is known to vary with $\sqrt{p(1-p)}$‡.
With the same sized sample, a result p = .5 has a larger probable error than a result p = .8, say. If anything similar holds, as the analogy suggests, for the psychometric experiment, then the n equations (1) or (5) are not equally reliable, even though based on the same number, 300, of experiments each. In addition to the weights M they need other weights to allow for this new variation in reliability. The combined weights M/4pq are known as Urban's weights, and a table of these is usually given in psychophysical textbooks alongside the ordinary Müller weights. Urban discusses the matter at some length in his already cited article, and a discussion will also be found in Wirth's Psychophysik (Leipzig, 1912), where on page 151 the actual scatter of various p's is given in a diagram.

* Brit. Journ. Psychol., 1912, v, p. 203.
† There are many errors in the article of Urban's quoted. See my articles in the Brit. Journ. Psychol., 1913, vi, p. 217, and 1914, vii, p. 44. But these errors, though making Urban's conclusions in that article invalid, do not touch the point here raised, in which I think Urban's suggestion marks an advance.
‡ Really the true values of p and 1 − p should be used, but this is the best we can do. Further, the expression probable error ceases to have an accurate meaning when p is too close to zero or unity and the distribution in consequence is very skew. But these refinements do not matter at this stage of our argument.

Replacing the weights M in equation (7), therefore, by the new Urban weights P, Urban found in the present instance S = 98.24 grams, h = .117995. That is, he represents the proportions p theoretically by using the hypothesis that the "psychometric function," as psychologists call it, is given by

$$p = \frac{1}{2} - \frac{1}{\sqrt{\pi}}\int_0^{0.117995\,(98.24 - s)} e^{-t^2}\,dt. \qquad (10)$$

The theoretical values p' thus calculated are compared with the actual values p in this table.
Grams     p       p'      Difference x
  84    .0233   .0088     +.0145
  88    .0267   .0438     −.0171
  92    .1167   .1489     −.0322
  96    .3567   .3544     +.0023
 100    .6100   .6155     −.0055
 104    .8833   .8319     +.0514
 108    .9300   .9483     −.0183

The object of the present paper is to make clear the proper methods (a) of ascertaining, in all such cases, whether the theoretical numbers are a reasonable fit to the observed numbers or not, and (b) of comparing the fits obtained by different hypotheses, that is, by different error functions. The psychologist would express this by saying that he was comparing different psychometric functions. To the statistician the comparison is one of error functions, the natural procedure being to try first the normal curve, then members of Pearson's family of curves, then compound curves; the conclusion in the latter case being that the material was not homogeneous. This work I have as a matter of fact already carried out, and have come to that conclusion; but it is beyond the scope of the present paper, which hopes to interest psychologists in modern statistical methods, and statisticians in modern psychology.

(5) Pearson's Criterion of Goodness of Fit.

This problem of comparing the goodness of fit of curves in psychophysics, although it has not, as far as I am aware, ever been correctly performed, is really very simple, and could be handled at once from first principles. For the sake, however, of showing the connexion with other work, it is advisable to treat it as a special case of the application of Pearson's Criterion of Goodness of Fit*, which is in brief as follows.

* Karl Pearson, Phil. Mag., July 1900 and April 1916.
Let $x_1, x_2, x_3, \dots, x_n$ be a system of deviations from the means of n variables whose standard deviations are $\sigma_1, \sigma_2, \sigma_3, \dots, \sigma_n$, and whose intercorrelations are $r_{12}, r_{13}, r_{23}, \dots, r_{n-1,n}$. Then the frequency "surface" giving the frequency of occurrence of each possible combination of x's is

$$z = z_0\, e^{-\frac{1}{2}\chi^2}, \qquad (11)$$

where

$$\chi^2 = S_1\!\left(\frac{R_{kk}}{R}\,\frac{x_k^2}{\sigma_k^2}\right) + 2S_2\!\left(\frac{R_{kl}}{R}\,\frac{x_k x_l}{\sigma_k \sigma_l}\right), \qquad (12)$$

R is the determinant of the correlations (with $r_{kk} = 1$), and $R_{kk}$, $R_{kl}$ are the minors corresponding to $r_{kk}$ and $r_{kl}$. $S_1$ is a sum over all k's, and $S_2$ is a sum over all pairs kl other than k = l.

When χ² has been calculated, a probability P can be found from Table XII in Pearson's Tables for Statisticians. This table is entered with n' = n + 1 and χ², and gives values of

$$P = \frac{\int_\chi^\infty e^{-\frac{1}{2}\chi^2}\,\chi^{n-1}\,d\chi}{\int_0^\infty e^{-\frac{1}{2}\chi^2}\,\chi^{n-1}\,d\chi};$$

that is, P is the probability that a random sample giving as bad a fit as the data, or worse, would be obtained from the theory which is being tested.

The kind of data for which this criterion was first invented was data in real histogram form, of the kind called in earlier sections of this paper a biometric histogram. When the data are of this form, Pearson has shown that equation (12) reduces to the very simple form

$$\chi^2 = S\!\left(\frac{e^2}{m'}\right), \qquad (15)$$

where m' is the theoretical value of m, e is m − m', and S indicates summation over all the cells of the histogram. Psychophysical data of the kind here considered, however, are, as has already been pointed out, not really in histogram form. Although a histogram can be deduced from them, it is only by making certain assumptions, and the intercorrelations of the cells of this artificial histogram are different from the intercorrelations of a natural, directly observed histogram.

It is not correct, therefore, to use equation (15) above. It is more accurate, and withal exceedingly simple, to apply equation (12) directly to the p's. Since the latter are independent, all the intercorrelations r are zero. Therefore R is unity, $R_{kk}$ is unity, and $R_{kl}$ is zero.
Equation (12) therefore becomes

$$\chi^2 = S\!\left(\frac{x^2}{\sigma_p^2}\right), \qquad (16)$$

and, as the distribution of each p will be binomial in form provided the experimental conditions remain constant enough, we have*

$$\sigma_p^2 = \frac{p'q'}{\mu}, \qquad (17)$$

where μ is the number of experiments on which p is based, and p' = 1 − q', so that

$$\chi^2 = S\!\left(\frac{\mu x^2}{p'q'}\right). \qquad (18)\dagger$$

Herein the x's are the differences between the observed p's and the theoretical values p'. The probability P is then found as before.

* If we look upon the judgments heavier, as suggested in an earlier paragraph, as comparable with drawing black balls out of a bag containing black balls and white balls in the proportions p' and 1 − p', then the probable error of p is $.67449\sqrt{p'q'/\mu}$, μ being the number of judgments, of which pμ are of the category heavier. For the chances of obtaining 0, 1, 2, …, μ − 1, or μ black balls in a drawing of μ are given by the terms of $(q' + p')^\mu$, q' being 1 − p'; that is, they are the chances of obtaining $p = 0, 1/\mu, 2/\mu, \dots, (\mu - 1)/\mu, 1$. The s.d. of the above binomial is $\sqrt{\mu p'q'}$, and the s.d. of p is therefore $\sqrt{p'q'/\mu}$.
† Compare Professor K. Pearson on "Goodness of Fit in Statistics and Physics," Biometrika, 1916, xi, pp. 239–261, especially p. 257.

We can check our equation (18) by treating the matter from first principles, and not as a special case included in Pearson's formulae. We have, from this point of view, n quantities p which are independently measured, and n quantities p' which are theoretically given. The variations from p' are binomial in form, that is, they are approximately Gaussian. The probability of an error

$$x_k = p_k - p'_k, \qquad \text{where } \sigma_k^2 = \frac{p'_k q'_k}{\mu},$$

is therefore

$$w_k = \frac{1}{\sigma_k\sqrt{2\pi}}\, e^{-x_k^2/2\sigma_k^2}. \qquad (a)$$

The probability of the whole set of observed values $p_1, p_2, p_3, \dots, p_n$ occurring is the product

$$w_1 w_2 w_3 \cdots w_n. \qquad (b)$$

Write this

$$z = z_0\, e^{-\frac{1}{2}\chi^2}. \qquad (c)$$

Then

$$\chi^2 = S\!\left(\frac{x_k^2}{\sigma_k^2}\right) = S\!\left(\frac{\mu x_k^2}{p'_k q'_k}\right)$$

from equation (a).

(6) Numerical Example.
Let us apply these formulae to the example already cited. The calculations are carried out in the following table. The theoretical p'q' should be used, clearly, as denominators of the terms of χ².

  p'      q'      p'q'       x²          x²/p'q'
.0088   .9912   .00872   .00021025    .0241
.0438   .9562   .04198   .00029241    .0070
.1489   .8511   .12673   .00103684    .0082
.3544   .6456   .22880   .00000529    .0000
.6155   .3845   .23665   .00003025    .0001
.8319   .1681   .13983   .00264196    .0189
.9483   .0517   .04903   .00033489    .0069
                                      .0652 = S(x²/p'q')

The number of experiments was the same for each p, viz. 300; therefore

$$\chi^2 = \mu\, S\!\left(\frac{x^2}{p'q'}\right) = 300 \times .0652 = 19.56.^{*}$$

Table XII in Pearson's Tables, to find P, has to be entered with χ² and n' = n + 1, where n is the number of variates, here the number of p's, i.e. 7. We find there, for n' = 8,

χ² = 19,  P = .00819;
χ² = 20,  P = .00557.

It is unnecessary, with data such as we are here handling, to interpolate elaborately. Clearly, for χ² = 19.56, P is of the order .007. That is to say, in only seven cases in a thousand should we expect to get our present observed p's from our theoretical p' by random sampling. It is therefore not at all probable that equation (10) truly represents the "psychometric function" for this subject and this reaction.

* Compare Appendix.

(7) Urban's Incorrect Method of comparing Goodness of Fit.

In the article from which the above example is taken, Professor Urban was inter alia desirous of comparing various hypotheses of the "psychometric function" among themselves. Those which he fully works out are (1) the above assumption that it is the integral of the normal probability curve, and (2) the assumption that it is an "arctan." curve (tan⁻¹ θ). (It is surely needless to point out that the latter hypothesis is in itself most unlikely; however, we are here concerned with an empirical comparison of the two hypotheses, and it is important that the method should be correct, since it will be necessary to compare other and more likely theories, as for example Pearson's curves.)

It can now be shown that the methods which Professor Urban employed in comparing these two hypotheses are incorrect and inadequate. What these inadequate methods are can best be shown by continuing the above example, which is taken at random from among Urban's material. We have already found the squares x² of the differences between theory and observation in the case of the normal integral, or, as Urban calls it, the φ(γ) hypothesis. They are given in the table just above, and S(x²) = .00455189. We now proceed to form the analogous quantity in the case of the arctan. hypothesis.

Grams   Observed p    p'        x          x²
  84      .0233      .0795    −.0562    .00315844
  88      .0267      .1086    −.0819    .00670761
  92      .1167      .1682    −.0515    .00265225
  96      .3567      .3259    +.0308    .00094864
 100      .6100      .6464    −.0364    .00132496
 104      .8833      .8222    +.0611    .00373321
 108      .9300      .8872    +.0428    .00183184
                                        .02035695 = S'(x²)

Urban now compares the φ(γ) hypothesis with the arctan. hypothesis by comparing .00455189 with .02035695, and deciding that, as the former is smaller, the φ(γ) hypothesis is superior. This procedure is firstly inaccurate and secondly inadequate. It is inaccurate because not S(x²) but S(x²/p'q') should be compared, and it is inadequate because no idea is given whether the observed difference is significant or not.

The former point deserves a little more examination, because it is another form of an error which Urban was the first to correct, in this same article. In the form of the Constant Process as it left the hands of G. E. Müller, certain weights are to be used on the observation equations. These weights may be called Müller's weights. Urban pointed out, however, that they needed amendment, and published (loc. cit.) a table of weights to replace them.
These weights differ from Müller's by the factor 1/4pq, which arises in Urban's treatment from an application of what he calls Bernoulli's Theorem. It is these very Bernoulli weights, 1/pq, which Urban himself has omitted in his above comparison of the φ(γ) and arctan. hypotheses. In order to discuss the inadequacy of his comparison we need a measure of the probable error of the quantity P used above.

(8) The Probable Error of χ² and P.

We have

$$\chi^2 = \mu\, S\!\left(\frac{x^2}{p'q'}\right) \quad \text{(from eqn. 18)}.$$

If the accurate values of p' were known, the variation of χ² would be due entirely to the variations in the observed values p. In point of fact, of course, the values p' which are available are themselves functions of the p's; but like Pearson in his 1914 article on the probable error of a coefficient of contingency*, and for the same reason and with, I think, the same justification, we shall assume that the p' do not vary. Then, since a small change δp in any one p alters χ² by 2μ(p − p')δp/p'q', the mean square deviation of χ² is

$$\sigma_{\chi^2}^2 = S\!\left[\sigma_p^2\left(\frac{2\mu(p - p')}{p'q'}\right)^2\right] = 4\mu\, S\!\left(\frac{x^2}{p'q'}\right) = 4\chi^2. \qquad (19)$$

Therefore the probable error of χ², calculated in the way suitable for the Constant Process and other processes for fitting psychometric functions, is

$$.6745\,\sigma_{\chi^2} = 1.349\sqrt{\chi^2}. \qquad (20)$$

In the case above, where χ² = 19.56, its probable error is therefore about 5.9, so that we have χ² = 19.6 ± 5.9.

We must next find χ² and its probable error for the arctan. hypothesis. The calculations are partly carried out above in finding S(x²). Completing them we obtain the following table:

  p'      q'      p'q'       x²          x²/p'q'
.0795   .9205   .07318   .00315844    .0431
.1086   .8914   .09680   .00670761    .0693
.1682   .8318   .13991   .00265225    .0189
.3259   .6741   .21968   .00094864    .0043
.6464   .3536   .22856   .00132496    .0058
.8222   .1778   .14718   .00373321    .0254
.8872   .1128   .10006   .00183184    .0183
                                      .1851 = S(x²/p'q')

χ² = 300 × .1851 = 55.53.
Probable error of χ² = 1.349√χ² = 10.0.

For arctan.,  χ² = 55.53 ± 10.0.
For φ(γ),     χ² = 19.6 ± 5.9.
Difference   = 35.9 ± 11.6, where 11.6 = √(10.0² + 5.9²).

* Biometrika, Vol. x.
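The comparison of sections (6) to (8) can be scripted. The sketch below takes the tabulated p' values for the two hypotheses as given and does not refit either curve. It evaluates χ² by equation (18) and its probable error by equation (20), then forms the probable error of the difference as the root of the sum of squares, as in the text. Because it uses unrounded p'q' products, its figures may differ from the hand-computed ones in the last place. The names are hypothetical.

```python
import math
from typing import Sequence

PE_FACTOR = 0.6745  # probable error = 0.6745 * standard deviation


def chi_square_ogive(p_obs: Sequence[float], p_th: Sequence[float], mu: int) -> float:
    """Equation (18): chi^2 = S(mu * x^2 / (p' q')) with x = p - p'.

    Valid for independently measured p's, not for a derived histogram.
    """
    return sum(mu * (p - pt) ** 2 / (pt * (1 - pt)) for p, pt in zip(p_obs, p_th))


def probable_error_chi2(chi2: float) -> float:
    """Equation (20): sigma^2 = 4 chi^2, so PE = 0.6745 * 2 * sqrt(chi^2)."""
    return PE_FACTOR * 2 * math.sqrt(chi2)


p_obs = [0.0233, 0.0267, 0.1167, 0.3567, 0.6100, 0.8833, 0.9300]
p_normal = [0.0088, 0.0438, 0.1489, 0.3544, 0.6155, 0.8319, 0.9483]
p_arctan = [0.0795, 0.1086, 0.1682, 0.3259, 0.6464, 0.8222, 0.8872]

chi_n = chi_square_ogive(p_obs, p_normal, 300)
chi_a = chi_square_ogive(p_obs, p_arctan, 300)
pe_n, pe_a = probable_error_chi2(chi_n), probable_error_chi2(chi_a)
diff = chi_a - chi_n
pe_diff = math.hypot(pe_n, pe_a)  # PE of a difference of independent quantities
print(f"normal: {chi_n:.2f} +/- {pe_n:.1f}")
print(f"arctan: {chi_a:.2f} +/- {pe_a:.1f}")
print(f"difference: {diff:.1f} +/- {pe_diff:.1f} ({diff / pe_diff:.1f} times its PE)")
```

Dropping the division by p'q' inside `chi_square_ogive` gives Urban's unweighted S(x²). That version ignores the Bernoulli weights the text insists on.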
The difference is therefore three times its probable error and is just significant. The final conclusion is therefore that in this particular case the arctan. hypothesis is just significantly worse than the normal integral or φ(γ) hypothesis, but that the latter itself is very improbable. The P of the normal integral hypothesis, it will be remembered, was .007. The P of the arctan. hypothesis can be found from Table XII of Pearson's Tables. The entry has to be made with n' = 7 + 1 = 8 and χ² = 55.5, and we find P given as .000000, i.e. it is less than .0000005, showing how very improbable the arctan. hypothesis is.

The probable error of P is discussed by Professor Pearson in the Phil. Mag. for April 1916, and he shows that the standard deviation of P is

$$\sigma_P = \tfrac{1}{2}\left(P_{n'} - P_{n'-2}\right)\sigma_{\chi^2}, \qquad (21)$$

and using equation (19) we get

$$\sigma_P = \left(P_{n'} - P_{n'-2}\right)\sqrt{\chi^2}. \qquad (22)$$

It must be borne in mind that n' = n + 1, where n is the number of variates. In our case, therefore, n' is one more than the number of stimuli. $P_{n'-2}$ is similarly obtained from Table XII of Pearson's Tables by entering the column whose heading is one less than the number of stimuli. For the above φ(γ) hypothesis we have

χ² = 19.6, number of stimuli = 7,
P or P₈ = .007 approximately,  P₆ = .002;
σ_P = (P₈ − P₆)√χ² = .005√19.6 = .022,
Probable error of P = .6745 σ_P = .015.

Therefore for the φ(γ) hypothesis the criterion of goodness of fit is in this case P = .007 ± .015. It is most improbable, therefore, that P is at all large, and the fit is significantly a bad one. The probable error of P for the arctan. hypothesis is too minute to be found from the table.

The calculations we have performed have been for Urban's Subject IV (heavier answers). For his other data similar calculations can be carried out. The arctan. hypothesis is usually worse than the normal integral, but not always significantly worse, and the normal integral itself is an atrociously bad fit to the data in most cases.
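For integral n', Table XII can be replaced by the closed-form series for the tail area of χ². The sketch below uses that series; it is our substitute, not Pearson's own tabulating method. It computes P for a given n' (n' − 1 degrees of freedom in modern terms) and the probable error of P from equation (22); entering with n' − 2 gives the $P_{n'-2}$ of the text. The names are hypothetical.

```python
import math


def pearson_P(chi2: float, n_prime: int) -> float:
    """P as in Pearson's Table XII: tail area of chi^2 with n' - 1 degrees of freedom."""
    dof = n_prime - 1
    if chi2 <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even case: exp(-x/2) * sum over k < dof/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, dof // 2):
            term *= (chi2 / 2) / k
            total += term
        return math.exp(-chi2 / 2) * total
    # Odd case: 2(1 - Phi(chi)) + sqrt(2/pi) e^(-x/2) (chi + chi^3/3 + chi^5/15 + ...)
    chi = math.sqrt(chi2)
    series, term = 0.0, chi
    for k in range(1, (dof - 1) // 2 + 1):
        series += term
        term *= chi2 / (2 * k + 1)
    tail = math.erfc(chi / math.sqrt(2))  # equals 2 * (1 - Phi(chi))
    return tail + math.sqrt(2 / math.pi) * math.exp(-chi2 / 2) * series


def probable_error_P(chi2: float, n_prime: int) -> float:
    """0.6745 times equation (22): sigma_P = (P_n' - P_(n'-2)) * sqrt(chi^2)."""
    spread = pearson_P(chi2, n_prime) - pearson_P(chi2, n_prime - 2)
    return 0.6745 * spread * math.sqrt(chi2)


n_prime = 7 + 1  # seven stimuli
print(pearson_P(19.56, n_prime), probable_error_P(19.56, n_prime))
print(pearson_P(55.53, n_prime))  # arctan. hypothesis
```

Applied to the φ(γ) example, this gives a P a little below .007 with a probable error near .015, which agrees with the rounded table values used above.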
(9) Summary of Rules for Testing and Comparing Goodness of Fit of Psychometric Curves.

Let there be n stimuli, and let $p'_1, p'_2, p'_3, \dots, p'_n$ be the theoretical frequencies at these stimuli, and $p_1, p_2, p_3, \dots, p_n$ the observed values. Let $\mu_1, \mu_2, \mu_3, \dots, \mu_n$ be the numbers of experiments at the several stimuli. Calculate χ², the sum of the quantities

$$\frac{\mu_k\,(p_k - p'_k)^2}{p'_k\,q'_k}.$$

Then in Table XII of Pearson's Tables*, in the column n' = n + 1 and the row χ² (interpolating if necessary), find the value of P, the probability of obtaining the observed p's, or a worse set, from the p' by random sampling. The probable error of χ² is here approximately $1.35\sqrt{\chi^2}$, and the probable error of P is approximately $.6745\,(P_{n+1} - P_{n-1})\sqrt{\chi^2}$, where $P_{n+1}$ is P itself, and $P_{n-1}$ is the value found in Table XII by using the (n − 1)th instead of the (n + 1)th column.

APPENDIX.

What value will be obtained for χ² if, in the example used above (normal integral hypothesis), we were to proceed by first forming a histogram and then treating this histogram as though it were an ordinary directly observed one, i.e. using equation (15) above? The cells of the histogram will be occupied by the quantities

m = δp × μ (observation)  or  m' = δp' × μ (theory),

where δp is the change in p from one stimulus to the next, and μ the number of observations at each stimulus, here the same throughout.

   p        δp       p'       δp'     δp − δp'   (δp − δp')²   (δp − δp')²/δp'
 .0233    .0233    .0088    .0088    +.0145    .00021025       .0239
 .0267    .0034    .0438    .0350    −.0316    .00099856       .0285
 .1167    .0900    .1489    .1051    −.0151    .00022801       .0022
 .3567    .2400    .3544    .2055    +.0345    .00119025       .0058
 .6100    .2533    .6155    .2611    −.0078    .00006084       .0002
 .8833    .2733    .8319    .2164    +.0569    .00323761       .0150
 .9300    .0467    .9483    .1164    −.0697    .00485809       .0418
1.0000    .0700   1.0000    .0517    +.0183    .00033489       .0065
                                                            .1239 = S(e²/μm')

whence χ² = 300 × .1239 = 37.17, instead of the proper value 19.56.
If the calculation is performed in this inaccurate way, therefore (by analogy with data which are really in histogram form), a very wrong idea of the closeness of fit will be obtained. The reason, as stated above, is that the correlations between the cells of a histogram derived from an ogive with independently measured p's are not such as to lead to equation (15).

* Tables for Statisticians and Biometricians, Cambridge University Press, 1914.

ON CORRECTIONS FOR THE MOMENT-COEFFICIENTS OF LIMITED RANGE FREQUENCY DISTRIBUTIONS WHEN THERE ARE FINITE OR INFINITE ORDINATES AND ANY SLOPES AT THE TERMINALS OF THE RANGE.

By ELEANOR PAIRMAN and KARL PEARSON, F.R.S.

Part I. Non-Asymptotic Curves.

(1) We have in recent practice found the importance of full corrections for the moment-coefficients in the case of singly and doubly curtailed blocks of frequency, such as are indicated in the accompanying figure. It has not been adequately recognised that even the mean of such distributions is not correctly obtained by grouping at the midpoints of the subranges h and merely finding the mean of these concentrated groups. Still less is this a correct process in the case of the higher moment-coefficients. The practical statisticians, aware possibly of the existence of "Sheppard's corrections," have been warned that these are only exact for the case of high contact, and in their doubt have on this account neglected all corrections whatever. Now Sheppard's corrections are still valid when there is no high contact, and they should therefore always be used; but they form only part of the full correction*, and may indeed amount to merely some 50% of its value, although 75% is a more usual average proportion if the frequency block does not end in finite ordinates.
We propose in the first part of this paper to deal with frequency blocks such as are indicated in the figure above, reserving for the second part the treatment of the corrections needful when the frequency curve asymptotes to the frequency axis, i.e. the cases of J- and U-shaped frequency distributions. The general treatment of non-asymptotic frequency blocks will follow the lines of pp. 282–8 of the paper "On the Systematic Fitting of Curves," contributed in 1902 by one of the present writers to the first volume of this Journal.

* In certain cases, although part of the full correction, they are in the wrong sense, and therefore if used alone would be worse than the raw moments.

(2) The method there adopted started from the Euler–Maclaurin formula [formula not legible in the source] and from the expansions

$$Z = N\left\{1 + a_1\frac{x - x_0}{h} + \frac{a_2}{2!}\frac{(x - x_0)^2}{h^2} + \frac{a_3}{3!}\frac{(x - x_0)^3}{h^3} + \frac{a_4}{4!}\frac{(x - x_0)^4}{h^4} + \frac{a_5}{5!}\frac{(x - x_0)^5}{h^5}\right\}$$

in the neighbourhood of x = x₀, and

$$Z = N\left\{b_1\frac{x_p - x}{h} + \frac{b_2}{2!}\frac{(x_p - x)^2}{h^2} + \frac{b_3}{3!}\frac{(x_p - x)^3}{h^3} + \frac{b_4}{4!}\frac{(x_p - x)^4}{h^4} + \frac{b_5}{5!}\frac{(x_p - x)^5}{h^5}\right\} \qquad (i)$$

in the neighbourhood of x = x_p. These lead at once to

$$\left(\frac{d^s Z}{dx^s}\right)_{x_0} = \frac{N a_s}{h^s} \quad \text{and} \quad \left(\frac{d^s Z}{dx^s}\right)_{x_p} = \frac{N(-1)^s b_s}{h^s}. \qquad (ii)$$

Exactly as in the earlier memoir we shall determine the a's and b's from five frequencies adjacent to the terminals of the range. In many cases, however, e.g. deaths in infancy, disease incidence in infancy, wages, incomes or house-valuations, we have details for the ends of the range on subranges different from those for the bulk of the curve. These modified subranges will be termed h₀ and h_p, and when either of them is less than h, we shall get more accurate corrective terms if they are used instead of the frequencies on subranges h. At the same time it must be remembered that, in calculating the value of the chordal area terms, sufficient of these h₀ and h_p subrange frequencies must be clubbed together to give sub-frequencies on ranges h.
het the frequencies on the first five subranges /, or h from «=a, be Nn’, Nn’, Nn;, Nnj, Nn.j, so that m/, ne’, ny, ny, n;) are proportional frequencies, then 3 Gh. Ob 0h, Oh ie tO oh a Sa N (1-n,')= W(1+ Seasesu ane nO Oe, SOs NO 0, i oe eT On ae ae aise a Sunilarly = n/ oe No, = a ee ts 92 4 a 23 +4 ii 9s a. = 95 = ny = ny — ng = FB + FB + 2 B+ Fi Bt 58 BF, —1y — Ny — Ns — nf = a 4+ > 4? + a 43 + +7; 4a 4 +e 4°, : — Ny =, — Ns —y —Ns = a 5+ a Be x 5a + A oe = 5, Solving these equations we find a, = — gy {187n,' — 163n,' + 137n, — 63n, + 12n,'},\ a= qy{ 45n —109n.’ + 105n, — 51n + 10n;)} dz3=— +{ 17n/— 54m’ + 64ns) —84n + Tg}, \ occ. (111) C= { 83m — 11ni+ 15ni—- 9ni+ 2n,/}, ds=- {| mr— 4ni+ Gni-— 4n/+ ns}. Similarly we find for the 6 coefficients =+ A, {187n’, - 1631’ pa t+ 137 1'p_s— 63n'p_3 + 12n'p_4}, i iz {| 45, —109n,. + 105’,_, — 51n‘p_3+ 10n'p_4}, ‘5 4 c { 17n',— 540,44 + 64n' »2— 340'p-3+ Tn'p-a}, poste (iv) == | op Ln, + -lon,.— 9,32 25-4}, b,= { n'y — Ang + Gn'po— 4 p-3t+ 1p, where Nn'‘,_,, Nn',_3, Nn'p-2, Nn',4, Nn'p are the five successive frequencies ad- jacent to the terminal # = , of the subranges /, or h as the case may be. Biometrika x11 16 234 On Corrections for Moment-Coefficients Since dZ/dx = — y, it follows that = Na, _ N (pays +6 / Oy, 7 WD, If / to = =F = Gg, (EBT! — W3ng + 137 HJ — 63m, + 121s}, | NO, Yi RAGA) Yo = :s = 60h, {137n', — 1680, + 1387n',_. — 63n',_s + 12a’, 4] These results enable us to determine approaimately the terminal ordinates of the frequency distributions given by sub-frequencies, and to discover how nearly the frequency curve comes to zero at the terminals of the range. Similarly the small- ness of the quantities a,, a;, d,, a; and b,, b;, b,, b; marks the character of the terminal contact. 
At the same time the reader must remember two points (i) that the terminal frequencies if small may be subject to large probable errors and (ii) that we have supposed y=0, when w=, and #=«,, the terminals of an integer number of subranges. It is extremely unlikely that the frequency curve would cut the variate axis ewactly at such places. Hence on both counts, (1) and (11), we must not anticipate in actual practice that a, and 6, will vanish at # =, and # =a, for non-abruptly terminating frequency, unless we know a priori the terminals of the range and have chosen our subranges to fit this knowledge. (3) The next stage in our work must be to table the values of aZ'/da, dZ’/da, ... PZ’ /dz', where Z’= Za" at the two terminals of the range. We may do this for s=0, 1, 2, 3, 4, 5. The theorem of Leibnitz provides the needful expansions which are = oe d?Z’ Bile Z é mk Z om — Bag pe a qa t 88 (8~ 1) at? + 8(s—- 1) (s—2) a 94, LZ’ OL c UZ —P LZ ae = ae + 98x°" ars + LOs(s — 1) a Le + 10s (s —1)(s — 2) a— a 7 +58 (8-1) (8 2)(8—8) a9 4 5 (9 1)(s— 2)(s—3)(s—4) > Z. Cf 22s (s—1) a ee + 35s(s—1)(s—2)as~ C2 +355(8—1) (8-2) (8-3) ae + 21s (s—1)(s—2)(s—8)(s— 4) aw 54 + Ts(s—1)(¢—2)(s— 8) (s— 4) (s— 5) a9 + s(s—1)(s—2)(s—38)(s—4) (s—5)(s—6) a7 Z, aL’ IZ v7 “qqe = 126s(s— 1) (s—2)(s — 3) as ae 5+ 126s (s— 1) (s— 2)(s —3)(s —4) # + 84s (s — 1)(s — 2) (s — 3) (s — 4) (s — 5) a*6 ae colt Sheena er (v1). ELEANOR PAIRMAN AND KARL PEARSON 235 Now all higher differentials of Z than the fifth vanish, and therefore we may d' Z' a Z’ du Z’ cancel the first two terms of aa and the first four of da? * The value of —— An starts with the term 462s (s — 1 (s — 2) (s — 3) (s — 4) (s — 5) as S a - and accordingly this and all terms beyond vanish for s=0, 1, ... 5, or this Ate ee of Z’ is zero for our purposes. We have now to give s in succession the values 0 to 5, and subtract the result: for the first from those for the second terminal : Role | poo la ee” ee oe |e sel: kak =(-07? 
mje 1) N, | Se lr=(- Geet ep (e-e)™ aie — +E?) +5 (p4- pa) ®, | Sar |” =0 if g>2 5 Oe cal = (- (oe v mf) 20) N, Se |r=(- (eet eer 6(h+8)) x So - CGE RE) e-em tes eh @ Cm m* aio 9a a if BZ’ vy a Ly? Xo? c Ly” Ly a RS ie — = (7 ar Cs ia) +9 ( 2 hy C ‘Fs a 18 (6.53 ie Do Oy “oN i) 6) N, Ly" (— (oi +55) + 15 (b, he it a — 60(2 7% + ae 7) + 60 (F.- -7))%, UZ’ |%p Ney Geechee. a ey (an TG Gy eee gee a eel 2 e | da? i ( ee Ge hy, a hyt h + za a hy! “x PI ZF’ |x, ee 16—2 agen tng 0) Secgtaten | aie AS > S ES BN eS Ss g Ss 236 On Co rrections for Moment-Coefficients as dZ’ Ly se / i Lo! ~ Seale Fak =(- 6, ie — a 5 — dad) W, a Vf a id on Ue Pale => fe u I h,? + As hh i) + 12 (5 i? — Ao a — 36 (6 pe ee ) — 240) eles Lhe 0 aN AZ’ |*%p a) i Ly” Ee IL = (- (3, 52 hy? - + As hs 7) +20 (6,52 hy! i slg Rh mt) - 120 (6.52 hy? + As =) +240 @ Hag 3) ~ 120 Ge f Ore he hy X b; wi) +840 (bs 7% Ss a7") -840(75 = 97" |%p HZ! |X, aE mela ne Saar | =O - go dara _ [dZ" |*» (ON Lee P De kan =(-b, ie — a Te — 5a‘) W, BL! Vin ty? a ala (- (0.7 a ji) +15 (0 Fe a, Fs) — 60 (6.5% +a #*) — 60m") W, h, — a| > &|N [amen a. 8 Il | Pa > cal asa E a ee is? Ly Lp ms pete . ) = 600 (7 +a, 7) a2’ 1% Xp? p Xy" = (A205, 2 Ee i ( B20 (0, hy? me he is) + ee (72 h,' -— ho ) — 4200 (5, 72 igo the ia) + 2520 (i AZ’ Xp by Ws Rak = (~ 15120 (8, 7 it i re) 15120 (rs : - 7) iu, d2qa+1 7’ x, aay = 0, q > 4... b, he 2 Ly on vig) #8 (hie ay 5 “) — 200 (0s a 4 120) N, “3a)) ® (vii). Ly (4) We have now to see the relation of the present integral | ” Za da to the Xo moment-coefficients. Ce by parts we have , ee Sie Es N Zatda = | ~— Penge . IE ae ES i a cea so eee ieee” where p’s,, is the (s + 1)th moment-coefficient about the arbitrary origin. ingly we have, changing s to s — 1, ; S:{% .. 
Le =a + WN oA hace Erte a tens Big pe este ee ae Xo Accord- ELEANOR PAIRMAN AND KARL PEARSON 237 Thus we can write 8 & id q és bs = 2 + WV Ci4- WV Te, where (_, is the “chordal area” term and J,_, is the limit term of the Euler- ~ Maclaurin series, or if 7, = # + wh, C2 —) {AZ a8) + Ziti LG tesa api Oh YT yS—1 a Zoas- he d® Yh »>S—1 eh [a wee we h WA) : Co- dx 770" de +30240 dar — hi (Zoe) (Za — 7209600 de’ * 47900160 dae |, We now turn to the evaluation of the chordal areas. We can obtain these by remembering that if ny be the frequency on the qth subrange h, %p Z, =| Yd = Noi + Ngta +... + Np. %q Thus it follows that p Os, =h S {has + (ay +h) + (ay + 2h) + 22. + (ay + (U— LT) hy} a, u=1 if we note that Z, =0. But the series coefficient of », can itself be summed by the Euler-Maclaurin Theorem, 1.e. h {kas + (ay + hy + ... + (a + (w— 1) hy} da fey da he a | ayotuh da? e 0 Ly +uh l (gs) B (as) — | ti dt — th (a tL Th ae. ae E he d (@ ') ee ira * C (a : ) x : + 30240 Xo 1 2 =—th(a,+uh)s1+ 5 (a, + uh) +(s — 1) ie (a + uh) —(s —1)(s— 2) (s—3) gd gh! (a + uh) +(s—1)(s—2)(s—38)(s—4)(s—5) gph" (a + uh) — ... 1 he — v; and vy,’ respectively — 072,818, —-005,040, +:021,378 and + 004,769, while the corresponding Sheppard’s corrections are 0, — ‘083,333, —°257,378 and — ‘817,830. Thus we deduce for moment-coefticients about stump: With Sheppard’s Raw moment corrections Full corrections vy 1:029,513 1:029,513 956,700 vy 1°693,994 1°610,661 1°605,621 ve 3°883,416 3°626,038 3°647,416 vy 10:974,937 10°157,107 10°161,876 These values are in working units ="5 actual units. 
Hence in actual units we have With Sheppard’s With full Raw moments corrections corrections True values vy *514,756 vy 514,756 py’ “478,350 ‘478,8131 vy, -423,498 ve — ay 402,665 pe’ 401,405 *401,4837 Vs 485,427 v3 —4vy ‘A538 250 iret *455,927 *455,7714 va 685,934 vy —tve t+5ho 634,819 pea 635,117 *634,7360 It will be seen from these results that our full correction values for the moments about the stump are in every case accurate to 1 in the 1000, while, if Sheppard’s corrections only are made, we may be out nearly 1 in the 100. The change in the mean, second and third moment-coefficients 1s very noteworthy. In the case of the fourth moment we are out ‘0004 in °6347, while the Sheppard’s correction alone is out only ‘0001 in ‘6347. The cause of this irregularity we have not been able to detect, although we have examined carefully the whole of our arithmetic. It seemed accordingly worth while inquiring what differences would occur when the moment-coefficients were taken about the mean and not about the stump. With Raw moments Sheppard’s With full about mean corrections corrections True values Vy "158,524 v9 — 44 (5)? 137,691 pe 172,587 172,222 V3 104,226 V3 "104,226 ps 098,801 098,612 v4 149,090 »y—400(°B)2+-y75(5)! “131,097 4 ‘156,767 "156,405 ° ELEANOR PAIRMAN AND KARL PEARSON 245 It will be seen that now Sheppard’s corrections are wholly inadequate and our corrections are essential, even in the case of the fourth moment-coefficient. This confirms the view of Sheppard himself, who insisted on the importance of high contact at the terminals, if they are to be used alone. It is a convincing illustration of the fallacy of those “ proofs” of Sheppard’s corrections which do not appeal to the principle of high terminal contact. We now propose to illustrate the degree of improvement in the exactness obtained, if we calculate the abruptness coefficients on smaller subranges. Accord- ingly we break up the terminal group 65591 on ‘5a base into five groups each on ‘lo base. 
These are 17142 leading to n, =°1622,5272 and a, =—-1728,9281, 14979 ny ='1417,7946 d= - *0216,4780, 12959 Ms ="1226,5973 a, = — (0010,4603, 11099 ny ='1050,5442 a, = — (0002,3651, 9412 n; ='0890,8661 ad.= “00003781, : h\s whence, remembering a,’ = () a, = 5*as, we find vo a,’ = — 864,464, and = A, (a’ — gas + geap ds ) = — 071,853, Oo D495, — hy (ay — 737 0/) = — 004,559, a, — = 130, (94, — 45 (a — Gas teas) = 021,351, ay = — ‘147,818, tha (a2 — ay) = 004398, Ce eS too: Thus the moment-coefficients become w, = °957,660 or in actual units °478,830, fe =. 1:606,102 ‘401,526, Ms = 3'647,389 455,924, bes = 10°161,505 635,094. Transferring to the mean we have On ‘5 subranges On ‘1 subranges Actual values ji 478,350 ‘478,830 478,813 i ‘172,587 172,248 ‘172,222 ‘tg 098,801 098,705 098,612 ie ‘156,767 156,516 156,405 While the first column of values would be amply adequate for most. statistical purposes, the second makes a still closer approximation to the actual values, the ditterences being only — ‘000,017, +:000,026, +:000,0938, +:000,111 as against — 000,463, +:°000,365, +:000,189, +:°000,362 respectively. The greatest improvements are in the mean and standard deviation. Accordingly it is well worth using smaller terminal subranges, if they are available as in the cases of cricket scores, wages, house values, infant mortality and other frequency material. 246 On Corrections for Moment-Coefficients (8) Illustration IT. (b) We now propose to consider the moment-coefficients of a doubly truncated normal curve. We will take the portion of the above 1,000,000 distribution with unit standard deviation from variate value 1:25 to variate value 3°75 and divide it into five groups, ie. 
Absolute Relative frequencies frequencies 65,591 °6213,5637 27,834 *2636,7693 9,245 0875,7969 2,402 0227,5462 489 0046,3239 Total 105,561 Total 1-:0000,0000 Using (111) and (iv) which now involve all five groups we find a, = — °8794,4917, b, = — (0088,5527, y= °6084,9651, b,= -0258,2393, ls = — °2970,9362, b, = — ‘0401,0477, a,= °0817,9157, b,= -0530,8779, at; = — 0057,4076, b,= 0057,4076. From these results, since a’s = a’’s and b’s = b’’s we have for the abruptness functions : al,’ — gas, ar aah noe = > 87449989, b,’ bares aybs’ + xea0 0s = 0031,8458, Gs ea = *6052,5081, 6, — 78,0, = -0237,1727, Qy — Ps As + zhgds = — °8558,9423, by — Py bs + yhpbs) = — 0006,4848, Il °6013,3975, bo! — gab,’ = ‘0211,7875 About the first terminal we have for the raw moment-coefticients », = 1:025,630, —». = 1'668,535, v3; =3'733,743, v= 10'108,966, | eee / 7 i and by (xxi1)—(xxv) the corresponding corrective terms are — °0731,4037, —-0074,9994, + :°0044,7459, — -0981,1557, leading to the Sheppard’s correction moment-coefficients i actual units : fy ="512,815,° po ='396,300, py = °434,667, po = 681,492; and the full correction moment-coefficients : py = "476,245, pe’ = "894,425, ps =°435,226, pe = 575,360. We now transfer to the mean of the block and find fy = "AT0,245; a = 167,616, = 087, 180d, Sie — 123/69 while the values for the Sheppard’s corrections only would be jy = DL2 815, o fis—"133.320 oe , — 104,701, ji — 100, a The theoretical values for the normal curve block are fy, = 476930, 9 eg = 168,025, Se pg 05957 30) fy = 133,748. ELEANOR PAIRMAN AND KARL PEARSON 247 It will be seen that the Sheppard’s corrections alone give very unsatisfactory results, and that while the full corrections for the first three moments are statistically satisfactory, the approximation of the fourth moment-coefficient would not for certain investigations be adequate. We are in fact using only five groups and trusting to these for the accuracy of our abruptness coefficients. 
We will accordingly now test what improvement arises when we divide our terminal groups into five sub- groups and calculate the abruptness coefficients on these smaller subranges. Thus we have h ="5, hj =h,="1, and therefore p, = p,=5. Our subgroups are: n,=17142 therefore n,’ = '1623,8952, Mp-4= 173 therefore n’,_,='0016,3886, Ny = 14979 ns = °1418,9900, Ny_3 = 124 Ny -s = '0011,7468, n, = 12959 ny! ="1227,6314, iiegas 88 n'y» = 0008,3364, n,= 11099 . ny = °1051,4300, Ol Hp = 0005,7 786, Na— DAT2 n;, = 038916172, Ny = 43 nw, ='0004,0735. 65591 489 Whence a, = —'1730,3848 and a = —°8651,9242, b6,= ‘0003,5810 and b= :0017,9049, (ly = + °0216,6599 dy =+ '5416,4969, b,= °-0000,5368 b, = ‘0013,4204, (tx = —°0010,4679 ds = —'1308,4851, b= “O001 5157 b, = °0189,4639, a, = — 00023683 ay = —'1480,1868, b, = —:0000,7579 bf = — '04738,6598, a; = + °0000,3789 ad; = +'1184,1494, 6,= -0000,3789 bf = ‘1184,1494. Determining the abruptness functions from these values, we have Gy! — hy Ge + yep Os =—°8629,6462, by’ — py bs + oAyg bs = +°0015,2171, ee oe = 5475,2345, / —73,b/ = + -0032,2164 ty! — Psy + yhgay’ = —'8543,1522, —b,’ — by’ + g45b/ =+°0007,8021, CET Ne = ‘56047508, by — gb, = + 0054,8656. Working out the corrective terms for abruptness we find them — 071,787, —-003,368, +°031,252, +4:071,446 in working units, leading to fa = "476,922, uy = 395,458; - pes = °438,5735, ps, = *585,957, or transferring to the mean we have Ma = "476,922, pp = 168,008, us = 089,722, py = 183:788, as against the actual values fe AN OO30NN Wis LOS 025505 4 — 089,100, 9 4, = "133,143, an eminently satisfactory agreement. It is thus clear that when possible it is desirable to obtain the abruptness corrections by small subranges—in this case ;'; of the standard deviation. Hence any terminal small range groupings such as are frequently provided in statistical data are useful from this standpoint. 
In fact if 248 On Corrections for Moment-Coefficients the abruptness coefficients are found from such small groupings the remaining sub- ranges can safely be made fairly coarse, as in the above examples, where five divisions of the total range are clearly adequate. (9) Illustration III. Mean Age and Variability of Infants at Death. It is very important in practical statistics to obtain the mean and standard deviation of J-shaped curves. A good illustration of such curves may be found in infantile mortality statistics. These have the advantage that in the early part of the year of infancy the frequencies are in certain cases given by much smaller intervals. Thus in the Prussian official statistics they are given for the first fortnight by days. Professor Raymond Pearl in a paper of 1906 (Biometrika, Vol. Iv, p. 510) has endeavoured to ascertain the mean age at death of infants in the first year of life from the Prussian data. It will be of interest to determine what changes are likely to be made in his results by the use of our present abruptness corrections. He writes (p. 512): It is evident that the grouping here [i.e. in the Prussian data] is sufficiently fine to make possible a very accurate determination of the mean age of death.... A standard month of 30 days was assumed: then with a unit of 30 days the first and second moment-coefficients about an arbitrary axis were determined. From these the position of the mean and the value of the second moment about it were easily found. Only the “rough” second moment was caleulated, as it was deemed sufficiently accurate for present purposes, and furthermore it was difficult to deter- mine the proper corrective terms to apply in this case. In the calculations each frequency element was for practical convenience centred at the midpoint of its range. The error made by so doing is negligible. 
With our present corrections we can test how far the errors made by concentra- tion at the midpoints of the subranges are really negligible. It is certainly right to concentrate at those points provided we allow for terminal abruptness which is very marked in this case. If we make the proper terminal corrections theory shows that quite considerable subranges, say in this case one month, may be used to determine the raw moments. It will be sufficient to illustrate the method on the Prussian male infant deaths. We have deaths per 1000 infants born : For the birth terminal we have*: Months Deaths Days Deaths Q—1 63°99 0—3 18°25 1—2 22°59 oO 6°58 oe 18°58 6—9 7°89 Seal 15°96 9—12 5°65 ahh 13°30 12—15 5°82 5—6 11°51 : 6=—7 10°61 7—8 9°30 8—9 8°74 S10 8:29 10—11 751 P12, 6:94 Total 197:32 v= 3°759,224 “ye! = 25°809,801 | * Three day intervals taken with a view to smoothing anomalous values. in months. ELEANOR PAIRMAN AND KARL PEARSON 249 The month subranges will be quite adequate at the childhood terminal. As the results are based on 1877—1881 averages, we shall suppose the month to be 30°4375 days. Thus h/h, = 10:145,833, h/h,=1. We find my! = -0924,8936, ay, = — °1877,2687, o/=— —1904,640, ny = '0333,4685, dy= ‘2966,9657, ay = 30°541,331, ns = '0399,8581, a, = — °3909,0057, ds; =— 408:253,057, ny! = 02863369, a,= °8117,2715, a, = 3303'128,589, ng = 0294,9524, d= 199.7730. a,’ = — 12253-408,881 ; N yp—4 = 0471,3156, pe 0, — 0309) n'y = 0442,9353, b,! = bs = — 004,823, n'y = 0420,1297, ees eSAb yp -1 = °0380,6000, b, = bs = — 012,670, W, ='0351,7130, b, =b;= 004,967. From these we deduce En eta, ta) 008,008, . 4b (bo —ahbrtavenbe)= 002,961, zy (de — bg 04) = — 837,793, zhy (bo — 735s’) = — ‘000,036, whence Total abruptness correction on v,' = ‘006,054, » : vo = °843,679. Thus fy = 3°765,278 months, fy = 26°570,147 (months)’, using of course Sheppard’s correction. 
; Finally we reach Mean = 11461 days as against* 113-07 days, Standard Deviation = 10715 Pe 7 105°44 obtained from taking the raw moments of small elements of one day up to the end of the first fortnight. Thus, if we desire to get a mean within 1°5 °/, of the correct value, it will be well to adopt abruptness corrections. (10) Illustration IV. In view of the fact that in the previous illustration the infantile death-rate curve has probably an infinite initial ordinate.it seems well to measure, in a case which can be tested, the degree with which our corrections give the actual values of the moment-coefficients in such a case. We choose the curve | y= Le 3, and suppose ten subranges going up to the terminal #= 10, from «=0, * Pearl’s results modified by taking the average month to be 30°4375, not 30 days. Biometrika x11 1B We have On Corrections for Moment-Coefficients for the “ Further, for small terminal subranges, we have: frequencies’: x Frequency Onto 1 1-000, 006 0 x Frequency Loe Peete 0 to 2 447,2136 2 to 3 *317,8372 9 to °4 185.2419 3 to 4 -267,9492 Ferd 149.1415 fs eae) ieee “4 to 6 142,1412 4 to d *236,0680 6 to °8 -119.8305 5 to 6 213,4217 S08 kigeecon 6 to 7 -196,2616 : es 7 to 8 "182.6758 8 to 9 “i101 29 9 to 10 “162, 2777 Total 3° 162,277 7 It will be h/h,=1. Thus we have vp,’ = 3°394,907, = 20°016,109; n, ='1414,2136, @, = —‘2332,9561, a,,=— 1:166,4781], no ='0585,7863, (lg = + '2583,1707, a, =+ 6°457,9265, nz = 0449,4899, a; = — '2657,4026, Gy = —-33°217,5325, nz =°0378,93738, ty == + °1798,6052 a, = + 112°412,8250, 5 = 0393,9000, n'y 4 = 0674,8987, p39 — 0620/6337, M pag O01 Od LO, (pe Us: so OJ, nw’, =-0513,1672, These values dy (ay’ — gods + geag ds) = — 057,1279, — zha (de qi (bY — gobs + astgbs) = 004,1669, cha (0 aie a we Baa = 3'394,9066 — 052 fy = 20:016,1090 — 083,3333 + :066,7155 = 19°999,4912, My = 8'°830,8953. and which gives For comparison we have Raw moments 3°3949 20-0161 9 84907 2°9139 It will be seen that Sheppard’s results. 
a; = — ‘0586,1091, by = b, = + 050,0105 b,’ = bo = + '002,4535 b,’ = b, = + '000,4817 b, = b, = —:000,0497 b, = b;=+ °000,1316 9616 = 3°341,9450 Using only ane ns) — 183°159,0938. Actual values + °0500,0000, + :0025,0000, + :0003,7500, + ‘0000,9375, + '0000,3281. of a’s and b’s lead to the abruptness functions : Sheppard’s corrections Full corrections True. values 3°3949 33419 OD oD 19°9328 19°9995 200000 84074 8°8309 8°8889 2°8995 29718 2°9814 it would be better to take the raw moment results without any corrections. other hand the full corrections even in this extreme case—where (a) the Euler- sufficient to take the subranges unity at the other terminal, or h/h, = 5 a,,) = — 016,6425, (00,0205. corrections alone are worse than the raw moment In other words they should certainly not be used alone for J-shaped curves; On the ELEANOR PAIRMAN AND Kart PEARSON 251 Maclaurin Theorem fails theoretically, (b) our auxiliary curve is unreasonable, for a * cannot be expanded at the origin terminal in powers of #—are found to give results within 4 °/, of the true values for both mean and standard deviation. The variety of illustrations we have taken seems to suggest that for most practical statistical problems—even with J- or U-shaped distributions—we shall obtain reasonable results from the system developed in the first part of this paper. At the same time the method adopted indicates that for the best possible results in asymptotic frequency curves it may be needful to use a more suitable auxiliary curve for the asymptotic terminal. This leads us directly to the second part of our paper. Part II. Cases of Asymptotic Frequency. (11) In selecting our auxiliary curve to give the first five frequencies we must remember that it has (i) to give an infinite ordinate but a finite frequency, (i1) it must be of such a character that its constants can be readily determined. If we adopt Z=N(14+a1(A + Be + Ca? + Da? 
+ Ex')), where q is chosen less than unity, we have the adequate number of constants and y =—aZ/dzex is infinite when « = 0. If we leave g undetermined, however, we should have six not five constants and might then omit #. But the process of determining A, B,C; D and q would be very laborious and involve a troublesome series of approximations. We are ac- cordingly thrown back on the retention of and an arbitrary choice of g. Olearly to give an infinite ordinate and finite area we may give g any value from slightly over zero to slightly under unity, and the size of g measures so to speak the intensity of the asymptoting. This is probably rather an important feature of the frequency curve, but as we see no way of determining it accurately without very great labour, we give q its mean value $. Accordingly our problem becomes that of determining A, B, C, D and E s0 as to give the first five frequencies or the values of Nn’, Nn., Nn;, Nny, Nn; as before. After a good deal of work they are found to be (A =—1°64964,84755n,/ + 3°35035,15245n,’ — 3°72071,6287 4n,/ | + 2:05278,64045n, — °44721,85955n,;, } | B=+ -91328,76419n,’ — 5°50337,90247 ny + 7°10669,19065n,' — 415163,83427ny + °93169,49906n,', C=— °31317,72759n, + 2°64515,60574n,' — 430806,06243n,' + 2°76448,01733n, — °65218,64934n,, D=+ -05299,17797n,’ — °53034,15536n,’ + 1-:00172,31390n, — °73032,76686n, + °18633,89981n,', E =— :00845,36703n, + -03821,29964n./ — ‘07963,81338n,' \ + -06469,94335n,/— -01863,38998n,’. The large number of decimals is requisite owing to the high coefficients they have to be multiplied by in ascertaining the values of the abruptness coefficients. 12 (xxvill) < 252 On Corrections for Moment-Coefficients Now our scheme of action is of the following kind: we shall obtain the abruptness coefficients at « = 1, or at the finite ordinate of the first trapezette, for here they will be finite. 
We shall then trust to our auxiliary curve to give the moments of this trapezette about # = 1, using the integral : J AZ lt, | (a — 1)’ yda =— | (a—1) --- da. Jo Jo da And lastly we shall determine the moments and the corrections of the remainder of . the curve by the process already discussed as if it had to be applied from #=1 onwards*, The moments for the trapezette before a = 1, and for the remainder of the curve, must then be added together to get the total moments and so the moment- coefficients about «= 1. The transference to the centroid then proceeds in the usual manner. Moments of first trapezette n, about non-infinite ordinate : Ny fy” = 2N(4A44+14B4+10+1D4+744), (Xx1X) ‘Mype =— BN (GA +35 B+ gC + gy D+ 13), XX1X + : ‘ mos = 16N (A+ 7h5B+5h, 0+ ay D+ 754); a) URE QR 1 / 1 1 (1 1 1 1) My =—128N (st;4A+ cis5 B+ siosC + ets D + iiss). Again remembering that a2 (%4 6 (aes)? (a, =4(A4+3B4+504+ 7D +4 94), a,=}(—-A+3B4 150+ 35D + 634), (xxx) {@;=2(4 —B+5C+4+ 35D + 1052), a,= 3;(-54 + 3B—50 + 35D + 3154), dy =f (835A — 15B + 150 - 35D + 8158). we find If we now substitute (xxvili) in (xxix) and (xxx) we shall obtain the moments of the first trapezette and the abruptness coefficients at # = 1 in terms of the first five sub-frequencies. We have Ny py = —"812,7818n, +°677,0691n,—'660,5497n; +347, 1889n,—'073,7827n;, ype = °706,7407n, —'824,1137n, +'830,5586n,— "44152182, + 094,357 2n,, CL) [Ree eee a ee nypul'= °581,4517n, —'854,1149n, +'888,8688n,—'478,0407n,+102,7607n,, * The abruptness coefficients in the previous case were determined from the five frequencies following the initial ordinate; here they are found from the four frequencies following and the one preceding it. ELEANOR PAIRMAN AND KARL PEARSON 2538 and again a =— °'067,9063n,’ — 1°651,2396n,’ + 1:177,1875n, — 554,8634n,0+ +111,8034n,', b= 332,2458n,' + °915,5792n.’ — 2°384,2525n,’ + 1:368,5243n,— °298,1424n,’, i. 
dy =— °988,7796n,' + 2°823,7204n,’ — 2°126,0271n, (xxxll) + ng + °472,0491n,— °027,9508n,’, a, = 2°4.97,6496n,’ — 9°939,8504n,’ + 13°394,6734n,’ — 7:822,9490n, + 1°677,0510n; , dy! =— 7°413,4958n,/ + 25°320,8792n,' — 33'899,3138n,/ \ + 20°768,5399n, — 4°856,4601n,’. Here as before n,’ =1,/N. We have accordingly to add the values given by (xxx1) to the expressions for the moments for the remainder of the frequency corrected for the abruptness by means of the series (xxx1i). We propose to illustrate our results on one or two numerical examples. (12) Illustration V. The following data provide the years of survival for 10,000 persons, male and female, born in England and Wales with congenital malformations*. Age at death Male Female Years 0-—1 8762 8753 =) 393 339 3 140 150 3—4 95 80 4—5 86 69 510 185 184 10—15 90 132 1340) 86 86 2025 63 : 52 P15; 310) 45 40 30—35 9 40 340) 18 ef 40—45 9 6 45—50 9 3} 50—55 5 Hit 55—60 —_ 6 60—65 5 — i 65—70 = 6 70--75 = 6 Totals 10,000 10,000 Now consider how we should endeavour to find the mean and standard de viation of such series under the old method. We clearly cannot use Sheppard’s corrections. If we concentrate the deaths in the first year of life at 0°5, we shall certainly get too high a mean. Now Pearl has shown by taking Prussian statistics (Biometrika, * Registrar-General’s Annual Report, p. 207, 1913. 254 On Corrections for Moment-Coefficients Vol. rv, p. 515) that as deduced from data registered at short intervals of days, the mean of the total population of infants dying in the first year of life should be con- centrated at 0°3 instead of 0°5 year of life. But our infants with congenital mal- formations undoubtedly die earlier than the great bulk of normal infants. We night therefore hazard a concentration at 0°2; but this would be mere guesswork *, and what is more would not provide the proper corrections for concentrating in the case of other years of life. 
We obtain, however, by this process the following results : Male Female First Year concentrated at: 0-2 0:3 0-2 0°3 Mean 1:2436 1°3313 15077 | 1°5952, Standard Deviation 45932 4:5734 57750 | 5°7532 The differences between the 0:2 and the 0°3 results are considerable and it will be found from the sequel that the 0:2 results are closest to the corrected results for both mean and standard deviation in the case of the male and the female. Indeed a quite reasonable result might have been reached by centring the deaths in the first year of life at 02. But such a priort guesses must be at best risky. When we proceed to apply our method by cutting off the first year of life, we note at once that in this case, as in many other of a like J-distribution character, a grave difficulty arises, namely we have starting from the group 1—2 not got the groupings in year or five year ranges, for we have cut off the first of our five year groups. We cannot therefore straight away apply our formulae based on the Euler-Maclaurin theory for equal subranges. The suggestion that at once occurs This of course would make no change in the first raw moment 7’, which would be the same whether we grouped into year or five year subranges on the supposition that we simply spht up our frequencies into five equal groups for the five year periods. But there will be a change for the second and higher moments. For the second moment the total frequency of the five year group (na) centred at # has to be multiplied by a? + 2h?, where h=+ of the subrange =one year, and similar corrections can be easily obtained for the higher moments. Of course this distribution of each five year frequency into five equal one year frequency groups is not satisfactory, but» with the irregular data as given it is, perhaps, as good a result as we can hope to get, until official statisticians recognise the difficulty and table their statistics in a manner to meet it, ie. 
1n this case, it would mean either proceeding by four year groups after the 4—5, or giving the 5—6 frequency and then proceeding by five year groups 6—11, 11—16, ete. is to take year groupings for our material. * Actually our auxiliary curve gives 0'210 for males and 0°205 for females for means of deaths in the first year of life. ELEANOR PAIRMAN AND Kart PEARSON 255 Assuming the legitimacy for the present purposes of this redistribution in year groups we find for moments round the end of the first year of life : Males Females 1238p," = 9446 1247p, = 12079°5 12387, = 207,007-25 1247p,” = 331545°75 Again we have for males: n, = 8762, and by (xxxii) a,’ x 1238 =— 1122°222,84.40, fe = 393, a’ X 1238 = —3041534,5378, ns, = 140, a; x 1238 =— 9518°206,0741, ng OS. al < 1938 = 19254345,0950, Ne = 86. ag X 1238 = — 581964928841, Our abruptness functions are thus found to be 123854, (a) — das’ + aa Gs’) = — 83°056,6602, 60 123851, (ae — 7354) = 18:978,9435. These provide for the moments about 1: 1238p," = 9446 — 83:056,6602 = 9362°943 3398, 1233805 = 207,007:25 — 103:166,6667 — 18°978,9435 = 206885'104,3898. We now find from (xxxi) the values of 8762p," = — 6921:345,3000, and 87624." = 5951:033,6815. Thus: 10,000m,' = 8762," + 1288y,'" = 2441°598,0398, 10,000p.' = 8762p," + 1238,” = 212836:138,0713, or fy = 244,160, py’ = 21:283,614. - Thus finally the Mean = 1:2442 years and the Standard Deviation = 46069 years. We now turn to the female deaths and find with the same notation : 1247p,” = 120795, 1247 v,"" = 331,545°75. Here n, = 8753, leading by (v) to 1247a,/ = — 1014°250,5807, Np = 339, (24a, = 294.9°801,0796, n, = 150, 1247a,, =— 7980°615,3654, n, = 80, 1247a/ = 19991:399,2722, n; = 69, 1247a,) = — 60065-060,3135. 256 On Corrections for Moment-Coefficients Hence we deduce : 1247 x 35 (a! — does’ + gctog as) = — 75°422,972, 1247 x 45 (a — 73g a’) = 17:970,765, and 1247 w,/" = 12004:077,028, 1247 ,."” = 331545°75 — 103:916,667 — 17-970,765, = 331423°862,568. 
Again by (xxxi): 8753," = — 6961°151,020, 8753p.” = 6002°499,428. Thus: pa = (8753 m," + 1247 4,/")/10,000 = 504,293, fy = (8753 p,” + 1247 4,/”)/10,000 = 33°742,636. Accordingly we have for females : Mean = 15043 years, Standard Deviation = 5°7869 years. These are both in fairly good accord with the result that would have been obtained by the a priori guess of 0°2 for centring the first sub-frequency. (13) Illustration VI. It is not without interest to inquire if this centring of 0-2 maintains itself when we turn to other material for congenital malformations. We can use the material provided in the United States Census for 1899—1900, Vol. 1v, p.670. From the data there given we deduce that for 10,000 congenitally malformed individuals of either sex born: Died in year of life Males Females 0—1 9626 : 9543 1—2 129 204 2—3 61 57 3—4 27 49 4—5 14 25 5—10 54 4] 10— 15 34 4] 15—20 20 8 20—25 14 8 25—30 — 8 30. -385 a — 35—40 _ = 40—45 — — 45—50 = = 50—55 —- 8 55—60 14 — 60—65 — = 65—70 — 8 | LS} S | S) oO Totals 10,000 | We have as before: '’s, 374y,/" = 2657, 2s, aba, = 2595.5; S14, =f 175s 457y,"’ = 76280°25. ELEANOR PAIRMAN AND Kari PEARSON 257 We now turn to the abruptness coefficients at the end of the first year of life and find: Males | Females = 9626, and by (xxxil) | 2 =9543, and by (xxxil) 374u,/= — 808°283,5789 | 457a,)=— 942°176,2334 Ng = 129 374a, = 3203 °644,5476 | my=204 | 457a, = 3281°101,5644 ng=61 3740, = — 9271-:066,1366 | 23=57 457a3/= — 8958°636,6700 ng=27 374ay'= 23389:468,5164 — ny=49 457ay= 22229-438,8090 Ns = 14 374a;/ = — 69671:015,1599 | Ns; = 25 457a5 = — 66617'544,9966 374 x qb (ay — go 43 + asz9 ds) = — 56°784,418, | 457 x a5 (a1 — gg Ga +a as) = — 68°275,096, 374. X qhy (ae -— 7354) = —-18°962,592. 457 x45 (ao — 73¢04)= 19°991,508. Thus : 8374p)" = 2600°215,572, | Thus: ABT py” = 2527-224,904, 374 po” =71077'370,741. 
| 457 py!” = 76222175, 159, From (xxxi) we have: | From (xxx1) we have: 9626p,” = --7768'448,082, 9543.1” = — 7640-738,2653, or, 1 — py” ="1930, | or, 1 = py” =1993, 9626p.” =6736°839,298. | 954342” = 6604'373,5073. Thus: 10,000) = — 5168°232,510, | Thus: 10,000p)/ = — 5113°513,361, ~ and py’ = — 516,823 ; and py = —°511,3513; 10,000 px’ =77814-210,039, | 10,000py’ = 82826:548, 666, py =7°781,421 ; fe = 8°282,655 ; or, finally | or, finally Mean = -4832 year, | Mean = ‘4886 year, Standard Deviation =2°7412 years. | Standard Deviation =2°8322 years. It is clear that in both cases the centring of those who die in the first year of life is a little under 0:2, instead of slightly over 0-2 as in the English data. It is worth while inquiring what the effect of concentrating the deaths in the first year of life at 0°2 and then simply determining the crude moments will be. We find: Concentration at | neentration at 0°2 : Concentrat actual centres * | Complete corrections Male Female Male | Female Male | Female Mean | 496 | “496 489 | -495 | -483 489 | Standard Deviation | 2°729 | 2°821 2-730 | 2°822 | 2741 2°832 This process then gives quite a reasonable value for the mean and standard deviation. Thus all we have to do for a rough practical value is to use the pw,” of the first equation of (xxxi) to obtain the centring of the first group and then find the raw moments only. For a very high first group this is considerably better than applying our first non-asymptotic method and of course better than mere raw moments. The following are values found from year groups : * That is at ‘1930 and -1993. Ist Method of this paper ~ Raw moments i a = ee Wee 2 = | ; Male | Female Male | Female | Mean 609 610 592 «| «~ *592 Standard Deviation WTS} e282 2°72 2°81 The means are inadequate, but it is remarkable how close the standard deviations are to the corrected values. 
(14) The reader may occasionally be puzzled to settle whether a frequency distribution has really a finite or infinite initial ordinate and therefore be in doubt, as to whether he should apply the first or second method of this paper. Our [lustration IIT may be taken as a possible example of this, although the ex- aggeration of the first frequency is nothing like so marked as in the case of con- genital malformations. If we apply the first equation of (xxx1i) to the first three days’ period we find: O—3 days n, = 18°25 3-6 , m= 658 whence 18:25," = — 14:057,688 > Ts . 6—9 , n= 789+ OF ele or remembering our three days’ unit, 9—12 ,, ny= 5°65 12—15 ,, n= 5°82 Mean = ‘69 day. Our table now becomes: 0—3 days 18:25 centred at 69 days 5-6 6°58 45 6—9 ” 7:89 75 ” Caos. 5°65 105, 12—15 ,, 5°82 135, 15—1 months 19°80 ‘75 months oe 22:59 15 “ Cae 18°58 25 . - 8 15-96 3-5 i i 13°30 45 . 56 11-51 55 6—7 9 10°61 65 ” 7—8 ” 9°30 75 ” 8—9 ss 8°74 8:5 Pe 9—10 ,, 8-29 9°5 6 10-11 ,, 751 10°5 ‘ Neto 6-94 115 S Total 197-32 Hence by raw moments we find : Mean = 112:98 days as against 114°61 days, Standard Deviation = 105°53 10715, found by the first method of this paper. Here we have not used our full second method but the results are in fairly close accord, especially in view of the fact that we have not corrected for the curtailment abruptly at the end of the 12 months. Accordingly the suggestion made is that in doubtful cases both methods will give fairly closely the same values, and therefore we need not worry over which is the more correct one to apply. ”? ” PECCAVIMUS! This paper is devoted to a number of slips recently made by the Biometric School and which it is desirable to correct at once, before the formulae which need correction pass into general use. Some of these slips are due to war haste, others to neglect of terms which ought to have been included in our approximations and some to printers’ errors. 
We have to thank Professor Tchouprotf of Petrograd fo indicating the existence of several of these mistakes. (1) Biometrika, Vol. x1, p. 215. On the Probable Error of a Coefficient of Contingency without Approaimation. By Andrew W. Young and Karl Pearson. Down to p. 222, equation (xii), this has been again checked without discovery of any error. But on that page the authors “take J7 to be very large compared with NV and make y, = y.= xy; = X,=1” by an oversight. The values ‘of the y’s are given on p. 217, equation (vi), and clearly when M is very large compared with J, Xi = Xe= Xi = 1 and y; = 1 — 2/N. Accordingly equations (xii1) and (xiv) of p, 222 for samples from “an infinite population” require modification and should be* oa v [5 (ares) - {8 Gor : AS ‘ Ng + ye [98 ()—#8 (i) 8) +198(s) 228 (ye) +m |8(a2)- GDF - G2) + 8G) 8) + 88(phs) — 618 (Hx)p a —~ ‘a ee nr =) — eo [fE)o-9] 1 +(e (Ce) ro—sons(Q) 088) s rs —(2—4¢") ¢ — 166? + 106! — 2] +m SGE)+8@)- {8 Gay -68(82) + 98) ae (f) (2c — 4? + 8) — 6h! + 12¢? + deg? — 0? — Qe + 2| ae (xiv). * The changes due to x; affect the term in 1/N*, but the original (xiii) has a wrong sign to the third term in 1/N2. 260 Peceavimus ! We may now turn to the numerical illustrations. It will be sufficient to show the correct values of oy: in the table on p. 224. First and second terms Z of (xiv) All terms of (xiy) Old Values 02709 3 02729 | Cerecte Values 02725 02744 For practical purposes, these would all be taken as ‘027, and accordingly the .errors, although sufficiently distressing, do not modify the conclusions, that for a sample over 1000 the first and second terms of (xiv) are adequate. In the second example, p. 227, more serious changes are made, chiefly owing to the error in the sign of the second-order term (2 — 4°) c, which becomes of greater importance now that WV is reduced from 1801 in the first illustration to the 218 of the second illustration. 
We have for oy: First ar terms All termssof (xi¥) Old Values ‘0798 0823 Corrected Values 0693 0719 Thus for practical purposes the ‘069 of the first and second-order terms is only raised to ‘072, if we include the third-order term. We may therefore conclude that 250 cases marks something like the limit at which we need to consider the third- order term as well as the first- and the second-order terms. We now turn to the test for zero-contingency. Equation (xvii) of the original paper is correct, but the wrong value of x, was inserted to obtain (xviii); 1t should of course be 1 —2/N. This leads to c(e —-2)—-2(e—1) oy = z ie 4p N +2(c— ih Benen (xvii), or perhaps as it is better expressed : sgh Al ee I ce ne On = 773 8 ia +2 (1 _ x) (c-—1)— mt Laat ueee (xvill) bis. The formulae summarised on p. 229 must be altered to accord with the results (C) must be (xiv) of the present paper. (D) must have —2¢ and (C’) must be (xviil) above. given above. not + 2c for its last term. (II) The object of our next note is to make some additions and corrections provided by Dr Isserlis himself to his paper: “On the Conditions under which the ‘ Probable Errors’ of Frequency Distributions have a real significance ” (Rt. S. Proc. Vol. 92, A, EDITORIAL 261 pp. 23—41, 1915). In that paper he gave the values of the frequency constants By and B, (formulae (19) and (23), pp. 30 and 31) of the distribution of the moment-coefticient of any order wu about a fixed origin for a sample of size x drawn from a population of size NV. These formulae are exact and no alterations are pro- posed here in them nor in any conclusions drawn from them. In the latter part of the paper Dr Isserlis deals with the value of the 8-constants for moment-coefficients referred to the mean of the sample. These latter values were approximate and intended to be correct to terms in me We are indebted to Professor 'Tchouprott for pointing out that there is an error in the approximation, for one of the neglected terms rises. 
When the correction is made, however, the statement (p. 24) remains true that “for coefficients of high order the sample has to be an inconveniently large fraction of the population itself if 8, and 8, are to approach even approximately their Gaussian values” (i.e. 0 and 3). The results in the paper cited are exact and correct * until section 5 (p. 35) 1s reached. In that section, formulae (38), (39) and (41) are approximations and for the purposes of the paper should be given correct ee UL agli rs ; to terms in . for (88) and to terms in a for (41). The use of the incomplete value U Cm lee ; Ci 99 (ig Neg) 2 Uh gp Le i in equation (37) has introduced an error in the value of M/, given by equation (39). We proceed to amend this error. Ih Pe i = e : We have fu => S in, (a, — %)"| =—S(n,X,"), dX,=—dz; n . = 1 vi 7 U 7 U—1 fm} er tle FF S {dn Xs — ung Xda} il a u(u—Il : _ + rf S '- udn aa xX + ae =) Ns X dit} +... = = A+ 6+ terms of third and higher orders in dn,, d@. Now it is well known that the mean value of fifth and higher powers of dn,, dz, ... contains no terms of lower degree than the third in I/n. In the formulae (38), (39) and (41) the values of M,, M,, M, were obtained as the mean values of A*, A* and A‘ respectively. The inclusion of the neglected terms does not affect 4/, which is given correct to — nor M,, for the only term of the n fourth order in dn,, d@ in (A+ B+...)!is At But (duy)? = A? +3A°B + fifth-order terms in dn,, dv. * There are some obvious printers’ errors overlooked in proof, of which the omission of the factor (M’u)* = 2p’ 2, (uy)? + (vou)? In the first line of equation (21) is most likely to mislead. It may also be noted that the factor pv? is missing in the first term of (26) and the factor 3 in the first term of (41). 262 Peccavimus ! Hence the correction to be applied to the value of M, in formula (39) is the mean value of 3A°B. Now A= ; S (ding X x) — Uplry de and B= . aS (— uX,""dn,d&) + — CA — [yo X, so that A? 
=— “(dng Xe +2dn, dn, X" X ¢") — aU oS (dns GX 5") + ry ade. Let us write AA=L+M+N and B=H+K. pee ne tie wadnsdx) +S (Xe X "I dnedn,dz) nr WS (X POX dngdn,dx) + 28S (dnsdnzdnydt X UX 1X p" i Denoting the mean value of HL by HL we have (see (31), (32), (33) of paper cited) HL = - = bie su X X |, fe = ") ae 6 nx! Ns Z ats s\n eee = Ge a4. 9 Do Ne 2 aN mic -2) + ey x,-(2"9+8)x,|t = 28 fag Ue yt Net Xp) WON co | ae ux 39 \ ("= fe we) Xx? BUY S {(" gM a) (X au Yu at aX 2u-1 VY “ny! ia ae i\ a nw 2 oe c : — oS { a) 7 gut1 VY u— 2X ou V U 28 1( ie (X;5 Xt + X 5 xX; ) 298 ee OG ER CeD OY XOX," | =e uXp S 2s x a n* { : Ww 8 (25M (X ou xX ut Ox aul Y wt Xe sut1 Y vo ie OX eux i \ ne s t s § = UX [- DIY (ng X Xx ‘| - S {resis Kea XG 4b OX xy} { n2 1 (fs he Neat ae “ oe es = Ss. ea (2N eter pare AX 2" X | 9 Ns “~ IG. Gu Xp wy Nutr Xe You — 28 3- (3 at 4 als alt Ap ) +35(" s ee 2) ms ee LEN Es Yt OX’ 2u— xen} | EDITORIAL 263 n 5 uy in Ny? — 2N fu bu busi | a + feu eu + 22 fouaMuts &: UXb E a S {ss gilt ( Xs 2u Xx ey) X MAX 1 = 2X gu X _ xenxey| ns uy db’ ; a liteaey ar 2 oui fluti — Pou hu — Powter | 3 5, eter lain ap lat or mean value of HL* = = | ie 4 fg ee Cees ) Bou bu — seu buy ee e (Hau oF 2 fou—1 Mu — Pou bhu — Honsstn-o)f 5 = PTR IVE eo Y({ V 2u-1 22 Y(Vuy u-1 , 72) HM = le [s tNeg dng dx } = Xs a\ ¢ dn,dn, dx Hs 12 so that, using (34), (35) of the original paper HM= Seo [spen x [m(o (1-7) +f) exeQ0% +9) |; +8 1K, Payer [* (2N..Xy— ps) — (Xs — Xi) “lh | PAN [Viegas ? ‘ = Qu ; fo OS ie (- tie) S {dnd X,"}. n a Therefore, by (36), nN Ray = sf: "(3x6 SmoX typ (XE + Bu X, ~:))} ie eS We Oy x oy. » uta qa a 7 (us ar Soe Muy — bp} > (WD) ube aps a) en JA > = iN and using (26) of the original paper therefore Oh —1 Pb U—-2 | « = Y : = : Mean ee ieee | Sxbe TX. e (Ma + 34) 2n? Adding these various terms to the mean value of M, as given by (39) of paper cited, we find for the corrected value : M, = XX. [ 2p? — 3puflou + Psu — SU fu (Pow a 2 pout plu) no +3? 
wWua (Huts om Hs [Lu) —w (ua Hs) ] EDITORIAL 265 3x n> + {@ (ot ae Zhu pu but — Pou hu — 2 fu+1 four) u ar Qu? ur (Me fou—1 — fy hua bu + 2Mubuss) — = u(u—l , é Wea (3 py flu) = = bu» (ls flow = Pole + 2 rut) 5 , w(u-— 1) , ee — ww 1) puepua (Bho bus) + —_ Mu Ku-as (3y2)| a e E u (Msu ae Diba Mou — Khu Mou — hu-1 four) 2? pu (Me Meu + Pewter — Hui Maye + Pupusa) = Wu (Muse + Sf 2bu — sur) + a Hu—s (He Mou + bouts + 2p up — 2bu Puy) — UW (U— 1) fur PMu—s (Muts + So eMuti — Ms Mu) Re Soot) }. On ru Pu—2 (uy = SLs") and (40), (46) and (47) must be modified accordingly. It remains true that for a : f : : | a normal population M, vanishes when w is odd, and that in all cases B, x — . VL If we write this value of M, in the form ’ ByfK 3y¢'T gate eNO Ge NEE n* nv nv? d then in order to obtain B, correct to Li , the third term may be omitted, R has the n same value as in Equation (47) of the paper cited and XY is zero for normal distri- butions when wis odd. For even values of wu in normal distributions the value of Kis u(u—1) F uw (Hu? a Mon Mu) cae oat bu-2 ( [2 flow — fo fty”), which easily reduces to —4ux P x wy, where P= py, — 2 as in Equation (52). We may therefore add to Table I on p. 39 of the original paper the following column : | U K | ——— 2 — 2py? 33 0) 4 = 576p2° | 5 (0) | 6 — 457,650 py! Biometrika x11 18 266 Peccavimus ! In Table I on p. 39, the third column is unaltered, the second column becomes (the coefficients of ¢ being approximated) U By, 2 SOTO): 2 x : 3 0) 4 LL eA n x 5 0) 6 1099 (y’ — 0-040)? n ee The corrected form of Table III (p. 40 of paper cited) is now as follows: Table IIL Approaimate values of 8,, 8, for samples of 1000 out of a population of 1,000,000. 
| u Bi By 2 0-001 3-012 3 0-000 3-090 | 4 0-081 3°204 | Thus the effect of the correction is to change the values of 8, for u=2 and w=4 from the values 0:008 and 0:102 to 0:001 and 0:081 respectively, but it remains true that the frequency of the fourth moment-coefticient differs appreciably from the normal distribution. (III) Dr Isserlis also wishes to make the following emendations in his paper in the last number of Biometrika, Vol. Xu, p. 134. On p. 138 near the foot + ABC has been dropped from the bracket (8FGH + 2A F* + 2BG? + 2CH?). Also in 1. 6 of the same page for “on Q” read “and Q.” (IV) The point indicated by Professor Tchouproff, namely: that fourth-order : ers ; mean products are of the same order finally in Ws third-order mean products and cannot be neglected therefore in comparison with third-order mean products, is of great importance in investigations into the probable errors of frequency constants: in the case of small samples. In expanding functions of the deviations from mean values of subfrequencies such as én, we cannot neglect products of the fourth order in the 6n,’s compared with products of the third order. In obtaining results true to products of an odd order in the “statistical differentials ” we must proceed to products of the next highest even order to reach correctness. EDITORIAL 267 This principle, which is almost self-obvious, was, however, overlooked by Pearson in his paper “On the Application of ‘Goodness of Fit’ Tables to test Regression Curves and Theoretical Curves used to describe observational or experimental Data,” in Biometrika, Vol. x1, pp. 239—261. One of the objects of that paper was to investigate the probable errors and frequency distributions of errors in the mean and standard-deviation of an array. 
If we have an array of a first variate corresponding to a small subrange of a second variate in a sample of V, the law of distribution of the means and standard- deviations of such arrays when many samples of NV are taken had not been investigated at the time Pearson wrote. If there be n, individuals in such a sample, then the problem differs from the ordinary problem of the distribution of means and standard-deviations in a sample of size n,, In the fact that n, in the case of the array varies from sample to sample. Hence we cannot straight away assume that if 7%, be the mean number in the array then an,/Vn, and Gn,/V 27, will be the standard-deviations of the distributions of means and of standard-devia- tions of the arrays; still less do we know how far it is legitimate to suppose these distributions approximate to the Gaussian or normal type. As the problem is an exceedingly important one the writer asked Miss Eleanor Pairman to revise his work of 1916 by introducing where needful the fourth-order products. This she has done with certain additions and expansions. (a) From the equation on p. 289 we have: Wg Cole ONgpkHe [ONy\* mean (6m,) = mean & (—)* g fPat ae (= oP) ep at (S*) i. Np Np Np Np where § is a summation for every value of a from 1 to 2. = n But Mean SNopdNy = Ngp (1 —. ) and the regression relation is accordingly Se Ope iy gp Substituting this we see that every term vanishes and accordingly 6m, = 0, not merely to a high order of approximation, but absolutely. In other words the mean of the means of any array—notwithstanding that the number*in that array will var sequal to the mean of that array in the sampled population. (b) We have for the pth array: S (Hap + Sitgp) Xq Ny + On, S (SNqpNpXy) — S( (Tgp %q) )dny, Tip (Ty + Srp) My, + Emp, = Vin : but dry = S(SNgp) and S(Rgp%q) = Np Mp, 1) 268 Peccavimus ! and accordingly SES eT sie : ce S {[Sngp (&q — Mp)! 3 S [Orqv%o} P ? Ny + Onp Ny + SNp where #, is measured from the sampled population array mean. 
Now we desire to obtain the various moment-coefficients of dm,, or mean (dm,)', which for convenience may be written {dm,'}. There are two ways at first sight of doing this: : (1) We may expand (7, + 6z,)’ in terms of 6n,/n, and then take the mean values of products such as: (Stig) (Onan)? (Ongnl (lap) aesene ; This was the process adopted in the original memoir. It is very laborious and the algebra so lengthy as to lead easily to slips. Still on the present occasion we went to terms of a high order (Oy) and some of the results obtained will be so use- ful in other investigations on probable errors of frequency constants that it seems worth while placing them on record here. The fourth-order mean products in én, and én, may be added to those given on p. 245 of the original memoir. They are: My il : - Ny ai aN lhe = > (On, ON on) = Nap (1 _ 7) i! +93 (1 — x) Nop (1 - *) ; 2 a or — 2 (On, 82g Oty'p) = (1 = w) Nap Rap (1 a #) ( a a) 5 2\ Nop Ng'p Nos n DY (dnp SNgpdNqpONg'p) = — 8 (1 ae 7) “pMeete'e (1 = 3) : 2 2, = = n 3n, > (Ong ONgn ONg'p) = (1 _ A Renton (1 — =) ( — =) ; > (np? Sigy) = 7 ( -%) 1+3(1-z)a (1 -"%) ee Olpxe hap ap N WN) NV a ed a a For the fifth-order mean products, Miss Pairman also provided the following values* : 1 - & Nay 279, 6\ _ n 5S (Bian)? = Rap @ as “a @ _ = {i +2 (5 a y) ea - “| ieee Noo Noto 2Noy 1 Begg) = Baa (1 — 28) es P : aa NW Bes 2 \ Ngo! x & (On gp ON 779) = Ting Mpg! {1 ar WV — (1 — 7) WV = 6) Rng Rog _ Noa’ - 5") |} (5-5) -¥-#C wi) If * These results are of course perfectly general, that is to say we can suppress p and suppose them the mean variation values of elements ng, mq’, Nqv, Ng ANA Ngiy Of any frequency distribution. 
EDITORIAL 269 1 : Nop Na'p Na” 2 Oey ee < > (82% yp dNq'p ONg"p) = — a : eae (5 -- wy) V (3 = Wt 1 Nap No'p Na" 1 2 ON in » + Nap ll Map ; 1 6 \ hi, p Ng’ Ny x & (S22 gp SNg'p ONg''p ONg"p) = (5 - 8) 4 hy bat) (1 - aie) 1 6 \ Ray Nop Nop Rgrtp Ngiv =, & (Orgy ONg'p SN gp ONgpONgivp) = — 4 (5 a, P ae vere Alongside these we give the fifth order combinations of én, and 6n,,, which are deduced from these and were required for our purposes : 1 ue Ny 2» 6 \ No» ie 5 3 (Bp'8iqy) = Niigp (1 = re) (1 = FH) 1 +2 (5 - 7) eB (1- Wt | 1 3898 = Ny § = Ze i > (6n, 7627 4p) = Nop (1 — 7) is —Ny+ (1 - a) Nop = he 1(6-8)(- Ferre) 1 ni 2 = x SON; OMep ONgin) = Non tan (1 — “*) |! iar (5 - 7) (1 - 2) (a — | : n, 2 1 DCO, On on) = Nap ( — z) {i dt (1 = wy) ni, — 6 (1 —= a) tind 3 ( _ "2 a (3- ae Nap NN] WT) J’ eo Np Nap 4ny\ = (5 7 Wr) E “NW (3- ¥)| 1 P 6 \ Ngy Noy Ng» Ny\ [. An, 5 3 (80,28 gy Bry Bry») = — ( pas 7) Taeyp tye (1 — 7) (3 - Ww) 1 4 _ Qi» n SO Nay ) 1 2 (On, ON) = Ty ( — =) (1 — 7) i! +2 (5 - x Nop (1 — al ; ie a iy 2 ONG ep i YX (Snp On 8p dNg'p) = Nap Na'p (1 - z) \(1 — wy) — (5 ~ 7) V (3 - 7) J | 1 n = 1 2 2 a Pp i= (ONpON 9p Ol a5) = Nop Neip @ = 2) 2 6 (Gs + Tap — Faget N Ve. ye n ; nie rere Auf. 06 Aion > (S12, 6274) ONgipONg'p) = — ere A = 7) (5 = x) (1 =- i) . Nop Nop Nay N / n 6 Ss _ gp i : gp Dp > (ONy ONgy ONg'p ONg"p ONgi'p) (1 - 7) 4 (5 wy) ye ele 270 Peecavimus ! Also we give here additional moments* of the binomial (p + q)”" about its mean: fu = 9, IVE, Ms = npg (Pp — 4), py = npg {1 +3 (n — 2) pg}. Hs = mpq (p — q) {1 + 2(5n — 6) pa}, fo = npg {1 + 5 (Sn — 6) pq 1 — 4pq) + 152 (1m — 2) p*g?}, = npg (p— q) {1+ 4pq (14in — 15) + pg? (1052? — 462n + 360)}, fs = npg {1+ Tpg(17Tn — 18) + 14p?q? (85n? - 154n + 120) + Tp'g? (15n? — 340n? + 1044n — 720)} The values obtained in the above laborious manner for {6m,*} agreed as far as we proceeded with those obtained by the following or second method. 
Gi) This second method consisted in first summing for én,, on the assumption that én, was constant, and then summing for 6n,. This involved some new results which will be useful in other problems and are recorded here. For constant 7%, + dn, : Ole \ Nae n°, = 9 y _ -_ m4 Mean (6n,»)? = (1 + =| =a (Tip — Tgp) + =* 2 OF ee Pp Np / ON»\ Nop Nay . Raph ‘P qp 7p Gp" IP 2 Mean (679, 6nqp) = — (1 +—— ) ee + bn,’, Np Np Ry? dn,\ 7 _ 2n n ne : Mean (SNgp)? = (1 + =)! “qp (7 ity =P (eon “QP + 36) i ED, 8 2 bn,’ D Np / Ny Np Ny Np 8n,\ NgyN n n Neg No! Mean (614, d%qp) = -(1 de =) peel (1 — dn, —2 a + 36n, a) 4b aa én,, Dp Np Np 1p Mean (Ongp ONq'pONq"p) = te {(n, + dn,) (2 — 36n,) + 8n,*} p Now the value of this method was at once obvious, for proceeding to the sum- mations in the moment-coefticients of dm,, for constant 7, + 6, we found that they corresponded with, values to be found for the distribution in a sample n, of constant size. In other words we reached a conclusion, which should have been obvious at first sight, namely that to find the value of Mean (6m,,)$ all that we have to do is to write down the known value n, =7, + 6n, constant and then sum for 6n,. We might have pulled down the scaffolding in this correctional paper and simply started from this result, but as several of the means reached in processes (1) and (11) seemed likely to be of value, we have preferred to indicate the steps which led us to the final method. * A simple reduction formula for the moments of a binomial about its mean was sought in vain. After a good deal of energy had been spent on the problem, we believe that u, being the sth moment about the mean Ms= E (qe?* + pe)" | is, perhaps, the easiest expression for reaching these moment-coefficients by successive differentiation. 2=0 EDITORIAL oT (ii) For a sample of constant size n, the following are the moment-coefticients of the variation in the mean* : 2 pe _ o'n (Buy!) = 2H =, Np Np 1 1 F : . Now let 8 (=) equal the sum of —. 
for all values of p which may occur in the n, n,' p p samples of V. If therefore f,,, is the frequency with which n, occurs, the whole problem reduces to finding the values of S (frp! Mn*)s for various values of s. : Now the frequencies of the n, are simply the terms of the binomial. The term in which »,=0 must not we think be taken into consideration, for in this case there is no variability in yw,’ as there is no frequency in the array, Le. ,“. must be put zero. Thus in the notation of a binomial (p+ q)" we require to find: mo dyn lpg | na) i — 2) pr ge r al ms i Tons ge bee seen (F) and to divide the result by (p+q)"—p"=1—p”. This finite series we have not succeeded in summing. Before indicating how we may approximate to it by the mean-powers of 6n,, we can look at the problem from two other standpoints. G) If n,/N=q be not small the binomial approximates to a Gaussian of standard-deviation squared o? = npg =1,(1—7,/N). Hence +a s( 1 ) 2p re! : b 4 Cad —— = SENS é Ra VQaraJd —0 (Np + 2) ieee 2 La? = ae ie (1-32 4 SUH e308 dia: Vario -x Np LED ny ee ee rae ee Sia ras) 9 ) al TT.2 aig 1.2.3.4 wa) Se oe aay aI es eae é 1 ' | a eee ieee 2 (eae Thus S a. = le Ae w)t i, a ; ee AK 1 1) é i ) ¥{—) ==, 41 i 15 (—=— = Sees eee G Sie) Tip? +3(— w/t My Ov bs \ i Co iy cect. ie ery oa rs ae pee ee a : a) ieee x)* Gs x) + j * Here ,u, is the sth moment-coefficient of the 7, array about its mean in the sampled population. rd Peccavimus ! (i) Another method is to assume a Pearson Type III curve: y= ya*e7%, which is known to give a better approximation than the Gaussian to the binomial. We assume it to start at the beginning of the first subrange of the binomial and to have the same mean and standard-deviation. These conditions involve a+1_— ai ( se ¥ p> WN yy ; ffl L ee ee OP gael Accordingly S leak Yur *e BI geree CieaL) where A = total frequency = aa D(a+1). Hence ‘ie es eee i 8 Ge) a it, 1 (= -x) A Np VN. s(5)-—" ils fs Sl ne) a(a—1) Fig? 
a iy ; 2(] 2 a , fin, IN) ale SUNT NG (=)- aes ee 1 ne) a(a—1)(a—2) ii? 1-(5- xt Wi eC - 5) 1-3(2 a , n N/} fp. eal Tine Ne which are exact. Or, approximating Gat +G-w) +m) * nD ee SG) “al Ga If, a es I~ 1 es | eee , ey aaa G x) : Or V4 Tye eel: Wea Pan il : Both methods agree to the terms in (— a , but the Gaussian appears to exagge- ) P : 1 LN: rate in the terms in (— —-—;}. iy . av, (iui) We will now proceed to approximate on the basis of the moment- coefficients of the binomial. We have 1 ae! (1 3 Ory) , 88 + 1) S(Onp*) _ eae G2) On ). U Ss a = (ip + Sny)® — Ny’ ite de es 2.3 Tas : u cere Here S(én,)=0, and we will keep terms up to the order = Asis which involves p proceeding to the fourth moment-coefticient. We find EDITORIAL Ds g 1 a9 ie eee =H \e ee = 1 - (Ry +Snp»)® ip" 12> Ag, © av 1) eS} n, NV. Gorn p toes — eo) tw) Ge) al _ s(st1)(s+2)(s+3)(s+4), NA at ae 2345 Oe oy ( 7 ee BNE ee ee) IN Se 1.2.3.4.5.6 15(5 - y) ete. | as far as terms of cubic order in the curled brackets. Hence we find Qe iae-DOrhed | s(.- HGH) 3—} | S(,)= ae “Ga (8+ 5+3p) 7 a z ny 11 Ee: 0(—- 4) + | GAY ueB)ata(t Aye ] 1 il 1 OS 1 1 30 1 Ne le at) (7+3))+205 (2-4) +..}. | E : 1 i : It will be seen that these values agree to the first term in (= = 7) with those Tipe given by either hypothesis (i) or (11). For the terms in (= —- a) they appear to p be intermediate between (1) and (11). os additional terms which do not oceur in either (i) or (11) are those in powers of 7 in the second- and third-order terms, Using these results we deduce : pM, = mean (dm,/P wha (22) (ada dea(t 1) (048) e6(2- 2) a, f+ (- w)+y+m)+ 2a a) (+9 le iy Thus the probable error of the mean of an array of mean size 7, in a sample of N is: on a(=- ¥)( 11 5 (5--y) 22 (= —a) oragg Fe hh 45 (5. N Let al enle N (Oar) a = ae ip 274 Peccavimus ! Again _ pls = — 5) 4 >a) (4 Pt 40 (de er ol ae y)3+yt mt — x 11 +57) +50 ue poe hE ee (K), and if t= i, re M;* _ »B {1+6(84+ 3+ yy.) 
+e (11+ 5p) i soe pw ell ofl ON wae) P pe Ty i+ e(l4+ qty 5) + o(2+ x) tot which after reduction gives _pPifya fg. 5 wm) (=~ xt 56\/1 1) (|--+) i + (3+ ay + ye a, + (13+ 5) - wy) +69 A ie Pp Clearly if ,@; were as large as 0°5, ,B, for an array of 30 would be of the order ‘02 and thus the array means have an approximately symmetrical distribution. We now turn to the value of ,B,, and find with the same value of as before sy — Byte? 10 15 , 150 pM, — 3,Me =* ae {1+ (6+ 57+ + yn) St (35 yr) 8+ 2256} i [+ 5+ w+ (645 v)¢ + Mer my }y -(24+ 54 qp)o-(14 q) ye 4g are WW N lien the previous result by this we have yBo— 3 { 8 1153 Il ae aie? ae ey pB,— 8 BS 14 (44 Ht w)(s, x) * (2+ 9) G,-y) Pp Further, (,M/.)~ 1 1 2 3 5 1 1 il 1\? oe Ga nity +a (14 Saal ule =a! a Se For example, in an array of 25 in a sample of 1000, if ,8 were as high as 3°8, we should have ,B, slightly less than 3:2. Accordingly the constant ,B, of an array is not as approximately normal as ,B,, or, we have the material thrown out further towards the tails than in the normal distribution. It is probably, however, adequate to speak of the means of an array of variable size as roughly following a Gaussian curve and give the usual meaning to the “probable error” of the mean of such an array. Its value however is more accurately —, than 67449c%)/V iy — 1. 67449 Ve > P \ Np 1 + WV EDITORIAL 275 od We may adopt a similar process to find the standard-deviation of the second moment of an array ina sample. For an array of constant size n, we have Mean (6,2)? = a ~ (1 ==) \(1 Sees (1 =) ahes 7 where ,m, and ,. refer to the values in the sampled population, Thus a ae 1 Mean (8pp1)? = 2% @ 7 —) {0 = =| (82-3) + 24 p Np p ie and, summing for all values of p, 1 1 2 ——e ped 2 3 a | eae Ope = ple | is ee 5 . Accordingly we require to find S (— ) -S (<=) and S (=) —S (5) . 
Writing 49} p ‘ie ae 1 1 1 € as before = — — =, we have — fp LY n 2) + (p82 — 3) {s (2 —28 & + § (=) Pp 1 : ‘ ae WV and after some reductions Dp ’ : y : = 4 f | 2 3 | _ 9 5 | es nied aed aya _ os 2 Yo jf), “Sues _@ ae a ,) 4 Dt Accordingly if ,4 be the second moment of the n, array in a sample of NV, Pont [fy 2 (24 8) (21) (4 8) (11g (22 He a. [ft N Gea lee N. a) Tees i =~ 3) | Ny +49 {0B (14 B+ BYE-B)- (Oe) E-B ; ; Dalle ae It is usually given the value lnc , and further the assumption is made that (66 n,)° n Pp may be neglected in 2C ny OT np “F (Son) = 6,Mo, So that we obtain the value —ONp vv OP fee ac CR ie eee aero ee (O). i ~ i, Now whatever may be said for this result the method by which it is reached is distinctly defective and this not merely because it assumes normality. We have in fact for any distribution of size M o=V tty. Now let us measure o from the mean value @ of the sampled population and ps 276 Peccavimus ! from the mean value ji. of zw. in the samples. Then fi, will not be f@, but be equal to (1 - a) fo. - Accordingly M : ae G¢tode0= Fy 1 = r) o+00 = ( = H) be + Of. 1 1 5 or expanding 80 =G4— IM (1 +a + apt sam) 1 dp, 1 3 5 eis (1+ oy tam rem) 1 /op5\? 3 15 33 i) (1 z out am) 5 ly x 5 5 zy) Ray peal = oe py g, mace nia.: Py: +76 (E) (1+ a) -128 ) + es Now we need first the mean value of 8¢ and for this purpose require the mean 2 : ANS powers of Spy These will be about the mean value of 4, n samples, 1.€. (1 — ii) ps and are*, if we use curved brackets to represent means : (22) = {ey ee — 57) Be- az ( _i Fy) (GE) =9 IGE) = La - ) Bae i) (1 an) ea = |, = 98,=68,42)— (99, 218, — 188, 226 = ~ WP (B14 — 3R2 Myre CPs Bo Iiseaee ) +75 @B— 338. — 228, + 54) + ea {(%2) | = ps [8 B= 1+ gy Bi 4B. — 15 248, + 488, + 968, ~ 30) yeas = = oo _ a — qe Be — 408, — 548." — 968, + 3368, + 5288, — 306) + + 5d (OQ) where as usual Bor = Por+s/ fs" *, and Borts = Hoyts X Ps) ps and have reference to the sampled population. 
Substituting in (P), we have

$$\text{Mean}\,\sigma = \bar\sigma + \{\delta\sigma\} = \bar\sigma(1 - \Delta_\sigma), \text{ say},$$

$$= \bar\sigma\Big\{1 - \frac{\beta_2 + 3}{8M} + \frac{1}{128M^2}\big(8\beta_4 - 15\beta_2^2 + 14\beta_2 - 48\beta_1 - 55\big)\Big\} \qquad (R)$$

* The value for $\{(\delta\mu_2)^2\}$ is well known; the two later values have been recently given by Professor Tchouproff, Biometrika, Vol. XII, p. 194.

In the special case of a normal distribution this reduces to

$$\text{Mean}\,\sigma = \bar\sigma\Big(1 - \frac{3}{4M} - \frac{7}{32M^2}\Big) \qquad (S)$$

This agrees with the value given in Biometrika, Vol. X, p. 526, Equation (xv), which is now generalised in (R) for a sample from material following any frequency distribution given by $\beta_1, \beta_2, \beta_3, \ldots$

We must now adapt the result (R) for the array $n_p$ of a sample of size $N$. We need only replace $1/M$ and $1/M^2$ by $S(1/n_p)$ and $S(1/n_p^2)$ of our p. 273, and retain terms up to $1/N_p^2$. We find

$$\text{Mean}\,\sigma_{n_p} = \sigma_p\Big\{1 - \frac{\beta_2 + 3}{8N_p}\Big(1 - \frac1N\Big) + \frac{1}{128N_p^2}\big(8\beta_4 - 15\beta_2^2 - 2\beta_2 - 48\beta_1 - 103\big)\Big\} \qquad (T)$$

This becomes, in the case of normal frequency,

$$\text{Mean}\,\sigma_{n_p} = \sigma_p\Big\{1 - \frac{3}{4N_p}\Big(1 - \frac1N\Big) - \frac{31}{32N_p^2}\Big\} \qquad (T')$$

We can now find $\sigma_\sigma$ from (P). Subtracting the mean value $\{\delta\sigma\} = -\bar\sigma\Delta_\sigma$ from $\delta\sigma$, we have, if $\Delta'_\sigma = \Delta_\sigma - \frac{1}{2M} - \frac{1}{8M^2}$,

$$\delta\sigma - \{\delta\sigma\} = \bar\sigma\Big\{\Delta'_\sigma + \frac12\,\frac{\delta\mu_2}{\mu_2}\Big(1 + \frac{1}{2M} + \frac{3}{8M^2}\Big) - \frac18\Big(\frac{\delta\mu_2}{\mu_2}\Big)^2\Big(1 + \frac{3}{2M}\Big) + \frac{1}{16}\Big(\frac{\delta\mu_2}{\mu_2}\Big)^3 - \frac{5}{128}\Big(\frac{\delta\mu_2}{\mu_2}\Big)^4\Big\} \qquad (U)$$

Hence, squaring and taking means, we find

$$\sigma_\sigma^2 = \frac{\bar\sigma^2}{4M}\Big\{\beta_2 - 1 - \frac{1}{8M}\big(4\beta_4 - 7\beta_2^2 + 10\beta_2 - 24\beta_1 - 23\big)\Big\} \qquad (V)$$

and […] (W).

For a normal distribution this becomes*

$$\sigma_\sigma^2 = \frac{\bar\sigma^2}{2M}\Big(1 - \frac{1}{4M}\Big).$$

* […] on p. 526, carried to a higher approximation by the introduction of the additional term in the Stirling's Theorem expression for the factorial. Miss Pairman has carried this out and finds […], which leads, having regard to (xi) on p.
525, to […] (II).

We must now turn to the equations on pp. 527—8 to determine $\mu_3$ and $\mu_4$. We find […]

[…] then […], and therefore

$$\frac{\mu_{2s}}{e^{2s}} > \int_{\bar x + e}^{b}\phi(x)\,dx + \int_{a}^{\bar x - e}\phi(x)\,dx,$$

since $(x - \bar x)/e$ is always greater than unity within these ranges. But this sum of integrals is the chance of an individual occurring with a deviation greater than $e$ from the mean, $= 1 - P$, where $P$ is the chance of an individual occurring with a deviation less than $e$. Hence

$$P > 1 - \frac{\mu_{2s}}{e^{2s}}.$$

Now let $e = \lambda\sigma$, where $\sigma = \sqrt{\mu_2}$ is the standard deviation of the distribution. Thus the chance of a deviation being of less magnitude than $\lambda\sigma$ is

$$P > 1 - \frac{\mu_{2s}}{\lambda^{2s}\sigma^{2s}} \qquad (i)$$

For $s = 1$ this gives

$$P > 1 - \frac{1}{\lambda^2} \qquad (ii)$$

This special case is Tchebycheff's Theorem*. Inequality (i) gives our first generalisation for a single variate of Tchebycheff's Theorem in (ii)†.

We can now compare the accuracy of (i) and (ii) by supposing them applied to a normal distribution of frequency for the cases of deviations 1.5, 2, 3 and 4 times the standard deviation. In this case $\mu_{2s} = (2s-1)(2s-3)\cdots1\cdot\sigma^{2s}$.

TABLE I. Values of Lower Limit for P given by $1 - (2s-1)(2s-3)\cdots1/\lambda^{2s}$.

  s         λ = 1.5    λ = 2    λ = 3    λ = 4
  1          .5556     .7500    .8889    .9375
  2          .4074     .8125    .9630    .9883
  3         −.3169     .7656    .9794    .9963
  4           —          —      .9840    .9984
  5           —          —      .9840    .9991
  6           —          —      .9804    .9994
  7           —          —        —      .99950
  8           —          —        —      .99953
  9           —          —        —      .99950
  10          —          —        —      .99940
  Actual
  value of P  .8664     .9545    .9973    .99994

Clearly the maximum for any $\lambda$ will be found by making $(2s-1)/\lambda^2$ equal to unity, or, if $\lambda^2$ be an odd number, $s = \frac12(\lambda^2 + 1)$ and $\frac12(\lambda^2 + 1) - 1$ will give equal limits. If $\lambda^2$ be an even number, then $s = \frac12\lambda^2$ will give the highest limit.

* It was first proved in the Recueil des sciences mathématiques, T. II, according to Liouville, but I cannot trace this reference at all. It was translated from Russian into French in Liouville's Journal de mathématiques, Vol. XII, pp.
177—184, Paris, 1867. The proof there given is somewhat lengthy, and at first sight the result might appear more general than (ii); but this is not so. Assume $x = u + v + w + \cdots$ and suppose $u, v, w, \ldots$ uncorrelated, so that $\sigma_x^2 = \sigma_u^2 + \sigma_v^2 + \sigma_w^2 + \cdots$; then we have, with minor differences of notation and terminology (especially the use of the words "mathematical expectation" for our moments), Tchebycheff's own phrasing of his theorem. The remark of Dr Anderson (Biometrika, Vol. X, p. 269) with regard to the neglect of the theory of "mathematical expectation" by the English statistical school seems based on a misunderstanding of the moment method.

† This generalised form of Tchebycheff's Theorem was given by me in a paper for the Honours degree of the University of London in Statistics, October, 1915.

(2) Two Variates; Limit to the Frequency within an Elliptic Area round the Mean as Centre.

Let the law of frequency be $z = \phi(x, y)$, let the standard deviations of $x$ and $y$ be $\sigma_1$, $\sigma_2$, and let $r$ be the coefficient of correlation between $x$ and $y$. Let us take as our ellipse

$$\frac{1}{1 - r^2}\Big(\theta_{11}\frac{x^2}{\sigma_1^2} - 2r\theta_{12}\frac{xy}{\sigma_1\sigma_2} + \theta_{22}\frac{y^2}{\sigma_2^2}\Big) = \chi_0^2,$$

$x$ and $y$ being measured as deviations from the mean. Then by giving special values to $\theta_{11}$, $\theta_{22}$, $\theta_{12}$ and $\chi_0^2$ we can get any ellipse we please. Further, since the curve is to be an ellipse, $r^2\theta_{12}^2 < \theta_{11}\theta_{22}$*, and we shall take $\theta_{11}$ and $\theta_{22}$ always positive. Thus $\chi^2$ and all its powers will invariably be positive.

Now consider $V = \iint\phi(x, y)\,dx\,dy$, the integration extending all over the space covered by the frequency surface, and write

$$I_s = \frac{1}{V}\iint\phi(x, y)\,\chi^{2s}\,dx\,dy.$$

Dividing both sides by $\chi_0^{2s}$,

$$\frac{I_s}{\chi_0^{2s}} = \frac{1}{V}\iint\phi(x, y)\Big(\frac{\chi}{\chi_0}\Big)^{2s}dx\,dy.$$

Take out all the values for which $\chi$ is greater than $\chi_0$; then

$$\frac{I_s}{\chi_0^{2s}} > \frac{1}{V}\iint\phi(x, y)\Big(\frac{\chi}{\chi_0}\Big)^{2s}dx\,dy,$$

where the integral extends over the area for which $\chi > \chi_0$. Hence

$$\frac{I_s}{\chi_0^{2s}} > \frac{1}{V}\iint\phi(x, y)\,dx\,dy,$$

the integral still extending over that area, i.e. $I_s/\chi_0^{2s}$ is greater than the chance of an observation falling outside the ellipse $\chi_0$.
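Let $P$ be the chance of an observation falling inside this ellipse; then we have at once

$$P > 1 - \frac{I_s}{\chi_0^{2s}} \qquad (iii)$$

Now we define

$$p_{ss'} = \frac{1}{V}\iint\phi(x, y)\,(x - \bar x)^s(y - \bar y)^{s'}\,dx\,dy = \frac{1}{V}\iint\phi(x, y)\,x^s y^{s'}\,dx\,dy$$

in our case, as the $s, s'$th product moment-coefficient about the mean. And it is very convenient to write

$$q_{ss'} = p_{ss'}/(\sigma_1^{s}\sigma_2^{s'}) \qquad (iv)$$

and term $q_{ss'}$ a reduced product moment-coefficient.

* We shall generally wish to have symmetry of expression between $x$ and $y$, and in this case we take $\theta_{22} = \theta_{11} = \theta$, say, and write $r\theta_{12}/\theta = \rho$; we shall then have as necessary condition for the ellipse $\rho < 1$.

It is clear that by simple expansion of the trinomial expression we can always find $I_s$ in terms of the $q_{ss'}$. We have accordingly to study the expansion of

$$\frac{1}{(1 - r^2)^s}\Big(\theta_{11}\frac{x^2}{\sigma_1^2} - 2r\theta_{12}\frac{xy}{\sigma_1\sigma_2} + \theta_{22}\frac{y^2}{\sigma_2^2}\Big)^s = \frac{1}{(1 - r^2)^s}\sum_{u=0}^{s}\sum_{m=0}^{s-u}\frac{s!}{(s-u-m)!\,m!\,u!}(-2r)^u\,\theta_{11}^{s-u-m}\theta_{22}^{m}\theta_{12}^{u}\,\frac{x^{2s-u-2m}\,y^{u+2m}}{\sigma_1^{2s-u-2m}\,\sigma_2^{u+2m}},$$

and if this value be substituted in the integral expression for $I_s$ we find

$$I_s = \frac{1}{(1 - r^2)^s}\sum_{u=0}^{s}\sum_{m=0}^{s-u}\frac{s!}{(s-u-m)!\,m!\,u!}(-2r)^u\,\theta_{11}^{s-u-m}\theta_{22}^{m}\theta_{12}^{u}\,q_{2s-u-2m,\,u+2m} \qquad (v)$$

The lower values can be equally readily found by the expansion of $\big(\theta_{11}x^2/\sigma_1^2 - 2r\theta_{12}xy/(\sigma_1\sigma_2) + \theta_{22}y^2/\sigma_2^2\big)^s$ in powers of $r$ by aid of the binomial. The first few cases are

$$I_1 = \frac{1}{1 - r^2}\{\theta_{11}q_{20} + \theta_{22}q_{02} - 2\theta_{12}r\,q_{11}\},$$

$$I_2 = \frac{1}{(1 - r^2)^2}\{\theta_{11}^2q_{40} + \theta_{22}^2q_{04} + 2\theta_{11}\theta_{22}q_{22} - 4\theta_{12}r(\theta_{11}q_{31} + \theta_{22}q_{13}) + 4\theta_{12}^2r^2q_{22}\},$$

$$I_3 = \frac{1}{(1 - r^2)^3}\{\theta_{11}^3q_{60} + \theta_{22}^3q_{06} + 3\theta_{11}\theta_{22}(\theta_{11}q_{42} + \theta_{22}q_{24}) - 6\theta_{12}r(\theta_{11}^2q_{51} + \theta_{22}^2q_{15} + 2\theta_{11}\theta_{22}q_{33}) + 12\theta_{12}^2r^2(\theta_{11}q_{42} + \theta_{22}q_{24}) - 8\theta_{12}^3r^3q_{33}\},$$

$$I_4 = \frac{1}{(1 - r^2)^4}\{\theta_{11}^4q_{80} + \theta_{22}^4q_{08} + 4\theta_{11}\theta_{22}(\theta_{11}^2q_{62} + \theta_{22}^2q_{26}) + 6\theta_{11}^2\theta_{22}^2q_{44} - 8\theta_{12}r(\theta_{11}^3q_{71} + \theta_{22}^3q_{17} + 3\theta_{11}^2\theta_{22}q_{53} + 3\theta_{11}\theta_{22}^2q_{35}) + 24\theta_{12}^2r^2(\theta_{11}^2q_{62} + \theta_{22}^2q_{26} + 2\theta_{11}\theta_{22}q_{44}) - 32\theta_{12}^3r^3(\theta_{11}q_{53} + \theta_{22}q_{35}) + 16\theta_{12}^4r^4q_{44}\} \qquad (vi)$$

These expressions simplify for various cases, but it is clear that for the general case of unknown type of distribution we shall have to find very high product moments from the observations in order to use our generalised Tchebycheff's Theorem.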
Otherwise we shall have to make assumptions as to the relations between high order and low order $q$'s.

Since generally $q_{20} = q_{02} = 1$ and $q_{11} = r$, we have

$$I_1 = \frac{1}{1 - r^2}(\theta_{11} + \theta_{22} - 2\theta_{12}r^2).$$

This suggests that for all cases we are likely to get simplified results if we take $\theta_{11} = \theta_{22} = \theta_{12} = 1$, when we find $I_1 = 2$. In other words, simplification arises if we make our ellipse that of the normal contours, although of course for the general case this will not be a contour of equal probability, although it may roughly approximate to it. Thus we find for this case

$$I_2 = \frac{1}{(1 - r^2)^2}\{q_{40} + q_{04} + 2q_{22} - 4r(q_{31} + q_{13}) + 4r^2q_{22}\},$$

$$I_3 = \frac{1}{(1 - r^2)^3}\{q_{60} + q_{06} + 3(q_{42} + q_{24}) - 6r(q_{51} + q_{15} + 2q_{33}) + 12r^2(q_{42} + q_{24}) - 8r^3q_{33}\},$$

$$I_4 = \frac{1}{(1 - r^2)^4}\{q_{80} + q_{08} + 4(q_{62} + q_{26}) + 6q_{44} - 8r(q_{71} + q_{17} + 3q_{53} + 3q_{35}) + 24r^2(q_{62} + q_{26} + 2q_{44}) - 32r^3(q_{53} + q_{35}) + 16r^4q_{44}\} \qquad (vii)$$

and the general value of $I_s$ will be

$$I_s = \frac{1}{(1 - r^2)^s}\sum_{u=0}^{s}\sum_{m=0}^{s-u}\frac{s!}{(s-u-m)!\,m!\,u!}(-2r)^u\,q_{2s-u-2m,\,u+2m} \qquad (viii)$$

For the case of a normal distribution the $q$'s are all given in terms of $r$ (Biometrika, Vol. XII, p. 87), and on substitution we find

$$I_1 = 2, \quad I_2 = 8, \quad I_3 = 48, \quad \ldots,$$

and generally $I_s = 2s(2s-2)(2s-4)\cdots2$*, which can be shown directly, thus:

$$I_s = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\iint e^{-\frac12\chi^2}\chi^{2s}\,dx\,dy = \int_0^\infty e^{-\frac12\chi^2}\chi^{2s+1}\,d\chi = 2s(2s-2)(2s-4)\cdots2\int_0^\infty e^{-\frac12\chi^2}\chi\,d\chi,$$

if we integrate by parts, $= 2s(2s-2)(2s-4)\cdots2$.

Accordingly our generalised Tchebycheff's limit becomes

$$P > 1 - \frac{2s(2s-2)(2s-4)\cdots2}{\chi_0^{2s}},$$

and our best value of $s$ will be determinable from $2s \leq \chi_0^2$, or $s$ must be the greatest integer less than, or the integer equal to, $\frac12\chi_0^2$.

Now the actual volume of the frequency surface inside the contour

$$\chi_0^2 = \frac{1}{1 - r^2}\Big(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\Big)$$

is known to be $1 - e^{-\frac12\chi_0^2}$, and it is thus easy to test the present generalised Tchebycheff limit as applied to this case.
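The general formula (v), with all the $\theta$'s equal to 1 as in (viii), and the closed form $I_s = 2s(2s-2)\cdots2$ can be cross-checked numerically. The sketch below is a hedged illustration: `I_s_from_q`, `normal_q` and `radial_integral` are hypothetical names, the normal product moments $q_{31} = 3r$ and $q_{22} = 1 + 2r^2$ are standard values assumed here rather than quoted from the text, and the radial integral is evaluated by Simpson's rule.

```python
import math
from math import factorial


def I_s_from_q(s, q, r, th11=1.0, th22=1.0, th12=1.0):
    """Formula (v): I_s in terms of reduced product moments q[(i, j)] = q_ij."""
    total = 0.0
    for u in range(s + 1):
        for m in range(s - u + 1):
            coeff = factorial(s) // (factorial(s - u - m) * factorial(m) * factorial(u))
            total += (coeff * (-2 * r) ** u * th11 ** (s - u - m) * th22**m * th12**u
                      * q[(2 * s - u - 2 * m, u + 2 * m)])
    return total / (1 - r * r) ** s


def normal_q(r):
    """Reduced product moments of a standardised normal surface, up to order 4."""
    return {(2, 0): 1, (0, 2): 1, (1, 1): r,
            (4, 0): 3, (0, 4): 3, (3, 1): 3 * r, (1, 3): 3 * r, (2, 2): 1 + 2 * r * r}


def radial_integral(s, upper=40.0, steps=20000):
    """Composite Simpson estimate of the integral of chi^(2s+1) exp(-chi^2/2) over [0, upper]."""
    def f(c):
        return c ** (2 * s + 1) * math.exp(-0.5 * c * c)

    h = upper / steps
    total = f(0.0) + f(upper) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    return total * h / 3


r = 0.5
print(I_s_from_q(1, normal_q(r), r), I_s_from_q(2, normal_q(r), r))  # 2 and 8, up to rounding
for s in range(1, 6):
    print(s, 2**s * factorial(s), round(radial_integral(s), 6))
```

The value $I_2 = 8$ comes out independent of $r$, which agrees with the claim that $I_s$ for the normal surface depends on $s$ alone.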
* This result is almost at once extensible to any number of variates following the normal distribution, but as the actual value of the probability is known there is no value in writing down this limiting value.

TABLE II. Generalised Tchebycheff Limit applied to the Probability that an association of two variables lies inside a given contour $\chi_0^2$ of a normal frequency surface.

  χ₀²    Actual probability    Minimum value of P
   4          .8647               .5000  (I₂)
   5          .9179               .6800  (I₂)
   6          .9502               .7778  (I₃)
   7          .9698               .8600  (I₃)
   8          .9817               .9062  (I₄)
   9          .9889               .9415  (I₄)
  10          .9933               .9616  (I₅)
  12          .9975               .9846  (I₆)
  14          .9991               .9939  (I₇)
  16          .9997               .9976  (I₈)
  18          .9999               .9991  (I₉)
  20          .99995              .99964 (I₁₀)

Here, as in the case of a single variate, the generalised Tchebycheff limit is not very useful for low values of $\chi_0^2$. But if in any particular type of observation we consider it desirable to look with suspicion on an observation which has occurred and yet the odds against which are greater than 50 to 1, the Tchebycheff limit may be of value. As illustration, suppose two variates are correlated with intensity .7: what suspicion should we cast on an observation which gave the deviation of one variate 3.8 times its standard deviation and of the other 3.2 times? Here

$$\chi_0^2 = \frac{1}{1 - r^2}\Big(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\Big) = \frac{1}{.51}\{(3.8)^2 - 1.4(3.8)(3.2) + (3.2)^2\} = 15.01, \text{ or say } 15.$$

Then

$$P > 1 - \frac{2^7(7!)}{15^7} = .9962,$$

or the odds are greater than 250 to 1 against it. Actually the probability of anything less unusual than this is .9994, so that the actual odds against anything as unusual or more unusual are about 1700 to 1. For many purposes the odds of 250 to 1 would amply suffice to mark suspicion, although of course in the case of normal frequency it would be as easy, or even easier, to calculate the real probability as the generalised Tchebycheff limit.

The chief interest of the investigation thus far is to show that unless we use an $I_s$ of a high order the Tchebycheff limit is unlikely to be of very much service.
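Both normal-case tables, and the worked example with $r = .7$, follow from closed forms: $\mu_{2s} = (2s-1)(2s-3)\cdots1\cdot\sigma^{2s}$ for the single variate, and $I_s = 2s(2s-2)\cdots2$ for the normal surface. The sketch below recomputes Tables I and II under those formulas. The function names are illustrative, and `math.erf` is used only to supply the exact single-variate normal probability.

```python
import math


def one_variate_limit(lam, s):
    """(i) for a normal curve: 1 - (2s-1)(2s-3)...1 / lam^(2s)."""
    return 1 - math.prod(range(2 * s - 1, 0, -2)) / lam ** (2 * s)


def two_variate_limit(chi0_sq, s):
    """(iii) for a normal surface: 1 - 2s(2s-2)...2 / chi0^(2s)."""
    return 1 - 2**s * math.factorial(s) / chi0_sq**s


def chi0_squared(x, y, r):
    """chi0^2 for deviations x, y in standard-deviation units with correlation r."""
    return (x * x - 2 * r * x * y + y * y) / (1 - r * r)


# Table I: the best s for each lambda, against the exact normal probability.
for lam in (1.5, 2, 3, 4):
    limit, s = max((one_variate_limit(lam, s), s) for s in range(1, 11))
    print(lam, s, round(limit, 5), round(math.erf(lam / math.sqrt(2)), 5))

# Table II: s is the greatest integer not exceeding chi0^2 / 2.
for chi0_sq in (4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20):
    s = chi0_sq // 2
    print(chi0_sq, s, round(two_variate_limit(chi0_sq, s), 5), round(1 - math.exp(-chi0_sq / 2), 5))

# Worked example: r = .7, deviations of 3.8 and 3.2 standard deviations.
print(round(chi0_squared(3.8, 3.2, 0.7), 2), round(two_variate_limit(15, 7), 4))  # 15.01 0.9962
```

Where two values of $s$ tie (for example $\lambda = 3$, $s = 4$ or 5), `max` reports the larger $s$. The limit itself is the same.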
We can obtain it in the case of material following a normal distribution, but then we know the exact result and do not need it!

I have considered very carefully the possibilities of deducing higher $q$'s from lower $q$'s for non-normal systems on various hypotheses as to the nature of the regression and the scedasticity. The simplest hypothesis is to suppose linearity of regression, homoscedasticity and homocliticity of both sets of arrays. Let

$$\beta_{2s-2} = \frac{S(x^{2s})}{N\sigma_1^{2s}}, \qquad \frac{\beta_{2s-1}}{\sqrt{\beta_1}} = \frac{S(x^{2s+1})}{N\sigma_1^{2s+1}},$$

as usual; let a single dash mark the $\beta$'s for the $y$ variate, double dashes the $\beta$'s for the $y$ arrays of $x$'s, and triple dashes the $\beta$'s for the $x$ arrays of $y$'s. Then, if $\bar y$ be the mean of the $x$ array of $y$'s,

$$\sqrt{\beta_1'} = \frac{S(y^3)}{N\sigma_2^3} = \frac{1}{N}\Sigma S\,\frac{(\bar y + y')^3}{\sigma_2^3},$$

where $y'$ is measured from the mean of the array, $S$ is the sum for all members of the array and $\Sigma$ the sum for all arrays. Thus, if $\bar y = r\frac{\sigma_2}{\sigma_1}x$ be the regression line and $n_x$ the number in the $x$ array,

$$\sqrt{\beta_1'} = \frac{r^3}{N}\Sigma\frac{n_x x^3}{\sigma_1^3} + \frac{3r}{N}\Sigma\frac{n_x x}{\sigma_1}\frac{S(y'^2)}{n_x\sigma_2^2} + \frac{1}{N}\Sigma n_x\frac{S(y'^3)}{n_x\sigma_2^3},$$

since $S(y'^2)/n_x$ and $S(y'^3)/n_x$ are to be the same for every array. Thus

$$\sqrt{\beta_1'} = r^3\sqrt{\beta_1} + (1 - r^2)^{3/2}\sqrt{\beta_1'''}, \qquad \sqrt{\beta_1'''} = \frac{\sqrt{\beta_1'} - r^3\sqrt{\beta_1}}{(1 - r^2)^{3/2}} \qquad (ix)$$

Similarly

$$\sqrt{\beta_1''} = \frac{\sqrt{\beta_1} - r^3\sqrt{\beta_1'}}{(1 - r^2)^{3/2}} \qquad (x)$$

Thus it is impossible in homoclitic systems for the skewness of the arrays to be equal to the skewness of the marginal totals if there be correlation*.

Again we have

$$\beta_2' = \frac{1}{N}\Sigma S\,\frac{(\bar y + y')^4}{\sigma_2^4} = r^4\beta_2 + 6r^2(1 - r^2) + \beta_2'''(1 - r^2)^2,$$

or

$$\beta_2''' = \frac{\beta_2' - r^4\beta_2 - 6r^2(1 - r^2)}{(1 - r^2)^2},$$

or, again,

$$\beta_2''' - 3 = \frac{\beta_2' - 3 - r^4(\beta_2 - 3)}{(1 - r^2)^2} \qquad (xi)$$

and similarly

$$\beta_2'' - 3 = \frac{\beta_2 - 3 - r^4(\beta_2' - 3)}{(1 - r^2)^2} \qquad (xii)$$

* We note that if the marginal totals be both without skewness, all the arrays will also be symmetrical. Equations (xi) and (xii) show us that if the marginal totals be mesokurtic the arrays will also be mesokurtic.
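One way to see (ix) and (xi) at work is to build a system that is linear, homoscedastic and homoclitic by construction: $y = b\,x + e$, with $e$ independent of $x$. Every $x$ array of $y$'s is then the distribution of $e$, shifted. The discrete distributions and the slope below are invented for illustration and are not from the text. The sketch computes the exact moments by enumeration and compares both sides of (ix) and (xi).

```python
import math
from itertools import product

# Hypothetical discrete distributions as (value, probability); not from the text.
X = [(0.0, 0.5), (1.0, 0.3), (4.0, 0.2)]   # skewed x margin
E = [(-1.0, 0.6), (1.5, 0.4)]              # skewed array (residual) distribution
b = 0.8                                    # regression slope of y on x


def central_moments(dist):
    """Return (mean, mu2, mu3, mu4) of a list of (value, probability)."""
    mean = sum(v * p for v, p in dist)
    mu = [sum((v - mean) ** k * p for v, p in dist) for k in (2, 3, 4)]
    return (mean, *mu)


# y = b*x + e with e independent of x: every x-array of y's has the same shape.
Y = [(b * x + e, px * pe) for (x, px), (e, pe) in product(X, E)]

_, m2x, m3x, m4x = central_moments(X)
_, m2y, m3y, m4y = central_moments(Y)
_, m2e, m3e, m4e = central_moments(E)

r = b * math.sqrt(m2x / m2y)               # correlation of x and y
sqrt_b1, sqrt_b1_dash = m3x / m2x**1.5, m3y / m2y**1.5
sqrt_b1_triple = m3e / m2e**1.5            # skewness of every x-array of y's
b2, b2_dash, b2_triple = m4x / m2x**2, m4y / m2y**2, m4e / m2e**2

# (ix): both printed numbers should agree.
print(sqrt_b1_triple, (sqrt_b1_dash - r**3 * sqrt_b1) / (1 - r**2) ** 1.5)
# (xi): both printed numbers should agree.
print(b2_triple - 3, (b2_dash - 3 - r**4 * (b2 - 3)) / (1 - r**2) ** 2)
```

The square roots of $\beta_1$ are taken with the sign of $\mu_3$, which is what makes (ix) hold for skewness in either direction.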
Now consider $q_{22}$:

$$q_{22} = \frac{S(x^2y^2)}{N\sigma_1^2\sigma_2^2} = \frac{1}{N}\Sigma S\,\frac{x^2(\bar y + y')^2}{\sigma_1^2\sigma_2^2} = \beta_2r^2 + (1 - r^2) = \beta_2'r^2 + (1 - r^2) \text{ by symmetry} \qquad (xiii)$$

Hence it follows that in linear homoscedastic systems $\beta_2 = \beta_2'$, and accordingly

$$\beta_2'' - 3 = \beta_2''' - 3 = (\beta_2 - 3)\frac{1 + r^2}{1 - r^2} \qquad (xiv)$$

This is of interest as indicating that in linear homoscedastic systems the one with mesokurtic margins is the only one in which the kurtosis of the arrays can be the same as that of the margins.

Again,

$$q_{33} = \frac{S(x^3y^3)}{N\sigma_1^3\sigma_2^3} = \frac{1}{N}\Sigma S\,\frac{x^3(\bar y + y')^3}{\sigma_1^3\sigma_2^3} = r^3\beta_4 + 3r(1 - r^2)\beta_2 + \sqrt{\beta_1}\sqrt{\beta_1'} - r^3\beta_1 \qquad (xv)$$

$$= r^3\beta_4' + 3r(1 - r^2)\beta_2' + \sqrt{\beta_1}\sqrt{\beta_1'} - r^3\beta_1' \qquad (xvi)$$

by symmetry. It follows from (xv) and (xvi) that it is needful for

$$\beta_4 - \beta_1 = \beta_4' - \beta_1' \qquad (xvii)$$

Finally we have

$$q_{44} = \frac{S(x^4y^4)}{N\sigma_1^4\sigma_2^4} = \frac{1}{N}\Sigma S\,\frac{x^4(\bar y + y')^4}{\sigma_1^4\sigma_2^4} = r^4\beta_6 + 6r^2(1 - r^2)\beta_4 - 4r^4\beta_3 + 4r\beta_3\sqrt{\beta_1'/\beta_1} + \beta_2\beta_2'''(1 - r^2)^2,$$

or

$$q_{44} = r^4\beta_6 + 6r^2(1 - r^2)\beta_4 - 4r^4\beta_3 + 4r\beta_3\sqrt{\beta_1'/\beta_1} - r^4\beta_2^2 - 6r^2(1 - r^2)\beta_2 + \beta_2\beta_2'$$

$$= r^4\beta_6' + 6r^2(1 - r^2)\beta_4' - 4r^4\beta_3' + 4r\beta_3'\sqrt{\beta_1/\beta_1'} - r^4\beta_2'^2 - 6r^2(1 - r^2)\beta_2' + \beta_2'\beta_2 \qquad (xviii)$$

which again involves the complicated $\beta$-relation

$$r^4(\beta_6 - \beta_6') + 6r^2(1 - r^2)(\beta_4 - \beta_4') - 4r^4(\beta_3 - \beta_3') + 4r\,\frac{\beta_3\beta_1' - \beta_3'\beta_1}{\sqrt{\beta_1\beta_1'}} = 0 \qquad (xix)$$

It is difficult to see how the form of variation of one character can be related by the correlation between that and another character to the form of variation of the second character, as (xix) would indicate. If it were, we should get into great difficulties in dealing with similar conditions to (xix) for a large number of characters with different correlations. If, as it appears to me, (xix) would need to be satisfied independently of $r$, then we must have […] (xx).

The second of (xx) by aid of (xvii) leads us to […], whence $\beta_1 = \beta_1'$; and as $\beta_2 = \beta_2'$, it follows that $\beta_3 = \beta_3'$, $\beta_4 = \beta_4'$, $\beta_6 = \beta_6'$; that is to say, the total frequencies of the two correlated characters must possess variation practically of the same type. Now I find this is very far from being the case in distributions which differ widely from the normal correlation surface. Thus it follows that the hypothesis of homoscedasticity, linear regression and homocliticity fails for such cases. I therefore modified the linear regression and adopted skew regression, homoscedasticity and homocliticity. I again got relations between the $\beta$'s, but of a much higher degree of complexity. These were tested by Mr A. W. Young and myself on the skew correlation surfaces of barometric data, but were found to fail. Direct investigation afterwards showed me that while the regression differed to some extent from linearity, it was the homoscedasticity which was in the first place the erroneous assumption. The arrays were very far from having the same standard deviations. Until, therefore, some theoretical advance is made in the investigation of skew regression surfaces, especially for those which have linear or nearly linear regression combined with heteroscedasticity, it is unlikely that we shall have any adequate method of determining high product moment-coefficients from low ones. We are accordingly thrown back on direct determination of the high product moment-coefficients if we wish to determine a Tchebycheff limit. The work of determining $I_4$ would involve a whole round of 8th order moment-coefficients and product moment-coefficients. It would then give us a limit of the order .95 for an actual probability of .99.
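Lower order $I$'s would hardly give values of much importance, and it may be questioned whether a rough limit of the kind required could not be better obtained by inserting the desired contour on a "scatter diagram" and simply counting the dots which fall outside it, or indeed by taking the best fitting normal surface to the actual distribution. The reader may question whether something better could not be achieved for skew correlation Tchebycheff limits by some contour other than the ellipse. This would undoubtedly be the case if we knew the forms of the skew-correlation contours, for then we should undoubtedly choose this equi-probable locus for our boundary. But as we have only an empirical knowledge of these (experience shows them to be frequently pear or lemniscate loop shaped), we get little help for our present problem.

One other aspect of the matter may be briefly considered. We may find a limit to the probability that an event or individual will lie within a circle of radius $R$ round the origin. This corresponds to Schols' Problem*. It may be useful to have a Tchebycheff limit for this case, although we have yet to meet the particular instance in practical statistics where it would be of marked advantage†. We can best investigate this problem de novo. Let

$$I_s = \iint(x^2 + y^2)^s\,\phi(x, y)\,dx\,dy.$$

Then, if $R$ be any radius round the origin,

$$\frac{I_s}{R^{2s}} = \iint\Big(\frac{x^2 + y^2}{R^2}\Big)^s\phi(x, y)\,dx\,dy,$$

the integral being taken to include the whole volume of the probability surface $z = \phi(x, y)$. Now pick out those elements of the integral for which $x^2 + y^2$ is $> R^2$; then

$$\frac{I_s}{R^{2s}} > \iint\Big(\frac{x^2 + y^2}{R^2}\Big)^s\phi(x, y)\,dx\,dy,$$

where the integration extends over the above-mentioned elements only, and is therefore $> \iint\phi(x, y)\,dx\,dy$; but this integral is $1 - P$, where $P$ is the probability that the individual falls within the distance $R$ of the origin. Thus the Tchebycheff limit is given by

$$P > 1 - \frac{I_s}{R^{2s}}.$$

Now clearly we have

$$I_s = \iint(x^2 + y^2)^s\,\phi(x, y)\,dx\,dy = p_{2s,0} + s\,p_{2s-2,2} + \frac{s(s-1)}{1\cdot2}p_{2s-4,4} + \cdots + \frac{s(s-1)}{1\cdot2}p_{4,2s-4} + s\,p_{2,2s-2} + p_{0,2s}.$$

Now write $R = \lambda\sqrt{\sigma_1^2 + \sigma_2^2}$, and further take $\tan\theta = \sigma_2/\sigma_1$. Then

$$\frac{I_s}{R^{2s}} = \frac{1}{\lambda^{2s}}\Big\{\cos^{2s}\theta\,q_{2s,0} + s\cos^{2s-2}\theta\sin^2\theta\,q_{2s-2,2} + \frac{s(s-1)}{1\cdot2}\cos^{2s-4}\theta\sin^4\theta\,q_{2s-4,4} + \frac{s(s-1)(s-2)}{1\cdot2\cdot3}\cos^{2s-6}\theta\sin^6\theta\,q_{2s-6,6} + \cdots\Big\}.$$

For the particular case in which $s = 1$,

$$\frac{I_1}{R^2} = \frac{1}{\lambda^2}(\cos^2\theta + \sin^2\theta) = \frac{1}{\lambda^2}.$$

For $s = 2$,

$$\frac{I_2}{R^4} = \frac{1}{\lambda^4}(\cos^4\theta\,\beta_2 + 2\cos^2\theta\sin^2\theta\,q_{22} + \sin^4\theta\,\beta_2').$$

* Over de Theorie der Fouten in Ruimte en in het platte Vlak, Verhandlingen der K. Akademie van Wetenschapen, Deel XV, pp. 1—68, Amsterdam, 1875. Translated into French in the Annales de l'École polytechnique de Delft, Tome II, pp. 123—178, Leide, 1886.

† It is conceivable that the solutions given might be serviceable in the case of testing machine guns against a target.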
Now a good approximation to $q_{22}$ by (xiii) must be $\frac12(\beta_2 + \beta_2')r^2 + (1 - r^2)$; hence, substituting,

$$\lambda^4\frac{I_2}{R^4} = (\beta_2 - 3)(\cos^4\theta + r^2\cos^2\theta\sin^2\theta) + (\beta_2' - 3)(\sin^4\theta + r^2\cos^2\theta\sin^2\theta) + 3 - 4(1 - r^2)\cos^2\theta\sin^2\theta.$$

For the special case of normal distribution, if we write $\kappa^2 = 4(1 - r^2)\cos^2\theta\sin^2\theta$,

$$\frac{I_2}{R^4} = \frac{1}{\lambda^4}(3 - \kappa^2).$$

Again,

$$\frac{I_3}{R^6} = \frac{1}{\lambda^6}\{\cos^6\theta\,\beta_4 + 3\cos^2\theta\sin^2\theta(\cos^2\theta\,q_{42} + \sin^2\theta\,q_{24}) + \sin^6\theta\,\beta_4'\},$$

and for a normal distribution

$$\frac{I_3}{R^6} = \frac{1}{\lambda^6}(15 - 9\kappa^2).$$

Further general cases can be at once written down, but it will suffice to give here the leading values of $I_s$ for a normal distribution:

$$\frac{I_1}{R^2} = \frac{1}{\lambda^2}, \qquad \frac{I_2}{R^4} = \frac{1}{\lambda^4}(3 - \kappa^2), \qquad \frac{I_3}{R^6} = \frac{1}{\lambda^6}(15 - 9\kappa^2),$$

$$\frac{I_4}{R^8} = \frac{1}{\lambda^8}(105 - 90\kappa^2 + 9\kappa^4), \qquad \frac{I_5}{R^{10}} = \frac{1}{\lambda^{10}}(945 - 1050\kappa^2 + 225\kappa^4),$$

$$\frac{I_6}{R^{12}} = \frac{1}{\lambda^{12}}(10395 - 14175\kappa^2 + 4725\kappa^4 - 225\kappa^6),$$

$$\frac{I_7}{R^{14}} = \frac{1}{\lambda^{14}}(135{,}135 - 218{,}295\kappa^2 + 99{,}225\kappa^4 - 11{,}025\kappa^6),$$

$$\frac{I_8}{R^{16}} = \frac{1}{\lambda^{16}}(2{,}027{,}025 - 3{,}783{,}780\kappa^2 + 2{,}182{,}950\kappa^4 - 396{,}900\kappa^6 + 11{,}025\kappa^8).$$

The following table gives the maximum Tchebycheff limit for the probability of an individual falling within the circle of radius $\lambda\sqrt{\sigma_1^2 + \sigma_2^2}$ for various values of $\kappa^2 = 4(1 - r^2)\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)^2$. ($I_s$) denotes the particular $I$ from which the maximum limit is found.
($I_8$?) denotes that the corresponding numerical value is a Tchebycheff limit found from $I_8$, but it is not known whether $I_9$ would not give a higher value, $I_9$ not having been tabled. The second part of the table provides the values of $I_s$ from which the first part has been computed. They may be useful in the determination of the Tchebycheff limits for other values of $\lambda$.

I. Generalised Tchebycheff Limit for Schols' Problem with a Normal Distribution. Radius of circle $= \lambda\sqrt{\sigma_1^2 + \sigma_2^2}$, $\kappa^2 = 4(1 - r^2)\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)^2$.

  κ²    λ = 1    λ = 1.25   λ = 1.5      λ = 2.0          λ = 2.5       λ = 3             λ = 3.5          λ = 4.0
  0.0   0 (I₁)   .36 (I₁)   .5556 (I₁)   .8125 (I₂)       .9386 (I₃)    .98400 (I₄=I₅)    .996924 (I₆)     .999528 (I₈)
  0.1   0 (I₁)   .36 (I₁)   .5556 (I₁)   .81875 (I₂)      .9422 (I₃)    .98574 (I₅)       .997329 (I₆)     .999611 (I₈?)
  0.2   0 (I₁)   .36 (I₁)   .5556 (I₁)   .8250 (I₂)       .9459 (I₃)    .98740 (I₅)       .997707 (I₆)     .999685 (I₈?)
  0.3   0 (I₁)   .36 (I₁)   .5556 (I₁)   .83125 (I₂)      .9496 (I₃)    .98899 (I₅)       .998109 (I₇)     .999749 (I₈?)
  0.4   0 (I₁)   .36 (I₁)   .5556 (I₁)   .8375 (I₂)       .9538 (I₄)    .99050 (I₅)       .998478 (I₇)     .999805 (I₈?)
  0.5   0 (I₁)   .36 (I₁)   .5556 (I₁)   .84375 (I₂)      .9592 (I₄)    .99193 (I₅)       .998806 (I₇)     .999853 (I₈?)
  0.6   0 (I₁)   .36 (I₁)   .5556 (I₁)   .8500 (I₂=I₃)    .9645 (I₄)    .99333 (I₆)       .999096 (I₈)     .999893 (I₈?)
  0.7   0 (I₁)   .36 (I₁)   .5556 (I₁)   .8641 (I₃)       .9696 (I₄)    .99490 (I₆)       .999380 (I₈)     .999927 (I₈?)
  0.8   0 (I₁)   .36 (I₁)   .5654 (I₂)   .8781 (I₃)       .9746 (I₄)    .99631 (I₆)       .999609 (I₈)     .999954 (I₈?)
  0.9   0 (I₁)   .36 (I₁)   .5852 (I₂)   .8922 (I₃)       .9809 (I₅)    .99770 (I₇)       .999788 (I₈)     .999975 (I₈?)
  1.0   0 (I₁)   .36 (I₁)   .6049 (I₂)   .90625 (I₃=I₄)   .9879 (I₆)    .99895 (I₇)       .999920 (I₈?)    .999991 (I₈?)

II. Values of the functions $I_s$ forming the numerator of the Tchebycheff Limit to the probability that an individual will fall, for the case of normal bivariate frequency, within a given circle of radius $\lambda\sqrt{\sigma_1^2 + \sigma_2^2}$.

  κ²    I₁   I₂    I₃     I₄       I₅       I₆            I₇             I₈
  0.0   1    3.0   15.0   105.00   945.00   10,395.000    135,135.000    2,027,025.0000
  0.1   1    2.9   14.1    96.09   842.25    9,024.525    114,286.725    1,670,080.7025
  0.2   1    2.8   13.2    87.36   744.00    7,747.200     95,356.800    1,354,429.4400
  0.3   1    2.7   12.3    78.81   650.25    6,561.675     78,279.075    1,077,729.5025
  0.4   1    2.6   11.4    70.44   561.00    5,466.600     62,987.400      837,665.6400
  0.5   1    2.5   10.5    62.25   476.25    4,460.625     49,415.625      631,949.0625
  0.6   1    2.4    9.6    54.24   396.00    3,542.400     37,497.600      458,317.4400
  0.7   1    2.3    8.7    46.41   320.25    2,710.575     27,167.175      314,534.9025
  0.8   1    2.2    7.8    38.76   249.00    1,963.800     18,358.200      198,392.0400
  0.9   1    2.1    6.9    31.29   182.25    1,300.725     11,004.525      107,705.9025
  1.0   1    2.0    6.0    24.00   120.00      720.000      5,040.000       40,320.0000

The reader may be curious to know whether the Tchebycheff limit gives a better result for Schols' circles than for the elliptic contours. The actual probability of an individual falling within the circle of radius $\lambda\sqrt{\sigma_1^2 + \sigma_2^2}$ is given by

$$P = 1 - \frac{\kappa}{\pi}\int_0^\pi \frac{e^{-\lambda^2(1 - \kappa'\cos\theta)/\kappa^2}}{1 - \kappa'\cos\theta}\,d\theta,$$

where $\kappa' = \sqrt{1 - \kappa^2}$ and $\kappa^2 = 4(1 - r^2)\sigma_1^2\sigma_2^2/(\sigma_1^2 + \sigma_2^2)^2$ as before. I have not succeeded in finding any rapidly converging expansion for this expression*, and have been reduced to evaluating its argument and using a quadrature formula. Thus for $\lambda = 2$, $\kappa^2 = .4$, I find $P = .9633694$.

* Unfortunately Schols has not tabled $P$, but only gives the values of $\lambda$ for ten values of $\kappa'$, which occur when $P = 1/2$, i.e. radial values for generalised "probable errors."

The process is not as long as it might seem. Indeed, if we only need four decimal places, it is quite adequate to integrate only through the first quadrant; the second contributes nothing of importance. The value given by the last Tchebycheff limit is $P > .8375$. This is of the same order of divergence as we found for the elliptic contour, i.e. for $\chi_0^2 = 7$ we had $P = .9698$, with a Tchebycheff limit $P > .8600$.
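The Schols table and the quadrature can both be reproduced from formulas already given: the leading polynomials for $I_s$ in $\kappa^2$, and the integral for $P$. The sketch below is a hedged illustration. It takes the maximum of $1 - I_s/\lambda^{2s}$ over the tabled $s = 1, \ldots, 8$, and evaluates the integral with composite Simpson's rule in place of whatever quadrature formula the author used. The names are hypothetical, and $\kappa^2$ must lie in $(0, 1]$.

```python
import math

# Coefficients of kappa^0, kappa^2, kappa^4, ... in I_s for the normal case, as listed above.
I_POLY = {
    1: [1],
    2: [3, -1],
    3: [15, -9],
    4: [105, -90, 9],
    5: [945, -1050, 225],
    6: [10395, -14175, 4725, -225],
    7: [135135, -218295, 99225, -11025],
    8: [2027025, -3783780, 2182950, -396900, 11025],
}


def I_s(s, kappa_sq):
    """Value of I_s (the numerator of the limit) for a given kappa^2."""
    return sum(c * kappa_sq**k for k, c in enumerate(I_POLY[s]))


def best_schols_limit(lam, kappa_sq):
    """Maximum of 1 - I_s / lam^(2s) over the tabled s = 1..8, returned as (limit, s)."""
    return max((1 - I_s(s, kappa_sq) / lam ** (2 * s), s) for s in I_POLY)


def schols_probability(lam, kappa_sq, steps=2000):
    """P = 1 - (kappa/pi) * integral over [0, pi] of exp(-lam^2 (1 - k' cos t)/kappa^2) / (1 - k' cos t),
    with k' = sqrt(1 - kappa^2), by composite Simpson's rule (steps must be even)."""
    kappa, k_dash = math.sqrt(kappa_sq), math.sqrt(1 - kappa_sq)

    def f(t):
        d = 1 - k_dash * math.cos(t)
        return math.exp(-lam**2 * d / kappa_sq) / d

    h = math.pi / steps
    total = f(0.0) + f(math.pi) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    return 1 - kappa / math.pi * total * h / 3


print(best_schols_limit(2, 0.4))            # limit about .8375, from I_2
print(round(schols_probability(2, 0.4), 5))
```

A fine Simpson grid is used here only because it is cheap. As the text notes, the second quadrant of the integral contributes little at four decimal places.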
Thus the measure of approach does not seem very close in this case until we reach higher values of $\lambda$.

On the whole we must express disappointment at the results of the Tchebycheff process. We had found Tchebycheff's own limit, based only on the second moment, of small practical value, although it is to be found occupying a prominent position in many continental works on probability. By extending it to higher moments and product-moments we have reached results which are great improvements on the original Tchebycheff limit, but the method still lacks the degree of approximation (except for probabilities over .99, say) which would make the result of real value in practical statistics. It is, however, conceivable that some more ingenious application of Tchebycheff's idea may lead to a limit more close to the actual value of the probability.

Biometrika, Vol. XII, Parts III and IV. Plate I. CHARLES BUCKMAN GORING in 1912, from a sketch by R. Ara

CHARLES B. GORING, 1870—1919.

"His work won full recognition from those who value scientific research. But it is a strange commentary on the Civil Service that, when so pressing a problem as prison reform still confronts us, so fine a worker and so human a man should have been given but the (medical) administration of a great prison instead of being called in to deal with a work for which all his gifts supremely fitted him." The Nation.

The late Charles B. Goring, M.D., was a distinguished student of University College, London, and afterwards a Fellow of that College. During his career his studies were far from confined to medicine: he was much interested in literature and philosophy, being awarded the John Stuart Mill Studentship in Philosophy of Mind and Logic in 1893, probably the only occasion on which that studentship has fallen to a medical exhibitioner*.
It was not therefore surprising to those who knew something of the remarkable powers of sympathy, the width of interests and the facility of expression which characterised Goring to find that he would write a blue-book as no blue-book has been written since the time of Matthew Arnold. He would handle facts, but at the same time he would appeal by his imagination and gift of language not only to the sociologist but to every man who is fascinated by the human spirit in all its diverse phases. Goring lived with his criminals, and studied them in and out of prison as the naturalist studies life in the field, and as the humanist studies mankind in its thronged resorts. Ask Goring what a convict's mind was like and he replied unhesitatingly: Like yours and mine. The same delicate spirit of sympathy that went out to his friends in both the joy and the sorrow of life drew the criminal to him, and the link often grew so close that the prison medical officer became the father-confessor: the psychology of the criminal mind was laid bare, and thus Goring's insight into criminality, its source and its motives, grew deeper and more and more coordinated as the years of service increased. Yet he never hesitated to exhibit the same tender sympathy alike to each new sojourner and to each oft-returning old prison inmate, while his own nature widened and strengthened under an environment which appears to dull the mentality of so many men in the prison service. Only last Christmas the present writer discussed with him the possibility of a series of essays on the psychology of crime, to be based indeed on facts acquired by scientific study, but to exhibit a structure from which the scaffolding should have been stript, and which should convince the beholder of the fitness of its purpose solely by the beauty and truth of its lines.
The path to truth is an arduous one, but when we have reached the summit we know by the width of our prospect into all neighbouring spheres that we have attained it: "Qui veram habet ideam, simul scit se veram habere ideam, nec de rei veritate potest dubitare" (he who has a true idea knows at the same time that he has a true idea, and cannot doubt the truth of the thing). We may now have to await that work for generations until another prison medical officer arises with Goring's scientific knowledge, discriminative sympathy and fine power of expression.

Battling with a gaol epidemic of influenza, when he should himself have been in bed, Goring fell an easy prey to pneumonia, which a strong will coupled with a spare and delicate frame cannot resist as their combination so often does many of Death's onsets. Goring died as he himself and his friends would have wished, doing his duty to the last at his post. His work was uncompleted, as good men's work so often must be. He was studying at the time the influence of the war on the nature and frequency of crime, a subject on which much will no doubt be said, but most probably with small scientific basis.

How shall we estimate his work, now that he has left us? We pass by the criticisms of men inside and outside the prison service, for they will leave neither in their own productions nor in their criticisms anything that will remain of permanent value to the new science of criminology as Goring outlined it; those who have had like experience lack either his insight, or his logical mentality, or his power of expression.

* Goring was awarded the Weldon Medal and premium by the University of Oxford in 1914, and never will a more fitting award of that medal be made; his work "The English Convict" was undoubtedly the finest contribution to biometry of its quinquennium.
They were not trained in the same school, nor had they the penetralia mentis, or rather what the Romans called ingenium, which through its very innateness carries mankind onward a step, assured, not doubtful or to be retraced. The contest between mediocrity and inspiration is as old as history, and the creator, the poet, wins, if not in life, yet thereafter. The world has yet to realise that achievement in every field is the product of trained imagination alone. Truth in science as in art is not the product of mere computation or careful observation, but of these guided by fertility of imagination. The creative mind has the potentiality of poet, artist and scientist within its grasp, and Goring's friends were never very certain in which category to place him. Perhaps the specification was as difficult, and would be as unprofitable, as it must ever be in the case of the Florentine, the master spirit of this type of mind.

To the present writer fell the good fortune to be in close touch with Goring (and his keen co-worker, H. E. Soper) for that long period of two and a half years during which "The English Convict" was in process of creation. He observed Goring in times of difficulty when the intertwined skein would not unravel, and in times of achievement when the tangle loosened as by magic. He realised the quiet persistency with which Goring grappled with the most intricate problems and the gentle satisfaction he exhibited when assimilating and recording a new and striking point. When finally the great manuscript had gone to press, we who had been working alongside him at our own tasks knew one and all that while we were losing a cherished daily intimacy, we had still individually gained a life-long friend.
We felt that had the world been rightly organised—which it ever fails to be—a post in our midst would have been found available for Charles Goring, for no man was better fitted than he to "study those agencies under social control that may improve or impair the racial qualities of future generations, either physically or mentally"; none we had come across was so well suited to make knowledge reached by scientific research a factor of social progress. He knew how to clothe scientific results in a garb which captivated the mental eye of him who listened to his spoken or read his written words. Goring was intended by nature for a master-craftsman of exposition. His sceptical spirit, demanding a rigid foundation for truth, was combined with an unlimited enthusiasm that truth when known should be proclaimed to the many. Yet in his own life, "Thrones, powers, dominions blocked the view, with episodes and underlings."

What then is the outcome of Goring's work? Has he decreased crime or bettered the lot of the criminal? Not directly; the solitary individual can achieve little in this sense; he has moved stones from the path of the outcast, and we can picture many a criminal who would have wished to stand by his graveside. Has he pointed out the lines upon which the state in future should deal with its defaulters? Again not directly, but only indirectly. What then has he achieved? He has given us a portrait of the criminal as he really exists; he has painted in the nature of his physique, he has indicated his facial and underlying mental traits, his hereditary tendencies and his home associations. And he has made for ever atypical the criminal of current drama and novelistic literature. Here it is that literature owes a deep debt to Goring. It cannot survive without its villains, but the individual writer will never be as intimate as Goring was with poisoner, murderer and spy.
Yet if that writer approaches with intuition not the masses of statistical data, but the text of Goring’s life-work, even in its recently issued abridgement*, he will learn to see the criminal as Goring saw him, he will learn to know the real man and his attitude to crime. He will learn that Goring was a creator in the literary sense†, and with imagination stirred he will feel the impulse to adopt and adapt that realistic portrait of the criminal as only true art can do. Through literature the world at large will know at last what crime and the criminal really are. Not only will literature profit, but the world which easily grasps truth when depicted by art will understand and gain something of the spirit of the man whose life’s work, alas! is embraced within the livid wrappers of a government publication.

“En mands gerning er hans sjael, og sin gerning skal blive ved at leve pa jorden.”—The work of a man is his soul, and on earth his work shall not perish.

* “The English Convict” (Abridgement), Wymans & Co., 1915.

† The present writer has many sins to atone for, but perhaps none he regrets now more than the stringency with which he docked the original MS. of Charles Goring of many of its literary qualities as unsuited to a scientific and government publication.

APPRECIATIONS OF CHARLES GORING.

To the readers of Biometrika the following sympathetic accounts of the personality of Charles Goring will appeal as they do to the Editor, who deeply values the privilege of being allowed to publish these very intimate characterisations. The first is by Mr E. V. Lucas, a college friend of Goring’s; they both belonged to one of those periods of keen intellectual activity which arise occasionally in college life owing partly to the action of waves of external thought, but more often to the presence internally of one or two original minds.
Outwardly the period in question was marked by the foundation of the Students’ Union and the meteoric brilliancy of The Privateer—a college journal that one did not grudge purchasing. It was for Goring the moulding time,—the golden days, when there was leisure to think, interpolated between an uncongenial office experience and the wider but none the less toilsome experience of a medical officer on a hospital ship during the South African War. The second appreciation is the oration bravely spoken over his grave by his widow. I have not ventured to leave out a sentence of it. Round the grave were gathered the friends of his creative period, the friends of his youth, the friends of his prison calling, from prison commissioner to warder, and a scattering of humbler friends unknown to most of us, but none the less there out of love to one of the finer spirits of this life. That brilliant June day, with its unique ritual, when we paid the last respects to Charles Goring, will remain in the memory of those present as unique as the nature of the man, who in leaving us reduces still further that little school of trained biometricians, who value humanism as well as science.

I. CHARLES B. GORING AS A STUDENT.

I have been asked to write a few words about Charles Goring, and I have tried, because I respect the asker; but they will be incomplete because I have seen Goring of late so little and hardly knew him in maturity at all: as a husband, and a father, and an intellectual force with all his powers at their richest. But of the Charles whom, in the eighteen nineties, we knew, the Charles whom we loved, my impressions are fresh and will always be. His personality provided for that. I say “whom we loved,” but I think we did more than love. I think that if it were possible, if it were conceivable, that any harm should be coming to him, there is nothing we would not have done to interpose our own inferior bodies between him and it.
For he inspired not only affection but protectiveness. We felt that we were his guardians: his—in a very peculiar sense—owners. Not that he lacked any qualities of self-defence. Far from it. His mind was crystal clear, his attitude to life and its problems was fearless; but he had an unworldliness, a childlike radiance, that seemed to demand from his friends a contribution of cotton wool. Let me say again that he did not need this, but we all wanted to provide it. I have said that his attitude to life and its problems was fearless. But it was more than that: it was challenging and ardent. Had there been nothing to probe and inquire into, he would not have been the happy man he was; for he was a born inquirer—inquisitor even—and mistrusted all traditional face-values. Exactly how I came to be admitted to Goring’s circle I never understood then, and cannot now fathom. Because where he and his friends brought to their discussions and disputations knowledge and seriousness, I had nothing but instinct and impatience. But they suffered me, and I was permitted to sit on the outskirts and listen, and now and then to interrupt. What I chiefly remember of those evenings—at all kinds of places—at Highgate, at Hampstead, in rooms near the Museum, on the boat to Margate, on the Broads,—what I chiefly remember is Charles in argument: eager, stimulating, vivid, humorous, always gently reasonable and never losing sight of the main proposition. I suppose he was the honestest and most understandingly tolerant man that ever lived. He never trimmed; he rarely condemned; and he had no fear. No fact was too stark and naked for him; indeed, what he wanted was stark and naked facts.
We would all have our say—some of us solid and some of us fluid—and then he would deal with us, with quiet Socratic questionings; and all the while we would see, burning within his beautiful workmanlike brain, the soft steady flame of that lamp of enthusiasm which was never to be dimmed until a few weeks ago it was all too soon extinguished: enthusiasm for the truth, wherever found. Of what dark passages that lamp was to illumine it is not for me to speak. There are others who have authority. But that no sweeter nature was ever allied to a passion for scientific investigation I feel myself to have the right to affirm.

E. V. Lucas. June 17, 1919.

II. CHARLES GORING AS HUMANIST.

In asking you all to come here today, I have done what seems to me a right thing to do, and a beautiful one: for, with your presence, I have made a circle round my husband’s spirit of those minds and hearts most intimate with his, and most valued by him....You all loved him; he loved everyone of you. With each one of you he had a separate and private friendship....It seems to me that I can do him no greater honour on this day than to give him what you have let me give him by coming here—your undivided thought of him, your clear memory, and the warm and poignant tenderness that I well know possesses each heart here at the very mention of his name—Charles Goring. I must ask you to forgive me if I read from this paper what I have to say, instead of speaking it, in a more natural manner. I should not find any difficulty in speaking to each one of you separately. It seems absurd that, simply because you are all here together, in a number, that I should find it difficult. Yet so it is. And, therefore, for this reason, and also because on this occasion I can trust neither my memory, nor my self-control, I hope you will forget—won’t even see—this bit of paper between us. I want to say, first, why I am speaking at all. There are two reasons.
One is that I want to say something about my husband which may, perhaps, for a few instants, trace an outline of him upon the air, for you as well as for me—which may, for a moment, mark out his features for us, give us a glimmer of himself. That is one reason. The other is that I want to make, at his grave-side, and in the knowledge of death, certain affirmations. I have great difficulty in expressing myself here. I will ask you for your generosity with your tolerance. I ask it the more particularly because I know there is at least one amongst you—and probably there are more than one—who will find my attitude and desire foreign to his own. To this person, who has my respect, affection, gratitude, as I hope he knows, I want to say that, though I understand his inability to speak to us here today about my husband—and, in a way, I love him for that inability—yet I do regret it; and also I do not accept his point of view. My regrets are for the fact that his silence deprives us of a criticism, an appreciation of my husband—of my husband’s scientific mind and work especially—which no one else could give with equal authority, sincerity and eloquence. So there is room enough for regret I think....And then also, as I said, I do not accept my friend’s point of view, though I can salute it for its dignity. His view is, I understand, that reticence, and silence, and solitude best suit the great occasions of human experience—those of grief and loss, particularly. I feel—I more than feel: I believe—the opposite. I believe in Voltaire’s saying: “Le but de l’homme c’est l’action” (the aim of man is action). Action means words as well as deeds. I believe that for whatever other purposes we may also possess life, there is a secret Injunction upon us—within us—to express things: to do, to make, to show.
And it seems to me—it is more than feeling: it is a sort of moral urging—that when the great emotional experiences come to us, we ought to give them some outward, visible sign: Form: form, in accordance with that law that, as I said just now, seems to me to impose action upon us during our humanity: form that is beautiful. I have felt, then, in the great experience which has just come to me—the greatest I shall ever know—that unless I am to be false to my own instincts, and a coward to my own truth, I must testify by some outer form, and beauty of symbol, to the quality of my husband’s spirit, and the sacredness of his memory, at the hour of the burial of his body. Feeling, and believing this, I realise the disadvantage at which we stand—we who are Freethinkers—when, for our great occasions, we need a ceremonial, dignified, harmonious, simple. There, all the Churches, who have had time to grow old and beautiful, have the advantage of us. Their poets have had time to shape inarticulate cries and struggling aspirations into pathetic and stately ritual. Their artists have had time to bring colour, and line and music to the spaces set aside for those who suffer, and those who conquer themselves. We here today—those among us who are Freethinkers—would not give up for these achievements, fine though they are, for our own best: the very essence that makes us Freethinkers. Nevertheless, the Churches, in this way, have the advantage of us. And when a person like myself wants, as I do today, to mark by outer form and beautiful symbol, the great spiritual experience that has come to me, there is no ceremonial—the legacy of the genius of ages—waiting for me: and I am at a loss. So disconcerting might this position have been, that it would have been easy to yield to the temptation that has for the last week assailed me: the temptation to do nothing; to give way to difficulty; to accept despair.
All the time that I have felt it urgent within me to do honour, somehow, to my husband, on the day of this burial—all that time I have also felt an unworthy fear of the effort: and I have very often nearly decided to make none, but to have the ashes of his body buried without a sign, and myself alone as witness. I am glad I have cheated neither the memory of my husband, nor my own instincts, by doing that. I am glad that, by your presence, by the mysterious sense of the unity there is upon us in these moments—by the singing of these boys, whose music he loved, by these flowers, by the good fortune of an exquisite day of sunshine and warmth—I am glad that outward forms acknowledge the inward grace: I am glad that the influence of a lovely spirit is abroad in the air above this grave. In speaking of my husband himself, I shall have to choose one quality only of him, I suppose, if I am to be clear. You will all know of others. And I shall not speak at all of his special intellectual gifts....I think, perhaps, his rarest and most endearing quality was his particular kind of humaneness. I say “his particular kind of humaneness” because it was not in the least like what is called “humanitarianism.” He had no sentimentality. And he was never in the least taken in by humbug. But his humaneness enabled him to know, and to like, the humanity even behind the humbug. There was in him at once a complete lack of prudery and a perfect personal rectitude. Charlie was as incapable of being shocking himself as he was of being shocked at another person’s shockingness....The fact is that, apart from cruelty, he did not take what is called “evil” very literally. He thought that nearly all people were intensely likeable when you got to know them. So that his charity—of which everyone speaks who knows him—was far less forgiveness than it was sympathy; and his kindness was always loving-kindness.
If you will let me, I will tell you one or two things about him that may, perhaps, trace that silvery outline of which I spoke....I think of a certain day, some years ago, when we had a really wonderful walk together. It was one of those fortunate days—those gift-days—when everything turns out successfully; when the unexpected leaps up; when there is adventure through it all. I won’t give you the whole history of the walk, but only these points—to show you Charlie. We had just come down Villiers Street from the Strand, and were near the Embankment Gardens, when he pulled up suddenly with a look of intense alarm. He told me that one of his old convicts, discharged from Parkhurst, had taken to newspaper-selling outside the Embankment Station. “He talks for hours,” said Charlie desperately: “He has the eyes of a lynx. He spots me amongst thousands. He'll spot me in a minute. He always does. And that'll mean interminable conversation, and half-a-crown. Let’s get into the Gardens while there’s still a chance.” So we dived for the Gardens, and were just through the gate, when he again pulled up. “After all,” he said, “I have managed to give him the slip three times lately. It seems rather unfair to cheat the poor old boy again, so soon. Let’s go by the station.” So we went by the station; and he was duly pounced upon, and a lengthy, amicable gossip ensued; and the half-crown passed from one pocket to the other....Now that seems to me very like Charlie: that pang of conscience, that sense of fellowship which made him feel that, by evading him too often, he was, what he called, “cheating the old boy.” After this, we got out on to the Embankment. It was a wonderful day, I remember, in early autumn.
The river was stiffly rippled; the plane trees were brilliant in colour and movement; rapid clouds were passing in the blue sky; bright traffic was flashing and humming by in the broad roadway: it was a delight to swing along the pavement, in the keen air, arm in arm—and quarrelling all the time, as we mostly did! Presently we reached the Temple, and passed upwards through the narrow passages and dark archways, and across the smiling silence of the Courtyards: and then out again, into Fleet Street; and up through Chancery Lane to Holborn; and so to the left, towards Oxford Street. And when we were in New Oxford Street, I suddenly became aware of an astounding apparition on the opposite side of the road. Charlie observed people very acutely when he was in close contact with them, but he didn’t notice things in crowds. He didn’t notice this man; and he continued to arguefy at my side, while I continued to amaze myself at the man. This person seemed to take up the whole street. It was not so much that he was so large, as that he was so blatant. His clothes were the most astounding things in vulgarity and newness that could be conceived. He wore a buttonhole that was an insult. And the way his boots shone, and his hat shone, and his walking-stick shone as he twirled it—the way he simply glared and revolved in glory, as it were, with Oxford Street as a mere margin for him—simply took one’s breath away. I hadn’t time to pull my husband’s arm, and stop his Infinites and Indefinites, before the whole bulk of this being was descending upon us across the road, and clasping Charlie with a fervent hand. I left my poor man stuttering in his grasp; and went and looked into the Cameo Shop window. It was perfectly clear that he hadn’t the remotest idea who the man was, though he was pretending he knew him! And presently the volubilities broke down, and I heard this: “I don’t believe you know me, Doctor? I am.......
” I didn’t catch the rest; but Charlie’s voice cleared up in relief: “Oh, of course; of course!” and then proceeded to rapid and friendliest conversation. At last, with some terrific laughter, and a perfect blaze of complacency, I heard the man exclaim: “Well, it's all A.1—A.1, that’s what it is. A bit of All Right. Everything’s going swimmingly. We’re off to the Rhine for a few weeks’ holiday. Hope to see you again, Doctor. So long!”—and he was away, with a flourish of his hat, down the street towards Holborn; while my husband came up to me, convulsed with merriment, saying: “Will you believe it? That's another of them!” He was, in fact, another convict. Also from Parkhurst. A very bad case of fraud, I believe. Quite unpardonable. Still, there he was: free again: at large: enjoying every moment of his regained existence. And, as we watched him disappear towards Holborn in his outrageous radiancy, the spectacle didn’t merely amuse Charlie or stagger him,—it didn’t shock him, and it didn’t sentimentalise him: but it made him rejoice at the thing in the man that could so rejoice in liberty; that could swagger so in the sun; that could be so little of a snob, and so free from the Past, that it could actually come bounding, in camaraderie, to an official of the Prison in which its convict days had been spent! That was all right in him, whatever else might be wrong: and it caught Charlie's affection: he liked the man. I hope this also may give you a touch of your friend. It is so difficult, with heavy words, to convey an intangible thing. But I hope your own knowledge of him may give you the feeling of his quality in all this. There is one thing more I want to tell you about him. It was last March, and we were in Manchester. We were having rather a rough time of it. We had no servant, and most of the rooms of the house were shut up, to reduce work and fires. The kitchen was our children’s playroom, and it was our dining-room as well.
We had breakfast there, one morning, and were, as usual, distinctly late! and my husband had to hurry off immediately after into the Prison. It was a bitter morning: there was a perfect blizzard of snow and rain. Five minutes after he had gone, I heard his latch-key in the door, and he rushed back to the kitchen in a tremor of excitement and pity. He said he had found a woman in the street who was so ill she could hardly move. She was coughing herself to pieces; and he thought she had consumption, or some virulent form of influenza. He had brought her back with him. She was in the hall. And here I have a confession to make. I must make it, because if I do not, I cannot show you what Charlie was. I was angry with him. I was angry because I was in deadly terror of the children catching influenza. It seemed to me terrible, at the moment, to bring that poor, infected creature into the room where the children were—the one room where there was a fire. Well, it is the look he gave me when I was angry with him that I want to tell you about—a look in which there were not so much reproach and surprise (though these were there) as a kind of lovely guilt: a baffled look: a look pleading for pardon, and saying in desperation: “Yes, I know, I know. But, in God’s name, what was I to do?” All this was in that look, which was the very essence of Charlie: and, without a word between us, I bundled the children upstairs, and we fetched the poor thing from the hall into the kitchen. I can see him now, settling her by the fire, bringing her a footstool, taking her poor, dripping shawl off her shoulders, and hanging it up to dry. We thought for a time she was going to die; but she got better in a little while, and sat, uncomplainingly, coughing; and, when she was not coughing, smiling at the fire.
He had to tear off to the Prison, as soon as he could leave her, through the frightful storm, promising to bring help for her as soon as he could leave his work. He returned later in the morning, with an ambulance, and an order for a hospital: and again now, I can see him leading her carefully through the hall, lifting her into the carriage, nodding at her affectionately through the doorway, as the carriage drove off; and then coming back to me for a moment, before he returned to his work, with that same muteness, that same look of an angel’s apology in his eyes.... This was Charlie absolutely—this passion of pity for suffering. In his last two days on earth, during the height of his delirium, one memory recurred, and haunted him over and over again: the memory of two little children whose case had been tried at the Assizes, and whose bodies he had had to examine, and had found marked and mutilated by the fiendish cruelty of their parents. These children he could not forget: he mourned and lamented them, seeing them before him in his fever, and calling, and calling upon us to take them, and save them.... If I had not told you this, I could not have shown you all that I meant by my husband’s humaneness: but I do not want the last impression that I, at any rate, leave with you, to be one of sadness. I want it to be one of happiness: because he was really an extraordinarily happy man. He was happy chiefly because of his nature and character, of course; but, also, he was fortunate. He had got the things he most wanted in life. He never had any worldly ambitions at all. He had always wanted three things: first, freedom to live a life of the intellect—of observation, and of criticism; and this he was able very largely to do, in spite of the fact that he also had to earn our living. And, secondly, he wanted Friendship: and he had Friendship. And, thirdly, he wanted Romantic Love: and he had Romantic Love.
The things he wanted and hoped for when he was young, he found, and still wanted when he was middle-aged. And when he died at forty-nine, he took with him enthusiasms as eager as they were when he was twenty-five. I will say no more except to read you the inscription that I shall be putting over the place where his ashes will lie. For a great many years, I have had in my mind a line of words whose music and meaning I very much liked. I only vaguely knew where it came from. It corresponded to the Christian triad: “Faith, Hope, and Charity”; and it ran thus. “Love, Pity, and Equanimity.” During the last few days, when I was wanting to find something beautiful, and expressive of him, to put in words above my husband's grave, I thought of this line again: and I have found out that it comes from a Buddhist Sutta. I am not very clear what a “Sutta” is; but I think it means a “Gospel.” This particular Sutta, from which I have got my line, describes the being who has attained the perfect life—that is to say, the life of self-conquest. The passages from which I have made my extracts are these: (1) “And he lets his mind pervade one quarter of the world with thoughts of Love, and so the second, and so the third, and so the fourth. And thus the whole wide world, above, below, around, and everywhere, does he continue to pervade with heart of Love, far-reaching, grown great, and beyond measure.” (2) “Just as a mighty trumpeter makes himself heard—and that without difficulty—in all the four directions; even so, of all things that have shape or life, there is not one that he passes by or leaves aside, but regards them all with mind set free, and deep-felt love, pity and equanimity.” The inscription as I shall put it above the grave will be this: “Here lie, in Sacredness and Honour, the Ashes of the Body of CHARLES BUCKMAN GORING, Doctor of Medicine, Bachelor of Science, Fellow of University College, London, and Medical Officer in Chief of Strangeways Prison, Manchester. Born January 31st, 1870; Died May 5th, 1919.” And underneath I shall put this: “Of all things that have shape or life there is not one that he passes by or leaves aside, but regards them all with mind set free and deep-felt love, pity, and equanimity.”

KATIE MACDONALD GORING.

ON THE NEST AND EGGS OF THE COMMON TERN (S. FLUVIATILIS). A COOPERATIVE STUDY.

W. ROWAN, E. WOLFF, AND THE LATE P. L. SULMAN, Fieldworkers. K. PEARSON, Reporter. EK. ISAACS, E. M. ELDERTON, AND M. TILDESLEY, Tabulators and Computers.

(1) Origin of the Material and Method of Measurement.

This paper may be looked upon as a continuation of that published in Biometrika, Vol. x. pp. 144—168. It is based upon a census of the eggs made July 3rd—20th, 1914, and contained in Rowan’s Fifth MS. Report on the Faunistics of Blakeney Point, the Field Station under Professor F. W. Oliver's direction on the Norfolk coast. The year was a record year for the common tern, a marked contrast to 1913; the young were abundant as well as the eggs, and many of the birds were still laying. Some peculiar nests were found: (a) one entirely of seaweed, (b) another of large wood shavings, (c) one of selected small pebbles, (d) a very large nest—the largest yet met with.
Some of the nests are illustrated in Plate II and will suffice to indicate the considerable differences between their make up and environment*. The range of ground colour with extent and distribution of mottling are indicated in Plate III, which should be taken in conjunction with Plate VIII of the earlier paper. There is every reason to believe that the two clutches, each of three eggs, were in both cases due to a single bird; the seventh egg, from a one-egg clutch, represents a peculiar egg found in the examination of this year’s material. In all 515 clutches were recorded as against 203 in 1913. In that year there were 13 clutches with 3 eggs each; in 1914 there were 198, and many of those with one or two eggs at the time had also one or two newly hatched chicks, bringing the total up to three. Even the nests with one egg (122 as compared with 119 in 1913) were actually nests with the first egg only of the clutch, for the birds were still laying, while most of the one-egg clutches in 1913 were either deserted or the egg addled. Plate IV gives some further photographs taken of the Ternery. Fig. a is an attempt to catch the bird alighting in order to indicate the great length of the wings. Fig. b is the only four-egg clutch observed.

* The following illustrates a method of nest building, that of nest (d) above. “A common tern laid close to the observation tent. At first there was no material whatever. But on the same day a few of the Psamma leaves from the tent were taken and deposited round the egg. The next day another egg was laid and more stuff was added. None of the Psamma had then been broken and the leaves radiated from the centre in all directions. On the second day the first few were broken and tucked neatly in all round. Then a third egg was deposited. More pieces of Psamma were added and the nest then had a very ragged appearance. It took two more days before the nest was completed and tidied up.”
Plates V and VI show birds sitting, the camera being about 18 inches from the bird. The characters observed were identical with those of 1913, namely: 1. Length ($L$); 2. Breadth ($B$); 3. Longitudinal Girth ($G_l$); 4. Transverse Girth ($G_t$); 5. Tone or Ground Colour; 6. Mottling; 7. Type of Nest. The tone or ground colour was in 1914, however, divided into browns and greens. The scale of browns was that of the Colour Value Scale of Plate VIII of the first paper, and the green values were judged on a similar scale divided into corresponding classes a, b, c, d, e, f, g, h, i, k. These classes are distinguished by the subscripts 1 and 2 for brown and green values respectively. Two eggs only had to be excluded from these colour value observations; one was blue and the other slatey gray in ground colour. These reduced the total number of eggs available for colour value reduction from 1110 to 1108. The classification of mottling follows Plate IX of the earlier paper. Types of Nest were divided into three categories: $t_1$ = no hole in the ground and no materials, $t_2$ = a hole but no materials, $t_3$ = both hole and materials. As only one nest (with a three-egg clutch) occurred in type $t_1$, we have grouped $t_1$ with $t_2$, so that the distinction is really of unelaborated and elaborated nests.

Of the characters dealt with, the transverse girth ($G_t$) was really taken as a check on the general accuracy of measurements. We should have π = Mean Transverse Girth/Mean Breadth, or rather π is equal to this ratio multiplied by the factor

$$\left(1 - r_{G_tB}\,v_{G_t}\,v_B + v_B^2\right),$$

where $r_{G_tB}$ is the correlation of the transverse girth with the breadth, and $v_{G_t}$ and $v_B$ equal 1/100th of the coefficients of variation of transverse girth and breadth respectively. This factor was .99990 in the previous set of observations and is 1.00006 now. Hence its influence on π = $G_t/B$ is insensible for our purposes. We find π = 3.2071 against 3.2237 of the earlier series.
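The correcting factor quoted above comes from expanding the mean of a ratio of two correlated variates to second order: writing $X = \bar X(1+\xi)$ and $Y = \bar Y(1+\eta)$ with small relative deviations, $\overline{X/Y} \approx (\bar X/\bar Y)\,(1 - r\,v_X v_Y + v_Y^2)$. The short Python sketch below just evaluates that factor. The function name is hypothetical; the coefficients of variation are the 1914 values of Table II; and since the correlation $r_{G_tB}$ is not printed in the text, it is set to a few illustrative values, which are assumptions.

```python
def ratio_mean_factor(r: float, v_x: float, v_y: float) -> float:
    """Second-order factor F in  mean(X/Y) ~ (mean X / mean Y) * F.

    r   -- correlation between X and Y
    v_x -- coefficient of variation of X as a fraction (per cent / 100)
    v_y -- coefficient of variation of Y as a fraction (per cent / 100)
    """
    return 1.0 - r * v_x * v_y + v_y ** 2


# 1914 coefficients of variation (per cent) from Table II:
# transverse girth G_t and breadth B.
V_GT, V_B = 3.10, 3.28

# r is not given in the text; these correlations are illustrative assumptions.
for r in (0.95, 0.99, 1.0):
    factor = ratio_mean_factor(r, V_GT / 100, V_B / 100)
    print(f"r = {r:.2f}: factor = {factor:.5f}")
```

For correlations between .95 and 1 the factor stays within about 1.1 × 10⁻⁴ of unity, which is consistent with its influence on π being treated as insensible.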
Thus although the value of π is bettered, we still find the transverse girth is somewhat exaggerated, i.e. π is about 2 per cent. in error when thus deduced. It might at first sight suggest itself that the transverse section of the egg may not be truly circular. Suppose it an ellipse of eccentricity $e$. Then if we agree that it is equally likely that the breadth of the egg may be measured in any meridian we find

$$\frac{\text{Transverse Girth}}{\text{Mean Breadth}} = \pi\left(1 + \tfrac{1}{16}e^4\right)$$

if $e$ be small. If, however, we put in the values found, i.e. $G_t/B = 3.2071$, we have $e^4 = .3320$, leading to $b = .6510a$ for the relation between the semi-axes of the ellipse—a quite impossible value. It may be suggested that our chance of taking every breadth is not equal and that we are most likely to take the minimum breadth. In this case we should have

$$G_t/B = \pi\left(1 + \tfrac{1}{4}e^2\right),$$

and with our numbers $e^2 = .0882$, leading to $b = .9576a$—an improbable but not so impossible a relation as the former. It could hardly, however, escape observation, as even slightly distorted eggs are easily recognised. It seems, therefore, probable that the exaggeration of the girth in the transverse sense is due to the difficulty of adjusting the tape to the true maximum transverse section—the temptation being to bring the reading edge of the tape into contact with itself with the scale facing outwards. If we suppose the celluloid scales to be 0.5 mm. thick this would account for the deviation. Probably the longitudinal girth is exaggerated in like manner. Unfortunately it did not apparently seem possible for the fieldworkers to adopt a more elaborate system of classification for the mottling than was used in 1913, and accordingly no further light is obtainable with regard to the difficulties suggested on p. 146 of the first paper.
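Both small-$e$ formulas can be checked numerically. The Python sketch below takes the transverse section as an ellipse with semi-axes $a = 1$ and $b = \sqrt{1-e^2}$; the girth is its perimeter, and the breadth is either the mean length of a diameter through the centre taken in a uniformly random direction, or the minor axis $2b$. Reading "measured in any meridian" as a random diameter through the centre is an interpretation assumed here, not stated in the text. Both integrals use the midpoint rule, which is very accurate for smooth periodic integrands. The sketch also inverts the two approximate formulas for the observed $G_t/B = 3.2071$. All function names are hypothetical, and this is an illustrative check, not the computation made for the paper.

```python
import math


def perimeter(a: float, b: float, n: int = 4000) -> float:
    """Perimeter of the ellipse x = a cos t, y = b sin t (midpoint rule in t)."""
    h = 2 * math.pi / n
    return h * sum(
        math.hypot(a * math.sin((k + 0.5) * h), b * math.cos((k + 0.5) * h))
        for k in range(n)
    )


def mean_diameter(a: float, b: float, n: int = 4000) -> float:
    """Mean length of a diameter through the centre, over uniformly random directions."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        # distance from the centre to the ellipse in direction theta
        radius = a * b / math.hypot(b * math.cos(theta), a * math.sin(theta))
        total += 2 * radius
    return total / n


def girth_over_breadth(e: float, minimum_breadth: bool = False) -> float:
    """Exact G_t / B for a section with semi-axes a = 1, b = sqrt(1 - e^2)."""
    a, b = 1.0, math.sqrt(1.0 - e * e)
    breadth = 2 * b if minimum_breadth else mean_diameter(a, b)
    return perimeter(a, b) / breadth


def invert_any_meridian(ratio: float) -> tuple[float, float]:
    """Solve ratio = pi * (1 + e^4 / 16); return (e^4, b/a)."""
    e4 = 16 * (ratio / math.pi - 1)
    return e4, math.sqrt(1 - math.sqrt(e4))


def invert_minimum_breadth(ratio: float) -> tuple[float, float]:
    """Solve ratio = pi * (1 + e^2 / 4); return (e^2, b/a)."""
    e2 = 4 * (ratio / math.pi - 1)
    return e2, math.sqrt(1 - e2)


e = 0.2
print("any meridian:   ", girth_over_breadth(e) / math.pi, "vs", 1 + e**4 / 16)
print("minimum breadth:", girth_over_breadth(e, True) / math.pi, "vs", 1 + e**2 / 4)

observed = 3.2071  # G_t / B for 1914
print("e^4, b/a =", invert_any_meridian(observed))
print("e^2, b/a =", invert_minimum_breadth(observed))
```

With $e = 0.2$ the exact ratios agree with $\pi(1 + e^4/16)$ and $\pi(1 + e^2/4)$ up to the neglected higher-order terms. The first-order inversions give $b/a \approx .650$ and $b/a \approx .957$, close to the .6510 and .9576 quoted above.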
The question of possible pressure on the surface of the egg as it passes through the oviduct influencing the amount of pigment deposited was again investigated by considering the broader egg in each pair from the same clutch (see l.c. p. 146). The broader egg in every possible clutch pair has:

    Greater mottling in 189 cases          More dense ground colour in 223 cases
    The same mottling in ... cases         The same ground colour in ... cases
    Less mottling in ... cases             Less dense ground colour in ... cases

Thus our 735 pairs confirm the previous result (on about 100 pairs) as far as the mottling is concerned, but not the density of ground colour. There is no distinction in ground colour on the average between eggs of different breadths from the same hen, but the broader egg does appear to have less marked mottling. We shall consider later whether this result for eggs of the same clutch holds for the general population.

(2) Change of Type of Egg with Season.

We have:

TABLE I. Means.

    Character                    Season 1913       Season 1914
    Length L                     4.14 ± .007       4.21 ± .004
    Breadth B                    2.98 ± .004       3.01 ± .002
    Longitudinal Girth G_l      11.39 ± .015      11.56 ± .007
    Transverse Girth G_t         9.59 ± .014       9.66 ± .006
    Index 100 B/L               72.04 ± .136      71.75 ± .070
    Index of Ovality O          56.35 ± .171      55.81 ± .088

It is clear from this table that the eggs of 1914 were significantly larger than those of 1913. As the fieldworkers remarked before the eggs were tabled and reduced, 1914 was a splendid contrast to 1913; never were so many birds seen and the young were as abundant as the eggs. At first sight it seemed strange to find such a flourishing colony after the comparative failure of the previous year, but in the summer of 1914 the channel was phosphorescent at night with Plankton, and probably as a result of this the channel was also swarming with myriads of “Whitebait,” which in their turn attracted the Terns. The suggestion is thus thrown out that a plentiful food supply increases the size of the eggs.
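The ± figures of Table I are taken here to be probable errors (0.6745 times the standard error), the usual convention of this Journal; that is an assumption, as the table does not say so. Under it, two checks can be sketched in Python. First, the 1914 probable errors of the means can be recomputed from the 1914 standard deviations of Table II and the 1110 eggs of the 1914 census (the total given for the colour observations, assumed here to be the number measured). Second, each change of mean between the seasons can be expressed as a multiple of the probable error of the difference, treating the two seasons as independent samples. The threshold of three probable errors used to flag a difference is a rule of thumb chosen for illustration, and all names are hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal theory)

# Table I: (1913 mean, 1913 PE, 1914 mean, 1914 PE)
TABLE_I = {
    "Length L": (4.14, 0.007, 4.21, 0.004),
    "Breadth B": (2.98, 0.004, 3.01, 0.002),
    "Longitudinal girth G_l": (11.39, 0.015, 11.56, 0.007),
    "Transverse girth G_t": (9.59, 0.014, 9.66, 0.006),
    "Index 100 B/L": (72.04, 0.136, 71.75, 0.070),
    "Index of ovality O": (56.35, 0.171, 55.81, 0.088),
}

# 1914 standard deviations from Table II
SD_1914 = {
    "Length L": 0.185,
    "Breadth B": 0.099,
    "Longitudinal girth G_l": 0.350,
    "Transverse girth G_t": 0.300,
    "Index 100 B/L": 3.479,
    "Index of ovality O": 4.326,
}
N_1914 = 1110  # assumed number of eggs measured in 1914


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a sample mean."""
    return PE_FACTOR * sd / math.sqrt(n)


def difference_in_pe(m1: float, pe1: float, m2: float, pe2: float) -> float:
    """Change of mean (m2 - m1) in units of the probable error of the difference."""
    return (m2 - m1) / math.hypot(pe1, pe2)


for name, (m1, pe1, m2, pe2) in TABLE_I.items():
    recomputed = pe_of_mean(SD_1914[name], N_1914)
    ratio = difference_in_pe(m1, pe1, m2, pe2)
    verdict = "significant" if abs(ratio) > 3 else "not clearly significant"
    print(f"{name:24s} PE printed {pe2:.3f}, recomputed {recomputed:.3f}; "
          f"change {m2 - m1:+.2f} = {ratio:+.1f} x PE ({verdict})")
```

On these figures every recomputed probable error rounds to the printed one, and the four size characters change by between about 4.6 and 10.3 times the probable error of the difference, while the two shape indices change by less than three times theirs.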
It must, however, be borne in mind that possibly only the stronger and bigger birds survived the previous bad season. There may have been fewer very young or very old birds, and thus the eggs were larger. We may now consider the variabilities of the two years.

TABLE II.
Character | Standard Deviation 1913 | Standard Deviation 1914 | Coefficient of Variation 1913 | Coefficient of Variation 1914
Length $L$ | ·180 ± ·005 | ·185 ± ·003 | 4·34 ± ·12 | 4·39 ± ·06
Breadth $B$ | ·099 ± ·003 | ·099 ± ·001 | 3·33 ± ·09 | 3·28 ± ·05
Longitudinal Girth $G_l$ | ·376 ± ·010 | ·350 ± ·005 | 3·30 ± ·09 | 3·08 ± ·05
Transverse Girth $G_t$ | ·347 ± ·010 | ·300 ± ·004 | 3·62 ± ·10 | 3·10 ± ·05
Index 100 B/L | 3·449 ± ·096 | 3·479 ± ·050 | [4·79 ± ·13]* | [4·84 ± ·069]*
Index of Ovality O | 4·334 ± ·121 | 4·326 ± ·062 | 7·69 ± ·22 | 7·75 ± ·111

The table indicates that the material for 1914 is slightly less variable than that of 1913 taken as a whole. This is possibly due, as we have suggested, to the bad season of 1913 reducing the number of very young or very old birds, and so the number of small eggs, in 1914. But most of the differences are insignificant, except those in the two girths. We anticipate that a good deal of interest from the evolutionary standpoint might be reached by secular observations on the eggs of this tern colony, taken in conjunction with records of the food supply and climate both in the nesting season and after. It would be of interest also to mark certain birds and record, if possible, their return.

(3) Associations of Nest and Egg Pattern.

It is of great interest to discover whether there is any protective action in the colouring and mottling of the egg. In an egg which varies in itself so largely as the tern's, this question must be considered not so much in regard to the general nesting habits of the species as in regard to the nest and environment of each individual bird. The occasional and possibly habitual practice (see our footnote on
p. 308) of laying and nest building simultaneously may indeed suggest that the birds adapt the immediate environment and material of the nest to the actual character of their eggs. If the egg in shape, colour-value and mottling be related to the individual nest, it is hardly conceivable that a hen, especially when a young bird, can a priori appreciate what the type of her egg is likely to be and prepare the corresponding protective nest accordingly. Such an instinct would be conceivable in the case of a species with more uniform eggs and building a specific type of nest; it is hard to conceive it possible in the case of such a wide colouring and mottling range as we find in the common tern. The alternative is to suppose a considerable variety of tern gentes, who, like the suggested cuckoo gentes, select a particular environment for their eggs. Such a suggestion is not without difficulty; it involves mating within the gens, or a transmission of the egg-colouring mechanism through the female only. To accept the latter is not consonant with our experience that sexual characters of the female are transmitted through the male, i.e. the fertility of the mare and the character of a cow's milk are correlated with the like characteristics in their paternal grandmothers. It is conceivable that the pigmentation may vary to some extent with the immediate food supply. In this case green and brown eggs of the same shape and size within the same clutch might be more readily accounted for than by the hypothesis of two hens of different gentes using the same nest*. It might also admit of the hen having some inkling of the character of her forthcoming eggs, if the nest be made beforehand. Besides this, it would free us from any hypothesis as to tern gentes. Thus far we have written as if the protective colouring of eggs was a demonstrated phenomenon.

* (Table II) See remarks, footnote †, p. 147 of previous paper.
It is highly probable in the case of many species building specific nests in specific environments. Can it be asserted of the common tern? If not, the elaborate and most varied colouring and mottling would appear to be physiological, and to originate before they attain protective character. In other words egg patterns have been specially selected for protective purposes, but did not originate in the survival of the better protected.

It will be remembered that we have divided our nests into unelaborated nests, i.e. nests with no material and with no hole or merely a hole in the ground, and elaborated nests, i.e. nests formed by a hole and with accumulated material. We shall denote these by S and C, i.e. simple and complex†. We will consider first absolute size as measured by the longitudinal girth, $G_l$. The following table gives the data. The mean of the S-nest eggs is 11·556 as against 11·373 for the total population. The correlation found by the biserial r method was r = +·0685 ± ·0322.

* Clutch a, figured in Plate III, shows three eggs practically identical in shape and size yet of very different ground colour. Since the size is quite abnormal, being the smallest found in 1914, one can hardly believe that three birds laid three such eggs in one and the same nest! Again, in the Psamma nest referred to in the footnote on p. 308, the three eggs were laid on three successive days; two eggs were alike in colour, but the third was completely different.

† Actually, of course, every degree of elaboration can occur with a hole, and every degree of accumulation of material. Thus although we have only two categories, these cover practically continuous grades of elaboration and justify the use of the biserial r method of determining the association.

* (Table IV) 3·245– denotes all the values from 3·245 to 3·295, i.e. all the recorded values to two decimals from 3·25 to 3·29.
[TABLE III. Correlation of Nest Type and Longitudinal Girth of Egg.]

* (Table III) 10·295– signifies all values from 10·295 to 10·395 cms., the measurements being taken to two decimal places only; thus all values from 10·30 to 10·39 fall in this group.

[TABLE IV. Correlation of Nest Type and Length of Egg.]

This relationship is hardly significant, and if significant only of very small intensity. It would indicate that the eggs of greater longitudinal girth were on the whole deposited in the more elaborated nests. To investigate the matter more closely we now correlated the length and breadth of the egg with the nature of the nest, obtaining Tables IV and V. Here the mean of the egg lengths in the simple nests is 4·177 cms. and for the total population 4·206 cms., while the correlation is given by r = +·0953 ± ·0321. This is probably just significant, although only slightly larger than that for the longitudinal girth.

[TABLE V. Correlation of Nest Type and Breadth of Egg.]

* (Table V) 2·595– denotes all values from 2·595 to 2·645, i.e. all the recorded values to two decimals from 2·60 to 2·64.

Here the mean of the egg breadths for the simple nests is 3·028, while for the total population it is 3·013. We have r = −·0952 ± ·0321, i.e. the broader eggs are on the whole in the less elaborated nests. Thus far, then, the rough nests appear associated with a short broad egg, although the correlations are only slight. With a view to analysing this point further we now investigate the correlation of the index with the type of nest.

[TABLE VI. Correlation of Nest Type and Egg Index B/L.]
* (Table VI) 57·95– denotes all values from 57·95 to 59·95, i.e. all recorded values from 58·0 to 59·9, the indices being recorded to one decimal place.

The mean index for the rougher nests was 72·530 and for the general population 71·752. We find for the correlation r = −·1372 ± ·0319, a value greater than in the case of either length or breadth, the less elaborated nests having the rounder egg. We now took $C = B^2 \times L$ as a rough measure of the volume of the egg and found r = −·0223 ± ·0322, i.e. r is sensibly zero. In other words there is no relation of volume of egg to the type of the nest. Since we might suppose the younger bird to lay smaller eggs, or at any rate less broad eggs, the explanation of the simple nests as due to young birds finds no confirmation in our analysis; it is the shape of the egg rather than its size which is associated with its environment.

In order to test this further, the lower portion of the axis, $L - \tfrac12 B$, and the Second Index of Ovality*, $100(L - \tfrac12 B)/B$, were correlated with the type of nest. They gave respectively r = +·1233 ± ·0319 and r = +·1492 ± ·0318. In other words, the greater the extension below the hemisphere and the greater the ovality, the more likely is the nest to be elaborated. Thus we see that the rotund egg is more characteristic of the careless nest. It is conceivable that the rounder the egg, the less likely it is to catch the eye when laid amid small pebbles and shingle.

We next turn to investigate the association of colour and mottling with type of nest. First we inquire as to the simple relation of green and brown to the nest.
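As a check, the biserial coefficients quoted for length, breadth and index can be roughly re-derived from the group means alone. This sketch assumes Pearson's biserial formula r = (M − M_S)/σ · p_S/y, where y is the normal ordinate at the dichotomy; it takes σ from the 1914 column of Table II and the proportion of simple nests from the totals of Table VIII (143 of 1108). Those pairings are our assumption. It gives about +·096, −·093 and −·137, close to the printed +·0953, −·0952 and −·1372.

```python
from statistics import NormalDist

# Proportion of simple (S) nests: 143 of 1108 eggs (Table VIII totals).
P_SIMPLE = 143 / 1108


def biserial_r(mean_all: float, mean_simple: float, sd_all: float,
               p_simple: float = P_SIMPLE) -> float:
    """Biserial r between a measured character and nest type.

    Assumes nest elaboration is a normally distributed latent variate cut at
    the point leaving a fraction p_simple in the simple class. Positive r
    means larger values of the character go with the complex (C) nests.
    """
    z = NormalDist().inv_cdf(p_simple)   # position of the dichotomy
    ordinate = NormalDist().pdf(z)       # normal ordinate at the cut
    return (mean_all - mean_simple) / sd_all * p_simple / ordinate


# Means from the text; standard deviations from Table II (1914).
print("length ", round(biserial_r(4.206, 4.177, 0.185), 4))
print("breadth", round(biserial_r(3.013, 3.028, 0.099), 4))
print("index  ", round(biserial_r(71.752, 72.530, 3.479), 4))
```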
Here we cannot go further than a fourfold table:

TABLE VII. Type of Nest and Colour.
Colour | C | S | Totals
Brown | … | … | 439
Green | … | … | 669*
Totals | … | … | 1108
One 'slatey grey' egg and one 'blue' egg had to be omitted from this table.

We find for tetrachoric r: r = +·0745 ± ·0409. This cannot in itself be considered significant. The sign indicates that green egg-layers make the more elaborate nests. No stress can, however, be laid on the result. We now take mottling and type of nest, using the arrangement below as the best order we could devise of decreasing mottling.

TABLE VIII. Mottling and Type of Nest.
Type of Nest | Categories of Mottling (e, g, …, a + b) | Totals
S | … | 143
C | … | 965
Totals | … | 1108

The method adopted was that of 'biserial r' with class-index correction for the mottling categories. We find, the class-index correlation being ·9534, Correlation = +·1141 ± ·0325, the sign indicating that the finer blotches are associated with the more elaborate nests.

* (Second Index of Ovality) The relative advantage of $O_2 = 100(L - \tfrac12 B)/B$ and $O_1 = 100B/(L - \tfrac12 B)$ consists solely in the ovaloid character of the egg increasing as $O_2$ increases, while it decreases as $O_1$ increases. Either may really be used indifferently if this be borne in mind.

* (Table VII) The preponderance of green eggs over brown in the ternery at Blakeney Point deserves consideration because it has not always been recognised. H. Seebohm, Eggs of British Birds, London, 1896, writes that the eggs "vary in ground colour from pale greyish-buff to brownish-buff, occasionally with a tinge of green" (p. 102). F. O. Morris, Natural History of the Nests and Eggs of British Birds, London, 1892, gives a wider range of colours, "pale blue, pale yellow, green, brown, white or light dull yellowish or stone colour" (Vol. III. p. 136), which certainly does not emphasise the broad alternative categories brown or green, with a fractional percentage of blue or grey.
Lastly we turn to the intensity of the ground colour and the type of nest. Here we have worked out brown and green eggs independently, and the results are given in Tables IX and X.

[TABLE IX. Type of Nest and Density of Colour in Brown Eggs. Density classes C₁, D₁, …, F₁ + G₁, H₁ + h₁ + K₁; 376 eggs in C nests, 439 in all.]

Again we use the 'biserial r' method and correction for class-index correlation (·9785); we find Correlation = +·2189 ± ·0481. Thus there is a significant, if still only very moderate, correlation, the relationship being between denser brown ground colour and the simpler nests, i.e. holes in the ground.

[TABLE X. Type of Nest and Density of Colour in Green Eggs.]

Using the same method as before (class-index correction ·9860), but with one more category, as F₁ and G₁ could be separated since their total was more considerable, we have Correlation = −·2366 ± ·0407. Thus the dark tones of green are on the whole more frequently associated with the nests to which material is brought.

Accordingly, in the case of both ground colours, although we cannot definitely assert that either brown or green egg-layers are the more elaborate nest builders, we can assert that the denser browns and lighter greens are somewhat more usual when the nest is a mere hole in the shingle, and that the lighter brown and darker green eggs are associated with more elaborately constructed nests. Again, the larger blotches are in somewhat greater proportion associated with unelaborated nests, and the finer mottling with the elaborate nests. There is no reason to believe in any appreciable difference in volume between the eggs of simple and of complex nests, but the former differ somewhat in shape from the latter, being broader and shorter, i.e. the eggs in mere holes are more rotund and those in the elaborate nests more ovaloid.
Although none of these characters appears to be highly correlated with the type of nest as determined by the simple alternative categories adopted by the fieldworkers, yet they are of a nature which more or less lends itself to explanation on the basis of protective colouring. It is not possible to determine whether the great variety of colouring and mottling in the common tern's egg is a vestige of an elaborate system once developed for protective purposes and now falling into disuse, or whether, as a product of physiological causes, it is now being slowly adapted to protective purposes. The problem is a very interesting one, and further light might, we think, be thrown on it if a fuller record were in future made of the immediate colouring of the nest: the colour of the materials out of which it is made and, in the case of holes, the colour of the ground and the shape, nature and colour of the adjacent pebbles or shingle. It would mean much additional labour, but considerable information bearing on the points discussed above might arise from such data.

(4) The Problem of the Mixed Colour Clutches.

We propose in this section to discuss the problem of the mixed colour clutches. The following are the data to be analysed:

TABLE XI. Colour Composition of Clutches.
Size of Clutch | Number of Clutches | Colour Composition | Number of Eggs
1 | 138 | 74 B + 63 G + 1 SG | 138
2 | 178 | 67 B² + 19 BG + 92 G² | 356
3 | 204 | 62 B³ + 8 B²G + 14 BG² + 119 G³ + 1 BL | 612
4 | 1 | 0 B⁴ + 0 B³G + 0 B²G² + 0 BG³ + 1 G⁴ | 4
Totals | 521 | 203 B only, 41 composite, 275 G only, 2 anomalous | 1110
Bⁿ = n brown eggs, Gᵐ = m green eggs, SG = slatey grey egg, BL = blue egg*.

* The blue egg may be accounted for by the oxidisation of a green egg, a phenomenon observed by Newton (Art. "Birds' Eggs," Encycl. Brit.); the origin of the oxidisation is unrecognised in this case. Newton also states that the individuals of some few species of birds do not always lay eggs of the same ground colour, but the source indicated by him, i.e. change with age of bird, would not apply to our case.

Putting aside the two anomalous clutches, we have 41 clutches out of 519 wherein brown and green eggs are mixed. Putting aside the clutch with 4 green eggs, we see that as a whole there are 443 brown eggs to 659 green eggs, but the proportions vary with the size of the clutch; for we have 74 brown to 63 green eggs in the clutches of 1, 153 brown to 203 green eggs in the clutches of 2, and 216 brown to 393 green eggs in the clutches of 3, or 100 to 85, 100 to 133 and 100 to 182 brown to green eggs respectively. In other words, the proportion of green to brown eggs increases with the size of the clutch. Those readers who will examine Plate VIII in the first memoir* will see how distinct the brown and green ground colours are, and will understand how necessary it is to find some explanation for the change in proportions of colour as the clutch increases in size, and for the mixture of colours in the same nest. The fieldworkers appear to be confident that the same bird can lay different coloured eggs, basing their statement apparently on diversity of colour appearing in clutches of eggs having the same size or shape. The hypotheses that suggest themselves are:

(i) That the common terns consist of two gentes, one of which lays brown and the other green eggs. The mixture of colour arises from the existence of 'cuckoo' terns who lay in other hens' nests. We cannot ascertain the number of brown-egg-laying tern 'cuckoos' who lay in brown-egg nests, or of green-egg-laying tern 'cuckoos' who lay in green-egg nests. But if the 19 BG clutches arise from cuckoo-terns, we must originally have had 74 + 63 + 19 single-egg nests, and in these 156 nests 19 tern 'cuckoos' of opposite colour laid. The chance therefore of a tern 'cuckoo' of opposite egg colour laying in the 1-egg nests is ·1218.
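The colour ratios and the 'cuckoo' chances in this argument, including the two- and three-egg figures that the text goes on to use, follow directly from Table XI. A sketch, in which the data layout is ours and averaging the one- and two-egg chances before projecting to four-egg clutches is our simplification of the text's "about 1 in 8":

```python
# Table XI, excluding the slatey-grey, blue and four-egg clutches as the text does.
CLUTCHES = {
    1: {"B": 74, "G": 63},
    2: {"BB": 67, "BG": 19, "GG": 92},
    3: {"BBB": 62, "BBG": 8, "BGG": 14, "GGG": 119},
}


def egg_counts(clutches: dict) -> tuple:
    """Total brown and green eggs in clutches keyed by their colour string."""
    brown = sum(key.count("B") * n for key, n in clutches.items())
    green = sum(key.count("G") * n for key, n in clutches.items())
    return brown, green


for size, clutches in CLUTCHES.items():
    brown, green = egg_counts(clutches)
    print(f"clutches of {size}: {brown} brown, {green} green, "
          f"100 : {100 * green / brown:.0f}")

# Hypothesis (i): each mixed clutch is a whole-colour clutch of one egg fewer
# into which a 'cuckoo' tern of the other colour has laid.
chance_1 = 19 / (74 + 63 + 19)              # among original 1-egg nests
chance_2 = (8 + 14) / (67 + 92 + 8 + 14)     # among original 2-egg nests
whole_3 = 62 + 119                           # whole-colour 3-egg clutches
mean_chance = (chance_1 + chance_2) / 2
# Solve x / (whole_3 + x) = mean_chance for the expected mixed 4-egg clutches.
expected_mixed_4 = mean_chance * whole_3 / (1 - mean_chance)

print(f"chance in 1-egg nests {chance_1:.4f}, in 2-egg nests {chance_2:.4f}")
print(f"expected mixed 4-egg clutches {expected_mixed_4:.1f}; observed 0")
```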
Treating the 2-egg nests in the same manner, we have 67 + 92 + 8 + 14 = 181 of them, and in 22 of these there occurs the egg of a tern 'cuckoo' of opposite colour, or the chance is ·1215; this number is substantially the same as we reached before, and the coincidence is remarkable. But it collapses when we go a stage further. We have 62 + 119 whole-colour clutches of 3; we should therefore expect 25 clutches of 4 with composite colours, i.e. 25/(62 + 119 + 25) = ·122 nearly. Now only a single 4-clutch nest was found, and this had all green eggs. With a chance of about 1 in 8 that a cuckoo-tern will lay in any nest, it is hard to believe that it missed at least 181 nests. It appears that three eggs is the practical limit to the size of the clutch laid by one hen, but it seems hard to believe that the cuckoo-tern would avoid all nests which already had three eggs†, i.e. the cuckoo-tern hypothesis seems to involve a considerable percentage of composite four-egg clutches, which do not appear. This argument seems sufficient to render the hypothesis very improbable.

* Biometrika, Vol. x. p. 146.
† Or that the rightful owner, having laid two eggs, would refrain from laying the third because the 'cuckoo' tern had already laid it.

(ii) There is only one gens of the common tern, which can lay both brown and green eggs. Since, however, the number of green eggs increases with the size of the clutch, it is not possible to consider the chance of laying a brown, respectively a green, egg as the same in successive layings. The change of pigment in successive layings may be a physiological exhaustive process, such as a change from a melanin to a lipochrome. This hypothesis does not assume that any given bird may or may not lay a green or brown egg according to a given law of chance, but that physiologically there is a tendency with successive laying to alter the nature of the pigment in the glands or on the surface of the oviduct.
For example the hen, as the incubation period approaches, may change the quantity or character of her food. It is probable, however, that the changes will not be the same for small and large layers; we shall therefore give generality to the problem by supposing the probability of laying a brown egg to vary not only with the number of eggs laid but with each egg. We have then the following system of notation:

$p_s', p_s'', p_s''', \ldots$ = chance of laying a brown egg in the 1st, 2nd, 3rd, ... laying of a hen who lays a clutch of s eggs.

The corresponding chances of laying green eggs will be $q_s' = 1 - p_s'$, $q_s'' = 1 - p_s''$, $q_s''' = 1 - p_s'''$, .... Let there be $N_s$ s-clutch common tern hens. Then our data are to be provided by the equations:

$N_1 p_1' + N_1 q_1' = 74 + 63 \quad (N_1 = 137)$,

$N_2 p_2'p_2'' + N_2(p_2'q_2'' + q_2'p_2'') + N_2 q_2'q_2'' = 67 + 19 + 92 \quad (N_2 = 178)$,

$N_3 p_3'p_3''p_3''' + N_3(p_3'p_3''q_3''' + p_3'q_3''p_3''' + q_3'p_3''p_3''') + N_3(p_3'q_3''q_3''' + q_3'p_3''q_3''' + q_3'q_3''p_3''') + N_3 q_3'q_3''q_3''' = 62 + 8 + 14 + 119 \quad (N_3 = 203)$.

Dividing out by the totals in each case and equating corresponding terms, we have the following system of equations to solve:

$p_1' = {\cdot}5401460, \quad q_1' = {\cdot}4598540$ ... (i),

$p_2'p_2'' = {\cdot}3764045, \quad p_2'q_2'' + q_2'p_2'' = {\cdot}1067416, \quad q_2'q_2'' = {\cdot}5168539$ ... (ii),

$p_3'p_3''p_3''' = {\cdot}3054187, \quad p_3'p_3''q_3''' + p_3'q_3''p_3''' + q_3'p_3''p_3''' = {\cdot}0394089,$
$p_3'q_3''q_3''' + q_3'p_3''q_3''' + q_3'q_3''p_3''' = {\cdot}0689655, \quad q_3'q_3''q_3''' = {\cdot}5862069$ ... (iii).

(i) is solved as it stands. But it is clearly impossible to take $q_2' = q_1'$, for this would involve $q_2''$ being greater than unity, an impossible value. Similarly $q_3'$ and $q_3''$ cannot be equal to $q_2'$ and $q_2''$ respectively, or we should have $q_3''' > 1$. Thus it is needful that the probability of laying a green egg should increase with successive eggs or be a function of the fertility. Assuming this change of probability, we may write the first two equations of (ii), the third not being independent:

$p_2'p_2'' = {\cdot}3764045, \quad p_2'(1 - p_2'') + p_2''(1 - p_2') = {\cdot}1067416$,

which give us $p_2' + p_2'' = {\cdot}8595506$, i.e. $p_2'$ and $p_2''$ are the roots of the quadratic

$p_2^2 - {\cdot}8595506\,p_2 + {\cdot}3764045 = 0$.
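The coefficients of this quadratic, and of the cubic used next for the three-egg clutches, come straight from the clutch proportions of Table XI. The sketch below recomputes them and tests the discriminants; the bisection root-finder and the variable names are ours, not the authors'.

```python
# Hypothesis (ii): one gens whose chance of a brown egg may change from egg to
# egg. The clutch proportions fix the symmetric functions of those chances.

# Two-egg clutches: p2' p2'' and p2' q2'' + q2' p2''.
prod_2 = 67 / 178
mixed_2 = 19 / 178
sum_2 = mixed_2 + 2 * prod_2            # p2' + p2''
disc_2 = sum_2 ** 2 - 4 * prod_2        # negative means imaginary roots

# Three-egg clutches: proportions of B^3, B^2G and BG^2.
b3, b2g, bg2 = 62 / 203, 8 / 203, 14 / 203
e3 = b3                                 # p' p'' p'''
e2 = b2g + 3 * e3                       # sum of the pairwise products
e1 = bg2 + 2 * e2 - 3 * e3              # p' + p'' + p'''


def cubic(p: float) -> float:
    return p ** 3 - e1 * p ** 2 + e2 * p - e3


# With these coefficients the cubic increases on [0, 1] (its derivative has a
# negative discriminant), so bisection finds its single real root there.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if cubic(mid) < 0:
        lo = mid
    else:
        hi = mid
root = (lo + hi) / 2

# Divide out (p - root): the remaining factor is p^2 + b p + c.
b = root - e1
c = e2 + root * b
disc_3 = b ** 2 - 4 * c

print(f"two eggs: p' + p'' = {sum_2:.7f}, discriminant {disc_2:.4f}")
print(f"three eggs: real root {root:.7f}, remaining discriminant {disc_3:.4f}")
```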
These roots are imaginary.

Turning now to (iii), we find from the first three equations:

$p_3'p_3''p_3''' = {\cdot}3054187$,
$p_3'p_3'' + p_3''p_3''' + p_3'p_3''' = {\cdot}9556650$,
$p_3' + p_3'' + p_3''' = 1{\cdot}0640394$.

These lead to the cubic for $p_3$:

$p_3^3 - 1{\cdot}0640394\,p_3^2 + {\cdot}9556650\,p_3 - {\cdot}3054187 = 0$.

One root of this cubic is $p_3 = {\cdot}4495251$, which gives, on dividing out the factor $p_3 - {\cdot}4495251$,

$p_3^2 - {\cdot}6145143\,p_3 + {\cdot}6794254 = 0$.

The roots of this quadratic are both imaginary. Accordingly neither the records for the nests with two eggs nor those for the nests with three eggs are consistent with a single gens the hens of which lay brown eggs with a tendency to lay green increasing with greater fertility. This hypothesis has therefore to be discarded.

(iii) As a last hypothesis we will assume that there are two gentes or types of females, one of which lays brown eggs ($p_1$) with a small chance of laying green ($q_1 = 1 - p_1$), and the other of which lays green eggs ($p_2$) with a slight chance of laying brown ($q_2 = 1 - p_2$). Let $N_s\nu_s$ and $N_s(1 - \nu_s)$ be the numbers of brown- and green-laying hens in the group $N_s$ which lays s eggs in the clutch. We suppose $p_1$ and $p_2$ to be independent of the fertility of the hen, until this assumption is shown to be inadequate.

Clutches of 1 egg.

$N_1\nu_1 p_1 + N_1(1 - \nu_1)q_2$ = number of brown eggs = $N_1\epsilon_1'$, say,
$N_1\nu_1 q_1 + N_1(1 - \nu_1)p_2$ = number of green eggs = $N_1\epsilon_1''$, say.

For our special case:

$\nu_1 p_1 + (1 - \nu_1)q_2 = {\cdot}5401460$,
$\nu_1 q_1 + (1 - \nu_1)p_2 = {\cdot}4598540$.

These equations are not, however, independent, and only suffice to determine $\nu_1$ from

$\nu_1 = (\epsilon_1' - q_2)/(p_1 - q_2)$ ... (iv),

i.e. the proportions of brown and green egg-layers in clutches of one, when $p_1$ and $q_2$ have been found.

Clutches of 2 eggs.
If the distribution of clutches be $N_2(\epsilon_2' + \epsilon_2'' + \epsilon_2''')$, we have:

$\nu_2 p_1^2 + (1 - \nu_2)q_2^2 = \epsilon_2'$,
$\nu_2 p_1 q_1 + (1 - \nu_2)q_2 p_2 = \tfrac12\epsilon_2''$,
$\nu_2 q_1^2 + (1 - \nu_2)p_2^2 = \epsilon_2'''$.

Only two of these equations are independent, and it is convenient to write them in the form:

$\nu_2 p_1 + (1 - \nu_2)q_2 = \epsilon_2' + \tfrac12\epsilon_2''$,
$\nu_2 p_1^2 + (1 - \nu_2)q_2^2 = \epsilon_2'$ ... (v).

These will give, if $p_1$ and $q_2$ are known, one equation for the determination of $\nu_2$ and one equation of condition.

Clutches of 3 eggs.

If the distribution of clutches be $N_3(\epsilon_3' + \epsilon_3'' + \epsilon_3''' + \epsilon_3'''')$, we have:

$\nu_3 p_1^3 + (1 - \nu_3)q_2^3 = \epsilon_3'$,
$\nu_3 p_1^2 q_1 + (1 - \nu_3)q_2^2 p_2 = \tfrac13\epsilon_3''$,
$\nu_3 p_1 q_1^2 + (1 - \nu_3)q_2 p_2^2 = \tfrac13\epsilon_3'''$,
$\nu_3 q_1^3 + (1 - \nu_3)p_2^3 = \epsilon_3''''$.

Only three of these equations are independent, and these may be written:

$\nu_3 p_1^3 + (1 - \nu_3)q_2^3 = \epsilon_3'$,
$\nu_3 p_1^2 + (1 - \nu_3)q_2^2 = \epsilon_3' + \tfrac13\epsilon_3''$,
$\nu_3 p_1 + (1 - \nu_3)q_2 = \epsilon_3' + \tfrac23\epsilon_3'' + \tfrac13\epsilon_3'''$ ... (vi).

These suffice to determine $\nu_3$, $p_1$ and $q_2$. Denoting the right-hand sides of (vi) by $f_3''$, $f_2''$, $f_1''$ respectively, we find:

$\nu_3 = (f_1'' - q_2)/(p_1 - q_2)$ ... (vii),

which suffices to find $\nu_3$ when $p_1$ and $q_2$ are found, and

$p_1 = (f_2'' - q_2 f_1'')/(f_1'' - q_2), \quad p_1^2 = (f_3'' - q_2 f_2'')/(f_1'' - q_2)$ ... (viii),

which lead to the quadratic for $q_2$

$(f_2'' - f_1''^2)\,q_2^2 + (f_1''f_2'' - f_3'')\,q_2 + (f_1''f_3'' - f_2''^2) = 0$ ... (ix).

We could therefore solve (ix) and choose the appropriate root for $q_2$, find the corresponding $p_1$ from (viii), determine $\nu_3$ from (vii), $\nu_2$ from the first of (v) and $\nu_1$ from (iv). We might then use the second equation of (v) as an equation of condition. But clearly this would not be satisfactory, as all our quantities are subject to considerable sampling errors. The correct method would be to determine $\nu_1$, $\nu_2$, $\nu_3$, $p_1$ and $q_2$ from the six equations (iv), (v) and (vi) so as to get the best values of these variables. But this would be a very laborious process. We propose therefore to determine $p_1$ and $q_2$ by the method of least squares from the three equations

$p_1 = (f_2' - q_2 f_1')/(f_1' - q_2)^{*}, \quad p_1 = (f_2'' - q_2 f_1'')/(f_1'' - q_2), \quad p_1^2 = (f_3'' - q_2 f_2'')/(f_1'' - q_2)$ ... (x),

* Obtained by writing $f_1' = \epsilon_2' + \tfrac12\epsilon_2''$, $f_2' = \epsilon_2'$ and eliminating $\nu_2$ between the two equations of (v).
using, to obtain linearity, $q_2 = \bar q_2 + \eta$, where $\bar q_2$ is the value given by the quadratic (ix) and $\eta$ is supposed a small quantity with negligible square. The values of $p_1$ and $q_2$ found from (x) will be good, if not the best. Our system for $\nu_1$, $\nu_2$, $\nu_3$, $p_1$, $q_2$ will not be the optimum possible, but if our system is probable, that will be still more probable, and the hypothesis of the two gentes of tern hens will not be contradicted by the data.

Our system of $\epsilon$'s is:

$\epsilon_1' = {\cdot}5401460$, $\epsilon_2' = {\cdot}3764045$, $\epsilon_2'' = {\cdot}1067416$, $\epsilon_3' = {\cdot}3054187$, $\epsilon_3'' = {\cdot}0394089$, $\epsilon_3''' = {\cdot}0689655$,

leading to:

$f_1' = {\cdot}4297753$, $f_2' = {\cdot}3764045$, $f_1'' = {\cdot}3546798$, $f_2'' = {\cdot}3185550$, $f_3'' = {\cdot}3054187$.

(ix) now becomes:

${\cdot}1927573\,q_2^2 - {\cdot}1924337\,q_2 + {\cdot}0068485 = 0$,

giving the small value $\bar q_2 = {\cdot}0369571$ for the chance of a green-gens hen laying a brown egg. We now return to (x), substituting the f's and ${\cdot}0369571 + \eta$ for $q_2$. Expanding and neglecting $\eta^2$, we obtain, on extracting the root of $p_1^2$ in the third equation,

$p_1 = {\cdot}9177816 + 1{\cdot}2423209\,\eta$,
$p_1 = {\cdot}9613638 + 1{\cdot}9094759\,\eta$,
$p_1 = {\cdot}9613638 + {\cdot}9914412\,\eta$.

Solved by least squares, these equations give for type equations:

$p_1 = {\cdot}9468364 + 1{\cdot}3810793\,\eta$,
$p_1 = {\cdot}9482960 + 1{\cdot}4897513\,\eta$,

leading to:

$p_1 = {\cdot}9282876$, $q_1 = {\cdot}0717124$, $p_2 = {\cdot}9764736$, $q_2 = {\cdot}0235264$.

Whence from (vii), the first of (v), and (iv):

$\nu_3 = {\cdot}3660119$, $1 - \nu_3 = {\cdot}6339881$,
$\nu_2 = {\cdot}4490123$, $1 - \nu_2 = {\cdot}5509877$,
$\nu_1 = {\cdot}5710011$, $1 - \nu_1 = {\cdot}4289989$.

Thus about 7% of the eggs laid by hens of the brown-laying gens will be green, and only about 2% of the eggs laid by hens of the green-laying gens will be brown. Further, the green-laying gens is far more fertile than the brown-laying gens, the proportion of brown to green layers falling from 57 to 43 in the single clutches to 37 to 63 in the triple clutches. The following is our analysis on this basis.

Single Egg Nests. Observed B 74, G 63. Theoretical B 74, G 63.
Number of brown egg layers 78·23; number of green egg layers 58·77.
Number of brown egg layers with brown eggs 72·62; with green eggs 5·61.
Number of green egg layers with green eggs 57·39; with brown eggs 1·38.

Two Egg Nests. Observed B² 67, BG 19, G² 92. Theoretical B² 68·92, BG 15·15, G² 93·93.
Number of brown egg layers 79·92; number of green egg layers 98·08.
Who lay both eggs brown: brown egg layers 68·87; green egg layers 0·05.
Who lay one brown, one green: brown egg layers 10·64; green egg layers 4·51.
Who lay both eggs green: brown egg layers 0·41; green egg layers 93·52.

Three Egg Nests. Observed B³ 62, B²G 8, BG² 14, G³ 119. Theoretical B³ 59·43, B²G 13·98, BG² 9·72, G³ 119·87.
Number of brown egg layers 74·30; number of green egg layers 128·70.
Who lay 3 brown eggs: brown egg layers 59·43; green egg layers 0·00.
Who lay 2 brown and 1 green: brown egg layers 13·77; green egg layers 0·21.
Who lay 1 brown and 2 green: brown egg layers 1·06; green egg layers 8·66.
Who lay 3 green eggs: brown egg layers 0·03; green egg layers 119·84.

It will be noted that the theory gives for the B²G and BG² groups about inverted proportions. It also falls short in the BG group. These would very probably have been bettered with a more general solution of our six equations. But are the existing frequencies inconsistent with our observations, and beyond the limits of random sampling? Summing up our results we have:

Clutch | B | G | B² | BG | G² | B³ | B²G | BG² | G³
Observed | 74 | 63 | 67 | 19 | 92 | 62 | 8 | 14 | 119
Calculated | 74 | 63 | 68·92 | 15·15 | 93·93 | 59·43 | 13·98 | 9·72 | 119·87

From these we find χ² = 5·631, giving P = ·688, i.e. in 69 trials out of 100 the sample would be more discordant from the calculated values than the actual observations. There is accordingly nothing to be said against the theory on the ground of its statistical improbability.
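The whole fitting procedure, from the quadratic (ix) through the least-squares solution of (x) to the χ² test, can be reproduced in a few lines. The sketch below is ours, not the authors' computing scheme: it linearises (x) with a numerical rather than an analytic derivative, and it assumes, as the printed P = ·688 implies, that the probability was read with nine classes, i.e. 8 degrees of freedom. It recovers p₁ ≈ ·928, q₂ ≈ ·024, the three ν's, χ² ≈ 5·63 and P ≈ ·69.

```python
import math

# Clutches of 1, 2 and 3 eggs (Table XI), ordered by number of green eggs;
# anomalous and four-egg clutches are excluded as in the text.
OBS = {1: [74, 63], 2: [67, 19, 92], 3: [62, 8, 14, 119]}
N = {s: sum(counts) for s, counts in OBS.items()}
E = {s: [c / N[s] for c in counts] for s, counts in OBS.items()}

# Right-hand sides of (v) and (vi).
F1P, F2P = E[2][0] + E[2][1] / 2, E[2][0]
F1PP = E[3][0] + 2 * E[3][1] / 3 + E[3][2] / 3
F2PP = E[3][0] + E[3][1] / 3
F3PP = E[3][0]


def p1_from_q2(q):
    """The three expressions (x) for p1 in terms of q2."""
    return [
        (F2P - q * F1P) / (F1P - q),
        (F2PP - q * F1PP) / (F1PP - q),
        math.sqrt((F3PP - q * F2PP) / (F1PP - q)),
    ]


def fit_two_gentes():
    """Least-squares p1, q2 after linearising (x) about the small root of (ix)."""
    a, b, c = F2PP - F1PP ** 2, F1PP * F2PP - F3PP, F1PP * F3PP - F2PP ** 2
    q0 = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)   # small root of (ix)
    h = 1e-6
    A = p1_from_q2(q0)                                   # p1 = A_i + B_i * eta
    B = [(u - v) / (2 * h) for u, v in zip(p1_from_q2(q0 + h), p1_from_q2(q0 - h))]
    # Normal equations: n p1 - sum(B) eta = sum(A); sum(B) p1 - sum(B^2) eta = sum(AB)
    n, sb, sbb = len(A), sum(B), sum(x * x for x in B)
    sa, sab = sum(A), sum(x * y for x, y in zip(A, B))
    det = -n * sbb + sb * sb
    p1 = (-sa * sbb + sb * sab) / det
    eta = (n * sab - sb * sa) / det
    q2 = q0 + eta
    nu = {s: (f - q2) / (p1 - q2) for s, f in ((1, E[1][0]), (2, F1P), (3, F1PP))}
    return p1, q2, nu


def expected_counts(p1, q2, nu):
    """Expected clutches with k = 0..s green eggs under the two-gentes model."""
    q1, p2 = 1 - p1, 1 - q2
    return {
        s: [N[s] * math.comb(s, k) * (nu[s] * p1 ** (s - k) * q1 ** k
                                      + (1 - nu[s]) * q2 ** (s - k) * p2 ** k)
            for k in range(s + 1)]
        for s in OBS
    }


def chi_square(obs, exp):
    return sum((o - e) ** 2 / e for s in obs for o, e in zip(obs[s], exp[s]))


def p_value_even_df(chi2, df):
    """Upper tail of chi-square for even df: exp(-x/2) * sum_{k<df/2} (x/2)^k / k!."""
    x = chi2 / 2
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(df // 2))


p1, q2, nu = fit_two_gentes()
expected = expected_counts(p1, q2, nu)
chi2 = chi_square(OBS, expected)
print(f"p1 = {p1:.4f}, q2 = {q2:.4f}, nu = {[round(v, 4) for v in nu.values()]}")
print(f"chi-square = {chi2:.3f}, P = {p_value_even_df(chi2, 8):.3f}")
```

The degrees of freedom follow the practice implied by the printed P; the number of fitted constants is not subtracted.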
Again, of the two hypotheses involved, (i) the greater fertility of the green-egg layers, and (ii) the fixed small probability that a hen of one gens will occasionally lay an egg of the colour of the other gens, the first seems not unreasonable; the second gives merely a quantitative measure of the assumption made by a number of ornithologists that birds can lay eggs of two colours. It assumes, however, that as a rule they do not. Clearly we need to know more of the mechanism of egg coloration before we can settle how it happens that a bird usually staining its egg brown will stain it green on a few occasions. If it be a result of type of food, we have to assume that our two gentes as a rule feed differently, which is not easily to be admitted. Will this feeding habit then be hereditary, and if so, are the male birds also divided into two gentes, and is the mating assortative? Granted, on the other hand, that it is not due to food but to differences of pigmentation mechanism, we are compelled to ask whether this mechanism is inherited only through the female. If not, then are the matings within the gens, or what is the pigmentation mechanism of heterozygote hens? If we could establish the existence of the two gentes, each with its rule and its fixed exception to rule; if further the pigmentation mechanism, as one must decidedly expect from the eggs of many species, is markedly hereditary; then it is possible that in these clutches of composite colour lies the solvent of some difficulties which the Mendelian explanation meets with when the product of two protogene zygotes, instead of being protogene, is in rare cases found to be allogene.

(5) The Organic Correlations.

We devote this section to a consideration of the degree of relationship between size, shape and colour characters of the same egg, and of their relative values in the seasons 1913 and 1914.

(i) Mottling and Breadth, Length and Index of Egg.
The value of the correlation of mottling and breadth in the 1913 census was ·1803, but unfortunately its sign was possibly wrongly given, as may be seen from the Table on p. 150 of the former paper (Biometrika, Vol. x.). We have taken occasion already to refer to the difficulties in the mottling scale used, but after much consideration we are unable to modify substantially the assumed order of mottling of the previous paper. In broad lines we have:

TABLE XII. Mean Breadth.
Mottling | 1913 census (291 eggs) | 1914 census (1108 eggs)
Confluent blotches, d + e + g | 2·97 | 3·03
Transition forms, a + b | 2·97 | 2·95
Discrete, copious, c | 2·99 | 3·02
Discrete, sparse, f + h + i | 2·96 | 3·01

The value of polyserial η corrected for class-index correlation of mottling is ·1753 for the census of 1914. It is therefore certainly within the probable error of the difference. Now in both cases the confluent mottling gives a greater breadth than the discrete and sparse mottling, but the transition forms a + b, and c, are anomalous. The correlation ratio η in both cases is significant and shows a relation, not very intense, between mottling and breadth, but in the present stage of the mottling classification it is certainly not possible to unravel the relationship. The 1914 returns undoubtedly seem to indicate that not only the confluently but also the discretely mottled eggs have the greater breadths, the lesser breadths being found in the transition forms. It should be noted that the returns for 1914, being nearly four times as numerous, are worth twice as much. If we could really lay any stress on the sign to be given to the association, we should have to assert that in the species at large the rule is opposite to that for the individual hen. In her case the broader egg has less mottling, while in the species the broader egg has the greater, or at least the more confluent, mottling.
The relation found for the individual hen overrides any result to be obtained from the species as a whole, and seems to oppose any theory that greater pressure during transition through the oviduct is the source of greater mottling*. We have further worked out the association† of Index and of Length with the Mottling. We have:

 | Census 1913 | Census 1914
Mottling and Breadth | ·1803 | ·1753
Mottling and Length | – | ·0937 (η₀ = ·0850 ± ·0203‡)
Mottling and Index | ·1550 | ·1598

Since the probable error is of the order ·02, we see that the value is insignificant for length. On the other hand, the order of the mottling classes in the three cases does not appear interpretable. The following table gives the means for each class of mottling as specified in Plate VIII of the first memoir.

TABLE XIII. Mottling and Size and Shape of Egg.

Mottling | Index | Breadth | Length
… | 69·52 | 2·954 | 4·256
… | 70·60 | 2·998 | 4·255
… | 70·68 | 2·947 | 4·170
… | … | 3·025 | 4·228
… | 71·86 | 3·016 | 4·200
f | … | 3·001 | 4·184
h | 72·12 | 3·042 | 4·220
… | 72·65 | 3·032 | 4·184
… | … | … | 4·133

The series for index in ascending order corresponds roughly to a series in ascending order for breadth and in descending order for length, but the system does not correspond to any easily appreciated mottling order. It appears as if the field-workers might have been influenced by the shape of the egg, instead of merely comparing the nature of the mottling, in selecting the type. At any rate no final conclusions can be drawn in this section, and it seems very desirable that more elaborate descriptions of mottling should be carried out in future.

* The time of transition through the oviduct may conceivably be a factor of greater importance.
† Obtained from polychoric η with correction for number of arrays and the class-index correction for mottling.
‡ η₀ is the mean value of the correlation ratio on the assumption of no association.

(ii) Ground Colour and Breadth, Length and Index of Egg.

The following scheme gives our results.
The first value of η is the uncorrected η′. The second is the value when corrected for number of arrays and for the class-index correlation, which is ·9785 for brown and ·9860 for green eggs.

TABLE XIV.

 | Index, Brown | Index, Green | Breadth, Brown | Breadth, Green | Length, Brown | Length, Green
η′ | ·1747 | ·1733 | ·1348 | ·1385 | ·2061 | ·1432
η | ·1011 | ·1313 | imaginary† | ·0773 | ·1530 | ·0857
η₀* | ·1432 ± ·0322 | ·1160 ± ·0261 | ·1432 ± ·0322 | ·1160 ± ·0261 | ·1432 ± ·0322 | ·1160 ± ·0261

* Mean value of η supposing no association.
† This signifies that if η₀² be taken from η′² the difference is negative, i.e. η′ is less than the mean value of η for zero association.

It will be fairly obvious from this table that there is no association of ground colour, whether green or brown, with either the size or the shape of the egg*. This does not appear at all unreasonable if we assume the ground colour to be deposited before the egg enters the oviduct or before the shell becomes finally hardened. The general conclusion to be drawn from the present investigation is therefore that intensity of ground colour, whether green or brown, has no relation to egg size and shape. Breadth of egg, however, whether considered directly or through the index, is more probably related, though not intensely, to mottling, but the nature of the relationship must remain obscure until a more elaborate classification of mottling has been adopted.

(iii) Relation of Mottling to Ground Colour.

The data are given in Table G* at the end of this memoir, where we have separated brown from green eggs, because it is conceivable that the relationships, if any, for the two categories might be different. If C₀ denote the mean contingency when there is zero association, we have:

For the Brown Eggs: C₀ = ·2830 ± ·0323. Uncorrected contingency: C₁ = ·2030.
For the Green Eggs: C₀ = ·2118 ± ·0261. Uncorrected contingency: C₁ = ·2557.
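The comparison applied to these values can be written as a small decision rule. An observed contingency C₁ is treated as significant only when it exceeds the zero-association mean C₀ by more than twice the probable error of C₀. Below is a minimal Python sketch using the printed 1914 values and the 1913 cross-homotyposis values discussed later in the paper. The function name and the fixed factor of two are assumptions read off the wording of the text.

```python
def exceeds_no_association(c1: float, c0: float, pe_c0: float, multiple: float = 2.0) -> bool:
    """True when the observed contingency c1 exceeds the zero-association
    mean c0 by more than `multiple` probable errors of c0."""
    return (c1 - c0) > multiple * pe_c0


# (observed C1, zero-association C0, probable error of C0)
cases = {
    "brown eggs, 1914": (0.2030, 0.2830, 0.0323),
    "green eggs, 1914": (0.2557, 0.2118, 0.0261),
    "cross-homotyposis, 1913": (0.3989, 0.3169, 0.0451),
}

for label, (c1, c0, pe) in cases.items():
    verdict = exceeds_no_association(c1, c0, pe)
    print(f"{label}: excess = {c1 - c0:+.4f}, significant = {verdict}")
```

All three cases come out not significant, which agrees with the conclusions drawn in the text.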
Thus for the brown eggs there is no significance in C₁, since it is less than the mean value of the contingency when there is no association. For the green eggs C₁ is greater than C₀, but the difference is less than twice the probable error of C₀. We cannot therefore assert that any real relation exists between mottling and intensity of ground colour†. Given this relation of C₁ to C₀, it did not seem necessary to correct C₁, as such correction would not alter the conclusion of no significant association.

Although the intensity of ground colour may have no relation to mottling, it is conceivable that the colour of the egg itself may be related to mottling, or indeed to intensity of ground colour, i.e. a brown egg may have deeper tones of ground colour and denser mottling than a green egg. We have the following biserial tables to illustrate these cases.

TABLE XV. Mottling and Colour of Egg.

Colour of Egg | Mottling categories | Totals
Brown | 215, 57, …, 11 | 437
Green | 300, 98, …, 23 | 669
Totals | 515, 155, … | 1106

* This statement is not really contradicted by the η = ·1506 of p. 148 of the previous memoir. With the small number of eggs in that census (291), η₀ = ·1655 ± ·0395, so that η is less than the value for zero association.
† Examined in the same manner, the result for 1913 appears not to have the significance we attributed to it. We have C₀ = ·2813 ± ·0395, while the corrected contingency is only C₁ = ·2260. Thus C₁ is actually less than the mean value when there is no contingency.

The order of the mottling categories seems to correspond, as closely as we can determine from the plate, to the order of relative amount of mottling. We find for the mean η when there is no association η₀² = ·004521. Hence η′, corrected for number of arrays but not for class index, is given by

$$\eta'^2 = \frac{.004702 - .004521}{.996383} = .0001817, \qquad \eta' = .0135.$$

This is insignificant, and therefore we need not trouble to find the class-index correction. It would not appear, therefore, that the brown eggs are more densely mottled than the green eggs.

We now pass to intensity of ground colour. It will be remembered that two scales of "values" were formed, giving as far as possible equal values by the same letters for both green and brown colours.

TABLE XVI. Colour and Value.

Ground Colour Values
Colour of Egg | A | B | C | D | E | F + G | H + I + K | Totals
Brown | 52 | 76 | 63 | 95 | 51 | 44 | 56 | 437
Green | 34 | 51 | 71 | 133 | 85 | 154 | 141 | 669
Totals | 86 | 127 | 134 | 228 | 136 | 198 | 197 | 1106

It is clear on the face of this table that the percentage of high values in the brown series is far greater than in the green series, which has a much greater percentage of low colour values. To get an appreciation of this association we use biserial η. For zero association we have η₀² = ·005425, while the uncorrected η′² = ·113734. Accordingly, corrected for number of arrays,

$$\eta'^2 = \frac{.113734 - .005425}{.995479} = .108800,$$

leading to η′ = ·3299. Calculating the class-index correlation, we find it to be ·9674, and thus finally the corrected η = ·3410 ± ·0197.

This is a significant and fairly substantial correlation between colour and colour value. It would appear as if absence of Sorby's oorhodeine pigment* also involved less copious pigment material in general.

* "On the Colouring-matters of the Shells of Birds' Eggs," Zoological Society's Proceedings, 1875, p. 359.

(iv) Organic Relations in Shape and Size.

The fundamental tables are Tables G to L at the end of the paper. The correlations are as follows:

TABLE XVII. Organic Correlations in two Seasons.

Character Pair | Symbols | Correlation, 1914 (c. 1110) | Correlation, 1913 (c. 294)
Length and Breadth | L, B | ·2104 ± ·0193 | ·2220 ± ·0374
Longitudinal and Equatorial Girths | G₁, G₂ | ·5139 ± ·0149 | ·5297 ± ·0284
Length and Longitudinal Girth | L, G₁ | ·8515 ± ·0055 | ·8804 ± ·0088
Breadth and Longitudinal Girth | B, G₁ | ·4840 ± ·0155 | ·5216 ± ·0286
Index and Length | I, L | −·7577 ± ·0086 | −·7284 ± ·0185
Index and Breadth | I, B | ·4537 ± ·0161 | ·5033 ± ·0294
Index and Longitudinal Girth | I, G₁ | −·4496 ± ·0161 | −·3832 ± ·0336

The following table contains the seasonal difference and its probable error.

TABLE XVIII. Seasonal Change in Correlation.

Character Pair | Δ = 1914 − 1913 | Probable error of Δ
L and B | −·0116 | ± ·0421
G₁ and G₂ | −·0158 | ± ·0321
L and G₁ | −·0289 | ± ·0104
B and G₁ | −·0376 | ± ·0325
I and L | −·0293 | ± ·0204
I and B | −·0496 | ± ·0335
I and G₁ | −·0664 | ± ·0373

With the exception of the correlation of Length and Longitudinal Girth, none of these differences is significant in relation to its probable error. In the case mentioned, however, such a deviation would occur in excess 3 times in 100 trials and in defect 3 times in 100 trials; as we have made 7 trials, the odds against it are only 52 to 48. We cannot therefore lay much stress on it, and we conclude that no seasonal change in the organic correlations is to be observed between 1913 and 1914. As there were considerable changes in the means (see our p. 310), this result confirms the general conclusion that, except for very skew distributions, change in means does not involve change in correlation. Change in variability does usually denote change in correlation, but, as we have indicated (p. 311), the changes in variability are not significant except in the girths, and this may be the source of such modification as we find in the correlation of Length and Longitudinal Girth. To test this we note that if Longitudinal Girth alone were changed, the regression coefficient of Length on Longitudinal Girth ought not to change beyond the limits of random sampling. For 1914 this coefficient of regression is ·4501 ± ·0056, and for 1913 it is ·4215 ± ·0089. Hence the difference is ·0286 ± ·0105.
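The probable errors in Table XVIII, and the ·0105 attached to the difference of regression coefficients, are consistent with the usual rule for independent samples. Under that rule, the probable error of a difference is the square root of the sum of the squared probable errors. The Python sketch below uses a hypothetical helper name. It recomputes each seasonal difference from Table XVII and expresses it in units of its probable error.

```python
import math


def pe_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference of two independent estimates."""
    return math.hypot(pe_a, pe_b)


# Table XVII: pair -> (r 1914, PE 1914, r 1913, PE 1913)
organic = {
    "L, B": (0.2104, 0.0193, 0.2220, 0.0374),
    "G1, G2": (0.5139, 0.0149, 0.5297, 0.0284),
    "L, G1": (0.8515, 0.0055, 0.8804, 0.0088),
    "B, G1": (0.4840, 0.0155, 0.5216, 0.0286),
    "I, L": (-0.7577, 0.0086, -0.7284, 0.0185),
    "I, B": (0.4537, 0.0161, 0.5033, 0.0294),
    "I, G1": (-0.4496, 0.0161, -0.3832, 0.0336),
}

for pair, (r14, pe14, r13, pe13) in organic.items():
    diff = r14 - r13
    pe = pe_difference(pe14, pe13)
    print(f"{pair:7s} diff = {diff:+.4f} +/- {pe:.4f} ({abs(diff) / pe:.2f} PE)")

# Regression of Length on Longitudinal Girth, 1914 against 1913
print(f"regression diff = {0.4501 - 0.4215:.4f} +/- {pe_difference(0.0056, 0.0089):.4f}")
```

Only Length and Longitudinal Girth exceeds two probable errors (about 2·8), which is the exception singled out in the text.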
Thus the difference in the regression coefficients is just as significant as it was in the correlation coefficients, and so is not explicable on the basis of increased variability in the Longitudinal Girth. If it is to be considered significant, which we doubt, it must depend on something other than a more variable Longitudinal Girth.

We may consider in this place what changes have taken place in the formula connecting Longitudinal Girth with Length and Breadth. For 1914 we have

$$G_1 = 1.1273\,B + 1.4840\,L + 1.9180,$$

while for 1913 we had

$$G_1 = 1.2701\,B + 1.6415\,L + .8224.$$

The changes in the coefficients look more considerable than the changes that will be found in the values of G₁ calculated from either formula for eggs which are not extreme variants. At the same time the differences rather tend to emphasise the suggestion, given by the correlation of G₁ with L, that there may have been a seasonal change in the organic relationship between these characters.

(6) The Homotypic Correlations.

The results for the 1914 season are of a very startling character. They demonstrate that while the organic correlations remain nearly constant, the homotypic correlations can suffer a very considerable seasonal modification. In other words, the birds laid eggs very much more alike in 1914 than in 1913. The reader will remember that 1913 was a bad season for the birds: many young perished and there were few nests. On the other hand, 1914 was a good season; there was plenty of food, and there were numerous and possibly stronger birds. The eggs in the clutches were more alike in 1914 than in 1913. We proceeded to investigate, in the first place, whether the greater intensity of homotyposis was due to there being a far larger proportion of three-egg clutches. Accordingly we took only the 1st and 2nd eggs in the clutches and obtained the homotypic correlation for Equatorial Girth. It was ·7535, for 383 pairs of eggs.
When we took all possible pairs out of all the clutches, we had 796 pairs, and the correlation, instead of rising, fell, though insignificantly, to ·7469. The difference between 1913 and 1914 cannot therefore be due to a far larger number of clutches providing three pairs in the latter year than in the former.

TABLE XIX. Direct Homotyposis in Size and Shape.

Characters | Symbols | Season 1913 | Season 1914
Lengths of Eggs | L, L′ | ·4643 ± ·0346 | ·6056 ± ·0107
Breadths of Eggs | B, B′ | ·5176 ± ·0326 | ·7327 ± ·0078
Longitudinal Girths | G₁, G₁′ | ·5076 ± ·0327 | ·6689 ± ·0093
Equatorial Girths | G₂, G₂′ | ·4621 ± ·0350 | ·7469 ± ·0075
Index | 100 B/L, 100 B′/L′ | ·5537 ± ·0308 | ·5327 ± ·0120

It must at once be admitted that this result is of a very startling character. Only the homotyposis of the Index has remained without any significant change, i.e. the degree of likeness in shape does not exhibit a seasonal change. In all four cases of absolute size there are most substantial, and of course significant, changes in the homotyposis. The mean size homotyposis has risen from ·4879 to ·6885, i.e. by about 40 per cent.! It is difficult to offer a demonstrable explanation of this great change. The factor we are seeking must be one which modifies, so to speak, the individuality of the bird between its successive egg-layings: for example, a change in the climatic conditions or in the food supply occurring in 1913 somewhere during the egg-laying period. Such a factor, however, would lead us to suppose that the high values of 1914 were the normal homotypic values, whereas from the comparative standpoint they appear to us to be the abnormal ones. If we suppose that only the stronger birds survived to the season 1914 and that there was a plentiful food supply, it would seem that the community as a whole should have exhibited less individuality in size, not more; the weaker birds, obtaining less food supply, would not appear.
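The 1914 probable errors in Table XIX agree with the standard probable error of a correlation coefficient, 0·6745 (1 − r²)/√n, when n is taken as 1592. That is the number of pairs of eggs given in the footnote to Table S, apparently each pair counted in both orders, i.e. twice the 796 pairs mentioned above. This value of n is an inference from the printed figures. The Python sketch below also recomputes the mean size homotyposis. Because the probable error falls as 1/√n, four times as many returns halve it. This is the sense in which the larger 1914 census is "worth twice as much".

```python
import math


def pe_r(r: float, n: int) -> float:
    """Probable error of a correlation coefficient computed from n pairs."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Table XIX, size characters only: name -> (1913, 1914)
size = {
    "lengths": (0.4643, 0.6056),
    "breadths": (0.5176, 0.7327),
    "longitudinal girths": (0.5076, 0.6689),
    "equatorial girths": (0.4621, 0.7469),
}
N_1914 = 1592  # assumed: pairs of eggs counted in both orders (footnote to Table S)

for name, (_, r14) in size.items():
    print(f"{name:20s} r = {r14:.4f}, PE = {pe_r(r14, N_1914):.4f}")

mean_1913 = sum(v[0] for v in size.values()) / len(size)
mean_1914 = sum(v[1] for v in size.values()) / len(size)
print(f"mean homotyposis {mean_1913:.4f} -> {mean_1914:.4f}, "
      f"rise {mean_1914 / mean_1913 - 1:.0%}")
```

The computed probable errors round to ·0107, ·0078, ·0093 and ·0075. The mean rises from ·4879 to ·6885, about 41 per cent.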
There is, however, so little change of type and variability of the eggs in the two seasons that it is hard to believe that selection of the birds is the source of the change. Further, if anything the variability of the eggs is less in 1914 than in 1913, and such a reduction of variability would tend to reduce rather than increase correlation. Suppose instead that 1913 killed off many of the old birds and that there was a larger proportion of young birds in 1914, so that the community was more heterogeneous in 1914. We are then pulled up by the fact that the eggs were on the average very slightly larger in 1914, which is perhaps not what we should anticipate with a larger proportion of first layers. It would seem as if we had to take refuge in some very vague statement that the seasonal environment of 1914 interfered less with individuality than that of 1913. But this does not really help us, and it leaves us with the greater difficulty that it suggests "individuality" is an indefinite quantity from the statistical side, which under favourable environmental conditions might result in all the eggs of a clutch being perfectly alike! The persistency in the Index value seems in itself to point to a limitation in individuality, and it seems wisest at present to await further material before speculating on the source of this marked seasonal change in size homotyposis.

One point, however, we can investigate, namely whether pigmentation homotyposis has or has not kept pace with size homotyposis. With this aim in view, the direct homotyposis has been worked out between the mottling of one egg and the mottling of a second in the same clutch, and between the ground colour of one egg and the ground colour of a second in the same clutch. Further, the cross-homotyposis has been determined between the mottling of one egg and the ground colour of a second in the clutch. The fundamental difficulty here lies in the treatment of the "values" of the ground colour.
We cannot separate green eggs from brown, because of the occasional appearance of mixed-colour clutches. Nor would it be reasonable to work with contingency on a 20 × 20 category table. We have accordingly been compelled to pool green and brown eggs when they have the same "value" on our colour scale. This at any rate renders our present results comparable with those of 1913. But until we know more of the mechanism of egg pigmentation, it is impossible to assert that equal "values" in brown and green ground colours are what we should anticipate as a result of individuality working occasionally with one pigment and occasionally with another. The homotyposis pigmentation tables are given as Tables R, S and T at the end of this paper. In actually determining the contingency we have clubbed together d and e in the mottling, and A₁ and A₂, B₁ and B₂, C₁ and C₂, etc. in the value of the ground colour. This gives 8 × 8, 10 × 10 and 10 × 8 contingency tables. These have then been corrected for number of cells and for class index. The class-index correction for mottling is ·9531, and for value of ground colour ·9848.

We consider first the cross-homotyposis of ground colour and mottling. On the supposition that there is no association between value of ground colour in one egg and mottling in a second, the coefficient of mean square contingency would be C₀ = … The corrected actual coefficient of mean square contingency is C₁ = …, which is less than the mean square contingency coefficient for no association. Accordingly there is no cross-homotyposis between mottling and ground colour, and there should not be if our view is correct that the organic relationship in the same egg is zero (see p. 328). The value found for the 1913 data was C₁ = ·3989 ± ·0379, and it was spoken of as significant. But the fact was overlooked that C₀ = ·3169 ± ·0451, so that C₁ exceeds C₀ by less than twice the probable error and may well not be significant.
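Pearson's relation between cross- and direct homotyposis, as used for Table XXII, predicts the cross-homotypic correlation of x in one egg with y in another. The prediction is the mean of the two direct homotypic correlations multiplied by the organic correlation of x with y. With an organic correlation of zero, as argued here for mottling and ground colour, it predicts no cross-homotyposis at all. The Python sketch below uses hypothetical names and applies the relation to Tables XVII and XIX. The half-sum form is inferred from the fact that it reproduces every "calculated" entry of Table XXII.

```python
def cross_homotypic(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Predicted cross-homotypic correlation of x in one egg with y in another:
    the mean of the two direct homotypic correlations times the organic r_xy."""
    return 0.5 * (r_xx + r_yy) * r_xy


# Direct homotyposis (Table XIX): character -> (1913, 1914)
direct = {"L": (0.4643, 0.6056), "B": (0.5176, 0.7327),
          "G1": (0.5076, 0.6689), "G2": (0.4621, 0.7469)}

# Organic correlations (Table XVII): pair -> (1913, 1914)
organic = {("L", "B"): (0.2220, 0.2104), ("G1", "G2"): (0.5297, 0.5139),
           ("L", "G1"): (0.8804, 0.8515), ("B", "G1"): (0.5216, 0.4840)}

for (x, y), r_org in organic.items():
    calc = [cross_homotypic(direct[x][s], direct[y][s], r_org[s]) for s in (0, 1)]
    print(f"{x}, {y}: calculated 1913 {calc[0]:.4f}, 1914 {calc[1]:.4f}")

# Zero organic correlation (mottling and ground colour) predicts zero cross-homotyposis
print(cross_homotypic(0.3500, 0.5709, 0.0))
```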
That the 1913 cross-homotyposis may not be significant is confirmed by the consideration that the organic correlation of mottling and ground colour was really insignificant in 1913. It is thus exceedingly improbable that the cross-homotyposis could be significant*.

* See our second footnote on p. 328 above. In 1913 we had not fully realised how high C₀ could be for such short samples as a couple of hundred. Hence the source of the error.

Direct homotyposis provides the results of the following table:

TABLE XX. Direct Homotyposis in Mottling and Ground Colour Value.

Character | Season 1913 | Season 1914
Mottling of Eggs in same Clutch | ·3500 | ·6267
Ground Colour Value of Eggs in same Clutch | ·5709 | ·7480

The probable errors of the 1913 values are well below ·045, and those of the 1914 values well below ·017. Accordingly the differences are markedly significant; in the nature of pigmentation, the resemblance of eggs in the same clutch is much more intense in 1914 than in 1913. Thus the results for size and shape of egg are confirmed by those for pigmentation. We have therefore a very remarkable fact, one which it seems to us may be of some consequence: the season can affect the extent to which the female bird impresses her individuality on the external characters of the egg. It does not follow from this that seasonal differences can affect in the like marked manner the individuality of the internal characters of the egg. But it does raise the suggestion that it would be well worth inquiring whether the degree of resemblance of offspring born in one season can differ sensibly from the degree of resemblance of those born in another season. Should such a difference be established, it would indicate that heredity, in other words the nature of the germ plasm, could be more readily influenced by seasonal differences than has yet been anticipated. We ourselves should be very unwilling to admit this, but we must at the same time confess that we see no obvious explanation of these significant changes in homotyposis. If individuality impressed in the ovary and in the oviduct on the form and colouring of eggs can be increased or decreased by seasonal differences, it is not a very long step to believe that other physiological processes of this region, which impress individuality on the internal characters of the ovum, can be modified by the nature of the season.

We now turn to the cross-homotyposis in size and shape of the tern's egg:

TABLE XXI. Cross-Homotyposis in Size Characters.

Characters of the two Eggs | Season 1913 | Season 1914
Length and Breadth | ·0922 ± ·0441 | ·2621 ± ·0157
Longitudinal and Transverse Girths | ·2603 ± ·0413 | ·4546 ± ·0134
Length and Longitudinal Girth | ·4229 ± ·0362 | ·5854 ± ·0111
Breadth and Longitudinal Girth | ·2530 ± ·0416 | ·4162 ± ·0140

We are thus again faced with the fact that the cross-homotyposis of the eggs of 1914 is substantially higher than that of 1913. We still see the markedly emphasised individuality of the female birds. We have next to inquire whether, the organic relations being practically constant, the cross-homotyposis has increased in proportion to the direct homotyposis or not. We can test this by Pearson's suggested relationship*. On this relationship the cross-homotypic correlation of x and y is half the sum of the correlation of x with x and the correlation of y with y, multiplied by the organic correlation of x with y:

$$r_{xy'} = \tfrac{1}{2}\left(r_{xx'} + r_{yy'}\right) r_{xy},$$

where primed characters belong to the second egg of the pair.

* Phil. Trans. Vol. 197 A, p. 290.

The following table gives the calculated and observed cross-homotypic correlations for the seasons 1913 and 1914.

TABLE XXII. Cross-Homotypic Correlations as Calculated and Observed.

Character Pair of two Eggs | 1913 Calculated | 1913 Observed | 1914 Calculated | 1914 Observed
Length and Breadth | ·1090 | ·0922 | ·1408 | ·2621
Longitudinal and Transverse Girths | ·2568 | ·2603 | ·3638 | ·4546
Length and Longitudinal Girth | ·4278 | ·4229 | ·5426 | ·5854
Breadth and Longitudinal Girth | ·2674 | ·2530 | ·3392 | ·4162

Thus, while the calculated values were in excellent accordance with the observed in 1913, they are very inadequate to express the increased individuality in 1914. In other words, the cross-homotyposis appears to have increased at an even greater rate than the direct homotyposis, which we have shown in itself to be markedly emphasised. What we are accordingly confronted with in the season 1914 is an exuberance of individuality, together with the possibilities which such a variation of individuality suggests. It may be confined to the externals of the egg. But the physiological factors which determine those externals must at least be in close proximity to, and may perhaps be affiliated with, others which affect matters much more important. The approximate constancy of type, variability and organic correlation for these two seasons, coupled with the marked change in homotyposis, is a problem which demands further observations and much hard thinking.

LIST OF PLATES BELONGING TO THIS PAPER.

Plate II. Simple and complex nests; a and b less elaborated, c and d more elaborated nests.
Plate III. a Clutch of three eggs; b clutch of three eggs; note agreement of size and shape with diversity of colour and mottling; c noteworthy individual pigmentation in a one-egg clutch.
Plate IV. Fig. a. Common tern alighting on ground to show spread of wings. Fig. b. Unique clutch of four eggs in a more elaborate nest.
Plate V. Common tern arranging her eggs and settling down to incubate.
Plate VI. Common tern sitting; objects to giving a sitting to the photographer.

LIST OF TABLES APPENDED.

Table A. Organic Correlation. Mottling and Breadth.
Table B. Organic Correlation. Mottling and Length.
Table C. Organic Correlation. Mottling and Index.
Table D. Organic Correlation. Value of Brown Ground Colour and Breadth.
Table D′. Organic Correlation. Value of Green Ground Colour and Breadth.
Table E. Organic Correlation. Value of Brown Ground Colour and Length.
Table E′. Organic Correlation. Value of Green Ground Colour and Length.
Table F. Organic Correlation. Value of Brown Ground Colour and Index.
Table F′. Organic Correlation. Value of Green Ground Colour and Index.
Table G*. Organic Correlation. Ground Colours and Mottling.
Table G. Organic Correlation. Length and Breadth.
Table H. Organic Correlation. Longitudinal and Transverse Girths.
Table I. Organic Correlation. Length and Longitudinal Girth.
Table J. Organic Correlation. Breadth and Longitudinal Girth.
Table K. Organic Correlation. Length and Index.
Table L¹. Organic Correlation. Breadth and Index.
Table L². Organic Correlation. Longitudinal Girth and Index.
Table M. Direct Homotyposis. Lengths.
Table N. Direct Homotyposis. Breadths.
Table O. Direct Homotyposis. Longitudinal Girths.
Table P. Direct Homotyposis. Transverse Girths.
Table Q. Direct Homotyposis. Indices.
Table R. Direct Homotyposis. Mottlings.
Table S. Direct Homotyposis. Values of Ground Colour.
Table T. Cross-Homotyposis. Mottling and Value of Ground Colour.
Table U. Cross-Homotyposis. Length and Breadth.
Table V. Cross-Homotyposis. Longitudinal and Transverse Girths.
Table W. Cross-Homotyposis. Length and Longitudinal Girth.
Table X. Cross-Homotyposis. Breadth and Longitudinal Girth.

Biometrika, Vol. XII, Parts III and IV

Plate II. Nests of the Common Tern, indicating the range of material used and the character of the immediate environment. Photographs by W. Rowan.
Plate III. Eggs of the Common Tern, painted by William Rowan. (Lith. Cambridge University Press.)
Plate IV. (a) Common Tern just alighting, to indicate great length of wings. (b) The four-egg clutch: see p. 318. Photographs by W. Rowan.
Plate V. Common Tern arranging her eggs and settling down to sit. Photographs by W. Rowan.

TABLE H. Organic Correlation of Longitudinal and Transverse Girths.
* 10·30— signifies the longitudinal girths from 10·295 to 10·395.
† 8·60— signifies the transverse girths from 8·595 to 8·695.

TABLE I. Organic Correlation of Length and Longitudinal Girth.

TABLE J. Organic Correlation of Breadth and Longitudinal Girth.

TABLE K. Organic Correlation of Length and Index.

TABLE L¹. Organic Correlation of Breadth and Index.

TABLE L². Organic Correlation of Longitudinal Girth and Index.

TABLE M. Direct Homotyposis. Lengths.

TABLE N. Direct Homotyposis. Breadths.

TABLE O. Direct Homotyposis. Longitudinal Girths.

TABLE P. Direct Homotyposis. Transverse Girths.

TABLE Q. Direct Homotyposis. Indices.

TABLE R. Direct Homotyposis. Mottling with Mottling*.
* In one two-egg clutch no mottling categories were provided.

TABLE S. Direct Homotyposis. Values of Ground Colour†.
† The total number of pairs of eggs was 1592; from this table are omitted the four pairs which arise from one blue egg in a three-egg clutch.

TABLE T. Cross-Homotyposis. Mottling and Ground Colour*.
* In one two-egg clutch no mottling characters were provided, and in one three-egg clutch one egg was returned as "blue" and therefore could not be used for ground colour in this table of cross-homotyposis.

TABLE U. Cross-Homotyposis. Length and Breadth.

TABLE V. Cross-Homotyposis. Longitudinal and Transverse Girths.
TABLE W. Cross-Homotyposis. Length and Longitudinal Girth.
[Length of First Egg against Longitudinal Girth of Second Egg.]

TABLE X. Cross-Homotyposis. Breadth and Longitudinal Girth.
[Longitudinal Girth of First Egg against Breadth of Second Egg.]

ON THE DEGREE OF PERFECTION OF HIERARCHICAL ORDER AMONG CORRELATION COEFFICIENTS.

By GODFREY H. THOMSON, D.Sc., Armstrong College, in the University of Durham.

(1) Introduction
(2) A Criterion for Hierarchical Order
(3) The Relationship between Correlation Coefficients and their Sampling Errors
(4) Two Experimental Demonstrations of the Effect of the Criterion in Cases where the True Values of the Columnar Correlations are known a priori
(5) The Effect of the "Correctional Standard"
(6) Conclusion

(1) Introduction.

When several mental tests are applied to a group of subjects, and the correlations between the tests (taken in pairs) are worked out, the coefficients are as a rule found not to be arranged entirely in haphazard order, but to show a certain degree of what has become known as hierarchical order. This means that if the total correlation of each test with all the others is found by adding together its coefficients, and if the tests are then arranged in sequence according to the order of magnitude of this total correlation, they are found to be also in sequence, or nearly so, according to the order of magnitude of their correlations with any one of their number.
If the correlation coefficients are set out, as is convenient, in a square table such as the following, the letters $x_1$, $x_2$, etc. being the names of certain mental tests, and the quantities $r_{12}$, $r_{13}$, etc. the correlations between the marks scored in these tests, then hierarchical order shows itself in the fact that each coefficient is smaller than that on its right or than that below it, provided the tests have been arranged in sequence according to the magnitude of the total correlation of each with all the others.

$$
\begin{array}{c|ccccc}
 & x_1 & x_2 & x_3 & x_4 & \cdots \\
\hline
x_1 & \cdot & r_{12} & r_{13} & r_{14} & \cdots \\
x_2 & r_{12} & \cdot & r_{23} & r_{24} & \cdots \\
x_3 & r_{13} & r_{23} & \cdot & r_{34} & \cdots \\
x_4 & r_{14} & r_{24} & r_{34} & \cdot & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{array}
$$

The observed numbers in an actual experiment naturally do not in any case come out in perfect hierarchical order, and it becomes important to have a measure of the degree of perfection present, and some means of estimating from what "true" correlations the observed numbers are most probably derived, and the degree of hierarchical order among these "true" correlations. The importance of this matter arises in the Theory of General Ability which has been proposed by Professor Spearman, for that theory can only be considered proved if the correlations are derived from an absolutely perfect hierarchy. A merely high degree of hierarchical order can be attained without any General Factor whatever, by the random selection of Group Factors. The very difficult question therefore arises of deciding (if possible) whether the hierarchies actually observed in experimental psychology are more probably derived from perfect hierarchies such as are postulated in the Theory of General Ability, or from the good but not perfect hierarchies which arise in the Theory of Group Abilities*.
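The ordering test described above can be carried out mechanically. The following is a minimal Python sketch, assuming the square table is stored as a list of lists with the diagonal ignored. The helper `product_table` and the loadings passed to it are hypothetical: a table of the form $r_{ij} = g_i g_j$ is simply one convenient way to obtain perfect hierarchical order for illustration. Tests are arranged with the smallest total first, so that, as in the text, each coefficient should be no larger than the one on its right.

```python
def hierarchical_order(table):
    """Arrange tests by total correlation (smallest first) and return the
    new order together with the fraction of adjacent comparisons, along
    every row, in which a coefficient is no larger than the one on its right."""
    n = len(table)
    # total correlation of each test with all the others (diagonal ignored)
    totals = [sum(table[i][j] for j in range(n) if j != i) for i in range(n)]
    order = sorted(range(n), key=lambda i: totals[i])
    in_order = checked = 0
    for k in range(n):
        row = [table[k][j] for j in order if j != k]  # row k in the new order
        for left, right in zip(row, row[1:]):
            checked += 1
            in_order += left <= right
    return order, in_order / checked


def product_table(g):
    """Hypothetical perfect hierarchy: r_ij = g_i * g_j off the diagonal."""
    return [[0.0 if i == j else gi * gj for j, gj in enumerate(g)]
            for i, gi in enumerate(g)]


table = product_table([0.9, 0.8, 0.7, 0.6, 0.5])
print(hierarchical_order(table))   # ([4, 3, 2, 1, 0], 1.0)

table[0][4] = table[4][0] = 0.75   # disturb one coefficient
print(hierarchical_order(table)[1] < 1.0)   # True
```

A fraction of 1.0 means every row is in order; the single disturbed coefficient is enough to break the perfection, although the order of the totals still looks broadly hierarchical.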
A criterion which, it was hoped, would give such a measure of the perfection of the true hierarchy from which the observed numbers were derived by experiment, and which has been widely adopted for this purpose, was worked out by Dr Bernard Hart and Professor C. Spearman in the British Journal of Psychology for March, 1912. The object of the present paper is to inquire into the accuracy of that criterion.

(2) A Criterion for Hierarchical Order.

The underlying idea was that if the above square table of correlation coefficients shows hierarchical order in any degree, there will be correlation between the columns of that table taken in pairs, and that when the hierarchical order is perfect the columnar correlation $R$ will rise to unity, except in so far as it is blurred by the sampling errors, which obviously cannot increase an already perfect correlation, but can only decrease it.

Let us write dashed letters throughout for the true values of the various quantities, which in ordinary experiment are unknown, reserving undashed letters for their measured values. We then have:

$r'$ = true correlation coefficient,
$e$ = its sampling error on one occasion, so that $r = r' + e$,
$\bar r'$ = mean of the column of true values $r'$,
$\bar r$ = mean of the column of observed values $r$.

In finding these means, that coefficient is omitted which has no partner in the column with which correlation is being found. Write also

$\rho'$ = $r'$ measured from the mean of the true column, i.e. $\rho' = r' - \bar r'$, and similarly
$\rho$ = $r$ measured from the mean of the observed column, i.e. $\rho = r - \bar r$.

Then $\epsilon = \rho - \rho' = e - \bar e$, where $\bar e$ is the mean of the column of $e$'s.

* See G. H. Thomson, "The Hierarchy of Abilities," Brit. Journ. Psychol. 1919, IX. p. 337, and "The Cause of Hierarchical Order among the Correlation Coefficients of a Number of Variates taken in Pairs," Roy. Soc. Proc. A, XCV. p. 400 (April 1st, 1919).
Then for two columns $a$ and $b$, the true columnar correlation which we desire to know is

$$R' = \frac{S(\rho'_{xa}\rho'_{xb})}{\sqrt{S(\rho'_{xa}\rho'_{xa})\,S(\rho'_{xb}\rho'_{xb})}} \qquad (1)$$

by the Bravais-Pearson product-moment formula, $S$ indicating summation over the various values of $x$, i.e. summation up the column. Since $\rho' = \rho - \epsilon$, this can be written

$$R' = \frac{S(\rho_{xa}\rho_{xb}) - S(\epsilon_{xa}\epsilon_{xb}) - S(\rho'_{xa}\epsilon_{xb}) - S(\rho'_{xb}\epsilon_{xa})}{\sqrt{\{S(\rho_{xa}\rho_{xa}) - S(\epsilon_{xa}\epsilon_{xa}) - 2S(\rho'_{xa}\epsilon_{xa})\}\,\{S(\rho_{xb}\rho_{xb}) - S(\epsilon_{xb}\epsilon_{xb}) - 2S(\rho'_{xb}\epsilon_{xb})\}}} \qquad (2)$$

In this expression, the three quantities of the form $S(\rho\rho)$ are known. The three quantities of the form $S(\epsilon\epsilon)$ are not known, but an attempt can be made to estimate their probable values from the known standard deviations of the correlation coefficients. The four quantities of the form $S(\rho'\epsilon)$ are treated by Dr Hart and Professor Spearman, in their paper, as negligible, on the ground that $\rho'$ will not in general be correlated with $\epsilon$. It is the object of the next section of this paper to examine the nature of the correlation of these two quantities.

(3) The Relationship between the Correlation Coefficients and their Sampling Errors, in the Case of Correlation between a Number of Variates taken in Pairs.

Consider the formula for the standard deviation of a correlation coefficient*, viz.

$$\sigma_r = \frac{1 - r^2}{\sqrt{N}} \qquad (3)$$

where $N$ is the number in the sample. It follows from this that the larger correlation coefficients will probably have the smaller sampling errors $e$, disregarding the sign of $e$ for the moment. But the signs of the quantities $e$ are not likely to be indiscriminately positive and negative. On the contrary, they will have a tendency to be either all positive or all negative if, as is the case in most of the columns of coefficients considered by Professor Spearman, the correlations in the square table are mainly positive.
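The columnar correlation of section (2), and its expansion in terms of $\rho$, $\epsilon$ and $\rho'$, can be checked numerically. This Python sketch is a hypothetical illustration with made-up columns: `columnar_r` computes the product-moment correlation of two columns of a square table, leaving out the coefficients that have no partner, and `expanded_form` evaluates the same quantity from the observed deviations, the errors and the cross terms $S(\rho'\epsilon)$. The two agree to rounding error, so any gap between a corrected estimate and the true value comes only from what is neglected or estimated.

```python
import math
import random


def S(u, v):
    """Summation up the column of the products u_x * v_x."""
    return sum(p * q for p, q in zip(u, v))


def deviations(values):
    """Values measured from their own column mean (rho or rho')."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def product_moment(col_a, col_b):
    """Bravais-Pearson correlation of two columns."""
    pa, pb = deviations(col_a), deviations(col_b)
    return S(pa, pb) / math.sqrt(S(pa, pa) * S(pb, pb))


def columnar_r(table, a, b):
    """Correlation between columns a and b of a square correlation table,
    omitting rows a and b, whose coefficients have no partner."""
    xs = [x for x in range(len(table)) if x not in (a, b)]
    return product_moment([table[x][a] for x in xs], [table[x][b] for x in xs])


def expanded_form(true_a, true_b, obs_a, obs_b):
    """The same correlation written with observed deviations rho,
    errors eps = rho - rho', and the cross terms S(rho' eps) kept."""
    ta, tb = deviations(true_a), deviations(true_b)   # rho'
    oa, ob = deviations(obs_a), deviations(obs_b)     # rho
    ea = [o - t for o, t in zip(oa, ta)]              # eps = e - e_bar
    eb = [o - t for o, t in zip(ob, tb)]
    num = S(oa, ob) - S(ea, eb) - S(ta, eb) - S(tb, ea)
    den_a = S(oa, oa) - S(ea, ea) - 2 * S(ta, ea)
    den_b = S(ob, ob) - S(eb, eb) - 2 * S(tb, eb)
    return num / math.sqrt(den_a * den_b)


# A perfect (product-form) hierarchy gives columnar correlation unity.
g = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
perfect = [[0.0 if i == j else gi * gj for j, gj in enumerate(g)]
           for i, gi in enumerate(g)]
print(round(columnar_r(perfect, 0, 1), 12))   # 1.0

# Made-up true columns and observed columns r = r' + e.
rng = random.Random(12)
true_a = [rng.uniform(0.0, 0.8) for _ in range(8)]
true_b = [rng.uniform(0.0, 0.8) for _ in range(8)]
obs_a = [r + rng.uniform(-0.1, 0.1) for r in true_a]
obs_b = [r + rng.uniform(-0.1, 0.1) for r in true_b]
print(abs(product_moment(true_a, true_b)
          - expanded_form(true_a, true_b, obs_a, obs_b)) < 1e-12)   # True
```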
The errors in the correlation of a variate $x_1$ with a variate $a$ are themselves correlated with the errors in the correlation of the variate $a$ with another variate $x_2$, according to the formula

$$\operatorname{corr}(e_{x_1a}, e_{x_2a}) = \frac{r_{x_1x_2}\,(1 - r_{x_1a}^2 - r_{x_2a}^2) - \tfrac{1}{2}\, r_{x_1a}\, r_{x_2a}\,(1 - r_{x_1a}^2 - r_{x_2a}^2 - r_{x_1x_2}^2)}{(1 - r_{x_1a}^2)(1 - r_{x_2a}^2)} \qquad (4)$$

That is, the correlation of the sampling errors of $r_{x_1a}$ with the sampling errors of $r_{x_2a}$ depends chiefly upon $r_{x_1x_2}$. To illustrate, let us take three correlations, $r_{x_1x_2}$, $r_{ax_1}$ and $r_{ax_2}$, from an experiment in psychology carried out by Mr Wyatt†. If we let $x_1$ be the mental test "Rearranged Letters," $x_2$ "Missing Digits," and $a$ "Analogies," the values there found were $r_{x_1a} = 0.63$, $r_{x_2a} = 0.61$. Then by the above formula the correlation of the errors of these two coefficients depends chiefly upon $r_{x_1x_2}$, whose measured value is 0.63. Using the full formula, and employing the measured values in default of the true ones, the correlation between the errors of $r_{x_1a}$ and $r_{x_2a}$ turns out to be .47. It is therefore (to an extent indicated by this value) probable that they are either both too large or both too small. The same argument holds, in varying degrees, for the other correlations all over Mr Wyatt's table, which are all positive. They have a tendency to be either all too large or all too small: in other words, the $e$'s tend to be all of the same sign.

The relationship between the correlation coefficients of a column and their errors can therefore be summed up in the following table, in which the symbol $|e|$ denotes the magnitude of $e$ regardless of sign.

* Karl Pearson and L. N. G. Filon, "On the Probable Errors of Frequency Constants," Phil. Trans. of the Royal Soc. 1898, CXCI. A. p. 259.
† Stanley Wyatt, "The Quantitative Investigation of Higher Mental Processes," Brit. Journ. Psychol. 1913, VI. p. 181.
TABLE I.

$$
\begin{array}{ccc|cc|cc}
r' & |e| & \rho' & \epsilon & \text{or } \epsilon & \rho'\epsilon & \text{or } \rho'\epsilon \\
\hline
\text{large} & \text{small} & + & - & + & - & + \\
\vdots & \vdots & + & - & + & - & + \\
\vdots & \vdots & + & - & + & - & + \\
\vdots & \vdots & - & + & - & - & + \\
\vdots & \vdots & - & + & - & - & + \\
\text{small} & \text{large} & - & + & - & - & + \\
\hline
 & & & & S(\rho'\epsilon) = & - & \text{or } +
\end{array}
$$

The first column shows the true correlations $r'$ arranged in order of magnitude. The second column expresses the fact that the sampling errors on any occasion will probably be arranged in the reverse order of magnitude, disregarding their signs. The third column shows the correlation coefficients measured from their mean. The upper $\rho'$'s are then positive and the lower negative; also, what is not shown in the table, the absolute values increase upwards and downwards from the point where the signs change. The fourth (double) column shows the probable arrangement of the signs of the quantities $\epsilon$. If the $e$'s are all tending to be positive, then the left-hand member of the double column gives the arrangement, while if the $e$'s all tend to be negative, the other member of the double column does so. (When the $e$'s are all positive, the small errors near the top of the column fall below their mean $\bar e$, so that $\epsilon = e - \bar e$ is negative there, while the large errors near the bottom give positive $\epsilon$; the reverse holds when the $e$'s are all negative.) As shown in the last (double) column, therefore, the quantities $\rho'\epsilon$ tend either to be nearly all negative or nearly all positive.

For a very small sample the signs of $\rho'\epsilon$ will no doubt be quite irregularly arranged. But with such a small sample, even if $\rho'$ and $\epsilon$ were really uncorrelated, it would be most unlikely for $S(\rho'\epsilon)$ to be negligible. As the sample increases the signs tend to settle down to the above arrangement, and $S(\rho'\epsilon)$ does not tend to disappear compared with $S(\epsilon\epsilon)$, but only to take on one or other of two alternative values. It will only be zero when all the errors are zero, i.e. when no corrections are needed to $R$. The distribution of $S(\rho'\epsilon)$ about zero in a number of samples of the same size will not, that is, show a maximum at zero, but a minimum, as is shown qualitatively in Fig. 1.

[Fig. 1. Frequency (vertical axis) of $S(\rho'\epsilon)$ (horizontal axis, running from $-$ through 0 to $+$) in samples of the same size.]
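The two large-sample results used in section (3) are easy to evaluate. This Python sketch assumes the standard forms: the standard deviation of a correlation coefficient, $(1 - r^2)/\sqrt{N}$, and the Pearson-Filon correlation between the errors of two coefficients that share the variate $a$. With Mr Wyatt's measured values ($r_{x_1a} = 0.63$, $r_{x_2a} = 0.61$, $r_{x_1x_2} = 0.63$) used in default of the true ones, it reproduces the figure of .47 quoted in the text. The function names are illustrative only.

```python
import math


def sd_of_r(r, n):
    """Large-sample standard deviation of a correlation coefficient."""
    return (1 - r * r) / math.sqrt(n)


def error_correlation(r1a, r2a, r12):
    """Correlation between the sampling errors of r_{x1 a} and r_{x2 a},
    in the Pearson-Filon form; r12 is the correlation of x1 with x2."""
    num = (r12 * (1 - r1a ** 2 - r2a ** 2)
           - 0.5 * r1a * r2a * (1 - r1a ** 2 - r2a ** 2 - r12 ** 2))
    return num / ((1 - r1a ** 2) * (1 - r2a ** 2))


# Larger coefficients have smaller sampling errors.
print(sd_of_r(0.8, 100) < sd_of_r(0.2, 100))   # True

# Wyatt: x1 = Rearranged Letters, x2 = Missing Digits, a = Analogies.
print(round(error_correlation(0.63, 0.61, 0.63), 2))   # 0.47
```

The leading term $r_{x_1x_2}(1 - r_{x_1a}^2 - r_{x_2a}^2)$ carries most of the value, which is why the text says the correlation of the errors depends chiefly on $r_{x_1x_2}$.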
To show the order of magnitude of these neglected quantities, consider the following example, in which the true correlations $r'$ were known a priori; they and their observed values $r$ were as follows:

$$
\begin{array}{ccc}
r'_{xa} & r_{xa} & e \\
\hline
0.730 & 0.708 & -0.022 \\
0.093 & 0.208 & +0.115 \\
0.356 & 0.367 & +0.011 \\
0.174 & 0.337 & +0.163 \\
0.167 & 0.281 & +0.114 \\
0.120 & 0.371 & +0.251 \\
0.116 & 0.112 & -0.004 \\
0.112 & 0.133 & +0.021
\end{array}
$$

The variates here were made up of dice throws, and the sample was one of 36 cases. Here, knowing as we do the actual true correlations* which would be given by the whole population or by a sufficiently large sample, we can form the quantities $S(\epsilon_{xa}^2)$ and $2S(\rho'_{xa}\epsilon_{xa})$. They prove to be .064 and $-$.116. It is clearly unwise to neglect the latter of these in comparison with the former.

* G. H. Thomson, "A Hierarchy without a General Factor," Brit. Journ. Psychol. 1916, VIII. p. 271.

(4) Experimental Demonstrations in Cases where the True Values of the Columnar Correlations are known a priori.

The formula at which Dr Hart and Professor Spearman eventually arrive, after neglecting these quantities and making various other assumptions, is

$$R'_{ab} = \frac{S(\rho_{xa}\rho_{xb}) - (n-1)\, r_{ab}\, \bar\sigma_{xa}\, \bar\sigma_{xb}}{\sqrt{\{S(\rho_{xa}^2) - (n-1)\,\bar\sigma_{xa}^2\}\,\{S(\rho_{xb}^2) - (n-1)\,\bar\sigma_{xb}^2\}}} \qquad (5)$$

where the $\sigma$'s are standard deviations of the correlation coefficients, the bar indicates mean values for the column, and $n$ is the number of pairs of correlation coefficients concerned in the two columns. In using their formula, its authors do not apply it to all the pairs of columns in the square table. They say: "In any case the correction must be kept within limits: as usual, the larger the correction the less it is to be trusted. If the sampling errors are large enough, they eventually will quite swamp the true differences of magnitude upon which the observed correlation should be based.
In this case, the true correlation is beyond ascertainment; any attempt at correction is merely illusory. To avoid this, and at the same time to ensure impartial treatment of all data, it is necessary to fix beforehand some definite limit to the feasibility of correction. We have here adopted the following standard: in order to attempt to estimate the correct correlation between columns, it is required that in each of these columns the mean square deviation should be at least double the correction to be applied to that deviation."

That is to say, equation (5) is not to be used unless, in each factor of the denominator, $S(\rho^2)$ is at least double its correction $(n-1)\bar\sigma^2$. This condition (the "correctional standard") will be found to be important.

It is clear that the accuracy of this formula (5) could be conveniently tested were we in possession of material in which all the true correlations were known a priori, in addition to the observed correlations found in samples. Such material is supplied in perfection by correlated dice throws.

First Example.

The first experiment with dice of the above nature which I carried out was described in the Brit. Journ. Psychol. 1916, VIII. There ten variates were artificially made up of group factors and specific factors, without any general factor, so as to make a very good hierarchy, which gave the following results when tested by the Hart and Spearman criterion.

TABLE II.

Columns passing the correctional standard | Observed columnar correlation | True columnar correlation | Hart and Spearman corrected columnar correlation R'
ab | 0.95 | 1.00 | 1.04
ac | 0.89 | 0.99 | 1.00
bc | 0.91 | 1.00 | 1.01
cd | 0.90 | 1.00 | 1.11
Means | 0.91 | 1.00 | 1.04

Here the exaggeration of the Hart and Spearman $R'$ is not very noticeable, for the hierarchy is in any case almost perfect.
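Material of this kind is built by adding dice: a die thrown once per subject for each group factor and shared by every variate that contains it, plus specific dice private to each variate. The Python sketch below imitates that kind of construction on a small hypothetical design; the allocation in `GROUPS` and `SPECIFIC` is invented for illustration and is not the design of the 1916 or 1919 articles. Because every die has the same variance, the true correlation of two variates is the number of shared dice divided by the square root of the product of their total numbers of dice, so the true values are known a priori and can be set beside those observed in a sample of twenty subjects.

```python
import math
import random

# Hypothetical design: the group-factor dice each variate contains.
GROUPS = {
    "x1": ["A", "B", "C"],
    "x2": ["A", "B"],
    "x3": ["A", "C"],
    "x4": ["B", "D"],
    "x5": ["C", "D"],
}
# Hypothetical number of specific (unshared) dice in each variate.
SPECIFIC = {"x1": 2, "x2": 3, "x3": 3, "x4": 4, "x5": 4}


def theoretical_r(u, v):
    """True correlation: shared dice / sqrt(dice in u * dice in v)."""
    shared = len(set(GROUPS[u]) & set(GROUPS[v]))
    n_u = len(GROUPS[u]) + SPECIFIC[u]
    n_v = len(GROUPS[v]) + SPECIFIC[v]
    return shared / math.sqrt(n_u * n_v)


def throw_scores(n_subjects, rng):
    """Total score of every subject in every variate."""
    labels = sorted({g for gs in GROUPS.values() for g in gs})
    scores = {v: [] for v in GROUPS}
    for _ in range(n_subjects):
        group = {g: rng.randint(1, 6) for g in labels}  # one throw per group factor
        for v in GROUPS:
            total = sum(group[g] for g in GROUPS[v])
            total += sum(rng.randint(1, 6) for _ in range(SPECIFIC[v]))
            scores[v].append(total)
    return scores


def pearson(u, v):
    """Product-moment correlation of two lists of scores."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


rng = random.Random(1919)
sample = throw_scores(20, rng)   # twenty subjects
for u, v in [("x1", "x2"), ("x1", "x5"), ("x2", "x5")]:
    print(u, v, round(theoretical_r(u, v), 3),
          round(pearson(sample[u], sample[v]), 3))
```

With only twenty subjects the observed coefficients scatter widely about the true ones; that sampling scatter is the raw material to which the correction in (5) is applied.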
Indeed in this case I took some pains to make the arrangement of group factors imitate a perfect hierarchy very closely, for the sake of emphasising the point I then wished to make, viz., that such group factors can, unaided by any general factor, approach exceedingly close to perfection of hierarchical order. I did not then realise that the pains I took over this point were hardly necessary, for random sampling of the group factors gives good hierarchies, though such perfection as the above would be unlikely to arise from chance.

Second Example.

For a second example I have therefore chosen a hierarchy formed thus by the chance sampling of group factors, without any general factor, and moreover one which shows considerable departure from perfection of hierarchical order, it being the least perfect of those which I have up to the present formed in this way. The mode of construction of the variates is given in detail in Roy. Soc. Proc. A. XCV. (April 1st, 1919) on page 402, and the theoretical correlations on page 403 of that article. The latter show a certain degree of hierarchical order, though not very high, the true mean columnar correlation $R$ for all pairs of columns being 0.59.

Dice were now thrown to form 20 measures of each of the ten variates $x_1, x_2, \ldots, x_{10}$. First the magnitudes of the group factors (which, it will be recalled, were in that article named after the cards of a playing pack) were decided by throwing dice, with the following results.

TABLE III.
[Dice score of each group factor, A, 2, 3, 4, 5, 6, 7, 8, 9, 10, Kn, Q, K, for each of the twenty subjects.]

Using these numbers, we can make up the scores for the group factor portion of each of the ten tests described in the article quoted. The result is given in Table IV.

TABLE IV.
[Scores in the group factor portion of the tests $x_1$ to $x_{10}$ for each of the twenty subjects.]

The proper number of dice, as described in the article quoted, were then thrown for each test and for each subject to represent the specific factors, and the scores of these dice were added to the scores given in the last table, the resulting total being the complete score for each subject in each test (Table V).

TABLE V.
[Total scores in the tests $x_1$ to $x_{10}$ for each of the twenty subjects.]

From the dice scores the observed correlations between the variates can be calculated, just as the correlations between mental tests are calculated. Using the product-moment formula we obtain the set of values in Table VI, arranged in hierarchical order, which is only slightly different from the true hierarchical order, except that variate a has changed its position rather violently.

TABLE VI. The Observed Hierarchy.
[Observed correlations among the ten variates, arranged in hierarchical order.]

The pairs of columns which pass the Hart and Spearman correctional standard give the following values:

TABLE VII.
Columns passing the correctional standard | Observed columnar correlation | True columnar correlation | Hart and Spearman corrected columnar correlation R'
2 & 7 | 0.73 | 0.75 | 0.76
6 & 7 | 0.63 | 0.89 | 1.15
2 & 3 | 0.70 | 0.60 | 1.01
2 & 6 | 0.81 | 0.88 | 1.06
3 & 6 | 0.66 | 0.83 | 1.04
Means | 0.71 | 0.79 | 1.00
True mean columnar correlation of the whole table, and not merely of the pairs of columns selected by the correctional standard | | 0.59 |

Dr Hart and Professor Spearman would therefore claim the hierarchy as being a sample of a perfect one. The true mean columnar correlation for the whole table is 0.59; the Hart and Spearman correctional standard selects pairs of columns whose true mean columnar correlation is 0.79, and the mean value of these when corrected according to their formula rises to unity.

This example goes far, I think, towards shaking confidence in their criterion. It must, I think, be partly chance which makes it so peculiarly unfavourable to their work: but I give it as it came. Really a very large number of such examples is necessary, and not all of these could be expected to be so unfavourable. The only other example which I have attempted I have carried far beyond 20 cases without as yet reaching a point where any of the columns pass the correctional standard. I feel that working a large number of such examples is beyond the power of an individual with other claims on his time, and is rather a task for a statistical laboratory with experienced computers and mechanical aids.

(5) The Effect of the Correctional Standard.

Clearly the fact that the criterion is apparently too large in a majority of cases requires further explanation beyond the error already pointed out, that of neglecting the terms in $\rho'$. The other approximations made in obtaining the criterion do not appear to be so erroneous as this one, though their cumulative effect may explain some anomalies.
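The shape of equation (5), together with the correctional standard of section (4), can be sketched in a few lines of Python. The corrections are passed in rather than computed, since how the $\bar\sigma$'s and the numerator term are estimated lies outside this illustration, and the column sums below are made-up numbers. Holding the sums of products fixed while the corrections grow mimics what happens as one descends a hierarchy: the value rises above unity, becomes imaginary when exactly one factor under the root is negative, and returns as an arithmetically possible but meaningless real number when both are.

```python
import math


def hart_spearman_form(s_ab, s_aa, s_bb, corr_ab, corr_a, corr_b):
    """Corrected columnar correlation in the shape of equation (5):
        (S(rho_a rho_b) - corr_ab) / sqrt((S(rho_a^2) - corr_a) * (S(rho_b^2) - corr_b))
    corr_a and corr_b stand for the corrections (n - 1) * sigma_bar**2 of the
    two columns, corr_ab for the numerator correction. Returns (value, regime)."""
    den_a, den_b = s_aa - corr_a, s_bb - corr_b
    if den_a == 0 or den_b == 0:
        return math.inf, "infinite"
    if (den_a > 0) != (den_b > 0):
        return None, "imaginary"   # exactly one factor negative
    value = (s_ab - corr_ab) / math.sqrt(den_a * den_b)
    return value, "finite" if den_a > 0 else "both factors negative"


def passes_correctional_standard(s_aa, s_bb, corr_a, corr_b):
    """Each S(rho^2) must be at least double its correction."""
    return s_aa >= 2 * corr_a and s_bb >= 2 * corr_b


# Made-up column sums, held fixed while the corrections grow.
S_AB, S_AA, S_BB = 0.08, 0.10, 0.10
for corr_a, corr_b in [(0.0, 0.0), (0.04, 0.03), (0.06, 0.05),
                       (0.11, 0.09), (0.15, 0.12)]:
    value, regime = hart_spearman_form(S_AB, S_AA, S_BB, 0.0, corr_a, corr_b)
    shown = "-" if value is None else f"{value:.3f}"
    print(corr_a, corr_b, shown, regime,
          passes_correctional_standard(S_AA, S_BB, corr_a, corr_b))
```

With these numbers the five rows give 0.800, 1.234, 1.789, an imaginary value and 2.530, and only the first two pass the standard; where the boundary is drawn decides which of the rising values are admitted.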
Leaving them on one side, let us consider the "correctional standard" required by Dr Hart and Professor Spearman before they admit any pair of columns. It is this correctional standard, combined with the peculiar distribution of $R'$, which is chiefly responsible for the exaggeration of perfection produced by this criterion, and for the regularity with which an average value of unity is arrived at.

Let us examine first the actual distribution of the Hart and Spearman $R'$ in a psychological hierarchy, viz., that of Wyatt already referred to, and calculate $R'$ not only for those columns which pass the correctional standard, but also for other pairs of columns. What we find is that its value rises as we descend the hierarchy, rushing asymptotically to infinity, remaining for a time imaginary, and then returning. The value reaches infinity when one of the corrections in the denominator becomes as large as the term to be corrected, and remains imaginary until the other term is likewise passed by its correction, when both quantities under the square root are negative and an arithmetically possible but meaningless value is again calculable. Specimen values from Mr Wyatt's hierarchy are given in this table.

TABLE VIII.

Pairs of Columns | Values of the Hart and Spearman R' | Correctional standard
Analogies and Wordbuilding | 0.93 | passed
Completion and Wordbuilding | 0.97 | passed
Completion and Part-wholes | 1.05 | passed
Wordbuilding and Part-wholes | 0.99 | passed
Part-wholes and Memory (delayed) | 0.92 | passed
Rearranged letters and Missing digits | 1.17 |
Wordbuilding and [...] R Test | 1.26 |
Sentence construction and Fables | 1.33 |
Rearranged letters and [...] Test | Practically infinity |
Nonsense syllables and Dissected pictures | Imaginary |
Crossline test and Letter squares | 0.35, both factors in the denominator being now negative |

Expressed in diagrammatic form, this and similar calculations lead to the conclusion that in actual practice the criterion is distributed as in Fig. 2, where the curve is to be understood as a "best fitting" curve among the values of $R'$, which are scattered with a very considerable dispersion on both sides of it. The line, in fact, ought to be a broad smudge.

[Fig. 2. Values of the Hart and Spearman criterion (vertical axis, with the levels zero and unity marked) against position on descending the hierarchy (horizontal axis).]

Now clearly, with a distribution of this sort, it is very important that the boundary between the values that are to be rejected and those that are to be accepted should be chosen with the greatest care, and not arbitrarily but scientifically. Either sound theoretical reasons should be given for the choice of the correctional standard, or the choice should be based empirically on experiments in material where the truth is known a priori, as in the above dice experiments. For obviously, by moving this boundary, we can make the final average take on almost any value. Another point is that the criterion rushes to infinity at such speed that its probable error must be enormous. Dr Hart and Professor Spearman, however, give no reasons for their choice of this particular standard, upon which the values they obtain so largely depend. The standard which they thus arbitrarily adopt begins admitting the criteria at just such a distance above unity as to balance the cases which give a criterion below unity, and entirely explains the remarkable unanimity with which this average value of unity is obtained by them in their calculations.

(6) Conclusion.

A criterion suggested by Dr Hart and Professor Spearman has been widely used by psychologists for the purpose of ascertaining the degree of "hierarchical" order among theoretical correlation coefficients of which only experimental values are known, and a Theory of General Ability has been based on the results.
In the present paper it is, however, shown theoretically that an assumption made in deducing this criterion, namely that $\rho'$ and $\epsilon$ are uncorrelated and the sums $S(\rho'\epsilon)$ negligible, is incorrect. The quantity $e$ taken regardless of sign is strongly correlated with $\rho'$, and the signs of $\epsilon$ tend to be either all the same as, or all different from, those of $\rho'$. The distribution of the sums $S(\rho'\epsilon)$ shows a minimum, not a maximum, at zero.

Otherwise the paper is empirical, and applies the criterion in question to correlated dice throws. In the cases tried, this criterion exaggerates the perfection of the hierarchy considerably, claiming a quite poor hierarchy formed by random group factors as being perfect (true mean columnar correlation 0.59, the Hart and Spearman $R' = 1.00$). The reason for this exaggeration, and for the unanimity with which in so many experiments the average value unity has been found for the Hart and Spearman criterion, appears to be mainly the peculiar distribution of this quantity, combined with the action of the "correctional standard" adopted, which commences admitting the criteria at such a distance above unity as to balance those which are less than unity.

MISCELLANEA.

I. Inheritance of Psychical Characters.

By KARL PEARSON, F.R.S.

In view of the papers that have been published on the inheritance of intelligence, it is strange that there should still remain any doubt that psychical characters are inherited at the same rate as physical characters. But having regard to the existence of that doubt, any material bearing on the point deserves special recognition and emphasis. In a recent contribution to the Journal of Delinquency, Vol. IV. p. 46, Dr Kate Gordon gives the results of her tests by the Binet-Simon method of the intelligence of the children in three orphanages in California. Among other data she gives, almost as an aside, a small table for the correlation in intelligence-quotients of 91 pairs of siblings.
This table appears to me of very considerable interest, and supplies what is occasionally lacking: a nearly uniform environment* both in training and in nourishment for the pairs dealt with. Those who dislike the idea that the mental as well as the physical characters are largely fixed for us by our ancestry are apt to attribute, regardless of known measurements of the intensity of environmental influence, the correlation of pairs of siblings for mental characters to a differential environment of the pairs, i.e. to differential family or home training. Hence the value of data obtained within the walls of an orphanage, as tending to minimise this differentiation.

The Intelligence Quotient, it will be remembered, is the ratio of the mental age as given by an intelligence test of the Binet-Simon type to the actual age. The accompanying correlation table is the "scatter" table of Dr Gordon rendered symmetrical, so that we can enter with either member of the pair. The probable error must, of course, be calculated for the correlation on the basis of 91 pairs, but for the mean and standard-deviation on 182 individuals. We find:

Mean Intelligence Quotient = 92.857 ± .836,
Variability in Intelligence, S.D. = 16.727 ± .591,
Coefficient of Variation = 18.014 ± .657,
Correlation of Intelligence between Siblings, r = .5082 ± .0524.

At first sight it might seem as if the mean Intelligence Quotient was somewhat low. For a normal child it should theoretically be 100, but so much depends on the nature of the tests used, and also on the manner in which they are applied, that we cannot dogmatise on this point. In some recent American data we found a very low intelligence quotient among literate adults, and the result was clearly due to the nature and method of applying the test. The coefficient of variation in this case rose to the high value of 38.52, fully double the value we have found in other cases.
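The probable errors quoted above can be recomputed. The note does not state its formulas, so this Python sketch assumes the usual large-sample ones, with probable error = 0.67449 × standard error, $N = 182$ individuals for the mean, standard deviation and coefficient of variation, and 91 pairs for the correlation. The function name is illustrative only.

```python
import math

PE = 0.67449  # probable error = 0.67449 x standard error


def probable_errors(mean, sd, r, n_individuals, n_pairs):
    """Large-sample probable errors of a mean, a standard deviation, a
    coefficient of variation and a correlation coefficient."""
    v = 100 * sd / mean   # coefficient of variation
    return {
        "mean": PE * sd / math.sqrt(n_individuals),
        "sd": PE * sd / math.sqrt(2 * n_individuals),
        "cv": v,
        "cv_pe": PE * v / math.sqrt(2 * n_individuals)
                 * math.sqrt(1 + 2 * (v / 100) ** 2),
        "r": PE * (1 - r * r) / math.sqrt(n_pairs),
    }


res = probable_errors(mean=92.857, sd=16.727, r=0.5082,
                      n_individuals=182, n_pairs=91)
for key, value in res.items():
    print(key, round(value, 4))
```

Rounded, the results are .836, .591, 18.014, .657 and .0524, the figures given in the note; the correlation's probable error is the largest relative to its value because it rests on 91 pairs rather than 182 individuals.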
* The ideal method would be to take all the siblings in a very large orphanage, such for example as the Reedham asylum, and select, if the numbers should prove adequate, only the children who had entered the orphanage at an early age.

[Table: Inheritance of Intelligence Quotient in Siblings. Columns: Intelligence Quotient of First Sibling; rows: Intelligence Quotient of Second Sibling.]

We may note that the coefficient of variation is also large in the present case, which is distinctly against intelligence being much influenced by environmental conditions, for in this instance we have a considerable approximation to uniformity of environment. For 261 normal children examined by the Binet-Simon method by Dr Jaederholm, I find the coefficient of variation in intelligence as measured in mental years to be 19·476. For 420 children in two schools I find a coefficient of variation in general intelligence of 21·986, and for 1725 children in eight schools I find a coefficient of variation in terms’ marks of 23·133. These are somewhat greater than the variability obtained for the orphanage children, but do not show the great increase some might anticipate from variety in home and school training; and the increase in the last two results may be due solely to the different standards imposed by the judgments of a variety of teachers, instead of, as in Jaederholm’s case and ours, an identical series of tests made by a single psychologist. The noteworthy value, however, lies in the correlation of ·508.
The values obtained for 12 cases of physical characters in siblings (Biometrika, Vol. ii. p. 387) have exactly this value for their mean. No stress can, of course, be laid on the absolute identity, considering the smallness of the present series, but much stress may be laid on the close approximation of the two results. The present data are, however, of further interest, slender as they are, when we compare the results to be obtained from them with those for a far longer series of pairs of siblings obtained by the method of “broad-categories.” This series is also formed from pairs of siblings who are children, but they belonged to a great variety of schools taken throughout Great Britain. Every variety of environment, and every variety of educational and home training, is therefore included. Accordingly, if the intellectual resemblance of siblings were the result, or largely the result, of differential treatment, we ought to anticipate a great increase of correlation in this material over that of the material drawn from the Californian orphanages. We have also the possibility of obtaining light on two further problems: (i) whether the method of “broad-categories” really does give results markedly inferior to the Binet-Simon method of direct quantitative measurement; (ii) what is the approximate value of the “mentace,” or unit of intelligence, in terms of a unit obtained from a Binet-Simon test.
The definitions of the “broad-categories” used by the Galton Laboratory in its intelligence investigations have already been published in this journal*, and a “mentace” has been defined as the 1/100 part of the range which limits the category “Intelligent”†. Now if we compare the two series, the one determined by “broad-categories” and the other by the Binet-Simon test, for the total frequencies up to the beginning and up to the end of the range “Intelligent,” we shall have a first approximation to the absolute value of a “mentace,” on the assumption that both series are measuring the same general intelligence character and that both approximate to normal distributions. I find that my mentace is equal to ·1604 of Dr Gordon’s intelligence quotient units, or, with the average age of 10·2 (which appears to have been that of her children), to about six days of mental growth of children at this age. Roughly, we might say that a mentace is equal to about a week’s mental growth at the age of ten years. In estimating the meaning of this statement we must remember that mental growth is very rapid at this age‡. As the American data pool children of both sexes, I have for purposes of comparison done the same. The following table represents my material for 5602 children in 2801 pairs, each pair being entered either way so as to produce a symmetrical table.

* Biometrika, Vol. vii. p. 93.
† Biometrika, Vol. v. p. 109.
‡ The reader will of course avoid the conclusion that the mentace is an intelligence unit varying with age. It is the time rate of growth of intelligence which varies with age, and we must state a particular age in evaluating the mentace in terms of growth of intelligence.

Contingency Table for General Intelligence in Siblings.

[Columns: Category of Intelligence of First Sibling (Quick Intelligent, Intelligent, Slow Intelligent, Slow, Slow Dull, Very Dull, Totals); rows: Category of Intelligence of Second Sibling.]