Forcing Regression Through a Given Point 


Using Any Familiar Computational Routine 


by 
Edward B. Hands 


TECHNICAL PAPER NO. 83-1
MARCH 1983




Approved for public release;
distribution unlimited.
U.S. ARMY, CORPS OF ENGINEERS 


COASTAL ENGINEERING 
RESEARCH CENTER 


Kingman Building
Fort Belvoir, Va. 22060


Reprint or republication of any of this material shall give appropriate
credit to the U.S. Army Coastal Engineering Research Center.


Limited free distribution within the United States 
of single copies of this publication has been made by 
this Center. Additional copies are available from: 


National Technical Information Service
ATTN: Operations Division
5285 Port Royal Road
Springfield, Virginia 22161


The findings in this report are not to be construed 
as an official Department of the Army position unless so 
designated by other authorized documents. 




UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE

4. TITLE (and Subtitle): FORCING REGRESSION THROUGH A GIVEN POINT USING ANY
   FAMILIAR COMPUTATIONAL ROUTINE
5. TYPE OF REPORT & PERIOD COVERED: Technical Paper
7. AUTHOR(s): Edward B. Hands
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Department of the Army, Coastal
   Engineering Research Center (CEREN-GE), Kingman Building, Fort Belvoir,
   VA 22060
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: D31677
11. CONTROLLING OFFICE NAME AND ADDRESS: Department of the Army, Coastal
    Engineering Research Center, Kingman Building, Fort Belvoir, VA 22060
12. REPORT DATE: March 1983
13. NUMBER OF PAGES: 20
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited.
19. KEY WORDS (Continue on reverse side if necessary and identify by block
    number): Coastal engineering; Data analysis; Prediction equations;
    Regression

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

This report describes a simple method for obtaining the prediction equation
that best fits all data points (in the least squares sense) while forcing an
exact fit at any known point. The decision to constrain the solution at a
point should be justified on theoretical grounds without appeal to data.
Examples are given. When required, any familiar regression program can be
forced to select the best line through a given point by simply adjusting and
extending the data entry. All necessary changes to the program results (test
statistics and estimates of regression parameters) can be accomplished
without modifying the computer program.




PREFACE 


This report draws attention to the frequent, but often neglected, need to 
force a regression line through a known point while obtaining the best possi- 
ble fit to all experimental data points. A simple method is described for 
solving this problem without modifying customary computational routines. This 
method can be applied to many problems, but is especially useful when cali- 
brating empirical prediction formulas to fit site-specific coastal conditions 
or when choosing from among several theoretical prediction models. The work 
was carried out under the U.S. Army Coastal Engineering Research Center's 
(CERC) Shore Response to Offshore Dredging work unit, Shore Protection and 
Restoration Program, Coastal Engineering Area of Civil Works Research and 
Development. 


The report was prepared by Edward B. Hands, Geologist, under the general 
supervision of Dr. C.H. Everts, Chief, Engineering Geology Branch, and Mr. 
N. Parker, Chief, Engineering Development Division. The author acknowledges 
the helpful suggestions received from C.B. Allen, C.H. Everts, R.J. Hallermeier, 
R.D. Hobson, and P. Vitale. 


Technical Director of CERC was Dr. Robert W. Whalin, P.E. 


Comments on this publication are invited. 


Approved for publication in accordance with Public Law 166, 79th Congress,
approved 31 July 1945, as supplemented by Public Law 172, 88th Congress,
approved 7 November 1963.

TED E. BISHOP
Colonel, Corps of Engineers
Commander and Director


CONTENTS


CONVERSION FACTORS, U.S. CUSTOMARY TO METRIC (SI)
SYMBOLS AND DEFINITIONS
I    INTRODUCTION TO REGRESSION
II   A PROBLEM WITH THE CUSTOMARY APPROACH
III  SOLUTION TO THE PROBLEM
     1. Regression Through the Origin
     2. Regression Through Any Arbitrary Point (a, b)
IV   SELECTING BETWEEN MODELS I AND II
V    EXAMPLES
LITERATURE CITED


TABLES

1  Adjustment of standard elements produced by programs using extended data
2  Field calibration data
3  Extended data set No. 1
4  Extended data set No. 2


FIGURES

1  Application of Model I produces an intercept, which may be a useful
   estimate of a component of longshore flow which is independent of wave
   conditions and presumably pervades the entire data set
2  Application of Model I identified a threshold value below which waves
   cause no damage
3  Application of Model II forces a zero-intercept solution
4  Model II estimates an increase in Y per unit increase in X that is nearly
   twice that predicted using Model I
5  Real test data for example problem 1
6  Real test data for example problem 2 and fitted equations


CONVERSION FACTORS, U.S. CUSTOMARY TO METRIC (SI) UNITS OF MEASUREMENT 


U.S. customary units of measurement used in this report can be converted to 
metric (SI) units as follows: 


Multiply                 by                  To obtain

inches                   25.4                millimeters
                         2.54                centimeters
square inches            6.452               square centimeters
cubic inches             16.39               cubic centimeters
feet                     30.48               centimeters
                         0.3048              meters
square feet              0.0929              square meters
cubic feet               0.0283              cubic meters
yards                    0.9144              meters
square yards             0.836               square meters
cubic yards              0.7646              cubic meters
miles                    1.6093              kilometers
square miles             259.0               hectares
knots                    1.852               kilometers per hour
acres                    0.4047              hectares
foot-pounds              1.3558              newton meters
millibars                1.0197 x 10^-3      kilograms per square centimeter
ounces                   28.35               grams
pounds                   453.6               grams
                         0.4536              kilograms
ton, long                1.0160              metric tons
ton, short               0.9072              metric tons
degrees (angle)          0.01745             radians
Fahrenheit degrees       5/9                 Celsius degrees or Kelvins (1)

(1) To obtain Celsius (C) temperature readings from Fahrenheit (F) readings,
    use formula: C = (5/9)(F - 32).
    To obtain Kelvin (K) readings, use formula: K = (5/9)(F - 32) + 273.15.




SYMBOLS AND DEFINITIONS 


$F$         The F-value may be produced by a multiple regression program and
            is analogous to the t-value in simple regression (one independent
            variable). The F-value indicates the "significance" of $r^2$ and
            is useful in selecting the most important independent variables.
            In Model I,

            $$F = \frac{\sum(\hat{y} - \bar{y})^2 \,(n - p - 1)}{\sum(y - \hat{y})^2 \; p} = \frac{r^2 \,(n - p - 1)}{(1 - r^2)\; p}$$

$H_b$       height of breaking waves

$n$         size of the sample

$p$         total number of independent variables. Caution, several observed
            carriers may end up combined into a single independent variable;
            e.g., $X = (gH_b)^{1/2} \sin 2\alpha_b$ has two distinct carriers
            ($H_b$ and $\alpha_b$) but is one independent variable (see
            example problem 1). The value of p will be one less than the
            number of constants to be estimated in Model I, and is equal to
            the number of constants in Model II.

$r$         sample correlation coefficient. The r-value produced by regression
            partially measures the closeness of fit between the linear
            predictor and data. Its square is called the coefficient of
            determination.

            $$r^2 = \frac{\sum(\hat{y} - \bar{y})^2}{\sum(y - \bar{y})^2} = \frac{\left[\sum(x - \bar{x})(y - \bar{y})\right]^2}{\sum(y - \bar{y})^2 \sum(x - \bar{x})^2} \qquad \text{(Model I)}$$

            $$r^2 = \frac{\left(\sum xy\right)^2}{\left(\sum y^2\right)\left(\sum x^2\right)} \qquad \text{(Model II)}$$

$SS_x$      sum of squares of x may be produced by the regression program and
            is useful for computing other values, e.g., $S_\beta$.

            $$SS_x = \sum(x - \bar{x})^2$$

$S_\beta$   standard error of the estimated slope,

            $$S_\beta = \frac{S_{y \cdot x}}{\sqrt{SS_x}}$$

            The larger $S_\beta$, the less reliable is the estimate of slope.

$S^2_{y \cdot x}$  unbiased estimator of the variance of the random component
            $\varepsilon$, e.g.,

            $$S^2_{y \cdot x} = \frac{\sum(y - \hat{y})^2}{n - p - 1} \qquad \text{in Model I}$$

            The number of independent variables, p, is 1 in simple regression
            with Model I. The mean square deviation from regression
            corresponds to the simple variance used to measure the spread of
            values in a single data set. It is also sometimes called the
            standard error of the estimate.

$S_{\hat{y}}$  The value produced by regression to indicate uncertainty of the
            estimated y; the value $S_{\hat{y}}$ depends on the variances of
            all the estimated coefficients.

$t$         The t-value produced in simple regression to test whether the
            estimated regression coefficient is "significantly" different
            from zero.

$v$         longshore current velocity

$X$         independent variable in regression

$x$         observed values of X. A string of n values in simple regression;
            an n by p matrix in multiple regression

$Y$         dependent variable to be estimated

$y$         n observed values of Y

$\hat{y}$   estimated value of Y for given values of X

$\alpha$    Y-intercept in a regression model

$\alpha_b$  angle between the crest of the breaking wave and the shoreline

$\hat{\beta}$  estimated regression coefficients in multiple regression or the
            slope of the line in simple regression

            $$\hat{\beta} = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum(x - \bar{x})^2} \qquad \text{(Model I)}$$

            $$\hat{\beta} = \frac{\sum xy}{\sum x^2} \qquad \text{(Model II)}$$

$\varepsilon$  zero-mean random component of Y assumed by both regression
            models




FORCING REGRESSION THROUGH A GIVEN POINT USING ANY 
FAMILIAR COMPUTATIONAL ROUTINE 


by 
Edward B. Hands 


I. INTRODUCTION TO REGRESSION 


The engineer frequently needs to estimate some response or dependent variable
Y (e.g., sand transport rate, change in shoreline position, or structural
damage), when given the magnitude of other factors, or independent variables X
(e.g., longshore wave energy flux, storm frequency, elevation of storm surges,
etc.). A common approach is to assume a linear model,

$$Y = \alpha + \beta X + \varepsilon \qquad \text{(Model I)}$$

then adopt the principle of least squares, and use sample data to estimate the
unknown parameters, $\alpha$ and $\beta$. Both $\beta$ and X can be considered
as strings of numbers in the case of multiple regression with several
independent variables; $\varepsilon$ indicates that the response is not being
thought of as an exact linear function of X. The $\varepsilon$ represents
random and unpredictable elements in Y; therefore, $\varepsilon$ does not
appear in the prediction equation, $\hat{y} = \hat{\alpha} + \hat{\beta}x$,
where $\hat{\alpha}$ and $\hat{\beta}$ are estimates of the corresponding
components in the conceptual Model I. The assumption that $\varepsilon$ has an
expected value of zero indicates that the "average" response is considered
linear. If $\varepsilon$ varies widely, Model I, though conceptually correct,
may have only limited predictive value. In such a case the estimated mean
value of Y would frequently be thrown off by noise in the data. If
$\varepsilon$ varies only slightly, good predictions will be possible provided
good estimates of $\alpha$ and $\beta$ are available. Adopting the principle
of least squares means one is willing to define the best estimates of $\alpha$
and $\beta$ as those that minimize the sum of the squares of the deviations
between the observed and predicted values (i.e., $y$ and $\hat{y}$).
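
The least squares estimates for simple regression under Model I can be
sketched in a few lines. This is only an illustration of the Model I formulas
listed under Symbols and Definitions; the function name `fit_model_1` and the
sample numbers are assumptions introduced here and do not come from the
report.

```python
def fit_model_1(x, y):
    """Return (alpha_hat, beta_hat) minimizing sum((y - alpha - beta*x)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x about its mean, and the centered cross product.
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / ss_x
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical, noise-free points lying on y = 1 + 2x (not data from the report).
alpha_hat, beta_hat = fit_model_1([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(alpha_hat, beta_hat)
```

With noise-free input the routine recovers the intercept and slope of the
generating line; with real data the two numbers are only estimates.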


Customarily, no constraints are placed on the contenders for the best fit
line. Of all possible lines in the XY plane, the prediction equation is
chosen because it has the least sum of squares of deviations in y from the
data points. The y-intercept, $\hat{\alpha}$, is the point where the best fit
line intersects the Y-axis. The intercept may be of special interest, e.g., in
the regression of current speed against longshore wave energy flux measured in
a field test (Fig. 1). An intercept substantially above zero would suggest
that during the test a component of the longshore current was driven by
mechanisms other than waves (e.g., tides or winds). In this case, the nonzero
intercept would not only be meaningful, but would also provide a good estimate
of the velocity of any steady, nonwave-generated coastal current during the
test.


An additional example of unconstrained regression would be where greater
and greater structural damage occurs as the wave forces exceed an undetermined
threshold value. Again Model I applies and produces the correct regression
coefficient ($\hat{\beta}$). In the process it produces a meaningless response
intercept well below zero (Fig. 2). In contrast with the previous example, the
interest here is strictly in the prediction of future damage for given wave
forces, not in the value of the intercept itself. The resulting linear
relationship applies only to values of the independent variable above the
threshold of wave effect.


Figure 1. Application of Model I produces an intercept ($\hat{\alpha}$),
          which may be a useful estimate of a component of longshore flow
          which is independent of wave conditions and presumably pervades
          the entire data set. (Axes: Flow Rate versus Wave Energy Flux.)


Figure 2. Application of Model I identified a threshold value below which
          waves cause no damage. A negative intercept is produced, but is of
          no interest in this particular problem. (Horizontal axis: Wave
          Forces; plot annotation: Negative Intercept.)


Although the negative intercept ($\hat{\alpha}$) is in itself meaningless,
Model I is correct because there is no basis for constraining $\alpha$.


II. A PROBLEM WITH THE CUSTOMARY APPROACH 


There are many cases where the logic of the application dictates the
response at a particular value of X. For example, if the response is some
change that is regressed against time, then the response must be 0 when X = 0
(Fig. 3). If there is no elapsed time, there can be no change. If the linear
assumption is valid, the appropriate conceptual model is

$$Y = \beta X + \varepsilon \qquad \text{(Model II)}$$

and the customary predictive equation (based on Model I) is inappropriate and
may give poor estimates of $\beta$ (see Fig. 4). Yet the vast majority of
regression programs (e.g., SPSS, IMSL, IBM's 5110 package, and TI-59) do not
allow specification of a zero intercept or any constraint through a known
point. Statistical texts usually do not cover this topic either. However,
formulas for the zero-intercept case are given by Brownlee (1965) and Krumbein
and Graybill (1965).
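
The zero-intercept estimate follows from the same least squares principle
used for Model I. A short derivation is sketched below in the notation of the
symbol list; the name $S(\beta)$ for the sum of squared deviations is
introduced here only for convenience and is not the report's notation.

```latex
\begin{aligned}
S(\beta) &= \sum_{i=1}^{n} \left(y_i - \beta x_i\right)^2 \\
\frac{dS}{d\beta} &= -2 \sum_{i=1}^{n} x_i \left(y_i - \beta x_i\right) = 0 \\
\hat{\beta} &= \frac{\sum x_i y_i}{\sum x_i^2}
\end{aligned}
```

The result agrees with the Model II slope given under Symbols and
Definitions; no sample means appear because the line is not free to shift
vertically.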


Figure 3. Application of Model II forces a zero-intercept solution. 


(Plot labels: Model II, $\hat{\beta}$ = 0.63; Model I, $\hat{\beta}$ = 0.34.)


Figure 4. Model II estimates an increase in Y per unit increase in X
          that is nearly twice that predicted using Model I. The physical
          relationship between X and Y dictates which model should be
          adopted. If Model II is appropriate the solution can be obtained
          using a simple artifice described in this report to modify results
          of standard computer programs intended for Model I.


The value of Y may be known for a single value of X (not necessarily 0).
The best prediction should then be sought from among the limited subset of
lines through this point. All these lines will have a larger sum of squares
($\sum[y - \hat{y}]^2$) than the line that would have been selected by
Model I. A simple procedure is described herein for picking from among these
restricted candidates the one with the smallest $\sum[y - \hat{y}]^2$. Thus,
regressing through the origin is but one specific case that can be solved by
a general model forcing regression at an arbitrary point.


III. SOLUTION TO THE PROBLEM 


This report describes a method for getting the best fit to all data points 
(in the sense of least squares) while forcing an exact fit at any known point. 
A simple procedure for forcing regression through the origin was described by 
Hawkins (1980), who indicated the procedure was not well known. The author of 
this report knows of no references to the general case of an exact fit to an 
arbitrary point. However, if a fit can be constrained through the origin, then 
a simple transform of variables can force the line through any given point. 
The details of the through-the-origin solution will be explained first. 


1. Regression Through the Origin. 


For each set of measured dependent and independent variables observed
$(y_i, x_i)$, also enter, or program, a mirror-image set $(-y_i, -x_i)$. Thus,
the computer is given an extended data set consisting of 2n data points, only
n of which were observed. By definition of this extended data set, the
dependent and all the independent variables each individually sum to zero,
forcing a zero intercept:

$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \qquad \text{by the principle of least squares}$$

$$\hat{\alpha} = 0 \qquad \text{because } \sum x = 0 \text{ and } \sum y = 0 \text{, and thus } \bar{x} = \bar{y} = 0 \text{ on the extended data set}$$

Thus a zero-intercept solution is obtained. Is it still the least squares
solution for the observed data set? The principle of least squares by
definition minimizes the sum of the squares of the deviations of the observed
from the predicted values. Because each squared deviation from the observed
data set generates an identical squared deviation in the mirror-image set, the
sum over the extended data set is, for any line through the origin, exactly
twice the sum over the observed data set. The extended sum is therefore
minimized only if the sum is also minimized over both the observed and the
mirror-image sets. Thus, the regression coefficient produced in this manner is
not only the least squares solution for the artificially extended data set,
but for the observed data set as well. By this artifice the proper estimate is
obtained for the regression coefficient ($\hat{\beta}$) with the prediction
forced through the origin.
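
The mirror-image artifice can be checked numerically. The sketch below feeds
an extended data set to an ordinary intercept-fitting routine and compares the
result with the direct zero-intercept formula. The helper names
(`fit_model_1`, `mirror_extend`) and the four data points are hypothetical;
they merely stand in for whatever customary regression program and
measurements are at hand.

```python
def fit_model_1(x, y):
    """Customary simple regression with a free intercept (Model I)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / ss_x
    return y_bar - beta_hat * x_bar, beta_hat


def mirror_extend(x, y):
    """Append the mirror-image point (-x_i, -y_i) for every observed point."""
    return list(x) + [-xi for xi in x], list(y) + [-yi for yi in y]


# Hypothetical observations (not from the report).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]

x_ext, y_ext = mirror_extend(x, y)
alpha_ext, beta_ext = fit_model_1(x_ext, y_ext)

# Direct Model II slope: sum(xy) / sum(x^2) on the observed points only.
beta_direct = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

print(alpha_ext, beta_ext, beta_direct)
```

The intercept returned for the extended data is zero to within rounding, and
the slope matches the direct formula, which is the behavior the argument
above predicts.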


2. Regression Through Any Arbitrary Point (a, b). 


If the predicted response (Y) must be b when the independent variables
(X) are a, then regress an extended data set v on u, where u = x - a
and v = y - b. If (a, b) = (0, 0), then this collapses to the exact
situation described above. If (a, b) ≠ (0, 0), the direct results,
$\hat{v} = \hat{\beta}u$, should be unraveled to produce the y prediction:

$$\hat{y} - b = \hat{\beta}(x - a)$$

$$\hat{y} = (b - a\hat{\beta}) + \hat{\beta}x$$

NOTE: The proper estimate of the regression coefficient ($\hat{\beta}$) now
forces the prediction through the point (a, b) as desired. By using this
procedure the correct regression coefficient is obtained by using any familiar
computational routine. The second most frequently reported output from
regression programs, the correlation coefficient (r), is also the correct,
unbiased estimator for Model II.
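
The shift-mirror-unshift procedure for an arbitrary point can be sketched as
follows. The names `fit_model_1` and `fit_through_point`, the data, and the
constraint point (1, 1) are all hypothetical choices made for this
illustration; (a, b) is taken as (x, y) coordinates of the known point,
consistent with the equations $\hat{y} - b = \hat{\beta}(x - a)$.

```python
def fit_model_1(x, y):
    """Customary simple regression with a free intercept (Model I)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / ss_x
    return y_bar - beta_hat * x_bar, beta_hat


def fit_through_point(x, y, a, b):
    """Least squares line forced through (a, b), via the extended data set."""
    # Shift the known point to the origin.
    u = [xi - a for xi in x]
    v = [yi - b for yi in y]
    # Add mirror images and run the unmodified Model I routine.
    u_ext = u + [-ui for ui in u]
    v_ext = v + [-vi for vi in v]
    _, beta_hat = fit_model_1(u_ext, v_ext)
    # Unravel: y_hat = (b - a*beta_hat) + beta_hat*x.
    return b - a * beta_hat, beta_hat


# Hypothetical observations and constraint point (not from the report).
x = [2.0, 3.0, 4.0, 5.0]
y = [3.2, 4.9, 7.1, 8.8]
a, b = 1.0, 1.0

intercept, slope = fit_through_point(x, y, a, b)
print(intercept, slope, intercept + slope * a)
```

The last printed value is the prediction at x = a, which equals b by
construction regardless of the data.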


If additional information is provided by the regression program, then
corrections may be necessary before adopting them for the real data set. The
estimate of the residual variance will be correct for simple regression (one
independent variable) and can be easily adjusted for multiple regression (see
Table 1). Any sums of squares, cross products, and F-values produced by the
program will be exactly twice the correct values. The standard error of the
estimated slope will be too small by a factor of $\sqrt{2}$. Therefore, the
t-value, for testing the zero slope hypothesis, will be too large by the same
factor.


Table 1 indicates the corrections for most of the elements produced by
various regression programs. However, employing the described extended data
procedures does not require consideration of any part of the output beyond that
used in the standard unconstrained approach.
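
The corrections for simple regression (one independent variable) can be
verified numerically. In the sketch below, `model_1_report` imitates the
printout of a customary routine, `correct_for_model_2` applies the factors
stated above (sum of squares halved, standard error multiplied by
$\sqrt{2}$, t-value divided by $\sqrt{2}$), and the corrected values are
compared with direct Model II calculations. Function names and data are
assumptions made for this illustration, and the sketch does not cover
multiple regression.

```python
import math


def model_1_report(x, y):
    """Statistics a customary simple-regression (Model I) routine might print."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = s_xy / ss_x
    alpha = y_bar - beta * x_bar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)                  # residual variance, n - p - 1 with p = 1
    s_beta = math.sqrt(s2 / ss_x)       # standard error of the slope
    return {"beta": beta, "ss_x": ss_x, "s2": s2,
            "s_beta": s_beta, "t": beta / s_beta}


def correct_for_model_2(rep):
    """Adjust extended-data output to the constrained model (p = 1 only)."""
    root2 = math.sqrt(2.0)
    return {"beta": rep["beta"],            # unchanged
            "ss_x": rep["ss_x"] / 2.0,      # program value is twice too large
            "s2": rep["s2"],                # already correct when p = 1
            "s_beta": rep["s_beta"] * root2,
            "t": rep["t"] / root2}


# Hypothetical observations (not from the report).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
x_ext = x + [-xi for xi in x]
y_ext = y + [-yi for yi in y]

corrected = correct_for_model_2(model_1_report(x_ext, y_ext))

# Direct Model II values computed from the observed points only.
n = len(x)
sxx = sum(xi * xi for xi in x)
beta_d = sum(xi * yi for xi, yi in zip(x, y)) / sxx
s2_d = sum((yi - beta_d * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
s_beta_d = math.sqrt(s2_d / sxx)
print(corrected, beta_d, s2_d, s_beta_d)
```

Agreement between the corrected printout and the direct values is what makes
it unnecessary to modify the regression program itself.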


IV. SELECTING BETWEEN MODELS I AND II 


If either the true or mean value (whichever interpretation fits the
situation) of the dependent variable (Y) is unknown for all values of the
independent variable in the range of concern, then the customary model (I) may
be appropriate. However, if the postulated physical relationship between X and
Y dictates constraint through any point (a, b) and the relationship is linear
from the maximum observed x to x = a, then Model II should be used. To proceed
with the customary evaluation of Model I would be equivalent to ignoring
what is already known about the relationship between X and Y and, instead,
relying totally on the limited information available in the sample data. The
objective should be to obtain the best interpretation of the data, which does
not override any more firmly established understanding of the situation.


Assuming Model II applies, it may still be useful to evaluate Model I to
test in the conventional way (Draper and Smith, 1966) the significance of the
estimated nonzero intercept. If this test fails to provide enough evidence to
reject the strawman hypothesis ($H_0$: $\alpha = 0$), then this failure may be
cited as additional evidence, strictly from the data, substantiating the
choice of Model II to estimate $\beta$. The results of this formal test of
hypothesis should not, however, be relied on as the criterion for selecting
Model II. It should serve only as a source of auxiliary information clarifying
the extent to which the sample data will support the model choice. The choice
should be made on the basis of functional insight and understanding of the
relationship between X and Y.


Table 1. Adjustment of standard elements produced by programs using extended
         data.

Parameters and               Estimates obtained by         Correct estimates for
test statistics              unconstrained regression      constrained model
                             routine using extended data

Regression coefficients      $\hat{\beta}_e$               $\hat{\beta} = \hat{\beta}_e$
  ($\beta$)
Correlation coefficient      $r_e$                         $r = r_e$
  ($\rho$)
Standard error of            $S_{\beta e}$                 $S_\beta = S_{\beta e}\,[(2n - p - 1)/(n - p)]^{1/2}$ (1)
  $\hat{\beta}$
Sums of squares              $SS_e$                        $SS = SS_e / 2$
t-test statistic             $t_e$                         $t = t_e\,[(n - p)/(2n - p - 1)]^{1/2}$ (2)
F-value                      $F_e$                         $F = F_e\,(n - p)/(2n - p - 1)$ (3)
Estimated residual           $S^2_e$                       $S^2_{y \cdot x} = S^2_e\,(2n - p - 1)/[2(n - p)]$ (4)
  variance

(1) Exactly $\sqrt{2}\,S_{\beta e}$ if p = 1, and approximately so if n is
    moderately large.
(2) Exactly $t_e/\sqrt{2}$ if p = 1, and approximately so if n is moderately
    large.
(3) Exactly $F_e/2$ if p = 1, and approximately so if n is moderately large.
(4) Exactly $S^2_e$ if p = 1, and approximately so if n is moderately large.

Hatted values are estimates of the true unhatted parameters (e.g., $\beta$ is
an unknown parameter in a conceptual model and $\hat{\beta}$ is its estimate
based on evaluation of some measurements); p is the number of independent
variables that were measured. Additional information on interpreting the
standard elements of regression is available in Draper and Smith (1966).


Comparing the correlation coefficients, or r-values, produced using the
real data and the extended data is likewise not a valid method for choosing
between Models I and II. The value of $r^2$ using Model I (observed data only)
is often referred to as the reduction in variance of the estimator made
possible by using the apparent association between X and Y. A value of
$r^2 = 0$ indicates that knowledge of the X-values makes no improvement in the
prediction of Y and using the mean value of the y's as the estimator would not
increase the sum of the squares of the deviations. At the other extreme, if
$r^2 = 1$, all sample points lie on a sloping straight line, implying a strong
predictive value. Similarly with Model II, higher $r^2$ values indicate
improved fit of the data; but comparing $r^2$ values between Models I and II
does not reveal which is correct or even preferable. There is a slight
conceptual and a substantial computational difference between the $r^2$ values
for the two models. The two values should not be compared; both indicate the
relative fit of various data to their own particular model. Either value can
be used to measure "goodness of fit" in particular applications, or even to
indicate the usefulness of several versions of the particular model chosen.
For example, comparison of r-values would indicate whether taking logs of the
measurements, or raising them to a given power prior to regression, improved
the fit. But comparison of the r-values would not be a valid basis for
choosing between Models I and II.
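
The computational difference between the two $r^2$ definitions can be seen by
evaluating both on one small data set. The sketch below uses the Model I and
Model II formulas from Symbols and Definitions; the function names and the
four data points are hypothetical. It also confirms that the r-value a
customary routine reports for the mirror-image extended data equals the
Model II value.

```python
def r2_model_1(x, y):
    """Coefficient of determination about the means (Model I formula)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy ** 2 / (ss_x * ss_y)


def r2_model_2(x, y):
    """Coefficient of determination about the origin (Model II formula)."""
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    return s_xy ** 2 / (sum(xi * xi for xi in x) * sum(yi * yi for yi in y))


# Hypothetical observations (not from the report).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]

r2_one = r2_model_1(x, y)
r2_two = r2_model_2(x, y)

# A Model I routine applied to the extended data reproduces the Model II value.
x_ext = x + [-xi for xi in x]
y_ext = y + [-yi for yi in y]
r2_ext = r2_model_1(x_ext, y_ext)

print(r2_one, r2_two, r2_ext)
```

The two values differ for the same data because one measures scatter about
the sample means and the other about the origin, which is why neither can
arbitrate between the models.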


V. EXAMPLES 


The following problems illustrate a frequent need to constrain the regression
line in coastal engineering applications. The problems also illustrate
the usefulness of $r^2$ to rank different predictors in terms of how well they
fit data. Before initially applying the described method to an actual problem,
it may be helpful to reanalyze one of the small data sets used in these examples
and compare the results with those published in this report.


* * * * * * * * * * * * * EXAMPLE PROBLEM 1 * * * * * * * * * * * * *


Consider the requirement to simulate a long-term history of wave-induced
longshore currents for a particular coastal site. Assume hindcasted wave data
are available, but that current measurements were not made over the period of
interest. According to the Shore Protection Manual (U.S. Army, Corps of
Engineers, Coastal Engineering Research Center, 1977), the longshore current
(v) can be calculated as a function of the beach slope (m), the gravitational
acceleration (g), and the angle and height of breaking waves ($\alpha_b$,
$H_b$, respectively).

$$v = 20.7\, m\, (gH_b)^{1/2} \sin 2\alpha_b \qquad (1)$$


The coefficient of proportionality (20.7) is based on typical mixing and
frictional factors for the surf zone. Empirical formulas, like equation (1),
can be adjusted by regression analysis of test data from the specific site of
intended application. This will customize the formula to fit site-sensitive
conditions. The longshore velocity also varies laterally within the surf zone.
The problem of estimating the spatial structure of flow across the surf zone
may be avoided by obtaining current measurements at the exact point where the
long-term flow must be reconstructed, then regressing the test measurements
against simultaneously determined breaker conditions. Steps in such an
analysis are given below. Only a few data points are used in the example to
encourage the reader to go through the computations and check the results. The
data are taken from a frequently referenced field study done at Nags Head,
North Carolina (Galvin and Savage, 1966).


GIVEN: Longshore current velocities (v), breaker heights ($H_b$), breaker
angles ($\alpha_b$), and the beach slope (m) determined onsite during a short
field evaluation (see Table 2).


Table 2. Field calibration data (from
         Galvin and Savage, 1966).

Obsn.    H_b      m        v
         (ft)              (ft/s)
1        2        0.03     2.42
2        3.2      0.026    4.33
3        1.8      0.029    1.96
4        8        0.026    1.26


REQUIRED: An equation that will predict wave-induced longshore currents for 
the test site. 


ANALYSIS: Because the linearity expressed in equation (1) has a firm
theoretical basis in the concept of radiation stress (Longuet-Higgins, 1970),
and because according to this concept, v = 0 whenever $H_b = 0$ or
$\alpha_b = 0$, the prediction line must pass through the origin (0, 0). So
Model II must be used.

Let

$$Y = v$$

and

$$X = m\,(gH_b)^{1/2} \sin 2\alpha_b$$


Regress Y on X to determine the best estimate of the coefficient of 
proportionality between X and Y. 


CORRECT RESULTS:

Regression coefficient            $\hat{\beta}$ = 17
Correlation coefficient           r = 0.91
Standard error of $\hat{\beta}$   $S_\beta$ = 4.6
Test statistic for $\beta$        t = 3.7
Estimated residual variance       $S^2_{y \cdot x}$ = 1.8


CONCLUSION: The version of the Longuet-Higgins type equation that best fits
this problem site (based on available current data) is:

$$v = 17\, m\, (gH_b)^{1/2} \sin 2\alpha_b$$


NOTE: Fitting the equation to the data in this example produces results closer
to those obtained with larger data sets (eq. 1) if the line is forced through
the origin rather than being fit strictly to the data without this constraint
(see Fig. 5).


Figure 5. Real test data for example problem 1. Compare the correct fit
          through the origin with the customary fit. (Axes: Y = measured
          velocity versus $X = m\,(gH_b)^{1/2} \sin 2\alpha_b$.)


* * * * * * * * * * * * * EXAMPLE PROBLEM 2 * * * * * * * * * * * * *


At least 10 equations relating the velocity of longshore currents to wave
characteristics have appeared in the literature. Presumably more will appear
as knowledge increases or theory is adapted to specific wave or bathymetric
conditions (i.e., specialized for breaker type or bar dimensions). A recent
article (Komar, 1979) questions the value of including a measure of beach slope
in the general prediction equation and claims better results for

$$v = 0.585\,(gH_b)^{1/2} \sin 2\alpha_b$$

GIVEN: The same situation and data as in example problem 1.

REQUIRED: Determine the best fit version of the type

$$v = \beta\,(gH_b)^{1/2} \sin 2\alpha_b$$

and compare the results with those obtained in example problem 1 to see if
the beach slope is indeed of any value at this particular site.


ANALYSIS: For the same reasons stated in example problem 1, regression
should require the prediction line to pass through the point (0, 0).

Let

$$Y = v$$

$$X = (gH_b)^{1/2} \sin 2\alpha_b$$

and regress Y on X using Model II with its extended data set (Fig. 6).


Figure 6. Real test data for example problem 2 and fitted equations. Compare
          the correct fit through the origin with the customary fit. (Axes:
          Y = measured velocity versus $X = (gH_b)^{1/2} \sin 2\alpha_b$.)


CORRECT RESULTS:

Regression coefficient            $\hat{\beta}$ = 0.46
Correlation coefficient           r = 0.90
Standard error of $\hat{\beta}$   $S_\beta$ = 0.13
Test statistic for $\beta$        t = 3.6
Estimated residual variance       $S^2_{y \cdot x}$ = 1.9


CONCLUSION: The best predictor of the Komar type is:

$$v = 0.46\,(gH_b)^{1/2} \sin 2\alpha_b$$


It would be surprising to find a clear indication of whether beach slope should 
be included in the predictor for longshore currents by evaluating such a 
limited data set as chosen here to encourage reader computation. Indeed a 
comparison of Tables 3 and 4 reveals no significant differences between the 
correlation coefficients or any other test statistics. However, significant 
differences would be expected if a large reliable data set covering a wider 
range of conditions were compared by the methods illustrated in this report. 


Table 3. Extended data set No. 1.

Obsn.    x          y
         (ft/s)     (ft/s)
1         0.152      2.42
         -0.152     -2.42
2         0.162      4.33
         -0.162     -4.33
3         0.0827     1.96
         -0.0827    -1.96
4         0.170      1.27
         -0.170     -1.27


Table 4. Extended data set No. 2.

Obsn.    x          y
         (ft/s)     (ft/s)
1         5.05       2.42
         -5.05      -2.42
2         6.25       4.33
         -6.25      -4.33
3         2.85       1.96
         -2.85      -1.96
4         6.53       1.27
         -6.53      -1.27
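
Readers who wish to check the example results can do so with the Model II
formulas applied to the observed (positive) rows of Tables 3 and 4. The sketch
below is one way to do this; the function name `fit_model_2` and the result
keys are introduced here for illustration, and because the tabulated values
are rounded, the computed statistics should be expected to match the
published ones only to about two significant figures.

```python
import math


def fit_model_2(x, y):
    """Zero-intercept least squares fit and its usual test statistics."""
    n = len(x)
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    s_yy = sum(yi * yi for yi in y)
    beta = s_xy / s_xx
    r = s_xy / math.sqrt(s_xx * s_yy)
    # Residual variance with n - p degrees of freedom (p = 1 in Model II).
    s2 = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    s_beta = math.sqrt(s2 / s_xx)
    return {"beta": beta, "r": r, "s_beta": s_beta,
            "t": beta / s_beta, "s2": s2}


# Observed rows of Tables 3 and 4 (the mirror-image rows are not needed here).
v = [2.42, 4.33, 1.96, 1.27]
x_table_3 = [0.152, 0.162, 0.0827, 0.170]   # m (g H_b)^(1/2) sin 2 alpha_b
x_table_4 = [5.05, 6.25, 2.85, 6.53]        # (g H_b)^(1/2) sin 2 alpha_b

ex1 = fit_model_2(x_table_3, v)
ex2 = fit_model_2(x_table_4, v)

for name, res in (("Example 1", ex1), ("Example 2", ex2)):
    print(name, {k: round(val, 3) for k, val in res.items()})
```

The slopes come out near 17 and 0.46, in agreement with the conclusions of
the two example problems.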


LITERATURE CITED 


BROWNLEE, K.A., Statistical Theory and Methodology tn Setenee and Engineering, 
2d ed., John Wiley & Sons, Inc., New York, 1965. 


DRAPER, N.R., and SMITH, H., Applted Regression Analysts, John Wiley & Sons, 
Inc., New York, 1966. 


GALVIN, C.J., Jr., and SAVAGE, R.P., "Longshore Currents at Nags Head, North 
Carolina," Bulletin 11, U.S. Army, Corps of Engineers, Coastal Engineering 
Research Center, Washington, D.C., 1966, pp. 11-29. 


HAWKINS, D.M., "A Note on Fitting a Regression Without an Intercept Term,"
The American Statistician, Vol. 34, Nov. 1980, p. 233.


KOMAR, P.D., "Beach-Slope Dependence of Longshore Currents," Journal of
Waterways, Port, Coastal and Ocean Divisions, Vol. 105, Nov. 1979.


KRUMBEIN, W.C., and GRAYBILL, F.A., An Introduction to Statistical Models in
Geology, McGraw-Hill Book Co., New York, 1965.


LONGUET-HIGGINS, M.S., "Longshore Currents Generated by Obliquely Incident
Seawaves," Parts I and II, Journal of Geophysical Research, Vol. 75, No. 33,
Nov. 1970.


U.S. ARMY, CORPS OF ENGINEERS, COASTAL ENGINEERING RESEARCH CENTER, Shore
Protection Manual, 3d ed., Vols. I, II, and III, Stock No. 008-022-00113-1,
U.S. Government Printing Office, Washington, D.C., 1977, 1,262 pp.

