LIBKARY RESEARCH REPORTS DIV NAVAL POSTGRADUATE MONTEREY, CALIFORNIA NPS55-82-005 NAVAL POSTGRADUATE SCHOOL Monterey, California STOCHASTIC MODELING: IDEAS AND TECHNIQUES by Donald P. Gaver January 1982 Approved for public release; distribution unlimited Prepared for: PK ^°f of Naval Research ngton, Virginia 22217 FEDDOCS D 208.14/2:NPS-55-82-005 NAVAL POSTGRADUATE SCHOOL MONTEREY, CALIFORNIA Rear Admiral J. J. Ekelund David A. Schrady Superintendent Acting Provost Work on this report was sponsored in part by the Office of Naval Research, Arlington, VA. Reproduction of all or part of this report is authorized. Unclassified SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered) REPORT DOCUMENTATION PAGE READ INSTRUCTIONS BEFORE COMPLETING FORM I. REPORT NUMBER NPS55-82-005 2. GOVT ACCESSION NO 3. .RECIPIENT'S CATALOG NUMBER 4. TITLE (and Subtitle) STOCHASTIC MODELING: IDEAS AND TECHNIQUES 5. TYPE OF REPORT 6 PERIOD COVERED Technical 6. PERFORMING ORG. REPORT NUMBER 7. AUTHORf*; Donald P. Gaver 8. CONTRACT OR GRANT NUMBERf*,! 9. PERFORMING ORGANIZATION NAME AND ADDRESS Naval Postgraduate School Monterey, CA 93940 University Libre de Bruxelles Brussels, Belgium 10. PROGRAM ELEMENT. PROJECT. TASK AREA 4 WORK UNIT NUMBERS 61153N; RR14-05-OE N000 1482WR20017 II. CONTROLLING OFFICE NAME AND ADDRESS Naval Postgraduate School Monterey, CA 93940 1? <EPORT DATE January 1982 13. NUMBER OF PAGES 82 U. MONITORING AGENCY NAME a ADORESSf// dl Iterant Irom Controlling Oltice) Chief of Naval Research Arlington, Virginia 22217 15. SECURITY CLASS, (ol thle report) Unclassified 15a. DECLASSIFICATION/ DOWN GRADING SCHEDULE 16. DISTRIBUTION STATEMENT (ol thia Report) Approved for public release; distribution unlimited 17. DISTRIBUTION STATEMENT (of the abstract entered In Block 20, It dltlerent Irom Report) 18. SUPPLEMENTARY NOTES 19. KEY WORDS (Continue on reverse aide It neceaaary and Identity by block number) applied probability probability modeling reliability availability waiting lines 20. ABSTRACT (Continue on reverae aide It neceaaary and Identity by block number) This report summarizes the contents of lectures given on prob- ability modeling and reports some new results on the availability of inspected systems of redundant systems in random environments, and on "sculptured distributions". dd ,; FORM AN 73 1473 EDITION OF 1 NOV 65 IS OBSOLETE S/N 0102-014' 6601 | Unclassified SECURITY CLASSIFICATION OF THIS PAGE (When Data Bntered) Stochastic Modeling: Ideas and Techniques Donald P. Gaver 1 . Introduction The primary purpose of this chapter is to summarize the contents of lectures on stochastic modeling presented at the Universite Libre de Bruxelles (ULB) in the period March-May, 1981. Much of the material selected for presentation was from the standard menu of probabilistic topics typical of a second course as given to engineers, operations researchers, statisticians, or computer scientists. An attempt was made to emphasize a modeling attitude rather than details of mathematical rigor, illustrating with problems and techniques that are not often prominent in such courses. For example, attention was given to problems of, and models for, redundant system reliability and availability, queueing with priorities, first-passage times and areas under path functions of stochastic processes, (total waiting times), and various other topics. Also included was a brief account of aspects of modern data analysis, with the implication that its usefulness is significant at the pre-modeling and model-assessment stage of an investigation. A secondary, but gratifying, purpose is to briefly report on cooperative work initiated with faculty and students at ULB. I wish to mention the enjoyable collaboration with Dr. Guy Latouche on development of efficient computational methods for repairman- like Markov models in random environments, and with Ph. Collard on the application of sculptured distributions in the simulation evaluation of certain scheduling algorithms. The interest and warm hospitality of Prof. Guy Louchard, head of the Dept.Of Computer Science at ULB, was also much appreciated. 2 . The Total Modeling Process: Brief Overview It is coming to be recognized that the topic of mathematical modeling (including stochastic modeling) exists in its own right as a subject suitable for a formal university course; see Bender [1978], The modeling step is part of a process of several stages or steps; these may be expressed as follows (Gaver and Thompson [1973] ) : (a) Identify the general problem area or s ituatio n; identify specific questions concerning that area. (b) Obtain and analyze subject-matter information and data relating to the problem area. Often an examination of such information and data will suggest suitably formulated questions, as in (a), (c). Construct a preliminary model, or models, representing the important features of the situation. Deduce some model implications. (d) Refer the result of (c) to subject-matter specialists and decision-makers for qualitative critique; revise the model accordingly. This likely means re-doing (a) - (d) . (e) Assess the empirical validity of the model to the degree possible. Check the sensitivity of model conclusions to changes in model assumptions (sub-model inputs) , and to data variations. Submit to judgement by subject-matter experts -- but anticipate differences of opinion! The modeling, and re-modeling, process may help to reconcile such differences. (f) Compute required answers to interesting questions. Assess the degree of uncertainty in these answers possibly resulting from model mis-specification, data bias or other deficiency, computational error, and sampling error in estimates of basic parameters or in simulation results used to supply model implications, (g) Communicate, and aid in implementing, the results of the model. (h) Monitor the situation for possible changes in the environment, and hence for the necessity to change the model. Of course the emphasis in these notes (and in the lectures, was) upon the actual modeling step, (c) . However, some attention was given to the display of data for pre-model examination (Tukey's exploratory data analysis), and to model parameter estimation techniques, particularly those robust methods that attempt to deal with questions of data deficiencies. 3 . Topics in Outline In this section we out-line the basic contents of the lectures. These were in general arranged so as to first present mathematical definitions and properties, and then to illustrate in terms of sample models for various situations. ( 1 ) Review of Probabilistic Concepts, Particularly Conditioning . In this lecture the following basic notions of probability were defined or reviewed: random experiment or trial, sample or event space, events and combinations of events, probability as a function with rules for combination, conditional probability, independence, and Bayes 1 Theorem, random variables and their moments or expectations, transforms (characteristic function, Laplace transform, and generating function) and their moment- generating, convolution, unicity, and continuity properties, plus properties of conditional expectations. In addition, certain classical univariate distributions were reviewed (Normal/Gaussian, log-Normal, exponential and gamma, etc.) By way of illustration, a simple problem of equipment (or possibly software) unrealiability was considered. Situation: Suppose a system is made up of components that individually fail after a time because of the action of faults ; the latter may be the result of component mis-design, or attributable to bad installation or adjustment ("human error"), or to a mistake in computer program coding. We wish to relate system failure rate to initial fault content. Model: N is a random variable (rv) representing the number of faults initially installed unwittingly in the system. Let {T., i = 1,2, ..., N} be the sequence of rv describing the failure time of each fault, measured from time at which the system begins use; T. may actually be the time at which the service of the particular component is first requested. Suppose the system fails at X N = min {T-, T 2 , ..., T„}. (1.1) Under very simple conditions, namely that all components have the same distribution, F(t), of failure time, and all failure times are independent, simple conditional probability arguments yield P(X N > t} = E N [(l-F(t)) N ] (1.2) where g is the generating function of the number of faults originally sown in the system, and F (t) = 1-F (t) is the survival time distribution, per fault. It is easy to see that P{X = 00} = p{n = 0}, possibly > , so the derived distribution of X is quite possibly dishonest. Note that while in general explicit expressions for expectations cannot be obtained (may not even exist), such summaries as the median, 90% point, etc., may if g (F(t)) can be explicitly inverted, e.g. for N ~ Poisson and F ~ Exponential. The simplistic assumption of the model may be relaxed, allowing for different T. distri- butions, dependence, and so on, and an additional random death time, D, applying to the total system can be introduced to in- duce eventual failure (of physical equipment), or biological death in finite time. There will be less analytical tractability, but simulation may be used to assess system behavior. Statistical estimation problems may be addressed as well; a suitable version of (1.2) will provide a likelihood function. Another example of the applicability of a simple conditioning argument is the following. Situation ; When an individual speaks on a telephone or telecommunication channel the conversation is an alternating sequence of talk-spurts and pauses. Similarly, a job being processed on a computer goes through an alternating sequence of CPU (compute) times and 10 (input-output) times. Model the total time of the conversation or job processing time, and particularly the joint distribution of busy and idle segments. Model 1 : Let {X., i = 1,2, ..., K} and {Y. , i = 1,2, ..., K) be the durations of talk spurts and pauses, respectively, and let K be the number of each. The simplest model assumes {X.} and {Y.} to be independently and identically distributed (IID) sequences of rv , and themselves to be conditionally in- dependent, given K, also a rv. The joint distribution of total talking (or processing) time, X, and total pause time (10 time), Y, is thus, by simple conditioning, ? k* k* P{X < x, Y < y} = I F v (x) • F v (y) K P{K=k} (1.3) k=l X Y The joint Laplace transform is (s. , s_, > 0) — s X — s Y e 2 j I [F X ( S;L )F (s 2 )T P(K=k) 1" "2" r r ~ , , ~ , ,ik k=l (1.4) = %^X (S 1^Y (S 2 )] ' where g v is the generating function of K. Put s n = s„ = s is. 12 to recover the transform of X+Y = L, the total conversation length. In case X. - Expon(A) and Y. Expon(y) and K Geom(a), independent and " S 1 X " S 2 Y e e - I k=l A+s V lP+s J (1-a) a k-1 Ay (1-a) (A+s ) (y+s ) - Aya (1.5) E[X] = [A (1-a)] 1 , E[Y] = [y(l-a)] 1 (1.6) Furthermore (a = 1-a) .' E[L] = E[X] + e[y] = A+y Aya (1.7) and Var[L] = (E[Lj) Aya (1.8) Notice also that the mechanism of randomization of a sum , or mixing (see Feller [1966])which has given (1.3) may be used to generate families of bivariate (multivariate) exponential distributions for other modeling purposes. Model 2 : A plausible alternative to the above model assumes X. and Y. are not independent, being possibly positively correlated -- a long talkspurt tending to result in a long pause (response by conversationalist). Most simply, Y. = $X . , 3 > 0. Then again transform in the X - Expon(A) case to get _<= Y -c v" A (1-a) - S 1 X " S 2 Y e e s x + 6s 2 + A (1-a) or if 3 = A/p which preserves the marginals of Model 1, A (1-a) s, + — s„ + A (1-a) 1 y 2 Aya s,y + s A + Aya (1.9) It is now immediate that the marginal distribution of X ~ Expon(Aa) , Y - Expon(ya) and that now the df of the Aya total time, L, is simply Expon [A+y -- a much simpler form than that occurring in Model 1 above, which involves a Bessel function. The variance of L in Model 2, being (e[l]) , is also larger than that for Model 1, see (1.8), suggesting that the former model has a longer tail, hence predicting a greater proportion of extremely long conversations. The above illustrates that the same situation can easily give rise to two — or more — different models, depending upon the manner in which stochastic assumptions are introduced. At best, the introduction should be guided by observed data; at least, sensitivity analyses using different assumptions can outline the range of specification uncertainty. (2) Models Involving Repeated Trials A great many situations may be initially modeled in terms of repeated independent trials, where this means that on each of a possibly countably infinite number of occasions a trial (or experiment, or observation) is performed, with outcome X. (possibly a vector random variable (rv) ) on the ith trial; {X., i = 1, 2, ...} are IID rv. Bernoulli trials are a prime example: flip a biased coin indefinitely; let I. be one if a Head (Success) results, and zero if a Tail (Failure) otherwise, and assume that probability of success on any trial is independent of all previous outcomes. Equally, X. may be the winnings on a bet at occasion i, with X. in dollars and either positive or negative. Or X. may even represent the increase or decrease in a common stock price on the New York Stock exchange, according to some observers. It is convenient, but somewhat more questionable, to adopt the repeated trial model for modeling real operational and physical phenomena, yet it is often done uncritically. For instance the lifetimes (times between failure) of computing equipment are often modeled by IID rv, repair times likewise, queueing system inter-arrivals and service times as well, inventory demand sizes, sizes of deposits of resources (petroleum) as well, ... the list is very long, and observational support for these assumptions is usually conspicuously lacking. The Attraction of the repeated trials model is mainly its mathematical tractability , which leads to elegant and appealing results. 10 Often a brief data analysis in terms of marginal distributions of observed X. seems to provide justification. The simple repeated trials model cannot well represent, say, systematic daily changes in job inter-arrival times, or numbers of jobs per hour, at a computer center, seasonal effects on such computer system demand, or the influence of other variables such as the introduction of a new class of computer users upon a measure of computer loading. Also, the model does not well describe the sequence of daily rainfalls in a region, nor many other environ- mental variables. Some examples follow in which repeated trial models seem initially plausible, but which doubtless can stand improvement. Situation ; A structure is to be designed to withstand (wind- generated, or seismic or other) shocks. Question: how long will it survive the environment stresses, given its initial strength Z? Model: Let the structure have strength Z (suitable units). If the ith year's maximum shock is X., assume {X.} to be IID with df F (x) . Then the time T until structure failure exceeds t x (t = 1, 2, ...) iff VX ± < Z, i = 1,2, ..., t, so P{T > t|z} = [F x (Z)] t , (2.1) the geometric distribution, while if Z itself is regarded as random P{T > t} = E z P{T > t|z) = E z ([F x (Z)] t } (2.2) ^ (E z {[F x (Z)]}) t ; 11 the unconditional distribution is a convex (probability) combin- ation of exponentials. It will resemble an exponential, but often has an extended right tail. It is obviously important that the condition on equipment stress, Z, be removed at the appropriate stage -- at the end. Removal of the condition before each "strength test" would be appropriate only if the structure were completely repaired or replaced with another having the unchanged distribution of Z before each trial. Note that assuming yearly maximum environmental events to be IID is intuitively plausible, but some statistical evidence exists for truly long-range correlation in weather data that may call even this assumption into question. The Bernoulli counting process is an important special case of repeated trials, on each of which either Success (probability p) or Failure (probability q = 1-p) occurs; {N(t), t = 1,2, ...} is the number of Successes in d,t]). Times (number of trials) t between successive Successes are IID and geometrically distrib- k-1 uted (P{t = k} = q p) . The number N(t) of Successes in fixed time is Binomial (p , t) , and is in turn approxi- mately Normal (tp, tpq) as t •+ °°. Of course Bernoulli trials describe the outcomes of many other repeated trial situations: for instance, the number of jobs submitted to a processing facility requiring more than x time units of processing may be modeled as a Bernoulli counting process with p = 1 - F (x) , F being the distribution of job processing time. Generalizations of Bernoulli trial situations may be (a) to variations of success probability with trial number ("time", t or "space"): P{X. = 1} = p.. Here E[N(t)] = I P- and 11 i=l 1 t _ ! t Var[N(t)] = I p.q. £ tp(l-p), where P = 7" I Pi' so variability i=l i=l is u nder-represented if trial-to-trial probabilities change, but 12 a Normal approximation to N(t) may still hold. A second qeneralization is to (b) independently randomize p in the Binomial distribution, say according to a Beta distribution, creating the Beta-Binomial dis- tribution. This has found useful in reliability modeling and in Bayesian inference. A third and important elaboration, (c) , is to develop a statistical regression model for success probability p based conveniently on the logist ic function: a+Bu. P- = — T5 — ; (2.3) *i a+3u. 1+e 1 here u. (possibly a vector) represents the influence of other factors upon success probability. Given observations of the form (I., u.), where I. = 1 indicates success on trial i, one 11 l can estimate a and 3 (vector) by maximum likelihood; see Cox [1969] • Generalizations to multiple-category situations are possible, and computational methods for parameter estimation and model assessment have been devised; see Pregibon [1901J. The distribution of the number of counts N(t) in t trials of a Bernoulli trials process can be computed by making use of a forward equation . Let P. (t) = P{N(.t) = j |N(0) = 0}. Then P.(t) = P. (t-1) • (1-p) + P._ 1 (t-1) -p;(0 < j It); (2.4) on the basis of conditioning on events that have happened up to t-1. One can generalize to non-stationary success probabilities easily: 13 P. (t) = P. (t-1)- (1-p ) + P. (t-l)p. . (2.5) J J T. J X XL Initial conditions may be P Q (0) = 1, P. (0) =0, j = 1,2, ... A further generalization allows success probability to depend upon the number of previous successes; then P j( t) = P j( t-l)U-p. <t ) ♦ P^U-Up^^ , (2.6) and the distribution P. (t) can easily be computed recursively given the success probabilities p = P{N(t) = j+l|N(t) = j} . (2.7) This is a preview of ideas of Markov chains, to be treated later. These expressions are introduced to suggest early that the answers to interesting and comparatively complex problems can be directly computed numerically (in this case iteratively, starting from the initial conditions) . Closed- form expressions such as the Binomial distribution are handy, and Normal approximations are even handier, but one need not modify the facts merely for the sake of convenience. 14 (3) Sums of Repeated Trial (IIP) Random Variables; "Large Deviations " Models for total demand for physical inventory or for facility (computer) time often naturally involve sums of varying components, modelled as rv. ; thus total demand from n sources, or over n time periods, is S = X, + X n + . . . + X . n 1 2 n (3.1) If X^ is the (dollar) profit in the i th year for some enterprise, then a financial measure of success is n S (r) = y X.r 1 n ■ -, i i=l (3.2) where r is a discount rate (0 < r < 1) Situation . A computer center experiences varying monthly demands, X. for the ith month. Here are answers to several simple questions involving sums of X.s. The expected yearly (n-period) demand is n E[S ] = I E[X. ], n- 1 , L - l i = l (3.3) the sum of the expected monthly demands, and also n Var[S ] = I Var[X.] n . , i i=l (3.4) provided the X.s are uncorrelated . Importantly, as n F ,(x) n P< S - E[S ] n L n /VarLS J n < x > % / 1 2 "2 Z = $ (x) , '2n (3.5) 15 i.e. S becomes approximately Normally distributed no matter what the distributions of X . , by the Central Limit Theorem, provided the X. components are all of about the same size (certainly if they all come independently from the same parent distribution with finite mean and variance). For smallish n or distinctly non-Normal components the approximation is improved by an Edaeworth expansion (Feller [1966]), wherein for the eaual component example F gI (x) = $(x) + n ry 3 3 (x 2 -l) |* + R (3.6) dx n where R = n 1 /n~ densities. Here y = E 6/n , and the components are assumed to have 3 (X - E[X]) y 3 and the term — ^ = y G is the conventional dimensionless skewness measure for a dis- tribution, being zero for symmetric distributions (Normal) , and being +2 for the Exponential. Additional terms involving kurtosis (4th moments) improve the approximation, but it is possible that Edgeworth numerical values can be "infeasible" : the approximation can actually decrease with x in certain ranges. Nevertheless the Edgeworth series has been usefully applied, even to unequal component situations, for estimating the loss of capacity of an electric utility; see Levy & Kahn [1981]. A useful alternative is the method of large deviations , Feller [1966], Daniels [1954], and others. The ingenious idea is to tilt (or sculpture) the df. components so as to make a Normal approximation more effective at predicting the prob- ability that S > x for large x. For equal components with 16 d.f. F(x) look at the tilted probability measure (assumed to exist for s > 0, which sometimes restricts the theoretical ap- plicability) : v(dx) = eSXF(dx) n 7> viaxj , (3.7) e /v sX ip(s) = £n F(s) = in E[e ] being a cumulant generating function for F, or X. Manipulations show that '{S n > z} = / F n *(dz) = e n ^ (s) J e" SX V n * (dx) , (3.8) n* and the idea is to approximate V by a Normal centered at z, a feat that can be accomplished by choice of s. It turns out that it is necessary to solve (sometimes numerically) for s(z) the equation z = nip' (s) (3.9) in order that the mean of the approximating Normal be at z ; the variance is nij;"(s). Finally, P{S n > z} % exp n{i|>[s(z)] - s(z)^'[s(z)] + | s 2 (z) ty" [ s (z ) ] } x 1 2 oo -—v i- / e 2 dv . (3.10) s(z)/mj/'Ls(z) j The above technique can also be applied to a compound Poisson model (n is replaced conditionally by N(t), the counting process of a Poisson process, and the condition then removed) . Such models are frequently employed in inventory studies; apparently the large deviations approximation has not been applied in that area. 17 ( 4 ) Bernoulli Trials and Poisson Process: Rare Events Bernoulli Trials are a special case of the Repeated Trials model, with events occurring ("Success") or not permitted to occur ("Failure") at specific integer time points, often equally spaced. In practice the fixed intervals between trials may be largely arbitrary, and it is attractive to think of events occurring at any (real-valued) time; from this comes the Poisson process. One approach to the P.P. properties is to consider a B.T. process to operate over time t with unit time steps, and then refine the time steps (e.g. let t = 1 day and starting with possible demands at 15-minute intervals, then down to 7.5 minutes, then to 3.75 etc. ) to create a sequence of B.T. models. The limit of the sequence after ultimate refinement describes the P.P. Specifically, let T(k) be the generic time between successes in the (kth) B.T. model with time steps 1/2 (k = 0,1,2, ...). This means that the actual number of steps in time t for B.T. model k is 2 t; correspondingly, let the probability of success per step be p/2 . By conditioning on the first step's outcome this means that E[T(k)] = l/2 k + (1 - p/2 k ) • E[T(k)] (4.1) so E[T(k)] = — for every model, as should be true. Furthermore, as k -> °° so time steps become arbitrarily small, P{T(k) > t} = (1 - p/2 k ) t ' 2 ■* e" tp (4-2) inter-event times become exponentially distributed in the (P.P.) 18 limit. Furthermore the number of P.P. events ("Successes") in time t have the Poisson distribution. The P.P. is usefully invoked for many modelling purposes. Situation . Consider a sequence of days on which demands for computer service (time) are made, and focus on the occurrence patterns of runs (uninterrupted sequences) of h igh-de mand days. Question: what is the distribution of times between successive runs, and what is the distribution of the number of such runs in a fixed time t? It will turn out that if either the run lengths are long, or if the probability of a high-demand day is small, that runs tend to occur as a Poisson process if the time scale is appropriate. Model . Begin by modelling individual high demand day occurrences as successes in B.T. Let t, (k) represent the time until the 1st occurrence of a run of length k, and, measured from the end of such a run, let t, (k). , t-UOi ... t^Oc),.. be the time until the 2nd, 3rd, ... ith, such run is realized. By the B.T. as- sumption {t.. (JO, i = 1,2, ...} is an IID sequence of rv. Then we can represent x (JO by conditioning on the events that may occur in the first k trials: with prob. p x(k) 1 + t' (k) with prob, j + t 1 (k) with prob. p- 1 q k-1 k + t' (k) with prob. p q (4.3) 19 where x 1 (k) is an independent replica of any x (k) : the idea is that the process starts over once a failure occurs to spoil a run. Alternatively , T(k) = \ k with prob. p R(k) + t' (k) with prob. l-p ] where j-l P{R(k) = j} = 23^— , 1-p 3 -Lf '/ •••/ K , (4.4) (4.5) a truncated geometric distribution. From these come the generating function of t (k) , and in principle its distribution: [z T < k >] . k k z p 1 - - qz r i-(pz) k l l-pz Differentiation gives the mean E[x(k)] = k + ^ P 1_ kp^ 1-p , k 1-p _ Qj P K (1"P) (4.6) (4.7) the approximation holding if either p -> or k ■* °°. In either case the run is a rare event. While explicit inversion of the expression for E[z ] is possible by use of partial fractions, the result is quite complicated. On the other hand, look for the distribution of T*(k) = t (k)/E[x (k)] when E[x(k)] becomes large. The expectation (4.8) 20 E[e- ST * (k) ] = E e -ST (k)/E[T(k) ] (4.9) can be obtained from the generating function (by putting z = exp[-sp q]); next let either p-*0 (rare individual events) or k->-°° (long runs) to find that this transform converges to (1+s) . Then by the unicity theorem for transforms (Feller [1966]) the normalized rv t* (k) is approximately unit exponentially distributed, i.e. P{x(k) < t E[-r(k)]} % 1 - e _t (4.10) and furthermore the distribution of the number of k-runs in time t, N, (t) , is approximately Poisson. Deviation from the Poisson (indicated by over-variance) may signify that the underlying demand generating process is inhomogeneous or cluster-prone in time, and that extra facilities are required to reduce backlogs. Examination of runs is one way to check the validity of the basic modeling assumption of Bernoulli trials. Similar limiting arguments simplify other situations in- volving rare events that are generated by even more complicated processes. See work on first-passage times for combinations of random loads by Gaver, Jacobs and Latonche [1981], 21 (5) Markov Models: General Comments The basic theory of Markov chains and processes, both in discrete and continuous time, is well introduced in standard texts such as Feller [1966], Chung [1967], Karlin & Taylor [1975] Kleinrockf 1976], and needs no systematic coverage, only review and illustration. By way of review, recollect the ideas of various possible state space definitions: integers, integer and real numbers ( "ages" ) , real numbers (e.g. virtual waiting times in queues) ; times (index sets) either discrete and equally-spaced or imbedded or continuous time; Markov property defined by conditional probabilities ("The future is independent of the past, given the present"). Carry on to matrix represen- tation of the state probabilities after t (0,1,2,...) time steps, forward and backward Chapman-Kolmogorov equations, generalize to discrete state Markov chain in continuous time with exponential sojourns in states, state classification emphasizing irreducible chains and transient chains (with at least one absorbing barrier) , recurrent events and first-passage times and absorption probabilities, generating functions and other transforms. Simple Markovian assumptions, i.e. that a scalar state rv X(t)f where t is time or space, is Markov, introduce de- pendence in a plausible and tractable manner. Usually it is necessary to assume, for example, that the one-step transition probabilities (discrete state, discrete time): p ± . = P{X(t) = j |x(t-l) = i}, (5.1) 22 are time-homogeneous in order to obtain explicit neat solutions. Analogous assumptions must be made about discrete-state Markov processes in continuous time, wherein A. is the rate of de- parture from state i (exponentially distributed sojourn time parameter), and p. . is the corresponding probability of move from i to destination j. Of course a known deterministic time dependence, involving daily or weekly cycles, and trends can be dealt with by numerically multiplying the transition probability matrices. More irregular changes in process behaviour can be repre- sented as the effect of randomly changing external events, or rando m environments for short. In such models the actual primary process transition parameters (e.g. p. ., or A.) -*- J change in time under the influence of such environmental factors as seismic vibration, temperature and humidity, ocean sea state, wind speed or other meteorological ef f ects , or variations in personnel effectiveness and propensity for errors. Random environment models conveniently postulate that environ- mental changes induce simple discrete-state Markovian behavior on the basic or primary process parameters; of most interest are para- meter changes that occur more slowly than do state changes in the basic process. Markov modelling of real situations usually involves simplifi- cations at certain crucial states. Even then, the answers to interesting questions may require extensive computing or simu- lation. Astute choices of sub-models or component models, e.g. the use of "phase-type" distributions for representing arrival and departure processes in queues can be of help, as 23 can the recognition (or plausible imposition) and exploitation of special structure; see Neuts [1981], 24 (6 ) Some Markov Process Problems and Models Here are some illustrative situations and corresponding Markov chain models. Situation (Queueing in discrete time). A servicing facility, e.g. a computer system or a programming (or other) consultant, or a communication channel, experiences single customer arrivals in a random fashion; arrivals enter at the discrete times 0,1,2,3, ... only, and service completions occur only at such times. Discuss the nature of the delays and backlogs that occur. Model 1 . Let the probability of a single arrival at time (epoch) t be a. (t) , where i refers to the number present at that epoch. Each arrival must wait at least one time period before discharge, even if it immediately enters service upon arrival. Let d. (t) be the probability that an arrival that has been in service at t actually departs at t+1. Now let X (t) denote the number of arrivals in the system who have not yet completed service at time t. Model (X(t)} as a Markov chain with the following one-step transition probabilities: p. . (t) = P{X(t+l) = i + l|x(t) = i} = [1-d. (t)]a. (t) 1 / 1 ' X i J- , i > l p. i _ 1 (t) = P{X(t+l) = i-l|x(t) = i} = d i (t) [l-a i (t) ] (6.1) p 0fl (t) = a Q (t) P ,0 (t) = ^V^ 25 P i± (t) = 1 - {[l-d i (t)]a i (t) + d i (t)[l-a i (t) ]} p. . (t) = otherwise; If the number in the system is I , so the state space is finite, and P i+i ft) = ° and P i i-l (t) = d i (t) theD the Probability dis- tribution of X(t) for any t can be obtained by numerically multiplying the one-step transition matrices, p_(t) , with elements aiven above : P..(t) = P{X(t) = j|x(0) = i} = element in ith row, jth column of t p(t) = n p(t') (6.2) t'=0 This can be done especially easily in APL if the process is time-homogeneous, i.e. a. (t) = a. , d. (t) = d. independent of elapsed time. Explicit analytical solutions can rarely be found for non-time-homogeneous cases, let alone for time homo- geneous cases. If they were availabler the solutions would generally be very complicated and difficult to interpret. Model 1-A . Specialize the above to let a. (t) = a>0 and d. (t) = d>0. If the maximum number in the system is I, there is a stationary solution; put s = — , a = 1-a, d = 1-d: ad (d-a)d TT„ = dd-aas (d-a)s 3 (6.3) 7T . dd-aas ^ _ (d-a)as dd-aas 26 Furthermore, if s<l then the process tends to drift towards and even if there is no upper bound on system states the process is irreducible and ergodic so the long-run distribution is tt. = ^ s j , j = 1,2, ... (6.4) J dd a modified geometric distribution, some form of which so often appears in queueing problems. The above simple special case is well-known, but can be useful for checking the accuracy of computer programs used to compute numerical solutions to the time-dependent case • Model 2. Let a. h (t) be the probability that at time t there occurs a bunch of arrivals of size b(b = 0,1,2, ...), given that i are awaiting service and will not make further demands. For example, suppose there are I total customers, e.g. computer terminals accessing a central facility, and that each applies for service independently with probability a(t), provided that it is not undergoing service. Then a ±fb (t) = ( I ^ 1 )[a(t)] b [l-a(t)] I " i " b , and for k >_ , Pi, i+ k (t » - 5 i (t)a i,k (t) + d i (t » a i, k+ i (t) - ,6 ' 5) while Pi,i-l (t) ■ a i,0 (t,d i (t) 27 Again the probability distribution of X (t) can be numerically computed. Model 2 -A . Suppose the arrivals are caused by a common event (a "common-cause" in engineering parlance) . This might be the occurrence of an earthquake of large magnitude, or other en- vironmental shock. Let such an event occur at time t with probability c(t); let the probability that b arrivals (demands for service) occur as a result be conditionally binomially distributed with parameter 6. Then I - i \fi b n_ fi 1 I - i - b (6.6) ^i,b (t) = cttM^e-Ei-ft] This can again be used to form one-step transition probabilities, and to calculate state probabilities at any time. The present model allows for a catastrophic shock situation: if 6=1 then all outstanding customers simultaneously demand service, i.e. I-i arrivals occur simultaneously. This differs from Model 2. Situation (Queueing with breakdown of service or preemptive priorities). Suppose a single server, e.g. computer facility, or data transmission channel, is confronted by a random arrival stream of basic service demands. These demands may be characterized by their service times, or work request durations such as the times required to transmit single bodies of data or digitized messages. In addition, these services may be effectively prolonged by the occurrence of interruptions, e.g. from internal server breakdowns resulting in temporary processor unavailability, or 28 from environmental noise or even intentional jamming. How is the queue size and waiting time of demands affected by such interruptions? What steps can be taken to reduce the interruption effects? Model . Nearly all classical queueing theory is most conveniently developed if the service times of the individual demands are independent"! v and identically distributed (I ID ); see however Jacobs [1978, 1980] for discussion of a model involving correlation effects. If service times are to be interrupted and repeated, or alternatively resumed , an interruption process that preserves the IID character of the basic service times allows nearly direct adaptation of conventional theory; such a process is one that requires exponentially distributed ( "memory less" ) periods between successive interruptions, and this will be assumed. Checks of the sensitivity of results to this reasonable assumption can be made by simulation. If interruptions of IID duration X (df F (x) ) occur at IID Expon(A ) intervals, then the time to complete the ith basic service (low-priority) is, provided service can resume after each service C i = S i + X l + X 2 + •'• + X I(S.) ' (6 ' 7) where S. is the ith basic or low priority service time 1 — (df F (x) ) , X. is the duration of the jth interruption, and I(S.) is the number of interruptions that occur during S.. 29 Given S., I(S.) is Poisson (A ) , and the Laplace-Stielt jes 11 n transform of C. is, in terms of F c (s) = F s [s+A H (1-F x (s) }] (6.8) and hence E[C] = E[S](1+A H E[X]} E[C 2 ] = E[S 2 ]{1+A U E[X]} 2 + E[S]A U E[X 2 ]. ri n (6.9) If, on the other hand/ basic services that are interrupted must begin again from scratch, i.e. services must repeat , then the i th completion time becomes C. = S. + X 1+ S| + X 2 + S* + ... + X I(S _ } + SJ. (Si , (6.10 where S '. is the ith interrupted basic service time that must D — be repeated. The L.-S. transform is F (s) = E< c -(A H +s)S A H +s A TT r - (s + A rT )S 1-e H ■>, 6.11 and by differentiation of the latter expression, r AS- E[C] = ^E H 1 E[X] + \ II (6.12 t A..S 2-, E[C 2 ] = 2E (e H -l) E[X] + i J I A 11* + 2E Se (E[X] + \- A H + V 1 2 E(e " )-l) (E[X"] + 2E[x].f- + ^ 30 In order to assess queueinc delay, look at the process, {N,,d=0,l 2 describing the number of basic demands at the server at the instants just following departures; { N d , d - 1,2, ...} is an embedded Markov chain provided basic arrivals are (compound) Poisson. The N,. 1 = H , + A(C,.,) if N , = d+1 d d+1 d = N + A(C,, 1 )-l if N, > 1 . d d+1 d — (6.13) Here H is the number of basic (low-priority) demands made at the beginning of a basic service busy period initiated by the appearance of a high-priority demand, and A(C,) is the number of basic demands made during the dth basic completion time. Express an arbitrary H as follows A L with probability t — t-t — H - L H ^ H (6.14) A(X) with probability t — -^ — A L +A H It follows by conditional expectations that the embedded chain is ergodic if E[A(C)] < 1, and that then the long-run probability of system emptiness at an embedded time point is 1 - E[A(C)3 p = lim P{N =0} = - r ^ J (6.15) O j d ELHJ and the long-run expected occupancy is E[N] = 2(1-E[A(C)J) { E[(H + A ( C )) 2 ]P + E[(A(C)-1) 2 ](1-P )}. (6, 16) Delay can then be estimated by use of Little's formula. A version of the above formulas correct in continuous time may be found using 11 results of Gaver [1962],; the difference between embedded and con- tinuous time becomes comparatively negligible if the basic traffic intensity E[A(C)] is close to, but below unity ("heavy traffic"). Since computer system monitoring devices sample system state at the moment an event occurs (e.g. at a departure instant) a theoretical account of the queue at such moments (imbedded times) is of direct interest. An alternative approach to the long-run distribution of delay of an arriving basic demand is by way of Wald's identity or martingales; see Feller [1966] . This will actually handle waiting-times when the basic service inter-arrival intervals are IID, but otherwise arbitrarily distributed. Still another ap- proach is via the Takacs-type integro-dif f erential equation^ see Kleinrock [1976] for an account. 32 The previous situations have been discussed in terms of long- run probabilities. Frequently questions involving the time until system failure (or restitution to operational condition) are more important. Situation (Redundant repairable systems) . A particular system function can be performed if at least k out of n system com- ponents ("machines") function. For example, electric power is available if at least one generator is working out of two that are installed. Suppose that the system components are all operative initially but fail randomly; failures are immediately detected, but repairs are of random duration, so several machines can be down, all awaiting repair completion. Question: how long will it be until I = n-k+1 components are simultaneously in a failed condition? The time until this occurrence is the time to failure of a k - out-of - n system. Model 1 . It is now convenient (but possibly unrealistic!) to assume that the machines fail independently and after exponentially dis- tributed times in operation, each with rate A. This simplification may be relaxed, but at the price of expanding the state space. Assume too that the repairs occur in a Markovian manner, e.g. (but not necessarily) at rate y -Min (N (t) ,R (t) ) , where N(t) is the number of machines failed and down for repair at t, and R(t) is the number of repairmen on duty. This is the classical machine repairman problem; see Feller [1966], and Cox and Smith [1962]. Usually R(t)=r, a constant, although provisions may be made for automatically in- creasing repair effort when redundancy reserves become dangerously low. In other words, N(t) is a simple birth-and-death Markov process, wherein jumps in state, N(t), occur at exponentially dis- tributed intervals or sojourn times , S^ for the sojourn in state i, 33 transitioning always to neighboring states. In the present situation as A -* P{N(t+A) = i + l|N(t) = i} = A(n-i)A + o(A) = A. A + o(A) 1 P{N(t+A) = i-l|N(t) = i} = y min(i,r)A + o(A) (6.17) = y . A + o (A), abbreviating the general transition rates, to A. and p. . In order for system state to reach i+1 from i for the first time it must either do so on the first transition out of state i, or else drop back to i-1, return to i and try again Thus if U. is the local first passage time from i to i+1: U. = inf{t: N(t) = i+l|N(t) = i}. (6.18) write U. = S. + { l l with probability p. . , U! , + U! with probability p. . , l-l i r J r i f i-l A .+y . l l (6.19) y i A .+y . l l where U! has the same distribution as U-. The above repre- sentation allows immediate derivation of the Laplace-Stielt jes transform of U. by conditional expectations. The result is -sU A . E ^i (s) = s+A.+y.[l-ij,. I (s)J ' 1 = 1 l i L ^1-1 = 12 (6.20) ^o (s) = A-Ti • 34 Furthermore, since the first-passage time from N(t) = i to j > i can, on the basis of Markovian assumptions, be expressed as T. . =U. + U. , , + i] 1 i+l + U j-l (6.21) where {U. , , k = 0,1, ...} are independent rv, the L.-S transform of T. . is ID -sT. . j-l n k=i n ij> k (s) (6.22) and the cumulants (moment-like quantities; see Cramer [1946]) can be expressed in terms of those of the U- . Here are a few moments of the U.; recursively expressed and hence easily computed : :[u ± ] = b{ 1 + ^i E[u i-i ] }' K, + UiEEu.^]} + Jr E t u i-i3 AT l E[u 3] . 6 (, + p.ECu.^]} 3 + ^ jl + M i E[0.. 1 ]}E[u2_ 1 ] and + £ K[UJ] 1 (6.23) E[nJ] - % {i + v ± EtUi.^} 4 + ^ {l + ^[U.^]}^^] A . A . 6y 2 8y . E ^ u i-i ] + -TT i 1 + ^i E f u i-i ] E[u i-i ] i „r„4 + ji eCU^] 35 From these, standard variance, skewness , and kurtosis measures can be easily computed. For the repairman model discussed initially it can be shown that if the expected time to system failure, E[T n ] , is "long" then T /E[T „] resembles an L o£ ox/ L ol Expon(l) rv. Model 2 . (Catastrophic failures) . Suppose that in addition to the independent random failures there is a catastrophic event that "kills all operative machines simultaneously; let it occur after a time C ^ Expon(v). Then the system failure time T* has distribution given by P^T* £ > tl = p{t o£ > tl e" Vt (6.2.) and from this E[e ST °^ = iz£ - (s+v) T s+v (6.25) from which moments can be generated; see Chu and Gaver [1977], It is sobering to note that if v , the mean time to catastrophe occurrence, becomes small or even comparable to e[t „ ] , then the mean time to redundant system failure is essentially v , and redundancy alone may not improve system reliability. Model 3 (Simultaneous repair) . If the system is not under con- stant surveillance, but instead is inspected at random times (rate y) and then repaired in negligible time, the number of down machines at t may jump essentially instantaneously, either to zero (perfect and rapid repair) , or to some lower point (imperfect repair) . In this case the basic "skip-free up" 36 character is retained, but now with probability p U. = S. + < 1 l i,i+l U! + U! + U! + ... + U! . with probability p. j = 0,1,2, . . . i. (6.26) Note that if j = then repair is completely ineffective Conditional expectations now give the L.-S. transform -sU. E[e 1 ] = ^.(s) = V S)P i,i+l (6.27) l-n ± (s) I n i> =l l r=l l-r i/i-: -sS. where here n. (s) = E[e x ] = a. (a.+s) . To specialize this to a l i repair model, introduce a Binomial distribution for successful repairs a . = X (n-i) + u p i,i+l = X(n " i) a ^ -1 i (6.28) p . . . = pa . l P j (1-p) i-D j = 0,1,2, . . . , i where p represents the probability that an individual down machine is indeed repaired just after inspection (neglect the duration of repair times) . The Binomial model assumes that repair success is independent across machines, which may be inappropriate in case similar causes give rise to the failures. Differentiation or direct expectations yield moments of U. and eventually of T p . 37 (7) Diffusion and Fluid Approximation While classical discrete state space Markov process ideas can often be used to model some quite interesting situations, the analytical results obtained frequently emerge only in terms of incomprehensible transforms, or in other somewhat obscure form. Not infrequently the difficulty that induces complexity can be traced back to the influence of boundaries upon the process transitions. If one examines a rather heavily loaded or congested service system, however, it is apparent, first, that the state changes may appear almost negligibly small relative to the system state magnitude (e.g. length of queue) itself, and, second, that annoying boundaries, particularly that at zero, are visited infrequently although their influence may still be crucial. These remarks hold true not only for simple one-dimensional processes, such as those used to describe congestion at a single servicing facility, but also for much more complex situations involving the interaction of several servicing processes. An attractive approach to problems involving many customer arrivals occurring rapidly and generating considerable queueing is, then, to treat them by the method of diffusion process approximation. For details concerning the rigorous details of diffusion mathematics see Feller [1966]; in brief summary recall that a diffusion is a possibly vector-valued Markov process on the real numbers that typically moves continuously, governed by a drift (infinitesimal mean) and a diffusion (infinitesimal variance) parameter. 38 This approximation has been employed by Kobayashi [ ] for certain cyclic networks of queues such as are encountered in multipro- gramming computer systems. See also Gaver and Shedler [1973], and the important work of M. Reiman [1982], and particularly G. Newell [1979] and also McNeil and Schach [ ] . In this section the use of diffusion will be briefly illustrated, and some ex- perience with the results will be recounted. Situation (Waiting time or backlog at one server) . Suppose a single servicing facility is confronted by random arrivals that bring with them contributions to work load, expressed as re- quired processing times. If the facility processes them in order of arrival, what is the backlog at time t? The facility is heavily loaded, so that it is seldom idle. It never turns away customers, i.e. infinite buffering is possible, nor do long delays discourage those waiting, causing defections or balking. Model . Assume that arrivals occur in a Poisson (A) process, and that the generic processing time S has df G (y) ; successive processing times are IID. This model has been studied by Takacs ] who derived an integro^dif ferential equation for the df of backlog or virtual waiting time W(t). Although the formal solution of that equation can be obtained, it is in a somewhat complicated form, not conducive to immediate insights. It is tempting to take an alternative, somewhat heuristic approach. Intuitively, if p = AE[s] (= expected total load increment per unit time) > 1 (= processing or output rate per unit time) , the backlog grows at rate p - 1 > 0. Furthermore, the backlog process, 39 W(t), "ignores the boundary" at W = after a time, and eventually w(t) appears approximately Normal/Gaussian over an interval (t, t+A) , with mean (p-l)A, and variance Ae[s 2 ]A. Importantly, also, the process seems to grow by accumulating in- dependent, nearly Gaussian, increments. If p<l some difficulties occur because W=0 is an im- permeable reflecting boundary, but the basic scenario is still the same: if p is close to unity W(t) moves in nearly Gaussian increments, but occasionally interacts with the boundary at zero. Question: what is the long-run behavior of the delay in such a process? The Takacs (or forward Kolmogorov) equation for the "exact" process is - H + H = -AF + A / F(x-y,t)G (dy) (7.1) o where F(x,t) is the df of W(t); initial and boundary conditions are necessary but are suppressed. If F(x,t) is only appreciable when x is large, and if the magnitude of a typical S is also small compared to x then it becomes plausible to Taylor-expand the F (x-y , t) -term to three terms and integrate; the result is || = (1 . AE[s]) | + m|!ii!| which is the well-known forward partial differential equation for Brownian motion with drift. Impose the boundary condition that F(x,t) = for x < 0, and the equation for F, an approximation 40 to F, the "true" distribution of w (t) r emerges ; the latter equation can be explicitly solved in terms of error functions and exponentials (see Newell[l97l]) for all t; i.e. the transient solution is actually readily available. With some further effort one can impose an upper boundary at x > to repre- sent a finite buffer size. The approximate steady-state (t -*■ °°) distribution turns out to be F (x) = exp -2(1-P) x AE[S 2 ] p < 1 (7.3) which often is, for large p, usefully close to the behavior of F (x) , the long-run solution to the Takacs equation. Note that the parameters of the above differential equation can be obtained by equatina infinitesimal mean and variance of the assumed Poisson arrival process to the corresponding quantities for the diffusion. It is interesting that, when available, a martingale approach to the problem, of Gaver and Shedler [1973], produces a different exponent that yields better approximations for moderate traffic intensities, especially if the service time distribution is very long tailed (more skewed than the expon- ential) . Turn now to a more complex example, involving the interaction of two traffic dreams. Situation. At a node of a communication network there are a total of c+v channels (servers) ; voice messages are exclusively assigned to the v channels, and data messages are assigned to the c channels, but may also utilize any unused voice 41 channel capacity under the stipulation that voice has preemptive priority and may displace any data encroaching on its (v-channel) territory. Question: what is the nature of the delay experi- enced by the data, which is allowed to queue up in a buffer in- definitely? Model (Markov assumptions). Voice traffic is Poisson (A) with Expon (y) service times; voice is a loss system, so immediately the steady-state voice loss rate can be calculated using the Erlang-B formula. The data, however, operates in a random service environment modified by voice needs. Data arrives in an independent Poisson (5) process, with independent Expon (n) service times. Typically 6 >> A, and n >> y. Data, with state variable X (t) , is an M/M/S system where S = c+v - V(t), V(t) being the number of voice messages in service. Clearly (X(t), V(t)} is a bivariate Markov process, but one difficult to analyze exactly; see references in Lehoczky and Gaver [1981] for other approaches to the analysis. Now typically n/y (.= Data arrival rate per voice service 4 time) is very large, possibly 10 . Furthermore often p, 5 6/ri > c, so some voice channel usage by data is necessary in order that all data be handled and there is not an evergrowing queue. The appropriate traffic intensity parameter for the system is seen to be P = [p d + P v (l-q)](c+v) _1 (7.4) 4? P V /v! v where p = A/y, and q = — v v . I pVj: j-0 v is the probability that the voice system rejects an arriving voice message. If p becomes large under such circumstances, Gaver and Lehoczky [1982] show that X (t) behaves like a Wiener process with reflecting boundary, precisely as was mentioned in the previous example. Actually Lehoczky and Gaver [19 82] assumes that data input acts as a fluid , with no variability. By further, more intricate, methods involving the convergence of semigroups of operators developed by Burman [1979], it is shown in Lehoczky and Gaver [19 81] that the long-run distribution of X (t) is exponential with mean where v-1 p J + — d yp - I (Tt/TT ) r i=0 (c+v) (1-p) (7.5) and tt. = pV(i!)/ 1 v r v I P^/(iD L i=0 T k = I 7T i (i-p v (l-q)) i=0 Numerical work indicates that the diffusion approximation is reasonably accurate if the traffic intensity is quite high, say if p > 0.95; otherwise, for smaller s, the accuracy is not as high It would not be surprising if a refined method for fitting drift and diffusion coefficients would lead to improved results. Finally, the difference between the refined treatment of basic data as a Poisson, and the simpler treatment by a fluid (yielding a model that can be solved exactly) resides 43 merely in the addition of the term p , in the numerator of the expression for the mean. The diffusion approximation can be utilized to evaluate another interesting measure of system effectiveness, namely the expected total waiting time, in job or data-packet hours, expended during a busy period for data traffic that starts with x present. Model . Let A(x) be the expected total waiting time during a busy period when the initial number of jobs is X(0) = x > 0. Condition on process change, Z (A) , during the initial short time period (0,A) to get e|/ X(f )dt' |X (0) = x, X(A) = x + Z (A) | E (7.6) A(x;Z (A)) = xA + A(x+Z (A)) + o (A) . Now Taylor-expand and remove the condition on Z(A): 2 A(x) = xA + A(x) + A (x)y(x)A + A (x) • °—^- A + O (A) (7.7) or, collecting terms in A and letting A ■*■ , = x + y(x)A x (x) + |a^ x) A xx (x), (7.8) to be solved subject to A(0) = 0- clearly restrictions on y (x) are necessary in order that A(x) be finite. 2 ?. I* y (x) = y < a ' (x) = a , both independent of x, the eauation can be solved directly to give AA A(x) =h* 2 + vi * ' <7 - q) y<0 when traffic intensity p<l. A similar expression for compound Poisson (A) inputs is A(x) = r7 f- T + X E[A2] , p<l, (7.10) ' u P; 2(l-p) where A represents the number of items (packets) arising at a single request; p = Ae[a]. The moral is that variability (measured by a or E[A ]) can greatly increase expected total waiting time, particularly when system loading is high (p close to unity) . For application of this "area under a random path" to discussing total wait during a road traffic jam see Gaver [1969J . See also McNeil [1970] for generalizations. The same backward argument is well-adapted also to studying problems of optimum investment decisions. 45 8 . Renewal-Theoretic Modeling Ideas of renewal theory and recurrent events are extremely useful for many purposes in stochastic modeling. Recognition of the occurrence of one or more recurrent events or "renewal points" in the development of a model process points the way to writing down simple forward or backward type equations for probabilities or expectations. Frequently analytical information can be extracted from such equations, particularly that relevant to long-time or other asymptotic process behavior. If more in- formation is desired it can be obtained by use of transform techniques, by numerical computation, or by Monte Carlo simulation Mathematical definitions and properties of renewal processes are well presented by Feller [1966], Karlin and Taylor [1975], and Cox [1962], among others. There follow a few situations and suggested models based on renewal theory that illustrate the basic notions. We also comment on the relevance of the models and results obtained to real situations. Situation . A machine, e.g. computer system or component thereof, or human operator, etc. , operates properly for a period of time, fails, is restored (or restores itself) to service and operates properly again for a different time, fails again, and so on. Questions: how many failures are likely to occur in a given fixed period of time, say a year? The answer to such a question will help to guide decisions concerning logistics (necessary spare parts) and employment of repair personnel. How long a time will elapse until the k — failure? Suppose the times to restore failures vary; what is the likelihood that a "chance" 46 user of the system will find the machine down for repair when he or she needs it, and how long will the wait to service restoration last? These are only a few of the many questions that might be asked. Model . The classical renewal theory model for the situation described portulates that times between successive failures, {X-, i = 1,2, ...} are IID positive rv with distribution F (x) . That is X, is the time until the first event (here • -, st , • th failure) , and X. is the elapsed time between the l-l — and 1 — event. X. can be either a discrete random variable, so failures occur at regular intervals, say hourly, a continuous random variable, or a mixture. For the moment assume repairs to take a negligible time. Mathematical results for this model are simplest and nicest when the IID assumption is fully exploited. Under that assumption (and even more generally) the counting process, N(t), giving the number of renewal events (failures in time t) has probability distribution P{N(t) = n) = f£* (t) - F^ n+1) *<t) (8.1) where * refers to convolution. For long time (t -*- °°) and under suitable mathematical restrictions M(t) = E[N(t)] ~ ^J (8 ' 2) Var[N(t>] - Var[X] , t " (E[X]) 3 47 and N(t) ^ Normal, with the above parameters. Of course exact analytical solutions can be obtained if sympathetic distributional models are assumed: taking X ^ Expon. yields the Poisson distribution, and X ^ Gamma or Erlang also produces rather neat closed-form solutions. Any discrete-time distribution for X can be numerically convolved, conveniently using APL. This helps to answer questions about the number of events in (0, t) , provided the IID assumption is palatable. If finite-time results are needed resort can be made to numerical summation, using a dis- crete time model, to approximation by a standard, tractable, distribution such as the Gamma followed by transform inversion, or by simulation. In what follows we illustrate the diverse utility of backward conditioning arguments, leading to renewal integral equations. 48 Situation . (Incorrect repair possibly due to human error) . Each time a repair is made there is the chance that it will be incorrect, and that the subsequent time to failure will be short. Suppose that incorrect repairs tend to bunch together in runs; describe the number of failures that occur over a (long) time period. Notice that a similar situation describes a clustering scheme of arrivals to a repair facility or a communications center. Model . Assume that the generic time to failure of the system is X when repair is made properly, and X' when repair is incorrect; the intermingled sequence of X and X 1 quantities are condi- tionally independent. Furthermore, let the probability of a correct repair at failure n be a, 5 a 5 1, if the repair at the time of previous failure n - 1 was correct, and 3 = 1-6 < 3 5 1 (a^3) if it was incorrect; the sequence of correct and incorrect repairs is thus modeled as a stationary ergodic Markov chain. This, of course, does not represent system- atic improvements in repair capability, although a transient chain could serve for that purpose. Let M(t) (M'(t)) denote the mean or expected number of repairs in t, given that the first repair was correct (incorrect) ; think of the first repair (manufacture) as occurring at t = 0. Argue that 1 if X > t; M(t) = \l + M(t-X) if X<t and the 2nd repair is correct; (8.3) 1 + M' (t-X) if X<t and the 2nd repair is incorrect. Likewise , 49 1 if X' > t; M' (t) ]1 + M(t-X') if X' < t and the 2nd repair is correct; (8.4) .1 + M* (t-X 1 ) if X' < t and the 2nd repair is incorrect. Now if the various conditions are removed then according to the model there results the two linked convolution integral equations t t M(t) = 1 + a M(t-x)F x (dx) + (1-a) M(t-x)F x , (dx) . and (P. 5) M' (t) =1 + 3 M(t-x)F x , (dx) + (1-3) M(t-x)F x , (dx) (8.6) In turn, these two equations are susceptible to transforming: multiply by e -st and integrate to get M(s) = - + aM(s)F v (s) + ctM 1 (s)F v (s) S X X M (s) = ± + 3M(s)F x ,(s) + 3M(s)F x ,(s); matrix notation is natural here, especially if more than two repair states are used. If one then solves and collects terms to order (1/s) as s -> , Tauberian theorems, cf. Feller [1966] show that for large t M(t) - M 1 (t) ~ a + R aE[X' ] + 3E[X] (8.7) and a little reflection shows that this is entirely sensible. In similar ways variances can be written down, and an approximate Normal/Gaussian distribution for total failures may be derived. 50 The model can also be extended to account for the existence of non-zero repair times, and total availability studied. Here is another, somewhat more complicated, application of renewal theory ideas, now to an inspected system problem. Situation . (Standby system availability) . A system, such as an emergency electric power source, is usually in a quiescent or cold standby status, but occasionally is called upon to fill a function (e.g. generate power) . Various systematic plans might be devised for assuring reasonably high system availability, or probability of satisfying a demand when one occurs. One such plan is to inspect infrequently so long as no failure is detected between inspections, and otherwise inspect more frequently until evidence of need seems gone. Problem: develop a model to evalu- ate such an inspection scheme. Model . Let the inspection plan be to inspect at long intervals, {L., i=l,2,...} until such time as an inspection reveals a fail- ure, and then switch to short intervals {S., i=l,2,...}, con- tinuing until there has been a run of r(10,say) failure-free short-interval inspections, at which time switch back to long intervals; continue indefinitely. A measure of effectiveness is the long-run point availability of the system, i.e. the proba- bility that the system is failure-free on the occasion of a demand. To evaluate such a rule, allow the L. and S. sequences to be IID and mutually independent, with dfs f t/ x ) and F S ^ x ^ respectively; if desired these latter can be specialized to con- centrate at fixed values (e.g. 14 days and 1 day, respectively. 51 Furthermore, let A be the failure rate of a system failing at exponentially distributed intervals even when "cold." In order to analyze the system by renewal theory it is worthwhile to look at the periods during which inspection are infrequent, called L-eras , and those alternating with them, during which inspec- tions occur frequently, called S-eras. Note that a demand can occur during either type of era, L- , and S-eras constitute an alternating renewal process. The analysis involves the following components. (A) Distribution (density) of L-era duration. Suppose an inspection has just been completed at t = and nothing amiss has been detected. Furthermore, suppose that this inspection marked the end of the previous S-era, so an L-era is just beginning. Let a (dt) be the probability that the present L-era will last for time (t,t+dt), or, loosely, until exactly t. One can now write down a renewal equation for a L Cdt) : a L Cdt) = (l-e" Xt )F L (dt) + t e" At 'F T (dt')a_ (dt-f) ; (8.8) the first term on the rhs means that the L-era terminates with the first inspection, meaning that the unit has failed before L, . The second term represents survival through the first inspec- tion at which time the process renews itself or starts anew; final failure occurs at time t-t' thereafter. Failure or no failure at first inspection are mutually exclusive and exhaustive events, and so the result is a renewal equation for a (dt) . Introduce transforms to find 52 F (s) -F (A+s) a (s) = E[e -] = -±! £ . (8.9) 1 -F L (A + s) The mean of the L-era duration, denoted by L, is E [L] = E[L] . (8.10) 1-F L (A) (B) Distribution of S-era duration. If an inspection has just revealed a failure, an S-era begins immediately (take inspection and repair to be instantane- ous for the moment). Let a q (dt) be the probability that an S-era lasts for time t. Then the following renewal equation may be written: t a g (dt) =e" At F^*(dt) + h(dt')a s (dt-t'); (8.11) the auxiliary function h represents the probability that an inspection reveals a failure before the termination of the S-era in progress; this causes the frequent inspection to start over, i.e. starts the S-era afresh. In terms of transforms, S denoting an S-era duration, _ (F c (s+X)) r a c (s) b E[e" S -] = — 2 , (R.12) b l-h(s) and 53 h(s) = [F s (s)-F s (X + s) ] •{! +F s (s + X) + (F s (s+X)) 2 + ... + (F s (s+X)) r 1 } (8.13) F (s) -F (s+X) -S 2 [1-(F fs+X)) r ] ; 1-F (s+X) b The latter transform expresses the probability that the necessary run of r successes is interrupted by a failure, and must begin again. From the transform comes the expected length of an S-era: (F_(X))" r -l E[S] = E[S]{ — } . (8.14) 1-F S (X) (C) Probability that system is available during an L-era. The probability A (t) that the system is up at time t after the beginning of an L-era is simply e , the probability of no failure, and the transform is e St A L (t)dt = A L (s) = (X+s)" 1 . (8.15) (D) Probability that system is available during an S-era. If A (t) is the probability that the system is up after an S-era has progressed for time t, by renewal t A s (t) = e At F s r *(t) + h(dt')A (t-f) (8.16) where h has appeared before, under (B) . It follows that 54 A s (s) = 1 - (F s (s+X)) r (s+X) [l-h(s) ] (8.17) (E) Overall availability at t. Let A(t) denote the availability of the system at t. Again by backward renewal argument, and starting at the beginning of an L-era, A(t) = A L (t) + A s (t-f)a L (df) + A(t-t')a L * a s (df ) ; (R.18) transforms then give A(s) = A L (s) +A s (s)a L (s) 1 - a L (s)a s (s) (8.19) a Tauberian theorem now shows that lim A(t) = A L (0) +A S (0) E[L] + E[S] so long-run point availability is A(oo) = (A 1 ) { (F S (X)) -r E[L] (1-F L (X)) 1 + E[S] ((F S (X)) r -l) (1-F S (X)) 1 } . (8.20) The expression for A(°°) is easily evaluated numerically if expressions for L-interval and S-interval transforms are obtainable. It is sometimes useful to interpret 55 A (s) • s = A(t)e St sdt (8.2,1) as the availability of the system upon demand , the demand now occurring at a random time D, D having the exponential distribution with mean s . In this case the initial condi- tions matter (they do not, in the long run) , and a different availability figure is obtained depending upon whether the system is initially in an L-era or an S-era. 56 4. Additional Modeling Topics In this section are outlines of certain modeling topics that formed the basis for cooperative research at ULB . 57 (1) Distributional Sculpturing or Inverse Modification Standard distributions such as the Exponential or Gamma in particular, and also the Normal, log-Normal, and many others may reasonably and conveniently serve as components of a stochastic model or be used to summarize data distributions. For instance, inter-arrival times to service systems may appear approximately Exponential, service times nearly Gamma or log- Normal, and so on. On the other hand, a systematic departure from such a standard may be revealed by an analyzing actual data. It is then frequently possible to alter conventional distributions, or, equivalently , transform the random variable in simple ways in order to represent empirical reality more closely. Here are two conventional examples; there then follow some more general procedures. Example 1 . The Weibull distribution is often utilized to represent times to failure of components or times between sys- tem demands. Let T be a random variable (time, for instance) , then T is distributed in a Weibull manner if ( F T (t;a,3) = P{T < t} = ■ 1 - e -at 1 , t>0,a>0, 3>0 t < ; (1.1) The Weibull density is 3 f T (t;a,3) = e at " a 3 t 6 " 1 (1.2) 58 Furthermore, the Weibull hazard or failure rate at age t is ~ P{T 6 (dt) |T > t} E k T (t;a,3) f (t;a,3) R , = aBt P L ; (1.3) 1 - F T (t;a,B) The latter formula and also (1.2) imply that if 3=1 the Weibull is actually an Exponential; for this case "age" or "time in service" as measured by t does not influence the probability of failure in the next short time interval (t,t+dt) . On the other hand 3 > 1 implies that the rate of failure having survived to t — the hazard--increases with age, while, 6 < 1 implies that hazard decreases with age. If 3 < 1 the Weibull right tail is longer than that of the Exponential (extremely positively skewed) , while if 3 > 1 the positive skewness is less pronounced. The Weibull r.v. , T, is actually only an Exponential r.v., X, transformed or disguised, for if P{T < t} = 1 - e' at = P{T 6 < t 6 } , (1.4) then ~ r m 3 i -1 -ax P{T < x} = 1 - e Consequently, T = X , an Exponential r.v. , and T = X is a representation of a Weibull in terms of an Exponential. Sup- posing that one wishes to simulate a Weibull (a, 3) r.v., then one merely simulates an Expon (1) r.v. and raises it to the 59 (1/3) power, later multiplying by a~ . Since the power transformation is monotonic increasing, the quantiles of Weibull and Exponential are related by t(p) = (x(p)) e (1.5) Example 2 . The log-Normal distribution is a favorite model for system repair times (see Kline and Alvooq [l Q nn]« it has r^any other uses, and even some rational persuasions for its apparent re- semblance to empirical distributions (see Aitchison and Brown [1957]). Say that X is log-Normal if in x = Y is Normal (it doesn't matter what the base of the logs is.'); specifically 2 Y ~ N(\i,o ). Here are some properties: . ^1.2 2 • m k = e = E[X K ] u+~-a _ 2 9 9 • n^ = E[X] = e , Var[X] = (E[X]) / (e° -1) = (E[X]) n Y 1 (X) = skew[X] = n + 3n (1.6) y 2 (x) =kur[x] = n 8 + 6n 6 + I5n 4 + I6n 2 2 Median[X] = e y , Mode[X] = e y ° Supposing that one wishes to simulate a log-Normal random 2 variable, one simply simulates a W(y,a ) r.v. , Y, and Y exponentiates X = e 60 The above two familiar examples illustrate the for- mation of new and useful random variables and distributions by simple transformation. An intuitively appealing way of looking at certain transformations is in terms of modifica- tions to familiar random variables or quantiles by convenient shaping factors . This process will be called Distributional Sculpturing: Let X be a basic r.v., for example an Exponential, Normal or log-Normal. Define Y = Xs(X) (1.7) where the shaping function s(X) is designed to conveniently convert the basic r.v. X into a shaped version, Y , having desired distributional properties. Some examples of the prop- erties often desired in practice, along with suitable — but not unique--shaping functions, now follow. (A) Skewness-Producing Shapers Example: X is a positive basic r.v., e.g., Exponential. (i) s(X) = 1 + AX £ , A > , I > d.8) (ii) s(X) = e AX' Effect: Y = X s(X) (i) (ii) X if X "small" X + AX ,1 £+1 >> X Xe AX' much greater than X if X "large" 61 The particular shaping functions (i) and (ii) both leave small values of X unchanged, but considerably expand large values, thus transforming the distribution of X , e.g., the Exponen- tial, to one that is nearly Exponential near the origin, but having a relatively long right tail. For example, take shap- ing function (2) with 1 = 1 and apply to X Exponential. The shaped distribution becomes F Y (y;A) = 1 - exp[- ( /1+4 ^ ^) ] . (1.9) Examination shows that for small y (Taylor series) the dis- tribution of Y is nearly unit Exponential, while for large Y it resembles a Weibull with shape parameter (3 = 1/2. Shap- ing function (ii) has an even more pronounced effect on the right tail. Note that both shaping functions (i) and (ii) yield monotonic increasing transformations from X ■* Y , and that given by (i) is sometimes explicitly algebraically in- vertible (solve quadratic equation when 1=1, cubic when 1=2, quartic when £ = 3) , while that of (ii) is not. Also note that useful transformations result when parameter A < 0: this actually may result in right tail truncation (severe shortening; such transformations are no longer monotonic. Moments . The moments of shaped or sculptured r.v.s. can some- times be conveniently calculated. For the present represen- tations: 62 (i) Y = X(l + AX^) ; m (Y) = m 1 (X) + Am 1+Jl CX) m 2 (Y) = E[(X(1 + AX £ )) 2 ] = m 2 (X ) + 2Am 1+£ (X) + & 2 ™ 2 +2l {X) (1.10) Var[Y] = Var[X] + 2ACov[X, X £+1 ] + A 2 Var [X £+1 ] . Also, to indicate the dependence between the stretched r.v. Y and the basic r.v. X , m k (Y-X) = A k m 1 CX U+1)k ) = A k m u+1)k (X) so and Var[Y-X] = A 2 Var[X £+1 ] (1.11) Cov[Y,X] = Var[X] + ACov[X f X £+1 ] . (1.12) The quantiles are directly and simply related by y(p) = x(p) (1 + A(x(p)) £ ) . < p < 1 (1.13) I AX (ii) Y = Xe ; but for the present consider only £ = 1 . Then the k moment is expressible in terms of the derivatives of the Laplace transform of X . Note that the k moment does not necessarily exist for all basic distributions. 63 When it does, m k (Y » - (-D k ^E[e- sX ] s = -kA (1.14) For example, if X ~ Expon (1) , m k (Y) = k' (1 - kA) k+1 if kA < 1; (1.15) otherwise the moment doesn't exist because the distribution's right tail is too long. (B) Symmetric Stretch-Producing Shapers Example: Z is a r.v. symmetrically distributed around 2, zero; e.g., N(0,a ) . Here are some useful shapers: 2 (i) s(Z) = 1 + hZ~ , h > ; hz 2 (ii) s(Z) = e (due to J. W. Tukey) . Again (i) and (ii) imply that Y = Zs(Z) resembles Z for small Z, but lengthens the tails of the distribution for large Z . Example 1 . Stretched log-Normal variables may be suggested for modeling repair or service times if data analysis indicates that the logarithms of observwd times are symmetric but not nearly Normal, having symmetrically too-long tails. Then it may be convenient to use the representation 64 Y = e (1.16) X = \x + cZs(Z) where u is the center (corresponds to mean) of the logged observations, and the sale constant c is the standard devi- ation (spread parameter) of the variable Zs(Z) , replacing o in the ordinary log-Normal formula. Moments . The moments of the shaped distribution are, for (i), representable in terms of those for a basic Z . (i) Y = Z(l + hZ 2 ) m 1 (Y) = m (Z) + hm 3 (Z) , = (Z symmetrical around zero) . 2 m (Y) = m 2 (Z) + 2hm 4 (Z) + h m g (Z) . m 4 (Y) = m 4 (Z) + 4hm g (Z) + 6h 2 mg(Z) + 4h 3 m 10 (Z) 4 + h m, 2 (Z) . ? 2 , ... „ hZ ; consider only Z ~ W(0,a ). Calculate li Y = Ze (1.17) as a preliminary /> z2 ,T" 2/ ° 2 _1_ - u _ Zha 2 )- 1 / 2 , h < 2a 2 (1.18) /2ttq 65 (1.19) Then differentiate repeatedly with respect to h to obtain the even moments (the odd moments equal zero) : 2 Var[Y] = m 2 (Y) = - 2 3/2 , h < l/4a" ; (1 - 4ha ) ' m 4 (Y > = ~ H 2,5/2 < h < i/ 80 ' • (1 - 8ha ) ' Hence kurtosis is m 4 (Y) 3(1 - 4ha 2 ) 3 Y ? = — * = ^ o L 9 - 3 (1.20) (m 2 (Y)) Z (1 - 8ha^) *' z Tails are so extended that the kurtosis becomes very large in the case of (i) , and actually infinite for rather small values of h in (ii) ; the variance remains finite for slightly larger values. Nevertheless, the central part of the Y- distribution remains remarkably close to the Normal from when Z is itself Normal. Both forms (i) and (ii) can be induced to fit the inverse distribution (percent points) of the Student's t distribution fairly satisfactorily. (C) Left Tail Enhancement Shapers Example: X is a positive r.v., e.g. Exponential. (i) s(X) = 2£_ , i > o, a > 0; 1 + aX 66 (ii) s(X) = 1 - e Ciii) s(X) = e~ a/X -aX' (1.21) This shaping function tends to concentrate probability near zero, leaving the distribution's shape unchanged for large X : Y = Xs(X) ~ < X (i) (ii) if X large V aX £+1 << X, X small (1.22) J Moments are generally impossible to find in useable form, although in the case of (ii) , £ = 1 , explicit results can be found in terms of Laplace transforms: suppose X ~ Expon(l) , then under (ii) E[Y] = E[X(1 - e aX ) ] = 1 - (1 + a) 2 ' (1.23) the mean of Y is influenced very little when a is large, but small values of X are made even smaller; for the Expon(l) X the quantiles are related as follows: i \ r \ n aln(l-p) -, (p) = x(p) [1 - e ^ J / (1.24) so for a fixed p > large a forces y(p)/x(p) to one. However, for a fixed a and small p 67 *44~ ap x(p) - ^ (1.25) showing that low quantiles for X transform into even smaller values for Y . Furthermore, it is easy to show that if X,, » 1 ( 1 ; n) is the minimum in a sample of n from X ~ Expon(l) , and -aX Y n \ = x ri x [1 - e (l;n) (l;n) L (l;n) then ( 1 ; n) J a E[X (l;n) n + a as n -> oo (1.26) All of this reinforces the image of the present shaper as forcing small X-values to become smaller, leaving large values unchanged. Note that the result of the shaper (i) , £ = 1, can be explicitly inverted, giving the distribution F y (y) = F x "y(l + /1+4/ y) (1.27) from which the left-tail enhancement property shows itself explicitly: 68 f Y (y) = f x y(l + /1+4/gy h& /a /a /y f x (y) 1 + y + C/a) y 2 - 4)y for y small for y large •(|) (1.28) Note that the hazard associated with the latter transformation is, when X is an Expon(l) r.v., h Y (y) = hi + y + v } /y + (4/a)y l//ay for y small for y large (1.29) Thus the density of an Exponential X remains, under this transformation, Exponential-like in the right tail but approaches infinity near the origin. A distribution with such behavior may well be useful for modeling failure data exhibiting early failures ("infant mortality.") (D) Right-Tail - Shortening Shaper Example: X is a positive r.v., e.g. an Exponential. To shorten or truncate the right tail, consider 69 (i) s(X) = 1 + 3X 9, 3 > o, a > o (1.30) (ii) s(X) = e aX , a > 0/ £ > The shaper (i) with I = 1 is a right tail shortener or trunca- tor; it provides a monotonic transformation to Y <_ 3 ; small values of X are left nearly unchanged. The quantiles of Y are x(p) for p small ^P> = 1 ^ X ( P ) - • (1.31) ,-1 for p large if x(p) >> 3 -1 Shaper (ii) is non-mono tonic: values of Y = Xs(X) increase to (ai) , decreasing thereafter. The inverse of (i) yields V^ = Vr^-s^ ' o i y i 3 -i (1.32) with density f v ( ^ } = f x ( l - Bv } " 2 ; y XI 3y (1 _ &y) ^ (1.33) the latter approaches infinity at a rapid rate as y nears 1/3 • 70 Shaper (i) with £ = 1/2 is a tail thinner; it can be inverted to give F v (y) = F Y [(By) 2 (^-^HIZE)2 ] . (1>34) with density f Y (Y) = f x C3y) 2 (il^tiVM) 2 ][i i (1+/1+(4/6 2 y) } (1 + i +(2/By) )3 /l+(4/6 2 y f Y (y) for y small ■x - { (1.35) 2 2 2 f (3 y )23 y for y large ; if X is Exponential, then the hazard rate of Y is initially flat, but eventually increases linearly with age, thus produc- ing a plausible wearout model for equipment or biological organisms. The above examples illustrate a few of the large number of possible ways in which the sculpturing idea can be used to extend the descriptive power of standard distributional models. Problems of fitting such models to actual data are currently being addressed, as are applications to simulation and time series modeling. Use of shaped exponentials to evaluate scheduling procedures has been initiated in collaboration with P. Collard at ULB. 71 (2) Response Times Under Processor Sharing Consider a simple model of a time-sharing computer sys- tem in which N terminals access a single computer (server) . Let repairman-model conditions prevail, but processor sharing governs the service order: if j jobs (0 <_ j <_ N) are at the execution stage each receives one-j — of a time unit of ser- vice per time unit. In other words, if it is the service rate (u is the expected service time under Markov-exponential assumptions) , then the unconditional probability that any des- ignated single job finishes in (t, t + s) is y(— ) + 0(A) . Under such conditions it is possible to derive backward- type equations to describe the response of waiting time, R , of a newly-initiated "tagged" job arriving from a previously idle terminal. In particular, consider m.(t) = E[R|W(R) = t , X(0) = j] , (2.1) where W(R) is the amount of work or processing time required of the server, and X(0) represents the number of jobs cur- rently at the processor when the tagged job first arrives (including that job). Thus m.(t) is the expected response time, conditional on need and accompaniment. Additionally, introduce r(j) as the fraction of a time quantum, D , actu- ally available for job processing when j jobs are present at the computer; r(j) represents one component of overhead, and may decrease as j increases. For short let A. represent the rate of new arrivals when j are present at the server; 72 under stated model conditions A. = A(N-j) . Let y. be the rate at which jobs accompanying the tagged job depart; under r j + 0(A) . These assump- stated conditions y. = y(j-l) tions, or those for a general birth and death process, lead by a backward-conditioning to this equation: m . Ct) = a + m .[ t - (^)a)[i - <A. + m .)a] + A.Am j + 1 t ~ r(j) A j + y . Am . , t - r <J> A j + o(A) (2.2) .tract m. (t - ^4-^-) f: 3 3 let A -» to obtain Subt m. (t - " N y " ) from each side, divide by A , and ^ dm.(t) — i = 1 - (A. + y ) m ( + ) + A. m..,(t) + y. m. . ( + ) (2.3) Ul - J J J J J -1 "- 1 - J j~ j - r(j) I 3 J This is a standard system of linear differential equations with constant coefficients that may be solved by standard methods. If r(j) is constant, and the repairman model assump- tions are fulfilled, then it has been shown by G. Latouche that e[r|w(r) = t] = c t , (2.4) i.e. is linear in t , with C depending upon A and y . See the article by Mitra [1981] for more detail concerning this problem, 73 ( 3) Repairman Model in Random Environment Suppose we have m machines (this may mean electric power generators, or even remote computer terminals) that, when in use, fail independently (computer terminals: apply for processing time, or data) at rate A(t) , and, if failed are repaired at rate u(t) ; all processes are Markovian, given A(t) and u ( t) , t >_ . Now let A(t) and y(t) themselves be realizations of a finite-state Markov process that develops independently of the number N(t) of machines down for repair. If N(t) is the number down at t , and J(t) is the under- lying environmental state, then {N(t), J(t)} is a bivariate Markov process, and N(t) change is governed by the current level of the environment J(t) . The latter environment may refer to physical conditions such as heat, seismic shock, or to variations in repair effectiveness. In the case of computer terminals or communication nodes the environmental variations may be the result of changes in message transmissions or data demands under occasional crisis conditions. The paper by Gaver, Jacobs, and Latouche [19 81] presents a systematic mathematical analysis of the general birth-and- death process in random environments, including the above re- pairman model as a special case. Numerical illustrations are provided. Here we present a truncated version of the solution to the first-passage time problems, utilizing a recursive or "clawing-up" mode of thinking analogous to developments in Section ( ) of this account. Restrict discussion to just two 7^ environmental levels, denoted by j = 1, 2 . Let the transi- tion rate from environmental state j = 2 -*■ 1 be a , and from j = 1 -+• 2 be 3 • Let ^ n (j) be the transition rate from (n,j) ■> (n+1, j) , and P n (j) be that from (n,j) ■*■ (n-1, j). Let, as before, U be the first-passage time from n to n + 1 , and put G n (dx;i,j) = P{U n e(dx), J (U n ) = j|N(0) = n, J(0) = i} , (3.1) with i, j = 1, 2 ; the L.-S. transform of this measure is called -sU G n (i,j;s) = E[e n , J(U n ) = j |N (0) = n, J(0) = i]; (3.2) this is the transform of the time to pass for the first time from a state in which n are down, the environment being in state i at some initial instant Ct = 0) , to n + 1 down, the environment being in state j . Simple considerations of cases that may arise during the first transition give, first for i = 1 , A (1) y (1) 2 G n (1 '^ s) = d i riT £ j (1) + dnnr X G n -i (1 ' k ' s) G n (k ^ ;s) n n k — x (3.3) + dTD G n < 2 '3;s) where j = 1, 2, the indicator function 75 if i = otherwise V j) = ■{ (3 - 4 > and the denominator d n (i) = A n (l) + y n Cl) + 3 + s . (3.5) Likewise, for i = 2 , X (2) u (2) 2 G n (2 '^ s) = cTT2TV 2) + ^T2T I G n-l (2 ' k;s) G n (k '^ s) n J n k=l (3.6) + dT2) G n (1 ^ ;s) n (the equivalent of these equations in Chu and Gaver [ ] accidentally incorrectly omits the final term on the rhs) . The above equations can in principle be solved recursively, beginning with A (1) a G o (1 '^ s) dTUT V 13 + dTOT G o (2 ' j;s) (3.7) '0 V " J ~0 where G (2 '^ s) " dfuT £ j (2) + d^2T G (1 '^ s) 3 = 1.2. d Q (l) = A (l) + 3 + s . d Q (2) = X Q (2) + a + s . The first-passage time T R from N(0) = and J(0) = i (i=l,2) to n + 1 has transform 76 -sT P (i,j;s) = E[e n , J(T ) = j|N(0) = , J(0) = i] ; n - . n - in matrix notation, £ n - ie % • • • £ n and the L.-S. transform of the first-passage time to n + 1 T in P £ , where Jl = (.1,1) , a column vector. Differentiation of expressions (3.2) and (3.5) produces recursive expressions for means, variances and higher moments. Programs have been written to evaluate these expressions numerically. 77 REFERENCES [1] AITCHISON J., and BROWN J. A. C. (1957) . The Log-Normal Distribution . Cambridge University Press, Cambridge, England. [2] BENDER E.A. (1978). An Introduction to Mathematical Modeling . John Wiley and Sons, Inc. , New York. [3] BURMAN D.Y. (19 79). An Analytic Approach to Diffusion Approximations in Queueing . Ph.D. Dissertation, NYU Courant Inst. [4] COX D.R., and SMITH, W.L. (1961). Queues . Methuen Monograph. John Wiley and Sons, Inc., New York. [5] COX D.R. (1962). Renewal Theory . Methuen Monograph. John Wiley and Sons, Inc., New York. [6] COX D.R. (1969) . Analysis of Binary Data . Chapman and Hall. London. [7] CHU B.B., and GAVER D.P. (.1977). Stochastic models for repairable redundant systems susceptible to common mode failure. S.I. A.M., Proc. of Int. Conf. on Nuclear Systems, Reliab. Eng. and Risk Assessment , Gatlinburg, Tenn. , 342-367. [8] CHUNG K.L. (1967). Markov Chains with Stationary Transition Probabilities. 2nd Ed. Springer-Verlag, New York. [9] CRAMER H. (1946). Mathematical Methods of Statistics . Princeton University Press, Princeton, NJ. [10] DANIELS H.E. (1954) . "Saddlepoint approximations in statistics". Ann. of Math. Statistics, Vol. 25, pp. 631-650. [11] FELLER W. (1951, 1966) . An Introduction to Probability Theory and Its Applications Vols. I and II . John Wiley and Sons, Inc., New York. 78 [12] GAVER D.P., and JACOB P. A. (1981). "On combinations of random loads". S.I. A.M. Jnl. Appl. Math. , Vol. 40, pp. 454-466. [13] GAVER D.P. and LEHOCZKY J. P. (1981). "Diffusion approxi- mations for the cooperative service of voice and data messages". J. Appl. Prob . , Vol. 18, pp. 660-671. [14] GAVER D.P., JACOBS P. A., and LATOUCHE G. (1981). "Finite birth-and-death models in randomly changing environments" Report Interne No. 121, Lab. D'Informat. Theor. , Univ. Libre de Brux. [15] GAVER D.P., and THOMPSON G.L. (1973). Programming and Probability Models in Operations Research . Brooks- Cole Publ. Co., Monterey, CA. [16] GAVER D.P., and SHEDLER G. (1973). "Approximate models for processor utilization in multiprogrammed computer systems". S.I. A.M. J. Comput. , Vol. 2, No. 3, pp. 183-192. [17] GAVER D.P. (1969) . "Highway delays resulting from flow- stopping incidents". J. App. Prob. , Vo. 6, pp. 137-153. [18] GAVER D. P. (1962) . "A waiting line with interrupted service, including priorities". J. of the Royal Stat. Soc , B, Vol. 25, pp. 73-90. [19] HARRISON J.M., and REIMAN M.I. (1981). "Reflected Brownian motion on an orthant" . Annals. Prob. , Vol. 9, pp. 302-308. [20] JACOBS P. A. (1980). "Heavy traffic results for single- server queues with dependent (EARMA) service and arrival times. Adv. Appl. Prob. , Vol. 12, pp. 517-529. [21] JACOBS P. A. (1978) . "A cyclic queueing network with dependent exponential service times". J. of Appl. Prob. , Vol. 15, pp. 573-589. [22] KARLIN S., and TAYLOR H.M. (.1975). A First Course in Stochastic Processes. Academic Press, London. 79 [23] KLEINROCK L. (1976) . Queueing Systems, Vols. I, II . John Wiley and Sons, Inc., New York. [24] KLINE, M.B., and ALMOG, R. [1930]. "Suitability o^ the loanormal distribution for repair times. Second Ins t. Conf. on Reliability and Maintainability Perros-Guirec- Tregastel, France, Sept. 1980. [25] KOBAYASHI H. (1981). This volume. [26] LEVY D. , and KAHN E. (1981). "Accuracy of the Edgeworth expansion of Loss-of-Load Probability calculations in small power systems". Summer Mtg., IEEE Power Eng. Soc. (IEEE Trans, on Power Apparatus and Systems, to appear) . [27] LEWIS P.A.W., and SHEDLER G.S. (1979). "Simulation of non-homogeneous Poisson processes by thinning". Naval Res. Log. Quart. , Vol. 26, No. 3, pp. 403-413. [28] LUGANNANI R. , and RICE S. (.1980). "Saddle point approxi- mation for the distribution of the sum of independent random variables". Adv. in Applied Prob. , Vol. 12, No. 2, pp. 475-490. [29] McNEIL D.R., and SCHACH S. (1973). "Central limit analogues for Markov population processes" (with discussion) . J. Roy. Stat. Soc. , Vol. 35, pp. 1-23. [30] McNEIL D.R. (1970). "Integral functionals of birth and death processes and related limiting distributions". Annals, of Statistics , Vo. 41, No. 2, pp. 480-485. [31] MITRA D. (1981). "Waiting time distributions from closed queueing network models of shared-processor systems". Proc. Eighth Internat. Symp. Comp. Performance Modelling Measurem. and Eval. Amsterdam, The Netherlands. [32] NEUTS M.F. (1981). Matrix-Geometric Solutions in Stochastic M odels - An Algorithmic Approach . The Johns Hopkins University Press, Baltimore Maryland. [33] NEUTS M.F., and MEIER K.S. (1981). On the use of phase type distributions in reliability modelling of systems with a small number of components. OR Spektrum, 2 , 227-234. 80 [34] NEWELL G.F. (1979). Approximate Behavior of Tandens Queues Springer-Verlag. Berlin, Heidelber, New York. [35] NEWELL G.F. (1971). Applications of Queueing Theory . Chapman and Hall. London, England. [36] PREGIBON D. (1981). "Logistic regression diagnostics". Ann, of Statistics , Vol. 9, pp. 705-724. [37] REIMAN M.I. (1982). Open queueing networks in heavy traffic" . To appear in Math, of Opns. Res. [38] TAKACS L. (1962). Introduction to the Theory of Queues Oxford University Press, New York, NY. DISTRIBUTION LIST NO. OF COPIES Library, Code 0142 4 Naval Postgraduate School Monterey, CA 9 3940 Dean of Research 1 Code 012A Naval Postgraduate School Monterey, CA 9 39 40 Library, Code 55 2 Naval Postgraduate School Monterey, CA 93940 Professor D. P. Gaver 14 8 Code 55Gv Naval Postgraduate School Monterey, CA 93940 Chief of NavalResearch 2 Arlington, Virginia 22217 Defense Technical Information Center 2 ATTN: DTIC-DDR Cameron Station Alexandria, Virginia 22314 82 DUDLEY KNOX LIBRARY - RESEARCH REPORTS 5 6853 01067370 1)2022;