
Digitized by the Internet Archive in 2011 with funding from University of Illinois Urbana-Champaign. http://www.archive.org/details/ongenericnonconv1528feld

BEBR

FACULTY WORKING PAPER NO. 89-1528

On the Generic Nonconvergence of Bayesian Actions and Beliefs

Mark Feldman, Assistant Professor
Department of Economics

College of Commerce and Business Administration
Bureau of Economic and Business Research
University of Illinois at Urbana-Champaign

January 1989

An earlier version of this paper was presented at a conference on "Endogenous Learning in Economic Environments," Cornell University, June 1988. I am grateful to Roger Koenker, Andrew McLennan and the participants of the conference for helpful discussions and comments.

ABSTRACT

Many authors have recently modeled learning by Bayesian decision-makers and, by appeal to the Martingale Convergence Theorem, proven that posterior beliefs converge a.s. from the point of view of the decision-maker. This paper emphasizes that this is not equivalent to a.s. convergence from the point of view of an observer who knows the true parameter value. For a variant of the two-armed bandit problem it is demonstrated that, for a residual family of parameter values and priors, with objective probability one the posterior beliefs of the Bayesian decision-maker never converge. JEL Classification Numbers: 022, 026, 211.


I. INTRODUCTION

Recently, there has been a revival of interest in studying the asymptotic dynamics of Bayesian learning and control in economic environments. One set of authors, including Easley and Kiefer [9, 10], Kiefer and Nyarko [16], McLennan [18], and Nyarko [19], have analyzed single agent decision problems for which there is a tradeoff between current period reward and the value of the information generated by the current period action. In another strand of the literature, Blume and Easley [5], Bray and Kreps [6], and Feldman [11, 12] attempt to characterize the tail of the sequence of beliefs and outcomes for economies with many passively learning agents. Specifically, these latter papers focus on whether Bayesian learning by agents with a correct specification of the underlying structure but uncertainty regarding the parameter values is a sufficient condition to assure convergence to a stationary rational expectations equilibrium.

These articles, as well as important earlier contributions of Cyert and DeGroot [6], Rothschild [23], and Townsend [26], have the following common framework. From the vantage point of the economic actors, the set of possible complete descriptions of the relevant time-invariant economic data can be represented as a separable metric space Θ with Borel σ-field B. The actors in the model, uncertain as to the "true" θ ∈ Θ, have prior beliefs μ(0) on (Θ, B).¹ A result common to this literature is a "theorem" that the sequence {μ(t)} of posterior beliefs almost surely converges weakly to μ(∞), the posterior belief conditioned on the limit sub-σ-field.

In all of the more recently authored papers the Martingale Convergence Theorem is invoked to establish a.s. convergence of the sequence of Bayesian beliefs, where the a.s. is with respect to the prior probability measures of the Bayesian agents. An essential point of this paper is that this is not necessarily equivalent to stating that a passive observer who is informed of the "true" parameter would attach probability one to the event that the sequence of beliefs {μ(t)} of the Bayesian agents converges to a limit belief μ(∞).

To elaborate on this distinction, suppose that in periods t = 0, 1, ..., agents observe outcomes in a separable metric space Y. Given the behavioral rules of the agents, each parameter value θ ∈ Θ induces a probability measure Q_θ on the product space Y^∞. The prior μ(0) induces a measure P_μ(0) on Y^∞ defined by

  P_μ(0)(A) = ∫_Θ Q_θ(A) μ(0)(dθ).

The application of the Martingale Convergence Theorem yields a.s. convergence of {μ(t)}, where the a.s. statement is with respect to the probability measure P_μ(0). But this does not imply that for any particular θ, μ(t) ⇒ μ(∞) with Q_θ probability one, even if θ is in the support of μ(0).
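The distinction can be made concrete with a deliberately simple finite sketch (not the paper's construction: the parameter grid, names, and numbers below are illustrative). Take a coin whose heads-probability is believed to lie in {0.3, 0.7}. Under the prior mixture the Martingale Convergence Theorem gives a.s. convergence of the posterior; but if the true heads-probability is 0.5, a value outside the support and likelihood-equidistant from both candidates, the posterior log odds follow a mean-zero random walk under Q_θ, so the posterior never settles:

```python
import math

def posterior_on_a(outcomes, theta_a=0.3, theta_b=0.7, prior_a=0.5):
    """Posterior probability of theta_a along a heads(1)/tails(0) sequence,
    for a two-point prior on a coin's heads-probability."""
    log_odds = math.log(prior_a / (1 - prior_a))
    path = []
    for y in outcomes:
        like_a = theta_a if y else 1 - theta_a
        like_b = theta_b if y else 1 - theta_b
        log_odds += math.log(like_a / like_b)   # Bayes rule in log-odds form
        path.append(1 / (1 + math.exp(-log_odds)))
    return path

# A fair coin yields heads and tails in equal proportion; already on the
# deterministic alternating sequence the posterior oscillates forever
# between 0.3 and 0.5 instead of converging.
path = posterior_on_a([1, 0] * 3)
```

Under an i.i.d. fair coin the log odds take steps of ±log(7/3) with equal probability, a recurrent random walk, so the posterior crosses 1/2 infinitely often with objective probability one even though the decision-maker's own prior assigns probability one to eventual convergence.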

One might hope to establish a result that for a "large" class of priors, posteriors converge for a "large" class of parameter values. When Θ can be embedded in finite-dimensional Euclidean space, one has recourse to Lebesgue measure m (restricted to Θ) as a natural notion of size. Then if m ≪ μ(0), the exceptional θ set, i.e. the set of θ such that Q_θ({μ(t) ⇒ μ(∞)}) < 1, has m measure zero. But in many naturally occurring

¹Some authors allow for heterogeneous beliefs across agents.


settings Θ is not finite dimensional, and since there is no infinite-dimensional analogue of Lebesgue measure, a measure-theoretic criterion is unavailable.

In lieu of a reference measure to evaluate size, the customary procedure is to resort to the topological notion of category. Residual subsets are deemed to be large, and subsets of first category (which are complements of residual subsets) are regarded as small.

The formal analysis in this paper focuses on the two-armed bandit problem with a countable number of potential outcomes associated with each arm. Building upon results of Freedman [13] we prove that there is a residual set of pairs (θ, μ) of parameter values and prior beliefs such that with Q_θ probability one posterior beliefs never converge. Furthermore, there exist reward structures with one arm more favorable than the other for an informed decision-maker; yet for a residual set of pairs (θ, μ) both arms will be played infinitely often with Q_θ probability one.

II. NOTATION AND MATHEMATICAL PRELIMINARIES

II.1. Notational Conventions

The set of real numbers is denoted by R. The set of natural numbers is denoted by N = {1, 2, ...}. A generic element of N is denoted by either i, n, or t. The set of non-negative integers is denoted by N₀ = {0, 1, 2, ...}. If Z is an arbitrary set and A ⊂ Z, the indicator function for the set A is denoted by I_A. If W is a topological space, then C(W) is the Banach space of bounded, continuous real-valued functions defined on W with the sup norm. P(W) is the space of probability measures on W endowed with the topology of weak convergence. P(P(W)) is denoted by P²(W). Unless otherwise specified all σ-fields are the Borel σ-field. If (W, W) is a measurable space and w ∈ W, the Dirac measure δ_w ∈ P(W) is defined by δ_w(A) = 1 if w ∈ A and δ_w(A) = 0 if w ∉ A, for A ∈ W.

II.2. Category Theory

I shall briefly review the concepts relating to Baire category. Standard references include Kelley [17, pp. 200-203], Oxtoby [20] and Royden [24, Section 7.8]. Let W be a metric space. A set E ⊂ W is nowhere dense if its closure has empty interior. A set E is of first category or meager if it is the union of a countable collection of nowhere dense sets. If a set is not of first category then it is of second category. The complement of a set of first category is a residual set.

According to the theorem of Baire [24, Theorem 7.27], if W is a complete metric space then the intersection of a countable family of open dense subsets of W is itself a dense subset of W.

III. THE MODEL

III.1. Preliminary Description

There are a countable number of time periods, indexed by t = 0, 1, 2, .... At the beginning of period t the Bayesian decision-maker selects an action (or bandit arm) x(t) ∈ X = {x₁, x₂}.


After choosing action x_k, an outcome y(t) ∈ Y_k = {y_k1, y_k2, ...} is observed, and a period reward R_k(y(t)) is received. The range of R_k is bounded, with c_k = inf{R_k(y_ki): y_ki ∈ Y_k}.

The action x(t) and outcome y(t) are the realizations of the random variables X(t) and Y(t). The probability law specifying the distribution of Y(t) conditional upon X(t) = x_k is θ_k ∈ P(Y_k), where Y_k is endowed with the discrete topology. The objective probabilities θ₁ and θ₂ are respectively elements of the parameter subspaces Θ₁ = P(Y₁) and Θ₂ = P(Y₂), and the vector θ = (θ₁, θ₂) is an element of the parameter space Θ = Θ₁ × Θ₂. θ is initially unknown to the decision-maker, and so is viewed from her perspective as the realization of a random element with distribution μ(0) ∈ P(Θ). (When it is clear from the context that μ represents a prior belief, we sometimes write μ instead of μ(0).)

The objective of the decision-maker is to maximize, with discount factor α ∈ (0, 1), the expectation of the present value of the reward stream. Denoting for notational convenience the indicator function I_{x_k}: X → {0, 1} by I_k, for a given sequence {x(t)} of actions and a given sequence {y(t)} of outcomes, the total reward to the decision-maker is

  Σ_{t=0}^∞ α^t { Σ_{k=1}^2 I_k(x(t)) R_k(y(t)) }.

So the optimization problem is to choose a policy {X(t)} (a precise definition of a policy is provided in Section III.3) that maximizes

  E[ Σ_{t=0}^∞ α^t { Σ_{k=1}^2 I_k(X(t)) R_k(Y(t)) } ],

where the expectation is taken with respect to the probability measure induced by her prior belief μ(0).
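For a realized path the objective is a plain discounted sum; a minimal sketch (dictionaries standing in for the reward functions R_k; all names illustrative):

```python
def discounted_total_reward(actions, outcomes, rewards, alpha):
    """Present value sum_t alpha^t * R_{x(t)}(y(t)) of a realized path;
    rewards[k][y] stands in for R_k(y), actions are arm indices."""
    return sum((alpha ** t) * rewards[k][y]
               for t, (k, y) in enumerate(zip(actions, outcomes)))
```

With rewards = [{"h": 1.0, "t": 0.0}, {"h": 2.0, "t": 0.0}], alpha = 0.5, actions [0, 1, 0] and outcomes ["h", "h", "t"], the present value is 1 + 0.5·2 + 0.25·0 = 2.0.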

The policy or "rule" that the decision-maker abides by in choosing the sequence {x(t)} depends upon α and μ(0). While α remains fixed throughout the paper, our goal is to study the sensitivity of the tail behavior to variation in the prior beliefs. To emphasize this dependence of actions upon the prior belief μ(0), we will often write X(t, μ(0)) or X(t, μ) and Y(t, μ(0)) or Y(t, μ) for X(t) and Y(t) respectively.

III.2. Specification of the Probability Spaces

Endowed with the Prohorov metric, Θ is a complete and separable metric space. The Borel subsets of Θ₁, Θ₂, and Θ are denoted respectively by B₁, B₂, and B.

The set of possible outcomes from repeated play of arm k is Y_k^∞. Endowing Y_k^∞ with the product topology, the product σ-field is denoted by Y_k^∞. To facilitate the analysis we elect, without loss of generality, to work in representation space (for further details see [8, Section 1.6]). Accordingly, we define the measurable spaces (Ω₁, F₁), (Ω₂, F₂) and (Ω, F) by (Ω_k, F_k) = (Y_k^∞, Y_k^∞) and (Ω, F) = (Ω₁ × Ω₂, F₁ × F₂). A generic element of Ω_k is denoted as ω_k = (ω_k1, ω_k2, ...). Associated with each θ = (θ₁, θ₂) ∈ Θ is an "objective" product probability measure Q_θ = Q_θ1 × Q_θ2 on (Ω, F) defined by Q_θk = θ_k × θ_k × ... . The interpretation is that: (i) ω_kn is the element y_ki ∈ Y_k that is observed on the n-th occasion that action x_k is chosen, and (ii) conditional upon parameter θ_k, the outcomes {ω_k1, ω_k2, ...} from choosing x_k are i.i.d. with distribution θ_k.²

The product outcome space is (Ω, F) = (Ω₁ × Ω₂, F₁ × F₂). For θ = (θ₁, θ₂) ∈ Θ the "objective" product probability is Q_θ = Q_θ1 × Q_θ2. The set {(Ω, F, Q_θ): θ ∈ Θ} is the family of objective probability spaces.

To avoid extraneous complications we make a restrictive independence assumption: the prior belief μ ∈ P(Θ) can be decomposed as the product probability μ₁ × μ₂ of the marginal distributions.³ The induced prior probability P_μ on (Ω × Θ, F × B) is defined for measurable rectangles A × B by P_μ(A × B) = ∫_B Q_θ(A) μ(0)(dθ) for A ∈ F and B ∈ B.⁴ By the Product Measure Theorem [1, Theorem 2.6.2] there is a unique extension of P_μ to (Ω × Θ, F × B). The family of subjective probability spaces is {(Ω × Θ, F × B, P_μ): μ ∈ P(Θ)}. Slightly abusing notation, the marginal distribution of P_μ on (Ω, F) and (Θ, B) is also denoted by P_μ. So for example for A ∈ F, P_μ(A) = P_μ(A × Θ).

III.3. Histories and Policies

A precise statement of the optimization problem is provided in this section. We start by specifying what constitutes an admissible plan. Included in the definition of an admissible plan are the count functions C_k(t): Ω → N₀, k = 1, 2. The realization C_k(t)(ω) represents the number of times arm k has been chosen through time t.

An admissible plan is defined as sequences of random variables {X(t)}, {Y(t)}, {C₁(t)}, and {C₂(t)}, and a sequence of sub-σ-fields H_t such that:

i) H₀ = {∅, Ω},

ii) X(0) is H₀ measurable,

iii) if X(0)(ω) = x_k, then Y(0)(ω) = ω_k1,

iv) for t ≥ 1, H_t = H_{t-1} ∨ σ(X(t-1), Y(t-1)),

v) X(t) is H_t measurable,

vi) C_k(0) = I_k(X(0)), and C_k(t) = C_k(t-1) + I_k(X(t)), and

vii) if X(t)(ω) = x_k and C_k(t)(ω) = n, then Y(t)(ω) = ω_kn.
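In representation space, conditions (iii), (vi), and (vii) say that arm k's n-th pull reveals the pre-drawn coordinate ω_kn. A minimal simulation sketch of such a plan (the function names are ours, not the paper's):

```python
def run_plan(policy, omega1, omega2, horizon):
    """Simulate an admissible plan: omega_k lists the outcomes arm k yields
    on its successive pulls, so Y(t) = omega_{k, C_k(t)} (condition vii)."""
    counts = [0, 0]          # count functions C_1, C_2
    history = []             # realized (action, outcome) pairs
    for t in range(horizon):
        k = policy(history)              # X(t) depends on the past only
        y = (omega1, omega2)[k][counts[k]]
        counts[k] += 1                   # C_k(t) = C_k(t-1) + I_k(X(t))
        history.append((k, y))
    return history, counts
```

For instance, the alternating policy `lambda h: len(h) % 2` on outcome arrays "abcd" and "wxyz" over four periods yields history [(0, 'a'), (1, 'w'), (0, 'b'), (1, 'x')] and counts [2, 2].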

A policy is a sequence {X(t)} for which there exists a (unique) corresponding admissible plan [{X(t)}, {Y(t)}, {C₁(t)}, {C₂(t)}, {H_t}]. An optimal policy for the prior μ maximizes

²Technically, it is the projections Proj(k, n), defined by Proj(k, n)(ω) = ω_kn, which are independent.

³Note the contrast with the assumptions of Rothschild [23]. Rothschild allows for the arms to be dependent but requires that the outcome space be a two-point set.
⁴For the integral to be well-defined it is necessary that the map θ → Q_θ(A) be measurable. This is a consequence of [21, Lemma 6.1].


  ∫_Ω Σ_{t=0}^∞ α^t { Σ_{k=1}^2 I_k(X(t)(ω)) R_k(Y(t)(ω)) } P_μ(dω)

over the set of policies.

III.4. Posterior Beliefs and The Optimal Bayesian Policy

When possible, beliefs are updated according to Bayes rule. When an event of prior probability zero occurs, the updating is arbitrary. Since none of the results of this paper depend upon the choice of version of conditional probability, without loss of generality we shall assume that the decision-maker ignores any outcome to which she assigned a prior probability of zero. More precisely, the posterior map Γ_k: P(Θ_k) × Y_k → P(Θ_k), k = 1, 2, is defined by: for all A ∈ B_k,

  Γ_k(μ_k, y_ki)(A) = ∫_A θ_k(y_ki) μ_k(dθ_k) / ∫_{Θ_k} θ_k(y_ki) μ_k(dθ_k)  whenever ∫_{Θ_k} θ_k(y_ki) μ_k(dθ_k) > 0, and

  Γ_k(μ_k, y_ki)(A) = μ_k(A)  whenever ∫_{Θ_k} θ_k(y_ki) μ_k(dθ_k) = 0.
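For a finite grid of candidate outcome distributions, the map Γ_k reduces to a discrete Bayes update, with the text's convention of leaving the belief unchanged on a prior-probability-zero outcome. A sketch with illustrative names:

```python
def bayes_update(prior, likelihood, y):
    """Discrete stand-in for Gamma_k: prior and posterior are dicts over
    candidate parameters; likelihood[theta][y] is theta's probability of y.
    An outcome with zero prior-predictive probability leaves the prior
    unchanged, as in the text."""
    weights = {theta: p * likelihood[theta].get(y, 0.0)
               for theta, p in prior.items()}
    z = sum(weights.values())            # prior-predictive probability of y
    if z == 0.0:
        return dict(prior)
    return {theta: w / z for theta, w in weights.items()}
```

With a half-half prior over a fair coin and a 0.9-heads coin, observing heads yields posterior weights 0.25/0.70 and 0.45/0.70; observing an outcome outside both supports returns the prior unchanged.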

For prior μ = μ₁ × μ₂, the sequences {ν₁(n, μ₁)}_{n=0}^∞ and {ν₂(n, μ₂)}_{n=0}^∞ of updating functions, ν_k(n, μ_k): Ω → P(Θ_k), are recursively defined by: ν_k(0, μ_k)(ω) = μ_k, and ν_k(n, μ_k)(ω) = Γ_k(ν_k(n-1, μ_k)(ω), ω_kn) for n ≥ 1. The interpretation is that ν_k(n, μ_k)(ω) is the posterior measure on (Θ_k, B_k) after arm k has been chosen n times with prior measure μ_k.

Associated with a policy {X(t)} and the prior belief μ = μ₁ × μ₂ is a sequence of posterior maps {μ(t)}_{t=0}^∞. The map μ(t): Ω → P(Θ) is defined by μ(t)(ω) = μ₁(t)(ω) × μ₂(t)(ω), where μ_k(t)(ω) = ν_k(C_k(t)(ω), μ_k)(ω).

A policy {X(t)} is stationary if there exists a measurable function π: P(Θ) → X such that X(t)(ω) = π(μ(t)(ω)) for all ω ∈ Ω. Identifying a function π: P(Θ) → X with the stationary policy it induces, no confusion should result from referring to π as a stationary policy.

Invoking now well-known results (for details see [9], [15] or [21]), the Bayesian optimal control problem can be reformulated as a dynamic programming problem with state space P(Θ) and value function V: P(Θ) → R. Furthermore, there exists an optimal stationary policy [5, Theorem 7(b)].

Drawing on results of Gittins and Jones [14], with refinements by Berry and Fristedt [2], Ross [22] and Whittle [27], there is an optimal policy with a sharp characterization. Gittins and Jones proved the existence of functions M₁: P(Θ₁) → R and M₂: P(Θ₂) → R with the property that it is optimal to choose arm 1 in period t iff M₁(μ₁(t)) ≥ M₂(μ₂(t)). The function M_k is often referred to as the Gittins index for arm k.

To define M₁ consider a one-armed bandit problem where in the initial stage the decision-maker has the option of playing arm 1 or stopping and collecting a terminal reward of m. In subsequent stages, if arm 1 has been played in all previous stages, then the options remain selecting arm 1 or stopping and receiving a final payment m. The value of the optimal policy for beliefs μ₁ ∈ P(Θ₁) is denoted as V₁(μ₁, m). The Gittins index is defined by M₁(μ₁) = inf{m: V₁(μ₁, m) = m}. M₂ is defined analogously.
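The defining identity M₁(μ₁) = inf{m: V₁(μ₁, m) = m} suggests a direct numerical approximation for a finite candidate set: truncate the retirement problem at a fixed depth and bisect over m. The sketch below rests on those truncation assumptions and is not the Gittins-Jones construction; all names are illustrative.

```python
def gittins_index(prior, likelihood, reward, alpha=0.9, depth=12, tol=1e-4):
    """Approximate M(mu) = inf{m : V(mu, m) = m} for an arm with finitely
    many candidate outcome distributions: truncate the retirement problem
    after `depth` pulls and locate the index by bisection over m."""
    outcomes = list(next(iter(likelihood.values())).keys())

    def bayes(mu, y):
        w = {t: p * likelihood[t][y] for t, p in mu.items()}
        z = sum(w.values())
        return {t: v / z for t, v in w.items()} if z > 0 else dict(mu)

    def value(mu, m, d):
        # V(mu, m): retire now for m, or pull once and continue optimally
        if d == 0:
            return m
        cont = 0.0
        for y in outcomes:
            py = sum(p * likelihood[t][y] for t, p in mu.items())
            if py > 0:
                cont += py * (reward[y] + alpha * value(bayes(mu, y), m, d - 1))
        return max(m, cont)

    lo = min(reward.values()) / (1 - alpha)   # index of the worst sure arm
    hi = max(reward.values()) / (1 - alpha)   # index of the best sure arm
    while hi - lo > tol:
        m = 0.5 * (lo + hi)
        if value(prior, m, depth) > m + 1e-12:
            lo = m   # continuing strictly beats retiring: index exceeds m
        else:
            hi = m
    return 0.5 * (lo + hi)
```

For a degenerate belief the index collapses to the known arm's value r/(1−α); for a genuinely uncertain belief it exceeds the myopic value, because continued play also buys information.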

Throughout the remainder of the paper we assume that the Bayesian controller follows an optimal stationary policy π*: P(Θ) → X defined by:

  π*(μ₁ × μ₂) = x₁ if M₁(μ₁) ≥ M₂(μ₂), and π*(μ₁ × μ₂) = x₂ if M₂(μ₂) > M₁(μ₁).

For a more detailed exposition the reader is advised to consult Ross [22], Whittle [27], and especially the monograph of Berry and Fristedt [2].

IV. GENERIC LIMIT THEOREMS

IV.1. Some Continuity Results

Preparatory to proving the genericity theorems, some preliminary technical results are developed in this subsection. The principal result is the establishment of the continuity of the functions M₁ and M₂.

The first step is to develop characterizations of V₁ and V₂ by making use of the fact that the value functions V_k are solutions to the optimality equation. The expected one-period reward for choosing action x_k with beliefs μ_k is denoted by the function U_k: P(Θ_k) → R defined by

  U_k(μ_k) = ∫_{Θ_k} [ Σ_{i=1}^∞ R_k(y_ki) θ_k(y_ki) ] μ_k(dθ_k).

Informally denoting expectation taken with respect to current beliefs μ_k by E_μk and denoting future beliefs with a prime symbol, the optimality equation can be written as:

  V_k(μ_k, m) = max {m, U_k(μ_k) + α E_μk[V_k(μ_k′, m)]}.

To make the above precise, it is necessary to provide formal definitions of E_μk and the mapping from current beliefs to probability distributions over future beliefs. So for k = 1, 2, define Ψ_k: P(Θ_k) → P²(Θ_k) by

  Ψ_k(μ_k)(A) = ∫_{Θ_k} [ Σ_{i=1}^∞ I_A(Γ_k(μ_k, y_ki)) θ_k(y_ki) ] μ_k(dθ_k).

Conditional upon period t beliefs μ_k(t) and arm k being selected, Ψ_k(μ_k(t))(A) can be interpreted as the probability, from the point of view of the decision-maker, that μ_k(t+1) ∈ A. It will also be useful to have an expression for the conditional probability of realization y_ki given that arm k is chosen with beliefs μ_k. So we define the functions λ_k: Y_k × P(Θ_k) → R, k = 1, 2, by

  λ_k(y_ki, μ_k) = ∫_{Θ_k} θ_k(y_ki) μ_k(dθ_k).

We can now prove the following results.

LEMMA 4.1. The maps μ_k → Γ_k(μ_k, y_ki), k = 1, 2, are continuous for all i ∈ N.

Proof. The proof, which follows from routine arguments, is included in the Appendix for completeness.

PROPOSITION 4.2. Ψ₁ and Ψ₂ are continuous.

Proof. The proof is in the Appendix.


The next step is to establish the continuity of V₁ and V₂. The boundedness of the functions R₁ and R₂ is necessary for this result. Berry and Fristedt [2, p. 42] provide a counterexample to continuity when the reward function is unbounded.

LEMMA 4.3. V₁ and V₂ are continuous functions.

Proof. It suffices to prove that V₁ is continuous. Let K ⊂ R be compact and define E_K = C(P(Θ₁) × K). Define T_K: E_K → E_K by

  T_K f(μ₁, m) = max {m, U₁(μ₁) + α ∫ f(μ₁′, m) Ψ₁(μ₁)(dμ₁′)}.

Note that U₁ is bounded since R₁ is bounded. Invoking standard arguments, it is easily confirmed that T_K is a contraction mapping with fixed point φ_K: P(Θ₁) × K → R. Since φ_K is the solution to the optimality equation, it is also the restriction of V₁: P(Θ₁) × R → R to P(Θ₁) × K. Since φ_K is continuous, V₁ is continuous on every compact set and by extension V₁ is continuous.

The expected loss from stopping when arm k is the only arm is given by the function L_k: P(Θ_k) × R → R defined by L_k(μ_k, m) = V_k(μ_k, m) − m. The following lemma collects some well-known results.

LEMMA 4.4. For k = 1, 2: (i) the function V_k(μ_k, ·) is increasing and convex, and (ii) the function L_k(μ_k, ·) is decreasing.

Proof. (i) See [20, Lemma VII.3.1]. (ii) Replacing sums with integrals, the proof provided in [20, Lemma VII.2.1] remains valid.

Now consider the game for which the decision-maker must initially play arm k, and thereafter chooses between continuing to select arm k or permanently retiring with payoff m. The value of pursuing the optimal strategy for this game is given by the function W_k: P(Θ_k) × R → R defined by

  W_k(μ_k, m) = U_k(μ_k) + α ∫ V_k(μ_k′, m) Ψ_k(μ_k)(dμ_k′).

With the following lemma we prove that the set of terminal rewards m for which the decision-maker is indifferent between continuing play (and possibly stopping at some future date with reward m received at that future date) and stopping immediately is a singleton.

LEMMA 4.5. (i) W₁ and W₂ are continuous. (ii) If M_k(μ_k) < m, then W_k(μ_k, m) < m.

Proof. (i) The proof follows from Proposition 4.2 and Theorem 5.5 of [3]. (ii) Suppose M_k(μ_k) = m̄ < m. Since m̄ is a terminal payoff for which the controller is indifferent between continuing to play arm k and receiving payoff m̄, W_k(μ_k, m̄) = m̄ = V_k(μ_k, m̄). According to Lemma 4.4, for all μ_k′ ∈ P(Θ_k), V_k(μ_k′, m) − V_k(μ_k′, m̄) ≤ (m − m̄), implying that ∫[V_k(μ_k′, m) − V_k(μ_k′, m̄)] Ψ_k(μ_k)(dμ_k′) ≤ (m − m̄). Finally, note that

  W_k(μ_k, m) = U_k(μ_k) + α ∫ V_k(μ_k′, m) Ψ_k(μ_k)(dμ_k′)

  = U_k(μ_k) + α { ∫ V_k(μ_k′, m̄) Ψ_k(μ_k)(dμ_k′) + [ ∫ V_k(μ_k′, m) Ψ_k(μ_k)(dμ_k′) − ∫ V_k(μ_k′, m̄) Ψ_k(μ_k)(dμ_k′) ] }

  = m̄ + α [ ∫ V_k(μ_k′, m) Ψ_k(μ_k)(dμ_k′) − ∫ V_k(μ_k′, m̄) Ψ_k(μ_k)(dμ_k′) ]

  ≤ m̄ + α(m − m̄)

  < m.

The proof of continuity of M₁ and M₂ is now completed by verifying that these functions are both upper semicontinuous (u.s.c.) and lower semicontinuous (l.s.c.).

PROPOSITION 4.6. M₁ and M₂ are continuous.

Proof. First we establish that M_k is l.s.c. by verifying that for all c ∈ R the set {μ_k ∈ P(Θ_k): M_k(μ_k) > c} is open. Suppose that M_k(μ̄_k) > c. By the definition of M_k, V_k(μ̄_k, c) − c = L_k(μ̄_k, c) > 0. From the continuity of L_k (Lemma 4.3), there exists an open neighborhood N_k of μ̄_k such that for all μ_k ∈ N_k, L_k(μ_k, c) > 0. Therefore, for all μ_k ∈ N_k, V_k(μ_k, c) − c > 0 and M_k(μ_k) > c.

Upper semicontinuity is confirmed by demonstrating that the set {μ_k ∈ P(Θ_k): M_k(μ_k) < c} is open for all c ∈ R. Suppose that M_k(μ̄_k) = m < c. By Lemma 4.5, W_k(μ̄_k, c) < c, and there exists an open neighborhood J_k of μ̄_k such that for μ_k ∈ J_k, W_k(μ_k, c) < c. But this implies that M_k(μ_k) < c.

IV.2. Generic Outcomes when c₁ ≠ c₂

In this subsection we analyze the limit outcomes in the case when c₁ ≠ c₂ (recall that c_k = inf{R_k(y_ki): i ∈ N}). Without loss of generality we assume that c₁ < c₂. To understand the next result, select i* ∈ N such that R₁(y₁i*) < c₂. So if the decision-maker attaches sufficiently high probability to y₁i* occurring if arm 1 is selected, arm 2 will be selected regardless of μ₂(t). Furthermore, the proposition of Freedman [13] stated below implies that for a residual set of (θ₁, μ₁(0)) pairs, the probability of playing arm 1 infinitely often with ∫_{Θ₁} θ₁(y₁i*) μ₁(t)(dθ₁) bounded away from one is zero.

PROPOSITION 4.7 [Freedman]. For k = 1, 2, the set L_k of pairs (θ_k, μ_k(0)) ⊂ Θ_k × P(Θ_k) with lim sup_{t→∞} ∫_{Ω_k} μ_k(t)(ω_k)(U) Q_θk(dω_k) = 1 simultaneously for all nonempty open subsets U of Θ_k is residual.

So for a residual set Σ₁ ⊂ Θ₁ × P(Θ₁), if arm 1 is played sufficiently often the decision-maker will eventually (Q_θ a.s.) be sufficiently pessimistic regarding the realization θ₁ that arm 1 will never be tested again.


PROPOSITION 4.8. Suppose c₁ < c₂. Then there exists a residual subset Σ₁ ⊂ Θ₁ × P(Θ₁) such that for (θ₁, μ₁(0)) ∈ Σ₁ and all (θ₂, μ₂(0)) ∈ Θ₂ × P(Θ₂), Q_θ a.s. X(t, μ₁(0) × μ₂(0))(ω) = x₁ only finitely often.

Proof. For i ∈ N, define δ₁ⁱ ∈ P(Θ₁) by δ₁ⁱ(B) = 1 if δ_{y₁i} ∈ B, else δ₁ⁱ(B) = 0. (μ₁ = δ₁ⁱ corresponds to the belief that outcome i occurs with probability one whenever action x₁ is chosen.) Choose i* ∈ N s.t. R₁(y₁i*) < c₂. Observing that M₁(δ₁ⁱ*) = R₁(y₁i*)/(1−α) and recalling that M₁ is continuous (Proposition 4.6), there exists a neighborhood U of δ₁ⁱ* s.t. for μ₁ ∈ U, M₁(μ₁) < c₂/(1−α). Note that if μ₁(t)(ω) ∈ U, then for all t′ ≥ t, X(t′)(ω) = x₂. The ω set for which ν₁(n, μ₁)(ω) is never in U is denoted as D₁(μ₁) = {ω: ν₁(n, μ₁)(ω) ∉ U, n = 1, 2, ...}. Define Σ₁ = {(θ, μ₁(0)): Q_θ(D₁(μ₁)) = 0}. By Proposition 4.7, Σ₁ is residual.

For (θ, μ₁(0)) ∈ Σ₁, define n*(ω) = inf{n: ν₁(n, μ₁(0))(ω) ∈ U}. n* is Q_θ a.s. finite, and so Q_θ a.s., for all t the count function C₁(t)(ω) ≤ n*(ω) < ∞.

IV.3. Generic Outcomes when c₁ = c₂

To understand the next result suppose that c₁ = c₂ and c₁/(1−α) < M₁(μ₁(0)) < M₂(μ₂(0)). Then generically, Q_θ a.s. there will exist some time t, say τ₁, such that M₂(μ₂(τ₁)) < M₁(μ₁(0)). Then, reversing the roles of the arms, eventually at some time τ₂ > τ₁, M₁(μ₁(τ₂)) < M₂(μ₂(τ₁)), etc. And so each arm will be chosen infinitely often. Observe that there exist consistent estimators of (θ₁, θ₂), and so from the point of view of a classical statistician there will eventually be overwhelming evidence as to which arm offers superior prospects.

PROPOSITION 4.9. Suppose that c₁ = c₂ and that the range of R_k, k = 1, 2, is not a singleton. Then there is a residual subset ℜ ⊂ Θ × P(Θ) such that for (θ₁, θ₂, μ₁, μ₂) ∈ ℜ, with Q_θ probability one X(t, μ(0)) = x₁ infinitely often and X(t, μ(0)) = x₂ infinitely often.

Proof. For μ ∈ P(Θ) define the set Ξ₁(μ) = {ω: X(t, μ(0)) = x₂ finitely often}. The strategy is to first prove that there exists a residual subset ℜ₁ ⊂ Θ × P(Θ) such that for (θ₁, θ₂, μ₁, μ₂) ∈ ℜ₁, Q_θ(Ξ₁(μ)) = 0. Defining Ξ₂ analogously, by an identical argument there exists a residual subset ℜ₂ ⊂ Θ × P(Θ) such that for (θ₁, θ₂, μ₁, μ₂) ∈ ℜ₂, Q_θ(Ξ₂(μ)) = 0. Since the countable intersection of residual sets is a residual set, ℜ defined by ℜ = ℜ₁ ∩ ℜ₂ is residual.

We now verify the existence of such a set ℜ₁. For k = 1, 2, define Θ_k⁺ = {θ_k ∈ Θ_k: ∀ i ∈ N, θ_k(y_ki) > 0}. Θ_k⁺ is residual [13, Remark 1]. The sets Λ_k ⊂ P(Θ_k) defined by Λ_k = {μ_k ∈ P(Θ_k): μ_k(Θ_k⁺) > 0} are also residual [13, Remark 2], and so Λ = Λ₁ × Λ₂ is residual [20, Theorem 15.3].


For n ∈ N, there exists i(n) such that R₁(y₁i(n)) < c₁ + n⁻¹ and hence M₁(δ₁^{i(n)}) < (c₁ + n⁻¹)/(1−α). By the continuity of M₁, there exists a neighborhood N_n of δ₁^{i(n)} such that for μ₁ ∈ N_n, M₁(μ₁) < (c₁ + n⁻¹)/(1−α). Define N to be the family of such neighborhoods. So if μ₂(0) ∈ Λ₂, then for all ω ∈ Ω and every n′ ∈ N, there exists N_n ∈ N s.t.

  sup_{μ₁ ∈ N_n} M₁(μ₁) < M₂(ν₂(n′, μ₂)(ω)).

Define ϑ(μ₁) = {ω ∈ Ω: ∃ n(ω) s.t. for all n ≥ n(ω), ν₁(j, μ₁)(ω) ∈ N_n for only finitely many j}. Define ℜ₁ = {(μ, θ): μ ∈ Λ, Q_θ(Ξ₁(μ₁, μ₂)) = 0} and observe that: (i) for θ = (θ₁, θ₂), Q_θ(ϑ(μ₁)) does not depend upon θ₂, and (ii) for μ = (μ₁, μ₂) ∈ Λ, Ξ₁(μ₁, μ₂) ⊂ ϑ(μ₁). A direct consequence is that ℜ₁ ⊃ {(μ, θ): μ ∈ Λ, Q_θ(ϑ(μ₁)) = 0}.

The proof is completed by confirming that {(μ, θ): μ ∈ Λ, Q_θ(ϑ(μ₁)) = 0} is residual, and hence ℜ₁ is residual. Define K₁ = {(μ₁, θ₁) ∈ P(Θ₁) × Θ₁: Q_θ₁(ϑ(μ₁)) = 0}. According to Proposition 4.7, K₁ is residual. By application of the category analogue of Fubini's theorem [20, Theorem 15.3], Λ₁ × Θ₁ is residual, and so K₁ ∩ (Λ₁ × Θ₁) is residual. Again invoking [20, Theorem 15.3], {(μ₁, μ₂, θ₁, θ₂): (μ₁, θ₁) ∈ K₁ ∩ (Λ₁ × Θ₁), μ₂ ∈ Λ₂} is residual. And by definition, {(μ₁, μ₂, θ₁, θ₂): (μ₁, θ₁) ∈ K₁ ∩ (Λ₁ × Θ₁), μ₂ ∈ Λ₂} = {(μ, θ): μ ∈ Λ, Q_θ(ϑ(μ₁)) = 0}.

V. CONCLUDING REMARKS

It is natural to inquire as to the generality of Propositions 4.7 and 4.8. Is there some peculiarity associated with having a countable outcome space that underlies these nonintuitive propositions? In cases when the parameter space is a smooth, finite-dimensional set (as in [10], [16], and [19]) one could hope to modify the theorems of Schwartz [24] and prove a.s. convergence of beliefs for all θ ∈ Θ. (This work, though, remains to be undertaken.) But the results of Freedman [13] and of this paper can be extended to other infinite-dimensional settings. For instance, suppose for k = 1, 2 the outcome space is Y_k = [0, 1]. Then if Θ_k is the set of all density functions with the L₁[0, 1] topology, with minimal modification all proofs remain valid. Furthermore, I conjecture that the restriction that the decision-maker choose from a finite set of actions is inessential. So while a complete answer is not yet available, it appears that the only crucial assumption is that the parameter space Θ is infinite dimensional.

Assessing the significance of the results contained in this paper with regard to economic modelling with Bayesian learning requires addressing the issue of the "size" of the residual set ℜ. The customary practice is to equate residual with "large" (even though in Rⁿ there are residual sets of Lebesgue measure zero). If this equivalence is accepted and if prior beliefs are arbitrarily chosen, then one is led to conclude, as did Freedman [13], that "for essentially any pair of Bayesians, each thinks the other is crazy."

An obvious objection to this stark prediction is that there is little basis for automatically classifying residual sets as "large". On the other hand, there are no grounds for asserting that the "bad" pairs of parameters and priors are in any sense small. A more cautious interpretation, relying only on the density of ℜ and of the Dirichlet priors, would stress the sensitivity of asymptotic beliefs to the prior. In particular, it is illegitimate to restrict attention to a computationally convenient class of priors (even when dense) and presume that any resulting limit theorems extend beyond the initial family of distributions.

A second point of contention with the remark of Freedman is that it may be appropriate to impose a priori restrictions on the family of admissible prior beliefs. Freedman himself now appears to support this stance. Diaconis and Freedman [7] advocate that Bayesian statisticians adopt what they label the "what if" method. This consists of "... after specifying a prior distribution generate imaginary data sequences, compute the posterior, and consider whether the posterior would be an adequate representation of the updated prior." Adapting this recommendation to the context of economic modelling, it would be natural to require that agents have prior beliefs with full support and that the sequence of agent posterior beliefs converges almost surely with respect to the measure Q_θ for all θ ∈ Θ.
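The quoted procedure is mechanical enough to sketch for a finite candidate set (illustrative names; a toy stand-in for the Diaconis-Freedman recommendation, not their construction):

```python
import random

def what_if_check(prior, likelihood, n=50, seed=1):
    """'What if' sketch: draw a parameter from the prior, generate an
    imaginary data sequence of length n, and return the resulting
    posterior for inspection as a candidate updated prior."""
    rng = random.Random(seed)
    thetas, probs = zip(*prior.items())
    theta_star = rng.choices(thetas, weights=probs)[0]   # imagined truth
    post = dict(prior)
    for _ in range(n):
        ys, ps = zip(*likelihood[theta_star].items())
        y = rng.choices(ys, weights=ps)[0]               # imaginary datum
        w = {t: p * likelihood[t][y] for t, p in post.items()}
        z = sum(w.values())
        post = {t: v / z for t, v in w.items()}
    return theta_star, post
```

The modeller then judges whether the returned posterior is an adequate representation of the updated prior; with well-separated candidates the posterior should concentrate sharply on the imagined truth after a modest sample.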

While this author is sympathetic to imposing such restrictions, these auxiliary conditions are ad hoc: axioms of Bayesian decision theory provide no foundation for such constraints on prior beliefs. Furthermore, while the Dirichlet priors constitute an example of such priors within the specific framework of this paper, the existence of such priors for more general learning models (such as in [8]) has not yet been verified. Further research on this issue is needed.

Appendix

Proof of Lemma 4.1. It suffices to prove that Γ₁(·, y₁i) is continuous. To simplify the notation for this proof, representative elements of Y₁, Θ₁, and P(Θ₁) are denoted respectively as y_i, θ, and μ. Let F ⊂ Θ₁ be a closed set and suppose μ_n ⇒ μ ∈ P(Θ₁). Applying a standard characterization of weak convergence [3, Theorem 2.1], it suffices to show that lim sup_n Γ₁(μ_n, y_i)(F) ≤ Γ₁(μ, y_i)(F), or equivalently that the map μ → Γ₁(μ, y_i)(F) is upper semicontinuous (u.s.c.).

Recall that

  Γ₁(μ, y_i)(F) = ∫_F θ(y_i) μ(dθ) / ∫_{Θ₁} θ(y_i) μ(dθ).

And note that since the map θ → θ(y_i) is bounded and continuous, the map μ → ∫_{Θ₁} θ(y_i) μ(dθ) is continuous. So the proof will be completed by confirming that the map μ → ∫_F θ(y_i) μ(dθ) is u.s.c.

The map θ → I_F(θ)·θ(y_i) is u.s.c., and by [3, Exercise 7, p. 17] the map μ → ∫_{Θ₁} I_F(θ)·θ(y_i) μ(dθ) is u.s.c. But ∫_F θ(y_i) μ(dθ) = ∫_{Θ₁} I_F(θ)·θ(y_i) μ(dθ), and so the map μ → ∫_F θ(y_i) μ(dθ) is u.s.c.

Proof of Proposition 4.2. It suffices to prove that Ψ₁ is continuous. As in the proof of Lemma 4.1, for notational simplicity representative elements of Y₁, Θ₁, and P(Θ₁) are denoted respectively as y_i, θ, and μ.


Let f: P(&\) -* R be a bounded continuous function and {\in} a sequence with |in => p.. To establish

continuity of 4*1, we must verify that /f(j_i') \ (jin)(dpO Jf(ji') vFi(iI)(dM.')> or equivalently that

P(S) P(0)

oo oo

f[^f(TiOi, yi)-e(yi))] Hn(d0) -> J[]^f(riOi. yO-eCyj))] ji(d9). ei=l ei=l

oo

Define the functions g: 0i -> R and gn: ©i -> R by g(0) = £f(T(Ji yOHCyi) and gn(0) =

i=l

oo

Xf(r(nn, yO)-Q(yi)- Since the maps y, -> f(r(ji y,)) and y, -> fCr^^ y,)) are bounded and continuous, g i=l

and gn are continuous. Furthermore, as is now demonstrated, gn(0) -* g(0), ji a.s.. Define X^ e P(Y\) by

oo

XU(A} = SlACyD'Myii H) f°r Ac Yj. The support of fi is contained in Ilc8i, defined by n = i=l

(9e 0i: 0 « Xji). For 0 e n, 0(y,) > 0 implies that T(-, yj) is continuous at jl [Lemma 4.1], and hence ffJX-, yO) is continuous at jl. Denoting the Ue norm of f by llflU, Mlflleo 0(dyi) < °°. By a standard

reult on the preservation of continuity under integration [4, Theorem 16.8], for 0 e n the map

oo

\i £f(T(n, yi))0(yi) is continuous at £ and so gn(0) -> g(0). i=l

Finally, |gn(0) Md0) -> Jg(9) p(d0) by [3, Theorem 5.5], or equivalently

e e

£f(TiOi, yi)-0(yi))] lin(d0) J[^f(TiOl, yi)-0(yi))] H(d0). i=l 0 i=l
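In the finite-outcome case, the one-step belief transition $\Psi_1$ can be checked numerically. The sketch below (the two-coin setup and the names `bayes_update` and `predictive` are illustrative assumptions, not from the paper) computes $\int f(\mu')\,\Psi_1(\mu)(d\mu')$ as the finite sum $\sum_i f(T_1(\mu, y_i))$ weighted by the predictive probability of $y_i$, and confirms the martingale property of beliefs underlying the paper's convergence discussion: the expected next-period posterior probability equals the prior probability.

```python
# Numerical sketch of the posterior operator T1 and the belief
# transition Psi_1 on a finite parameter/outcome space.
# (Hypothetical two-coin example, for illustration only.)

def bayes_update(prior, likelihoods, y):
    """T1(mu, y): posterior over theta after observing outcome y."""
    # Numerator of T1: theta(y) * mu(theta) for each theta.
    weighted = {th: likelihoods[th][y] * p for th, p in prior.items()}
    # Denominator: the predictive probability of y under mu.
    norm = sum(weighted.values())
    return {th: w / norm for th, w in weighted.items()}

def predictive(prior, likelihoods, y):
    """Predictive probability of outcome y under prior mu."""
    return sum(likelihoods[th][y] * p for th, p in prior.items())

# Two "parameters": a fair coin and a coin biased toward heads.
prior = {"fair": 0.5, "biased": 0.5}
likelihoods = {
    "fair":   {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}
outcomes = ["H", "T"]

# Psi_1(mu) puts mass predictive(y) on the posterior T1(mu, y), so
# integrating f against Psi_1(mu) is a finite sum over outcomes.
# With f = the probability assigned to "biased", the expectation of
# the next-period belief equals the current belief (martingale).
expected_next = sum(
    bayes_update(prior, likelihoods, y)["biased"]
    * predictive(prior, likelihoods, y)
    for y in outcomes
)
# expected_next equals prior["biased"] = 0.5
```

Note that while the belief process is a martingale from the decision-maker's point of view, this is exactly the sense of convergence the paper distinguishes from convergence under the objective distribution of a fixed true parameter.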


References

1. Ash, R. (1972), Real Analysis and Probability. New York: Academic Press.

2. Berry, D. A., and B. Fristedt (1985), Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall.

3. Billingsley, P. (1968), Convergence of Probability Measures. New York: Wiley.

4. Billingsley, P. (1979), Probability and Measure. New York: Wiley.

5. Blackwell, D. (1965), "Discounted Dynamic Programming," Annals of Mathematical Statistics 36, 226-235.

6. Blume, L. and D. Easley (1984), "Rational Expectations Equilibrium: An Alternative Approach," Journal of Economic Theory 34, 116-129.

7. Bray, M. and D. Kreps (1987), "Rational Learning and Rational Expectations," in Arrow and the Ascent of Modern Economic Theory, edited by George Feiwel, New York: New York University Press.

8. Cyert, R. and M. DeGroot (1974), "Rational Expectations and Bayesian Analysis," Journal of Political Economy 82, 521-536.

9. Diaconis, P. and D. Freedman (1986), "On the Consistency of Bayes Estimates," Special Invited Paper, Annals of Statistics 14, 1-26.

10. Doob, J. (1953), Stochastic Processes. New York: Wiley.

11. Easley, D. and N. Kiefer (1988), "Controlling a Stochastic Process with Unknown Parameters," Econometrica 56, 1045-1064.

12. Easley, D. and N. Kiefer (1988), "Optimal Learning with Endogenous Data," forthcoming, International Economic Review.

13. Feldman, M. (1987a), "An Example of Convergence to Rational Expectations with Heterogeneous Beliefs," International Economic Review 28, 635-650.

14. Feldman, M. (1987b), "Bayesian Learning and Convergence to Rational Expectations," Journal of Mathematical Economics 16, 297-313.

15. Freedman, D. (1965), "On the Asymptotic Behavior of Bayes Estimates in the Discrete Case II," Annals of Mathematical Statistics 36, 454-456.

16. Gittins, J. and D. Jones (1974), "A Dynamic Allocation Index for the Sequential Design of Experiments," in Progress in Statistics, edited by J. Gani, K. Sarkadi, and I. Vincze, 241-266, Amsterdam: North-Holland.

17. Hinderer, K. (1970), Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. New York: Springer-Verlag.

18. Kiefer, N. and Y. Nyarko (1988), "Control of a Linear Regression Process with Unknown Parameters," in Dynamic Econometric Modeling, edited by W. Barnett, E. Berndt, and H. White, New York: Cambridge University Press.

19. Kelley, J. (1955), General Topology. Princeton: D. Van Nostrand.

20. McLennan, A. (1987), "Incomplete Learning in a Repeated Statistical Decision Problem," working paper, University of Minnesota.

21. Nyarko, Y. (1988), "On the Convergence of Bayesian Posterior Processes in Linear Economic Models," working paper, Brown University.

22. Oxtoby, J. (1980), Measure and Category: A Survey of the Analogies between Topological and Measure Spaces, second edition. New York: Springer-Verlag.

23. Rieder, U. (1975), "Bayesian Dynamic Programming," Advances in Applied Probability 7, 330-348.

24. Ross, S. (1983), Introduction to Stochastic Dynamic Programming. New York: Academic Press.

25. Rothschild, M. (1974), "A Two-Armed Bandit Theory of Market Pricing," Journal of Economic Theory 9, 185-202.

26. Royden, H. L. (1988), Real Analysis, third edition. New York: Macmillan.

27. Schwartz, L. (1965), "On Bayes Procedures," Z. Wahr. 4, 10-26.

28. Townsend, R. (1978), "Market Anticipations, Rational Expectations and Bayesian Analysis," International Economic Review 19, 481-494.

29. Whittle, P. (1982), Optimization Over Time: Dynamic Programming and Stochastic Control, Vol. I. New York: John Wiley.
