TOKEN REINFORCEMENT, CHOICE, AND SELF-CONTROL IN PIGEONS

BY KEVIN D. JACKSON

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1993

ACKNOWLEDGEMENTS

I thank the members of my Ph.D. committee, Marc Branch, Marvin Harris, Hank Pennypacker, Donald Stehouwer, Frans van Haaren, and especially my committee chairs Timothy D. Hackenberg and E.F. Malagodi. Karen Anderson provided expert assistance with the figures. Jeff Arbuckle commented helpfully during the design of the experiment. Charlene Kruegar did most of the initial subject training and assisted with early program writing. Eric Jacobs and Cindy Pietras often served as surrogate experimenters and kept the lab running through the duration. A special thank you goes to the wonderful people of the Alachua County Association for Retarded Citizens for providing support throughout the conduct of this study and especially during the write-up. I thank my family for providing important social contingencies regarding my commitment to this project. I especially thank my wife, Linda, and my daughter, Julie, for their tolerance, patience, and love. Finally, my thanks go to Metallica and Ted Nugent for setting such high standards and for providing an auditory context in which to work.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
GENERAL INTRODUCTION
  Self-Control as Behavior
  Individual and Cultural Benefits of Self-Control
  Experimental Analyses of Self-Control
  Experiments with Pigeons
  Human Self-Control and Interspecies Differences
EXPERIMENT 1
  Method
    Subjects
    Apparatus
    Procedure
  Results
  Discussion
EXPERIMENT 2
  Method
    Subjects and Apparatus
    Procedure
  Results
  Discussion
GENERAL DISCUSSION
APPENDIX
REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

TOKEN REINFORCEMENT, CHOICE, AND SELF-CONTROL IN PIGEONS

By Kevin D. Jackson

August, 1993

Chairperson: Dr. E. F. Malagodi
Cochair: Dr. Timothy D. Hackenberg
Major Department: Psychology

In a choice between an immediate small reinforcer and a delayed large reinforcer, an organism exhibits "self-control" if it chooses the delayed reinforcer and "impulsiveness" if it chooses the immediate reinforcer. Under such procedures, humans generally exhibit self-control but pigeons usually respond impulsively. Six pigeons were exposed to self-control procedures involving illumination of light-emitting diodes (LEDs) as a form of token reinforcement. In a discrete-trials arrangement, subjects chose between 1 and 3 LEDs; each LED was exchangeable for 2-s access to food. In Experiment 1, subjects responded impulsively, consistent with predictions of the ideal matching law applied to LED reinforcement and with previous findings in pigeons. However, within-session patterns of responding were more consistent with predictions of the ideal matching law applied to food scheduling. Differences in food delays for the 2 choices, which favored the small-reinforcer choice, prevented a clear assessment of the role of LEDs in determining choice. In Experiment 2, the relative influence of LEDs and food was investigated in the same subjects, with delays to food from either choice response equal under most conditions but unequal in others.
All subjects exhibited more self-control in Experiment 2 than in Experiment 1. Four subjects preferred the delayed large reinforcer during an arrangement that closely resembled typical human procedures, suggesting that the nature of the consequences of choice responding may account for previously reported differences in the choice responding of humans and pigeons. Token-reinforcer arrangements may promote self-control in a manner similar to commitment procedures. The LEDs probably functioned as conditioned reinforcers, although their discriminative properties may be more relevant to the obtained self-control.

GENERAL INTRODUCTION

Self-Control as Behavior

We speak of self-control when, despite the presence of contingencies that increase the likelihood of one class of behavior, an individual engages in an alternative behavior that is more beneficial in the long run: for example, choosing a piece of fruit from the refrigerator, instead of one's favorite pastry, in order to improve overall health. Self-control is frequently used not only as a description of a valued form of behavior but, mistakenly, as an internalized explanation for that behavior. Unfortunately, this practice does little to promote an understanding of the origins and mechanisms of self-control, and it perpetuates the myth that self-control and other behavioral patterns are the result of inexorably mysterious processes.

Behaviorists also recognize the importance of self-control, not as an internalized trait, but as behavior to be explained. Radical behaviorists in the tradition of B.F. Skinner focus on relations between historical and current contextual factors and the occurrence of self-control, as well as on technologies for enabling humans to acquire and benefit from repertoires that are sensitive to long-term consequences. In the seminal textbook chapter on this topic (Skinner, 1953, chap. 15), Skinner defined self-control as engaging in one behavior (controlling response) that alters the occurrence of another behavior (controlled response), thereby producing a more valuable outcome. Thus, the controlling response of counting to ten when angry may decrease the probability of hitting someone (controlled response), thereby avoiding the potentially aversive consequences of fighting. Skinner also discussed varied situations in which individuals produce or remove a controlling stimulus of some response, change the relationship between behavior and its consequences, arrange for deprivation, or manipulate an emotional variable. Often, self-control involves the manipulation of verbal stimuli: for example, making and then following a list of tasks to be completed. Stating to oneself the beneficial outcome(s) of some behavior (a rule about the behavior and its consequences) may also exemplify self-control.

Recognizing self-control as behavior may help reveal the variables of which self-control is a function. It may also yield important practical benefits, such as new self-control techniques and technologies for teaching self-control. Skinner attributed much of his own success to the use of behaviorally based strategies of self-control (Skinner, 1979) and even co-authored a book containing self-control techniques relevant to behavioral changes accompanying old age (Skinner & Vaughan, 1983). Others have adopted Skinner's strategy, endorsing the application of behavioral principles toward teaching self-control (e.g., Mahoney & Thoresen, 1974; Runck, 1982; Stuart, 1977).
Thus, self-control may proliferate through exposure to scientifically based rules about behavior and through the application of scientifically based technologies.

Individual and Cultural Benefits of Self-Control

Much important human behavior can be viewed in terms consistent with self-control, that is, operant behavior functionally related to temporally remote consequences. For example, consider a person who encounters a valued item while shopping, perhaps a stereo system, but lacks the cash to purchase it. The person may use a credit card, gaining immediate possession of the stereo, but with the unfavorable remote consequence of less money due to interest payments on the credit card. Self-control is said to occur when, instead of purchasing on credit, the person saves enough cash to buy the item directly at some future time, thereby avoiding the added cost of interest on money borrowed. Techniques for achieving this type of self-control may include cutting up all one's credit cards or only buying items on a premade shopping list.

At the level of individual behavior, self-control often determines success in life. An individual who saves money for greater long-term gains or studies now because of job opportunities later is likely to benefit in the long run and be more successful over the course of a lifetime. Many stories of individual human success and greatness involve forgoing immediate gains and behaving instead toward some long-term objective, such as solving an important problem or completing an extensive project.

Human cultural patterns may also be viewed in terms of self-control. No single individual can build a highway, operate a manufacturing plant, or cultivate the crops responsible for feeding a nation. Instead, such tasks require the collective behavior of many individuals, behavior that occurs because of its relationship to important deferred outcomes. Culture may thus be viewed as a system by which human behavior (cultural practices) is collectively brought under control of valuable deferred outcomes. Cultural evolution can be explained in terms of the relationship between cultural practices and important outcomes, particularly outcomes involving increased energy flow, decreased reproductive pressure, and, in hierarchically stratified societies, differential advantages for members of the upper strata (Harris, 1974, 1977, 1980, 1981, 1989). In the case of culture, the behavior of many individuals is brought under the control of remote consequences through the arrangement of more immediate socially administered reinforcement and punishment and through verbal practices that include rules relating behavior to arbitrary, nonarbitrary, and sometimes supernatural consequences (Glenn, 1985, 1988; Malott, 1988; Skinner, 1953, 1974).

Just as self-control is central to human success, so is the failure to respond to deferred consequences at the root of many problems facing both individuals and the cultures of which they are members. Many stories of individual human failure involve "impulsive" responding, or behavior controlled by relatively immediate consequences. An individual behaving under control of short-term outcomes, for example, by spending hours each day watching television instead of learning new job skills, by consuming goods and services at a rate in excess of income, or by the daily self-administration of drugs, will not fare well in the long run.
A frightening implication of this account of self-control is that as the marketplace is increasingly flooded with electronic entertainment devices, video games, video tapes, advanced audio components, and other computerized toys capable of providing hours of seemingly endless varieties of relatively immediate reinforcing outcomes, individuals may be increasingly less likely to engage in behaviors related to long-term, individually beneficial consequences and, hence, less likely to succeed at life (Skinner, 1986). Social problems ranging from the AIDS epidemic, in which the more immediate reinforcement of unprotected sex overrides the potentially lethal outcome, to pollution and the destruction of the earth's ozone layer, in which more immediate financial gains outweigh tremendous environmental costs, can be viewed as failures to respond to important deferred outcomes. Similarly, the growing national debt, substandard housing construction in hurricane-prone areas, and the needless depletion of natural resources all involve failures of deferred consequences to exert control over current behavior.

Although cultural evolution involves selection by deferred outcomes, a culture may also fail by not responding to even more remote consequences of some of its practices (Glenn, 1988). Indeed, the history of human cultural evolution reveals repeated cycles of adopting new modes of production, momentarily improving living standards, and intensifying production until ecological limitations are met, producing catastrophic consequences for participants in the culture (Harris, 1977, 1980). In response to such catastrophes, a process of radical transformation begins, new cultural practices are selected, and the pre-existing culture no longer survives. These catastrophes are avoidable by increasing investment in the development and adoption of more efficient technologies, adjusting the rate of production intensification, and tolerating sustained, but less severe, reductions in living standards. As Skinner put it, "The evolution of culture is a gigantic exercise in self-control" (Skinner, 1971, p. 205). In other words, a culture survives when it is responsive to the remote consequences (reinforcing and aversive) of its practices (Skinner, 1971, 1981). Responding to deferred outcomes is thus at the heart of behavioral ethics and the high value placed on cultural survival. For all of these reasons, self-control may be the most important problem faced by the behavioral and social sciences.

Experimental Analyses of Self-Control

Experimental analyses of self-control focus primarily on the role of procedural and historical factors in the choices of individual subjects. Typically, concurrent schedules with two response options are used, and each option (choice) is associated with its own reinforcement schedule (Herrnstein, 1961). The experimental arrangement for studying self-control typically involves a choice between a larger, delayed reinforcer and a smaller, more immediate reinforcer. Under these conditions, choice of the delayed reinforcer is defined as "self-control," whereas choice of the immediate reinforcer is defined as "impulsiveness." Investigations of self-control have focused on reinforcement schedule parameters, type of reinforcement, degree of deprivation, experimental history, the availability of different responses and stimuli during experimental sessions, and other characteristics of experimental subjects.
Experiments with Pigeons

Pigeons have served as subjects in most nonhuman studies of self-control, with access to food (grain) as the reinforcer and key pecking as the choice response. When faced with a choice between an immediate small reinforcer and a delayed larger reinforcer, pigeons almost invariably prefer the smaller, more immediate reinforcer (Ainslie, 1974; Logue & Pena-Correal, 1984; Logue, Rodriguez, Pena-Correal, & Mauro, 1984; Mazur & Logue, 1978; Rachlin & Green, 1972; see review by Logue, 1988). For example, Mazur and Logue (1978) exposed 4 pigeons to a choice procedure with 31 discrete choice trials per session. Reinforcement rate was held constant by starting each trial 1 min from the onset of the preceding trial. Trials began with the illumination of the left and right keys, green and red respectively. A single peck on the right key, fixed-ratio 1 (FR1), resulted in 2-s access to grain. Each left keypeck produced a 6-s delay period, followed by 6-s access to grain. All subjects preferred the immediate reinforcer, pecking the right key on nearly every trial.

Lea (1979) demonstrated that pigeons prefer a more immediate reinforcer over an equivalent delayed reinforcer, even when rate of reinforcer access is greater when the delayed reinforcer is chosen. This demonstrates the potent effects of reinforcement immediacy, for pigeons' choice responding is extremely sensitive to rate of reinforcer access when there is no prereinforcer delay across alternatives (de Villiers, 1977). In a related study, Logue, Smith, and Rachlin (1985) demonstrated that pigeons' choices in a self-control paradigm were insensitive to postreinforcer delay, except when prereinforcer delays were equal and postreinforcer delays affected the rate of reinforcer access.

A notable exception to the usual finding of impulsiveness in pigeons occurs if subjects are given an opportunity to commit in advance to receiving the larger delayed reinforcer (Rachlin & Green, 1972). In Rachlin and Green's experiment, five pigeons were first exposed to a standard self-control arrangement. Using a discrete-trials procedure, a single peck on a red choice key produced immediate 2-s access to food, whereas a single peck on the green key produced 4-s access to food after a 4-s delay. Within one session, all subjects showed exclusive preference for the red key (immediate reinforcer) that was maintained throughout subsequent exposures to this choice arrangement.

Next, subjects were presented with a concurrent-chains schedule. At the start of each choice trial both response keys were illuminated white (initial link), and a fixed ratio (FR) of 25 keypecks, distributed in any way between the two keys, produced a blackout of T seconds. The terminal link followed the blackout and depended on the location of the 25th keypeck. If the 25th keypeck was on the right key, the terminal link consisted of the original choice situation described above. If the 25th keypeck was on the left key, only the green key was illuminated in the terminal link, and only the larger delayed reinforcer was available. The value of T was manipulated across experimental phases. For all subjects, the number of large-reinforcer choices (left keypecks during the initial link) and entries into the terminal link associated with only the large reinforcer increased as the value of T was increased from 0.5 to 16 s.
Preference reversals occurred in 3 subjects; that is, pigeons that primarily pecked the right key at shorter values of T switched over to the left key as the value of T was increased. These subjects preferred the delayed, larger reinforcer, and thus exhibited self-control, when given an opportunity to commit to that option far enough in advance of the availability of the smaller, more immediate reinforcer.

Impulsive responding under the standard self-control arrangement and the preference shifts observed in the Rachlin and Green study are consistent with the ideal matching law, an equation that is useful for describing and predicting pigeon performance under two-component concurrent schedule arrangements (Baum & Rachlin, 1969; Herrnstein, 1970):

\[ \frac{B_1}{B_2} = \frac{A_1 D_2}{A_2 D_1} \]

In this equation B1 and B2 represent the number of responses on alternatives 1 and 2, respectively, and A1, A2, D1, and D2 represent the reinforcer amounts (A) and prereinforcer delays (D) associated with the two options. According to this equation, the proportion of responses allocated to an option is equal to the relative reinforcer value of that option, where reinforcer value is defined as the product of magnitude and immediacy (1/delay) of reinforcement. With concurrent FR1 schedules, subjects tend to choose the preferred option exclusively (e.g., Herrnstein, 1958; Logue & Pena-Correal, 1984); under such arrangements the matching law is useful primarily as a predictor of the direction of preference. If the ratio B1/B2 is greater than 1, preference for option 1 is predicted, and if less than 1, preference for option 2 is predicted. In Rachlin and Green's (1972) initial procedure, treating the large-reinforcer choice as option 1, the ratio B1/B2 would be less than 1 (substituting a small nonzero delay value for the small reinforcer), which is consistent with the obtained preference for the smaller immediate reinforcer. In later conditions the value of T is added to the delay value of each option; thus, as the value of T increases, so does the ratio B1/B2. The increasing number of large-reinforcer choices observed as T increased in the Rachlin and Green experiment was therefore in qualitative agreement with predictions of the ideal matching law. The matching equation also predicts a preference reversal, from the small reinforcer to the large reinforcer, as the value of T increases. This occurred in 3 of 5 subjects of the Rachlin and Green study and has since been replicated in many other studies with pigeons as subjects (e.g., Ainslie, 1974; Green, Fisher, Perlow, & Sherman, 1981; Navarick & Fantino, 1976). These predictions are worked through numerically in the sketch below.
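To make the direction of these predictions concrete, the following sketch applies the matching equation to Rachlin and Green's (1972) parameters: 2-s food available immediately versus 4-s food after a 4-s delay, with the commitment delay T added to both options. The 0.01-s delay standing in for the "immediate" option is an assumption made only so that immediacy (1/delay) is defined; the matching law itself does not specify this value.

```python
# Matching-law predictions for Rachlin and Green's (1972) parameters.
# Option 1 = large reinforcer (4-s food after a 4-s delay);
# option 2 = small reinforcer (2-s food, nominally immediate).

A1, A2 = 4.0, 2.0       # reinforcer amounts (s of food access)
D1, D2 = 4.0, 0.01      # prereinforcer delays (s); 0.01 s is an assumed
                        # stand-in for the "immediate" option

def matching_ratio(T):
    """Return B1/B2 with the commitment delay T added to both options.
    Ratios > 1 predict preference for the large reinforcer."""
    return (A1 * (D2 + T)) / (A2 * (D1 + T))

for T in [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]:
    ratio = matching_ratio(T)
    predicted = "large (self-control)" if ratio > 1 else "small (impulsive)"
    print(f"T = {T:>4} s  B1/B2 = {ratio:4.2f}  predicted choice: {predicted}")
```

Under these assumptions the ratio passes through 1 near T = 4 s, reproducing the qualitative pattern Rachlin and Green obtained: impulsiveness at short values of T and a reversal toward the large reinforcer as T is lengthened.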
Interestingly, Logue and Pena-Correal (1985) found that pigeons' choices in a self-control procedure were not affected by changes in deprivation. Four pigeons were each deprived to 65%, 80%, and 90% of their free-feeding weights and were exposed to 5 different choice arrangements under each deprivation level. As predicted by the matching law, large-reinforcer choices increased as delays to the small reinforcer approached the value of the large-reinforcer delay. The failure of deprivation to alter choice responding suggests that deprivation produces the same percentage change in the value of each reinforcer (Logue, 1988).

An important exception to the finding of impulsiveness in pigeons and to predictions of the ideal matching law involves a fading procedure developed by Mazur and Logue (1978). Two groups of 4 pigeons each were studied. Subjects in the experimental group were first exposed to a discrete-trials choice between 2- or 6-s access to grain, each delayed 6 s from a choice. All subjects preferred the large reinforcer. Over the next 11,000 trials, the delay to the small reinforcer was gradually reduced toward 0 s (fading). Subjects nearly always chose the large reinforcer across conditions in which the delay to the small reinforcer was greater than 3 s, a finding consistent with the matching law. When the delay to the small reinforcer was 2 s or less, a value at which the matching law predicts exclusive preference for the small reinforcer, 2 subjects continued to prefer the large reinforcer and all subjects continued to make large-reinforcer choices at least some of the time. Subjects in the control group were only exposed to the terminal condition of the experimental group and then to a condition in which the small reinforcer was delayed 5.5 s. Unlike subjects in the experimental group, these subjects showed nearly exclusive preference for the small reinforcer when it was delivered immediately, a finding consistent with predictions of the ideal matching law. Logue and Mazur (1981) showed that the self-control observed in the fading subjects partly depended on the presence of stimuli (overhead lights) during the delay that were differentially associated with the two choices. These stimuli apparently enhanced the value of the delayed larger reinforcer. A later study confirmed the effects of this fading procedure on self-control in pigeons. Using an equation that includes parametric estimations of sensitivity to delays and amounts of reinforcement, it was shown that the choices of pigeons exposed to the fading procedure were more sensitive to variations in reinforcer amount than to reinforcer delay (Logue et al., 1984).

Other exceptions to the adequacy of the ideal matching law for predicting preference in self-control arrangements with pigeons include some concurrent-chains schedule situations with equivalent variable-interval (VI) schedules in the initial links and fixed-interval (FI) schedules in the terminal links. With equivalent VI schedules in the initial link, responses are distributed across both options and terminal links are entered equally often from either option (Fantino, 1977). Relative response rate serves as the measure of preference under such schedules. Green and Snyderman (1980) manipulated reinforcer delay by altering the length of terminal-link FI components. Pigeons were exposed to a choice between 6-s access to grain after a long delay and 2-s access to grain after a shorter delay. When the ratio of delays was 6:1 and 3:1, preference for the large reinforcer decreased with increases in the absolute value of the delays. With a delay ratio of 3:2, the relative rate of large-reinforcer responses increased with increases in delay values. Both of these findings are inconsistent with matching-law predictions of no change in preference when delay ratios are constant. Green and Snyderman also examined predictions of the delay-reduction hypothesis (Fantino, 1969, 1977), a model that bases reinforcer value on the reduction in delay to food associated with the onset of terminal components. This model is consistent with the changes observed under delay ratios of 6:1 and 3:2 but, like the matching law, predicts no change when the delay ratios are 3:1. (The standard form of the delay-reduction equation is given below.)
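For concurrent-chains schedules with equal VI initial links, the delay-reduction hypothesis is usually written as follows (Fantino, 1969). Note that T in this equation denotes the average overall time to primary reinforcement measured from the onset of the initial links, not the commitment delay T of the Rachlin and Green procedure, and the equation applies when both numerator terms are positive:

\[ \frac{B_1}{B_1 + B_2} = \frac{T - t_1}{(T - t_1) + (T - t_2)} \]

Here B1 and B2 are the initial-link response rates and t1 and t2 are the terminal-link delays to food. Because the value of each terminal link is the reduction in expected delay it signals, lengthening both terminal links while holding their ratio constant can change predicted preference, which is how the model accommodates the 6:1 and 3:2 results described above.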
Navarick and Fantino (1976) obtained some results consistent with both the matching law and the delay-reduction model. When the value of the terminal-link FI (delay) associated with the small reinforcer was consistently 10 s shorter than that of the large, the number of large-reinforcer choices increased as the value of both terminal FIs increased. However, similar increases in large-reinforcer choices occurred when reinforcer delays (FI values) were equal, a finding consistent with delay reduction but not the matching law.

Grosch and Neuringer (1981) exposed pigeons to a series of self-control arrangements similar to those used by Mischel (1974) with human children as subjects. Trial durations alternated between 5 and 15 seconds; subjects could wait until the end of a trial and receive a preferred grain mixture or peck a key during the trial and receive an equal amount of a less preferred grain. Grain preferences were determined prior to the experiment by presenting both grains at once and observing which grain mixture was consumed first. Self-control was measured as the time subjects waited before responding. Self-control was influenced by a number of variables that, more or less, resembled those manipulated by Mischel. (Some of Mischel's research is discussed below.) Pigeons exhibited less self-control when food was visible (although the presence of food increased self-control when key pecks were required to obtain the preferred grain), when stimuli correlated with food (feeder lights) were present, or when food was delivered immediately before choice trials. Adding an alternative response manipulandum during the delay increased self-control (see Logue & Pena-Correal, 1984, for a similar finding). Prior reinforcement of waiting increased self-control, and prior punishment of waiting decreased self-control. While these findings illustrate some of the commonalities in the choice responding of humans and pigeons under self-control arrangements, substantial performance differences have also been observed.

Human Self-Control and Interspecies Differences

In contrast to pigeons, human subjects generally exhibit self-control in laboratory settings (Logue, Pena-Correal, Rodriguez, & Kabela, 1986). Logue et al. (1986) exposed adult females to choices between reinforcers of varying amounts and delays, similar to the choices given to pigeons by Logue et al. (1984). Subjects pressed a button that delivered points exchangeable for money following sessions. Access to the button was controlled by pushing a rod to the left or right (choice responses). The first experiment involved a discrete-trials self-control procedure. Unlike pigeons, humans in this study preferred the larger delayed reinforcer over the smaller more immediate reinforcer in most cases, although response bias made it difficult to interpret the choices of some subjects. During the remaining experiments, subjects were exposed to concurrent VI schedules with various arrangements of delays and magnitudes of reinforcement for the two options. When faced with a choice between a small, relatively immediate reinforcer and a larger delayed reinforcer, all subjects made a greater number of delayed-reinforcer choices than characteristically made by pigeons or predicted by the ideal matching law. In 30 of 38 cases in which the matching law predicted preference for the more immediate reinforcer, the humans preferred the delayed reinforcer. These findings are consistent with many other studies of human choice that deviate from matching-law predictions and from the usual pigeon findings.
Instead of matching, humans' choices tend toward maximizing overall obtained reinforcement and are less sensitive to the diminishing effects of delay on reinforcer value (e.g., Belke, Pierce, & Powell, 1989; Flora & Pavlik, 1992; King & Logue, 1987; Mawhinney, 1982; Millar & Navarick, 1984; Navarick, 1986). There are various possibilities for explaining the differences in the choices of humans and pigeons, some of which will be reviewed below.

Molar maximization models of choice, which assume behavior maximizes overall obtained reinforcement, are most consistent with human self-control performance (e.g., Houston & McNamara, 1985; Rachlin, Battalio, Kagel, & Green, 1981). Some studies upon which these models are based have demonstrated preference for a larger more delayed reinforcer by nonhuman subjects when such a choice maximizes energy intake and minimizes energy expenditure; procedural discrepancies, however, make it difficult to compare these findings directly with the studies reviewed here (for further discussion see Logue, 1988). From this perspective, the failure of molar maximization models to account for pigeons' performances under self-control arrangements is the result of limitations on the time frame over which costs and benefits are balanced. Such limitations could be argued for on an evolutionary basis or could be viewed as a result of historical or procedural factors. Unfortunately, it is unclear at present which of these variables is critical and even whether pigeon and human differences are best characterized in terms of maximization models of behavior.

Performance differences between humans and pigeons under self-control procedures might also result from the participation of human subjects in extensive verbal communities outside of the laboratory that, especially in capitalistic societies, are likely to support adherence to maximization strategies differentially (Mawhinney, 1982). In addition to directly reinforcing maximization, such histories likely establish repertoires of following maximization rules and stating rules to oneself about how to respond in ways that maximize reinforcement. Such an interpretation is consistent with behavioral theory (Skinner, 1974; also see Horne & Lowe, 1993, for an excellent discussion) and is supported by direct evidence that experimenter-provided rules can influence responding under experimentally arranged contingencies (Bentall & Lowe, 1987; Catania, Matthews, & Shimoff, 1982; Horne & Lowe, 1993; Solnick, Kannenberg, Eckerman, & Waller, 1980) and by inferential evidence that self-stated rules influence responding during some human experiments (Baron & Galizio, 1983; Horne & Lowe, 1993; Laties & Weiss, 1963; Lippman & Meyer, 1967; Logue et al., 1986; Lowe, Harzem, & Bagshaw, 1978; Matthews, Catania, & Shimoff, 1985; Sonuga-Barke, Lea, & Webley, 1989). Sometimes instructions explicitly encourage maximization patterns, as in the Logue et al. (1986) study, in which written instructions to the subjects included the statement, "Your task is to earn as many points as you can" (p. 161). That such rules contributed to the observed tendency to maximize is supported by post-experimental questionnaires in which subjects reported that they were attempting to maximize the total points earned and that they did this by trying to time the delays and durations characteristic of button availability.
More recently, similar correlations between human subjects' verbal reports and patterns of responding were obtained under various concurrent schedules (Horne & Lowe, 1993). The authors of this study clarified how the responding of verbal adult humans in operant experiments often involves an interaction of verbal processes with experimental contingencies.

Human verbal and social histories are also implicated in developmental studies of self-control. Sonuga-Barke et al. (1989) exposed 4-, 6-, 9-, and 12-year-old children to choices between 1 and 3 tokens exchangeable for candy or toys after the session. Preference was assessed with concurrent VI schedules of block pressing. Presses on one block produced a 10-s delay and delivery of 1 token; presses on the alternate block resulted in 3 tokens after a delay that ranged from 20 to 50 s across different conditions. With these delay values, reinforcement could be maximized by shifting preference from the large to the small reinforcer as the delay to the large reinforcer was increased. (The arithmetic behind this maximization criterion is sketched below.) Some of the 4-year-olds and all of the 12-year-olds showed this pattern. While the 12-year-olds showed dramatic preference shifts, however, the 4-year-olds' shifts were from near indifferent responding to preference for the smaller more immediate reinforcer. The 4-year-olds reported a strategy of picking the large reinforcer, although they did not do so with any consistency. The 12-year-olds gave reports that corresponded to their performance and indicated a strategy of attempting to maximize reinforcement by timing the delays and counting tokens. The 6- and 9-year-olds showed consistent preference for the larger reinforcer and, like the 12-year-olds, their individual verbal reports corresponded well with their choice responding. The results suggest a developmental sequence in which, between the ages of 4 and 6, children learn to wait for larger delayed reinforcers and, between the ages of 9 and 12, learn to wait, or not wait, for a larger reinforcer depending on overall obtained reinforcement. These changes were likely aided by accompanying changes in rule-stating and rule-following repertoires.
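A rough way to see why maximizing requires the shift is to compare tokens earned per second of delay for each option. The sketch below uses only the delay values reported for the Sonuga-Barke et al. (1989) study and ignores the concurrent-VI scheduling and session structure, so it illustrates the payoff structure rather than modeling the procedure.

```python
# Per-choice token rates for 1 token after 10 s vs. 3 tokens after a
# delay that varied from 20 to 50 s across conditions.
small_tokens, small_delay = 1, 10.0
large_tokens = 3

for large_delay in (20, 30, 40, 50):
    small_rate = small_tokens / small_delay    # tokens per s of delay
    large_rate = large_tokens / large_delay
    best = "large" if large_rate > small_rate else "small"
    print(f"large delay {large_delay:2d} s: "
          f"{small_rate:.3f} vs. {large_rate:.3f} tokens/s -> choose {best}")
```

On this simplified account the two payoffs are equal at a 30-s large-reinforcer delay (3/30 = 1/10), so across the 20- to 50-s range the maximizing pattern is exactly the shift from large to small that the oldest children showed.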
Older children are more likely to describe and engage in these strategies for improving self-control and to prefer choice situations more conducive to self-control (e.g., situations in which the edibles are out of sight) . Verbal reports may be seen to correspond with performance in these studies, in that children who report using the above strategies usually wait longer, and children who wait longer are usually better at describing these strategies for improving self-control. Waiting by children is also increased when the experimenter 23 provides instructions describing successful waiting activities. Together, these findings suggest that the type of self-verbalizations determines the length of waiting and as children grow older they become more skilled at engaging in verbal strategies during the wait. It is possible that some self-stated rules about forthcoming reinforcers serve a function analogous to the overhead lights present during the delay interval associated with the larger reinforcer in pigeon studies involving delay fading (Logue & Mazur, 1981; Logue et al., 1984; Mazur & Logue, 1978) . With both pigeons and humans, events (lights or rules) during the delay that are differentially associated with obtaining the larger reinforcer enhance self-control. These delay-fading studies might also relate to pigeon and human self-control differences, in that human adults are more likely to have had experiences analogous to the fading history of the pigeons that demonstrated more self-control. In any case, results showing that both pigeons and younger (less verbal) humans tend to respond impulsively in self-control situations, and that verbal processes play a role in human performance, strongly suggest that verbal history is an important determinant of self- control in humans. Verbal processes cannot explain all species differences in self-control, however (van Haaren, van Hest, & van De Poll, 1988) . Van Haaren et al. investigated choices of male 24 and female rats between 1 and 3 food pellets. Presses on the right lever produced the larger (3-pellet) reinforcer, whereas presses on the left lever produced the smaller (1-pellet) reinforcer. When each reinforcer was preceded by a 6-s delay, all subjects preferred the large reinforcer. When the delay associated with the small reinforcer was decreased to 0.1 s, all subjects continued to prefer the large reinforcer. When contingencies associated with the levers were reversed, most of the subjects switched levers and continued to prefer the larger, more delayed reinforcer. In a second experiment with different rats as subjects, the small reinforcer was always delivered after a 6-s delay and the large reinforcer was delayed either 9, 15, 24, or 36 s during different conditions. Most subjects consistently preferred the large reinforcer. Rats' choices in this study differed from those of pigeons under similar arrangements, more closely resembling human performance. Among the interpretations of the differences between pigeons and rats considered by van Haaren et al., was the notion that elicited key pecks might contribute to the impulsive responding typical of pigeons. It is well known that a stimulus paired with food presentation will elicit stimulus-directed pecking in pigeons (Schwartz & Gamzu, 1977) . 
Poling, Thomas, Hall-Johnson, and Picker (1985) demonstrated that a red key paired with a small reinforcer (3-s access to grain) was more often the target of elicited key pecks than a simultaneously presented blue key paired with a larger delayed reinforcer (9-s access to grain). Lopatto and Lewis (1985) investigated the role of elicited pecks in a single-key self-control arrangement, in which pecking a key during periodic 4-s presentations produced a small reinforcer (2-s access to grain), while not pecking resulted in a larger reinforcer (4-s access to grain) delivered after the key was darkened. Subjects responded impulsively, pecking the key on 95% of trials. When pecking no longer produced the small reinforcer and canceled the large reinforcer, pigeons continued to peck on 75% of key illuminations, suggesting that elicited pecks also contributed to the impulsiveness observed in the first procedure. The role of elicited key pecks in standard two-key self-control arrangements with pigeons has not been determined, although the studies cited here suggest that elicitation may add to the impulsiveness observed in some of these experiments.

Finally, procedural differences involving the nature of the consequences may contribute to the reported performance differences between humans and pigeons in studies of choice and self-control. In most human experiments, consequences consist of points (token reinforcers) that are exchangeable for money some time after the experimental session. Humans may be more likely to demonstrate self-control because there is no advantage to obtaining points quickly, since they cannot be exchanged until the session is over. Thus, the point arrangement characteristic of human studies may favor maximization over the length of the session. In pigeon studies, on the other hand, the typical consequence is food, an unconditioned reinforcer of more immediate consummatory value. This arrangement may favor impulsivity. Consistent with this interpretation are reports of impulsiveness in humans when food (Ragotzy, Blakely, & Poling, 1988) or escape from unconditioned aversive stimuli (Navarick, 1982; Solnick et al., 1980) serves as the consequence of choice responding.

Ragotzy et al. (1988) demonstrated impulsiveness in humans when food was the consequence of choice responding. Severely retarded human adolescents chose between 1 and 3 Cocoa Puffs. Choices were made by touching one of two different colored cards, each associated with one of the reinforcer options. All 3 subjects preferred the large reinforcer when both reinforcers were delivered immediately, but as the delay to the large reinforcer was increased across conditions, preference shifted strongly in favor of the small reinforcer. The human subjects in this study responded somewhat differently than pigeons, preferring the large delayed reinforcer under some parameters in which the matching law predicts strong impulsiveness. However, unlike the human subjects in prototypical choice studies, and more like pigeons, these subjects failed to maximize reinforcement, responded impulsively, and were sensitive to the diminishing effects of reinforcer delay on reinforcer value. In a second phase of the experiment, the delay to the small reinforcer was increased across conditions and preference shifted back to the large reinforcer, a result that is also consistent with previous findings in pigeons (Green et al., 1981; Rachlin & Green, 1972). While the Ragotzy et al. (1988)
study lends some support to the notion that impulsiveness is more likely with immediately consumable reinforcers, the atypical impulsive responding of their human subjects could also be related to the verbal deficiencies characteristic of the severely retarded.

Solnick et al. (1980) investigated choices of female college students who solved math problems while wearing headphones. In one condition, after 15 s of exposure to white noise (90 dBA) played through the headphones, subjects were given a choice of pressing one button that turned the noise off immediately for a short duration (90 s) or pressing an alternate button that turned the noise off for a longer duration (150 s) after a delay of 30 s. Unlike the verbal human adults in most studies, these subjects responded impulsively, strongly preferring the immediate reinforcer (noise termination). A 15-s delay was added to both options for a second group of subjects by scheduling the choice opportunity at the start of each trial. Subjects exposed to this condition showed exclusive preference for the larger, more delayed reinforcer, a finding that resembles previous reports with pigeons (e.g., Green et al., 1981).

Negative reinforcement by noise termination also produced impulsive responding in adult human college students in a study by Navarick (1982). Navarick's subjects increasingly preferred the small reinforcer as the delay to the large reinforcer was increased, preferred immediate reinforcement over an equal duration of delayed reinforcement, and preferred a large reinforcer over a small reinforcer when both were delivered immediately. Navarick and associates have also examined the effects of other reinforcers with humans. Impulsivity was demonstrated in at least some of the human subjects when either access to a video game (Millar & Navarick, 1984) or slides of entertainment and sports personalities served as choice consequences (Navarick, 1986). Another study (Navarick, 1985) examined choice when illumination of indicator lights, which the subjects were told to react to with a "pleasant feeling," served as the consequence of choice responding. In this case, subjects demonstrated preference for large over small amounts of reinforcement (duration of illumination) when no delays were scheduled for either choice but did not prefer the immediate to the delayed reinforcer when reinforcer amounts were equal. This finding raises the possibility that the instructions regarding the point reinforcers in human self-control studies may also play a role in the obtained insensitivity to large-reinforcer delays, although insensitivity of the type demonstrated by Navarick was not apparent in the Logue et al. (1986) study. Together, Navarick's work shows that adult human choices are generally more sensitive to differences in reinforcer amount than reinforcer delay and, because the magnitude and reliability of delay sensitivity varied considerably between the reinforcer types investigated, that qualitatively different reinforcers likely have different propensities for producing impulsiveness (Navarick, 1986). With regard to the present discussion, the finding of impulsiveness in many of these studies when reinforcers of more immediate value serve as consequences of choice, and the failure to show impulsiveness in human studies when points serve as reinforcers, further suggests that the characteristic consequences of choice in pigeon and human studies (food vs.
points) may contribute to the characteristic differences in choice and self-control.

In summary, the finding that pigeons respond impulsively under self-control arrangements and that adult humans typically demonstrate self-control is often explained in terms of the verbal processes characteristic of humans (e.g., Mawhinney, 1982) and the limited capacity for temporal integration in pigeons. The finding that adult humans respond impulsively when negative reinforcement or access to positive reinforcers with more immediate value serves as the choice consequence suggests that the type of reinforcement may be involved in the previously reported species differences. The present experiments investigated this possibility with pigeons as subjects, using tokens as consequences of choice responding in a self-control arrangement that more closely resembles the typical human paradigm. Figure 1 illustrates the rationale for this investigation.

                          REINFORCEMENT WITH IMMEDIATE VALUE
                  YES                                 NO
  HUMANS          IMPULSIVENESS                       SELF-CONTROL
                  unconditioned reinforcement         token reinforcement
                  (food and noise termination)        (points)
  PIGEONS         IMPULSIVENESS                       ?
                  unconditioned reinforcement         token reinforcement
                  (food)                              (LED illumination)

Figure 1: A summary of research findings in self-control experiments with pigeons and humans. The two left quadrants show that impulsiveness has usually been found with both pigeons and humans when reinforcement has immediate value. The upper right quadrant represents the usual finding of self-control in humans when token reinforcement is used. The present experiment was conducted to provide information for the lower right quadrant and assessed the responding of pigeons with token reinforcement.

EXPERIMENT 1

The points delivered as consequences in human operant studies may be viewed as token reinforcers (Gollub, 1977; Kelleher, 1958; Malagodi, 1967). Token reinforcers are usually physical objects, delivered according to some schedule of reinforcement, that can be exchanged for some other (terminal) reinforcer. Tokens, however, can be defined more generally as conditioned reinforcers "that the organism may accumulate and later exchange for other reinforcers" (Catania, 1992, p. 400). In token-reinforcer arrangements a discriminative stimulus is usually associated with exchange periods, during which a specified "exchange" response involving the token(s) is followed by presentation of the terminal reinforcer. Thus, the token-reinforcer paradigm involves a schedule of token reinforcement, a schedule of exchange periods (exchange schedule), and a schedule of reinforcement of exchange responses by the terminal reinforcer (Malagodi, Webbe, & Waddell, 1975; Waddell, Leander, Webbe, & Malagodi, 1972; Webbe & Malagodi, 1978). All three schedules of the token paradigm are also components of the point-reinforcer system used in human operant studies. The typical procedural arrangement with humans differs from the token-reinforcer paradigm, however, in the following 3 ways: (1) point delivery consists of incrementing a counter instead of delivering a physical object; (2) the exchange response involves manipulation of verbal stimuli that correspond to points instead of manipulating a token object itself; and (3) the terminal reinforcer consists of money (a generalized conditioned reinforcer) instead of an unconditioned reinforcer.
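This three-schedule decomposition can be made concrete with a small configuration sketch. The encoding below is illustrative only; the class and field names are invented for exposition, and the entries simply restate, in one place, the arrangements described in the text.

```python
from dataclasses import dataclass

@dataclass
class TokenParadigm:
    token_schedule: str     # how token deliveries are produced
    exchange_schedule: str  # when exchange periods occur
    terminal_schedule: str  # how exchange responses produce the terminal reinforcer

# The human point system, cast in these terms:
point_system = TokenParadigm(
    token_schedule="choice response increments a point counter",
    exchange_schedule="a single exchange after the session",
    terminal_schedule="points traded for money (a generalized reinforcer)",
)

# The present experiment's LED arrangement, for comparison:
led_system = TokenParadigm(
    token_schedule="choice peck illuminates 1 or 3 LEDs",
    exchange_schedule="every Nth trial, with N raised across phases",
    terminal_schedule="one exchange peck per LED, each producing 2-s access to grain",
)
```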
The token-reinforcer arrangement characteristic of human studies may produce self-control in a manner similar to the commitment response procedure described earlier (Rachlin & Green, 1972). Recall that self-control was increased in this study when pigeons were provided with an opportunity for advance commitment to the large reinforcer. Similarly, by choosing a larger number of points during the session, humans are committing to a greater amount of money after the session. In both cases, a choice at time X determines the availability of reinforcement at a later time (X + T). With humans, in-session choices determine the magnitude of post-session (post-T) monetary reinforcement. For pigeons, commitment responses within a session determine food availability after T seconds. Interestingly, if the matching law were applied to humans' choices using the delays and magnitudes of monetary reinforcement, preference for the larger amount of reinforcement would be predicted. The pervasiveness of self-control in human subjects may simply be a replication of the effects of scheduling choices far enough in advance of the availability of reinforcement (e.g., Green et al., 1981).

The points delivered in studies with humans might also contribute to the obtained self-control. The correspondence of points to the amount of monetary reinforcement resembles the correspondence of overhead lighting to the amount of food reinforcement that was shown to promote self-control in pigeons (Logue & Mazur, 1981). This interpretation de-emphasizes the importance of points and implies that they are subordinate to the scheduling of monetary reinforcement in determining humans' choices. Whether or not this is true of nonhumans' choices is not known. The token reinforcement schedule, however, is often considered to be subordinate to the exchange schedule: the token derives its reinforcing function from the terminal reinforcer that is available only during exchange periods. Also, while patterns of token-reinforced behavior usually resemble those characteristic of the token reinforcement schedule, the obtained rate of behavior, and within-session changes in patterns and rates across intertoken intervals, are determined by the exchange schedule (e.g., Malagodi et al., 1975; Waddell et al., 1972; Webbe & Malagodi, 1978). An extreme example of this is the extended pauses observed under token reinforcement schedules during times and stimulus conditions most remote from the exchange period (e.g., Kelleher, 1958; Malagodi et al., 1975; Waddell et al., 1972; Webbe & Malagodi, 1978).

The present experiment investigated pigeons' preference under a token-reinforcer arrangement similar to the typical human procedure involving point delivery. Choices (pecks on lighted side keys) during discrete trials resulted in the illumination (delivery) of either 1 or 3 LEDs (tokens). Each LED could be "exchanged" for 2-s access to grain by pecking a center key during exchange periods. Exchange periods were initially scheduled after each trial; the ratio of trials to exchange periods was then increased across phases until a single exchange period was scheduled at the end of the session. Increasing this ratio in successive phases was done to encourage the development of conditioned reinforcing properties of the LEDs by initially providing a strong correlation between LED presentation and food availability, before gradually increasing the periodicity of exchange periods.
Gradually increasing the ratio of trials to exchange periods may also minimize the response-weakening properties of increasing exchange-schedule values (Waddell et al., 1972). Also, the exposure of subjects to exchange periods with increasing numbers of LEDs across phases provided a rich history of correspondence between LEDs and the number of food deliveries available. Thus, the correspondence of LEDs to food amounts resembled the correspondence of points to money amounts in human studies. Finally, if changing the exchange schedule was analogous to the manipulation of temporal variables (T, as discussed above), then under conditions with choices between 1 immediate LED and 3 delayed LEDs, preference for the larger delayed reinforcer (3 LEDs) might be expected to increase as the ratio of trials to exchange periods increased, that is, as choice responses became increasingly remote from food availability.

Method

Subjects

Six experimentally naive male White Carneau pigeons (Columba livia) served as subjects. All subjects were individually housed with water and health grit continuously available. Subjects were maintained at 80% of their laboratory free-feeding weights.

Apparatus

A standard 3-key pigeon chamber (Lehigh Valley) with a modified stimulus panel served as the experimental space. A minimum force of 0.14 N was required to activate either side key, and a minimum force of 0.12 N activated the center key. Thirty-four red light-emitting diodes (LEDs) were recessed in the panel, forming a horizontal row 5 cm below the ceiling and 0.7 cm below the houselight fixture (see Figure 2). The LEDs were evenly spaced and centered, 1.7 cm from each end of the panel. Unless otherwise indicated, onset of LEDs always proceeded sequentially from left to right, with each onset accompanied by a brief tone. Offset of LEDs always proceeded sequentially from right to left. When operative, the left, center, and right keys were illuminated green, red, and blue, respectively. Primary reinforcement consisted of access to mixed grain through the stimulus-panel reinforcement aperture. During food delivery, all keylights and the houselight were dark and an orange light above the feeder was illuminated. White noise was present in the experimental room to mask extraneous sounds. Experimental contingencies were scheduled and recorded by an IBM 286-compatible computer with MED-PC software.

Procedure

Each subject was first exposed to a one-hour session of adaptation with the houselight and all LEDs illuminated but no other programmed contingencies in effect. During magazine training and exchange-keypeck shaping, the number of illuminated LEDs corresponded to the number of food deliveries available. Magazine-training sessions began with the simultaneous illumination of the left-most 17 LEDs, the white houselight, and the red center (exchange) key. Intermittent hopper presentations were controlled by a hand-held switch. When operated, the switch turned off 1 LED and 0.5 s later produced food. Alternate switch operations withdrew the hopper. Magazine training ended when the subject ate readily from the feeder for at least five consecutive food deliveries.

Exchange-keypeck shaping began with the same stimulus conditions as magazine training. Successive approximations to keypecks on the center (exchange) key produced offset of 1 LED, followed 0.5 s later by a 2-s food delivery.
Once a keypeck (exchange response) occurred, each remaining food delivery of the session required a single peck on the illuminated exchange key. All subjects were then exposed to two sessions of 34 LED exchanges each, with the same contingencies on the exchange key.

Choice-key training began with the illumination of the houselight and one choice key (left or right). Each subject was exposed to two sessions of 34 food deliveries each, with a different choice key available in each session. A single peck on the illuminated choice key turned off the key and turned on 1 LED, followed 0.1 s later by an exchange period, signaled by illumination of the exchange key. A single peck on the exchange key turned off the key and 1 LED, followed 0.5 s later by 2 s of food. Throughout the experiment, exchange periods remained in effect until all illuminated LEDs were exchanged. For one subject (1857), who did not peck the choice key after 180 minutes in the chamber, pecking was established by reinforcing successive approximations with the onset of an LED followed by the exchange period.

Throughout the remainder of the experiment, two sessions were scheduled daily, five days per week, with a 5-min blackout between sessions. Each session consisted of 12 discrete trials, each beginning 60 s from the onset of the preceding trial, excluding exchange periods. Failure to respond for 45 s on a given trial delayed the onset of the next trial an additional 60 s. During the intertrial interval (ITI), the houselight and all keylights were dark. The first two trials of each session were forced-exposure trials, designed to bring behavior into contact with the consequences programmed on both keys. The key available on the first trial (left or right) was determined randomly with a probability of .5; the alternate choice key was automatically illuminated during the second trial. The contingencies correlated with the illuminated key on forced-choice trials corresponded to those in effect on choice trials. Choice trials began with the illumination of the houselight and both side (choice) keys. A single peck on either side key (choice response) darkened both keys and produced the associated consequences, the illumination of either 1 or 3 LEDs. Large-reinforcer choices resulted in the illumination of 3 LEDs: 1 immediate, the other 2 spaced 0.6 s apart. Thus, it took 1.2 s to deliver 3 LEDs. Small-reinforcer choices resulted in the immediate illumination of 1 LED.

All subjects were initially exposed to a choice between 1 and 3 LEDs, scheduled "immediately," with an exchange period following each trial (designated condition 1). The large reinforcer was arbitrarily assigned to the left key for three subjects and to the right key for the other three subjects (Table 1). This assignment was constant throughout the experiment. When scheduled, exchange periods always began 0.1 s after the last LED presentation. Thus, exchange periods followed small-reinforcer (1 LED) choices by 0.1 s and large-reinforcer (3 LED) choices by 1.3 s.[1] Next, subjects were randomly divided into two groups of three pigeons each. For Group A, large-reinforcer choices produced 3 LEDs after a 6-s delay (condition 1D). The ratio of choice trials to exchange opportunities was then increased to 2:1, 5:1, and 10:1 across conditions 2D, 5D, and 10D, respectively. For Group B, the ratio of trials to exchange periods was first increased from 1:1 to 2:1 to 5:1 to 10:1, before the 6-s delay to the large reinforcer was added in the final condition (10D). (The trial and exchange logic is summarized in the sketch following this paragraph.)
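The session structure just described can be restated as a short simulation. The sketch below is an illustrative reconstruction, not the MED-PC program actually used: real time is compressed, a coin flip stands in for the pigeon's choice, and details such as the handling of exchanges on forced trials are simplified.

```python
import random

def run_session(choice_trials=10, trials_per_exchange=1, large_key="left"):
    """Walk through one session: 2 forced-exposure trials followed by
    10 choice trials, with exchange periods every Nth choice trial."""
    banked_leds = 0
    food_deliveries = 0

    def exchange():
        # Exchange period: one center-key peck per illuminated LED,
        # each followed 0.5 s later by 2-s access to grain.
        nonlocal banked_leds, food_deliveries
        food_deliveries += banked_leds
        banked_leds = 0

    # Forced-exposure trials: the first key is chosen with probability .5,
    # and the alternate key is presented on the second trial. An exchange
    # period follows the second forced trial.
    first = random.choice(["left", "right"])
    for key in (first, "right" if first == "left" else "left"):
        banked_leds += 3 if key == large_key else 1   # 3 LEDs or 1 LED
    exchange()

    for trial in range(1, choice_trials + 1):
        key = random.choice(["left", "right"])        # stand-in for behavior
        banked_leds += 3 if key == large_key else 1
        if trial % trials_per_exchange == 0:
            exchange()                                # every Nth choice trial
    return food_deliveries

# Condition 10D: a single exchange period near the end of the session.
print(run_session(trials_per_exchange=10))
```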
Figure 3 shows the sequence of events following large- and small-reinforcer choices. (The LEDs are spoken of in terms of reinforcement recognizing that strict behavior-analytic criteria for doing so have not been met. This is done on the basis of formal similarities between the scheduling of LEDs here and the scheduling of reinforcing consequences in other studies, and for convenience when discussing and evaluating the role of LEDs in the current experiment; this is consistent with discussions of analogous consequences in human operant studies.) LEDs remained illuminated during the ITI after trials with no scheduled exchange period. Whenever the ratio of choice trials to exchange periods was greater than 1:1, only the second forced trial was followed by an exchange period. Table 1 summarizes the experimental conditions, order of exposure, and number of sessions for all subjects. Experimental phases were in effect for at least 20 sessions and until the following stability criteria were met: (a) no trends evident in the number of choices allocated to either alternative over the last 10 sessions and (b) the number of choices of either option during the last 5 sessions not outside the range of values obtained during all previous sessions. Conditions were changed arbitrarily if these criteria were not met in 80 sessions.

Results

Figure 4 shows the number of large-reinforcer choices across all experimental conditions. Data from Group A are displayed in the left panel and Group B in the right. The bars are means from the last 10 sessions of each condition; vertical lines show the range of values used to determine the means. Because a session consisted of 10 choice trials (12 trials minus the 2 forced trials), a value above 5 generally indicates preference for the large reinforcer, whereas a value below 5 indicates preference for the small reinforcer. A mean value between 4 and 6, with a range that extends above and below 5, indicates indifference.

Condition 1, with no delay to small or large reinforcers, resulted in strong preference for the large reinforcer in 5 of 6 subjects; only Subject 1857 (Group A) preferred the small reinforcer. For the other 2 subjects in Group A (747 and 1383), preference reversed in favor of the small reinforcer when the large reinforcer was delayed 6 s in condition 1D. Large-reinforcer choices also decreased for Subject 1857 during this phase. All three subjects in Group A preferred the immediate reinforcer across phases 1D, 2D, 5D, and 10D. This preference was generally strong, with an average of less than 2 large-reinforcer choices per session, except during phase 2D, in which the number of large-reinforcer choices was somewhat elevated for Subjects 1857 and 1383. For subjects in Group B, scheduling the exchange period every second choice trial reduced the number of large-reinforcer choices for Subjects 1732 and 1855 but not for Subject 753. Further increases in the number of trials per exchange period during conditions 5 and 10 shifted preference in favor of the small reinforcer for Subjects 1855 and 753. The magnitude of this effect was greatest in Subject 753, who in the previous two conditions chose the large reinforcer on nearly all trials. For Subject 1732, preference for the large reinforcer was recovered during conditions 5 and 10 but reversed in favor of the small reinforcer when a delay to the large reinforcer was added in condition 10D. This added delay also resulted in fewer large-reinforcer choices for Subject 753.
In Subject 1855 the number of large-reinforcer choices increased slightly during this condition, resulting in approximate indifference.

Figure 5 shows within-session choice patterns. The relative frequency of large-reinforcer choices is plotted across trials preceding scheduled exchange periods over the final 10 sessions of each condition. Only data from conditions in which exchange periods occurred after two or more trials are shown. As before, proportions above .5 indicate preference for the large reinforcer and proportions below .5 indicate preference for the small reinforcer. For subjects in Group A (left panels), the greatest proportion of large-reinforcer choices occurred during the 1st trial of the block of trials preceding exchange periods. This was consistent across subjects and conditions, except for Subject 747 during condition 10D, in which the proportion of large-reinforcer choices varied unsystematically across the 10 trials. The most pronounced differential control of large-reinforcer choices by trial position occurred during condition 2D in Subject 1383: the proportion of large reinforcers chosen was .82 in the 1st trial but zero during the 2nd trial of the block. For all subjects in Group A, during conditions 2D and 5D the proportion of large-reinforcer choices was greatest during the 1st trial and dropped to zero or near-zero levels during the remaining trial(s) of a block. During condition 10D, except for Subject 747, the probability of a large-reinforcer choice decreased across trials, reaching a level of zero during the latter trials of the block. Similar, though less pronounced, effects occurred with subjects in Group B (right panels). The relative number of large-reinforcer choices was greatest during the initial trial of the block in 8 of 12 cases for the three subjects and decreased to lower levels across remaining trials.

Figure 6 shows average choice latencies during conditions in which exchange periods were scheduled after two or more trials, from the last 10 sessions of each condition. Latencies for subjects in Groups A and B are displayed in the left and right panels, respectively. Note that the Y axes are scaled individually to accommodate between-subject differences in latencies. Open symbols represent latencies for large-reinforcer choices and filled symbols for small-reinforcer choices. The absence of a data point for either choice denotes conditions in which choices of that type did not occur. In 38 of 40 cases across subjects, latencies were longest during the 1st trial of a block, decreasing across trials. The 1st-trial latencies also tended to be longer as the number of trials per exchange period was increased across conditions. This effect was clearest for Subject 1857, in which 1st-trial latencies were shortest during condition 2D, somewhat longer during condition 5D, and longest during condition 10D. With one exception (the 2nd trial of condition 10D for Subject 1732), the longest latency for each subject occurred on the 1st trial in conditions with exchange periods scheduled every 10th trial. Subjects 747 (condition 10D) and 1855 (condition 10) regularly had 1st-trial choice latencies longer than 45 s, which postponed the onset of the 2nd trial. No trend was evident in these latencies and they did not systematically relate to choice.

Discussion

In Experiment 1, pigeons' choices were assessed in a self-control arrangement with token-like reinforcers.
Despite the procedural similarities of this arrangement to typical human procedures, the overall results of Experiment 1 support previous findings with pigeons (Logue et al., 1984; Mazur & Logue, 1978) rather than with humans (Logue et al., 1986). That is, subjects usually responded impulsively, preferring the small immediate reinforcer over the large delayed reinforcer (Figure 4). Such impulsive responding is consistent with the matching law applied to LED reinforcement.

Figure 7 shows matching-law predictions of the number of large-reinforcer choices based on LED reinforcement. To obtain meaningful predictions, a delay of .01 s was used instead of 0 s when reinforcement delivery was immediate. Thus, both DL and DS were .01 s when neither reinforcer was delayed (no delay); when the large reinforcer was delayed, a value of 6 s was used for DL. The reinforcer amounts used to calculate predicted values were 3 (AL) and 1 (AS) for the large and small reinforcers, respectively (3 or 1 LEDs). The relative number of large-reinforcer choices predicted by the matching law was first calculated, then multiplied by 10 to obtain the predicted number of large-reinforcer choices out of 10 trials. The matching-law predictions correspond very well to obtained data from conditions in which the large reinforcer was delayed (1D, 2D, 5D, and 10D in Figure 4). Here, in 14 of 15 cases the small reinforcer was preferred; the only exception was the indifferent responding of Subject 1855 under condition 10D. The matching-law predictions correspond less well to obtained data from conditions without a reinforcer delay (1, 2, 5, and 10). Under these conditions the large reinforcer was preferred in only 8 of 15 cases. Six of the 7 exceptions were from conditions for Group B subjects in which the number of trials per exchange period exceeded one. LED reinforcement did not differ between these conditions, suggesting that other factors were responsible for the lower number of large-reinforcer choices.

LED reinforcement parameters also cannot account for the within-session patterns of choice shown in Figure 5. For example, during conditions 2 and 2D, a greater proportion of large-reinforcer choices occurred on the 1st trial of a block than the 2nd. In four subjects (1857, 1383, 1732, and 1855) the large reinforcer was preferred on the 1st trial while the small reinforcer was preferred on the 2nd trial. Also, during conditions in which exchange periods were scheduled after 5 or 10 trials, especially for subjects in Group A, the proportion of large reinforcers chosen tended to be greatest during the 1st trial, often shifting downward abruptly from the 1st to the 2nd trial. The within-session pattern of choices under conditions 2 and 2D is more consistent with the predictions of the ideal matching law applied to food parameters than to LED reinforcement. Figure 8 shows matching-law predictions (based on food reinforcement) of the relative number of large-reinforcer choices for each trial in the block preceding exchange periods under all experimental conditions. Predictions for Group A are shown in the top graph and for Group B in the bottom. The displayed values are based on food amounts and delays. The amounts used to calculate these values are based on the total amount (in seconds) of food available during the exchange period following all trials of a block.
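Written out, the LED-based prediction just described takes the standard amount-and-delay form of the matching law (the Appendix uses the algebraically equivalent cross-multiplied form). The two worked cases below are a restatement of the values given above, not additional data:

```latex
% Matching-law prediction of the relative number of large-reinforcer choices,
% with amounts A and delays D for the large (L) and small (S) reinforcers.
\[
  \frac{B_L}{B_L + B_S} \;=\; \frac{A_L / D_L}{A_L / D_L + A_S / D_S}
\]
% No-delay conditions (A_L = 3, A_S = 1, D_L = D_S = .01 s):
\[
  \frac{3/.01}{3/.01 + 1/.01} = .75
  \quad\Rightarrow\quad 7.5 \text{ of } 10 \text{ trials}
\]
% Delayed-large conditions (D_L = 6 s, D_S = .01 s):
\[
  \frac{3/6}{3/6 + 1/.01} = \frac{.5}{100.5} \approx .005
  \quad\Rightarrow\quad \text{about } 0 \text{ of } 10 \text{ trials}
\]
```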
The delay values are based on the minimum delays to the first food delivery of an exchange period, excluding choice-response and exchange-response latencies. For each experimental condition, the food-delay values on the trial immediately preceding an exchange period, when LED reinforcement is immediate for both options, are 1.8 s and 0.6 s for large- and small-reinforcer choices, respectively. When the large reinforcer is delayed, the food-delay values are 7.8 s and 0.6 s for large- and small-reinforcer choices, respectively. Because on all trials except the trial just prior to an exchange period the delays to food are equal for both large- and small-reinforcer choices, the matching-law predictions across these trials are determined solely by amount-of-food ratios. Table 2 shows the amount-of-food values used in calculating the proportions displayed in Figure 8 and the results of all calculations. (Although there is no precedent for applying the matching law to food parameters in an arrangement like the one here, the matching law should have relevance to the present data. The method of application described and presented here was selected on rational but also pragmatic grounds: it yielded results that were consistent with the obtained choice in the current experiments.)

The effect of a choice in a given trial on the relative amount of food obtained in a subsequent exchange period depends on reinforcer choices during all other trials of the block. For this reason, the predicted proportion of large-reinforcer choices for each trial number was determined by first calculating the ratio assuming exclusively small-reinforcer choices for all other trials of the block and then calculating the ratio assuming exclusively large-reinforcer choices for all other trials. These two ratios establish the range of predictions for a given trial number under a particular experimental condition. The shapes of the obtained functions in both calculations were the same and the magnitude of difference between the two values on any trial was always small. The ratios were therefore averaged to obtain the values displayed in Figure 8 (see the Appendix for complete calculation examples). Because the experimental procedure involved concurrent fixed-ratio 1 schedules, these predictions were not expected to provide precise estimates of choice-response ratios but rather to predict the direction of preference; the obtained choice ratios would thus be expected to be more extreme than illustrated. During conditions 1 and 1D an exchange period occurred after each trial, so predictions are plotted only for trial 1, represented by the symbols "1" and "1D". Figure 8 shows that the ideal matching law applied to condition 1 predicts indifference between the small and the large reinforcer. Figure 4, however, shows that 5 of 6 subjects strongly preferred the large reinforcer under this condition, whereas Subject 1857 preferred the small reinforcer. The matching law predicts strong preference for the small reinforcer under condition 1D for Group A (top graph of Figure 8), which is in accord with the obtained preferences (see Figure 4). When the exchange period is scheduled after two trials, the matching law predicts preference for the large reinforcer on the 1st trial of the block and the small reinforcer on the 2nd. The data shown in Figure 5 are in qualitative, and sometimes quantitative, agreement with these predictions.
As predicted, Subjects 1857 and 1383 (left panel), and 1732 and 1855 (right panel), preferred the large reinforcer on the 1st trial of a block and the small reinforcer on the 2nd trial of a block of trials preceding an exchange period. Subject 747 (left panel) preferred the small reinforcer on both trials, but a greater proportion of large-reinforcer choices occurred on the 1st trial than the 2nd trial, yielding a curve in the direction predicted by the matching law. Data from Subject 753 (right panel) did not correspond to matching-law predictions; equally strong preference for the large reinforcer was exhibited during both the 1st and 2nd trials of condition 2. Predictions of the matching law were less accurate under conditions in which an exchange period was scheduled following 5 or 10 trials. Preference for the large reinforcer is predicted across all but the final trial of a block, at which point the proportion of large-reinforcer choices is predicted to drop steeply below .5. Instead, the proportion of large reinforcers chosen tended to decrease across trials of a block, often shifting downward abruptly from the 1st to the 2nd trial (see Figure 5).

Given the well-established sensitivity of pigeons' choices to even small differences in delays to food, it is not surprising that unequal delays to food also affected responding in the current experiment. In fact, small differences in delays to food in Experiment 1 may have precluded a clear assessment of choices maintained by LED reinforcement. For example, as described earlier, on choice trials immediately preceding exchange periods, under conditions in which the large reinforcer was delayed, the minimum delays to food were 0.6 s following small-reinforcer choices but 7.8 s following large-reinforcer choices. Similarly, minimum delays to food on trials immediately preceding an exchange period were 1.8 s and 0.6 s for large- and small-reinforcer choices, respectively, when LED reinforcement was immediate for both options. These different delays to food were a joint function of exchange periods scheduled immediately after LED presentation, the additional time taken to illuminate three LEDs in succession following large-reinforcer choices, and the added delay to the large reinforcer under conditions in which LED presentation was delayed. The ideal matching law, established largely on the basis of pigeons' choices under food reinforcement schedules, applied to the choices in Experiment 1 with food parameters, predicts preference for the small reinforcer on trials immediately preceding the exchange period whenever exchange periods are scheduled after two or more trials (Figure 8). Thus, at least on the final trial of a block, food-reinforcement parameters would be expected to have had a greater influence on choices than the subordinate LED arrangements. This interpretation is consistent with the choice patterns usually observed under conditions 2 and 2D (Figure 5). Under these conditions, differences between the 1st and 2nd trials in delays to food for the two choice responses and in stimulus conditions provided a basis for discriminative control of choice. On the 1st trial, with no LEDs illuminated, most subjects preferred the large reinforcer. On the 2nd trial, when at least one illuminated LED was always present and food was obtained sooner following a small-reinforcer choice, most subjects preferred the small reinforcer.
Together, these results are in accord with the predictions of the ideal matching law applied to food delays (Figure 8) and extend the generality of previous findings regarding the importance of food-reinforcer delays in controlling choice in pigeons. Although food-based ideal matching-law predictions corresponded less well to results from conditions in which exchange periods occurred after 5 or 10 trials (Figures 5 and 8), stimulus generalization, based largely on the presence of illuminated LEDs, might account for some of these discrepancies between performances and matching-law predictions. Recall that the presence or absence of LEDs distinguished the 1st and 2nd trials of conditions 2 and 2D. During subsequent conditions, when exchange periods were scheduled after 5 or 10 trials, LEDs were illuminated on all trials except the 1st trial of a block. The greater number of large-reinforcer choices on the 1st trial of a block occurred in the absence of illuminated LEDs, a situation correlated with no differential delays to food. On the final trial of a block, with LEDs present, food delays favored small-reinforcer choices. Such control of small-reinforcer choices may have generalized across earlier trials, with LEDs present, resulting in fewer large-reinforcer choices than predicted by the ideal matching law.

The latency data displayed in Figure 6 also support the view that the presence or absence of LEDs contributed to the choice patterns. For both large- and small-reinforcer choices, latencies were generally longest during the 1st trial of a block, the trial most temporally remote from food and the trial on which the large reinforcer was most preferred. With the exception of Subject 1732, latencies were short and nearly equal across the remaining trials, in which illuminated LEDs were always present. For Subject 1732, latencies tended to decrease across trials, apparently under control of the increasing proximity to food delivery and perhaps the increasing number of LEDs. That this pattern did not occur in the other five subjects suggests that control by the presence or absence of LEDs was greater than control by increasing numbers of LEDs or by temporal proximity to food. Interestingly, the differential preference for the large reinforcer on the 1st trial in the current experiment may be viewed as a kind of self-control, although it is not clear from the present results whether LEDs or food deliveries should be treated as the effective reinforcers. Of course, both LED and food parameters may have been relevant. The relative influence of these reinforcement variables was assessed in Experiment 2.

Figure 2: A diagram of the stimulus panel with the LEDs. [Diagram omitted; it shows the houselight, the horizontal row of 34 LEDs, the left (choice), center (exchange), and right (choice) keys, and the feeder opening.]

Figure 3: The sequence of events following large- and small-reinforcer choices during conditions with (bottom panel) and without (top panel) a large-reinforcer delay. [Diagram omitted. In conditions 1, 2, 5, and 10, a large choice darkens the choice keys and 3 LEDs onset 0.6 s apart, with the exchange period or ITI following the last LED by 0.1 s; a small choice produces 1 LED, followed 0.1 s later by the exchange period or ITI. In conditions 1D, 2D, 5D, and 10D, a 6-s delay precedes the first LED after a large choice.]

Figure 4. The number of large-reinforcer choices per session across experimental conditions.
Data from Group A subjects are shown in the left panel and Group B subjects in the right panel. Values are means from the last 10 sessions of each condition. Open bars indicate no delay to the large reinforcer (3 LEDs). Striped bars indicate a 6-s delay to the large reinforcer. Vertical lines show the range of values used to determine the mean. [Bar graphs omitted: panels for Birds 1857, 747, and 1383 (Group A) and Birds 1732, 1855, and 753 (Group B) across the experimental conditions.]

Figure 5. The proportion of large-reinforcer choices at each trial number of a block of trials preceding exchange periods. Values are derived from choice trials during the last 10 sessions of each experimental condition where an exchange period occurred after two or more trials. Data from Group A subjects are shown in the left panel and data from Group B subjects are displayed in the right panel. [Plots omitted: proportion of large-reinforcer choices by trial number (1 through 10) for each bird.]

Figure 6. Average choice latencies during conditions where exchange periods were scheduled after two or more trials. Values were derived from choice trials during the last 10 sessions of each experimental condition. Open symbols represent latencies for large-reinforcer choices and filled symbols indicate small-reinforcer choices. Data from Group A subjects are shown in the left panel and data from Group B subjects are displayed in the right panel. [Plots omitted: choice latency in seconds by trial number for each bird. The figure panels that followed in this span are not recoverable beyond their condition keys (2, 5, 10, D10, ED2, ED5, ED10, RED10), which correspond to the Experiment 2 within-session choice proportions discussed in the text.]

Figure 12. The average choice latency during conditions where exchange periods were scheduled after two or more trials. Values are derived from preferred choice trials during the last 10 sessions of each experimental condition. The axes are scaled individually for each subject. Other graphing conventions are the same as in Figure 6 except for the different symbol correspondences indicated in the figure keys.
[Plots omitted: Figure 12 latency panels for Birds 1857, 747, and 1383 (Group A) and Birds 1732, 1855, and 753 (Group B). Bar graph omitted: "MATCHING LAW PREDICTIONS (FOOD)" with bars labeled ALL TRIALS, TRIALS 1-9, and FINAL TRIAL across experimental conditions 1, 2, 5, 10, D1, ED1, ED2, ED5, ED10, and D10.]

Figure 13. The proportion of large-reinforcer choices predicted by the matching law applied to food reinforcement for each trial of a block of trials preceding exchange periods. Open bars represent values during conditions where predictions do not differ between trials. Under condition D10, the coarsely striped bar indicates the predicted value for each of the first 9 trials of a block and the finely striped bar illustrates the matching-law prediction for the 10th trial. Error bars during conditions with more than one trial per exchange period indicate the range of predictions under all possible choice patterns for other trials of a block.

TABLE 3
The experimental conditions, order of exposure, and number of sessions for all subjects in Experiment 2. Group A conditions are summarized in the top panel and Group B in the bottom panel.

               Time from a Choice Response to
               the Exchange Period (Seconds)        Number of Sessions
Condition(a)       Large      Small
Group A                                     Bird 1857   Bird 747   Bird 1383
1                   1.5        1.5             27          28         27
D1                  7.5        1.5             50          22         44
ED1                 9.5        9.5             60          24         28
ED2                 9.5        9.5             32          34         64
ED5                 9.5        9.5             47          30         70
ED10                9.5        9.5             77          21         46
RED10               9.5        9.5             42          --         --
ED10                9.5        9.5             26          --         --

Group B                                     Bird 1732   Bird 1855   Bird 753
1                   1.5        1.5             28          26         30
2                   1.5        1.5             24          20         90(b)
5                   1.5        1.5             39          34         20
10                  1.5        1.5             22          78         30
D10                 7.5        1.5             33          22         42
ED10                9.5        9.5             26          80         30
RED10               9.5        9.5             20          80         36
ED10                9.5        9.5             40          34         106(c)

(a) The numbers 1, 2, 5, and 10 refer to the number of trials per exchange period. The letter D indicates a 6-s delay to the large reinforcer (3 LEDs). The letter E indicates an equal delay of 9.5 s from either choice response to a scheduled exchange period. The letter R indicates that the contingencies were reversed for the choice keys.
(b) The choice-key assignments were inadvertently switched for two consecutive sessions and performance was noticeably disrupted afterwards. The phase was continued until stability criteria were met.
(c) Preference cycled between the large and small reinforcer during most of the phase without meeting stability criteria. At the 80th session a trend toward the small reinforcer was evident and the phase was continued until no trends were evident for 20 consecutive sessions.

TABLE 4
The relative number of large-reinforcer choices predicted by the matching law for each trial number of all experimental conditions in Experiment 2. Values are based on food reinforcement. When there are two listings for the same trial(s), the top listing shows values when the small reinforcer is chosen on all other trials and the bottom listing shows values when the large reinforcer is chosen on all other trials. The mean values displayed are the average of these two calculations for each trial and correspond to the values plotted in Figure 13.
The food-delay values are described in the text; the amount-of-food values are the same as in Experiment 1.

Experimental Condition   Trial   Large / (Large + Small)   Mean
1 and ED1                  1             .750              .750
2 and ED2                 all            .667              .634
                                         .600
5 and ED5                 all            .583              .560
                                         .536
10 and ED10               all            .545              .531
                                         .517
D1                         1             .429              .429
D10                       1-9            .545              .531
                                         .517
D10                        10            .231              .221
                                         .211

GENERAL DISCUSSION

All subjects chose the larger delayed reinforcer more often in Experiment 2 than in Experiment 1. Self-control increased in Experiment 2, consistent with predictions of the matching law, primarily because the minimum delays to food from choices were equated for both options during most conditions. Together, these experiments confirm many previous findings regarding the sensitivity of pigeons' choices to delays in food presentation (e.g., Green et al., 1981; Lea, 1979; Logue et al., 1984; Rachlin & Green, 1972). When food delays were prevented from differentially influencing choice, in the terminal condition (ED10) of Experiment 2 that most resembles the typical human procedure, 4 of 6 subjects (1857, 1732, 1855, and 753) preferred the larger delayed reinforcer (Figure 10). The levels of self-control observed in Experiment 2 are comparable to those reported in a similar study with humans (Logue et al., 1986) and those found in a previous demonstration of self-control with pigeons involving delay-fading histories (Mazur & Logue, 1978). Also, the variability, within and between subjects, was well within the range characteristic of similar studies (e.g., Logue & Pena-Correal, 1984; Logue et al., 1984, Experiment 1; Logue et al., 1986, Experiment 1; Mazur & Logue, 1978; Rachlin & Green, 1972; van Haaren et al., 1988). The reliability of these effects, in the 4 subjects showing the most self-control, was further established by reversing the contingencies on the choice keys. In all cases, subjects continued to prefer the larger delayed reinforcer regardless of the key with which it was associated (Figure 10, conditions ED10 and RED10), ruling out key color and position bias as alternative explanations. This manipulation was important because 3 of the 4 subjects exposed to the key reversals had prolonged recent histories of preferring the same option; moreover, position and/or color biases are especially common with concurrent FR1, discrete-trials procedures like those used here (e.g., Logue & Pena-Correal, 1984; Logue et al., 1984, Experiment 1; van Haaren et al., 1988).

The self-control demonstrated in the present study may be viewed in terms consistent with Skinner's (1953) treatment of self-control. In Skinner's terms, choice of the immediate small reinforcer might be considered the controlled response and choice of the delayed larger reinforcer the controlling response. In this case, choosing the delayed reinforcer, as a form of self-control, exemplifies the technique Skinner calls "doing something else" (Skinner, 1953, p. 239). That is, choice of the immediate smaller reinforcer is prevented by the emission of an incompatible response (choice of the delayed larger reinforcer). The process by which pigeons in the present study came to exhibit self-control and acquire this controlling response is worth considering. One possibility is that choice of the 3 LEDs was directly reinforced by the LEDs. Despite the present finding that food delays affected choice more than did LED delays, there are several reasons to suspect that the LEDs did function as reinforcers.
First, because the training histories and LED arrangements in the present study closely resemble the token-reinforcer paradigm (Malagodi, 1967), it is likely that the LEDs functioned as token reinforcers. Although subjects in the present study did not directly manipulate the LEDs, as do subjects in more typical token-reinforcement studies, it is not clear that such handling enhances reinforcing efficacy. Also, the long latencies characteristic of 1st-trial choices and the latency reductions once tokens were present (Experiment 1, Figure 6, and Experiment 2, Figure 12) resemble previous findings with token reinforcement (Kelleher, 1958; Malagodi et al., 1975; Waddell et al., 1972; Webbe & Malagodi, 1978). Informal observations revealed that all subjects did occasionally orient toward the LEDs when they were presented and often pecked at them during the ITI and prior to exchange periods. Pecking is often elicited by conditioned stimuli paired with food (Schwartz & Gamzu, 1977), stimuli that would also be expected to have reinforcing properties (Gollub, 1977). LED illumination might also be expected to function as conditioned reinforcement because the accumulation of LEDs was correlated with reductions in the delay to food (Fantino, 1977).

Although the reinforcing function of the LEDs is not certain, the precise function of the LEDs in the present study is no more mysterious than the function of points delivered in similar experiments with human subjects (e.g., Logue et al., 1986). Although these experiments do not usually include clear functional assessments of point delivery, points are often presumed to function as reinforcers in humans, even in the absence of explicit instructions. This is presumably because human subjects typically have extensive histories with points and numbers outside of the laboratory. These histories likely establish precise discriminations in humans of more from fewer points over a wide range of absolute numbers of points. If points are delivered as reinforcers, such histories may also enhance sensitivity to the cumulative amount of reinforcement, a sensitivity that may be related to the maximization and self-control often seen in human subjects (e.g., Flora & Pavlik, 1992; King & Logue, 1987; Mawhinney, 1982). The present finding of self-control in subjects that did not have such extensive verbal and social histories reveals that training circumstances provided within the token-reinforcer arrangement may be sufficient to produce self-control. Previously reported differences in the performance of humans and pigeons under self-control procedures may therefore be the result of procedural differences, rather than the verbal processes characteristic of humans per se. A number of studies have documented differences in self-control when different consequences are arranged (e.g., Logue et al., 1984; Logue et al., 1986; Navarick, 1982; Ragotzy et al., 1988; Solnick et al., 1980). In conjunction with the current study, these experiments suggest that with both humans and pigeons, self-control is less likely with reinforcers of more immediate value, such as food and escape from noise, but more likely with token reinforcers. The self-control obtained with token reinforcement could thus be viewed simply as a case of insensitivity to delays with certain kinds of consequences. When this delay insensitivity is related to other characteristics of the token arrangement, however, more complex interpretations emerge.
It is consistent with the token-reinforcement literature to discuss the exchange period as a reinforcer of component token schedules (e.g., Webbe & Malagodi, 1978) and, in the present study, of trial choices. In Experiment 1 of the current study, it was argued that quicker access to the exchange period on the final trial of a block following small-reinforcer choices produced the impulsive responding observed. Similarly, the self-control exhibited in Experiment 2 might be interpreted in terms of reinforcement of large-reinforcer choices on the final trial of a block by onset of an equally delayed exchange period with a differentially greater amount of food. Component-schedule sensitivity to exchange-period food amounts was shown in a study by Malagodi et al. (1975), in which rates of lever pressing by rats were inversely related to the amount of food obtained in the exchange period, when food amounts were manipulated by increasing the number of tokens required for each food delivery in the exchange period. However, the self-control exhibited by subjects in Experiment 2 cannot easily be interpreted as simply selection of large-reinforcer choices on the final trial of a block by differentially larger food amounts during the exchange period. For example, Subjects 1857, 1855, and 753 all demonstrated overall preference for the larger delayed reinforcer under conditions in which impulsive choices increased across trials of a block (Figure 11). Also, Subjects 1857, 1732, 1855, and 753 all preferred the larger delayed reinforcer during early trials of a block, at times remote and discriminable from the availability of an exchange period.

The self-control found with token reinforcement in the present study might result from the temporal relationship between choices for tokens at one time and a terminal reinforcer that cannot be obtained until a later time. In this regard, the token-reinforcer procedure may be analogous to commitment-response procedures (Rachlin & Green, 1972). This interpretation implies that choices are controlled primarily by their relationship to the terminal reinforcer obtained at a later time, which accounts for the insensitivity to token delays. The predominance of food reinforcement over LED reinforcement, demonstrated repeatedly in the present study, and the finding that 4 subjects in Experiment 2 preferred the option eventually yielding the greatest amount of food regardless of the delays in LED presentation, are also consistent with this interpretation.

Although the self-control obtained with token-reinforcer arrangements may result from similarities with commitment-response procedures involving temporal relations between choices and reinforcement outcomes, there is an important difference. In the token-reinforcer procedure, the amount of the terminal reinforcer available during exchange periods is an aggregate result of multiple choices made prior to the exchange period. But pigeons' choices are normally insensitive to events integrated over entire sessions. This suggests that tokens may generate self-control by somehow bringing choices under the control of their aggregate effect on the amount of the terminal reinforcer available in the exchange period. Token delivery may facilitate choice of the larger reinforcer in this context by providing a stimulus (the number of tokens) differentially correlated with deferred choice outcomes regarding food amounts.
The display of tokens earned during experimental sessions corresponded precisely with the cumulative amount of food available during the subsequent exchange period, a seemingly ideal arrangement for engendering this type of control. This interpretation is also consistent with Logue and Mazur's (1981) finding that overhead lights differentially correlated with the large-reinforcer delay period facilitated self-control in pigeons. Logue and Mazur suggested a conditioned reinforcing function of the light, but a discriminative function is more likely. With respect to the present study, preferring the delayed 3 LEDs may not represent reinforcement by LEDs at all. Rather, the LEDs may provide a more immediate discriminative basis for maintaining choices that result in more food during the exchange period. The role of tokens and the token display in improving sensitivity to the outcomes of choice on overall food-reinforcer amount could be investigated by comparing choice in a token-reinforcer arrangement with and without an ongoing display of acquired tokens, or by examining choice in a similar arrangement without the tokens.

Choice of the larger delayed reinforcer as a controlling response may have been established in the present study because of the presence of stimuli differentially correlated with the cumulative outcomes of choices. In traditional terms, such stimuli make the subject "aware" of the consequences of alternative actions and may facilitate self-control in a manner similar to self-generated rules reported by human subjects, rules that similarly correspond to the outcomes of alternative choice options in relation to overall obtained reinforcement (Logue, 1988; Logue et al., 1986; Mawhinney, 1988). Such verbal stimuli are also used to engender self-control outside the laboratory (Skinner, 1953). Just as tokens may bring choices under control of the amount of a deferred terminal reinforcer by providing more immediate stimuli (tokens) that correspond to that reinforcer, so might verbal stimuli, such as checks on a list, daily logs of energy use, and weekly weight records, bring human behavior under the control of respective long-term outcomes. In both cases, such stimuli may function as a type of reinforcement. Indeed, they occur response dependently. Their critical function, however, even when they are chosen, is their discriminative effect on behavior, which is itself important because of its relationship to some other deferred reinforcer.

APPENDIX
A SAMPLE OF FOOD-BASED MATCHING-LAW CALCULATIONS FROM EXPERIMENT 1

Each entry computes the relative number of large-reinforcer choices, BL/(BL + BS) = (AL x DS)/[(AL x DS) + (AS x DL)], where A is the total amount of food (in seconds) obtainable in the exchange period and D is the minimum delay to the first food delivery.

Condition         Calculation                                            Mean
1                 (6 x .6)/(3.6 + 2 x 1.8) = 3.6/7.2 = .500              .500
1D                (6 x .6)/(3.6 + 2 x 7.8) = 3.6/19.2 = .188             .188
5 (Trials 1-4)    14/(14 + 10) = .583                                    .560
                  30/(30 + 26) = .536
5 (Trial 5)       (14 x .6)/(8.4 + 10 x 1.8) = 8.4/26.4 = .318           .298
                  (30 x .6)/(18 + 26 x 1.8) = 18/64.8 = .278
5D (Trials 1-4)   Same as condition 5                                    .560
5D (Trial 5)      (14 x .6)/(8.4 + 10 x 7.8) = 8.4/86.4 = .097           .090
                  (30 x .6)/(18 + 26 x 7.8) = 18/220.8 = .082

Note. The calculations for the first 4 trials of condition 5 are based solely on the amount-of-reinforcement ratios for the two options because the delays to food are equal except on the final trial of a block. Two calculations are shown for trials 1-4 and two for trial 5. The first calculation assumes that only the small reinforcer is chosen on the other trials of a block. The second calculation assumes that only the large reinforcer is chosen on other trials. The mean of these two calculations was used in plotting the predictions.
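For readers who want to check or extend these calculations, here is a minimal Python sketch that reproduces them from the amount and delay values given in the text; the function and variable names are illustrative only, not part of the original programs.

```python
# Minimal sketch reproducing the food-based matching-law calculations above.
# Amounts are seconds of food per exchange period; delays are seconds to the
# first food delivery. Values follow the text; names are illustrative only.

def prediction(a_large, a_small, d_large, d_small):
    """Relative number of large-reinforcer choices:
    (A_L * D_S) / (A_L * D_S + A_S * D_L)."""
    top = a_large * d_small
    return top / (top + a_small * d_large)

def block_prediction(trials, d_large, d_small, others_large):
    """Prediction for one trial of a block, assuming the large (True) or
    small (False) reinforcer is chosen on all other trials of the block."""
    other = 6.0 if others_large else 2.0    # 3 or 1 LEDs x 2-s food each
    a_large = 6.0 + (trials - 1) * other    # total food if large chosen here
    a_small = 2.0 + (trials - 1) * other    # total food if small chosen here
    return prediction(a_large, a_small, d_large, d_small)

# Condition 5, final trial of the block (delays 1.8 s vs. 0.6 s):
lo = block_prediction(5, 1.8, 0.6, others_large=False)      # -> .318
hi = block_prediction(5, 1.8, 0.6, others_large=True)       # -> .278
print(round(lo, 3), round(hi, 3), round((lo + hi) / 2, 3))  # .318 .278 .298
```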
REFERENCES

Ainslie, G.W. (1974). Impulse control in pigeons. Journal of the Experimental Analysis of Behavior, 21, 485-489.

Baron, A., & Galizio, M. (1976). Clock control of human performance on avoidance and fixed-interval schedules. Journal of the Experimental Analysis of Behavior, 26, 165-180.

Baum, W.M., & Rachlin, H. (1969). Choice as time allocation. Journal of the Experimental Analysis of Behavior, 12, 861-874.

Belke, T.W., Pierce, W.D., & Powell, R.A. (1989). Determinants of choice for pigeons and humans on concurrent-chains schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 52, 97-109.

Bentall, R.P., & Lowe, C.F. (1987). The role of verbal behavior in human learning: III. Instructional effects in children. Journal of the Experimental Analysis of Behavior, 47, 177-190.

Burns, D.J., & Powers, R.B. (1975). Choice and self-control in children: A test of Rachlin's model. Bulletin of the Psychonomic Society, 5, 156-158.

Catania, A.C. (1992). Learning (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Catania, A.C., Matthews, B.A., & Shimoff, E. (1982). Instructed versus shaped human verbal behavior: Interactions with nonverbal responding. Journal of the Experimental Analysis of Behavior, 38, 233-248.

de Villiers, P. (1977). Choice in concurrent schedules and a quantitative formulation of the law of effect. In W.K. Honig & J.E.R. Staddon (Eds.), Handbook of operant behavior (pp. 233-287). Englewood Cliffs, NJ: Prentice-Hall.

Fantino, E. (1969). Choice and rate of reinforcement. Journal of the Experimental Analysis of Behavior, 12, 723-730.

Fantino, E. (1977). Conditioned reinforcement: Choice and information. In W.K. Honig & J.E.R. Staddon (Eds.), Handbook of operant behavior (pp. 313-339). Englewood Cliffs, NJ: Prentice-Hall.

Flora, S.R., & Pavlik, W.B. (1992). Human self-control and the density of reinforcement. Journal of the Experimental Analysis of Behavior, 57, 201-208.

Glenn, S.S. (1985, October). Behavioral selection and cultural contingencies. Paper presented at the meeting of the Southeastern Association of Behavior Analysis, Charleston, SC.

Glenn, S.S. (1988). Contingencies and metacontingencies: Toward a synthesis of behavior analysis and cultural materialism. The Behavior Analyst, 11, 161-179.

Gollub, L. (1977). Conditioned reinforcement: Schedule effects. In W.K. Honig & J.E.R. Staddon (Eds.), Handbook of operant behavior (pp. 288-312). Englewood Cliffs, NJ: Prentice-Hall.

Green, L., Fisher, E.B., Jr., Perlow, S., & Sherman, L. (1981). Preference reversal and self-control: Choice as a function of reward amount and delay. Behaviour Analysis Letters, 1, 43-51.

Green, L., & Snyderman, M. (1980). Choice between rewards differing in amount and delay: Toward a choice model of self-control. Journal of the Experimental Analysis of Behavior, 34, 135-147.

Grosch, J., & Neuringer, A. (1981). Self-control in pigeons under the Mischel paradigm. Journal of the Experimental Analysis of Behavior, 35, 3-21.

Harris, M. (1974). Cows, pigs, wars, and witches: The riddles of culture. New York: Random House.

Harris, M. (1977). Cannibals and kings: The origins of cultures. New York: Random House.

Harris, M. (1980). Cultural materialism: The struggle for a science of culture. New York: Random House.

Harris, M. (1981). Why nothing works. New York: Simon and Schuster. (Originally published as America Now.)

Harris, M. (1989). Our kind. New York: Harper Perennial.
Herrnstein, R.J. (1958). Some factors influencing behavior in a two-response situation. Transactions of the New York Academy of Sciences, 21, 35-45.

Herrnstein, R.J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.

Herrnstein, R.J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.

Horne, P.J., & Lowe, C.F. (1993). Determinants of human performance on concurrent schedules. Journal of the Experimental Analysis of Behavior, 59, 29-60.

Houston, A.I., & McNamara, J.M. (1985). The choice of two prey types that minimizes the probability of starvation. Behavioral Ecology and Sociobiology, 17, 135-141.

Kelleher, R.T. (1958). Fixed-ratio schedules of conditioned reinforcement with chimpanzees. Journal of the Experimental Analysis of Behavior, 1, 281-289.

King, G.R., & Logue, A.W. (1987). Choice in a self-control paradigm with human subjects: Effects of changeover delay duration. Learning and Motivation, 18, 421-438.

Laties, V.G., & Weiss, B. (1963). Effects of a concurrent task on fixed-interval responding in humans. Journal of the Experimental Analysis of Behavior, 6, 431-436.

Lea, S.E.G. (1979). Foraging and reinforcement schedules in the pigeon: Optimal and non-optimal aspects of choice. Animal Behaviour, 27, 875-886.

Lippman, L.G., & Meyer, M.E. (1967). Fixed-interval performance as related to instructions and to subjects' verbalizations of the contingency. Psychonomic Science, 8, 135-136.

Logue, A.W. (1988). Research on self-control: An integrating framework. Behavioral and Brain Sciences, 11, 665-709.

Logue, A.W., & Mazur, J.E. (1981). Maintenance of self-control acquired through a fading procedure: Follow-up on Mazur and Logue (1978). Behaviour Analysis Letters, 1, 131-137.

Logue, A.W., & Pena-Correal, T.E. (1984). Responding during reinforcement delay in a self-control paradigm. Journal of the Experimental Analysis of Behavior, 41, 267-277.

Logue, A.W., & Pena-Correal, T.E. (1985). The effect of food deprivation on self-control. Behavioural Processes, 10, 355-368.

Logue, A.W., Pena-Correal, T.E., Rodriguez, M.L., & Kabela, E. (1986). Self-control in adult humans: Variation in positive reinforcer amount and delay. Journal of the Experimental Analysis of Behavior, 46, 159-173.

Logue, A.W., Pena-Correal, T.E., Rodriguez, M.L., & Mauro, B.C. (1984). Choice in a self-control paradigm: Quantification of experience-based differences. Journal of the Experimental Analysis of Behavior, 41, 53-67.

Logue, A.W., Smith, M.E., & Rachlin, H. (1985). Sensitivity of pigeons to prereinforcer and postreinforcer delay. Animal Learning & Behavior, 13, 181-186.

Lopatto, D., & Lewis, P. (1985). Contributions of elicitation to measures of self-control. Journal of the Experimental Analysis of Behavior, 44, 69-77.

Lowe, C.F., Harzem, P., & Bagshaw, M. (1978). Species differences in temporal control of behavior II: Human performance. Journal of the Experimental Analysis of Behavior, 29, 351-361.

Mahoney, M.J., & Thoresen, C.E. (1974). Self-control: Power to the person. Monterey, CA: Brooks/Cole.

Malagodi, E.F. (1967). Acquisition of the token-reward habit in the rat. Psychological Reports, 20, 1335-1342.

Malagodi, E.F., Webbe, F.M., & Waddell, T.R. (1975). Second-order schedules of token reinforcement: Effects of varying the schedule of food presentation. Journal of the Experimental Analysis of Behavior, 24, 173-181.
Malott, R.W. (1988). Rule-governed behavior and behavioral anthropology. The Behavior Analyst, 11, 181-203.

Matthews, B.A., Catania, A.C., & Shimoff, E. (1985). Effects of uninstructed verbal behavior on nonverbal responding: Contingency descriptions versus performance descriptions. Journal of the Experimental Analysis of Behavior, 43, 155-164.

Mawhinney, T.C. (1982). Maximizing versus matching in people versus pigeons. Psychological Reports, 50, 267-281.

Mazur, J.E., & Logue, A.W. (1978). Choice in a "self-control" paradigm: Effects of a fading procedure. Journal of the Experimental Analysis of Behavior, 30, 11-17.

Millar, A., & Navarick, D.J. (1984). Self-control and choice in humans: Effects of video game playing as a positive reinforcer. Learning and Motivation, 15, 203-218.

Mischel, H.N., & Mischel, W. (1983). The development of children's knowledge of self-control strategies. Child Development, 54, 603-619.

Mischel, W. (1974). Processes in delay of gratification. In L. Berkowitz (Ed.), Advances in experimental social psychology: Vol. 7 (pp. 249-292). New York: Academic Press.

Navarick, D.J. (1982). Negative reinforcement and choice in humans. Learning and Motivation, 13, 361-377.

Navarick, D.J. (1985). Choice in humans: Functional properties of reinforcers established by instruction. Behavioural Processes, 11, 269-277.

Navarick, D.J. (1986). Human impulsivity and choice: A challenge to traditional operant methodology. Psychological Record, 36, 343-356.

Navarick, D.J., & Fantino, E. (1976). Self-control and general models of choice. Journal of Experimental Psychology: Animal Behavior Processes, 2, 75-87.

Poling, A., Thomas, J., Hall-Johnson, E., & Picker, M. (1985). Self-control revisited: Some factors that affect autoshaped responding. Behavioural Processes, 10, 77-85.

Rachlin, H., Battalio, R., Kagel, J., & Green, L. (1981). Maximization theory in behavioral psychology. Behavioral & Brain Sciences, 4, 371-417.

Rachlin, H., & Green, L. (1972). Commitment, choice and self-control. Journal of the Experimental Analysis of Behavior, 17, 15-22.

Ragotzy, S.P., Blakely, E., & Poling, A. (1988). Self-control in mentally retarded adolescents: Choice as a function of amount and delay of reinforcement. Journal of the Experimental Analysis of Behavior, 49, 1-9.

Runck, B. (1982). Behavioral self-control: Issues in treatment assessment (DHHS Publication No. ADM 82-1207). Washington, DC: U.S. Government Printing Office.

Schwartz, B., & Gamzu, E. (1977). Pavlovian control of operant behavior. In W.K. Honig & J.E.R. Staddon (Eds.), Handbook of operant behavior (pp. 53-97). Englewood Cliffs, NJ: Prentice-Hall.

Skinner, B.F. (1953). Science and human behavior. New York: Macmillan.

Skinner, B.F. (1971). Beyond freedom and dignity. New York: Alfred A. Knopf.

Skinner, B.F. (1974). About behaviorism. New York: Alfred A. Knopf.

Skinner, B.F. (1979). The shaping of a behaviorist. New York: Knopf.

Skinner, B.F. (1981). Selection by consequences. Science, 213, 501-504.

Skinner, B.F. (1986). What is wrong with daily life in the Western World? American Psychologist, 41, 568-574.

Skinner, B.F., & Vaughan, M. (1983). Enjoy old age. New York: W.W. Norton.

Solnick, J.V., Kannenberg, C.H., Eckerman, D.A., & Waller, M.B. (1980). An experimental analysis of impulsivity and impulse control in humans. Learning and Motivation, 11, 61-77.

Sonuga-Barke, E.J.S., Lea, S.E.G., & Webley, P. (1989). The development of adaptive choice in a self-control paradigm. Journal of the Experimental Analysis of Behavior, 51, 77-85.
Stuart, R.B. (1977). Behavioral self-management: Strategies, techniques, and outcome. New York: Brunner/Mazel.

van Haaren, F., van Hest, A., & van de Poll, N.E. (1988). Self-control in male and female rats. Journal of the Experimental Analysis of Behavior, 49, 201-211.

Waddell, T.R., Leander, J.D., Webbe, F.M., & Malagodi, E.F. (1972). Schedule interactions in second-order fixed-interval (fixed-ratio) schedules of token reinforcement. Learning and Motivation, 3, 91-100.

Webbe, F.M., & Malagodi, E.F. (1978). Second-order schedules of token reinforcement: Comparisons of performance under fixed-ratio and variable-ratio exchange schedules. Journal of the Experimental Analysis of Behavior, 30, 219-224.

BIOGRAPHICAL SKETCH

Kevin D. Jackson was born in Detroit, Michigan, on September 19, 1957, to Gerald and Doris Jackson. He graduated from Fordson High School in Dearborn, Michigan, in 1975. Kevin received a B.S. degree specializing in behavior analysis in 1981 from Western Michigan University. Kevin received an M.S. degree in 1991 and a Ph.D. in 1993 through the experimental analysis of behavior program of the University of Florida.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

E.F. Malagodi, Chair, Professor of Psychology
Timothy D. Hackenberg, Cochair, Assistant Professor of Psychology
Marc Branch, Professor of Psychology
Marvin Harris, Graduate Research Professor of Anthropology
H.S. Pennypacker, Professor of Psychology
Donald Stehouwer, Professor of Psychology
Frans van Haaren, Associate Scientist of Psychology

This dissertation was submitted to the Graduate Faculty of the Department of Psychology in the College of Liberal Arts and Sciences and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

August 1993

Dean, Graduate School