Brief Critique of Traditional Learning Theory

The principles of learning theory have been derived from the experimental study of behavior. This research rests on a small set of empirical assumptions and beliefs. Perhaps the most central and pervasive of them is the law of effect, that is, behavior is modified by its consequences. If a behavior is rendered more likely to occur in the future as the result of its consequences, it is said to have undergone reinforcement. Reinforcement is divided into two categories depending on whether the behavior involved produces the reinforcer (positive reinforcement) or avoids/terminates the reinforcing event (negative reinforcement). The theory also posits punishment as producing an effect opposite to that of reinforcement. When an anticipated positive reinforcer is omitted, the effect is negative punishment (P−). Conversely, when a negative reinforcer is presented, positive punishment occurs (P+). In both cases, punishment is defined as an event that lowers the future probability of the punished behavior. The term punishment is also used more generally to designate any outcome that suppresses behavior, regardless of the target behavior's reinforcement history.

This general system of analysis has been extremely productive. Many thousands of studies have been performed ostensibly confirming these basic assumptions and postulates. Further, there is also little doubt that the paradigm works as a practical system for the control and modification of behavior. Despite such heuristic and practical value, however, these most fundamental assumptions are vulnerable to theoretical criticism, especially with regard to issues involving parsimony and logical coherence (i.e., how the theory relates to behavior).

Reinforcement and the Notion of Probability

The notion of probability is central to the traditional behavior-analytic interpretation of reinforcement (Johnson and Morris, 1987; Catania, 1992). Despite the central importance of "probability" in science, and, in particular, behavior analysis, it has not received a great deal of independent attention. Curiously, in Murray Sidman's important book (the "Bible" of many experimental behavior analysts) Tactics of Scientific Research: Evaluating Experimental Data in Psychology (1960), probability as a scientific concept is left to the reader's imagination. This lack of analysis is especially surprising and troubling considering the generally vague meaning of the term probability in science. These various shortcomings appear to have prompted Bertrand Russell to sardonically comment: "Probability is the most important concept in modern science, especially as nobody has the slightest notion what it means" (quoted in Johnson and Morris, 1987:107).

Nonetheless, reinforcement is defined (as has been frequently reiterated above) in terms of the effect it has on the future probability or frequency of the reinforced behavior. Response probability is typically defined as a proportional relation between the number of opportunities for the response to occur and the number of times it actually occurs. For example, if a dog is signaled to sit 15 times but only sits on 9 of those occasions, the probability that he will sit on signal is 0.6 (calculated by dividing 9 by 15).
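The proportional definition of response probability amounts to a one-line calculation; the function below is an illustrative sketch (its name and interface are mine, not the source's):

```python
def response_probability(responses: int, opportunities: int) -> float:
    """Response probability as the proportion of opportunities on which
    the response actually occurred."""
    if opportunities <= 0:
        raise ValueError("at least one opportunity is required")
    return responses / opportunities

# The dog sits on 9 of 15 signaled opportunities.
p_sit = response_probability(9, 15)  # 0.6
```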

However, with this definition of response probability in mind, how can one determine whether a given response has undergone reinforcement, unless one knows in advance the effect of reinforcement? Suppose, for example, that a dog receives a reinforcer as the result of sitting on a single occasion: can an objective observer really make any predictions from that one event about the future probability of the sit response? How about after two, three, four, or five reinforcements? In fact, nothing very definite can be said about the response's future probability after a single exposure to reinforcement. Consequently, since it is not possible to calculate probabilities from the first reinforcing event onward, how can one say of these early events whether they were reinforcing? Obviously, it is only after the habit of sitting becomes highly predictable and regular that one might infer (or speculate?) that the sit response had undergone previous reinforcement.

Another problematical area regarding response probability as the defining characteristic of reinforcement is observed in cases where no additional improvement in response probability is evident as the result of continued reinforcement. Take, for example, a dog that has undergone several hundred trials of training, until the dog has achieved an almost errorless proficiency or fluency at sitting on signal. Reinforcement in this case may have some effect on the behavior of sitting, but, assuming that the sit response's probability of occurrence cannot be measurably improved upon, in what sense can one say that the behavior is reinforced? If the behavior's probability cannot be improved upon (or worsened) through reinforcement, then what is the event to be called? (By the way, I have chosen the term verifier for such instances; see Chapter 8 for a more detailed discussion.) In conclusion, it appears that the probability theory of reinforcement breaks down in cases involving single and many (asymptotic) reinforcing events.

The probability account of reinforcement also appears to break down in the case of shaping (Catania, 1992). During shaping procedures, no particular response is repeated in exactly the same way. Behavior operating under a shaping contingency is emitted with a high degree of variability, with differential consequences gradually narrowing instrumental efforts to progressively approximate the target response—a process in which response probability (e.g., frequency or rate) is rather irrelevant. It is evident during the shaping process that the dog optimizes its chances of obtaining the offered reinforcer by changing its behavior along several dimensions at once. In general, the dog becomes more active and exploratory, especially if it is hungry. When the dog discovers that some behavioral change improves its control over the reinforcer, its effort in that direction is intensified.
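As a toy illustration of how differential consequences can narrow variable behavior toward a target even though no response is ever repeated identically, consider the sketch below. The behavioral dimension, step sizes, and reinforcement rule are all hypothetical simplifications for illustration, not a model taken from the source:

```python
import random

def shape(target: float, trials: int = 200, seed: int = 0) -> float:
    """Toy shaping-by-successive-approximation sketch: each emitted
    response varies randomly around the current behavioral tendency;
    only a response closer to the target than any earlier one is
    'reinforced', and the reinforced variant becomes the new tendency."""
    rng = random.Random(seed)
    tendency = 0.0
    best_gap = abs(target - tendency)
    for _ in range(trials):
        response = tendency + rng.gauss(0.0, 1.0)  # variable emission
        gap = abs(target - response)
        if gap < best_gap:       # differential reinforcement of closer approximations
            best_gap = gap
            tendency = response  # behavior shifts toward the reinforced variant
    return tendency

final_tendency = shape(target=10.0)
```

Note that no notion of response frequency enters the procedure at all; only closeness to the target governs which variants are reinforced.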

Efforts to analyze the relationship between reinforcement and response probability in terms of the foregoing definition (i.e., reinforcement increases the future probability/frequency of the behavior it follows) are dependent on the size of the response/reinforcer sample being observed. The belief that response probability is improved as the result of reinforcement is an uncertain assumption in the case of small samples, but one that becomes progressively more certain (to a point) as the sample size increases. The assumed overall effect of reinforcement on response probability does not appear to be measurable on the level of individual responses and reinforcing events. If it is not measurable at the level of individual responses and reinforcing events, can one be sure that the effect is not a statistical myth?
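The sample-size dependence described above can be illustrated with a small simulation: many hypothetical observers each estimate a fixed underlying propensity from samples of different sizes. The propensity value and sample sizes are arbitrary choices for illustration:

```python
import random

def estimate_spread(true_p: float, n: int, observers: int = 2000,
                    seed: int = 1) -> tuple:
    """Simulate many observers, each estimating response probability
    from n opportunities; return the lowest and highest estimates."""
    rng = random.Random(seed)
    estimates = [sum(rng.random() < true_p for _ in range(n)) / n
                 for _ in range(observers)]
    return min(estimates), max(estimates)

lo_small, hi_small = estimate_spread(0.6, n=5)    # estimates scatter widely
lo_large, hi_large = estimate_spread(0.6, n=500)  # estimates cluster near 0.6
```

With only five observed opportunities, individual estimates range over nearly the whole unit interval; with hundreds, they converge on the underlying propensity, which is the point made above about small versus large response/reinforcer samples.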

Probability appears to be evident only in cases where patterns and molar relations (classes of behavior) are studied as the basic unit of analysis. Furthermore, the usual definition of reinforcement in terms of increasing response probability only begs the question about the effect that reinforcement has on discrete units of behavior—it says nothing about how or why increased predictability and regularity result from reinforcement. The usual definition only asserts that the response's increased predictability and regularity (as a function of probability) is predicated upon reinforcement. One might conclude that the relationship between reinforcement and response probability as it is characterized by behavior analysis is a post hoc interpretation of reinforcement—certainly not a causal account of how reinforcement affects the probability/frequency of behavior. Perhaps the strongest statement that can be made about the relationship between reinforcement and response probability is that the two are correlated—that is, reinforcement is positively correlated with an increased response probability/frequency.

A variety of experimental and conceptual considerations led Johnson and Morris to question the value of probability theory in the analysis of behavior: "If the concept of probability does not enhance the description, prediction, and control of behavior, then perhaps its role in behavior analysis should be re-evaluated" (1987:124). An alternative discussed by them is to replace the notion of probability with that of propensity, which is defined in terms of the experimental arrangement or context in which behavior occurs:

"Propensity," then, makes clear the importance of context in affecting the outcomes that probabilities are taken to predict, whether of the behavior of coin tosses or organisms. With respect to the behavior of coins, for example, a biased coin will produce different outcomes depending on the strength of the gravitational field in which it is tossed. In a weak gravitational field, the bias will have little effect; in a strong gravitational field, the bias will be enhanced. Likewise, with respect to the behavior of organisms, a propensity interpretation emphasizes the contextual nature of behavior and takes probability to be a characteristic of the experimental arrangement as a whole, not just a property of a sequence of events without reference to other conditions. (1987:124-125)

Positive and Negative Reinforcement and Ockham's Razor

The term reinforcement is further complicated by its division into positive and negative categories. On many levels, these distinctions appear arbitrary and confusing (Michael, 1975; Iwata, 1987). Positive reinforcement is distinguished from negative reinforcement by the manner in which the reinforcing event is operated upon by the animal. In the case of positive reinforcement the animal's behavior is reinforced by producing the presentation of an event, whereas in negative reinforcement the animal's behavior is reinforced by either terminating or avoiding the presentation of an event. In a certain sense, all instrumental learning can be reduced to one or the other of these categories. It simply depends on how the events are viewed and interpreted. An animal escaping and subsequently learning to avoid aversive stimulation may not in the first place "view" his success as escape-avoidance but, instead, frame the learning situation in terms of the acquisition of safety (a positive reinforcer) from aversive stimulation. Thus, under similar future circumstances of impending threat, the animal will likely select the successful behavior resulting in the acquisition of safety and relief in the past. Conversely, an animal that is deprived of free access to food and starved to 80% of its ad lib feeding weight may find the general physiological state aroused by deprivation aversive and attempt to terminate or avoid it by performing various arbitrary behaviors (e.g., key peck) to obtain food. Thus, from this perspective, working for food may be interpreted as escape-avoidance behavior aimed at reducing or terminating the aversive condition of starvation. Unfortunately, the terms positive reinforcement and negative reinforcement—although of some practical value in the everyday control of behavior—are highly subjective and appear to depend on an experimenter's point of view and bias.

In an important sense, the bifurcation of reinforcement into positive and negative categories is a rather unfortunate violation of Ockham's razor: Entia non sunt multiplicanda praeter necessitatem ("Entities are not to be multiplied beyond necessity"). Whether an animal's behavior produces or terminates/avoids the reinforcing event, the bottom line is that reinforcement is contingent on the successful prediction and control of significant impinging events. Whether these events are appetitive, sexual, social, agonistic, playful, or aversive is of only secondary interest. Regardless of an animal's disposition to learn, the goal of purposive behavior is to predict and control outcomes. Locating food when hungry and finding a successful route of escape when threatened are behaviors that are both strongly reinforced in the same general way. The reinforcement of such behavior does not depend on a hypothetical enhancement of probability but on the more immediate and real outcome of having successfully exercised decisive control over the occurrence of such events (i.e., finding food when hungry and locating a route of escape when threatened). Essentially, reinforcement occurs when an animal successfully controls any event in such a way that the animal's self-interests are served (survival) and its well-being enhanced.

An Alternative Theory of Reinforcement

According to the foregoing line of reasoning, instrumental reinforcement occurs when any behavior successfully controls a significant event or situation impinging on an animal. In other words, reinforcement does not stand apart from the reinforced behavior. In the case of classical conditioning, reinforcement occurs when a significant event is adequately predicted by anticipatory stimuli associated with its occurrence. Functionally speaking, sharp lines of distinction between instrumental and classical phenomena do not exist except under the artificial conditions of the laboratory, and even there only imperfectly. The synthetic relationship and interdependency existing between these two classes of behavior (instrumental and classical) results in the necessary conclusion that perhaps only one general form of reinforcement exists for both paradigms. Successful control depends on adequate prediction, and adequate prediction depends on successful control. When significant events are adequately predicted and controlled, the consequence is adaptive success—an enhanced state of well-being, confidence, and power.

Within this general framework, the biological and motivational inclinations driving behavior (e.g., hunger, fear, and other homeostatic needs) together with past learning experiences form an animal's disposition to learn. The disposition to learn can be fairly characterized by the sort of environmental events the animal seeks to predict or control, that is, events that the animal treats as significant. For instance, the presentation of food to a hungry dog has a far greater significance to that dog than to another dog that is satiated. In the case of learning to sit, the disposition to learn is characterized by a dog's effort to control several basic needs, including contact (affection), food (appetitive), and, perhaps, the escape-avoidance of aversive stimulation (fear). The need to predict and control the environment is directly related to the maintenance of biological, emotional, and psychological homeostasis and security. The overall goals of the disposition to learn are survival, adaptive success, enhanced power, and, ultimately, reproduction.

In any instrumental learning situation, at least three basic elements interact with one another: a signal (S), a response (R), and an outcome (O). The primary function of the S is to announce a moment when a particular behavior will most likely result in reinforcement. However, the S is much more complicated than this simple description indicates. In addition to announcing the moment and the sort of behavior most likely to result in reinforcement, the S also makes other predictions. One such prediction concerns the type (quality) and size (quantity) of the probable reinforcer available. This prediction has a pronounced effect on how the response will be affected by reinforcement. Three general variations are possible, depending on the kind of prediction involved: (1) The S under-predicts the type or size of the reinforcer (acquisition). (2) The S overpredicts the type or size of the reinforcer (extinction). (3) The S exactly predicts (verifies) the type or size of the reinforcer (maintenance).
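The three variations can be written as a simple classification rule comparing what the signal predicts with what is actually obtained. The numeric encoding of reinforcer type/size below is an assumption for illustration:

```python
def prediction_variation(predicted: float, obtained: float) -> str:
    """Classify a trial by how the signal's prediction of the reinforcer
    compares with what the response actually obtains."""
    if obtained > predicted:
        return "acquisition"   # the S under-predicted the reinforcer
    if obtained < predicted:
        return "extinction"    # the S over-predicted the reinforcer
    return "maintenance"       # the S exactly predicted (verified) the reinforcer

# A naive dog expecting nothing receives a treat for sitting:
phase = prediction_variation(predicted=0.0, obtained=1.0)  # "acquisition"
```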

Relations Between the Signal, Response, and Outcome

On a basic level, most behavioral and training events are organized and structured in terms of triads. The most obvious triadic structure is composed of the signal (S), response (R), and outcome (O). Each element in this triadic compound depends on and influences the others, forming several binary relations. These several interdependent binary relations between S, R, and O provide a great deal of information to dogs (Rescorla, 1987). For example, S (cue or command) tells the dog what to do (S-R) as well as designating the contingent outcome available (S-O), provided that the dog responds. Several other relations between S, R, and O become progressively apparent as the response is repeated in the presence of the predictive signal and the confirming occurrence of the predicted outcome during the course of training. These intertrial effects are influenced by the repetitive occurrence of the basic pattern. For example, O confirms the prediction S (R-O) while simultaneously designating the end of the trial and the possibility of another. Thus, O has a link with S as part of a general confirming relation (O-S)—that is, the outcome confirms the predictions of S, concludes the trial, and signals the possibility of a new one. The outcome of the preceding trial also affects R of the succeeding trial by making it more or less likely to the extent that the previous emission of R confirmed or disconfirmed the predictions made by S. These intertrial relations and effects extending from trial to trial are summarized thus:

1. S (R-O) produces the predictive binary relations S-R and S-O, such that O will occur, if and only if R occurs in the presence of S.

2. O (S-R) produces the confirming binary relations O-S and O-R, such that R will be more likely, if and only if S adequately predicts the presentation of O given that R occurs. Conversely, if the prediction of O given S and R is disconfirmed (e.g., reinforcement is omitted), then R will become less likely in the future.

Finally, R is also connected to S and O in terms of the control R exercises over the presentation of the predictive signal and outcome. Under circumstances of repeated practice, a dog gradually learns that R controls the reoccurrence of the predictive signal and outcome or R (S-O). This last set of relations summarizes the operative or controlling effect that the dog's behavior has on the handler's behavior. In an important sense, the handler's training behavior is controlled by the dog's recognition (as evident in his behavior) of a contingent relation between its behavior and the presentation of the predictive signals and confirming outcomes controlled by its behavior. From the handler's point of view, the dog is successfully controlled by the presentation of the predictive signals and the confirming outcomes. In other words,

3. Provided that the predictive relations S (R-O) are confirmed by O (S-R), then R (S-O) produces the operative or controlling binary relations R-S and R-O, such that R sets the occasion for the presentation of the predictive S (R-O) contingency, producing the opportunity for R to produce O again, thus further strengthening R while reinforcing the entire chain of events.

In summary, the interdependent relations produced by repeated reinforcement include prediction, confirmation, and control:

1. S(R-O): A predictive relation between the signal and the response (S-R) and the signal and the outcome (S-O).

2. O(S-R): A confirmative relation between the predicted outcome and the signal (O-S) and the predicted outcome and the mediating response (O-R).

3. R(S-O): The operative relation between the controlling response and the repeated confirmation of the predictive signal (R-S) and the predicted outcome (R-O).
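The six binary relations can be enumerated mechanically from a single S-R-O triad; the sketch below simply encodes the summary above (the field names and example values are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Trial:
    signal: str    # S: predicts that R will produce O
    response: str  # R: controls the recurrence of S and O
    outcome: str   # O: confirms S's prediction and ends the trial

def binary_relations(t: Trial) -> dict:
    """Enumerate the predictive, confirmative, and operative binary
    relations derived from the S-R-O triad."""
    return {
        "predictive":   [("S-R", t.signal, t.response), ("S-O", t.signal, t.outcome)],
        "confirmative": [("O-S", t.outcome, t.signal), ("O-R", t.outcome, t.response)],
        "operative":    [("R-S", t.response, t.signal), ("R-O", t.response, t.outcome)],
    }

rels = binary_relations(Trial(signal="sit-cue", response="sit", outcome="treat"))
```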

Besides the foregoing functions, the S also formulates predictions about the ability of the target behavior to control available outcomes. Outcome control is operationally defined in terms of the dog's relative ability to predict and control significant outcomes (see

The prediction and control of significant events result in the formation of various expectancies regarding the effectiveness of behavior to anticipate and control such events in the future. These expectancies or instrumental cognitive sets are derived from past learning experiences and are of great importance for either facilitating or retarding learning. An expectancy is confirmed or disconfirmed by the degree of correspondence between what the animal expects to occur and what actually occurs. A high degree of correspondence results in confirmation, whereas a low degree of correspondence results in disconfirmation. For example, if a dog expects to be reinforced each time it sits, but on some occasion it is not reinforced (i.e., the dog is disappointed), the generalized expectancy that sitting always results in reinforcement is disconfirmed. The disconfirmation of a generalized expectancy results in its revision into a probable or statistical expectancy—that is, the dog no longer expects to be reinforced each time it sits. Similarly, if a dog has never been reinforced as the result of sitting but happens to receive a treat on some occasion after sitting, the novel reinforcing event disconfirms the previously held expectancy that sitting is not followed by the presentation of food. In the future, the dog may now anticipate or hope for the presentation of food when it sits.

The revision of expectancies occurs in order to secure a more perfect match between past experience and current reinforcement contingencies, thus continuously refining and adjusting an animal's ability to predict and control significant events occurring within the flux impinging upon it. In an important sense, the cognitive function of expectancy is the exercise of a reality principle, establishing an informative feedback loop between the animal's past experiences and current sensory and behavioral efforts to predict and control the occurrence of significant events. The most dramatic examples of dissonance occur in cases in which highly regular and generalized expectancies are disconfirmed. The least dramatic change or dissonance occurs in cases where the disconfirmation is merely statistical and remains consistent with the animal's overall expectations. For example, a dog that is accustomed to receiving reinforcement after sitting two or three times will notice, and adjust accordingly, when it is instead reinforced only on every fifth or sixth occasion. The change in this case would be merely statistical and not nearly as dramatic as the resultant dissonance would be if the dog were all of a sudden punished each time it sat, for example.
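The revision process described above resembles a simple error-correction update, in which an expectancy is nudged toward each observed outcome by a fraction of the mismatch. The sketch below uses a delta-rule form; both the form and the learning-rate value are my assumptions for illustration, not the author's formalism:

```python
def revise_expectancy(expectancy: float, outcomes, rate: float = 0.2) -> float:
    """Nudge an expectancy (here, the expected likelihood that sitting is
    reinforced) toward each observed outcome: 1 = reinforced, 0 = omitted.
    Large mismatches (dissonance) produce large revisions; small,
    schedule-consistent mismatches produce only statistical adjustment."""
    for outcome in outcomes:
        expectancy += rate * (outcome - expectancy)  # shrink the mismatch
    return expectancy

# A dog that always expected reinforcement meets a leaner schedule:
revised = revise_expectancy(1.0, [1, 0, 0, 1, 0, 0, 1, 0, 0, 0])
```

After such a run the generalized expectancy ("sitting is always reinforced") has been revised into a probable one, settling near the observed rate of reinforcement rather than at certainty.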


What is the relationship between reinforcement and punishment? Traditionally, behavior analysis defines punishment in terms of the effect an event has on behavior insofar as its presentation (positive punishment) or omission (negative punishment) suppresses or lowers the future probability/frequency of the behavior it follows. However, defining punishment as a suppressive event is to describe it in terms of its most superficial and general attributes. As it stands, this definition of punishment might be construed to include events that are clearly not intended as punishment. For example, when dogs are reinforced with food, other possible behaviors, except those directly facilitating access to food and eating, are suppressed and made less likely to occur in the future by the reinforcer's presentation. Similarly, aversive stimulation suppresses all concurrent behavior at the moment except the response that results in the termination of aversive stimulation.

An alternative definition of punishment may be stated in terms of prediction and control. According to this interpretation, punishment is defined as occurring whenever a behavior fails to anticipate and control a significant event adequately. Punishment is not something done to a behavior or to an animal but rather something that the behavior itself does or fails to do—that is, it fails to appropriate an important resource or to escape or avoid an aversive or dangerous situation. This failure can be traced to any number of factors. Instrumental punishment often results when stimulus events are inadequately predicted or when correct predictions are not followed into effective action. For example, if a hungry dog fails to obtain a piece of food for sitting because it misses a signal or fails to sit in a timely fashion, the dog is punished—not indirectly as the result of the withdrawal of the appetitive opportunity—but directly as the result of its failure to control the opportunity to obtain food. Conversely, if the same dog fails to terminate or avoid an aversive event by sitting because it misses a signal or fails to sit in a timely fashion, the dog is punished—not indirectly as the result of the presentation of the aversive event—but directly as a result of its failure to control the presentation of the aversive event.

Punishment is associated with the elicitation of various concomitant emotional states, especially fear and frustration. Punishment resulting from a failure to predict a reinforcing event results in fear/anxiety, whereas a failure to control the occurrence of a reinforcing event results in frustration. These emotional reactions facilitate adaptation in cases where prediction and control are compromised. Fear/anxiety serves to heighten vigilance and, thereby, improves the likelihood of anticipating future stimulus events associated with reinforcement. Frustration, on the other hand, serves to invigorate or amplify behavioral efforts aimed at restoring instrumental control over available reinforcers.

Within certain limits, both anxiety and frustration contribute beneficially to the efficiency of the learning process. However, in cases involving high levels of fear or frustration, learning may be adversely affected by these otherwise potentiating and useful states. Under conditions involving high levels of anxiety (unpredictability) and high levels of frustration (uncontrollability), a variety of conflict-driven learning dysfunctions are precipitated. Learning situations in which significant events are both unpredictable and uncontrollable are prone to produce pathological emotional states (e.g., PTSD) and abnormal behavior patterns (e.g., learned helplessness—see Chapter 9). On the other hand, a high degree of control and predictability over significant resources and stimuli occasioning their presentation or escape-avoidance (as may be appropriate from moment to moment) fosters successful adaptation and a sense of well-being.

To a considerable extent, it boils down to a matter of whether one views punishment from the perspective of an event produced by behavior (the animal's perspective) or as an event done to behavior (the trainer's perspective).
