Fig. 7.1. Various ways in which the frequency of behavior is influenced by the consequences it produces. Panel labels: presentation of reward; withdrawal or omission of aversive; presentation of aversive; withdrawal or omission of reward.
plex behavioral skills can be acquired. In an important sense, dogs are always learning how to learn.
Two complementary motivations drive instrumental learning: the maximization of positive outcomes and minimization of aversive ones. These complementary motivations correspond to the notions of positive and negative reinforcement. If a response becomes more probable as the result of its producing a desirable consequence (e.g., petting and food), then the potentiating effect is referred to as positive reinforcement. Conversely, if a response becomes more probable by its terminating or avoiding an aversive stimulus (e.g., leash correction), then the effect is referred to as negative reinforcement. Positive and negative reinforcement are the two primary ways in which goal-directed behavior is acquired and maintained.
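The contingencies described above, together with the four panel labels of Fig. 7.1, can be laid out as a small lookup table. The following Python sketch is illustrative only: the names `Stimulus`, `Change`, and `classify` are hypothetical, and the two punishment labels are standard operant-conditioning terms assumed here, since this passage names only the two reinforcement cases.

```python
from enum import Enum


class Stimulus(Enum):
    REWARD = "reward"
    AVERSIVE = "aversive"


class Change(Enum):
    PRESENTED = "presented"
    WITHDRAWN = "withdrawn or omitted"


# Maps (stimulus, change) to (name of the effect, direction of the
# change in response frequency). The punishment labels are assumed
# standard terminology, not taken from the passage itself.
CONTINGENCIES = {
    (Stimulus.REWARD, Change.PRESENTED): ("positive reinforcement", "increases"),
    (Stimulus.AVERSIVE, Change.WITHDRAWN): ("negative reinforcement", "increases"),
    (Stimulus.AVERSIVE, Change.PRESENTED): ("positive punishment", "decreases"),
    (Stimulus.REWARD, Change.WITHDRAWN): ("negative punishment", "decreases"),
}


def classify(stimulus, change):
    """Return (effect name, direction) for a response-produced consequence."""
    return CONTINGENCIES[(stimulus, change)]


# Petting or food follows the response.
print(classify(Stimulus.REWARD, Change.PRESENTED))
# The response terminates or avoids a leash correction.
print(classify(Stimulus.AVERSIVE, Change.WITHDRAWN))
```

The table makes the symmetry explicit: "positive" and "negative" describe whether a stimulus is presented or withdrawn, while "reinforcement" and "punishment" describe whether the response becomes more or less frequent.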
Typical reinforcement events satisfy some physiological or psychological need. To hungry dogs, the opportunity to acquire a savory treat is worth effort and work. If the acquisition of food is made contingent on a dog sitting when requested to do so, the dog will quickly learn that sitting on cue results in the acquisition of the desired treat (positive reinforcer). After several such experiences, the probability that the dog will sit on cue increases, and it will continue to increase as long as the performance is reinforced and the dog remains motivated, or until additional learning is not possible (asymptote). In the foregoing case, the dog learns that a causal connection exists between the presence of a specific cue or discriminative stimulus (SD), a response (R), and a resulting positive reinforcer (SR+). Through this simple lesson, the dog not only learns how to sit but, more importantly, learns that its actions can control the environment, an outcome that makes learning itself intrinsically rewarding.
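The rise of response probability toward an asymptote can be illustrated with a simple linear-operator learning rule, in which each reinforced trial closes a fixed fraction of the remaining gap to the asymptote. This is a minimal sketch under stated assumptions, not a model proposed in the text: the function name `acquisition_curve` and the values of `learning_rate`, `asymptote`, and `start` are hypothetical and chosen only for illustration.

```python
def acquisition_curve(trials, learning_rate=0.3, asymptote=1.0, start=0.1):
    """Return the response probability after each reinforced trial.

    Assumed rule (illustrative only): p <- p + learning_rate * (asymptote - p)
    """
    p = start
    curve = []
    for _ in range(trials):
        # Each reinforced trial closes a fixed fraction of the remaining gap.
        p += learning_rate * (asymptote - p)
        curve.append(p)
    return curve


curve = acquisition_curve(10)
for trial, p in enumerate(curve, start=1):
    print(f"trial {trial:2d}: P(sit on cue) = {p:.3f}")
```

Under this rule the gains are largest on the first trials and shrink as the probability nears the asymptote, which matches the qualitative description of learning that continues until no further improvement is possible.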
Negative reinforcement occurs when a dog discovers that a particular response terminates or avoids the presentation of an aversive stimulus. A natural example can be observed when a dog, having stayed too long in the sun, finds relief by moving to nearby shade. Moving out of the direct sunlight into the shade is a negatively reinforced behavior because it terminates the aversive condition of overheating. Traditional obedience training makes liberal use of negative reinforcement. For example, the sit exercise is often taught by applying an upward pull on the leash and collar coupled with a downward pressure on the rump. The forces involved are mildly aversive. Under such stimulation, most dogs will at first struggle and attempt to resist the pressure, but after several trials they usually learn to escape it by following the applied forces in the correct direction and successfully learn to sit under compulsion. If a word cue ("Sit") is presented before the onset of the pressure, the dog will learn to avoid the aversive event by sitting in response to the cue alone. After several such trials, the dog will begin to recognize a causal linkage between the presentation of the avoidance cue, specific and timely action, and the avoidance of the anticipated aversive outcome. Such learning depends on anticipatory signals that reliably predict response-produced outcomes. This pattern is confirmed (acquisition) or disconfirmed (extinction) by repeated experience.
There are two general sources from which positive and negative incentives are derived: intrinsic (part of the task itself) and extrinsic (external to the task). Intrinsic incentives are those attractive and aversive motivational inducements that belong to the task itself. Intrinsic positive reinforcers are inherent to behaviors (e.g., playing ball, chasing a cat, or jumping on guests) that are enjoyed in and of themselves and maintained without additional external reinforcement. Intrinsic negative reinforcers, on the other hand, are inherent to the relief provided by behaviors that avoid or terminate situations that are annoying in and of themselves (e.g., growling or snapping when threatened or escaping confinement when left alone). Extrinsic incentives include all positive and negative inducements that derive from sources other than the behavior itself (e.g., various attractive and aversive events). Intrinsically reinforced behavior is acquired and maintained under natural reinforcement contingencies, whereas extrinsic incentives are provided contingently by the trainer. Both intrinsic and extrinsic incentives play important roles in dog training and behavior modification.
Understanding that behavior is modified by its consequences is an important insight into how dogs learn. Timing and repetition also play crucial roles in the training process. For a reinforcer to be effective, it must closely follow the target behavior. Optimally, the reinforcer should be presented immediately after the target behavior is emitted. Further, the connection between the reinforcer and the target behavior is strengthened by frequent repetitions. With practice, dogs learn to expect the eventual presentation of the positive reinforcer as the result of emitting the selected behavior.
Behavior is a fluid phenomenon with each event flowing seamlessly into the next. Under natural conditions, no edges or boundaries sharply separate one behavior from another. Behavioral differentiation occurs as the result of selectively reinforcing responses and sequences of the dog's behavior that are compatible with the trainer's objectives and ignoring or punishing behavior that is not. This process of selection strengthens certain tendencies and patterns while extinguishing or suppressing other aspects of the dog's behavior. As a result of such pressure and change, the dog's behavior is adjusted to fit and respond to the demands made upon it by domestic life.
The structuring of behavior is accomplished by the differential presentation and withdrawal of reinforcement or punishment. Since behavior is fluid, it is important that the reinforcing or punitive events coincide exactly with the behavior being strengthened or weakened. Unfortunately, dogs cannot be directly reinforced with most tangible rewards (e.g., food and petting) at the exact moment that they emit the target behavior, especially if the behavior occurs while they are some distance away. Likewise, punitive events are effective only if they are timed to coincide with the occurrence of the target behavior.
These problems are solved by using remote stimuli that temporarily take the place of the reinforcer or punisher until the latter can be delivered to the dog. On the one hand, the so-called bridging stimulus or conditioned reinforcer (Sr) serves to bridge the emission of the target response with the acquisition of a positive reinforcer. On the other hand, the conditioned punisher (Sp) suppresses unwanted behavior through its association with the loss of an expected reinforcer or the impending presentation of a punishing aversive event. Conditioning the Sr is a Pavlovian process in which the bridging stimulus (e.g., "Good") is repeatedly paired with the presentation of the positive reinforcer or the termination of a negative reinforcer. Similarly, a conditioned punisher is produced by pairing the bridging stimulus (e.g., "No") with the loss of positive reinforcement or the presentation of an aversive punishing event.
Additional Characteristics of Positive Reinforcement
The reinforcer is conceptualized as a contingent event capable of satisfying some biological necessity or drive that, when presented upon the emission of some behavior, will make the occurrence of that behavior more likely under similar circumstances and states of motivation in the future. For example, presenting a biscuit to a hungry dog after it sits will make the dog more likely to sit in the owner's presence in the future when hungry. In practice, however, reinforcement is much more complicated than this reward paradigm suggests, exhibiting many irregular and, perhaps, unanticipated characteristics. For example, while the opportunity to eat represents a strong reinforcer for a hungry dog, the dog may also find just smelling the food reinforcing (Long and Tapp, 1967). There are several other characteristics of positive reinforcement that should be kept in mind. The incentive (or conditioned reinforcement associated with the work and the anticipation of reinforcement) may be more strongly reinforcing than the actual reward or unconditioned reinforcer itself. Highly desirable rewards may generate faster acquisition of simple skills but retard the acquisition of more complicated ones. Large food rewards generate an enthusiastic performance while the food is available but result in learning that is more prone to extinction when the food is withdrawn. Smaller rewards may not generate very much enthusiasm initially, but learning acquired under the control of small rewards is more resistant to extinction. Finally, slow, steady learning is the most resistant to extinction (Tarpy, 1982).