Instinctive Drift and Appetitive Learning

Just as innate defensive reactions often obstruct efficient avoidance learning, competing appetitive and exploratory tendencies sometimes surface during reward training and can significantly impede positive learning (Bolles, 1972). Animals trained to perform simple chains of behavior for food reinforcement often spontaneously exhibit adjunctive behavior patterns that distract them from completing the trained sequence. This interference occurs in spite of intensive and lengthy training efforts; in fact, it appears to worsen as conditioning proceeds. Breland and Breland (1961), animal trainers who used operant conditioning to train a variety of species to perform for entertainment and commercial purposes, found that a number of tasks were disrupted by species-specific appetitive and exploratory behaviors that appeared spontaneously during the course of training. These interference effects arising under food reinforcement are collectively referred to as instinctive drift. In one case, a pig was trained to pick up wooden coins with its mouth and deposit them in a piggy bank. The pig readily learned the task but over time began to play with the coins, repeatedly picking them up and dropping them, throwing them into the air, or rooting them about with its snout, all behaviors associated with normal pig exploratory and appetitive behavior. Similarly, raccoons taught a comparable task would persistently fondle the coins before dropping them into the bank or periodically dip a coin into the bank as though washing it. When given more than one coin at a time, the raccoons tended to rub them together instead of dropping them into the box as they had been trained to do. The Brelands formulated the following conclusion regarding instinctive drift:

The general principle seems to be that wherever an animal has strong instinctive behaviors in the area of the conditioned response, after continued running the organism will drift toward the instinctive behavior to the detriment of the conditioned behavior and even to the delay or preclusion of the reinforcement. In a very boiled-down, simplified form, it might be stated as "learned behavior drifts toward instinctive behavior." (1961:684)

It is interesting to note that the sensory-motor modalities involved in this phenomenon are consistent with an interpretation involving corticothalamic dominance previously discussed (see Chapter 3). Under Welker's (1973) model of thalamocortical dominance, pigs are rooters, raccoons are feelers, and pigeons are beholders. Furthermore, some self-reinforcement stemming from hypothalamic-limbic feedback occurring during the emission of appetitive behavior may help to explain instinctive drift while at the same time preserving reinforcement theory. The locus of reinforcement supporting instinctive drift is internally articulated on brain reward sites associated with drive induction and preparatory appetitive responding. The arbitrary operant, on the other hand, may be more adequately conceptualized as belonging to, or conditionally associated with, the consummatory action and subsequent drive reduction. Although both are reinforcing, the action of preparing to eat may be intrinsically more rewarding than eating itself. Motivationally, this makes sense, since finding food requires far more effort (and therefore more incentive and conditional reinforcement prior to consumption) than eating it.

Many other problems with the traditional conceptualization of appetitive operant learning have emerged in the laboratory. Brown and Jenkins (1968) discovered that pigeons could learn the key-peck response without being trained to do so by the experimenter. They found that pigeons readily acquired the habit of key pecking when simply exposed to an active key that was programmed to light for 8 seconds and then shut off just before the presentation of food. The process, known as autoshaping, has drawn a great deal of attention, since it implies that the key-peck response may not be, strictly speaking, an operant at all but rather an elicited response acquired without depending on contingent positive reinforcement. It should be noted that while the initial key-peck response was not shaped or prompted, subsequent pecks at the lighted key were linked with the lighted key (conditioned reinforcement) and the presentation of food (i.e., positive reinforcement). The explanation for autoshaping may simply rest on the pigeon's high operant level for pecking and the occurrence of incidental reinforcement (i.e., superstitious learning). Seligman (1970) speculated that the autoshaping phenomenon depends on a high degree of preparedness in pigeons to associate pecking with the acquisition of food.

Subsequent experiments have yielded results that are even more difficult to explain by resorting to reinforcement theory. Williams and Williams (1969) designed an experiment in which key pecking never resulted in the acquisition of food but instead canceled its delivery on that trial; that is, the bird was punished for pecking. Despite the negative punishment contingency, the pigeons maintained the key-peck response at a low rate (responding in one-third of the trials) and persisted in performing the response over the course of several hundred trials without stopping, even though the effort resulted in the omission of reinforcement. This finding is consistent with the observations of Breland and Breland. In effect, animals that are learning a response closely linked with an innate appetitive-consummatory action tend to drift into its performance despite the apparent reinforcement contingencies. The appetitive-consummatory action itself overshadows the arbitrary operant being rewarded. Jenkins and Moore (1973) observed that pigeons autoshaped to peck at a key for water or grain exhibited distinctively different response topographies. Pigeons trained to peck for grain exhibited a response resembling that used during eating, whereas those trained to peck for water did so in a manner topographically similar to drinking. This has led to some speculation that what is being learned is not an operant at all but rather a classically conditioned consummatory response.

While key pecking is rapidly acquired during appetitive training, not all responses are equally easy to shape. For instance, teaching dogs to scratch, yawn, or lift the rear leg in the typical urination posture is very difficult. Thorndike (1911/1965) found that cats worked much harder to escape problem boxes than did dogs. Many dogs would simply accept confinement in the box and not make the requisite effort to escape and obtain the proffered food reward. Thorndike noted that dogs tended to remain in the front of the box, fixed attentively on the food that remained out of reach. Unlike the cat, the dog "wants to get to the food, not out of the box" (1911/1965:59). Even among general obedience exercises, some behaviors are learned more easily than others. While the average dog readily learns to sit in exchange for a treat, many dogs "resent" being prompted to lie down and may actively resist such training efforts, even when the training is carried out with food rewards alone.

