Instrumental Conditioning


Also called Operant Conditioning

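Instrumental Conditioning Procedures

Procedures are defined by the type of outcome (appetitive or aversive) and by the response-outcome contingency: positive (R produces O) or negative (R prevents O).

• Positive reinforcement (appetitive outcome, positive contingency): response increases

• Punishment (aversive outcome, positive contingency): response decreases

• Omission training (appetitive outcome, negative contingency): response decreases

• Negative reinforcement (aversive outcome, negative contingency): response increases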
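The four procedures above can be summarized as a lookup from outcome type and contingency to the procedure name and its effect on responding. The following is a minimal sketch for study purposes; the names `PROCEDURES` and `classify` are hypothetical and do not come from the slides.

```python
# Illustrative lookup table; the identifiers are hypothetical, not from the slides.
# Key: (outcome type, response-outcome contingency)
# Value: (procedure name, effect on the response)
PROCEDURES = {
    ("appetitive", "positive"): ("positive reinforcement", "increases"),
    ("aversive", "positive"): ("punishment", "decreases"),
    ("appetitive", "negative"): ("omission training", "decreases"),
    ("aversive", "negative"): ("negative reinforcement", "increases"),
}


def classify(outcome, contingency):
    """Return (procedure name, effect on responding) for one cell of the table."""
    return PROCEDURES[(outcome, contingency)]


if __name__ == "__main__":
    # A response that prevents an aversive outcome
    print(classify("aversive", "negative"))
```

For example, a response that prevents an aversive outcome falls under negative reinforcement, so responding increases.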

Instrumental Conditioning involves three key elements:

• a response

• an outcome (the reinforcer)

• a relation, or contingency, between the R and O

The Instrumental Response

• usually an arbitrary motor response

• for example, bar-pressing has nothing to do with eating food

• there are limits on the types of responses that can be modified by instrumental conditioning

• relevance, or belongingness, is an issue in instrumental conditioning as well as in Pavlovian conditioning

Relevance, or Belongingness, in Instrumental Conditioning

Certain responses naturally ‘belong with’ the reinforcer because of the animal’s evolutionary history

Just as not all CSs are equally associable with all USs, not all responses can be conditioned equally well with all reinforcers

Shettleworth tried to condition various behaviors with food reward in hamsters

• used a number of different behaviors, including digging and face-washing

[Figure: mean time spent in behavior across trials, for digging and face-washing]

• some responses are more relevant to food reward than others

• behaviors such as digging increase the chances of coming into contact with food

• face-washing does not increase the chances of coming into contact with food and may even interfere with food-related behaviors

Instinctive Drift

• The Brelands trained many different species to perform tricks for ads, movies, etc. (e.g., pigs putting coins in a piggy bank).

• Often they found that once the response was trained, it would deteriorate; other “instinctive” behaviors (e.g., rooting the coins) would “drift” in and interfere with performance of the operant response.

• The pigs treated the coins as if they were food, and these food-related behaviors interfered with the response the Brelands were trying to condition.

The Instrumental Reinforcer

Increases in the quantity or quality of the reinforcer can increase the rate of responding

Experiment by Hutt (1954) – described in the text

In runway experiments, animals will run faster for bigger reward

Responding to a particular reward also depends on an animal’s past experience with other reinforcers

Experiment by Mellgren (1972) – described in the text

Experiment by Crespi (1942)

3 groups of rats were given 20 trials to run down an alleyway for food

Group 1: large reward – 64 pellets

Group 2: medium reward – 16 pellets

Group 3: small reward – 4 pellets

[Figure: mean running speed during baseline for Groups 1, 2, and 3]

Crespi (1942)

In phase 2, the reward level was switched for 2 groups

Group 1: shifted from 64 to 16 pellets; negative contrast

Group 2: kept at 16 pellets

Group 3: shifted from 4 to 16 pellets; positive contrast

Crespi compared groups who were switched to 16 pellets from a large or small reward to a group consistently given 16 pellets

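[Figure: mean running speed for Groups 1, 2, and 3 during baseline and test (shift) trials; shift conditions 4-16, 16-16, and 64-16]

Negative contrast (64-16 pellets): ran slower than the 16-16 group

Positive contrast (4-16 pellets): ran faster than the 16-16 group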

Positive and negative contrast indicate that behavior is not just affected by current conditions

Performance is also affected by previous reward conditions
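The direction of the contrast effect in a reward-shift design such as Crespi's can be written as a simple comparison of the previous and current reward. This is only an illustrative sketch; the function name `contrast_type` is hypothetical, and it returns the expected label for a group, not any measured running speed.

```python
def contrast_type(previous_pellets, current_pellets):
    """Label the expected contrast effect for a shift in reward size.

    Hypothetical helper for illustration: the label describes how a shifted
    group is expected to perform relative to a group kept on the current
    reward throughout.
    """
    if previous_pellets > current_pellets:
        # Shifted down: responds less than the unshifted group
        return "negative contrast"
    if previous_pellets < current_pellets:
        # Shifted up: responds more than the unshifted group
        return "positive contrast"
    return "no shift"


if __name__ == "__main__":
    # The three phase-2 conditions described above
    for previous in (64, 16, 4):
        print(previous, "->", 16, ":", contrast_type(previous, 16))
```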

The Response – Reinforcer Relation

Two types of relationships exist between a response and a reinforcer:

• temporal relationship: temporal contiguity refers to the delivery of the reinforcer immediately after the response

• causal relationship: response-reinforcer contingency refers to the extent to which the response is necessary and sufficient for the occurrence of the reinforcer
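One common way to put a number on response-reinforcer contingency, which the slides themselves do not state, is the difference between the probability of the outcome given a response and the probability of the outcome given no response. The sketch below assumes that measure; the function name `delta_p` and the counts in the usage lines are hypothetical and are not data from any experiment.

```python
def delta_p(outcomes_with_r, periods_with_r, outcomes_without_r, periods_without_r):
    """Contingency as P(O | R) - P(O | no R).

    Assumed measure for illustration, not taken from the slides.
    +1 means the response is necessary and sufficient for the outcome,
     0 means the outcome is unrelated to the response,
    -1 means the response reliably prevents the outcome.
    """
    p_o_given_r = outcomes_with_r / periods_with_r
    p_o_given_no_r = outcomes_without_r / periods_without_r
    return p_o_given_r - p_o_given_no_r


if __name__ == "__main__":
    # Hypothetical counts: outcome follows every response and never occurs otherwise
    print(delta_p(10, 10, 0, 10))
    # Hypothetical counts: outcome is equally likely with or without the response
    print(delta_p(5, 10, 5, 10))
```

Under this measure, a positive contingency (R produces O) gives a value above zero and a negative contingency (R prevents O) gives a value below zero.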

Effects of temporal contiguity

Instrumental learning is disrupted by delaying the reinforcer after the response

Dickinson et al. (1992)

• rats were reinforced for lever-pressing

• the delay between occurrence of the response and delivery of the reinforcer was varied

[Figure: lever presses per minute as a function of delay (s)]

Why is instrumental conditioning so sensitive to a delay of reinforcement?

Delay makes it difficult to figure out which response is being reinforced

There are ways to overcome the problem:

1. Provide a secondary, or conditioned, reinforcer immediately after the response, even if the primary reinforcer does not occur until later

A secondary or conditioned reinforcer is a conditioned stimulus that was previously associated with the reinforcer

Conditioned reinforcers can serve to ‘bridge’ a delay between the response and the primary reinforcer

2. Another technique that facilitates learning with delayed reinforcement is to mark the target response to distinguish it from other responses

The marking procedure was demonstrated by Lieberman et al. (1979)

They tested whether rats could learn a correct turn or choice in a maze despite a long delay of reward

[Figure: maze layout with a start box, a choice box with white and black alleyways, a delay box, and a goal box]

Subjects were placed in the start box and allowed to choose one of two alleyways (White was correct)

Three groups:

Group 1: Light – after they made a choice, rats in this group received a 2 s light (regardless of choice) and were allowed to go to the delay box

Group 2: Noise – treated the same, except with a 2 s noise

Group 3: Control – no stimulus; went directly to the delay box after the choice

All rats confined to the delay box for 2 min, then allowed to go to the goal box. Food was given, but only if they had chosen white.

Results:

[Figure: mean percent correct across trials for the Control, Noise, and Light groups]

Control group stayed at approximately 50% correct

Light and Noise groups learned the discrimination (i.e., learned to choose white over black)

So why did the light and noise improve discrimination learning?

• the cues helped to mark the choice response in memory

• after making a choice and receiving the light or noise, subjects more effectively rehearsed the choice they had just made

• when reward was given later, after the 2 min delay, the memory of the previous choice was stronger

• these effects of marking cannot be explained in terms of secondary or conditioned reinforcement, because the marking stimulus was presented after both correct and incorrect choices
