Operant conditioning (Skinner – 1938, 1956). Skinner box A small soundproof chamber in which an...

Post on 13-Jan-2016


Operant conditioning (Skinner – 1938, 1956)

Skinner box

• A small soundproof chamber in which an experimental animal learns to make a particular response for which the consequences are controlled.

Skinner’s original experiments

• When a hungry rat was placed in a Skinner box, it scurried around the box and randomly touched parts of the floor and walls.

• Eventually, it accidentally pressed a lever and a food pellet appeared.

• The rat continued random movement and pressed the lever again. Another food pellet appeared.

• With additional repetitions of lever pressing followed by food, the rat’s random movements were replaced by consistent lever pressing.

• The rat “learned” that particular behaviours resulted in a desirable consequence.

Operant conditioning

• The learning process by which the likelihood of a particular behaviour occurring is determined by the consequences of that behaviour, and the environment (antecedent).

• The organism will tend to repeat behaviour which has a desirable consequence (reinforcement) and tend not to repeat behaviour which has an undesirable consequence (punishment).

– Also called instrumental conditioning.

A B C

• Antecedent

– Environment or context

• Behaviour

– Voluntary behaviours made by the organism

• Consequence

– Positive (reinforcement) or negative (punishment)

Types of consequences

• Reinforcement

– Positive reinforcement

• ADD positive to INCREASE behaviour

– Negative reinforcement

• REMOVE negative to INCREASE behaviour

• Punishment

– Punishment

• ADD negative to DECREASE behaviour

– Response cost

• REMOVE positive to DECREASE behaviour
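
The four consequence types above form a two-by-two grid: what is done with the stimulus (add or remove) against what kind of stimulus it is (positive or negative). A minimal sketch of that grid as a lookup table follows. The function and argument names are illustrative assumptions, not terms from the slides.

```python
# Hypothetical helper names; the four labels come from the list above.
CONSEQUENCE_TYPES = {
    ("add", "positive"): "positive reinforcement",
    ("remove", "negative"): "negative reinforcement",
    ("add", "negative"): "punishment",
    ("remove", "positive"): "response cost",
}


def classify_consequence(action, stimulus):
    """Return the consequence type for an action ('add' or 'remove')
    applied to a stimulus ('positive' or 'negative')."""
    return CONSEQUENCE_TYPES[(action, stimulus)]


def effect_on_behaviour(consequence_type):
    """Both kinds of reinforcement increase behaviour;
    punishment and response cost decrease it."""
    if consequence_type.endswith("reinforcement"):
        return "increase"
    return "decrease"


for key, name in CONSEQUENCE_TYPES.items():
    print(key, "->", name, "->", effect_on_behaviour(name))
```

Note that "negative" in negative reinforcement refers to the stimulus being removed, not to the effect on behaviour, which is still an increase.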

Reinforcement

• Any event which strengthens, increases the frequency, or increases the likelihood of a particular response occurring.

– Reinforcer: The stimulus that provides reinforcement. Often used interchangeably with the term reward.

Types of reinforcement

• Positive reinforcement

– The application of a pleasant stimulus following a response. The behaviour (response) is strengthened (more likely to occur again).

• Negative reinforcement

– The removal or avoidance of an unpleasant stimulus. Because the outcome is a pleasant one, the behaviour that removes or avoids the unpleasant stimulus is strengthened (more likely to occur again).

Punishment

• When a response is followed by a negative consequence (an unpleasant event or taking away something that is pleasant) which decreases the likelihood of that response occurring again over time.

• The weakening of a response by following it with negative consequences is not negative reinforcement – it is punishment.

Punishment considerations

• Order of presentation

– Consequence (punishment) must be presented after the undesired behaviour, never before

• Timing

– Punishment is most effective when given immediately after the behaviour has occurred (without delay)

• Appropriateness

– To be effective, a reinforcer must be satisfying and a punishment must be unpleasant

Are speeding fines enough of a deterrent? What else can be done to stop speeding?

Types of reinforcement

• Continuous reinforcement: When a correct response is reinforced every time it occurs.

• Partial reinforcement: Reinforcing only some correct responses.

• Schedule of reinforcement: The frequency and manner in which a correct response is reinforced.

Fixed and variable schedules

• Fixed-ratio schedule: reinforcer given after a fixed number of correct responses

– eg, after every 10th correct response

• Variable-ratio schedule: reinforcer given after a variable number of correct responses

– eg, almost random: the number of correct responses that must be made changes unpredictably from one reinforcer to the next

• Fixed-interval schedule: reinforcer given after a fixed amount of time since last reinforcer (eg, every 10 seconds).

• Variable-interval schedule: reinforcer given after a variable amount of time since the last reinforcer

– eg, almost random: the amount of time that must pass changes unpredictably from one reinforcer to the next
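
The four schedules differ only in what must happen before the next reinforcer: a count of responses (ratio) or a passage of time (interval), fixed or varying. The sketch below is one way to simulate that distinction. All class and function names are hypothetical, and the code rests on three assumptions that are not in the slides: variable requirements are drawn from a uniform distribution around a mean, time starts at 0, and on interval schedules the reinforcer is delivered for the first response made once the interval has elapsed.

```python
import random


class FixedRatio:
    """Reinforce every n-th correct response."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        # t (time of the response) is ignored: only the count matters.
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after a response count that varies around mean_n."""

    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform over 1 .. 2*mean_n - 1, which averages mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `seconds` have passed
    since the last reinforcer (time starts at 0)."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.last = 0.0

    def respond(self, t):
        if t - self.last >= self.seconds:
            self.last = t
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but the required wait varies around mean_seconds."""

    def __init__(self, mean_seconds, rng):
        self.mean_seconds = mean_seconds
        self.rng = rng
        self.last = 0.0
        self.wait = self._draw()

    def _draw(self):
        # Assumption: uniform over 0 .. 2*mean_seconds.
        return self.rng.uniform(0.0, 2.0 * self.mean_seconds)

    def respond(self, t):
        if t - self.last >= self.wait:
            self.last = t
            self.wait = self._draw()
            return True
        return False


def reinforced_responses(schedule, response_times):
    """Return the times of the responses that earned a reinforcer."""
    return [t for t in response_times if schedule.respond(t)]


# One response per second for 30 seconds.
times = [float(t) for t in range(1, 31)]
fixed_ratio_hits = reinforced_responses(FixedRatio(10), times)
fixed_interval_hits = reinforced_responses(FixedInterval(10.0), times)
variable_ratio_hits = reinforced_responses(
    VariableRatio(10, random.Random(0)), times
)
print(fixed_ratio_hits, fixed_interval_hits, variable_ratio_hits)
```

With one response per second, the two fixed schedules reinforce at the same moments. They separate once the response rate changes: responding faster earns more reinforcers on a ratio schedule but not on an interval schedule.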

Which schedule does gambling use and why?

Shaping

• Shaping: An operant conditioning procedure in which a reinforcer is given for any response that successively approximates and ultimately leads to the final response, or target behaviour. Also called the method of successive approximations.

Acquisition

• The overall learning process in which a specific response, or pattern of responses, is established through reinforcement.

– A pigeon receives a food pellet every time it pecks a disk, until the behaviour is established.

Extinction

• The gradual decrease in the strength or rate of a conditioned response following consistent non-reinforcement of the response.

– When a pigeon is no longer reinforced for pecking a disk, the behaviour will diminish over time and eventually extinguish.
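
Acquisition and extinction can be pictured as the same response strength moving in opposite directions: up while responses are reinforced, down once reinforcement stops. The sketch below uses a simple linear update rule as a toy illustration. The rule, the `learning_rate` value, the starting strength and the function names are all assumptions chosen for the example, not a model given in the slides, and the sketch does not capture spontaneous recovery.

```python
def update_strength(strength, reinforced, learning_rate=0.2):
    """Move response strength (0 to 1) a fixed fraction of the way
    towards 1 after a reinforced response, or towards 0 after a
    non-reinforced response."""
    target = 1.0 if reinforced else 0.0
    return strength + learning_rate * (target - strength)


def run_phase(strength, reinforced, trials, learning_rate=0.2):
    """Apply the same consequence on every trial and record the strength."""
    history = []
    for _ in range(trials):
        strength = update_strength(strength, reinforced, learning_rate)
        history.append(strength)
    return history


# Acquisition: every disk peck is followed by a food pellet.
acquisition = run_phase(strength=0.05, reinforced=True, trials=20)

# Extinction: pecking is no longer reinforced.
extinction = run_phase(strength=acquisition[-1], reinforced=False, trials=20)

print("after acquisition:", round(acquisition[-1], 3))
print("after extinction: ", round(extinction[-1], 3))
```

In this toy rule the strength rises quickly at first and then levels off during acquisition, and declines gradually rather than abruptly during extinction, which matches the "gradual decrease" in the definition above.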

Spontaneous recovery

• After the apparent extinction of a conditioned response, the organism again shows the response in the absence of any reinforcement.

– A pigeon that was trained 6 months previously to peck a disk for food and then had that response extinguished, pecks the disk when returned to the Skinner box, without reinforcement.

Stimulus generalisation

• The correct response occurs to another stimulus that is similar to the stimulus that was presented when the response was reinforced.

– A pigeon pecks at a disk that is a different shape from the one it was trained to peck.

Stimulus discrimination

• The organism makes the correct response to a stimulus and is reinforced, but does not respond to any other stimulus, even a similar one.

– The pigeon will only peck the round disk in the Skinner box, and not any other shaped disk.

Real-life applications

• Animal training

• Behaviour modification in therapy, schools, workplace, etc

• Treatment of addictive behaviour, eg gambling

• “Loyalty” programs such as coffee clubs, frequent flyer programs, etc