Operant Conditioning Unit 4 - AoS 2 - Learning

Trial and Error Learning

• An organism’s attempts to learn or solve a problem by trying alternative possibilities until a correct solution or desired outcome is achieved.

• Often involves many attempts (trials) and incorrect choices (errors)

• Was called instrumental learning, now operant conditioning - the learner ‘operates’ on the environment

Thorndike’s Puzzle Boxes

• Put hungry cats into a ‘puzzle box’, with food placed outside the box, out of reach

• Cat had to get out of box to get food.

• The more times a cat was put in the box, the faster it got out (fewer trials)

• After 7 trials, the cat would go straight for the lever and get out immediately.

• Lever pushing was now learnt, not random

Thorndike’s Law of Effect

• a behaviour that is followed by ‘satisfying’ consequences is strengthened (more likely to occur) and a behaviour that is followed by ‘annoying’ consequences is weakened (less likely to occur)

• Called instrumental learning because the cat is instrumental in obtaining its release
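The Law of Effect can be sketched as a simple update rule. This is only a toy model, not Thorndike's own formulation: the function name `apply_law_of_effect`, the 0 to 1 strength scale and the `step` size are all assumptions made for illustration.

```python
def apply_law_of_effect(strength, satisfying, step=0.2):
    """Return the updated response strength (stays between 0 and 1).

    A 'satisfying' consequence moves the strength toward 1 (more likely);
    an 'annoying' consequence moves it toward 0 (less likely).
    The scale and step size are illustrative assumptions.
    """
    if satisfying:
        return strength + step * (1.0 - strength)
    return strength - step * strength


# Hypothetical puzzle-box run: lever pressing is followed by food on every trial.
strength = 0.1
history = []
for trial in range(7):
    strength = apply_law_of_effect(strength, satisfying=True)
    history.append(round(strength, 3))
print(history)  # strength rises trial by trial
```

Calling the function with `satisfying=False` lowers the strength instead, which mirrors the 'annoying consequences' half of the law.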

Operant Conditioning

• Term first used by B. F. (Burrhus Frederic) Skinner.

• An operant is a response (or set of responses) that occurs and acts (operates) on the environment to produce some kind of effect.

• Behaviour that has consequences

• Skinner argued that ALL behaviour can be explained this way

Operant vs Respondent

• Respondents are behaviours that are elicited by known or recognised stimuli.

• Pavlov’s dogs responded by salivating to meat powder, then a bell.

• Thorndike’s cats made responses not prompted by stimuli.

• In classical conditioning (CC), the learner’s behaviour has no effect on the consequences; in operant conditioning, the behaviour produces them

Skinner Boxes

• Small chamber where an animal learns to make a response for which the consequences can be controlled by the experimenter.

• Contains a lever that, when pressed, delivers food / water into a dish.

• Some have lights / buzzers

• Some have a floor that can deliver an electric shock

Reinforcement

• Reinforcement - applying a positive stimulus OR removing a negative stimulus to subsequently strengthen or increase the likelihood of the response that it follows.

• Reinforcer - any object or event that increases the probability that an operant behaviour will occur again.

• Often used interchangeably with ‘reward’, but not identical: a reward only counts as a reinforcer if it actually strengthens the response

Reinforcement

• Initially, learning is most successful if the behaviour is continuously reinforced.

• Continuous Reinforcement - reinforcing every correct response after it occurs

• Partial Reinforcement - process of reinforcing some correct responses but not all of them.

• Partial reinforcement may be delivered on different schedules

Fixed-Ratio Schedule

• When the reinforcer is given after a set (fixed) and unvarying number (ratio) of desired responses have been made

• eg every third response, or one reinforcer for every 10 correct responses (1:10)

• During the acquisition phase, reinforcement must be frequent

• Workers who are paid ‘piecework’ eg commission, amount per basket picked.
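A fixed-ratio schedule can be sketched as a counter that resets each time the reinforcer is given. The name `fixed_ratio_schedule` and the choice of ratio are illustrative assumptions, not part of the original slides.

```python
def fixed_ratio_schedule(ratio):
    """Return a respond() function for a fixed-ratio schedule.

    Each call records one correct response and reports whether
    that response earns the reinforcer.
    """
    count = 0

    def respond():
        nonlocal count
        count += 1
        if count == ratio:
            count = 0          # start counting toward the next reinforcer
            return True
        return False

    return respond


respond = fixed_ratio_schedule(3)   # reinforce every third correct response
outcomes = [respond() for _ in range(9)]
print(outcomes)
# [False, False, True, False, False, True, False, False, True]
```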

Variable-Ratio Schedule

• When the reinforcer is given after an unpredictable number of correct responses.

• The number required varies around a mean number of correct responses per reinforcer.

• Very effective: fast acquisition, and the response doesn’t cease easily.

• Poker machines - expected payout, but don’t know when
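A variable-ratio schedule can be sketched the same way, except that the required number of responses is redrawn after every reinforcer. Drawing it uniformly from 1 to 2 × mean − 1 is an assumption made here so that the average equals the stated mean; the slides only say the number is unpredictable. The name `variable_ratio_schedule` is hypothetical.

```python
import random


def variable_ratio_schedule(mean_ratio, rng):
    """Return a respond() function for a variable-ratio schedule.

    The number of correct responses needed for each reinforcer is drawn
    uniformly from 1 .. 2*mean_ratio - 1, so it averages mean_ratio.
    """
    required = rng.randint(1, 2 * mean_ratio - 1)
    count = 0

    def respond():
        nonlocal required, count
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_ratio - 1)  # next payout is unpredictable
            return True
        return False

    return respond


rng = random.Random(0)              # fixed seed so the run is repeatable
respond = variable_ratio_schedule(10, rng)
outcomes = [respond() for _ in range(10_000)]
print(sum(outcomes))                # about one reinforcer per 10 responses on average
```

As with a poker machine, the long-run payout rate is known, but no single response can be predicted to pay out.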

Fixed-Interval Schedule

• When the reinforcer is delivered after a specific period of time has elapsed since the previous reinforcer, provided the correct response has been made.

• One correct response is all that is needed, like pressing the crossing button.

• Responding is often erratic: once we realise that time, not the number of responses, is the deciding factor, we wait until the time is nearly up before responding
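A fixed-interval schedule can be sketched by checking the time elapsed since the last reinforcer. The function name `fixed_interval_outcomes`, the response times and the 60-second interval are all made up for illustration.

```python
def fixed_interval_outcomes(response_times, interval):
    """Mark which responses are reinforced on a fixed-interval schedule.

    A response is reinforced only if at least `interval` seconds have
    passed since the previous reinforcer (or since the session start, t = 0).
    """
    last_reinforcer = 0.0
    outcomes = []
    for t in response_times:
        if t - last_reinforcer >= interval:
            outcomes.append(True)
            last_reinforcer = t     # the clock restarts at each reinforcer
        else:
            outcomes.append(False)  # too early: extra responses earn nothing
    return outcomes


# Hypothetical session: responses at these times (seconds), interval of 60 s.
times = [10, 30, 61, 70, 100, 125, 190]
print(fixed_interval_outcomes(times, 60))
# [False, False, True, False, False, True, True]
```

The early responses at 10 s and 30 s are wasted, which is why responding tends to bunch up just before the interval ends.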

Variable-Interval Schedule

• When the reinforcer is delivered after an irregular period of time has elapsed, provided the correct response has been made.

• The intervals vary around a mean period of time, but each one is unpredictable.

• Responses before the delivery time are not reinforced even if correct.

• Eg fishing, speed cameras, booze buses.
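A variable-interval schedule can be sketched by redrawing the waiting time after each reinforcer. Drawing the wait uniformly between 0.5 and 1.5 times the mean is an assumption for illustration, as are the name `variable_interval_outcomes` and the session data.

```python
import random


def variable_interval_outcomes(response_times, mean_interval, rng):
    """Mark which responses are reinforced on a variable-interval schedule.

    After each reinforcer a new waiting time is drawn uniformly from
    0.5*mean_interval .. 1.5*mean_interval (an illustrative choice).
    """
    last_reinforcer = 0.0
    wait = rng.uniform(0.5 * mean_interval, 1.5 * mean_interval)
    outcomes = []
    for t in response_times:
        if t - last_reinforcer >= wait:
            outcomes.append(True)
            last_reinforcer = t
            wait = rng.uniform(0.5 * mean_interval, 1.5 * mean_interval)
        else:
            outcomes.append(False)  # correct response, but before the delivery time
    return outcomes


rng = random.Random(1)              # fixed seed so the run is repeatable
times = list(range(1, 601))         # one response per second for 10 minutes
outcomes = variable_interval_outcomes(times, 60, rng)
reinforced_at = [t for t, hit in zip(times, outcomes) if hit]
print(reinforced_at)                # irregularly spaced reinforcement times
```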

Positive Reinforcement

• Giving or applying a positive reinforcer after the desired response has been made.

• Positive reinforcer - provides a satisfying consequence (reward), so strengthens the likelihood of a response.

Negative Reinforcement

• Removal or avoidance of an unpleasant stimulus.

• Negative Reinforcer - any unpleasant stimulus that when removed strengthens the likelihood of a desired response occurring.

• In negative reinforcement the stimulus is removed or avoided, rather than given as in positive reinforcement

Examples

• Getting an A on your exam (positive reinforcer) can be achieved by studying, so studying will be repeated (increased behaviour)

• Failing your exam (negative reinforcer) is avoided by studying, so studying will be repeated (increased behaviour)

• Both lead to desirable / positive consequence.

Punishment

• Delivery of an unpleasant stimulus following a response, or removal of a pleasant stimulus.

• Consequence of punishment is weakening of response, or decrease in probability of response occurring again
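The definitions of positive reinforcement, negative reinforcement and punishment can be summarised as a small lookup. This is a sketch of the definitions as given in these notes; the function name `classify_consequence` and its two parameters are hypothetical.

```python
def classify_consequence(stimulus_is_pleasant, stimulus_is_added):
    """Classify a consequence using the definitions in these notes.

    stimulus_is_pleasant: True for a pleasant stimulus, False for an unpleasant one.
    stimulus_is_added:    True if the stimulus is given, False if removed or avoided.
    """
    if stimulus_is_pleasant and stimulus_is_added:
        return "positive reinforcement"   # response is strengthened
    if not stimulus_is_pleasant and not stimulus_is_added:
        return "negative reinforcement"   # response is strengthened
    # unpleasant stimulus delivered, or pleasant stimulus removed
    return "punishment"                   # response is weakened


print(classify_consequence(True, True))    # getting an A after studying
print(classify_consequence(False, False))  # avoiding a fail by studying
print(classify_consequence(False, True))   # unpleasant stimulus delivered
```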

Order of presentation

• Reinforcement or punishment must be presented after the target response (the desired response for reinforcement, the unwanted response for punishment), not before.

• The rat needs to press the lever before getting the positive reinforcer

Timing

• Most effective when given immediately after the response, so they are associated directly.

• Delay will cause learning to be slow or unsuccessful.

• Easier in lab than real life.

• Eg student reports arrive long after the work they describe (a delayed consequence).

Appropriateness

• Reinforcers must provide pleasing consequences; punishers must provide unpleasant consequences.

• But how do you know what will please each person?

• Not all reinforcers will work in all situations.

• Inappropriate punishers can become reinforcers - eg. attention seekers

Key processes - Acquisition

• In OC, acquisition is the establishment of a response through reinforcement.

• Speed of acquisition depends on whether reinforcement is continuous or partial.

• For complex behaviours, successive approximations can be reinforced, building up to the target behaviour.

Acquisition

• Shaping - a procedure in which reinforcement is given for any response that successively approximates a final target response.

• Also known as the method of successive approximations

• eg Skinner’s pigeon will have to turn more and more to get the same reward.
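Shaping can be sketched as a criterion that is raised a little each time the current approximation is reinforced. This is a toy simulation under stated assumptions, not Skinner's procedure: the name `shape_turning`, the starting values, the size of the random variation and the rule that a reinforced turn becomes the new habit are all invented for illustration.

```python
import random


def shape_turning(target=360, step=15, trials=5000, seed=0):
    """Simulate shaping a pigeon to make a full turn.

    Returns the trial on which a full turn was first reinforced,
    or None if that never happened within `trials` attempts.
    """
    rng = random.Random(seed)
    habit = 30.0        # typical turn in degrees before training (assumed)
    criterion = 45.0    # smallest turn that currently earns food (assumed)
    for trial in range(1, trials + 1):
        turn = habit + rng.uniform(-20, 20)   # behaviour varies a little each trial
        if turn >= criterion:
            habit = turn                      # the reinforced turn becomes the new habit
            if criterion >= target:
                return trial                  # target response reinforced: shaping complete
            criterion = min(target, criterion + step)  # demand a closer approximation
    return None


print(shape_turning())  # number of trials needed in this simulated run
```

Because the criterion only ever moves a small step ahead of the current habit, each new approximation stays within reach of the pigeon's normal variation.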

Extinction

• The gradual decrease in the strength or rate of a conditioned response following consistent non-reinforcement of the response.

• Eg how long does the pigeon keep turning once it is no longer being fed?

• Responding may actually increase at first, as the organism tries to get the reinforcement back and doesn’t want to stop

Spontaneous Recovery

• Can also occur with operant conditioning, when the response occurs in absence of reinforcement after extinction has occurred.

• Likely weaker and temporary

Stimulus Generalisation

• When the correct response is made to another stimulus that is similar to the stimulus that was present when the response was reinforced. Usually at a reduced level (weaker or less often)

Stimulus Discrimination

• When an organism makes the correct response to a stimulus and is reinforced, but doesn’t respond to other stimuli, even when similar.

• eg if reinforced for responding to red lights but not green lights, the organism will only respond to red.

CC and OC

• Role of Learner

• Timing of Stimulus and Response

• Nature of Response - Reflex or Voluntary?