Operant Conditioning Unit 4 - AoS 2 - Learning
Uploaded by matthew-haynes
Trial and Error Learning
• An organism’s attempts to learn or solve a problem by trying alternative possibilities until a correct solution or desired outcome is achieved.
• Often involves many attempts (trials) and incorrect choices (errors)
• Was called instrumental learning, now operant conditioning - the learner ‘operates’ on the environment
Thorndike’s Puzzle Boxes
• Put hungry cats into a ‘puzzle box’, with food outside the box and out of reach
• Cat had to get out of box to get food.
• The more times a cat was put in the box, the faster it got out (fewer trials)
• After 7 trials the cat would go straight for the lever and get out immediately.
• Lever pushing now learnt, not random
Thorndike’s Law of Effect
• A behaviour that is followed by ‘satisfying’ consequences is strengthened (more likely to occur) and a behaviour that is followed by ‘annoying’ consequences is weakened (less likely to occur).
• Instrumental learning because cat is instrumental in obtaining its release
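The Law of Effect can be sketched as a simple update rule. This is only an illustration, not Thorndike's own formulation: the function name `apply_law_of_effect`, the 0.0 to 1.0 strength scale and the `step` value are all assumptions made for the example.

```python
def apply_law_of_effect(strength, consequence, step=0.1):
    """Return the new response strength after one consequence.

    strength: current tendency to respond, kept between 0.0 and 1.0
    consequence: "satisfying" or "annoying" (Thorndike's terms)
    step: hypothetical learning rate, chosen only for illustration
    """
    if consequence == "satisfying":
        strength += step * (1.0 - strength)  # strengthened, moves towards 1.0
    elif consequence == "annoying":
        strength -= step * strength          # weakened, moves towards 0.0
    else:
        raise ValueError("consequence must be 'satisfying' or 'annoying'")
    return strength


# A cat whose lever press is followed by food on seven trials in a row.
strength = 0.2
for _ in range(7):
    strength = apply_law_of_effect(strength, "satisfying")
print(round(strength, 3))
```

Each satisfying consequence nudges the response strength up and each annoying one nudges it down, which is all the law claims.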
Operant Conditioning
• Term first used by B.F. (Burrhus Frederic) Skinner.
• An operant is a response (or set of responses) that occurs and acts (operates) on the environment to produce some kind of effect.
• Behaviour that has consequences
• Skinner held that ALL behaviour can be explained this way
Operant vs Respondent
• Respondents are behaviours that are elicited by known or recognised stimuli.
• Pavlov’s dogs responded by salivating to meat powder, then a bell.
• Thorndike’s cats made responses not prompted by stimuli.
• In classical conditioning (CC), behaviour has no effect on consequences
Skinner Boxes
• Small chamber where an animal learns to make a response for which the consequences can be controlled by experimenter.
• A lever that delivers food / water into a dish.
• Some have lights / buzzers
• Some have a floor that can deliver a shock
Reinforcement
• Reinforcement - applying a positive stimulus OR removing a negative stimulus to subsequently strengthen or increase the likelihood of the particular response that it follows.
• Reinforcer - any object or event that increases the probability that an operant behaviour will occur again.
• Often used interchangeably with ‘reward’, but the terms are not identical: a reinforcer is defined by its effect on behaviour.
Reinforcement
• Initially, learning is most successful if the behaviour is continuously reinforced.
• Continuous Reinforcement - reinforcing every correct response after it occurs
• Partial Reinforcement - process of reinforcing some correct responses but not all of them.
• Partial reinforcement may be delivered on different schedules
Fixed-Ratio Schedules
• When the reinforcer is given after a set (fixed) and unvarying number (ratio) of desired responses have been made
• eg every third response, or one reinforcer for every 10 correct responses (1:10)
• During the acquisition phase reinforcement must be frequent
• Workers who are paid ‘piecework’ eg commission, amount per basket picked.
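A fixed-ratio schedule is easy to express in code. The sketch below is illustrative only; the function name `fixed_ratio` and its parameters are made up for this example.

```python
def fixed_ratio(correct_responses, ratio):
    """Return the (1-based) response numbers that earn a reinforcer.

    correct_responses: how many correct responses the learner makes
    ratio: reinforce every `ratio`-th correct response
    """
    return [n for n in range(1, correct_responses + 1) if n % ratio == 0]


print(fixed_ratio(10, 3))   # every third response: [3, 6, 9]
print(fixed_ratio(30, 10))  # a 1:10 schedule: [10, 20, 30]
```

Because the count is fixed, the learner can predict exactly which response will pay off, like a picker who knows each full basket earns the same amount.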
Variable-Ratio
• When the reinforcer is given after an unpredictable number of correct responses.
• The schedule is set around a mean number of correct responses per reinforcer.
• Very effective: fast acquisition, and the response doesn’t cease easily.
• Poker machines - expected payout, but don’t know when
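A variable-ratio schedule can be sketched the same way, with a random gap between reinforcers. This is a minimal illustration under stated assumptions: the function name `variable_ratio` is hypothetical, and drawing each gap uniformly from 1 to `2 * mean_ratio - 1` is just one simple way to get an unpredictable count whose mean is `mean_ratio`.

```python
import random


def variable_ratio(reinforcers, mean_ratio, seed=0):
    """Return the response numbers at which each reinforcer is delivered.

    The number of responses needed for each reinforcer is drawn uniformly
    from 1 .. 2*mean_ratio - 1 (an assumption), so its mean is mean_ratio.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    delivered = []
    response = 0
    for _ in range(reinforcers):
        response += rng.randint(1, 2 * mean_ratio - 1)
        delivered.append(response)
    return delivered


# Ten payouts on a schedule averaging one payout per 5 responses.
print(variable_ratio(10, 5))
```

Over many reinforcers the average gap settles near the mean, but no single payout can be predicted, which is the poker machine situation.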
Fixed-Interval Schedule
• When the reinforcer is delivered after a specific period of time has elapsed since the previous reinforcer, provided the correct response has been made.
• One correct response is all that is needed, like pressing the crossing button.
• Responding is often erratic: the learner realises that time, not the number of responses, is the factor, so waits until the time is nearly up
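The fixed-interval rule can be sketched as a check on elapsed time. The function name `fixed_interval`, the use of seconds, and starting the clock at time 0 are assumptions made for this example.

```python
def fixed_interval(response_times, interval):
    """Return the response times that are reinforced.

    A response is reinforced only if at least `interval` seconds have
    passed since the previous reinforcer (or since the start, time 0).
    """
    reinforced = []
    last_reinforcer = 0.0
    for t in sorted(response_times):
        if t - last_reinforcer >= interval:
            reinforced.append(t)
            last_reinforcer = t  # the interval restarts from this reinforcer
    return reinforced


# Presses at these times (seconds) on a 60-second schedule.
print(fixed_interval([5, 20, 61, 62, 90, 125], interval=60))  # [61, 125]
```

The presses at 5, 20, 62 and 90 seconds earn nothing, which is why extra responding during the interval is wasted effort.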
Variable-Interval Schedule
• When the reinforcer is delivered after an irregular period of time has elapsed, provided the correct response has been made.
• A mean period of time, but unpredictable.
• Responses before the delivery time are not reinforced even if correct.
• Fishing, speed cameras, booze buses.
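A variable-interval schedule differs only in that the waiting time changes after every reinforcer. In this sketch the function name `variable_interval` is hypothetical, and drawing each wait uniformly from 0 to `2 * mean_interval` is an assumption chosen so that the mean wait equals `mean_interval`.

```python
import random


def variable_interval(response_times, mean_interval, seed=0):
    """Return the response times that are reinforced.

    After each reinforcer a new, unpredictable wait is drawn (uniformly
    from 0 .. 2*mean_interval, an assumption). The first correct response
    made once that wait has elapsed is reinforced.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    reinforced = []
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
        # responses made before `available_at` are not reinforced
    return reinforced


# A response every 10 seconds for 5 minutes, mean wait of 60 seconds.
print(variable_interval(range(10, 310, 10), mean_interval=60))
```

Since the learner cannot tell when the next reinforcer becomes available, steady responding is the only strategy that reliably collects it.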
Positive Reinforcement
• Giving or applying a positive reinforcer after the desired response has been made.
• Positive reinforcer - provides a satisfying consequence (reward), so strengthens the likelihood of a response.
Negative Reinforcement
• Removal or avoidance of an unpleasant stimulus.
• Negative reinforcer - any unpleasant stimulus that when removed strengthens the likelihood of a desired response occurring.
• In negative reinforcement the reinforcer is removed or avoided, not given (as in positive reinforcement)
Examples
• Getting an A on your exam (positive reinforcer) can be achieved by studying, so studying will be repeated (increased behaviour)
• Failing your exam (negative reinforcer) is avoided by studying, so studying will be repeated (increased behaviour)
• Both lead to desirable / positive consequence.
Punishment
• Delivery of an unpleasant stimulus following a response, or removal of a pleasant stimulus.
• Consequence of punishment is weakening of response, or decrease in probability of response occurring again
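The definitions of positive reinforcement, negative reinforcement and punishment given in these notes can be collected into one lookup. This is only a study aid; the function name `classify_consequence` and the string labels are made up for the example.

```python
def classify_consequence(stimulus, change):
    """Classify a consequence and its effect on the response.

    stimulus: "pleasant" or "unpleasant"
    change: "applied" (given) or "removed" (taken away or avoided)
    Returns (name of the process, effect on the response).
    """
    table = {
        ("pleasant", "applied"): ("positive reinforcement", "strengthened"),
        ("unpleasant", "removed"): ("negative reinforcement", "strengthened"),
        ("unpleasant", "applied"): ("punishment", "weakened"),
        ("pleasant", "removed"): ("punishment", "weakened"),
    }
    return table[(stimulus, change)]


print(classify_consequence("pleasant", "applied"))    # getting an A
print(classify_consequence("unpleasant", "removed"))  # avoiding a fail
print(classify_consequence("pleasant", "removed"))    # losing a privilege
```

Both kinds of reinforcement strengthen the response; only punishment weakens it.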
Order of presentation
• Reinforcement or punishment must be presented immediately after the response it targets, not before.
• The rat needs to press the lever before getting the positive reinforcer
Timing
• Most effective when given immediately after the response, so they are associated directly.
• Delay will cause learning to be slow or unsuccessful.
• Easier in lab than real life.
• Eg. student reports give delayed feedback.
Appropriateness
• Reinforcers must provide pleasing consequences, punishments must provide unpleasant consequences.
• But how do you know what will please each person?
• Not all reinforcers will work in all situations.
• Inappropriate punishers can become reinforcers - eg. attention seekers
Key processes - Acquisition
• In operant conditioning (OC), acquisition is the establishment of a response through reinforcement.
• Speed depends on whether continuous or partial reinforcement.
• For complex behaviours, successive approximations can be reinforced, building up to the target behaviour.
Acquisition
• Shaping - a procedure in which reinforcement is given for any response that successively approximates a final target response.
• Also known as the method of successive approximations
• eg Skinner’s pigeon will have to turn more and more to get same reward.
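Shaping can be sketched as a criterion that rises each time it is met. This is a rough illustration, not Skinner's procedure: the function name `shape`, the degrees-of-turn measure and the fixed `increment` are assumptions for the example.

```python
def shape(attempts, target, start_criterion, increment):
    """Reinforce attempts that meet the current criterion, raising it each time.

    attempts: size of each response (here, degrees the pigeon turns)
    target: the final target response
    Returns a list of (attempt, criterion at that moment, reinforced?).
    """
    criterion = start_criterion
    log = []
    for attempt in attempts:
        reinforced = attempt >= criterion
        log.append((attempt, criterion, reinforced))
        if reinforced and criterion < target:
            # the next reward requires a closer approximation to the target
            criterion = min(target, criterion + increment)
    return log


turns = [30, 50, 90, 80, 140, 200, 360]
for attempt, criterion, reinforced in shape(turns, target=360,
                                            start_criterion=45, increment=45):
    print(attempt, criterion, "fed" if reinforced else "not fed")
```

The 80 degree turn goes unrewarded even though a 50 degree turn was fed earlier, because the criterion has moved on.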
Extinction
• The gradual decrease in the strength or rate of a conditioned response following consistent non-reinforcement of the response.
• Eg. the pigeon eventually stops turning once it is no longer being fed.
• Responding may actually increase at first, as the organism tries to get the reinforcement back and doesn’t want to stop.
Spontaneous Recovery
• Can also occur with operant conditioning, when the response occurs in absence of reinforcement after extinction has occurred.
• Likely weaker and temporary
Stimulus Generalisation
• When the correct response is made to another stimulus that is similar to the stimulus that was present when the CR was reinforced. Usually at a reduced level (weaker or less often)
Stimulus Discrimination
• When an organism makes the correct response to a stimulus and is reinforced, but doesn’t respond to other stimuli, even when similar.
• eg if reinforced for red lights but not green lights, it will only respond to red.