
ORIGINAL ARTICLE

Effects of Stimulus Discriminability on Discrimination Acquisition and Stimulus-Equivalence Formation: Assessing the Utility of a Multiple Schedule

Adam H. Doughty & Kaitlyn P. Brierley & Kalon R. Eways & Rebecca M. Kastner

Published online: 12 April 2014
© Association of Behavior Analysis International 2014

Abstract A multiple schedule was used to examine the effects of stimulus discriminability on discrimination acquisition in baseline and on subsequent stimulus-equivalence formation. Six college students were exposed in two experiments to more or less discriminable stimuli across multiple-schedule components (reinforcer magnitude was unequal and equal in Experiments 1 and 2, respectively). For five of the six participants, slower baseline acquisition occurred with the less discriminable stimuli. After high accuracy scores occurred in both components for several sessions, stimulus equivalence was tested across multiple sessions. In terms of accuracy, delayed emergence of equivalence formation occurred only in the component with the less discriminable stimuli for three of the five participants who previously displayed differential acquisition. For one of the remaining two participants, a 2-week follow-up assessment revealed that stimulus equivalence was disrupted only in the component previously correlated with the slower acquisition period. In terms of response latency, slower responding was obtained in the component with the less discriminable stimuli for all six participants, particularly during initial equivalence testing. These results demonstrate that stimulus discriminability can influence both discrimination acquisition and stimulus-equivalence formation. The results also support the utility of examining stimulus-equivalence formation using a multiple schedule.

Keywords Arbitrary matching to sample · Conditional discrimination · Delayed emergence · Stimulus equivalence · Mouse click · Human

Stimulus equivalence typically is studied by exposing normally capable humans to conditional-discrimination training in the form of arbitrary-matching-to-sample trials (e.g., Sidman 1994, 2000; Sidman and Tailby 1982). Following the establishment of highly accurate baseline discriminations, choices early in probe testing usually show the immediate emergence of stimulus equivalence (i.e., choices characterized by reflexivity, symmetry, and transitivity). However, stimulus-equivalence formation can be less reliable, as measured in at least three ways. First, there are findings of gradual, or delayed, emergence of equivalence relations (i.e., delayed emergence), in which these relations are not obtained consistently at the start of testing but become reliable only with repeated testing (e.g., Sidman et al. 1985). Delayed emergence is puzzling in part because equivalence relations usually develop across testing in the absence of differential consequences. Second, different types of equivalence trials may engender differential response latencies (e.g., Spencer and Chase 1996). Third, established equivalence relations may become less accurate when retested following a period of time in which exposure to the stimuli is prevented (e.g., Saunders et al. 1988). The investigation of variables that influence stimulus-equivalence formation is important, in part, because the aforementioned findings suggest that stimulus equivalence might not be best characterized as a unitary phenomenon (e.g., Pilgrim and Galizio 1996).

One means of influencing stimulus-equivalence formation is by manipulating stimulus type, in the form of stimulus familiarity and stimulus nameability (e.g., Arntzen and Lian 2010; Holth and Arntzen 1998; O'Conner et al. 2009). Holth and Arntzen used an errorless-learning procedure to teach college students AB and BC conditional discriminations before presenting CA probes. In their Experiment 1, five groups of ten participants differed with respect to stimulus type: Group 1 only received Greek letters, Groups 2 and 3 had the A and C stimuli replaced with nameable pictures, Group 4 had the B stimuli replaced with nameable pictures, and Group 5 had the C stimuli replaced with nameable pictures. Equivalence was assessed as a function of group and of the first and second halves of the CA probe session. Using a 90%-correct criterion, only three participants in Group 1 and six participants in Group 5 exhibited equivalence, whereas nearly all of the participants in the other groups did so. Furthermore, eight of the former nine participants (from Groups 1 and 5) displayed delayed emergence (i.e., higher accuracy scores in the second half of the test). Stimulus type also may have affected the rate of discrimination acquisition in baseline, in that Groups 1 and 5 seemed to require more baseline trials to reach the same high accuracy levels. However, it was not reported whether these differences were statistically significant. One or more factors could have been responsible for the lack of more robust effects. Group differences may have been minimized by the between-group assessment, and/or stimulus nameability (or familiarity) may be less powerful than other manipulations (e.g., stimulus discriminability).

A. H. Doughty (*) · K. P. Brierley · K. R. Eways · R. M. Kastner
Department of Psychology, College of Charleston, 57 Coming Street, Charleston, SC 29424, USA
e-mail: [email protected]

Psychol Rec (2014) 64:287–300
DOI 10.1007/s40732-014-0001-7

The purpose of the present research was to examine the effects of stimulus discriminability on both discrimination acquisition in baseline and subsequent stimulus-equivalence formation using a multiple-schedule preparation. Our research question and procedures extend Holth and Arntzen (1998) in at least two significant ways. First, our manipulation of stimulus discriminability differs from previous manipulations of stimulus familiarity and/or nameability. That is, stimulus similarity per se was altered. It is hypothesized that the inclusion of more similar stimuli may produce both a protracted baseline period and less reliable stimulus-equivalence formation (e.g., in the form of delayed emergence). Second, our multiple-schedule assessment of stimulus-equivalence formation differs from nearly every published study involving stimulus equivalence. To our knowledge, only Catania et al. (1989) have tested equivalence formation under a multiple schedule. These authors reported successful equivalence formation in only one participant (their total number of participants is unclear; however, they stated that “several” participants were exposed to similar procedures). It is unclear whether the multiple-schedule preparation used by Catania et al. was responsible for their largely unsuccessful results. On the one hand, this within-participant design may prove especially sensitive to detecting the effects of various factors on stimulus-equivalence formation (i.e., with its control over multiple extraneous variables). On the other hand, the use of a multiple schedule may affect equivalence formation negatively. For example, the use of a multiple schedule requires each participant to learn a greater number of discriminations (e.g., compared to a between-group assessment). In addition, it is unclear how demonstrating equivalence relations in one component affects responding in the other component.

Experiment 1

The purpose of Experiment 1 was to extend Holth and Arntzen (1998) by examining, in a multiple schedule, the effects of stimulus discriminability on both discrimination acquisition and stimulus-equivalence formation. To measure the effects of stimulus discriminability more fully, we altered the procedures used by Holth and Arntzen in several ways. The rationale for making these changes was to present our participants with procedures that should result in a longer period of baseline acquisition, thus allowing for a more sensitive assessment of stimulus discriminability. Errorless-learning procedures were not employed in either multiple-schedule component. In each component, the baseline discriminations were presented randomly from the start of training (i.e., instead of presenting AB training, followed by BC training, etc.), and a training structure was used with a greater number of stimuli (i.e., six 4-member equivalence classes could develop in each component). In addition to altering stimulus discriminability across components, the more similar stimuli were correlated with a smaller reinforcer magnitude. Thus, we predicted a slower acquisition of the baseline discriminations and less reliable stimulus-equivalence formation in the component with the less discriminable stimuli and the smaller reinforcer. Finally, we alternated multiple-schedule components in such a manner as to potentially produce even less reliable equivalence formation with the less discriminable stimuli. Perone and colleagues (e.g., Perone and Courtney 1992; see Perone 2003 for a review) have obtained response disruption (e.g., extended postreinforcement pausing) during the transition from a more favorable multiple-schedule component to a less favorable one. We, therefore, alternated components in such a way as to assess whether performance in the component with the less discriminable stimuli was even less reliable immediately following the component with the more discriminable stimuli.

Method

Participants Three female College of Charleston students (ages 18, 18, and 21) were recruited from Introduction to Psychological Science courses. Each participant was informed that the study would last approximately 20 h and pay approximately $10 per hour. The results from an additional participant (i.e., Participant 2) are not reported here because this participant failed stimulus-equivalence testing.

Apparatus A room 8 ft by 13.5 ft with four workstations separated by dividers was used. Each workstation had a desk and chair, and on each desk was an iMac or eMac, keyboard (which was not used by the participants), and mouse. A mouse click over a stimulus displayed on the computer screen was the response. The contingencies were programmed and responses recorded using MTS version 11.6.7 (Dube 1991).

Procedure Each participant was tested individually in three conditions under a two-component multiple schedule. One component (hereafter, the More Discriminable component) presented distinct nonrepresentational forms (Dube 1991) on a red screen, and the other component (hereafter, the Less Discriminable component) presented similar letter/number combinations on a yellow screen. Table 1 shows these stimuli. Each stimulus was approximately 4 cm squared and appeared black on the screen. Sessions were divided into nine-trial components, an arrangement resembling the procedures used by Perone and Courtney (1992). Each session had 21 nine-trial components in Conditions 1 and 2 and 33 nine-trial components in Condition 3. Thus, each session had 20 (Conditions 1 and 2) or 32 (Condition 3) transitions between successive components, that is, five or eight transitions, respectively, of each of the four types (i.e., a Less Discriminable component followed by another Less Discriminable component, a Less Discriminable component followed by a More Discriminable component, a More Discriminable component followed by another More Discriminable component, and a More Discriminable component followed by a Less Discriminable component).
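The text does not say how the component orders were generated. As a sketch only, assuming the goal is exactly five (or eight) transitions of each type, such an order can be found by treating M and L as the two nodes of a multigraph with one edge per required transition and tracing an Eulerian circuit with Hierholzer's algorithm. The function names `transition_counts` and `balanced_component_sequence` are hypothetical, and the random seed is arbitrary.

```python
import random
from collections import Counter


def transition_counts(sequence):
    """Tally transitions (MM, ML, LM, LL) between consecutive components."""
    return Counter(a + b for a, b in zip(sequence, sequence[1:]))


def balanced_component_sequence(per_type, start="L", rng=None):
    """Return 4 * per_type + 1 components ('M' or 'L') in which each of the four
    transition types occurs exactly per_type times.

    M and L are the two nodes of a multigraph holding per_type copies of each
    transition edge. Every node has equal in- and out-degree, so an Eulerian
    circuit exists; Hierholzer's algorithm finds one, and shuffling the edge
    lists randomizes which circuit is returned.
    """
    rng = rng or random.Random()
    # Unused outgoing transitions: remaining[x] lists possible next components after x.
    remaining = {node: ["M"] * per_type + ["L"] * per_type for node in "ML"}
    for edges in remaining.values():
        rng.shuffle(edges)
    stack, circuit = [start], []
    while stack:
        node = stack[-1]
        if remaining[node]:
            stack.append(remaining[node].pop())  # follow an unused transition
        else:
            circuit.append(stack.pop())  # no transitions left: node is finished
    return "".join(reversed(circuit))


# Hypothetical usage: a session of Condition 1 length (21 components) starting with M.
seq = balanced_component_sequence(5, start="M", rng=random.Random(1))
print(seq, dict(transition_counts(seq)))
```

With `per_type=5` the result has 21 components, and with `per_type=8` it has 33, matching the component counts reported for Conditions 1 and 2 and for Condition 3. Because the result is a circuit, it begins and ends with the same component type.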

As Table 1 suggests, the arbitrary-matching-to-sample training was constructed so that in each component one of six sample stimuli appeared (in the center of the screen) surrounded by six comparison stimuli (one in each corner of the screen, one in the middle and to the left of the sample, and one in the middle and to the right of the sample). An observing response to the sample was not necessary to produce the comparisons, and the sample remained until a comparison was selected. Trials occurred pseudorandomly such that in each component each sample was presented a comparable number of times per session, no single sample occurred on more than three consecutive trials, the screen location of the correct comparison (i.e., S+) could not be the same on more than three consecutive trials, each S+ occurred in each location approximately the same number of times per session, and the location of the incorrect comparisons (i.e., S-s) with any single S+ varied unsystematically across trials. The intertrial interval (ITI) was 1.5 s in both components, involved a blank screen, and reset if a response occurred. When a new component started, it did so immediately after the ITI from the last trial of the previous component. Multiple sessions were conducted each day, and each student participated for 1 to 3 h per day. The Appendix shows the instructions provided to the participants.
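MTS (Dube 1991) handled trial scheduling in the study; the following is not its algorithm but a hypothetical sketch of how the listed constraints could be met for a single discrimination type (AB) by rejection sampling: draw a balanced, shuffled list and redraw whenever any item appears on more than three consecutive trials. Screen positions are coded 0 to 5 as an assumption, and the sketch balances samples and S+ positions separately rather than balancing each S+ across every location. The names `constrained_sequence` and `build_trials` are illustrative.

```python
import random


def constrained_sequence(items, n_trials, max_run=3, rng=None):
    """Return n_trials items with near-equal counts per item and no item
    occurring on more than max_run consecutive trials (rejection sampling)."""
    rng = rng or random.Random()
    items = list(items)
    reps, extra = divmod(n_trials, len(items))
    while True:
        pool = items * reps + rng.sample(items, extra)
        rng.shuffle(pool)
        windows = (pool[i:i + max_run + 1] for i in range(len(pool) - max_run))
        if all(len(set(w)) > 1 for w in windows):  # no run longer than max_run
            return pool


def build_trials(samples, comparisons, n_trials, rng=None):
    """Build (sample, six-position comparison layout, S+ position) trials.

    samples[i] and comparisons[i] are assumed to belong to the same class,
    so comparisons[i] is the S+ whenever samples[i] is the sample.
    """
    rng = rng or random.Random()
    sample_seq = constrained_sequence(samples, n_trials, rng=rng)
    splus_pos = constrained_sequence(range(len(comparisons)), n_trials, rng=rng)
    trials = []
    for sample, pos in zip(sample_seq, splus_pos):
        correct = comparisons[samples.index(sample)]
        incorrect = [c for c in comparisons if c != correct]
        rng.shuffle(incorrect)  # S- locations vary unsystematically
        layout = incorrect[:pos] + [correct] + incorrect[pos:]
        trials.append((sample, layout, pos))
    return trials


# Hypothetical usage: 90 AB trials for one component, six classes.
rng = random.Random(0)
A = [f"A{i}" for i in range(1, 7)]
B = [f"B{i}" for i in range(1, 7)]
ab_trials = build_trials(A, B, 90, rng=rng)
print(ab_trials[0])
```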

Table 1 Stimulus designations in each component in Experiment 1

More Discriminable Component

A1   B1   C1   D1
A2   B2   C2   D2
A3   B3   C3   D3
A4   B4   C4   D4
A5   B5   C5   D5
A6   B6   C6   D6

Less Discriminable Component

A1 xeg   B1 739   C1 yik   D1 812
A2 xge   B2 793   C2 yki   D2 821
A3 exg   B3 973   C3 kiy   D3 182
A4 egx   B4 937   C4 kyi   D4 128
A5 gxe   B5 379   C5 iyk   D5 218
A6 gex   B6 397   C6 iky   D6 281

Table 2 shows the arbitrary-matching-to-sample discriminations trained and/or tested. In Condition 1, there was a fixed-ratio (FR) one schedule on every trial. Immediately after each comparison-stimulus selection, an elliptical star appeared for 1 s (after choosing the S+) or the screen darkened for 1.5 s (after choosing an S-). Each correct response resulted in the gain of 3 or 1 cent(s) in the More and Less Discriminable components, respectively. The order of the 21 components (i.e., 20 transitions) varied across sessions: six presentation sequences were used in Condition 1 (e.g., MLLMMLMLMMMLLMLMLLMLL was one presentation sequence, where M is more discriminable and L is less discriminable). Of these six sequences, three started with a More Discriminable component and three started with a Less Discriminable component. Thus, each session in Condition 1 had 189 trials (21 components × 9 trials), either 90 or 99 in each component (i.e., 10 or 11 nine-trial components of each type). In each session of Condition 1, the discriminations were arranged such that six 4-member stimulus-equivalence classes could develop. Across AB trials, each of the six A samples had exactly one B comparison as its S+ (e.g., B1 in the presence of A1). Across BC trials, each of the six B samples had exactly one C comparison as its S+ (e.g., C2 in the presence of B2), and across CD trials, each of the six C samples had exactly one D comparison as its S+ (e.g., D6 in the presence of C6). Condition 1 ended for each participant after discrimination accuracy was at least 90 % in each component for five consecutive sessions.
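Two quantities in this paragraph can be checked mechanically: the per-component trial counts implied by a sequence of nine-trial components, and the Condition 1 mastery criterion. The sketch below assumes accuracy is stored as one percentage per component per session; the function names and the example scores in the docstring are hypothetical.

```python
TRIALS_PER_COMPONENT = 9  # nine-trial components in Experiment 1


def trials_per_component(sequence, per_component=TRIALS_PER_COMPONENT):
    """Trials contributed by each component type ('M' or 'L') to one session."""
    return {c: sequence.count(c) * per_component for c in "ML"}


def met_criterion(sessions, threshold=90.0, run=5):
    """True if each of the last `run` sessions reached at least `threshold`
    percent correct in every component. `sessions` is a list of dicts, one per
    session, such as {"M": 96.7, "L": 91.1} (hypothetical values)."""
    if len(sessions) < run:
        return False
    return all(score >= threshold
               for session in sessions[-run:]
               for score in session.values())


example = "MLLMMLMLMMMLLMLMLLMLL"  # presentation sequence quoted in the text
counts = trials_per_component(example)
print(counts, sum(counts.values()))  # {'M': 90, 'L': 99} 189
```

Checking `met_criterion` after every session, with the full session history, is one way to implement "ended after ... five consecutive sessions"; the text does not describe how the check was carried out.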

Condition 2 involved a single session that was identical to a session in Condition 1 with two exceptions. First, no within-session differential consequences were provided (i.e., the star and dark-screen presentations were eliminated, and the ITI began after a comparison selection). Second, instructions were provided indicating that these consequences were being suspended (see Appendix). Despite this absence of within-session differential consequences, the participants were told immediately after the session how much money they earned. Condition 2 assessed the persistence of discrimination accuracy in the absence of differential consequences because Condition 3 tested stimulus equivalence in the absence of differential consequences.

Condition 3 measured stimulus-equivalence formation in each multiple-schedule component in the absence of either within-session or postsession differential consequences. There were 297 trials in each of two sessions. Each session had a comparable number of baseline trials (i.e., trials with baseline discriminations) and stimulus-equivalence probes (i.e., trials with DA, DB, and CA discriminations). Each session began with a Less Discriminable component, and the first component of each type contained only baseline trials. Each session, therefore, had 72 baseline trials in the More Discriminable component, 81 baseline trials in the Less Discriminable component, 72 stimulus-equivalence trials in the More Discriminable component, and 72 stimulus-equivalence trials in the Less Discriminable component. Of the 72 stimulus-equivalence trials in each component, there were approximately 24 trials of each type of stimulus-equivalence probe (i.e., DA, DB, and CA). The probes were constructed to assess equivalence on every trial (i.e., each probe required both symmetry and transitivity). For example, on a DA trial, one of the six D stimuli served as a sample, and the six A stimuli served as comparisons (e.g., selection of A4 in the presence of D4 was consistent with equivalence).
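The reported Condition 3 counts can be reconciled arithmetically. The split into 17 Less Discriminable and 16 More Discriminable components is not stated in the text; it is inferred here by dividing each component type's reported trial total by nine.

$$
\begin{aligned}
33 \times 9 &= 297 \text{ trials per session}\\
\text{Less Discriminable: } 17 \times 9 &= 153 = 81\ \text{baseline} + 72\ \text{probe}\\
\text{More Discriminable: } 16 \times 9 &= 144 = 72\ \text{baseline} + 72\ \text{probe}\\
\text{probes per type (DA, DB, CA)} &\approx 72 / 3 = 24 \text{ per component type}
\end{aligned}
$$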

Results

The left and right graphs in Fig. 1 show percent correct and mean latency, respectively, for each participant in each component of every session in Condition 1. Accuracy scores increased relatively rapidly across sessions in the More Discriminable component for Participants 1 and 3 and in each component for Participant 4. On the other hand, accuracy scores increased relatively slowly across sessions in the Less Discriminable component for Participants 1 and 3. Accuracy scores for all participants exceeded 90 % in each component in the last five sessions of Condition 1. For each participant, mean latency was longer, and session-to-session variability in mean latency greater, in the Less Discriminable component.
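The session-by-session measures plotted in Fig. 1 reduce to a per-session, per-component aggregation of trial records. A minimal sketch, assuming a hypothetical record layout of (session, component, correct, latency in seconds); the actual MTS output format is not described in the text, and `summarize` is an illustrative name.

```python
from collections import defaultdict
from statistics import mean


def summarize(trials):
    """Return {(session, component): (percent_correct, mean_latency_s)}.

    Each trial is a hypothetical record (session, component, correct, latency_s),
    with component "MD" or "LD" and correct a bool.
    """
    grouped = defaultdict(list)
    for session, component, correct, latency in trials:
        grouped[(session, component)].append((correct, latency))
    summary = {}
    for key, rows in grouped.items():
        pct = 100.0 * sum(c for c, _ in rows) / len(rows)  # True counts as 1
        summary[key] = (pct, mean(lat for _, lat in rows))
    return summary
```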

The top graph of Fig. 2 shows percent correct on the stimulus-equivalence probes in each component in each session of Condition 3 for all participants (for these probes and the probe results discussed below, percent correct refers to the percentage of choices consistent with stimulus equivalence). These accuracy scores were above 90 % with two exceptions: for Participants 1 and 3 in the Less Discriminable component in the first session, accuracy scores were 78 % and 83 %, respectively. The bottom graph of Fig. 2 shows a more detailed analysis of the latter scores by displaying the accuracy scores for each equivalence probe type for these two participants in the Less Discriminable component in this first session. The equivalence probe with the greater number of nodes (i.e., the DA probe) resulted in the lowest scores for both participants (i.e., 70 % and 74 % for Participants 1 and 3, respectively). For the remaining probe types, Participants 1 and 3 performed more accurately on the DB probe than the CA probe.

Figure 3 shows mean latency for each participant on the stimulus-equivalence probe trials in each component of each session in Condition 3. Mean latency was longer on the probe trials in the Less Discriminable component in each session for each participant, with the longest latencies occurring in the first session.

Table 2 Discriminations trained and/or tested in each condition of each experiment

Condition   Condition type                  Discriminations trained   Discriminations tested
1           Baseline Training               AB, BC, CD                –
2           Baseline Assessment             AB, BC, CD                –
3           Stimulus-Equivalence Testing    AB, BC, CD                DA, DB, CA

Additional Analyses Some aspects of the results were not displayed graphically. First, performance (i.e., accuracy and latency) in the single session of Condition 2 was comparable to responding in the final sessions of Condition 1. Second, accuracy scores on the baseline trials in Condition 3 exceeded 90 % in each component for each participant. Third, our transition analyses by component yielded no systematic effects in baseline or equivalence testing. For example, latencies on baseline trials in the Less Discriminable component were not longer after a More Discriminable component than after a Less Discriminable component. Also, accuracy and latency on probe trials in the Less Discriminable component were not different after a More Discriminable component than after a Less Discriminable component. Fourth, an examination of latency on the equivalence probes as a function of probe type showed no reliable differences. This absence of reliable differences reflected the relatively high variability of latencies within each probe type (i.e., mean latency was longer on the DA probes in each component, but latencies varied considerably on all three probe types). Fifth, given this degree of variation in latency as a function of probe type, we conducted t tests to confirm the differences in mean latency across components in both baseline and equivalence testing. Specifically, for each participant, we compared latency across components during the last baseline session and in each equivalence-testing session, and the results were consistent with the graphical displays in Figs. 1 and 3. That is, mean latency was significantly longer for each participant in the Less Discriminable component than in the More Discriminable component in the last baseline session and in each equivalence-testing session (p < .05 in every comparison).
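The text reports t tests on latencies but does not specify the variant. Assuming an independent-samples comparison of trial latencies between the two components within a session, Welch's t statistic and its Welch–Satterthwaite degrees of freedom can be computed with the standard library alone; the choice of Welch's test is an assumption, and `welch_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite df for two independent
    samples, e.g. trial latencies (s) in the LD and MD components of one session."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A p value then follows from the t distribution with `df` degrees of freedom; for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and returns the statistic and p value together.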

Discussion

The results of Experiment 1 were promising in several regards. First, despite the requirement that each participant learn a considerable number of conditional discriminations in baseline, each participant eventually displayed high accuracy scores in both components for several sessions. Second, acquisition of the baseline discriminations was considerably slower in the Less Discriminable component for two of the three participants. Third, each participant displayed consistent equivalence formation by the end of probe testing. Fourth, and most important, there were differences in equivalence formation across components. In terms of delayed emergence, this gradual display of equivalence relations was observed only in the Less Discriminable component for the two participants who displayed slower acquisition in this component. This inverse relation between the rapidity of baseline acquisition and delayed emergence extends the findings of Holth and Arntzen (1998). Specifically, these data illustrate the utility of a multiple schedule for demonstrating that the inclusion of less discriminable stimuli can produce both a slower baseline acquisition period and less reliable equivalence formation. In addition, the finding of the lowest accuracy scores on DA-probe trials is consistent with studies that have obtained comparable node effects (e.g., Spencer and Chase 1996). That is, the DA probe with its two nodes (i.e., B and C stimuli) produced lower accuracy scores than the other probes with their single nodes (i.e., C in the DB probe and B in the CA probe). Finally, mean response latency was reliably higher in the Less Discriminable component in both baseline and equivalence testing. Thus, these results generally support the predictions outlined previously. Stimulus discriminability can influence both the rapidity of discrimination acquisition and equivalence formation, and these effects can be investigated in a multiple schedule. The implications of these results are presented in the General Discussion. Prior to that discussion, however, some of the limitations of Experiment 1 were addressed in Experiment 2.

Fig. 1 Left and right graphs show percent correct and mean latency (s), respectively, in each component in each session of Condition 1 for each participant in Experiment 1 (x-axis: Sessions; legend: MD, LD)

Experiment 2

Despite the aforementioned strengths of Experiment 1, it also had limitations. First, the high number of baseline conditional discriminations may have resulted in the lengthy baseline periods for Participants 1 and 3. It, therefore, seems useful to determine whether comparable results could be obtained with briefer baseline periods. Second, there were no systematic effects of the different transition types in any dependent measure. Third, both stimulus discriminability and reinforcer magnitude were manipulated across components, so their effects were confounded. Partly on the basis of anecdotal reports provided by the participants after Experiment 1, it appeared that reinforcer magnitude may not have affected responding differentially. Fourth, the effects of stimulus discriminability were not observed with one of the three participants.

Experiment 2 exposed three additional participants to slightly modified procedures to address the aforementioned issues. First, to reduce the duration of baseline, the number of discriminations was reduced such that four 4-member equivalence classes could develop in each component. Second, the alternation procedure resembling Perone and Courtney (1992) was omitted. Third, reinforcer magnitude was equated across components. Despite these changes, it was predicted that differential stimulus discriminability would result in differential acquisition of the baseline discriminations and differential equivalence formation.

Fig. 2 Upper graph shows percent correct in each component in each session of Condition 3 for each participant in Experiment 1. Lower graph shows percent correct on each probe type in the Less Discriminable component in the first session of Condition 3 for Participants 1 and 3 (upper panels: P1, P3, P4; lower panels: P1-LD, P3-LD). Note. MD more discriminable, LD less discriminable

Fig. 3 Mean latency (s) on the stimulus-equivalence probe trials in each component in each session of Condition 3 for each participant in Experiment 1 (panels: P1, P3, P4). Note. MD more discriminable, LD less discriminable

Method

Participants Three female College of Charleston students (ages 18, 19, and 27) were recruited from Introduction to Psychological Science courses. Each participant was informed that the study would last approximately 15 h and pay approximately $10 per hour.

Apparatus The apparatus was identical to that of Experiment 1.

Procedure Table 2 shows that the arbitrary-matching-to-sample discriminations trained and/or tested were identical to those in Experiment 1, and Table 3 shows the stimuli in Experiment 2. The procedures in Experiment 2 resembled the procedures in Experiment 1 with the following differences. The baseline discriminations were constructed so that four 4-member equivalence classes could develop in each component. Reinforcer magnitude was equal across components: the participants were told they would earn 1.5 cents following each correct response. There were 12 trials in each component, and each session had an equal number of the two components (eight of each in Conditions 1 and 2, and 16 of each in Condition 3). The 12-trial components alternated in a pseudorandom fashion with the restriction that the same component could not appear more than three consecutive times. Each session in Condition 3 had 384 trials (i.e., 96 baseline trials and 96 stimulus-equivalence trials in each component), and the restriction that equivalence trials could not occur in the first component of each component type was removed.
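Assuming 16 twelve-trial components of each type per Condition 3 session, the reported session total reconciles as follows.

$$
\begin{aligned}
\text{trials per component type} &= 16 \times 12 = 192 = 96\ \text{baseline} + 96\ \text{probe}\\
\text{trials per session} &= 2 \times 192 = 384
\end{aligned}
$$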

In addition to the minor aforementioned changes to Condition 3, there were two major differences in this condition relative to Experiment 1. First, two participants (i.e., Participants 6 and 7) received a stimulus-equivalence follow-up assessment 2 weeks after their initial assessment. Specifically, as in Experiment 1, these participants received two sessions of equivalence testing following Condition 2. They then received two additional sessions of equivalence testing 2 weeks later (i.e., they did not receive any experimental exposure during the intervening 2-week period). These participants received this follow-up assessment because their accuracy scores during the initial equivalence-testing sessions were at or near 100 % in each component (see Results). The second major change in the procedures of Condition 3 is that one participant (i.e., Participant 5) received a considerable number of equivalence-testing sessions under different procedures. Participant 5 received these sessions because she displayed markedly low accuracy scores on the equivalence probes in the Less Discriminable component (see Results for a description of these procedures).

Results

The left and right graphs in Fig. 4 show percent correct and mean latency, respectively, for each participant in each component of every session in Condition 1. Accuracy scores increased relatively rapidly across sessions in the More Discriminable component for each participant. In the Less Discriminable component, however, accuracy increased at a slower rate for each participant, most markedly for Participant 5. Accuracy scores in both components exceeded 90 % during the last five sessions of Condition 1 for each participant. Mean latency was higher in the Less Discriminable component for each participant, with Participant 5 showing the most session-to-session variability.
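The per-session measures plotted in Fig. 4 (percent correct and mean latency for each component) amount to grouping trial records and averaging within each group. A minimal Python sketch follows; the `Trial` record and its field names are hypothetical, and latency is assumed to be measured in seconds from trial onset to comparison selection, since that definition is not restated here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """Hypothetical per-trial record; field names are illustrative only."""
    session: int
    component: str    # "MD" (more discriminable) or "LD" (less discriminable)
    trial_type: str   # "BL" for baseline, or a probe type such as "DA", "DB", "CA"
    correct: bool
    latency_s: float  # assumed: seconds from trial onset to comparison selection


def summarize(trials, key):
    """Return {group: (percent correct, mean latency in s)} for groups given by key(trial)."""
    groups = defaultdict(list)
    for t in trials:
        groups[key(t)].append(t)
    return {
        k: (100.0 * sum(t.correct for t in ts) / len(ts),
            mean(t.latency_s for t in ts))
        for k, ts in groups.items()
    }


# Usage (illustrative):
#   by session and component, as plotted in Fig. 4:
#     summarize(trials, key=lambda t: (t.session, t.component))
#   by probe type within the LD component, as in the probe-type panels of Figs. 5 and 7:
#     summarize([t for t in trials if t.component == "LD" and t.trial_type != "BL"],
#               key=lambda t: t.trial_type)
```

Passing the grouping key as a function lets the same helper produce the component-level summaries and the probe-type breakdowns reported in the Results.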

The accuracy and latency results from equivalence testing in Experiment 2 are displayed differently from those of Experiment 1 because, as noted, Participant 5 received extended equivalence testing. The top graph of Fig. 5 shows percent correct on the equivalence probes in each component of each session in Condition 3 for Participants 6 and 7 (recall that a 2-week period separated the first two sessions from the last two sessions). These accuracy scores were above 90 % with one exception: accuracy in the Less Discriminable component for Participant 6 decreased to 81 % during the third session (i.e., her first session after the 2-week period). As in Experiment 1, this reduced accuracy was investigated further; the bottom graph of Fig. 5 displays accuracy on each equivalence-probe type in the Less Discriminable component in this third session for Participant 6. Participant 6 emitted an identical number of errors on each of the three probe types.

Table 3 Stimulus designations in each component in Experiment 2

More Discriminable Component
A1   B1   C1   D1
A2   B2   C2   D2
A3   B3   C3   D3
A4   B4   C4   D4

Less Discriminable Component
A1 xeg   B1 739   C1 yik   D1 812
A2 xge   B2 793   C2 yki   D2 821
A3 exg   B3 973   C3 kiy   D3 182
A4 gxe   B4 379   C4 iyk   D4 218

Psychol Rec (2014) 64:287–300 293

The top and bottom graphs of Fig. 6 show mean latency on the equivalence-probe trials in each component of each session in Condition 3 for Participants 6 and 7, respectively. For each participant, mean latency was higher in the Less Discriminable component in every session, with the highest latencies observed in the first session.

The top graph of Fig. 7 shows accuracy scores for Participant 5 in each session of Condition 3. These scores are displayed in each component for both baseline trials and equivalence-probe trials (the dashed vertical lines separate the different procedures to which Participant 5 was exposed across this condition). Her first three sessions were identical to the procedures used for Participants 6 and 7, with no within-session or postsession differential consequences. Next, for three sessions, in both components, the within-session differential consequences (i.e., star versus dark-screen presentation) were reinstated for the baseline trials only (i.e., there still were no within-session differential consequences for the equivalence probes, and there still were no postsession differential consequences). During these initial six sessions, accuracy was higher in the More Discriminable component than in the Less Discriminable component for both the baseline trials and the equivalence-probe trials. However, baseline accuracy scores in the Less Discriminable component in these sessions still were relatively high (i.e., approximating 80 %). The most notable result from these early sessions was the particularly low accuracy on the equivalence probes in the Less Discriminable component.

Finally, the last eight sessions had, in both components, increased ITI and dark-screen durations (i.e., 3 s) and increased reinforcer magnitude (i.e., 3 cents). In these final eight sessions, there still were no within-session differential consequences for the equivalence probes, and there still were no postsession differential consequences. Across these last eight sessions, accuracy scores were comparable across all trial types except the equivalence probes in the Less Discriminable component. Accuracy on these probes was markedly lower; however, it increased across sessions before approximating 80 %. To better understand performance on the equivalence probes in the Less Discriminable component, the middle graph of Fig. 7 shows these accuracy scores as a function of probe type. The results resemble the findings observed in Experiment 1 in that accuracy scores were lowest on the DA probes, particularly in the initial sessions. By the end of the condition, accuracy scores were higher on the CA probes than on the DA and DB probes. The bottom graph of Fig. 7 shows mean latency for Participant 5 on the stimulus-equivalence trials in each component of each session in Condition 3. Mean latency generally was higher in the Less Discriminable component.

Fig. 4 Left and right graphs show percent correct and mean latency (s), respectively, in each component in each session of Condition 1 for each participant (P5, P6, P7) in Experiment 2. Note. MD more discriminable, LD less discriminable

Additional Analyses As in Experiment 1, some aspects of the results of Experiment 2 were not displayed graphically. First, performance (i.e., accuracy and latency) in the single session of Condition 2 was comparable to responding in the final sessions of Condition 1. Second, accuracy scores on the baseline trials in Condition 3 exceeded 90 % in each component for Participants 6 and 7, with one exception: in the third session for Participant 6, accuracy decreased to 89 % in the Less Discriminable component, whereas it was 91 % in the More Discriminable component. Third, an examination of latency on the equivalence probes as a function of probe type showed no reliable differences. As in Experiment 1, this absence of reliable differences was due to the relatively high variability in these latencies across probe types (i.e., mean latency was longer on the DA and CA probes in each component, but there was considerable variation on all three probe types). Fourth, as in Experiment 1, we conducted t tests to compare latencies across components during the last session of baseline and each equivalence-testing session. These results were consistent with the graphical displays in Figs. 4, 6, and 7: (a) latency differences in baseline were significant for each participant; (b) latency differences for Participants 6 and 7 were significant throughout equivalence testing; and (c) latency differences for Participant 5 in equivalence testing were less reliable (i.e., these differences were significant in half of her sessions [during three of her first four sessions and four of her last six sessions]).
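The article does not specify which form of t test was used for these latency comparisons. As one plausible reading, the Python sketch below computes Welch's unequal-variance t statistic and its Welch–Satterthwaite degrees of freedom for two independent samples of per-trial latencies (e.g., LD versus MD within one session); the function name is hypothetical. Obtaining a p-value additionally requires the t distribution, which, for example, `scipy.stats.ttest_ind(x, y, equal_var=False)` supplies.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
    independent samples, e.g., per-trial latencies (s) in the LD and MD components."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1))
    return t, df
```

Welch's form is shown because it does not assume equal latency variances in the two components, which the variability described above makes a reasonable precaution; a pooled-variance test would be the alternative.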

Discussion

Several aspects of the results of Experiment 2 were in general agreement with Experiment 1. First, the reduced number of baseline discriminations decreased the duration of baseline for two of the three participants (i.e., Participants 6 and 7). Second, differential baseline acquisition was observed across components for each participant when only stimulus discriminability was manipulated. Third, stimulus discriminability also produced differential response latencies for each participant in baseline. Fourth, and most important, there were differences in stimulus-equivalence performance across components. This differential equivalence performance was indexed in different ways across participants. In terms of accuracy, Participant 5 displayed a substantial degree of delayed emergence in the Less Discriminable component, and Participant 6 displayed reduced maintenance of the equivalence relations in only the Less Discriminable component during the follow-up assessment. Although Participant 5 displayed relatively low accuracy scores on all three probe types in the Less Discriminable component, the DA probe tended to produce her lowest scores. In terms of response latency, each participant (Participant 7 included) responded more slowly in the Less Discriminable component during equivalence testing. Thus, although there were some inconsistencies in the results of Experiment 2 (i.e., Participant 5 required a considerable number of sessions in baseline and equivalence testing, and Participant 7 displayed high accuracy scores on all equivalence trials despite differential baseline acquisition), these findings generally support our predictions: stimulus discriminability can influence both discrimination acquisition and equivalence formation, and a multiple schedule can provide a means of studying these effects. The results of Experiment 2 suggest that differential stimulus discriminability alone can affect discrimination acquisition and equivalence formation.

Fig. 5 Upper graph shows percent correct in each component in each session of Condition 3 for Participants 6 and 7 (panels P6, P7) in Experiment 2. Lower graph (P6-LD, Session 3) shows percent correct on each probe type in the Less Discriminable component for Participant 6 in the third session of Condition 3. Note. MD more discriminable, LD less discriminable

Fig. 6 Mean latency (s) on the stimulus-equivalence probe trials in each component in each session of Condition 3 for Participants 6 and 7 (panels P6, P7) in Experiment 2. Note. MD more discriminable, LD less discriminable

General Discussion

The present research examined whether a multiple schedule could be used to investigate the effects of stimulus discriminability on the rapidity of discrimination acquisition in baseline and on subsequent stimulus-equivalence formation, as measured by the accuracy, latency, and maintenance of these derived relations. Furthermore, the effects of stimulus discriminability were studied in isolation in Experiment 2. Across both experiments, a more protracted acquisition period and less reliable equivalence formation generally were observed in the component with the more similar (i.e., less discriminable) stimuli. This less reliable equivalence performance was measured in terms of lower accuracy (i.e., both delayed emergence [Participants 1, 3, and 5] and reduced maintenance of equivalence relations [Participant 6]) and longer latencies (i.e., all six participants).

Despite the similar findings across the present experiments, there also were some inconsistencies. One participant (i.e., Participant 4) learned the baseline discriminations in a comparable time period across components. Two participants (i.e., Participants 6 and 7) displayed immediate emergence of equivalence relations in both components during initial testing despite differential acquisition of the baseline discriminations, and Participant 7 continued to make highly accurate choices during the follow-up assessment. One participant (i.e., Participant 5) required a considerable number of equivalence-testing sessions before displaying reliable equivalence formation. Finally, unlike Participants 6 and 7, Participant 4 did not receive follow-up testing despite displaying immediate emergence in both components. It is not possible at this point to determine the source of the differences between the participants who displayed immediate and delayed emergence. However, the three participants who displayed immediate emergence required fewer baseline sessions than the three participants who displayed delayed emergence (i.e., Participants 4, 6, and 7 required 12, 11, and 11 baseline sessions, respectively, whereas Participants 1, 3, and 5 required 18, 22, and 23 sessions, respectively).

Fig. 7 Upper graph shows percent correct in each session of Condition 3 for Participant 5 in Experiment 2. Percent correct is shown by both component type (MD more discriminable and LD less discriminable) and trial type (BL baseline trials and Equiv stimulus-equivalence probe trials). Middle graph (P5-LD) shows percent correct on each probe type (DA, DB, CA) in the Less Discriminable component for Participant 5 in each session of Condition 3. Bottom graph shows mean latency (s) on the stimulus-equivalence probe trials in each component in each session of Condition 3 for Participant 5

Despite the aforementioned inconsistencies, the present findings were sufficiently reliable to offer several observations. Stimulus discriminability can affect both discrimination acquisition and stimulus-equivalence formation. As already noted, these findings extend the tentative results of Holth and Arntzen (1998), which seemed to show an inverse relation between the rate of baseline acquisition and delayed emergence of equivalence. The more reliable demonstration of this inverse relation in the present research may be due to our manipulation of stimulus discriminability, our multiple-schedule procedure, and/or our use of baseline procedures that produced a longer period of discrimination acquisition (e.g., our use of more stimuli, our randomized training, our removal of errorless-learning procedures). Regardless of the source of these differences, the present findings complement other investigations that have examined the relation between stimulus type and stimulus equivalence (e.g., Holth and Arntzen 1998; O’Conner et al. 2009).

To interpret the present results fully, the behavioral history of the participants must be considered. This consideration is necessary because differences in equivalence performance cannot be predicted from accuracy scores at the end of baseline alone, given that they were equally high across components at that point. Thus, the more reliable equivalence formation in the More Discriminable component is interpretable in terms of these baseline stimulus relations being correlated with a more extensive reinforcement history. That is, for the five participants who displayed differential baseline acquisition, a considerably greater number of reinforcers were earned in the More Discriminable component in baseline. The differential equivalence formation therefore could be interpreted in light of the literature on behavioral momentum and/or overtraining.

Behavioral momentum (e.g., Nevin 1992; Nevin and Grace 2000) has been extended to delayed emergence in the context of stimulus-control-topography analyses (Dube and McIlvane 1996; McIlvane et al. 2000). A stimulus-control topography refers to a relation between a controlling feature of a discriminative stimulus and a response. Multiple stimulus-control topographies can co-occur in a behavioral repertoire because multiple features of a discriminative stimulus (e.g., size, position, shape) can control a response. The goal of baseline training is to establish experimenter-relevant stimulus-control topographies (i.e., responses controlled by the features of discriminative stimuli that the experimenter is trying to establish) and to reduce, or eliminate, experimenter-irrelevant topographies (i.e., responses under the control of other stimulus features). The environmental changes in equivalence testing (e.g., novel stimulus arrangements) may occasion past, experimenter-irrelevant topographies, resembling the resurgence of stimulus relations (e.g., Doughty et al. 2011; Wilson and Hayes 1996). When both experimenter-relevant and experimenter-irrelevant topographies occur in early probe trials, it is argued that the experimenter-relevant topographies persist because of greater behavioral momentum established by a rich reinforcement history. Thus, it may be argued that the contingencies in the More Discriminable component established stimulus-control topographies with greater momentum in the behavioral histories of our participants (cf. Doughty et al. 2005). At the very least, the present results confirm the prediction offered by Dube and McIlvane that the occurrence of delayed emergence would be related inversely to the rate of discrimination acquisition in baseline. The overtraining literature (e.g., Nakagawa 1999) also can be used to understand the differential equivalence performance observed across components. This literature applies to our findings because we continued to present the More Discriminable component in the final baseline sessions while accuracy scores in the Less Discriminable component continued to increase to equally high criterion levels. Future research can address these interpretations by changing the baseline arrangements.
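For readers unfamiliar with the momentum framework, its core quantitative claim is often written (e.g., in Nevin 1992 and Nevin and Grace 2000) in a form such as log(Bx/Bo) = −x / r^b, where Bo is baseline responding, Bx is responding under a disruptor of magnitude x, r is the reinforcement rate correlated with the stimulus context, and b is a sensitivity parameter. The Python sketch below assumes that simple form, base-10 logarithms, and arbitrary parameter values, solely to illustrate the qualitative prediction invoked above: the same disruptor reduces responding proportionally less in a richer context. The present study did not fit this model, and applying it to accuracy or stimulus-control topographies rather than response rate is an interpretive extension.

```python
def proportion_of_baseline(x, r, b=0.5):
    """Bx/Bo implied by the assumed form log10(Bx/Bo) = -x / r**b.

    x: disruptor magnitude (arbitrary units; assumed)
    r: reinforcement rate correlated with the stimulus context (arbitrary units)
    b: sensitivity parameter (0.5 is an arbitrary illustrative value)
    """
    return 10 ** (-x / r ** b)


# Same disruptor (e.g., the novel arrangements of equivalence testing) applied to two
# contexts that differ only in reinforcement history (illustrative values).
rich = proportion_of_baseline(x=1.0, r=4.0)  # context with the richer history
lean = proportion_of_baseline(x=1.0, r=1.0)  # context with the leaner history
print(f"rich: {rich:.3f}, lean: {lean:.3f}")
```

Only the ordering of the two values matters for the argument; the specific numbers depend entirely on the assumed parameters.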

Regardless of how impaired equivalence formation should be interpreted, it is clear that such impaired performance can take different forms. For example, the degree of delayed emergence in the present research depends on the level of analysis considered. If only overall accuracy is considered, then the degree of delayed emergence in the Less Discriminable component of Experiment 1 might be considered moderate: the overall accuracy scores in the first test session were 78 % and 83 % for Participants 1 and 3, respectively. If the accuracy scores are considered by probe type, then a higher degree of delayed emergence was observed: the accuracy scores of Participants 1 and 3 on the DA probes in the Less Discriminable component in the first session were 70 % and 74 %, respectively. From both perspectives (i.e., overall accuracy and DA-probe accuracy), choices consistent with equivalence were well above chance levels; however, they were not at levels usually considered indicative of mastery (i.e., 90 % or above). In Experiment 2, impaired equivalence performance was evidenced in terms of delayed emergence only for Participant 5; however, her degree of delayed emergence could be described as extreme, regardless of the level of analysis taken. For example, in her first session, choices consistent with equivalence in the Less Discriminable component (i.e., overall accuracy and both DA-probe and DB-probe accuracy) were at or near chance levels. In addition, her equivalence testing required a considerable number of sessions. The notion that impaired equivalence performance can take various forms was illustrated well in Experiment 2. Despite the substantial delayed emergence for Participant 5, equivalence occurred immediately for Participants 6 and 7. For Participant 6, however, her reduced accuracy in the Less Discriminable component during the first follow-up session provides a conceptual replication of the results for the other participants (i.e., Participants 1, 3, and 5).
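"Chance" here follows from the trial layout in the Appendix instructions: in Experiment 1 each trial showed one sample and six comparisons, so random selection is correct with probability 1/6 (about 17 %); in Experiment 2, with four comparisons, the chance level is 1/4 (25 %). The article does not report a statistical test against chance; one conventional way to make "well above chance" concrete is a one-tailed binomial probability, sketched below in Python with hypothetical function names and the simplifying assumption that trials are independent.

```python
from math import comb


def chance_level(n_comparisons):
    """Probability of a correct choice when one of n comparisons is selected at random."""
    return 1 / n_comparisons


def p_at_least(k, n, p):
    """One-tailed binomial probability of k or more correct choices in n trials,
    assuming independent trials, each correct with probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(round(chance_level(6), 3))  # Experiment 1, six comparisons: prints 0.167
print(chance_level(4))            # Experiment 2, four comparisons: prints 0.25
```

For a given session, `p_at_least(k, n, chance_level(m))` with k correct out of n probes and m comparisons indicates how improbable that score would be under random selection.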

Under the present conditions, impaired equivalence formation was observed most reliably in terms of response latency, consistent with observations documenting the greater sensitivity of response latency relative to response accuracy in equivalence tests (e.g., Wang et al. 2012). That is, regardless of whether the emergence of equivalence relations was immediate or delayed across components, comparison selection was slower in the Less Discriminable component for all six participants. Furthermore, the differences in latency across components were more pronounced during the earliest parts of equivalence testing. These pronounced differential latencies during the earliest portions of testing replicate several findings (e.g., Bentall et al. 1998). Consistent with the interpretation offered above, it could be argued that the environmental changes in equivalence testing (e.g., the novel stimulus combinations) served as a form of disruption that affected response latency to a greater extent in the stimulus context correlated with a less extensive reinforcement history. The result that latency was a more sensitive measure than accuracy during equivalence testing seems to follow from the fact that, by the end of baseline, the components differed only in latency. Participants were instructed that payment was determined by their accuracy, not by the speed of their choices. Future research can assess whether a speed contingency might eliminate these latency differences across components; however, such a contingency may increase the number of equivalence-inconsistent choices in the Less Discriminable component (cf. Tomanari et al. 2006). Future research also may include “word-like” stimuli in both multiple-schedule components, given that these stimuli were present only in the Less Discriminable component (i.e., to examine whether using stimuli that may differentially encourage “reading” affects latency).

The impaired equivalence performance observed in the present research could be attributed to nodal-distance effects in some instances. For example, the lowest equivalence accuracy scores in the Less Discriminable component for Participants 1, 3, and 5 were on the DA-probe trials. In showing reduced maintenance of equivalence relations during the follow-up assessment, however, Participant 6 did not display such nodal effects. In addition, the longer mean latencies of our participants in the Less Discriminable component were due to extended latencies on all three probe types. Thus, some aspects of the present results are consistent with the literature documenting the effects of nodal distance in equivalence formation (e.g., Fields and Moss 2007). The absence of more robust nodal-distance effects in the present research probably resulted from using only probe trials that differed by one node.

The present research expanded on the work of Catania et al. (1989) and successfully demonstrated the utility of a multiple schedule in the analysis of stimulus equivalence. Several investigations have examined the contextual control of equivalence (e.g., Griffee and Dougher 2002), in which the same stimuli are presented as samples and comparisons across trials, with different conditional discriminations reinforced in different contexts (e.g., A1–B1 on a red screen, and A1–B2 on a green screen). The present multiple-schedule arrangement is different in that the stimuli do not overlap across components. Such an arrangement allows for the examination of different independent variables (e.g., stimulus type, training structure, discrimination number) on distinct conditional discriminations during both baseline training and equivalence testing (e.g., Green and Saunders 1998). Most important, this examination can occur with individual participants in the same time frame, allowing for a substantial level of experimental control. Multiple-schedule examinations of equivalence also raise challenges that must be explored; in fact, as noted, the role of the multiple schedule in producing the generally unsuccessful results reported by Catania et al. was unclear. One challenge is that a multiple-schedule arrangement increases the number of baseline discriminations and, as such, may present teaching and assessment difficulties. For example, four of our six participants required over 2,000 trials in Condition 1, and Participant 5 required over 5,000 trials in Condition 3. Nevertheless, we envision great utility in further exploring the potential sensitivity inherent in the multiple-schedule arrangement developed here.
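The "over 5,000 trials" figure for Participant 5 can be checked against the Experiment 2 procedure: Condition 3 sessions contained 384 trials, and the Results describe her Condition 3 as three phases of 3, 3, and 8 sessions. The Python arithmetic below assumes that every one of those sessions contained the full 384 trials, which the text implies but does not state for each phase; the phase labels are paraphrases.

```python
# Experiment 2, Condition 3: 384 trials per session (96 baseline and 96 probe trials
# in each of the two components), as stated in the Procedure.
TRIALS_PER_SESSION_COND3 = 384

# Participant 5's Condition 3 sessions by procedural phase, as described in the Results.
p5_phase_sessions = {
    "no differential consequences": 3,
    "baseline differential consequences reinstated": 3,
    "3-s ITI/dark screen and 3-cent reinforcers": 8,
}

total_sessions = sum(p5_phase_sessions.values())
total_trials = total_sessions * TRIALS_PER_SESSION_COND3
print(total_sessions, total_trials)  # prints "14 5376"
```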

Acknowledgments Portions of this research were supported by the Undergraduate Research and Creative Activities program at the College of Charleston through the Summer Undergraduate Research with Faculty program. Some of this research was presented at the annual meeting of the Southeastern Association for Behavior Analysis, Wilmington, NC (October, 2009), and at the annual meeting of the Association for Behavior Analysis International, San Antonio, TX (May, 2010). The authors thank Vanessa Minervini and Melanie Pasheluk for their contributions to this research. Rebecca Kastner now is at the University of Alabama.

Appendix

The following instructions were provided to each participant before the first session of Condition 1 in each experiment:

Welcome to our study on trial-and-error learning! In this study, you will work alone on the computer. The computer will present you with many trials and alternate between red and yellow background screens. On each trial, you will be presented with seven items: one item in the center of the screen surrounded by six items. [In Experiment 2, it noted that there were five items, one in the center surrounded by four others]. Click the mouse over any one of the surrounding items that you think “goes with” the item in the center, and one of two things will occur: (1) oval-shaped stars will appear or (2) the screen will darken. If stars appear then you earned 1 cent if the screen was yellow and 3 cents if the screen was red. [In Experiment 2, it noted that the participant earned 1.5 cents in each component]. If the screen darkened, then you earned 0 cents. You will receive many sessions each day, and each one will last between 20 and 30 min. The computer will tell you when the session is over. Good Luck!

The following instructions were provided to each participant before the single session in Condition 2 in each experiment:

In your next session, there never will be any stars or dark screen. The computer still will record whether your choice is correct or incorrect. You still will receive 1 or 3 cents for each correct choice. [In Experiment 2, it noted that the participant still earned 1.5 cents in each component]. The computer will still tell you that the session is over, and at the end of the session you will be told how much money you earned. Good Luck!

The following instructions were provided to each participant before the first session of Condition 3 in each experiment (as well as before the third session in Condition 3 in Experiment 2 for Participants 6 and 7):

In your next session, there never will be any stars or dark screen. The computer still will record whether your choice is correct or incorrect. You still will receive 1 or 3 cents for each correct choice. [In Experiment 2, it noted that the participant still earned 1.5 cents in each component]. And, the computer still will tell you that the session is over. The difference in this session is that you will not be told how much money you earned; instead, at the end of the study we will tell you how much you earned in this session. Good Luck!

References

Arntzen, E., & Lian, T. (2010). Trained and derived relations with pictures versus abstract stimuli as nodes. The Psychological Record, 60, 659–678.

Bentall, R. P., Jones, R. M., & Dickins, D. (1998). Control over emergent relations during the formation of equivalence classes: response error and latency data for 5-member classes. The Psychological Record, 49, 93–116.

Catania, A. C., Horne, P., & Lowe, C. F. (1989). Transfer of function across members of an equivalence class. The Analysis of Verbal Behavior, 7, 99–110.

Doughty, A. H., Cirino, S., Mayfield, K. H., da Silva, S. P., Okouchi, H., & Lattal, K. A. (2005). Effects of behavioral history on resistance to change. The Psychological Record, 55, 315–330.

Doughty, A. H., Kastner, R. M., & Bismark, B. D. (2011). Resurgence of derived stimulus relations: replication and extensions. Behavioural Processes, 86, 152–155.

Dube, W. V. (1991). Computer software for stimulus control research with Macintosh computers. Experimental Analysis of Human Behavior, 9, 28–30.

Dube, W. V., & McIlvane, W. J. (1996). Implications of a stimulus control topography analysis for emergent behavior and stimulus classes. In T. R. Zentall & P. M. Smeets (Eds.), Stimulus class formation in humans and animals (pp. 197–218). North Holland: Elsevier.

Fields, L., & Moss, P. (2007). Stimulus relatedness in equivalence classes: interaction of nodality and contingency. European Journal of Behavior Analysis, 8, 141–159.

Green, G., & Saunders, R. R. (1998). Stimulus equivalence. In K. A. Lattal & M. Perone (Eds.), Handbook of research methods in human operant behavior (pp. 229–262). New York, NY: Plenum Press.

Griffee, K., & Dougher, M. J. (2002). Contextual control of stimulus generalization and stimulus equivalence in hierarchical categorization. Journal of the Experimental Analysis of Behavior, 78, 433–447.

Holth, P., & Arntzen, E. (1998). Stimulus familiarity and the delayed emergence of stimulus equivalence or consistent nonequivalence. The Psychological Record, 48, 81–110.

McIlvane, W. J., Serna, R. W., Dube, W. V., & Stromer, R. (2000). Stimulus control topography coherence and stimulus equivalence: Reconciling test outcomes with theory. In J. C. Leslie & D. Blackman (Eds.), Experimental and applied analysis of human behavior (pp. 85–110). Reno: Context Press.

Nakagawa, E. (1999). Acquired equivalence of discriminative stimuli following two concurrent discrimination learning tasks as a function of overtraining in rats. The Psychological Record, 49, 327–348.

Nevin, J. A. (1992). An integrative model for the study of behavioral momentum. Journal of the Experimental Analysis of Behavior, 57, 301–316.

Nevin, J. A., & Grace, R. C. (2000). Behavioral momentum and the law of effect. Behavioral and Brain Sciences, 23, 73–130.

O’Conner, J., Rafferty, A., Barnes-Holmes, D., & Barnes-Holmes, Y. (2009). The role of verbal behavior, stimulus nameability, and familiarity on the equivalence performances of autistic and normally developing children. The Psychological Record, 59, 53–74.

Perone, M. (2003). Negative effects of positive reinforcement. The Behavior Analyst, 26, 1–14.

Perone, M., & Courtney, K. (1992). Fixed-ratio pausing: joint effects of past reinforcer magnitude and stimuli correlated with upcoming magnitude. Journal of the Experimental Analysis of Behavior, 57, 33–46.

Pilgrim, C., & Galizio, M. (1996). Stimulus equivalence: A class of correlations or a correlation of classes. In T. R. Zentall & P. M. Smeets (Eds.), Stimulus class formation in humans and animals (pp. 173–195). North Holland: Elsevier.

Saunders, R. R., Wachter, J., & Spradlin, J. E. (1988). Establishing auditory stimulus control over an eight-member equivalence class via conditional discrimination procedures. Journal of the Experimental Analysis of Behavior, 49, 95–115.

Sidman, M. (1994). Equivalence relations and behavior: A research story. Boston: Authors Cooperative.

Sidman, M. (2000). Equivalence relations and the reinforcement contingency. Journal of the Experimental Analysis of Behavior, 74, 127–146.

Sidman, M., & Tailby, W. (1982). Conditional discrimination vs. matching to sample: an expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior, 37, 5–22.

Sidman, M., Kirk, B., & Willson-Morris, M. (1985). Six-member stimulus classes generated by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior, 43, 21–42.

Spencer, T. J., & Chase, P. N. (1996). Speed analyses of stimulus equivalence. Journal of the Experimental Analysis of Behavior, 65, 643–659.

Tomanari, G. Y., Sidman, M., Rubio, A. R., & Dube, W. V. (2006). Equivalence classes with requirements for short response latencies. Journal of the Experimental Analysis of Behavior, 85, 349–369.

Wang, T., McHugh, L. A., & Whelan, R. (2012). A test of the discrimination account in equivalence class formation. Learning and Motivation, 43, 8–13.

Wilson, K. G., & Hayes, S. C. (1996). Resurgence of derived stimulus relations. Journal of the Experimental Analysis of Behavior, 66, 267–281.