The effects of vocoder filter configuration on tactual perception … · 2006. 2. 22. ·...

14
Department of Veterans Affairs Journal of Rehabilitation Research and Development Vol . 26 No . 4 Pages 51—64 The effects of vocoder filter configuration on tactual perception of speech Rebecca E . Eilers, PhD ; Ozcan Ozdamar, PhD ; D . Kimbrough Oiler, PhD ; Edward Miskiei, MS; Richard Urbano, PhD Departments of Psychology, Pediatrics and Biomedical Engineering, University of Miami, Miami, FL 33101 Abstract —Perception of synthetic speech continua through the sense of touch and audition was compared utilizing a 32-channel spectrally-oriented electrocutaneous vocoder display and standard auditory psychophysical procedures . Perception of a consonantal and a vocalic continuum was evaluated utilizing three vocoder filter configurations : logarithmic, linear, and average (geo- metric mean of logarithmic and linear) . Results indicated a close correspondence between tactual and auditory discrimination and identification for the vowel (/a/-/a/) and the consonant (/sta/-/sa/) continuum regardless of the filter characteristics. Key words : artificial hearing, electrocutaneous vocoder, sensory aids—deaf, synthetic speech, tactile perception, vocoder filter configuration. INTRODUCTION Tactual vocoders have recently shown promis- ing results as artificial hearing devices to be used in speech transmission for the deaf (9,11,18,19,20,21). These devices function by displaying spectral infor- mation dynamically on the skin in a manner similar to that produced by the human cochlea . The use of vocoders for artificial hearing assumes that if speech information can be transmitted to the brain through the tactile sense, then it can be properly perceived Address all correspondence and reprint requests to : Rebecca E . Eilers, PhD, University of Miami, Mailman Center for Child Development, P .O . Box 016820, Miami, FL 33101 . and decoded as speech after some training . Many experimental and clinical data provide support for the efficacy of such an approach (1,2,3,22,27,28). Many aspects of vocoder design, however, have never been tested rigorously under experimental conditions . The justification for the use of tactual vocoders depends in part upon the demonstration of the similarities between the auditory and tactual perception of speech. In recent research we have found similarities in boundary locations and discrimination functions for the auditory and tactual perception of a consonant and a vowel contrast (8) . These results have sug- gested the intriguing possibility that some aspects of speech processing may operate independently of the auditory system . Moreover, the results provide further support for a sensory substitution approach to the problem of deafness. Although similarities between tactual and audi- tory perception have been documented, little infor- mation is available to elucidate the underlying mechanisms responsible for the similarities . The correspondence between tactual and auditory per- ception of speech may be mediated by a number of factors, including cross-modally applicable laws of perception (such as Weber's law), and similarities in acoustic frequency coding in tactual vocoders and the auditory system . The present research represents an attempt to consider the latter of these possibili- ties . Our previous demonstrations of cross-modal similarities in speech processing have been based on 51

Transcript of The effects of vocoder filter configuration on tactual perception … · 2006. 2. 22. ·...

  • Department ofVeterans Affairs

    Journal of Rehabilitation Researchand Development Vol . 26 No. 4Pages 51—64

    The effects of vocoder filter configuration ontactual perception of speech

    Rebecca E. Eilers, PhD ; Ozcan Ozdamar, PhD; D. Kimbrough Oiler, PhD ; Edward Miskiei, MS;Richard Urbano, PhDDepartments of Psychology, Pediatrics and Biomedical Engineering, University of Miami,Miami, FL 33101

    Abstract—Perception of synthetic speech continuathrough the sense of touch and audition was comparedutilizing a 32-channel spectrally-oriented electrocutaneousvocoder display and standard auditory psychophysicalprocedures . Perception of a consonantal and a vocaliccontinuum was evaluated utilizing three vocoder filterconfigurations : logarithmic, linear, and average (geo-metric mean of logarithmic and linear) . Results indicateda close correspondence between tactual and auditorydiscrimination and identification for the vowel (/a/-/a/)and the consonant (/sta/-/sa/) continuum regardless ofthe filter characteristics.

    Key words : artificial hearing, electrocutaneous vocoder,sensory aids—deaf, synthetic speech, tactile perception,vocoder filter configuration.

    INTRODUCTION

    Tactual vocoders have recently shown promis-ing results as artificial hearing devices to be used inspeech transmission for the deaf (9,11,18,19,20,21).These devices function by displaying spectral infor-mation dynamically on the skin in a manner similarto that produced by the human cochlea. The use ofvocoders for artificial hearing assumes that if speechinformation can be transmitted to the brain throughthe tactile sense, then it can be properly perceived

    Address all correspondence and reprint requests to : Rebecca E . Eilers,PhD, University of Miami, Mailman Center for Child Development,P .O . Box 016820, Miami, FL 33101 .

    and decoded as speech after some training . Manyexperimental and clinical data provide support forthe efficacy of such an approach (1,2,3,22,27,28).Many aspects of vocoder design, however, havenever been tested rigorously under experimentalconditions . The justification for the use of tactualvocoders depends in part upon the demonstration ofthe similarities between the auditory and tactualperception of speech.

    In recent research we have found similarities inboundary locations and discrimination functions forthe auditory and tactual perception of a consonantand a vowel contrast (8) . These results have sug-gested the intriguing possibility that some aspects ofspeech processing may operate independently of theauditory system . Moreover, the results providefurther support for a sensory substitution approachto the problem of deafness.

    Although similarities between tactual and audi-tory perception have been documented, little infor-mation is available to elucidate the underlyingmechanisms responsible for the similarities . Thecorrespondence between tactual and auditory per-ception of speech may be mediated by a number offactors, including cross-modally applicable laws ofperception (such as Weber's law), and similarities inacoustic frequency coding in tactual vocoders andthe auditory system . The present research representsan attempt to consider the latter of these possibili-ties .

    Our previous demonstrations of cross-modalsimilarities in speech processing have been based on

    51

  • 52

    Journal of Rehabilitation Research and Development Vol . 26 No . 4 Fall 1989

    tactual vocoders that employ logarithmic filter spac-ing . Research exploring the representation of fre-quency on the basilar membrane in animals suggeststhat a near logarithmic coding may approximateorganization of frequency in the cochlea (5,10) . Anexception to this general principle can be found atfrequencies below 1 kHz (17), where the relationshipbetween frequency and position approaches linear-ity . To date, designs of tactual vocoders, both in ourlaboratories and elsewhere, have generally usedlogarithmic filter spacings (e .g ., 1, 11) . Given thegeneral similarity of the frequency coding incorpo-rated in a logarithmic tactual vocoder system andaudition, one might predict that similarities acrosstactual and auditory perception would be foundusing existing devices.

    The present research was designed to evaluatethe effect of tactual vocoder filter configuration onspeech transmission . In particular, the researchsought to assess the possibility that similarities ofprocessing in taction and audition seen in ourprevious work may have depended upon the loga-rithmic filter configuration . The present work uti-lized three tactual vocoder filter configurations tocompare with audition—logarithmic, linear, andaverage (mean of the logarithmic and linear configu-rations) . Both labelling and discrimination testingwere conducted for the consonant and vowel stimulifrom the earlier study . The techniques and stimuliwere chosen to be representative of standard speechscience tools for evaluating speech transmission inthe auditory modality.

    METHOD

    SubjectsSubjects were four healthy, normal adults with

    no history of dermal disease or trauma . All subjectshad more than 40 hours of experience with thepresently-employed scheme of electrocutaneousstimulation in both psychophysical and speech per-ception experiments.

    ApparatusAuditory . Auditory testing of the speech con-

    tinua was conducted in a sound-treated booth.Speech syllables were presented through a highquality speaker at 70 dB sound pressure level (SPL) .

    Tactual . Tactual stimulation was providedthrough a 32-channel computer-driven electro-cutaneous display as described in detail elsewhere(23) . Briefly, the system consisted of a minicom-puter system (PDP-11/23 +) that controlled a 32-channel electrical stimulation system . Each stimula-tion channel was independently controlled through aparallel output port . Selected channel data wereconverted to frequency-modulated biphasic pulsesthrough D/A converters, voltage to frequency con-verters, and constant current pulse generators.

    Stimulation was presented to the abdomenthrough 32 linearly-arranged electrodes . The elec-trodes were gold-plated (8 mm2) and set on aconductive rubber ground plane . They were sepa-rated by 1 .5 cm (center to center) and worn on anelasticized belt supplied by Tacticon Corp . (25). Atany point in time, stimulation at a particularelectrode site was proportional to the energy in thespeech signal within the frequency range of thecorresponding filter . Absence of energy in a particu-lar frequency band would result in no stimulation.Thus, depending on the input, between 0 (a silentinterval) and 32 electrodes could be simultaneouslyactivated and each electrode could provide stimula-tion at varying electrical frequencies . The drivecircuitry of the display generated constant-currentbiphasic pulses . Pulse height was set at + 10 mA;pulse width was under subject control. Typically,subjects adjusted pulse width between 13 and 18 ps.Pulse frequency was proportional to intensity ofstimulation . Electrical stimulation for an activeelectrode varied between 10 and 400 Hz, thefrequency range identified to include maximum skinsensitivity (4).

    Three filter configurations were designed andnamed according to the bandwidth calculation pro-cedure described below . In the log configuration,the frequency range (100-4,500 Hz) was divided into32 logarithmically-equal bandwidth regions . Thisconfiguration assigns more channels to the lowfrequency end of the spectrum than to the higherend . The evidence cited suggests that above 1 kHz,the cochlea of a mammal is characterized by aroughly log configuration . It is argued that the logconfiguration extracts maximum information withthe minimum number of channels (or hair cells)from natural acoustic stimuli (5).

    In the linear filter configuration, the frequencyrange (100-4,500 Hz) was divided into 32 linearly-

  • 53

    EILERS et al .

    Effects of Vocoder Filter Configuration

    equal bandwidth regions . This configuration isdifferent from that found in the cochlea in thatthere is no special emphasis given to low frequen-cies .

    It is reasonable to assume that speech contrastswhich differ primarily in the high frequencies will bebetter transmitted by a linear than a log configura-tion and that those with low frequency informationwill be better transmitted by the log configuration.A compromise average configuration (arithmeticmean of log and linear) was also tested to provide amore parametric perspective on the role of filter

    configuration in speech perception . In fact, anatom-ical and physiological data show that the frequencymapping of the cochlea is not exactly logarithmic,but falls intermediate between log and averageconfigurations . Greenwood's formula (14) approxi-mates frequency mapping in many species, includingman . Figure 1 shows a cochlear filter plot based onGreenwood's formulation along with the vocoderfilter configurations chosen for study.

    In each configuration, bandpass characteristicsof the adjacent filters were lined up precisely (i .e .,the upper corner frequency of bandpass filter n

    10 4

    Ai

    II _

    10 2

    r li

    ~

    i0 2

    CD

    i

    1

    i

    T

    1

    1

    1

    1 , 1 i

    I

    10 .12 14 16 18 20 22 24 26 28 30 32

    Cncnne °\L tuberFigure I.Greenwood's cochlear (C) function compared to log (Lg), linear (Ln), and average (A) filter configurations used in current study.The length of the lines is equal to the filter bandwidth .

  • 54

    Journal of Rehabilitation Research and Development Vol . 26 No. 4 Fall 1989

    equals the lower corner frequency of bandpass filtern + 1) so that there were no gaps or overlaps acrossthe frequency range. The characteristic variable foreach filter configuration was the bandwidth.Bandpass filters for all filter configurations and a 50Hz lowpass filter were designed by using the AtlantaSignal Processing, Inc . Digital Filter Design Pack-age . Bilinear transformation of elliptic analog proto-types was employed to create Infinite ImpulseResponse filters . Filter orders were chosen so thatthe maximum ripple in the stopband region of allfilters was less than – 40 dB . Filter order was 6 forall the bandpass filters . The order of the lowpassfilter was 4. In order to maintain filter roll-offcharacteristics consistently across filter configura-tions, filter cutoff frequencies were set to the filtercenter frequency plus or minus one-half the filterbandwidth . Slopes of the filter skirts were dependenton the filter configuration . For the logarithmicconfiguration, the slope was 35 dB/octave or higherfor all filters . In the linear configuration, the mostshallow slope was 20 dB/octave (lowest frequency)and the steepest slope was 1,240 dB/octave (highestfrequency) . In the average configuration all slopeswere intermediate, falling between the values for thelog and linear configurations.

    StimuliTwo 11-step speech continua were synthesized

    using Klatt's (15) routines. Klatt's software packageallows manipulation of a wide variety ofacoustic/articulatory parameters (e .g., amplitude ofvoicing, fundamental frequency, F l frequency, F 2frequency, etc .) consistent with accepted theory ofspeech acoustics . Waveforms were generated at a 10kHz sampling rate . The vowel continuum ranged in11 steps from /a/ (step 1) to /a/ (step 11).Intermediate stimuli were created by changing thefirst three formants in equal frequency incrementsfrom 742, 1,144, 2,517 Hz for F l , F2, and F3 ,respectively for step 1 ; to 450, 1,450, and 2,450 forstep 11 . All stimuli were 350 ms in duration, and allincluded an identical natural-sounding rise-fall pat-tern of amplitude and F o .

    For the consonantal continuum, step 1 was thesyllable /sta/ and step 11 /sa/ . The percept of /t/ in/sta/ resulted from 130 ms of silence insertedbetween the /s/ and la/ . Across the continuum, thesilent interval was shortened by 13 ms at each stepso that step 11 contained no silent interval . All of

    the /sa/-/sta/ stimuli contained identical /s/ and/a/ segments as well as identical rise-fall patterns ofamplitude and Fe0 . The two speech continua were thesame as those studied in Eilers et al. (8).

    Synthetic speech stimuli were software-filteredin non-real time through the three different filterconfigurations : log, linear, and average . The outputof the filter program was stored on disk and servedas input to the tactual display . Figures 2a, 2b, and2c show the endpoint stimulus for each filterconfiguration as a three-dimensional dynamic plotwith the z dimension representing pulses/second.

    The stimuli utilized in the study provided aninteresting test of the role of filter configuration inthe transmission of syllables . The variations informant frequencies across the 11 tokens of thevowel continuum produced different patterns ofstimulation for each of the three filter configura-tions . These patterns can be seen in Figures 2a, 2b,and 2c. From the figures, it is clear that the primaryvariations appear to concern F l . For each voweltoken, the most intense stimulation occurred at F i(450 Hz at step 11, /al, and 742 Hz at step 1, /a/).For the log configuration, stimuli corresponding to450 and 742 Hz were separated by 7 channels, whilein the linear and average cases, the differences were4 and 5 channels, respectively. Consequently, sub-jects had a differing basis for judgment of variationsin location of F l stimulation depending upon thevocoder configuration. Smaller differences in therelative position of formants in the endpoint stimuliacross the three filter configurations can be foundfor F 2 . It is most likely, however, that subjects relyon both F i position and amplitude to judge vowelidentity, since the stimulation resulting from thisformant is high amplitude relative to other parts ofthe signal.

    The stimuli of the consonant continuum like-wise differed across the three configurations, al-though an understanding of the significance of thedifference requires attention to the relationship inspace and time between the fricative /s/ and thevowel /a/ in the tactile representations . The task ofsubjects in labelling the consonant continuum can bethought of as one of noting the relative position intime of the /s/ and the /a/, a task that may havebeen influenced by backward masking of the rela-tively less intense /s/ by the relatively more intense/a/ . Furthermore, the masking may have beenheightened by the fact that a larger number of

  • 55

    EILERS et al .

    Effects of Vocoder Filter Configuration

    /sta/

    aim

    N

    RaWrOMN710000001r,0

    g

    I' a1%OftlIM

    %aHIP'timinbiftinia .00ai:2ammo

    %tivEzrimalp."Eiin%zipsmum%invasy

    =Xmmumum8

    16

    24

    32

    ®~ p CHANNEL UMBER

    CHANNEL NUMBER

    Figure 2a.Three-dimensional plots of endpoint stimuli for the continuum for the log filter bank showing dynamic representation on the skin.

    The z direction represents pulses/second.

    /e/

    CHANNEL NUMBER

    0 .1

    16

    24

    CHANNEL NUMBER

    32

    y 0 .5

    0 .4

    /a/

    0,,

    0

    0.4

  • 56

    Journal of Rehabilitation Research and Development Vol . 26 No . 4 Fall 1989

    I

    /sta/

    0.5

    0 .4

    0.3c

    0. 2 2

    0 .1

    2

    0 .4

    e .-

    32

    sal/a /

    I1WWX===== =mss===

    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXX

    — ...fin-ri ;;G We'S Oileaseawng;m--s =~m_g= ..

    =

    %,

    rnrnminnirmimpimmillaftsimmmiwAl

    Egi

    .. –.'24

    2

    CHANNEL NUMBER

    .--iii ry~

    21111L'ItIIIiiIm n............................-.ssina

    mimm

    _~ 'SSA_-~~'- ---~-,IM6

    16

    24

    2

    CHANNEL NUMBER

    0 .6

    0.4

    1 6

    0 .4

    0 .5

    Figure 2b.Three-dimensional plots of endpoint stimuli for the continuum for the average filter bank showing dynamic representation on theskin . The z direction represents pulses/second .

  • 57

    EILERS et al .

    Effects of Vocoder Filter Configuration

    /a/

    _

    S='if Xmas EM =a

    2M = =

    smoms

    ==a

    -

    MMMMMMMMMMMMMMMM MMMMM

    MMmMMIMPOO, M_1NE

    —•_=~

    _ —~g i =5

    tr,ffffs:=404%mmwn%_.,, S ==0.1

    MMMMMMMMMMMMMMMMMMMMMM

    6

    6

    24

    32

    CHANNEL NUMBER

    /sta/0.5

    JS

    A\\

    Z

    0 .2

    0.5

    0wU)

    L

    *,

    Nkmg

    mmmm*mmmmmmmommigii miaow arm

    M°S°S= MM MMMMMMMM =MEWMMIMMI

    MMMMMMMMMMMMMMMMMMMMMMMMMMMM

    mm..mmmmm.:--:a:--= ~_ _ •

    71i --E- SS.SSSS....S~==-==i;

    l~'\I-LT

    smax.tem

    0 .1

    0 .4

    16

    24

    CHANNEL NUMBER

    32

    Iv

    /a//sa/

    0.5

    0 .5

    Y

    0.4

    3 W~

    0,

    LU

    0 .2 Z1"°

    IrMUMMWWRINIMMMMMMMM

    -- —~°SSSASS=SS~CSSS=SSV55 ZZZZZZZZZZZZ_MMMMMMuXRE

    y3- -

    lhawsm:mn

    11~ 0.4wm8=x!MM

    -

    MMMMMMr"

    ftimmirenimiO.MMMW ImpoNewwwoeeemiMMMMMMMMM

    mO

    __m

    m =5=4ig=ii

    ftE16

    24

    2

    CHANNEL NUMBER

    -= ===================

    m ammemgC_E S!aM_lS:

    t,40 oi %_

    NarsimmmommmOMIOM

    Siff ~EE 1---

    ---OWWWWWTUMMIMMMMMEEMM

    MMM=='m'

    - - -CM ~M~SSM ~M=516

    24

    32

    CHANNEL NUMBER

    o .1

    II

    0.1

    0 .4

    0 .3

    0 .2

    Figure 2c.

    Three-dimensional plots of endpoint stimuli for the continuum for the linear filter bank sho

    The z direction represents pulses/second .

    ng dynamic representation on the skin .

  • 58

    Journal of Rehabilitation Research and Development Vol . 26 No . 4 Fall 1989

    channels was activated for the vowel than for theconsonant (see Figures 2a-2c) . Studies of tactileperception attest to the strength of backward mask-ing and suggest it is enhanced by the intensificationof maskers and increasing the number of sites ofmasking stimulation (6,7,12).

    The factors influencing potential masking dif-fered greatly across the three filter configurations.The smallest number of activated /s/ channelsoccurred in the log configuration (2 channels), whilethe largest number occurred in the linear configura-tion (5 channels).

    The larger number of channels activated in thelinear case may have provided not only a greaterresistance to masking, but the potential maskingdifference may also have been enhanced by the factthat the location of the primary /a/ stimulation (theF 1 /F2 complex in Figure 2a) extended up to channel22-23 for the log configuration, only nine channelsaway from the location of the /s/ . The sameformant complex in the linear configuration (seeFigure 2c) extended only to channel nine, a full 19channels away from the location of the /s/ . Giventhese distances, potential masking effects wouldhave to apply across the midline of the abdomen forthe linear configuration, while the log configurationwould have included, at least partially, ipsilateraleffects . Electrotactile perception has been shown toinclude greater masking for stimuli that are closertogether (24,26), and vibrotactile research has shownincreased masking by ipsilateral stimuli (13) . These,and other parallel lines of reasoning, suggest thatthe effect of masking for perception of the /sa/-/sta/ continuum may have been substantially differ-ent for the different filter configurations.

    Procedure for identification (labelling) testingTesting was done in a sound-treated booth for

    both tactual and auditory conditions . For tactualconditions, subjects followed the warm-up proce-dures described in detail in Bull et al . (4), except that32 channels, rather than a single channel, wereactivated during warm-up . In the present study,subjects were permitted to adjust pulse width toachieve a comfortable and clear level of stimulation.Warm-up procedures occurred before each tactualsession. Prior to Experiment 1, subjects were famil-iarized with the test stimuli in both modalities byparticipating in a standard adaptive discriminationparadigm (16) . In this paradigm, subjects were

    presented with the two most distant (1 and 11)stimuli in a two-interval forced-choice task . Theywere asked to indicate which of two intervals (onecontaining a `target' stimulus and one a 'compar-ator') contained the targeted endpoint (e .g ., 1)stimulus . Feedback was provided on each trial.Correct responses led to the presentation of pairsthat were more similar, and incorrect responses ledto larger differences between stimuli . A computer-implemented stopping rule determined a thresholdof Just Noticeable Differences (JND) . These wereautomatically recorded by the computer . Aftersubjects achieved consistent discrimination scoresfor each filter configuration and for the auditorycondition, they proceeded to Experiment 1 . JNDvalues for these stimuli may be found in Ozdamar etal. (23).

    Order of presentation of modality (auditoryand tactual), continuum (vowel and consonant), andfilter configuration (logarithmic, average, and lin-ear) were counter-balanced across subjects . Tactualand auditory discrimination were tested in separatesessions . The identification paradigm was controlledby computer through a subject-activated responsebox . To begin a trial, the subject depressed a startbutton and two repetitions of step 1 or step 11stimuli (/a/ or /a/ from the vowel continuum or/sal or /sta/ from the consonant continuum) werepresented auditorially or tactually . These presenta-tions were separated by 500 ms . The subject indi-cated by pressing one of two buttons on theresponse box whether the endpoint was step 1 orstep 11 . Lights on the response box informed thesubject if the response was correct . Subjects wererequired to respond correctly to 9 out of 10consecutive trials before proceeding with labelling ofintermediate stimuli . Typically, subjects achievedthis criterion in 9 to 11 trials . After the criterion wasmet, feedback was discontinued, and the 11 stimuliwere presented in random order a total of 20 timeseach . Subjects continued to respond by pressing oneof the two buttons for each trial, indicating therebythat the stimulus felt or heard was, or was not, likethe step 1 stimulus.

    Procedure for discrimination testingDiscrimination testing followed the same gen-

    eral procedures described for identification testing.Order of presentation of modality (auditory andtactual), continuum (vowel and consonant), and

  • 59

    EILERS et al .

    Effects of Vocoder Filter Configuration

    filter configuration (log, average, and linear) werecounter-balanced across subjects . The discriminationparadigm was under computer control . Pairs ofstimuli selected across the continua were presentedfor discrimination in a same-different paradigm.Half of the trials contained two identical stimuli,while half contained two different stimuli . Subjectspressed button 1 on the response box to indicatedifferent, and button 2 to indicate same. Based onprevious studies (8), step sizes were chosen tomaximize discrimination peaks. For tactual presen-tation, the step size was three (stimulus 1 versus 4, 2versus 5, etc .), while the auditory presentation wastwo (1 versus 3, 2 versus 4, etc .) . Members of thetest pair were separated by 500 ms and feedback wasnot given . Subjects received 40 trials per stimuluspair; 20 were the same and 20 were different.

    RESULTS

    Figures 3a and 3b show the labelling functionsfor each of the filter configurations for eachcontinuum. For purposes of statistical analysis, step11 of /sa/-/sta/ was compared to step 1 of /a/-/a/and so forth through the continuum, so that step 1of /sa/-/sta/ was compared with step 11 of /a/-/a/.Comparisons were done in this manner to allow forthe statistical analysis of function shapes.

    The labelling functions for both continua pro-vided no support for the suggestion that similaritiesof tactual and auditory processing were dependenton the log filter configuration used in the previouswork. In fact, all three filter configurations pro-duced labelling functions that approximated theauditory function.

    For statistical analysis, each subject's raw datawere fitted with a third order polynomial to smooththe identification functions . After deriving the re-gression equation, 25 percent, 50 percent, and 75percent values on the psychometric function (pointsat which 25, 50 and 75 percent of stimuli wereidentified as /sa/ or /a/) in steps were calculated foreach subject for each identification function . Thesevalues in steps from endpoints constituted thedependent variables for an analysis of variance withthree factors : Continuum (consonant and vowel),Filter Configuration/Modality (log, linear, average,and auditory), and Psychometric Value (25 percent,

    50 percent, and 75 percent points) . The analysisyielded the expected main effect for PsychometricValue [F(2,6) = 592, p< 0 .001], an effect that merelyreflects slope of the functions . More important wasthe lack of a significant main effect of FilterConfiguration/Modality, a manifestation of thesimilarity of the psychometric function values in theauditory and tactual conditions . Examination of thefigures confirms the consistency of labelling of thecontinua in all three tactual conditions . The maxi-mum difference between the two most divergentidentification functions (auditory or tactual), at anypoint on the function, did not exceed one step.

    The similarity in labelling functions acrossaudition, and the three tactual filter configurationscannot be accounted for solely by stimulus-indepen-dent response strategies (e .g., subjects might attemptto parse each continuum as close to the endpoint aspossible), since the two continua have differentlabelling function shapes . Instead, subjects musthave responded to the acoustic and tactual proper-ties of the speech signal in the labelling task basedon information available from each filter configura-tion . Furthermore, other data (23) show that sub-jects' labelling boundaries were quite consistentacross a number of speech continua, a pattern thatcannot be accounted for by stimulus-independentstrategies.

    Two significant two-way interactions with Psy-chometric Value were obtained . The interaction ofContinuum and Psychometric Value [F(2,6) =11 .15,p< 0 .001] reflects the fact that the continua weretreated similarly at the 75 percent point on thepsychometric function, but diverged at the 50percent and 25 percent points . In other words, thelabelling functions for the two continua had differ-ent shapes . The interaction of Filter Configuration/Modality and Psychometric Value [F(6,18) = 3 .25,p < 0 .02] reflected the convergence of configurations/modalities at the 75 percent value of the functionsand divergence at the 50 percent and 25 percentvalues . These differences in the shape of thefunctions may reflect the somewhat steeper slope ofthe auditory function.

    While Filter Configuration/Modality and Con-tinuum interacted with Psychometric Value, therewas no significant three-way interaction of FilterConfiguration/Modality by Continuum by Psycho-metric Value [F(6,18) =1 .70, p< 0 .18] . The lack of athree-way interaction indicates that the consonantal

  • 60

    Journal of Rehabilitation Research and Development Vol . 26 No . 4 Fall 1989

    LABELLING TH E /stca/-/so/ CONTINUUM

    0 LOG FILTER

    • AVERAGE FILTER

    +LINEHR FILTER

    1

    2

    3

    4

    5

    6

    7

    8

    STEP NUMBER

    Figure 3a.

    Labelling function for the /so/4sta/continuum.

    A

    10

    11

    20

    100

    AO

    80

    LABELLING THE /o/-// CONTINUUM

    Figure 3b.Labelling function for the /a/4o/continuum.

    R13I TORY

    o LOG FILTER

    ~ AVERIGE FILTER

    + LINEAR FILTER

    0

    7

    8

    9

    10

    STEP NUME3ER

  • 61

    eoEmset al .

    Effects of Vocoder Filter Configuration

    DISCRIMINATION ACROSS T!OO

    80

    2O

    0

    HE /sto/-/sca/ CONTINUUM

    PLO I TORY0L[G FILTER- AVERAGE FILTER+LINEFR FILTER*!torU fpr-o log forn average fpr

    linear fpr

    - 0

    2

    3

    4

    5

    6

    7

    8

    g

    1O

    STIMULUS STEP

    Figure 4a.Pair-wise discrimination across the/su/-/ua/coobuuum.

    DISCRIMINATION ACROSS THE /n/-/a/ CONTINUUM

    '

    A Q

    QO

    8O

    Q

    RUOlTORY°LOG FILTER^ AVERAGE FILTER+ LINEAR FILTER*auditory fpro log forIn average fpr

    ```

    \``

    +---+^~~ \`

    .'~_~_ .

    ~ ..\'

    "`

    b~

    '

    '' ~

    5

    6

    7

    8

    9

    STIMULUS STEP!O

    !O

    4!

    2

    3Figure 4b.Pair-wise discrimination across the/a/-/o/ continuum .

  • 62

    Journal of Rehabilitation Research and Development Vol . 26 No. 4 Fall 1989

    and vowel continua did not have significantlydifferent shapes as a function of configuration type.

    Discrimination functions for the two continuaare shown in Figures 4a and 4b . It is apparent fromFigure 4a that the /sa/-/sta/ continuum yielded anauditory peak in the discrimination function nearthe phonemic auditory category boundary, the re-gion in which subjects split the continuum by usingdifferent labels to define adjacent stimuli . Theboundary was defined as the value in steps at anordinate value of 50 percent . Within-category dis-crimination was, in general, poor . The labellingfunctions for each of the three tactual filter configu-rations showed tactual /sa/ falling steeply betweensteps 10 and 6, while the discrimination peak for allconfigurations was located between steps 9 and 6 . Inboth modalities for all filter conditions, discrimina-tion peaks occurred in the region of categorylabelling boundaries and discrimination within cate-gories was relatively poor . This pattern of resultswas consistent both tactually and auditorially withthe results of Eilers et al. (8) and fit a model ofcategorical perception for the consonantal contin-uum. In contrast to the consonant functions, dis-criminative performance on the vowel continuum(Figure 41)) was well above chance at each stimulusinterval for both modalities and all tactual filterconfigurations . A region of relative discriminativesensitivity coincided with the steeply falling regionof the labelling function . The middle of the contin-uum showed the poorest discriminability in bothmodalities . The high discriminability, both audi-torially and tactually, on the right-hand side of thevowel continuum is indicative of the presence ofanother phonemic (or potentially phonemic) bound-ary between the step 11 vowel [a], as in the word"just" pronounced with low stress and a fairly highcentral vowel in the sentence, "He's just a man,"and the intermediate step mid-central vowel [A], asin, "He's a just man ." A comparison of Figures 4aand 4b indicates the stronger tendency towardcategorical perception of consonants than of vowelsfor both the auditory and tactual modalities irre-spective of the filter configuration considered.

    To assess potential differences among the threefilter configurations, two separate analyses of vari-ance were conducted, one for each continuum.These analyses had two factors, Filter Configuration(log, linear, and average) and Pairs (eight) . Theseanalyses yielded significant effects only for Pairs

    [F(7,21) = 13 .30, p < 0.0001 for /sa/-/sta/;F(7,21) = 7 .10, p < 0 .0002 for /a/-/o/] . The dif-ferences between filter configurations were notsignificant. It was not possible to assess the statisti-cal significance of differences between the auditoryand tactual modality, since the number of stepsbetween contrasting stimuli differed across modal-ity. Nevertheless, it is clear that tactual discrimina-tion functions differed somewhat from auditorydiscrimination functions of the same stimuli.

    DISCUSSION

    Both labelling and discrimination functions fortwo continua, one consonantal and one vocalic,were similar in two sensory modalities, tactual andauditory . Furthermore, the similarities between themodalities did not seem to depend upon the filterconfiguration used in the tactual presentation sys-tem, since three different configurations all showedthe similarity with audition . While the log andaverage filter configurations simulate cochlear func-tion to a greater extent than linear arrays, evidencewas not found to suggest that a linear filterconfiguration limits or distorts perception of theconsonantal and vocalic continua studied here . Itappears that as long as the filter configuration doesnot eliminate critical, spatial, or temporal cues(analogous to spectral and temporal cues in audi-tion), there is considerable flexibility in the tactilesystem for perception of speech-like events repre-sented by a broad array of potential patterns.

    The labelling and discrimination functions weresimilar but not identical for the tactual and auditorymodalities . In particular, they differed in the realmof discriminatory precision or sensitivity . Auditorydiscrimination was superior, yielding higher hitrates, lower false positive rates, and discriminationof smaller step sizes . These differences were observ-able in the height of the /sa/-/sta/ discriminationpeak, and in the poorer overall tactual discrimina-tion of the vowel continuum.

    Factors responsible for better auditory thantactual perception may include superior auditorysensitivity, greater auditory experience in perceptionof speech sounds, and limitations in the designs oftactual vocoders in current use . While importantdifferences between filter configurations have notbeen apparent with the stimuli studied in the present

  • 63

    EIIERS et al .

    Effects of Vocoder Filter Configuration

    REFERENCESexperiments, it is possible that alternative filterdesigns may yield improved performance in codingsome speech information . In particular, fundamen-tal frequency has been shown to be difficult totransmit by other than log configurations (23) . Inaddition, improved tactual performance may bedependent on developing more adequate codingschemes for intensity . At the present time, it is notpossible to evaluate independently the contributionsof auditory experience, auditory sensitivity, andadequacy of current vocoder design.

    In summary, the experiments showed thatspeech perception through the skin is similar inimportant ways to perception through audition, andthat the similarities occur with various spectralcoding schemes employed in tactual vocoders . In thepresent experiments, this was true for two speechcontinua, one spectrally varying, and one temporallyvarying. This research, however, should not beconstrued to suggest that modifications of spectralcharacteristics of vocoders is unimportant ; butrather, that additional research with additionalspeech contrasts is warranted . For example, it maybe that log or average filter configurations ormodifications of these configurations are necessaryfor successful formant tracking or fundamentalfrequency perception . These results do suggest that avariety of filter configurations may be useful andthat tactual perception is relatively robust through aspectrally-oriented display . Regardless of filter con-figuration, labelling functions reflected phono-logically appropriate parsing of two speech continuaand appropriate peaks and valleys in discriminationfunctions . These data suggest that perception inboth modalities is at least partially dependent onpattern recognition, though the details of the pat-terns can vary somewhat . As long as acousticinformation can be adequately coded, it appearsthat it can be interpreted in similar ways by bothauditory and tactual systems.

    ACKNOWLEDGMENTS

    This research was partially supported by a subcontractfrom Artificial Hearing Systems Corp . through SBIRgrant #R43NS21390-01AI . We wish to thank Rita andJerome Cohen and Austin and Marta Weeks, withoutwhose support this research laboratory would not bepossible .

    1. Brooks PL, Frost BJ : Evaluation of a tactile vocoder forword recognition . J Acoust Soc Am 74 :34-39, 1983.

    2. Brooks PL, Frost BJ, Mason JL, Gibson DM : Continuingevaluation of the Queen's University tactile vocoder. I:Identification of open set words . J Rehabil Res Dev23(1) :119-128, 1986.

    3. Brooks PL, Frost BJ, Mason JL, Gibson DM : Continuingevaluation of the Queen's University tactile vocoder . II:Identification of open set sentences and tracking narra-tive . J Rehabil Res Dev 23(1) :129-138, 1986.

    4. Bull D, Filers RE, Oiler D, Mandalia B : The effect offrequency on discrimination of pulse bursts in an electro-cutaneous tactual vocoder . J Acoust Soc Am 77 :1192-

    1198,1985.5 Buchsbaum G: The possible role to the cochlear frequen-

    cy-positionmap inauditory signal coding . J Acoust SocAm 77 :573-576, 1985.

    6. Craig JC : Vibrotactile letter recognition : The effect of amasking stimulus . Percept Psychophys 20 :317-326, 1976.

    7. Craig JC : Vibrotactile masking : A comparison of energyand pattern maskers . Percept Psychophys 31 :523-529,1982.

    8. Eilers RE, Ozdamar 0, Oiler DK, Miskiel E, Urbano R:Similarities between tactual and auditory speech percep-tion . J Speech Hear Res 31 :124-131, 1988.

    9. Filers RE, Widen J, Oiler DK : Assessment techniques toevaluate tactual aids for hearing-impaired subjects . JRehabil Res Dev 25(2) :33-46, 1988.

    10. Eldredge 1), Miller J, Bohne BA : A frequency-positionmap for the chinchilla cochlea . J Acoust Soc Am69 :1091-1095, 1981.

    11. Engelmann S, Rosov RJ : Tactual hearing experiment withdeaf and hearing subjects . Except Child 41 :243-253, 1975.

    12. Gilson RD : Vibrotactile masking : Effects of multiplemaskers . Percept Psychophys 5 :181-182, 1969.

    13. Gilson RI) : Vibrotactile masking : Some spatial andtemporal aspects . Percept Psychophys 5 :176-180, 1969.

    14. Greenwood D : Critical bandwidth and the frequencycoordinates of the basilar membrane . J Acoust Soc Am33 :1344-1356, 1961.

    15. Klatt D : Software for a cascade/parallel formant synthe-sizer . J Acoust Soc Am 67 :971-995, 1980.

    16. Levitt

    Transformed up-down methods in psycho-acoustics . J Acoust Soc Am 49:467-477, 1971.

    17. Liberman M: The cochlear frequency map for the cat.Labelling auditory-nerve fibers of known characteristicfrequency . J Acoust Soc Am 72 :1441-1449, 1982.

    18. Lynch MP, Eilers RE, Oiler DK, LaVoie L : Speechperception by congenitally deaf subjects using an electro-cutaneous vocoder . J Rehabil Res Dev 25(3) :41-50, 1988.

    19. Lynch MP, Eilers RE, Oiler DK, Cobo-Lewis A:Multisensory speech perception by profoundly hearing-impaired children . J Speech Hear Res 54 :57-67, 1989.

    20. Oiler DK, Eilers RE, Vergara K, LaVoie E : Tactualvocoders in a multisensory program training speechproduction and reception . Volta Rev 88 :21-36, 1986.

    21. Oiler DK, Filers RE : Tactual artificial hearing for the

  • 64

    Journal of Rehabilitation Research and Development Vol . 26 No. 4 Fall 1989

    26.

    27.

    28.

    deaf . In Hearing Impairment in Children, F. Bess (Ed .).Parkton, MD: York, 1988.

    22. Ozdamar O, Filers RE, 011er DK : Tactile vocoders for thedeaf . IEEE Eng Med Biol Mag 6(3) :37-42, 1987.

    23. Ozdamar O, 011er DK, Miskiel E, Filers RE : Computersystem for quantitative evaluation of an electrocutaneoustactile vocoder . Comp Biomed Res 21 :85-100, 1988.

    24. Rosner BS : Neural factors limiting cutaneous spatio-temporal discriminations . In Sensory Communication, W.A . Rosenblith (Ed .) . New York : John Wiley & Sons,1961.

    25. Saunders F: Wearable multichannel electrotactile sensory

    aids . Presented at the Second Tactual CommunicationsConference, Wichita, KS, 1985.Schmid E : Temporal aspects of cutaneous interaction withtwo-point electrical stimulation . J Exp Psycho/ 61 :400-409, 1961.Weisenberger JM, Miller JD : The role of tactile aids inproviding information about acoustic stimuli . J AcoustSoc Am 82 :906-916, 1987.Weisenberger JM, Potts SM, Saunders NA : Evaluation oftwo multichannel tactile aids for the deaf : Phoneme andsyllable identification . J Acoust Soc Am (in press) .

    The effects of vocoder filter configuration ontactual perception of speechRebecca E. Eilers, PhD ; Ozcan Ozdamar, PhD; D. Kimbrough Oiler, PhD ; Edward Miskiei, MS;Richard Urbano, PhD

    INTRODUCTIONMETHODRESULTSDISCUSSIONACKNOWLEDGMENTSREFERENCES