Age-Related Changes in Monosyllabic Word Recognition Performance When Audibility … · 2019. 12....



  • J Am Acad Audiol 8 : 150-162 (1997)

    Age-Related Changes in Monosyllabic Word Recognition Performance When Audibility Is Held Constant

    Gerald A. Studebaker*, Robert L. Sherbecoe*, D. Michael McDaniel†, Ginger A. Gray*

    Abstract

    Monosyllabic word recognition was studied in 140 subjects between the ages of 20 and 90 years. The subjects were tested under a condition of fixed audibility that was achieved by presenting bandpass-filtered Northwestern University Auditory Test No. 6 (NU-6) word lists at a constant signal-to-noise ratio and limiting threshold losses at the speech frequencies to 25 dB HL. The results indicated the following: (1) Performance did not vary appreciably with age, except among subjects over 70 years. Subjects from 70 to 80 years produced modestly reduced scores (significantly below only the 30-year-old group). Those over 80 years produced significantly lower scores (than all other groups). (2) There were no significant differences in learning or test-retest reliability associated with age. (3) The performance of the oldest subjects could not be explained by differences in speech audibility. Based on these results, a strategy is proposed for correcting predicted word recognition scores for the effects of age.

    Key Words: Age, articulation index (AI), audibility, word recognition

    Abbreviations: A/D = Analog-to-digital, AI = articulation index, D/A = digital-to-analog, HLD = hearing loss desensitization, Leq = equivalent level, p = probability, P = proficiency factor, PTA = pure-tone average, r2 = coefficient of determination, rau = rationalized arcsine unit

    Audiologists have long been aware that the elderly have more difficulty understanding speech than young listeners. Despite numerous studies of this problem, however, they have yet to determine the exact relationship between speech recognition performance and chronological age. Some investigators have reported that performance deteriorates as early as the 40s and grows steadily worse with age, while others have claimed that it stays constant until the 60s or later and then declines by only a small amount (Bergman, 1980; Willot, 1991).

    *School of Audiology and Speech-Language Pathology, The University of Memphis, Memphis, Tennessee; †Program in Communication Disorders, Arkansas State University, Arkansas

    Reprint requests: Gerald A. Studebaker, Memphis Speech and Hearing Center, 807 Jefferson Avenue, Memphis, TN 38105

    The reasons why speech recognition decreases with age are currently a matter of dispute. At the same time, two facts seem reasonably certain: (1) the elderly have more peripheral hearing loss than younger listeners and, therefore, speech may be less audible for them (Gates et al, 1990; Humes and Roberts, 1990; Humes et al, 1994); and (2) their ability to understand speech also may be reduced by sensory distortion effects that are not due simply to audibility loss (Dubno et al, 1984; Jerger et al, 1989; Humes and Christopherson, 1991; Schum et al, 1991; Hargus and Gordon-Salant, 1995).

    In clinical settings, it is often necessary to decide if a particular listener's score on a speech test is better or worse than that of an average person with the same amount of hearing loss. One way to do this is to compare the listener's test score to a predicted score based on the Articulation Index (AI). This allows the effects of audibility loss to be separated from the effects of other factors that may be of influence or interest (Studebaker et al, 1995). For the comparisons to be meaningful, however, they may need to include an age correction to compensate for the fact that the AI does not predict the performance of older listeners as well as it predicts the performance of young listeners (Dubno et al, 1984; Hargus and Gordon-Salant, 1995). Currently, there are no corrections of this type available.

  • Age-Related Changes in Monosyllabic Word Recognition/Studebaker et al

    The purpose of this study was to evaluate the speech recognition performance of different age groups as a prelude to establishing an age correction for use with the AI. To minimize the effects of hearing loss on performance, all of the listeners were tested under a condition in which audibility was held constant. The test materials were Northwestern University Auditory Test No. 6 (NU-6) monosyllabic word lists. These lists were chosen because they are widely used to evaluate speech recognition performance (Martin et al, 1994). Also, they were available on commercial recordings and we had previously derived speech spectra and other AI functions for those recordings (Sherbecoe et al, 1993; Studebaker et al, 1993b).

    Four experimental questions were addressed: (1) Do monosyllabic word scores decrease significantly with age under conditions in which audibility is held constant? (2) Do learning effects on a word test depend on the age of the subject? (3) Does age influence test-retest reliability? and (4) What proportion of the variance in a standard 50-item word score is explained by age?

    METHOD

    Subjects

    Seven groups of adult subjects were recruited through newspaper ads and contacts with acquaintances, coworkers, and community organizations. Each group had different age limits, measured in decades, and was composed of 20 people. Thus, there were a total of 140 subjects. Half of the subjects in every group were recruited and tested at Arkansas State University (ASU). The other half were recruited and tested at the University of Memphis (UM).

    Each potential subject received an audiologic evaluation before being chosen for the experiment. The criteria for adding a subject were no obvious hearing problems, normal tympanometric results, and pure-tone thresholds in one ear of 25 dB HL or better at the octave frequencies from 250 to 2000 Hz (ANSI, 1989). Threshold levels were also determined for 3000 Hz and 4000 Hz but potential subjects were not excluded if they had a hearing loss at these frequencies. Frequencies outside the 250- to 4000-Hz range were not tested.

    Table 1 reports demographic information for each age group. Because it was hard to find subjects with good hearing for the older groups, we did not attempt to test equal numbers of men and women or equal numbers of right and left ears in each group. The consequence was that more women were tested than men (100 vs 40) and more right ears were tested than left ears (82 vs 58).

    Table 1 Demographic Information for Each Age Group and Test Site

                                          Frequency (Hz)
    Site  Group    M   F    Age  Education   250   500  1000  2000  3000  4000
    ASU   20s      5   5   22.0       14.0     1    -2    -2     3     2     3
          30s      6   5   32.3       15.3     3     5     4     4     4     6
          40s      0  10   46.0       17.0     7     7     7     7     6     7
          50s      3   7   53.7       16.5     8    10     7     8    10    12
          60s      1   9   65.0       12.2    13    13    12    15    15    19
          70s      2   8   72.7       15.9    13    12     9    12    17    21
          80s      3   7   83.1       12.9    17    17    20    22    28    39
    UM    20s      5   5   24.9       15.3     7     5     2     1     0     2
          30s      3   7   33.9       16.8     7     8     5     6     4     9
          40s      2   8   44.8       15.1     9     5     4     8     6     7
          50s      4   6   53.4       16.9     5     5     5     8     7     9
          60s      2   8   63.9       16.7    10    10     6     9    12    20
          70s      2   8   72.8       14.4    17    13    14    14    19    24
          80s      2   8   82.4       15.7    16    12    12    13    15    25

    The table reports numbers of male (M) and female (F) subjects and mean values for subject age (years), education level (years), and pure-tone threshold (dB HL) at 250 to 4000 Hz. The pure-tone threshold values have been rounded to the nearest decibel.

  • Journal of the American Academy of Audiology/Volume 8, Number 3, June 1997

    Stimuli

    The speech stimuli were bandpass-filtered recordings of the Auditec of St. Louis version of the NU-6 word test, spoken by a male talker. The recordings were prepared by playing the original unfiltered materials through two cascaded brickwall filters (Wavetek, 751A). A personal computer (Gateway, 386/25 MHz) with a 12-bit A/D converter (Data Translation, DT 2821) operating at 25,000 samples/sec was used to digitize the filtered stimuli. The re-recorded speech was then transferred to Bernoulli disk cartridges (Iomega, 44 Mbyte) for permanent storage.

    The bandwidth of the filter system was 220 to 2300 Hz. This range was chosen to encompass the 1/3-octave bands with center frequencies from 250 to 2000 Hz. This is also the frequency region where the subjects were required to have normal hearing thresholds. The rejection rate of each filter skirt was 230 dB/oct.
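    The choice of passband can be checked with a short calculation. The sketch below assumes that the bands in question are one-third-octave bands with base-2 edges at the center frequency times 2^(-1/6) and 2^(+1/6); the function name is illustrative and the band-edge convention is an assumption, not something stated in the text.

```python
def third_octave_edges(center_hz):
    """Return (lower, upper) edges of a one-third-octave band.

    Assumes base-2 band edges at center * 2**(-1/6) and center * 2**(+1/6).
    """
    half_width = 2 ** (1 / 6)
    return center_hz / half_width, center_hz * half_width


# Lower edge of the lowest band and upper edge of the highest band.
low_edge, _ = third_octave_edges(250)
_, high_edge = third_octave_edges(2000)

# Both edges should fall inside the reported 220- to 2300-Hz passband.
print(round(low_edge, 1), round(high_edge, 1))
```

    Under this assumption, both outer band edges lie within the 220- to 2300-Hz filter range.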

    To ensure that audibility within the passband would not be influenced by variations in hearing level, the speech was mixed with a masking noise before it was presented to the subjects. The long-term spectrum of this noise matched the 1 percent short-term peak spectrum of the talker's speech (±1.00 dB); thus, the noise produced virtually equal speech audibility across the frequency range of the filter.

    The masking noise was created by playing white noise from a random-noise generator (General Radio, 1382) through a system of filters and into a stereo tape recorder (JVC, DD-9), where it was recorded on cassette tape (Sony, Metal Master 90, Type IV). The filter system consisted of an analog filter (Wavetek, 852), a digital filter (Institute for Hearing Research, UDF-III), and another analog filter (Wavetek, 852). The passband was 141 to 4470 Hz, or about two-thirds of an octave wider than the speech band. The filter skirts were approximately 150 dB/oct in the low frequencies and in excess of 3000 dB/oct in the high frequencies.

    Equipment

    The digitized speech was reconverted to analog form using a 12-bit D/A converter (Data Translation, DT-2751) and the computer mentioned earlier, reduced in level using an attenuator (Hewlett Packard, 350D), and played into a two-channel clinical audiometer (Madsen, OB-822). The taped masking noise was also played into the audiometer, directly from the JVC recorder, and mixed with the speech. The combined signal was then amplified and presented monaurally to the subject through an insert earphone (Etymotic Research, ER-1). The same recorded speech materials, noise tapes, cassette player, and earphone were used at both ASU and the UM so that the frequency characteristics of the equipment would not differ appreciably by test site.

    Calibration of the equipment was performed at the beginning of the experiment and also prior to each test session. At the outset of the experiment, the digitized NU-6 test words, joined in a concatenated string without intervening silent periods, and the speech noise recorded on audiotape were played through the audiometer and into a Zwislocki coupler (Knowles Electronics, DB-100). A sound level meter (Larson Davis, 800-B) set for C scale, integrate mode, and a 3-dB exchange rate was used to measure the SPL in the coupler. Signal levels (2-minute Leq) were read directly from the meter's digital display. Subsequently, 1000-Hz pure tones adjusted to the same rms levels as the speech and noise stimuli were used as calibration signals. The tones were measured with the sound level meter set for slow, linear weighting, 1/3-octave mode.

    The audiometer was adjusted so that the speech had a coupler level of 70.7 dB SPL and the noise had a coupler level of 68.2 dB SPL. Thus, the signal-to-noise ratio was 2.5 dB. These levels were chosen for the following reasons. The noise level was selected to provide a spectrum that would effectively mask a mild hearing loss without being too loud to listen to for a long time. The speech level was selected, based on AI calculations, to provide an average word recognition score near 50 percent correct when the noise level was at the desired intensity. Pilot tests on several young normal-hearing subjects, who did not participate in the actual experiment, confirmed the accuracy of the AI predictions.

    Procedures

    Each subject received eight 50-item word tests during each of two 1-hour sessions. To minimize order effects, the four available lists were rotated so that each list was presented an equal number of times to each subject. The average length of time between the two test sessions was 5 days.


    Computer software developed at the UM (Matesich, 1991) was used to select and play the word lists in the desired order. The subject sat in a double-walled sound-treated room and used a hand switch to control the presentation of the test items. A video monitor facing the subject displayed the current item number.

    In most cases, the subjects wrote their answers on printed forms. However, a few 80-year-old subjects who had problems with manual dexterity were allowed to respond orally and their responses were transcribed. At the end of the experiment, all of the subjects were paid for their participation.

    Data Analysis

    Each subject's answers were scored by two examiners. The second examiner rechecked the accuracy of the work performed by the first examiner and corrected any scoring mistakes. Scores for all of the subjects were then transformed into rationalized arcsine units (rau) to equalize their variance (Studebaker, 1985) and submitted to statistical analysis.
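    For readers unfamiliar with the rau scale, the transform can be sketched as follows. The sketch uses the rationalized arcsine formula as it is commonly stated for a score of X items correct out of N; the function name is illustrative, and the code is not taken from the study.

```python
import math


def rau(correct, total):
    """Rationalized arcsine transform of `correct` items out of `total`."""
    # Arcsine-transformed score in radians (sum of two arcsine terms).
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    # Linear rescaling so that mid-range values resemble percent correct.
    return (146 / math.pi) * theta - 23


# Scores at the floor, middle, and ceiling of a 50-item list.
for n_correct in (0, 25, 50):
    print(n_correct, round(rau(n_correct, 50), 1))
```

    A half-correct 50-item list maps to 50 rau, while the ends of the scale extend somewhat below 0 and above 100, which is what stabilizes the variance near the extremes.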

    The main tests performed on the data were multi- and univariate analysis of variance (ANOVA) and regression analysis. Follow-up tests on significant main effects and interactions revealed by the ANOVAs were conducted using the Student-Newman-Keuls multiple comparison procedure. A 5 percent level of confidence was used for each statistical analysis performed.

    RESULTS

    Table 2 Mean Speech Recognition Scores for Each Age Group

    Group    20s    30s    40s    50s    60s    70s    80s
    ASU     53.7   57.2   57.9   54.5   55.8   49.1   41.9
    UM      55.0   55.8   52.5   55.3   51.7   53.0   52.3
    Mean    54.3   56.5   55.2   54.9   53.8   51.1   47.1

    All values are in rau.
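    The relationship between the rows of Table 2 can be checked with a few lines of arithmetic. The sketch below copies the ASU and UM means from the table and computes the unweighted two-site mean and the site difference for each age group; the variable names are illustrative.

```python
ages = ["20s", "30s", "40s", "50s", "60s", "70s", "80s"]
asu = [53.7, 57.2, 57.9, 54.5, 55.8, 49.1, 41.9]
um = [55.0, 55.8, 52.5, 55.3, 51.7, 53.0, 52.3]

# Unweighted mean of the two sites (each site tested half of every group).
site_means = {age: (a + u) / 2 for age, a, u in zip(ages, asu, um)}
# Site difference (UM minus ASU) for each age group, in rau.
site_gaps = {age: u - a for age, a, u in zip(ages, asu, um)}

for age in ages:
    print(age, round(site_means[age], 2), round(site_gaps[age], 1))
```

    The largest site difference falls in the oldest group, where the UM mean exceeds the ASU mean by 10.4 rau.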

    The complete speech recognition score data set was evaluated using a three-way ANOVA with repeated measures. Two between-subject factors, age and test site, and one within-subject factor, the number of lists presented (trial), were tested for significance. The results indicated that age (F = 7.35; df = 6,126; p < .0001), trial (F = 16.77; df = 15,1890; p < .0001), and the interaction between age and test site (F = 5.05; df = 6,126; p = .0001) were significant. Test site by itself and the remaining interactions were not significant.

    Age

    Table 2 reports the mean speech recognition scores for each age group. A post-hoc analysis of these means indicated that the subjects over age 80 differed significantly from those under 80 and that the subjects in their 70s differed significantly from those in their 30s. There were no significant differences between any of the groups under age 70. In each case where a significant difference occurred, the older subjects had the lower average scores.

    Post-hoc tests were also performed to determine the source of the interaction between age and test site. One set of tests looked at each site separately and compared the scores by age group while the other set looked at each age group separately and compared the scores by site. In both cases, the results indicated that the interaction in the data was due to the performance of the oldest group of subjects.

    When the data for each site were analyzed separately, it was found that the scores for the ASU subjects varied with age in the same way as the scores for the complete data set. In contrast, the scores for the UM subjects did not vary significantly with age.

    When the data were compared across site, only one significant difference was found. The ASU subjects over age 80 scored significantly lower than every group tested at the UM.

    Trial

    Figure 1 plots the mean word scores for each age group versus trial. From this figure, it is evident that the mean score increased as more word lists were presented. However, there was no significant relationship between age group or test site and the size of the increase that occurred. The difference in score between the first and last trial, averaged across both age and site, was 8.9 rau.

    To determine if the scores plotted in Figure 1 varied monotonically with trial, the data were averaged across age and site and an additional post-hoc analysis was performed. This analysis revealed that the mean score did not always change significantly from one trial to the next, or in a positive direction, although the general trend was for the score to increase with trial. In addition, no significant changes occurred within trials 1 to 4, trials 5 to 8, trials 9 to 12, or trials 13 to 16. This pattern reflects the fact that within each of these trial blocks, each of the 200 words in the test corpus was heard one time.

    Figure 1 Mean NU-6 word scores, in rau, as a function of the number of 50-item tests presented. Tests 1 to 8 were presented on the first day; tests 9 to 16 were presented on the second day. [Horizontal axis: number of word tests presented.]

    The fluctuations in the trial data made it difficult to determine if the same trends occurred both within and across the two test sessions. We dealt with this problem by averaging the scores for each subject across the nonsignificant trial blocks and performing an ANOVA on the averaged scores. This is comparable to assuming that the subjects each received four 200-item word lists, rather than sixteen 50-item lists.
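    The block averaging described above amounts to collapsing each subject's 16 trial scores into four block means. A minimal sketch, with a hypothetical helper name and made-up scores (not data from the study), is:

```python
def block_means(trial_scores, block_size=4):
    """Average consecutive groups of `block_size` trial scores."""
    if len(trial_scores) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    return [
        sum(trial_scores[i:i + block_size]) / block_size
        for i in range(0, len(trial_scores), block_size)
    ]


# Hypothetical rau scores for one subject across 16 trials (not study data).
example_scores = [48, 50, 51, 49, 53, 55, 54, 56, 52, 54, 55, 53, 57, 58, 56, 59]
print(block_means(example_scores))
```

    Each of the four values returned corresponds to one complete pass through the 200-word corpus, two passes per test session.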

    The new ANOVA still produced a significant trial effect (F = 19.99; df = 3,556; p < .0001). However, the post-hoc tests were now easier to interpret. They indicated that the second score on both test days was significantly higher than the first score but that the first score on day two was not significantly higher than the second score on day one. In other words, the trial main effect resulted primarily from a within-session change in score rather than a between-session change. The absence of a significant site-by-trial or age-by-trial interaction indicates that this was the only significant pattern that occurred.

    Test Site

    Test site by itself did not have a significant effect on score. However, as we mentioned previously, there was a significant interaction between site and age.

    [Figure 1 legend: separate symbols mark the overall mean and the seven age groups (20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80-89 years); the trial axis runs from 1 to 16.]

    A number of variables may have contributed to the dissimilar results obtained at ASU and the UM. Two factors that we checked were differences in the pure-tone thresholds and education levels of the subject groups examined at each site. In both cases, a two-way ANOVA was performed with test site and age group as the independent variables. The dependent variable was either education level, in years, or the pure-tone average (PTA), in dB HL.

    Two formulas for calculating PTA were tried. PTA1 included only the HLs for frequencies within the speech passband (250, 500, 1000, and 2000 Hz), while PTA2 used HLs for every frequency tested (250, 500, 1000, 2000, 3000, and 4000 Hz).
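    The two PTA definitions can be illustrated with the group mean thresholds from Table 1. The sketch below applies both formulas to the two oldest groups; the function and variable names are illustrative. Because Table 1 reports thresholds rounded to the nearest decibel, the results will not exactly match PTAs computed from unrounded individual thresholds.

```python
FREQS_HZ = [250, 500, 1000, 2000, 3000, 4000]

# Mean thresholds (dB HL) for the oldest groups, copied from Table 1 (rounded).
thresholds = {
    "ASU 80s": [17, 17, 20, 22, 28, 39],
    "UM 80s": [16, 12, 12, 13, 15, 25],
}


def pta(levels_db, freqs_hz, include_hz):
    """Average the thresholds at the frequencies listed in `include_hz`."""
    picked = [lvl for lvl, f in zip(levels_db, freqs_hz) if f in include_hz]
    return sum(picked) / len(picked)


for group, levels in thresholds.items():
    pta1 = pta(levels, FREQS_HZ, {250, 500, 1000, 2000})  # speech passband only
    pta2 = pta(levels, FREQS_HZ, set(FREQS_HZ))           # every tested frequency
    print(group, round(pta1, 1), round(pta2, 1))
```

    PTA2 is larger than PTA1 for both groups because it adds the 3000- and 4000-Hz thresholds, where the older subjects had the most hearing loss.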

    The results of the ANOVAs were similar to each other as well as to those obtained when the speech test score served as the dependent variable. There were no significant differences in education or PTA due to test site alone; however, both variables exhibited a significant site-by-age interaction (Education: F = 2.64, df = 6,116, p = .0195; PTA1: F = 3.28, df = 6,126, p = .0049; PTA2: F = 4.33, df = 6,126, p = .0005). Age alone also produced a significant difference in the PTA (PTA1: F = 23.68, df = 6,126, p < .0001; PTA2: F = 45.79, df = 6,126, p < .0001).

    Post-hoc tests were performed to analyze the site-by-age interaction for each variable and the age main effect for PTA. Only comparisons in which age group was held constant (education, PTA) or test site was held constant (education) were considered of interest. The remaining comparisons were ignored because they did not help to explain why subjects at only one test site had word scores that changed significantly as a function of age.

    The follow-up tests on education level revealed that the ASU subjects in their 60s had significantly less education than the ASU subjects in their 40s. This finding is consistent with previous data on the relationship between age and education (e.g., Deming and Cutler, 1983). However, it does not shed any further light on the speech test results because these two subject groups did not produce significantly different word scores.

    The follow-up tests on PTA revealed a number of significant differences between the various age groups, but only one finding was judged to be important: the ASU subjects in their 80s had significantly poorer hearing thresholds than the UM subjects in their 80s (PTA1 of 18.8 dB vs 13.2 dB; PTA2 of 23.6 dB vs 15.6 dB HL).


    Theoretically, these differences in hearing threshold should have been nullified by the filtering and noise characteristics of the test condition. However, it could not be determined from the ANOVA results alone whether this was, in fact, true. Further, the similar way in which the pure-tone thresholds and speech scores varied across test site and age suggested that it might not be true. Therefore, four additional analyses were performed.

    Hearing Loss Desensitization

    It is well known that subjects who have more than a mild hearing loss often perform more poorly than the AI predicts and that the disparity between their observed and predicted performance increases with the degree of hearing loss they exhibit (Pavlovic, 1984; Pavlovic et al, 1986). In earlier reports we have referred to this effect, which occurs even when listeners are tested in quiet, as hearing loss desensitization (HLD) (Studebaker et al, 1993a; Studebaker and Sherbecoe, 1994).

    To determine if performance differences between the two 80-year-old groups were due to this factor, we derived an equation (see Appendix) to predict the amount of HLD from the PTA (250 to 4000 Hz) and used it to adjust our word score predictions. The predictions for the two groups were then compared. The results suggested that the subjects did not experience enough HLD to produce the differences in score that were observed. For example, it was noted that the predicted score of the subject with the worst hearing loss was reduced by only about 1 rau when this correction was included.

    Insufficient Masking

    Another way that the results of this study might have been caused by hearing loss rather than age would be if the thresholds of the older subjects exceeded the effective levels of the noise within the speech passband or within the skirts of the passband. If either of these events had occurred, then speech audibility would have been reduced by absolute threshold and the subjects would have performed more poorly than expected. The high-frequency skirt was of particular concern in this regard because frequencies near 2000 Hz are very important for understanding NU-6 words (Studebaker et al, 1993b).

    To evaluate this issue, we determined both the average and the maximum hearing thresholds for each age group and then compared those values to the effective levels of the masker. The results indicated that the average difference between the noise effective levels and absolute threshold levels was 16 dB and that the noise effective level within the passband was always at least 8 dB above the highest absolute threshold allowed. In the case of the high-frequency filter skirt, we found that even the highest absolute thresholds in the 2000- to 3000-Hz range were at least 4 to 6 dB below the effective level of the noise. It seems highly unlikely, therefore, that the subjects performed as they did because of inadequate direct masking.

    Spread of Masking

    A third factor that varies with threshold level is spread of masking. Presumably, this factor affects speech recognition in the same way as direct masking, by reducing audibility.

    Using the methods of Ludvigsen (1985), and the average thresholds in Table 1, we estimated the amount of masking spread for the ASU and UM 80-year-old groups. We then predicted and compared the scores of the two groups with this masking factor included in the calculations. The results indicated a difference in score between the groups of less than 1 rau. This suggests that the lower performance of the ASU subjects was not due to greater spread of masking.

    External/Internal Noise Summation

    AI calculation schemes often assume that absolute threshold is produced by an internal noise within the auditory system and that this noise summates on a power basis with any external masking noises that happen to be present. An important consequence of this assumption is that the audibility of speech in noise will depend on the difference between the external noise level and the listener's internal noise level. The closer the two noise levels become, the more the combined noise level will rise and speech audibility will decrease.
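    Power-basis summation of two noise levels can be illustrated with a short calculation. The sketch below converts each level from decibels to relative power, adds the powers, and converts back; the 60-dB external level and the function name are illustrative choices, not values or code from the study.

```python
import math


def power_sum_db(external_db, internal_db):
    """Combine two noise levels (in dB) on a power basis."""
    total_power = 10 ** (external_db / 10) + 10 ** (internal_db / 10)
    return 10 * math.log10(total_power)


# Rise in combined level over the external noise alone, for several gaps
# between the external masker and a hypothetical internal noise.
external = 60.0  # illustrative level, not a value from the study
for gap_db in (0, 5, 10, 20):
    internal = external - gap_db
    print(gap_db, round(power_sum_db(external, internal) - external, 2))
```

    When the two noises are equal in level, the combined level is about 3 dB higher than either one alone; once the internal noise is 10 dB or more below the external noise, the increase is under half a decibel.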

    As we mentioned previously, the ASU subjects over age 80 had higher pure-tone thresholds than the UM subjects over age 80. Conceivably, therefore, the ASU subjects obtained lower word scores because of the reduction in speech audibility produced by noise summation.

    To see if this was, in fact, the case, we used the average audiograms of the 80-year-old subjects to estimate the average combined noise levels of the ASU and UM groups and then to predict their average test scores. The results indicated that power summation had only a very small effect on the score difference between the two groups (0.5 rau). Even when we assumed that all of the subjects in one group had thresholds of 25 dB HL at every frequency and all of the subjects in the other group had thresholds of 0 dB HL at every frequency, the effect was still less than 2 rau.

    It might be concluded from this analysis that differences in audibility due to noise summation were not responsible for the speech score differences between the groups. However, Humes et al (1988) have argued that simple power addition substantially underestimates the actual effects of external/internal noise summation on threshold. As an alternative, they proposed a "nonlinear" summation method. They showed that this method predicted results that were more in line with results found in the literature.

    To evaluate nonlinear summation as a possible mechanism underlying our results, we predicted the performance of the ASU and UM groups under the assumption that the external and internal noises combine as proposed by Humes et al (1988). In this case, the calculations predicted a score difference between the two 80-year-old groups of 8.7 rau. The actual performance difference was 10.4 rau. This is not a bad match and, by itself, could be interpreted as evidence that our results were due to hearing threshold differences operating via nonlinear summation. However, an analysis of the relationships between the observed scores and the scores predicted by the AI with nonlinear summation included produced two anomalous results that do not support this conclusion.

    One odd result was that the predicted scores decreased faster, as a function of threshold loss and age, than was actually observed. Therefore, the difference between the observed and predicted scores became increasingly positive as threshold level and age increased. This result, if taken at face value, suggests that the more hearing loss a person has, or the older they are, the better they are able to make use of what is audible to them. We are not aware of any independent evidence that supports this possibility.

    The other odd result was that when we subtracted the predicted scores for each subject from their observed scores and then performed an ANOVA on the differences, there was still a significant interaction between site and age (F = 2.70, df = 6,126, p = .0170). In this case, the interaction occurred because the ASU 60 year olds performed significantly better, relative to predictions, than the ASU 20 year olds, even though the 20 year olds had better hearing thresholds. No other group differences were significant at either site or between sites.

    Intersubject Variability

    The intersubject variability of the data was assessed using two methods. In one method, the test scores were reorganized by trial and a variance calculated for each trial. This was done for each age group. Then, each group's variances were averaged across trial and converted into a group standard deviation by taking the square root of the average. In the other method, differences between the observed scores and the predicted scores, based on the AI calculations, were plotted as a function of subject age and evaluated using regression analysis.
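The first method can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the nested-list layout `scores[subject][trial]`, the function name `intersubject_sd`, the choice of the sample (n - 1) variance, and the example values are all hypothetical.

```python
from math import sqrt
from statistics import variance


def intersubject_sd(scores):
    """Group SD from rau scores laid out as scores[subject][trial].

    Assumption: the sample (n - 1) variance is used; the paper does not
    say which variance estimator was applied.
    """
    n_trials = len(scores[0])
    # Variance across subjects, computed separately for each trial.
    per_trial = [variance([subj[t] for subj in scores]) for t in range(n_trials)]
    # Average the per-trial variances, then take the square root.
    return sqrt(sum(per_trial) / n_trials)


# Hypothetical scores for three subjects on three trials (not study data).
group = [
    [70.0, 72.0, 75.0],
    [60.0, 66.0, 65.0],
    [80.0, 78.0, 85.0],
]
print(round(intersubject_sd(group), 2))
```

Averaging variances rather than standard deviations keeps the pooled value consistent with the usual rule that variances, not SDs, are additive.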

    The variance calculations were performed to determine the relationship between age and the dispersion of scores on a 50-item word test. It was assumed in these calculations that the scores obtained on each trial were independent and that intersubject variability remained constant over time. Although neither of these assumptions may actually be true, the absence of interactions between trial and the other factors suggests that the scores for each age group varied over time in a comparable manner. In any case, the variability of rau data is not significantly influenced by changes in the mean score unless the mean is close to the limits of the rau scale (Studebaker et al, 1995). That did not happen in this study.
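The statement that rau variability is nearly independent of the mean score can be checked numerically. The sketch below uses the rationalized arcsine transform as published by Studebaker (1985) and computes the exact SD of rau scores when the number of words correct on a 50-item test follows a binomial distribution. The function names are hypothetical, and the calculation is not necessarily the computer model cited by the authors.

```python
from math import asin, comb, pi, sqrt


def rau(correct, n_items):
    """Rationalized arcsine unit for `correct` items out of `n_items`."""
    theta = asin(sqrt(correct / (n_items + 1))) + asin(sqrt((correct + 1) / (n_items + 1)))
    return (146.0 / pi) * theta - 23.0


def binomial_rau_sd(p, n_items=50):
    """SD of rau scores when the number correct is binomial(n_items, p)."""
    probs = [comb(n_items, k) * p**k * (1 - p) ** (n_items - k) for k in range(n_items + 1)]
    values = [rau(k, n_items) for k in range(n_items + 1)]
    mean = sum(w * v for w, v in zip(probs, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(probs, values))
    return sqrt(var)


# Compare the spread at several true proportions correct (illustrative values).
for p in (0.3, 0.5, 0.7):
    print(p, round(binomial_rau_sd(p), 2))
```

Under this model the SD should stay close to the same value across the middle of the score range, which is the property the paragraph above relies on.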

    The regression analysis was performed to determine how much of the variance in a 50-item word recognition score could be attributed to age and to generate descriptive functions for the data. Difference scores (in rau), rather than the original unmodified scores, were used to reduce any residual audibility effects and thus make the results more applicable to open-set monosyllabic word lists in general. The predictions on which the difference scores were based assumed that the "threshold noise" and the masker noise added together on a power basis. Corrections for HLD were not included because our earlier analysis had suggested that those corrections did not substantially alter the results.
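Adding two noises "on a power basis" means converting each level from decibels to power, summing, and converting back. A minimal sketch follows; the function name `power_sum_db` and the example levels are hypothetical and do not come from the study.

```python
from math import log10


def power_sum_db(level_a_db, level_b_db):
    """Combined level, in dB, of two uncorrelated noises added as powers."""
    total_power = 10.0 ** (level_a_db / 10.0) + 10.0 ** (level_b_db / 10.0)
    return 10.0 * log10(total_power)


# Two equal levels combine to about 3 dB above either one.
print(round(power_sum_db(40.0, 40.0), 2))
# A noise 20 dB below the other adds very little to the total.
print(round(power_sum_db(40.0, 20.0), 2))
```

This is the simple linear summation rule only. The nonlinear method of Humes et al (1988) discussed earlier is a different calculation and is not reproduced here.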

    Table 3 Inter- and Intrasubject Standard Deviations for Each Age Group

    Group    Intersubject SD    Intrasubject SD
    20s      7.7                7.2
    30s      7.3                7.1
    40s      8.4                7.0
    50s      8.1                6.5
    60s      8.6                6.8
    70s      8.7                7.6
    80s      11.4 [9.1]         7.4

    All values are in rau and assume a 50-item word test. The value in brackets was calculated after excluding scores from the three outlying subjects tested at ASU.

    To eliminate the effects of trial (learning) on the data, the individual difference scores were adjusted so that the average difference score for every trial was equal to 0. This was done by subtracting the average value for a trial from each of the individual scores that contributed to that trial.
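The trial adjustment described above amounts to centering each trial's difference scores on zero. A minimal sketch, assuming a hypothetical layout `diff_scores[subject][trial]` and a hypothetical function name:

```python
def center_by_trial(diff_scores):
    """Subtract each trial's mean difference score from every score on that trial.

    diff_scores[s][t] is the observed-minus-predicted rau score of
    subject s on trial t (hypothetical layout).
    """
    n_subjects = len(diff_scores)
    n_trials = len(diff_scores[0])
    trial_means = [
        sum(subj[t] for subj in diff_scores) / n_subjects for t in range(n_trials)
    ]
    return [[subj[t] - trial_means[t] for t in range(n_trials)] for subj in diff_scores]


# Hypothetical difference scores for three subjects on two trials.
raw = [[-4.0, 2.0], [1.0, 5.0], [6.0, 8.0]]
centered = center_by_trial(raw)
print(centered)
```

After centering, any average improvement from one trial to the next is removed, while each subject's position relative to the others within a trial is preserved.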

    Finally, even though the ANOVA for the full data set revealed a significant interaction between age and site, the data were not separated by test site in any of the calculations. This was done so that the results were based on the entire subject pool and were thus more representative of the population of normal hearers in general.

    Table 3 reports standard deviations for each age group. The values reveal that the subjects over age 40 produced a broader range of scores than the subjects under 40. Between ages 40 and 80, variability remained fairly constant, then it increased again.

    An examination of the individual data suggested that there was greater score dispersion among the subjects in their 80s because three subjects from the ASU group obtained low scores. These subjects had somewhat higher thresholds outside the speech passband than average. However, within the passband, their thresholds ranked 2nd, 11th, and 15th best out of 20. Thus, their poor performance did not seem to be related to their higher hearing thresholds.

    When the outliers were removed from the data pool, the subjects over age 80 were only slightly more variable than the other subjects. However, their mean word score was still significantly lower than the mean scores of the subjects under 60 years old. This suggests that the poorer speech test performance of the ASU subjects did not occur simply because of the inclusion of a few unusual subjects.

    Figure 2 plots the difference scores, in rau, as a function of subject age, in years. Data from all of the subjects, including the low-scoring outliers, are shown. The solid line through the points is the best-fitting second order polynomial. The dashed lines indicate the 95 percent confidence limits (1.96 SD) based on the difference scores for the subjects under age 40 years.

    Figure 2 Difference scores, corrected for practice, as a function of subject age in years. Each circle represents the difference between one observed score, based on a 50-item test, and one expected score, based on the AI. The solid line indicates the best-fitting second order polynomial. The dashed lines indicate the 95 percent confidence limits based on the difference scores for the subjects under age 40.

    Regression analysis indicated that the polynomial function fit the data as well as, or better than, a number of other simple equations (Fit Std Error = 8.51 rau). However, the results also indicated that the chosen equation accounted for only 8.5 percent of the variance. In other words, most of the scatter in the data was not associated with age.
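For readers who want to reproduce this kind of analysis, the sketch below fits a second order polynomial by ordinary least squares and reports the proportion of variance accounted for. It is an illustration only: the paper does not describe the fitting software, the function names are hypothetical, the example data are invented, and the r2 computed here is the unadjusted value.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def fit_quadratic(xs, ys):
    """Least-squares constants (a, b, c) for y = a + b*x + c*x**2."""
    s = [sum(x**k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]
    d = det3(normal)
    coefs = []
    for col in range(3):  # Cramer's rule, one column at a time
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][col] = t[i]
        coefs.append(det3(m) / d)
    return tuple(coefs)


def r_squared(xs, ys, coefs):
    """Unadjusted proportion of variance in ys accounted for by the fit."""
    a, b, c = coefs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x + c * x * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented ages and rau difference scores, for illustration only.
ages = [25, 35, 45, 55, 65, 75, 85]
diffs = [-1.0, 2.5, 1.0, 0.5, -2.0, -3.5, -9.0]
coefs = fit_quadratic(ages, diffs)
print(coefs, round(r_squared(ages, diffs, coefs), 3))
```

Cramer's rule is used only to keep the example free of external libraries; a numerical package would normally be preferred for larger problems.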

    Intrasubject Variability

    Estimates of intrasubject variability were also calculated for each age group using the original unmodified data. This was accomplished as follows. First, the variance of the 16 scores produced by each subject in a particular age group was determined. Then, the 20 variances for the group were averaged and converted into a group standard deviation by taking the square root. As in the analysis of intersubject variability, no attempt was made to separate the data by test site. In this case, however, no corrections were made for trial effects. The results of each calculation are reported in Table 3.

    The table values indicate that intrasubject variability did not change appreciably with age. When averaged across all seven age groups, the test-retest difference for a 50-item test was 7.1 rau. This is only slightly larger (by about 0.6 rau) than the value produced by a computer model based on the binomial theorem (Studebaker et al, 1995).

    Table 4 Equation and Fitting Constants for the Prediction of Listener Proficiency from Age in Years

    Equation             a                b                c
    y = a + bx + cx^2    8.7880972e-01    6.8361149e-03    -7.7860341e-05

    Results shown are applicable only to listeners between ages 20 and 90 years.

    Correcting for Age

    In a recent series of papers, Studebaker et al (1993a, 1994, 1995) discuss the proficiency factor (P), a concept originally proposed by Fletcher and Galt (1950), and how it can be used to adjust the AI for the effects of variables such as age. Given the potential utility of AI-based speech test score predictions, we thought it worthwhile to evaluate this method further using the data of this investigation. This was done as follows.

    First, each subject's average word score, in percent, was converted into an observed AI value, using the score-to-AI transfer function (Studebaker et al, 1993b). The observed AIs were then divided by their corresponding expected AIs, based on speech, noise, and hearing threshold measurements. The result was 140 P factors, one for each subject.

    Figure 3 Proficiency factors as a function of subject age in years. Each circle represents the P factor for one subject, based on the average of that subject's 16 NU-6 word scores. The solid line indicates the best-fitting second order polynomial.

    The P factors were plotted versus the ages of the subjects and submitted to regression analysis. As in the case of the rau difference scores, a second order polynomial was found to provide the best fit to the data (Fit Std Error = 0.0871) although, once again, the proportion of the variance related to age was fairly small (about 16%). The constants for the polynomial are reported in Table 4.

    Figure 3 displays P factors for all 140 subjects along with the line determined by regression analysis. The figure shows the decline in performance with advancing age indicated by the ANOVA. In addition, it also reveals that subjects in their 20s obtained lower P factors than subjects in their 30s and 40s. This outcome is consistent with the mean word recognition data reported in Table 1. However, as mentioned earlier, the ANOVA results indicated that the score differences between the subjects in their 20s and 30s were not significant. Therefore, the small rise in the curve may reflect only the effects of sampling error.

    Figure 4 Difference scores, corrected for practice and age, as a function of subject age in years. Each circle represents the difference between one observed score, based on a 50-item test, and one expected score, based on the AI. The solid line indicates the best-fitting second order polynomial. The dashed lines indicate the 95 percent confidence limits based on the difference scores for the subjects under age 40.

    The polynomial function in Table 4 was used to calculate a P factor adjustment for each subject based on the subject's age, in years. This correction was multiplied by the subject's expected AI. The corrected AI was then used to predict the subject's expected score, in rau, which was in turn subtracted from the subject's observed score, in rau. The rau difference scores for every subject, now corrected for the average effect of age, were subsequently re-evaluated using regression analysis.
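The age correction can be sketched as follows. The constants are those printed in Table 4; their assignment to a, b, and c follows the order in which they appear and should be treated as an assumption, because the table layout was damaged in this copy. The function names are hypothetical, and the AI-to-score transfer function needed for the final prediction step is not reproduced here.

```python
# Table 4 constants (assumed order: a, b, c) for y = a + b*x + c*x**2.
A = 8.7880972e-01
B = 6.8361149e-03
C = -7.7860341e-05


def proficiency_from_age(age_years):
    """Predicted P factor for a listener of the given age."""
    if not 20 <= age_years <= 90:
        raise ValueError("Table 4 constants apply only to ages 20 to 90 years")
    return A + B * age_years + C * age_years**2


def age_corrected_ai(expected_ai, age_years):
    """Expected AI multiplied by the age-based P factor adjustment."""
    return expected_ai * proficiency_from_age(age_years)


# Illustrative use: the same expected AI for a younger and an older listener.
for age in (25, 85):
    print(age, round(age_corrected_ai(0.40, age), 3))
```

The range check mirrors the note under Table 4 that the results apply only to listeners between 20 and 90 years of age.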

    Figure 4 shows the age-adjusted rau difference scores plotted versus subject age. As in Figure 2, the solid line indicates the best-fit second order polynomial while the dashed lines indicate the 95 percent limits based on the difference scores for the subjects under age 40. It is apparent from the virtually flat regression line that the P factor correction successfully removed the age effect from the data. This result was confirmed by the regression statistics, which indicated that none of the remaining variance (Fit Std Error = 8.51 rau, df adjusted r2 = .0000) was related to age.

    DISCUSSION

    The results of this study suggest that age has only a modest effect on the intelligibility of masked monosyllabic words for persons with normal or near-normal hearing. For the data combined across test sites, only the 80-year-old group differed significantly from all of the other groups. The scores of the 70 year olds were significantly reduced when compared to the scores of the 30 year olds at one test site but the reduction did not reach statistical significance when the data were combined across sites.

    Because of the similar way that the speech scores and pure-tone thresholds of the subjects changed with test site and age, it might be argued that the poor performance of the 80 year olds was due to hearing loss rather than age. However, an analysis of four different psychoacoustic parameters that are affected by hearing loss (HLD, masked threshold level, spread of masking, and linear external/internal noise summation) suggested that these mechanisms could not explain their low speech scores. The nonlinear external/internal noise summation of Humes et al (1988) produced predicted performances more nearly like the data, but left some troublesome details unexplained.

    Two other findings cast doubt on the conclusion that the pattern of our results was caused by the hearing threshold differences across the subject groups. First, it was noted that, among the subjects in the 80-year-old group, the worst test scores were not obtained by those with the poorest thresholds. In fact, as noted earlier, the pure-tone thresholds of the three poorest performing subjects on the speech test ranked 2nd, 11th, and 15th best out of the 20 subjects in this group. Second, it was noted while ranking all of the subjects by average threshold that the five ASU subjects with the best hearing and the five UM subjects with the worst hearing in the 80-year-old group had very similar average audiograms (Fig. 5). However, in spite of this, their average speech test scores were just as different as those of the full groups from which they were drawn.

    Interestingly, while the 80-year-old subjects as a whole had significantly lower speech recognition scores than the younger subjects, they did not perform significantly differently from them as a function of trial. All seven age groups produced higher scores during the second half of each test session and on day two than they did during the first half of each session and on day one. This suggests that learning effects on open-set monosyllabic word tests are not influenced by age, at least if the listener has the opportunity to control the rate of word presentation. The implication is that age can be ignored when the scores from such tests are corrected for learning.

    Figure 5 Mean thresholds, as a function of frequency (Hz), for the five 80-year-old ASU subjects with the best hearing and the five 80-year-old UM subjects with the worst hearing. The thresholds of the two groups were equal at 3000 and 4000 Hz.

    Comparison of the score standard deviations for the various groups revealed that intersubject variability was greater for subjects over age 40 than for those under 40. This indicates that older subjects are more likely to produce a broad range of scores than younger subjects even when tested under the same conditions of audibility. The reasons for this are unclear. However, it did not usually occur because of a disproportionate increase in the number of low-scoring subjects with increasing age. The exception was the oldest age group, which did include a disproportionate number of low-scoring subjects.

    In contrast, the intrasubject standard deviations indicate that age had no effect on test-retest reliability at either test site. In addition, the similarity between the results of this study and the distribution of scores predicted by the binomial model (Thornton and Raffin, 1978; Studebaker et al, 1995) offers support for using that model to describe the intrasubject variability of scores produced by elderly subjects as old as 89 years.

    Even assuming that the speech test score effects seen here are due to age, independent of hearing threshold, it is evident that they are not large. In view of this, it should be asked whether the effects are indeed large enough to justify consideration in a scheme designed to predict word recognition test performance. We suggest that the answer is yes, provided that computer-assisted test procedures are used that give the operator predicted scores, based on the audiogram, and a statistical statement concerning the difference between the score actually obtained and the one predicted. In this kind of environment, consideration of even small effects can be justified because (1) adjusted scores are still better indicators of what is expected than unadjusted scores even if the difference is small, (2) probability statements about a result can be substantially altered by even modest changes in predicted scores, and (3) the calculations and display of the information require no additional effort by the operator.

    Of course, an age correction may not always be appropriate for other reasons. One example is in the assessment and/or prediction of absolute unaided or aided auditory communication ability. An age correction is not called for in this case because it is less relevant how a particular listener performs in relation to a typical person of the same age than how he or she performs in relation to an absolute communication standard. This standard, presumably, should be the speech recognition performance of a group of young normal hearers listening under the same conditions of audibility.

    Age corrections may be needed, however, when speech test results are used for diagnostic or medico-legal purposes. As this study demonstrates, older listeners produce lower average speech test scores than younger listeners under comparable conditions. Why this occurs is not clear, but without age corrections, all reductions in the listener's speech recognition score are, in effect, attributed to sensory pathology. In the case of the oldest subjects, at least, this could lead to erroneous conclusions.

    Ultimately, it is up to the audiologist to decide whether a word score should be interpreted with or without age corrections. In certain cases, it may be advantageous to look at both results. Regardless of which approach is selected, however, the following procedures can be used to determine the listener's expected score and to analyze the relationship between that score and the listener's observed score. These procedures represent an extension of the methods described by Studebaker et al (1995).

    First, an AI is calculated in the usual way. Then, if age corrections are desired, the AI is multiplied by a correction factor derived with the polynomial function and constants reported in Table 4. Comparison of Figures 2 and 4 suggests that this approach will satisfactorily eliminate the average effects of age on a standard 50-item NU-6 word test. Presumably, the same approach will also work if the test contains fewer than 50 items. However, the reader should be aware that scores based on shortened word lists are more variable and thus cannot be predicted as well.

    The next step is to convert the AI, with or without corrections, into a predicted score using an AI-to-score transfer function. The predicted score, expressed in rau, is then subtracted from the listener's observed score, in rau, and the resulting difference divided by the standard deviation of the distribution of rau differences.¹ This produces a z score. With the aid of a table of area values for the normal distribution, the z score may be transformed into the likelihood that the listener performed abnormally, either in relation to young normal hearers, in the case of the unadjusted AI, or in relation to subjects of his or her own age, in the case of the adjusted AI.

    ¹The SD that should be used in the z score calculation depends on the application to which the results will be put and how they will be interpreted. In many cases, a value for the large group of hearing-impaired subjects whose losses occur gradually and are not due to illness would be appropriate. Our preliminary estimate of the SD for this group, based on unpublished data from 1103 clinical subjects of varying ages who were tested one time on a 50-item test, is about 13 rau. This is slightly less than twice the value for young-adult normal hearers.
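The z score step can be illustrated with a short sketch. The function names and the example scores are hypothetical; the SD of 13 rau is the preliminary clinical estimate mentioned in the footnote, and the normal-distribution area is computed with the error function instead of being read from a table.

```python
from math import erf, sqrt


def z_score(observed_rau, predicted_rau, sd_rau):
    """Standardized difference between observed and predicted rau scores."""
    return (observed_rau - predicted_rau) / sd_rau


def lower_tail_probability(z):
    """Area of the standard normal distribution below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Hypothetical listener: observed 40 rau, predicted 66 rau, SD of 13 rau.
z = z_score(observed_rau=40.0, predicted_rau=66.0, sd_rau=13.0)
print(z, round(lower_tail_probability(z), 4))
```

A small lower-tail area indicates that few listeners in the reference group would be expected to score this far below prediction. Which reference group applies depends on whether the adjusted or the unadjusted AI was used to generate the prediction.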

    We are currently testing a computer program that performs this sequence of calculations and then makes the results available to the audiologist on a computer monitor (see Studebaker et al, 1995). Data obtained from clinical trials of this program will be presented in a future publication.

    Acknowledgments. This project was supported by research grant number 5 R01 DC 00154-16 from the National Institute on Deafness and Other Communication Disorders, National Institutes of Health, and by funding from the Center for Research Initiatives and Strategies for the Communicatively Impaired (CRISCI). The authors thank Christine G. Eubanks and Edith McDaniel for their help with subject recruitment and data collection and Larry Humes for his comments on an earlier version of this manuscript.

    REFERENCES

    American National Standards Institute. (1989). Specifications for Audiometers (ANSI S3.6-1989). New York: ANSI.

    Bergman M. (1980). Aging and the Perception of Speech. Baltimore: University Park Press.

    Deming MB, Cutler NE. (1983). Demography of the aged. In: Woodruff DS, Birren JE, eds. Aging: Scientific Perspectives and Social Issues. Monterey: Brooks/Cole, 18-51.

    Dubno JR, Dirks DD, Morgan DE. (1984). Effects of age and mild hearing loss on speech recognition in noise. J Acoust Soc Am 76:87-96.

    Fletcher H, Galt RH. (1950). The perception of speech and its relation to telephony. J Acoust Soc Am 22:89-151.

    Gates GA, Cooper JC, Kannel WB, Miller NJ. (1990). Hearing in the elderly: the Framingham cohort, 1983-1985. Part I. Basic audiometric test results. Ear Hear 11:247-256.

    Hargus SE, Gordon-Salant S. (1995). Accuracy of speech intelligibility index predictions for noise-masked young listeners with normal hearing and for elderly listeners with hearing impairment. J Speech Hear Res 38:234-243.

    Humes LE, Christopherson L. (1991). Speech identification difficulties of hearing-impaired elderly persons: the contributions of auditory processing deficits. J Speech Hear Res 34:686-693.

    Humes LE, Espinoza-Varas B, Watson CS. (1988). Modeling sensorineural hearing loss. I. Model and retrospective evaluation. J Acoust Soc Am 83:188-202.

    Humes LE, Roberts L. (1990). Speech-recognition difficulties of the hearing-impaired elderly: the contributions of audibility. J Speech Hear Res 33:726-735.

    Humes LE, Watson BU, Christensen LA, Cokely CG, Halling DC, Lee L. (1994). Factors associated with individual differences in clinical measures of speech recognition among the elderly. J Speech Hear Res 37:465-473.

    Jerger J, Jerger S, Oliver T, Pirozzolo F. (1989). Speech understanding in the elderly. Ear Hear 10:79-89.

    Ludvigsen C. (1985). Relations among some psychoacoustic parameters in normal and cochlearly impaired listeners. J Acoust Soc Am 78:1271-1280.

    Martin FN, Woodrick Armstrong T, Champlin CA. (1994). A survey of audiological practices in the United States. Am J Audiol 3(2):20-26.

    Matesich JS. (1991). Monosyllabic Word Test Presentation Program. Unpublished computer program. Memphis: The University of Memphis.

    Pavlovic CV. (1984). Use of the articulation index for assessing residual auditory function in listeners with sensorineural hearing impairment. J Acoust Soc Am 75:1253-1258.

    Pavlovic CV, Studebaker GA, Sherbecoe RL. (1986). An articulation index based procedure for predicting the speech recognition performance of hearing-impaired individuals. J Acoust Soc Am 80:50-57.

    Schum DJ, Matthews LJ, Lee F. (1991). Actual and predicted word-recognition performance of elderly hearing-impaired listeners. J Speech Hear Res 34:636-642.

    Sherbecoe RL, Studebaker GA, Crawford MR. (1993). Speech spectra for six recorded monosyllabic word tests. Ear Hear 14:104-111.

    Studebaker GA. (1985). A "rationalized" arcsine transform. J Speech Hear Res 28:455-462.

    Studebaker GA, McDaniel DM, Sherbecoe RL. (1995). Evaluating relative speech recognition performance using the proficiency factor and rationalized arcsine differences. J Am Acad Audiol 6:173-182.

    Studebaker GA, McDaniel DM, Wark DJ, Sherbecoe RL. (1993a). Speech Recognition Proficiency as a Function of Age and Audibility Loss. Paper presented at the International Hearing Aid Conference II: Signal processing and efficacy, June 24-27, The University of Iowa, Iowa City, IA.

    Studebaker GA, Sherbecoe RL. (1994). Evaluating the Speech Recognition Performance of Hearing-Impaired Subjects. Paper presented at the 1994 Lake Arrowhead International Conference on Issues in Advanced Hearing Aid Research, May 30-June 4, Lake Arrowhead, CA.

    Studebaker GA, Sherbecoe RL, Gilmore C. (1993b). Frequency-importance and transfer functions for the Auditec of St. Louis recordings of the NU6 word test. J Speech Hear Res 36:799-807.

    Thornton AR, Raffin MJM. (1978). Speech discrimination scores modeled as a binomial variable. J Speech Hear Res 21:507-518.

    Willot JF. (1991). Aging and the Auditory System: Anatomy, Physiology, and Psychophysics. San Diego: Singular.

    APPENDIX

    To adjust the AI for HLD, we used a multiplier based on the P factor. Like the P factor, the HLD multiplier ranges from 0 to 1. Zero indicates maximum HLD; 1 indicates no HLD. The size of the HLD multiplier was predicted independently for each subject using equation A-1.

    $$y = 1 - \left(\frac{x}{a}\right)^{b} \tag{A-1}$$

    This equation should be interpreted as follows: x is the PTA, in dB HL, for the frequencies 250, 500, 1000, 2000, 3000, and 4000 Hz; a and b are fitting constants equal to 112.47672 and 3, respectively; and y is the predicted HLD multiplier.
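Equation A-1 is simple enough to evaluate directly. The sketch below uses the two fitting constants given above; the function name is hypothetical, and the clamping of the result to the range 0 to 1 is an assumption added here so that the output always respects the stated limits of the multiplier.

```python
A_CONST = 112.47672  # fitting constant a, in dB HL
B_CONST = 3          # fitting constant b (exponent)


def hld_multiplier(pta_db_hl):
    """HLD multiplier from equation A-1: y = 1 - (x / a) ** b.

    Assumption: values are clamped to [0, 1], the range the text gives
    for the multiplier (1 = no HLD, 0 = maximum HLD).
    """
    y = 1.0 - (pta_db_hl / A_CONST) ** B_CONST
    return max(0.0, min(1.0, y))


# Illustrative six-frequency pure-tone averages, in dB HL.
for pta in (0.0, 25.0, 60.0, 90.0):
    print(pta, round(hld_multiplier(pta), 3))
```

Because the exponent is 3, the multiplier stays close to 1 for mild losses and falls steeply only as the PTA approaches the constant a.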

    The fitting constants were based on an analysis of the relationship between the P factor and the PTA. The subjects evaluated in that analysis were 71 people (135 ears) who did not participate in the current study. Each of these individuals was under age 70 and had received a 50-item NU-6 word test in quiet as part of a routine audiologic evaluation conducted at the Memphis Speech and Hearing Center. Their average pure-tone losses (250 to 4000 Hz) ranged from 1 to 91 dB HL.