
Temporal and Spectral Contributions to Musical Instrument Identification Among Cochlear Implant Users

Timothy Stoddard, MD, MS, Tanner Fullmer, BS, Alison Crane, BS, Christina Runge, PhD, David Friedland, MD, PhD

Department of Otolaryngology and Communication Sciences, Medical College of Wisconsin, Milwaukee, WI

ABSTRACT

Objective: To identify how cochlear implant users utilize acoustic cues for music perception.

Study Design: Prospective cohorts of adult normal hearing (NH; n=12) and cochlear implant (CI; n=25) subjects.

Methods: Subjects were presented with acoustic samples of musical notes played by trumpet, clarinet, alto saxophone, flute, and violin. Notes were modified to remove components of the temporal envelope, such as the attack and decay. Spectral cues were controlled by normalizing instrument samples to the same pitch. Tests of instrument identification and instrument discrimination were performed with native and modified temporal and spectral stimuli. Notes were either separated by 0.5 second of silence or presented with no gap between them.

Results: CI users scored significantly lower than NH listeners on instrument identification across all conditions. Performance worsened for both NH and CI users when either the attack or decay was removed from the temporal envelope of the stimulus. CI users discriminated between stimuli as accurately as NH subjects when the stimuli were spaced apart. When stimuli were concatenated, timbre discrimination improved for NH listeners but not for CI users.

Conclusion: CI users have difficulty identifying musical instruments but can discern differences between them as well as NH subjects. This suggests that CI listeners perceive temporal envelope differences but do not interpret them appropriately. In the absence of temporal envelope cues, NH listeners, in contrast to CI users, rely on harmonic fine structure to detect timbre differences. Auditory training and accentuation of fine structure are potential strategies to improve timbre discrimination among CI users.

CONCLUSIONS AND FUTURE WORK

• CI users have difficulty identifying instruments but can detect differences in timbre as well as NH subjects.

• There is a trend toward poorer timbre discrimination when stimuli are outside an instrument's characteristic range. For all subjects, performance was poorer in the normalized, attack, and clipped conditions than in the native condition, but the difference was not statistically significant.

• CI users perceive temporal envelope differences but do not associate them with specific instruments.

• NH listeners' discrimination performance is affected by the spacing between stimuli, whereas CI users' performance is not. Specifically, NH listeners perform better when the sounds are concatenated than when they are separated in time. NH listeners therefore likely access fine-structure cues that CI users cannot.

• This study did not identify any single feature of the temporal envelope critical to timbre discrimination. Future work will examine enhancing temporal cues rather than removing them from stimuli.

• In addition, future work will focus on identifying and manipulating important harmonics to improve instrument identification, particularly for the instruments that CI subjects found most difficult in this study.

ACKNOWLEDGEMENTS

• Funding for this research was provided by the MED-EL Corporation.

[Figure: Temporal (time-domain) and spectral (frequency-domain) representations of clarinet and trumpet notes, with a spectrogram illustrating both time and frequency information.]

INTRODUCTION

Cochlear implant (CI) users attain excellent speech recognition but struggle to perceive basic elements of music, including melody, pitch, and timbre. Timbre is the attribute of sound that distinguishes notes played at the same pitch, loudness, and duration. Temporal and spectral cues are both critical components of timbre perception.

The temporal envelope contains information such as the attack and decay, while the spectral content or fine structure includes the fundamental frequency and related harmonics.
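The split between temporal envelope and fine structure can be illustrated with a minimal Python sketch. It assumes a single frequency band and a synthetic test note (a sine carrier shaped by a linear attack, sustain, and decay); `make_note` and `extract_envelope` are hypothetical helpers, and the sample rate, durations, and 10 ms smoothing window are arbitrary choices. Envelope extraction here is full-wave rectification followed by a moving-average low-pass filter, a simplified stand-in for band-envelope extraction rather than any specific device's processing strategy.

```python
import math

def make_note(freq_hz, dur_s, sr, attack_s, decay_s):
    """Hypothetical test note: a sine carrier (the fine structure) shaped by a
    linear attack, flat sustain, and linear decay (the temporal envelope)."""
    n = int(dur_s * sr)
    n_att, n_dec = int(attack_s * sr), int(decay_s * sr)
    samples = []
    for i in range(n):
        if i < n_att:
            amp = i / n_att                 # rising attack
        elif i >= n - n_dec:
            amp = (n - 1 - i) / n_dec       # falling decay
        else:
            amp = 1.0                       # sustain
        samples.append(amp * math.sin(2 * math.pi * freq_hz * i / sr))
    return samples

def extract_envelope(samples, sr, window_s=0.01):
    """Full-wave rectify, then smooth with a centered moving average (a crude
    low-pass filter). The fast carrier oscillation is averaged away and the slow
    amplitude contour remains, scaled by the mean of |sin|, which is 2/pi."""
    rectified = [abs(x) for x in samples]
    half = max(1, int(window_s * sr) // 2)
    prefix = [0.0]                          # running sums give O(1) window means
    for x in rectified:
        prefix.append(prefix[-1] + x)
    envelope = []
    for i in range(len(rectified)):
        lo, hi = max(0, i - half), min(len(rectified), i + half + 1)
        envelope.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return envelope

# A 1 s C5 note (about 523.25 Hz) with a 50 ms attack and a 200 ms decay
note = make_note(523.25, dur_s=1.0, sr=16000, attack_s=0.05, decay_s=0.2)
env = extract_envelope(note, sr=16000)
```

In the sustain portion the smoothed envelope sits near 2/pi (about 0.64), while it approaches zero at the onset and offset. The carrier frequency itself no longer appears in `env`, which is the information a pure envelope representation discards.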

METHODS

Subjects

Prospective cohorts of 25 CI users (mean age 58 ± 16 years) and 12 NH adults. CI user inclusion criteria:

• Post-lingually deafened, with at least 6 months of CI use
• Any current FDA-approved device, with single-sided or bilateral implantation, including users with bimodal listening experience
• Any duration of post-lingual deafness and a range of musical experience
• All CI users were tested in the CI-alone condition

Stimuli

Acoustic notes of trumpet (T), alto saxophone (S), clarinet (C), flute (F), and violin (V) were downloaded from the Philharmonia Orchestra's Sound Exchange™ website and altered in Adobe Audition™ to create the conditions described for each experiment.

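[Figure: Schematic of stimulus pairs. Spaced: two notes (e.g., clarinet and trumpet) separated by silence. Concatenated: the two notes presented back-to-back with no gap.]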

Statistical Analysis

D-prime (D′) was calculated to quantify discrimination performance in Experiments 2 and 3; larger D′ values indicate better performance. A two-tailed t-test was used to compare mean scores between CI and NH subjects.
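The poster does not state which D′ model or t-test variant was used, so the following is a minimal Python sketch under stated assumptions. D′ is computed with the common yes/no approximation D′ = z(H) − z(F), treating a "different" response as the signal response, with a log-linear correction (0.5 added to each count) so that perfect scores do not produce infinite z-scores; dedicated same-different models (such as the differencing model) give different values for the same data. The group comparison is shown as Welch's t statistic, an assumption since the poster says only "two-tailed t-test". The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def d_prime(hits, misses, false_alarms, correct_rejections):
    """D' = z(H) - z(F), treating a 'different' response as the signal response.
    Hits: 'different' on different trials; false alarms: 'different' on same trials."""
    # Log-linear correction (assumption): add 0.5 to each count so that rates of
    # exactly 0 or 1 do not yield infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for comparing mean D' between two groups (e.g., CI vs NH)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df

# Hypothetical response counts for one listener in one condition
print(round(d_prime(hits=40, misses=8, false_alarms=6, correct_rejections=42), 2))
```

For the two-tailed p-value, SciPy's `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` returns the Welch statistic and a two-sided p-value directly; `equal_var=True` gives Student's pooled-variance test instead.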

To optimize speech perception in quiet, CI processors preserve the temporal envelope of frequency-specific bands but limit the delivery of fine-structure information. It has been suggested that CI users rely primarily on the temporal envelope to assess timbre.

RESULTS

Timbre Discrimination

[Figure 2 plot: average D′ (axis 0-5) for CI and NH subjects in the NTV, NML, ATT, and CLP conditions.]

Figure 2. There were no significant differences in average D′ between NH and CI users in the native (NTV), normalized (NML), attack-removed (ATT), and attack-and-decay-removed (CLP) conditions.

[Figure 3 plots: D′ (axis 0-4.5) for CI and NH subjects in the NTV, NML, ATT, and CLP conditions; panels "Flute + trumpet pairing" and "Flute + sax pairing".]

Figure 3. NH subjects showed significantly better discrimination performance than CI users for the flute-saxophone and flute-trumpet pairs when both attack and decay were removed from the temporal envelope (CLP condition).

P-values annotated on the plots (in extraction order): P = 0.117, P = 0.225, P = 0.624, P = 0.173, P = 0.01, P = 0.047.

Figure 4. D′ scores for CI users by instrument pairing and listening condition.

[Plot: D′ (axis 0-4) for the FS, CF, ST, CT, FT, and CS pairings in the NTV, NML, ATT, and CLP conditions.]

For each instrument pairing, D′ tended to decrease from Native (NTV) to Normalized (NML) to Attack (ATT) to Clipped (CLP), although these differences were not statistically significant. F = flute, S = saxophone, T = trumpet, C = clarinet.

[Figure 5 plots: D′ for CI subjects and NH subjects by instrument pairing; panels "Spaced Same-Different Task" (TS, CT, SV, CV, FT, FV, TV, FS, CF, CS) and "Concatenated Same-Different Task" (FT, TS, CT, SV, FV, CV, TV, CS, FS, CF).]

Figure 5. When a space was present between stimuli, both NH and CI users showed significantly poorer discrimination for the flute-sax, clarinet-sax, and clarinet-flute pairings. With concatenated stimuli, CI users' performance did not change, but NH listeners showed marked improvement for the same pairings. F = flute, T = trumpet, C = clarinet, S = sax, V = violin. * p<.05 for comparisons between groups.

[Figure 1 plot: percent correct (axis 0-100) for normal hearing and CI users by scale category (Native, Normalized, Attack, Release, Clipped).]

Experiment 2: Timbre Discrimination

Stimuli were single notes of the T, S, C, and F instruments in four conditions:

• Native: from each instrument's characteristic register (T = G4, C = C5, F = G5, S = C4)
• Normalized: C5 pitch for all instruments
• Attack: C5 with attack removed
• Clipped: C5 with attack and decay removed

Subjects were presented with two notes, each 1 second in duration, separated by 0.5 second of silence, and were asked whether the notes were the same or different. A trial consisted of 576 pairs of randomly selected stimuli, with 4 iterations of all instrument combinations within the native, normalized, attack, and clipped conditions.
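The size of the pitch change involved in moving each instrument from its native note to C5 can be worked out with a short Python sketch. It assumes twelve-tone equal temperament with A4 = 440 Hz and scientific pitch notation (C4 = MIDI 60); the helper names are hypothetical. The poster does not say whether the C5 samples were pitch-shifted or selected as recorded C5 notes, so the code only computes the intervals.

```python
# Semitone offsets of natural note names from C within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(note):
    """Convert scientific pitch notation such as 'G4' or 'F#5' to a MIDI number (C4 = 60)."""
    name, octave = note[:-1], int(note[-1])
    semis = NOTE_OFFSETS[name[0]] + name.count("#") - name.count("b")
    return 12 * (octave + 1) + semis

def frequency_hz(note, a4_hz=440.0):
    """Twelve-tone equal temperament frequency, assuming A4 = 440 Hz (MIDI 69)."""
    return a4_hz * 2 ** ((midi_number(note) - 69) / 12)

# Native registers listed for Experiment 2 and the shared normalized pitch.
NATIVE = {"trumpet": "G4", "clarinet": "C5", "flute": "G5", "saxophone": "C4"}
TARGET = "C5"

for instrument, note in NATIVE.items():
    shift = midi_number(TARGET) - midi_number(note)
    print(f"{instrument:9s} {note} ({frequency_hz(note):6.1f} Hz) -> {TARGET}: {shift:+d} semitones")
```

Under these assumptions, normalization moves the trumpet up 5 semitones, leaves the clarinet unchanged, moves the flute down 7 semitones, and moves the saxophone up a full octave (12 semitones), which is consistent with the trend toward poorer discrimination outside an instrument's characteristic range.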

Experiment 3: Spaced vs. Concatenated Stimuli

Stimuli were single notes of T, S, C, F, and V in the clipped condition (C5 with attack and decay removed). Subjects were presented with two 1-second notes that were either separated by 0.5 second of silence or concatenated (no gap between stimuli). After hearing the two notes, subjects were asked whether they were the same or different.
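A minimal Python sketch of how spaced and concatenated pairs could be assembled and ordered. The 0.5 s gap comes from the text; the 44.1 kHz sample rate, the number of repeats, the inclusion of one same-instrument trial per instrument per repeat, and the helper names `make_pair` and `trial_list` are all assumptions. With five instruments, `itertools.combinations` yields ten different-instrument pairings, the same count as the pairings plotted for this task.

```python
import itertools
import random

SR = 44100      # assumed sample rate in Hz (not stated on the poster)
GAP_S = 0.5     # silence between notes in the spaced condition

def make_pair(note_a, note_b, spaced, sr=SR, gap_s=GAP_S):
    """Join two notes (lists of samples): separated by gap_s seconds of
    silence when spaced is True, back-to-back (concatenated) otherwise."""
    gap = [0.0] * int(round(gap_s * sr)) if spaced else []
    return note_a + gap + note_b

def trial_list(instruments, repeats, seed=None):
    """Hypothetical same-different trial list: each different-instrument pairing
    (presentation order randomized) plus each same-instrument pair, repeated
    `repeats` times and shuffled."""
    rng = random.Random(seed)
    trials = []
    for _ in range(repeats):
        for a, b in itertools.combinations(instruments, 2):
            trials.append((a, b) if rng.random() < 0.5 else (b, a))
        trials.extend((x, x) for x in instruments)
    rng.shuffle(trials)
    return trials

trials = trial_list(["T", "S", "C", "F", "V"], repeats=2, seed=1)
```

Each trial tuple would then be rendered twice, once with `spaced=True` and once with `spaced=False`, so that the same pairings are compared across the two presentation modes.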

Conditions for Experiment 1 (eight-note scales):

• Native: notes in the characteristic range of each instrument
• Normalized: G4-G5 major scale
• Attack: G4-G5 scale with attack removed
• Release: G4-G5 scale with decay removed
• Clipped: G4-G5 scale with attack and decay removed
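The Attack, Release, and Clipped manipulations can be sketched in Python as trimming samples from the start and/or end of a note. The poster reports only that notes were edited in Adobe Audition; how the attack and decay boundaries were chosen is not stated, so here they are fixed durations supplied by the caller, and the values in `CONDITIONS` are hypothetical. The short linear fade at each cut edge is an added assumption to avoid clicks, and it does reintroduce a very brief onset or offset ramp.

```python
def remove_segments(samples, sr, attack_s=0.0, decay_s=0.0, fade_s=0.005):
    """Drop the first attack_s seconds (attack) and/or the last decay_s seconds
    (decay) of a note, then apply a short linear fade at each cut edge so the
    new onset/offset does not produce an audible click."""
    start = int(round(attack_s * sr))
    end = len(samples) - int(round(decay_s * sr))
    if start >= end:
        raise ValueError("attack and decay segments cover the whole note")
    out = list(samples[start:end])
    n_fade = min(int(round(fade_s * sr)), len(out) // 2)
    for i in range(n_fade):
        gain = i / n_fade
        if start > 0:
            out[i] *= gain            # fade in at the new onset
        if end < len(samples):
            out[-1 - i] *= gain       # fade out at the new offset
    return out

# Hypothetical segment durations; the poster does not report them.
ATTACK_S, DECAY_S = 0.1, 0.3
CONDITIONS = {
    "Attack": dict(attack_s=ATTACK_S),
    "Release": dict(decay_s=DECAY_S),
    "Clipped": dict(attack_s=ATTACK_S, decay_s=DECAY_S),
}
```

For example, `remove_segments(note, sr, **CONDITIONS["Clipped"])` keeps only the sustain portion. The Native and Normalized conditions leave the envelope intact and differ only in pitch.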

Instrument identification among NH and CI users

Figure 1. Cochlear implant (CI) users scored significantly lower than normal hearing (NH) subjects in instrument identification in all conditions (p<.05).

Experiment 1: Instrument Identification

Subjects were presented with five separate tests, each consisting of randomly ordered eight-note scales within a given condition. After each scale, subjects identified the instrument on a computer screen.
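A minimal Python sketch of the identification procedure and its percent-correct score. The eight note names follow from the G4-G5 major scale (G A B C D E F# G). "Randomly ordered" is interpreted here as randomizing which instrument's scale is heard next, not shuffling notes within a scale; that reading, the repeat count, the instrument list in the usage line, and the helper names are assumptions.

```python
import random

# G4-G5 major scale: eight note names, no audio.
G_MAJOR_G4_G5 = ["G4", "A4", "B4", "C5", "D5", "E5", "F#5", "G5"]

def scale_order(instruments, repeats, seed=None):
    """Hypothetical presentation order: each instrument's scale appears
    `repeats` times, and the sequence is shuffled."""
    order = list(instruments) * repeats
    random.Random(seed).shuffle(order)
    return order

def percent_correct(presented, responses):
    """Percentage of scales whose instrument was identified correctly."""
    if len(presented) != len(responses):
        raise ValueError("need exactly one response per presented scale")
    correct = sum(p == r for p, r in zip(presented, responses))
    return 100.0 * correct / len(presented)

order = scale_order(["trumpet", "clarinet", "saxophone", "flute", "violin"], repeats=2, seed=3)
```

Scoring each condition's test separately with `percent_correct` yields one value per listener per scale category, the quantity plotted on the percent-correct axis for instrument identification.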
