www.sciencemag.org/cgi/content/full/337/6094/595/DC1

Supplementary Materials for

How Low Can You Go? Physical Production Mechanism of Elephant Infrasonic Vocalizations

Christian T. Herbst,* Angela S. Stoeger, Roland Frey, Jörg Lohscheller, Ingo R. Titze, Michaela Gumpenberger, W. Tecumseh Fitch*

*To whom correspondence should be addressed. E-mail: [email protected] (W.T.F.); [email protected] (C.T.H.)

Published 3 August 2012, Science 337, 595 (2012) DOI: 10.1126/science.1219712

This PDF file includes:

Materials and Methods

Supplementary Text

Figs. S1 and S2

References (32–45)

Captions for Movies S1 and S2

Other Supplementary Material for this manuscript includes the following: (available at www.sciencemag.org/cgi/content/full/337/6094/595/DC1)

Movies S1 and S2

Materials and Methods

Larynx specimen

The larynx specimen came from a 25-year-old female African elephant (Loxodonta africana, body mass 2500 kg), which died of natural causes (diaphragmatic elevation) in October 2010 in the Tierpark Berlin. The larynx and the tongue were excised several hours post-mortem on the same day, immediately packed in plastic bags to avoid drying, and stored at –20 °C. For the experiment, the trachea was cut 10 cm caudal to the cricoid cartilage. Most of the tongue was removed by a transverse cut immediately rostral to the basihyoid. The ventral, U-shaped part of the hyoid apparatus (the unpaired median basihyoid plus the two lateral thyrohyoids) remained connected to the larynx. The rostral parts of the larynx (epiglottis, laryngeal vestibulum) were removed by transverse cuts to provide a good view of the oscillating vocal folds.

CT scan

CT examination was performed using a Somatom Emotion multislice scanner (Siemens AG, Germany). The specimen, placed in ventral recumbency, was scanned at 130 kV, 200 mA and 0.6 s, with 1 mm thick slices. Images were reconstructed with the software OsiriX 3.7.1 64-bit (© Antoine Rosset).

Excised larynx setup

The excised larynx was mounted on a vertical, air-supplying metal tube (outer diameter 5 cm). The upper 10 cm of the specimen’s trachea (labeled ‘1’ in Figures 2 B, C and D) formed an airtight seal with that tube. The position of the larynx was fixed by suspending the basihyoid from the horizontal bar of a surrounding metal mount. The larynx was phonated by blowing warmed (ca. 36 °C (18)) and humidified (100 % relative humidity) air through the adducted glottis. Subglottal air pressure was controlled manually with a pressure valve, and was measured to be in the range of 9 to 60 mBar.

Larynx manipulation during experiment

As yet, no physiological information on vocal fold adduction in vivo (direction and amount of the applied force) is available. Since this is a first exploratory study, manual manipulation was favoured over a more rigidly controlled approach, to have a better chance of capturing the full spectrum of possible modes of phonation. In particular, vocal fold adduction was accomplished by squeezing the arytenoids (labeled ‘4’ in Figures 2 B, C and D) at their apexes, and either (a) pivoting the arytenoid apexes forward and down; or (b) moving them dorsally, in order to increase vocal fold elongation.

Electroglottographic recording

Electroglottography (EGG) provides a non-invasive method to monitor relative vocal fold contact area (VFCA) during phonation (32). A low-intensity, high-frequency current is passed between two electrodes placed on each side of the thyroid cartilage (on the sternothyroid muscles) at vocal fold level. The movements and collisions of the vocal folds (resulting in opening and closing of the glottis) cause variations in the electrical impedance across the larynx, resulting in a variation in current between the two electrodes (33, 34), which is related to vocal fold contact area (35). An EGG device measures the relative changes of vocal fold contact in the sagittal plane: at the maximum lateral excursion of the vibrating vocal folds their contact area is at a minimum (they are maximally separated from each other), and consequently the EGG signal amplitude is low. When, during the vibratory cycle, the vocal folds are maximally approximated (i.e. the glottis is sealed, preventing air flow), the EGG signal exhibits a local maximum. In this study, the EGG signal (VFCA) was captured with a Glottal Enterprises EG 2-1000 two-channel electroglottograph (lower cutoff frequency 2 Hz).

Acoustic recording and pre-processing

The acoustic signal was recorded with a DPA 4061 omni-directional microphone positioned 7 cm from the vocal folds. Both the acoustic and the EGG signals were recorded with an RME Fireface 800 external sound card at a sampling frequency of 44100 Hz. The signals were downsampled to 8000 Hz with the software package Cool Edit Pro 2.0 (2095.0). To account for the time delay caused by the larynx-to-microphone distance, the acoustic signal was shifted forward in time by 0.2 milliseconds.
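The 0.2 ms shift is consistent with a simple travel-time estimate. The sketch below is only an illustration, not the authors' procedure: it assumes the speed of sound of 350 m/s that the Supplementary Text quotes for warm humid air, and the function names are hypothetical.

```python
def propagation_delay_ms(distance_m, speed_of_sound_m_s=350.0):
    """Travel time of sound over distance_m, in milliseconds.

    The default speed (350 m/s, warm humid air) is an assumption borrowed
    from the Supplementary Text, not a value stated for this step.
    """
    return distance_m / speed_of_sound_m_s * 1000.0


def delay_in_samples(delay_ms, sampling_rate_hz):
    """Convert a delay in milliseconds to a (fractional) number of samples."""
    return delay_ms / 1000.0 * sampling_rate_hz


delay_ms = propagation_delay_ms(0.07)                   # microphone at 7 cm
samples_at_44k = delay_in_samples(delay_ms, 44100.0)    # original sampling rate
samples_at_8k = delay_in_samples(delay_ms, 8000.0)      # after downsampling

print(f"delay: {delay_ms:.2f} ms")
print(f"at 44100 Hz: {samples_at_44k:.2f} samples, at 8000 Hz: {samples_at_8k:.2f} samples")
```

At 8000 Hz the delay corresponds to a fraction of two samples, so the correction is small relative to the vibratory periods analyzed here.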

High-speed video

Vibration of the vocal folds was captured with a Casio EX-F1 digital high-speed video (HSV) camera operating at a rate of 600 frames per second. The camera was mounted 52 cm above the vocal fold level, allowing the entire visible portion of the vocal folds to be captured. The HSV data were synchronized with the acoustic and EGG data by a periodic TTL signal that was routed to both the RME Fireface 800 external sound card and a light-emitting diode (LED) visible in the camera's field of view. The time-varying colour intensity values of the blinking LED were extracted from the HSV, and the resulting signal was cross-correlated with the TTL signal captured by the sound card.
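The cross-correlation step can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: both signals are taken to be already resampled to a common rate, the search is limited to less than half a TTL period (a periodic signal is otherwise ambiguous), and the function name and synthetic data are hypothetical.

```python
def best_lag(reference, signal, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means `signal` is delayed relative to `reference`.
    Both inputs are assumed to share one sampling rate.
    """
    def centred(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    ref = centred(reference)
    sig = centred(signal)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# Synthetic example: a square-wave "TTL" and an "LED" trace delayed by 3 samples.
period, delay = 20, 3
ttl = [1.0 if (i % period) < period // 2 else 0.0 for i in range(200)]
led = [0.0] * delay + ttl[:-delay]

lag = best_lag(ttl, led, max_lag=8)
print("estimated lag:", lag, "samples")
```

The estimated lag is then used to shift one recording relative to the other before any joint analysis.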

Acoustic and EGG signal analysis

Spectrograms of the acoustic signal were calculated with the software Praat 5.1.19 (view range 0 – 200 Hz, window length 1 second, dynamic range 50 dB). Based on the visual inspection of the time domain signals and their spectrograms, the recorded data were classified by signal type: periodic, sub-harmonic and chaotic (36, 37). The fundamental frequency of all the periodic signals was estimated with the software Praat (autocorrelation algorithm, pitch range 5 – 100 Hz, voicing threshold 0.45, octave cost 0.05). This algorithm is well suited for fundamental frequency estimation in elephant rumbles, if no other periodic (background) sounds are present in the analyzed signal (38).
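To illustrate the idea behind autocorrelation-based pitch estimation, the sketch below searches for the lag with the highest autocorrelation within the 5 to 100 Hz range. It is a deliberately naive stand-in, not Praat's algorithm: it has no voicing threshold, octave cost or interpolation, and the function name and test tone are hypothetical.

```python
import math


def estimate_f0(signal, sampling_rate_hz, f0_min_hz=5.0, f0_max_hz=100.0):
    """Naive autocorrelation F0 estimate (illustrative only, not Praat's method)."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    min_lag = int(sampling_rate_hz / f0_max_hz)               # shortest period searched
    max_lag = min(int(sampling_rate_hz / f0_min_hz), n - 1)   # longest period searched
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sampling_rate_hz / best_lag


# One second of a synthetic 20 Hz tone, sampled at 1000 Hz to keep the loop short.
rate = 1000
tone = [math.sin(2.0 * math.pi * 20.0 * i / rate) for i in range(rate)]
f0 = estimate_f0(tone, rate)
print(f"estimated F0: {f0:.1f} Hz")
```

Because the lag is an integer number of samples, the frequency resolution of this sketch is coarse at low sampling rates; production tools interpolate around the peak.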

Vocal fold vibration analysis using glottovibrograms (GVG) and phase portraits

High-speed video data were analyzed by segmenting within each image the glottal area (GA) which is enclosed by the vibrating vocal fold edges, using a specifically adapted region growing algorithm (39) (see Movie S2 for a sample of glottal contour extraction). Based on this information, the time-varying space between the vocal folds (glottal width) can be determined at each position (along the ventral-dorsal dimension) of the vocal folds for an entire high-speed sequence.

To describe the behaviour of a one-dimensional dynamical system, it can be mapped onto a so-called ‘phase space’, representing all its possible states. A phase portrait is the geometric representation of the trajectories of such a dynamical system, which allows the system’s evolution in time to be described (40). For the purpose of this study, 3-dimensional phase portraits were calculated from EGG signals and from the time-varying glottal area (GA) data.
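The text does not state how the 3-dimensional phase portraits were constructed. One common approach, shown here purely as an assumption, is time-delay embedding of the scalar signal; the function name, the delay value and the sample data are all hypothetical.

```python
def delay_embed(signal, dimension=3, delay=1):
    """Map a scalar time series onto points (x[i], x[i+delay], x[i+2*delay], ...).

    This is a generic time-delay embedding, offered as one plausible way to
    obtain a phase portrait; it is not confirmed to be the authors' method.
    """
    span = (dimension - 1) * delay
    return [
        tuple(signal[i + k * delay] for k in range(dimension))
        for i in range(len(signal) - span)
    ]


# Tiny example: six samples, three dimensions, delay of two samples.
points = delay_embed([0, 1, 2, 3, 4, 5], dimension=3, delay=2)
print(points)
```

For a periodic signal the embedded points trace a closed loop, whereas period doubling or chaos produces a doubled loop or a more complex trajectory.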

To visualize the entire two-dimensional vibration pattern of the vocal folds, so-called glottovibrograms (GVG) were computed (41). A GVG visualizes the time-varying glottal width for each vocal fold position by colour-coding the computed distance values as pixel intensities. It shows how glottal width (colour information) varies along the glottal axis (y-axis, from ventral to dorsal) as a function of time (x-axis). Thus, the two-dimensional dynamics of the vibrating vocal folds are incorporated and can be assessed in a single graph.
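The data layout of a GVG can be sketched as below. This is an illustrative reduction, assuming the segmentation step has already produced left and right vocal fold edge positions (in pixels) for every frame and every position along the glottal axis; the function name, argument layout and 0 to 255 scaling are hypothetical choices.

```python
def glottovibrogram(left_edges, right_edges):
    """Build a GVG intensity matrix from extracted vocal fold edges.

    left_edges[t][y] and right_edges[t][y] hold the lateral edge positions
    for video frame t at glottal-axis position y (ventral to dorsal).
    Returns intensities[y][t] scaled to 0..255, so that rows follow the
    glottal axis and columns follow time.
    """
    n_frames = len(left_edges)
    n_positions = len(left_edges[0])
    widths = [
        [max(0.0, right_edges[t][y] - left_edges[t][y]) for t in range(n_frames)]
        for y in range(n_positions)
    ]
    peak = max(max(row) for row in widths)
    if peak == 0:
        return [[0] * n_frames for _ in range(n_positions)]
    return [[round(255 * w / peak) for w in row] for row in widths]


# Three frames, two axis positions: the glottis opens in the middle frame only.
left = [[10, 10], [7, 9], [10, 10]]
right = [[10, 10], [12, 10], [10, 10]]
gvg = glottovibrogram(left, right)
print(gvg)
```

Each row of the result can be rendered as one image line, which yields the time-versus-position picture described above.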

Computer simulation of elephant vocal fold vibration

The model was created as a 5 x 6 x 7 = 210 mass-spring model (7). The model geometry and structure were determined according to our CT scans of the elephant larynx specimen, and based on recently reported anatomical and histological data (42) (see Figure S1). The layered tissue structure was assumed to be the same as in a human, with a ground-substance (gel-like) elastic shear modulus of 0.1 kPa. This assumption was motivated by the fact that the biomechanical tissue properties of the elephant vocal folds have never been investigated.

In vivo elephant vocalizations

The vocalizations analyzed for this study consisted of 474 individually identified “rumble” vocalizations of 7 adult female African elephants, which were recorded at the Vienna Zoo and the Basel Zoo between 2002 and 2010.

Supplementary Text

Model for estimating fundamental frequency

As a crude approximation, the fundamental frequency of mammal voice production can be estimated by the “piano string” model as

$$F_0 = \frac{1}{2L}\sqrt{\frac{\sigma}{\rho}}, \qquad (1)$$

where L is the membranous length of the vocal folds, σ is the stress within the vocal fold, and ρ is the tissue density (3). Assuming that σ and ρ are constant, fundamental frequency would be inversely proportional to anatomical vocal fold length. In human males, for example, the average membranous vocal fold length is 1.597 cm (7), and the average speaking fundamental frequency was measured to be about 120 Hz (3). Relating these values to the measured elephant vocal fold length of 10.4 cm (see Fig. 2 B to D), and solving formula (1) for F0, results in an F0 of 18.43 Hz, very close to the mean F0 of 16.38 Hz measured in our experiments.
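The scaling argument can be checked numerically. With σ and ρ held constant, formula (1) reduces to F0 being proportional to 1/L, so the elephant value follows from the human reference values by a length ratio. The sketch below uses only numbers quoted in the text; the function name is hypothetical.

```python
def scaled_f0(reference_f0_hz, reference_length_cm, target_length_cm):
    """Scale F0 by vocal fold length, assuming constant stress and density.

    From F0 = (1 / 2L) * sqrt(sigma / rho): F0_target = F0_ref * L_ref / L_target.
    """
    return reference_f0_hz * reference_length_cm / target_length_cm


human_f0_hz = 120.0        # average male speaking F0 quoted in the text
human_length_cm = 1.597    # average male membranous vocal fold length
elephant_length_cm = 10.4  # measured elephant vocal fold length

predicted_f0 = scaled_f0(human_f0_hz, human_length_cm, elephant_length_cm)
measured_mean_f0 = 16.38   # mean F0 reported for the excised larynx experiments

print(f"predicted: {predicted_f0:.2f} Hz, measured mean: {measured_mean_f0:.2f} Hz")
```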

In both the AMC and the MEAD voice production modes, the fundamental frequency is determined in the larynx. According to the source-filter theory (43), the vocal output is a superposition of two components: the spectral characteristics of the laryngeal sound source (a harmonic series with a fundamental frequency determined by the vibratory rate of the laryngeal tissue) and the vocal tract filter function. Recent research suggests that the frequency of vocal fold vibration may also be influenced by non-linear interactions between larynx and vocal tract: compliance (negative vocal tract reactance, regularly occurring just above each formant frequency) adds stiffness to the interactive vibrating system, thereby raising F0, whereas inertance (positive reactance, regularly occurring just below each formant frequency) adds mass, thereby lowering F0 (44). Therefore, the fundamental frequencies observed in in vivo elephant vocalizations (i.e. with an attached vocal tract) could be either slightly lower or higher than the data gathered in this experiment, depending on the actual vocal tract formant frequencies and their relation to the harmonics of the glottal sound source. In addition, the acoustic loading of the vocal tract might introduce additional non-linear phenomena, such as subharmonics or chaos. Acoustic interaction with the vocal tract might also lower or raise the threshold lung pressure for phonation.

In evaluating these possibilities, the relationship between F0 and formants is essential, and we therefore estimated the formant frequencies found in elephant vocalizations. As is customary in speech production theory, the elephant vocal tract is assumed to be a quarter-wave resonator (3, 45). The resonance (formant) frequencies of the vocal tract can then be calculated as Fn = (2n – 1)(c/4L), where n is the formant number, c is the speed of sound (350 m/s for warm humid air), and L is the vocal tract length. Assuming a length of 0.75 m for the oral vocal tract and a length of 2.5 m for the nasal vocal tract (45), the lowest formant can be found in the vicinity of 117 Hz for oral sounds and 35 Hz for nasal sounds. In the case of infrasound vocalizations, this means that the lowest formant frequency is at least two times greater than the fundamental frequency. With this relationship, non-linear interactions between the laryngeal sound source and the vocal tract are much less likely to happen than when the fundamental frequency is close to or higher than the lowest formant. Thus, in the case of infrasound vocalizations, source-filter interactions would only occur between formant frequencies and higher, weaker harmonics (i.e. integral multiples of the fundamental frequency). This suggests that in infrasound vocalization, any non-linear influence of the vocal tract is weaker than in other vocalization types, such as the trumpet, in which the fundamental frequency is well above the first formant (45). The non-linearities found in our data are (due to the absence of a vocal tract in an excised larynx setup) thus presumably caused by the biomechanical properties of the vibrating vocal folds themselves.
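The formant estimates follow directly from the quarter-wave formula. The sketch below evaluates Fn = (2n – 1)(c/4L) for the two tract lengths given in the text and compares the lowest formant with the mean F0 of 16.38 Hz reported above; the function and variable names are hypothetical.

```python
def quarter_wave_formant(n, tract_length_m, speed_of_sound_m_s=350.0):
    """n-th resonance of a tube closed at one end: Fn = (2n - 1) * c / (4L)."""
    return (2 * n - 1) * speed_of_sound_m_s / (4.0 * tract_length_m)


oral_f1 = quarter_wave_formant(1, 0.75)   # oral vocal tract, 0.75 m
nasal_f1 = quarter_wave_formant(1, 2.5)   # nasal vocal tract, 2.5 m

mean_f0 = 16.38                           # mean F0 from the excised larynx data
ratio = nasal_f1 / mean_f0                # lowest formant relative to F0

print(f"oral F1: {oral_f1:.1f} Hz, nasal F1: {nasal_f1:.1f} Hz")
print(f"nasal F1 / mean F0: {ratio:.2f}")
```

Even for the longer nasal tract, the first formant lies more than a factor of two above the mean F0, which is the basis for the argument that source-filter interaction is weak for infrasonic rumbles.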

Relation of fundamental frequency to glottal flow

A particularly regular example of predominantly periodic vocal fold oscillation is illustrated in Figure S2 (A): Glottal air flow was reduced from a maximum of 1.662 l/s (corresponding to a tracheal pressure of ca. 50 mBar) to the point of cessation of phonation (0.974 l/s), and was increased again to the maximum. Phonation commenced in the second part of the sample at an air flow rate of 1.095 l/s. The phonation onset pressure in this example was measured to be ca. 20 mBar. The observed hysteresis of phonation onset and offset is an inherent property of the vibrating vocal folds, which is also typically observed in human phonation. Glottal resistance varied from 30.86 mBar per l/s at the maximum air flow rate to 18.26 mBar per l/s at the phonation threshold, suggesting that glottal resistance was non-linear over the examined flow range. According to the data from Figure S2 (B), fundamental frequency varied linearly with glottal flow. The observed hysteresis between negative-going (blue) and positive-going flow rates (green) may be caused by the pseudo-lung of our setup acting as a capacitor.
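Glottal resistance as used here is the ratio of pressure to flow. As a rough consistency check, and assuming the threshold value refers to the onset point (20 mBar at 1.095 l/s), the quoted figures can be reproduced as follows; the function name is hypothetical and the pressure at maximum flow is back-calculated, not a reported measurement.

```python
def glottal_resistance(pressure_mbar, flow_l_per_s):
    """Glottal resistance in mBar per (l/s): pressure divided by flow."""
    return pressure_mbar / flow_l_per_s


# Phonation onset: ca. 20 mBar at 1.095 l/s (values quoted in the text).
onset_resistance = glottal_resistance(20.0, 1.095)

# Back-calculated pressure at maximum flow from the quoted resistance
# (30.86 mBar per l/s at 1.662 l/s); the text gives "ca. 50 mBar".
implied_max_pressure = 30.86 * 1.662

print(f"resistance at onset: {onset_resistance:.2f} mBar per l/s")
print(f"implied pressure at maximum flow: {implied_max_pressure:.1f} mBar")
```

The higher resistance at the higher flow rate is what indicates that pressure does not scale linearly with flow over this range.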

Fig. S1. Computer simulation of vibrating vocal folds. (A) and (B) Top view and coronal section of the 5 x 6 x 7 = 210 mass-spring model of the elephant vocal folds at maximum vocal fold displacement during vibration. Thyroid and arytenoid cartilages are shown in light gray, modelled tissue is shown in orange. The black lines in (A) (going from ventral to dorsal) represent tissue fibres in the vocal folds. (C) Acoustic output of the model for phonation onset. The fundamental frequency of vocal fold vibration was measured to be 17.5 Hz.

Fig. S2. Phonation as a function of transglottal air flow. (A) From top to bottom: averaged transglottal air flow (a moving average filter with a window length of 201 frames was applied; sampling frequency of data: 1000 Hz); normalized electroglottographic signal; acoustic signal; and spectrogram (analysis parameters as in Fig. 3). (B) Relationship between transglottal air flow [l/s] and fundamental frequency [Hz]. The two periodic sequences from (A) are displayed in blue (t ≈ 43 – 58 s) and green (t ≈ 75 – 98 s), respectively.

Movie S1. Abrupt transition from chaotic to periodic vocal fold vibration. High-speed video sequence showing the elephant vocal folds in our excised larynx setup. Note that a gradual reduction of vocal fold elongation (and thus vocal fold tension) results in an abrupt transition from a chaotic to a periodic vibratory regime. The dorsal end of the vocal folds (inserting into the arytenoid cartilages) is oriented upwards in the image. The total duration is 0.667 s (400 video frames at 600 frames per second).

Movie S2. Illustration of glottal contour extraction from high-speed video. High-speed video sequence showing the elephant vocal folds vibrating in a subharmonic regime (period doubling). The dorsal end of the vocal folds (inserting into the arytenoid cartilages) is oriented upwards in the image. The total duration is 1.667 s (1000 video frames at 600 frames per second). The extracted vocal fold edges (41) were superimposed upon the individual images (right = blue; left = red).

References and Notes

1. D. K. Mellinger, C. W. Clark, Blue whale (Balaenoptera musculus) sounds from the North Atlantic. J. Acoust. Soc. Am. 114, 1108 (2003). doi:10.1121/1.1593066

2. G. Jones, Scaling of echolocation call parameters in bats. J. Exp. Biol. 202, 3359 (1999).

3. I. R. Titze, Principles of Voice Production (National Center for Voice and Speech, Iowa City, IA, 2000), vol. Second Printing.

4. D. F. N. Harrison, The Anatomy and Physiology of the Mammalian Larynx (Cambridge Univ. Press, New York, 1995).

5. C. P. Elemans, R. Laje, G. B. Mindlin, F. Goller, Smooth operator: Avoidance of subharmonic bifurcations through mechanical mechanisms simplifies song motor control in adult zebra finches. J. Neurosci. 30, 13246 (2010). doi:10.1523/JNEUROSCI.1130-10.2010

6. J. van den Berg, J. Speech Hear. Res. 3, 227 (1958).

7. I. R. Titze, The Myoelastic Aerodynamic Theory of Phonation (National Center for Voice and Speech, Iowa City, IA, 2006).

8. K. McComb, Female choice for high roaring rates in red deer, Cervus elaphus. Anim. Behav. 41, 79 (1991). doi:10.1016/S0003-3472(05)80504-4

9. D. Reby, K. McComb, Anatomical constraints generate honesty: Acoustic cues to age and weight in the roars of red deer stags. Anim. Behav. 65, 519 (2003). doi:10.1006/anbe.2003.2078

10. I. R. Titze et al., Vocal power and pressure-flow relationships in excised tiger larynges. J. Exp. Biol. 213, 3866 (2010). doi:10.1242/jeb.044982

11. D. Sissom, D. Rice, G. Peters, How cats purr. Zool. Soc. London 223, 67 (1991). doi:10.1111/j.1469-7998.1991.tb04749.x

12. The glottis (lateral rima glottidis) is the elongated opening between the vocal folds.

13. J. E. Remmers, H. Gautier, Neural and mechanical mechanisms of feline purring. Respir. Physiol. 16, 351 (1972). doi:10.1016/0034-5687(72)90064-3

14. R. Husson, Étude des Phénomènes Physiologiques et Acoustiques Fondamentaux de la Voix Chantée (Thesis) (Éditions de La Revue Scientifique, Paris, 1950).

15. A. S. Stoeger-Horwath, S. Stoeger, H. M. Schwammer, H. Kratochvil, Call repertoire of infant African elephants: First insights into the early vocal ontogeny. J. Acoust. Soc. Am. 121, 3922 (2007). doi:10.1121/1.2722216

16. K. M. Leong, A. Ortolani, K. D. Burks, J. D. Mellen, A. Savage, Quantifying acoustic and temporal characteristics of vocalizations for a group of captive African elephants Loxodonta africana. Bioacoustics 13, 213 (2003). doi:10.1080/09524622.2003.9753499

17. J. H. Poole, K. Payne, W. R. Langbauer Jr., C. J. Moss, The social contexts of some very low frequency calls of African elephants. Behav. Ecol. Sociobiol. 22, 385 (1988). doi:10.1007/BF00294975

18. M. Garstang, Long-distance, low-frequency elephant communication. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 190, 791 (2004). doi:10.1007/s00359-004-0553-0

19. W. R. Langbauer, Elephant communication. Zoo Biol. 19, 425 (2000). doi:10.1002/1098-2361(2000)19:5<425::AID-ZOO11>3.0.CO;2-A

20. K. McComb, D. Reby, L. Baker, C. Moss, S. Sayialel, Long-distance communication of acoustic cues to social identity in African elephants. Anim. Behav. 65, 317 (2003). doi:10.1006/anbe.2003.2047

21. J. Soltis, K. Leong, A. Savage, African elephant vocal communication II: Rumble variation reflects the individual identity and emotional state of callers. Anim. Behav. 70, 589 (2005). doi:10.1016/j.anbehav.2004.11.016

22. K. B. Payne, W. R. Langbauer Jr., E. M. Thomas, Infrasonic calls of the Asian elephant (Elephas maximus). Behav. Ecol. Sociobiol. 18, 297 (1986). doi:10.1007/BF00300007

23. J. Shoshani, in Encyclopedia Britannica (Encyclopædia Britannica, Chicago, IL); available at www.britannica.com/EBchecked/topic/184366/elephant/234256/Sound-production-and-water-storage.

24. J. Shoshani, Understanding proboscidean evolution: A formidable task. Trends Ecol. Evol. 13, 480 (1998). doi:10.1016/S0169-5347(98)01491-8

25. M. S. Fee, B. Shraiman, B. Pesaran, P. P. Mitra, The role of nonlinear dynamics of the syrinx in the vocalizations of a songbird. Nature 395, 67 (1998). doi:10.1038/25725

26. G. Peters, Purring and similar vocalizations in mammals. Mammal Rev. 32, 245 (2002). doi:10.1046/j.1365-2907.2002.00113.x

27. F. P. Möhres, in Les Systèmes Sonars Animaux: Biologie et Bionique, R.-G. Busnel, Ed. (Laboratoire de Physiologie, Paris, 1967), pp. 401–407.

28. T. A. Griffiths, Mammalia 47, 377 (1983).

29. M. D. Hauser, The evolution of nonhuman primate vocalizations: Effects of phylogeny, body weight, and social context. Am. Nat. 142, 528 (1993). doi:10.1086/285553

30. F. Alipour, S. Jaiswal, Glottal airflow resistance in excised pig, sheep, and cow larynges. J. Voice 23, 40 (2009). doi:10.1016/j.jvoice.2007.03.007

31. I. R. Titze, The physics of small-amplitude oscillation of the vocal folds. J. Acoust. Soc. Am. 83, 1536 (1988). doi:10.1121/1.395910

32. P. Fabre, Percutaneous electric process registering glottic union during phonation: Glottography at high frequency; first results. Bull. Acad. Natl. Med. 141, 66 (1957).

33. A. J. Fourcin, E. Abberton, First applications of a new laryngograph. Med. Biol. Illus. 21, 172 (1971).

34. R. J. Baken, Electroglottography. J. Voice 6, 98 (1992). doi:10.1016/S0892-1997(05)80123-7

35. R. C. Scherer, D. G. Druker, I. R. Titze, in Vocal Fold Physiology, Vol. 2: Voice Production, Mechanisms and Functions, O. Fujimura, Ed. (Raven Press, New York, 1988), pp. 279–290.

36. W. T. Fitch, J. Neubauer, H. Herzel, Calls out of chaos: The adaptive significance of nonlinear phenomena in mammalian vocal production. Anim. Behav. 63, 407 (2002). doi:10.1006/anbe.2001.1912

37. I. Titze, Workshop on Acoustic Voice Analysis. Summary Statement (National Center for Voice and Speech, Iowa City, IA, 1995).

38. P. J. Venter, J. J. Hanekom, Automatic detection of African elephant (Loxodonta africana) infrasonic vocalisations from recordings. Biosystems Eng. 106, 286 (2010). doi:10.1016/j.biosystemseng.2010.04.001

39. J. Lohscheller, H. Toy, F. Rosanowski, U. Eysholdt, M. Döllinger, Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med. Image Anal. 11, 400 (2007). doi:10.1016/j.media.2007.04.005

40. P. Bergé, Y. Pomeau, C. Vidal, Order Within Chaos: Towards a Deterministic Approach to Turbulence (Hermann and John Wiley & Sons, Paris, 1984).

41. J. Lohscheller, U. Eysholdt, H. Toy, M. Döllinger, Phonovibrography: Mapping high-speed movies of vocal fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics. IEEE Trans. Med. Imaging 27, 300 (2008). doi:10.1109/TMI.2007.903690

42. P. Kühhaas, Morphologie des Larynx des Afrikanischen Elefanten (Loxodonta africana) (thesis, University of Veterinary Medicine Vienna, Vienna, Austria, 2011).

43. G. Fant, Acoustic Theory of Speech Production (Mouton and Co., 's-Gravenhage, Netherlands, 1960).

44. I. R. Titze, Nonlinear source-filter coupling in phonation: Theory. J. Acoust. Soc. Am. 123, 2733 (2008). doi:10.1121/1.2832337

45. J. Soltis, Vocal communication in African elephants (Loxodonta africana). Zoo Biol. 29, 192 (2010).