Communication Acoustics Karjalainen
M. Karjalainen1
• General introduction
• Communication by sound and voice
– Examples of communication situations
• Systems approach to communication
• Modeling and theory formation in research
Chapter 1: Introduction
M. Karjalainen2
Information Transmission by Sound
Environmental orientation by sound
M. Karjalainen3
Communication by Speech
Speech communication via acoustic medium
M. Karjalainen4
Communication by Music
Music via acoustic medium
M. Karjalainen5
Communication by Music
Origins of speech and music?
Speech has been important in evolution, but what about music?
Role of music: just a side product or an important factor?
- Charles Darwin: important for mating etc.
Two interesting recent books:
Steven Mithen: “The Singing Neanderthals:
The Origins of Music, Language, Mind, and Body”,
Harvard University Press, 2006
Daniel J. Levitin: “This Is Your Brain on Music:
The Science of a Human Obsession”, Plume, 2006
M. Karjalainen6
Speech Transmission
Speech communication via electronic medium
M. Karjalainen7
Virtual Acoustic Reality
Virtual instrument in virtual space
M. Karjalainen8
Man-Machine Communication by Speech
Speech synthesis and recognition
M. Karjalainen9
A Black-Box Approach
Input-output relationship
M. Karjalainen10
A Systems Approach
A multi-level system
M. Karjalainen11
Systemic Concepts
• Element (part of a whole, entity)
• Relation / property
• Structure (relatively permanent properties of a system)
• Function(ality) (relatively variant properties of a system)
• Event (a relatively discrete change, typically in time)
• State
• Object
• Type (class)
• System
• Control
• Process
• Organization
• Hierarchy / heterarchy
• Data / information / knowledge (communication, language)
M. Karjalainen12
Abstraction in Modeling and Theory Formation
Abstraction hierarchy
M. Karjalainen13
Communication by Sound and Voice
hardware
software
functionware
contentware
Physics
Cognition
Signals
Information
Analysis Synthesis
M. Karjalainen1
Chapter 2: Acoustics
This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.
M. Karjalainen2
Chapter 2: Acoustics
Sound as physical phenomenon
When a tree in a forest falls and there is no one to listen, does it make a sound?
• Vibration – generation of sound
• Sound radiation
• Sound propagation
• Reflection, absorption,
• Diffraction, refraction
• Standing waves
• Resonance, resonators
M. Karjalainen3
Vibrating systems
• Simple vibration: mass–spring system
M. Karjalainen4
Vibrating systems
Undamped and damped oscillation
M. Karjalainen5
Resonance
Mass-spring resonator
Helmholtz resonator
M. Karjalainen6
Two-mass vibrating system
Transversal and longitudinal vibration of a two-mass system
M. Karjalainen7
Vibration modes of a string
M. Karjalainen8
Wave propagation
Wave equation:
D’Alembert:
M. Karjalainen9
Sound pressure, sound pressure level, decibel
Sound pressure: p [Pa]
Sound pressure level:
Reference:
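The formulas for this slide are missing from the transcript. As a sketch, the standard definition L_p = 20 log10(p / p0), with the usual airborne reference p0 = 20 µPa, can be computed as follows. The function and constant names are illustrative only.

```python
import math

P_REF = 20e-6  # reference RMS pressure in pascals (20 micropascals, standard in air)

def spl_db(p_rms):
    """Sound pressure level in dB re 20 uPa for an RMS pressure given in pascals."""
    if p_rms <= 0:
        raise ValueError("RMS pressure must be positive")
    return 20.0 * math.log10(p_rms / P_REF)

# Doubling the pressure raises the level by about 6 dB.
increase_for_doubling = spl_db(0.2) - spl_db(0.1)
```

With this definition the reference pressure itself sits at 0 dB, and an RMS pressure of 1 Pa is about 94 dB.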
M. Karjalainen10
Wave phenomena: spherical wave
Sound velocity in the air:
Spherical wave:
M. Karjalainen11
Wave phenomena: planar wave
Planar wave in a tube:
Reflection (and transmission):
M. Karjalainen12
Lowest resonance modes in a tube
Open ends One end closed
M. Karjalainen13
Spectral content of string vibration
M. Karjalainen14
Bar and membrane modes
Bar
Membrane
M. Karjalainen15
Reflection and refraction (bending)
M. Karjalainen16
Diffraction
M. Karjalainen17
Sound propagation paths in a room
M. Karjalainen18
Sound field decay in a room
Tapiola-sali
M. Karjalainen19
Sound field in a room, Computer simulation
M. Karjalainen20
Sound field level in a reverberant room
M. Karjalainen21
Modal behavior in a room
L_i = dimensions of a rectangular room
n_i = integer indices 0, 1, 2, ...
measured magnitude response in a room
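The mode formula itself did not survive in the transcript. A minimal sketch, assuming the usual rigid-wall result f = (c/2) sqrt((n_x/L_x)^2 + (n_y/L_y)^2 + (n_z/L_z)^2) for a rectangular room. The speed of sound `c`, the example room size and the function names are assumptions.

```python
import itertools
import math

def room_mode_hz(indices, dims_m, c=343.0):
    """Mode frequency for integer indices (nx, ny, nz) and room dimensions in metres."""
    return (c / 2.0) * math.sqrt(sum((n / l) ** 2 for n, l in zip(indices, dims_m)))

def lowest_modes(dims_m, count=5, max_index=3, c=343.0):
    """The `count` lowest modes as (frequency, indices) pairs, excluding (0, 0, 0)."""
    modes = [(room_mode_hz(idx, dims_m, c), idx)
             for idx in itertools.product(range(max_index + 1), repeat=3)
             if any(idx)]
    return sorted(modes)[:count]

# Hypothetical 5 m x 4 m x 3 m room.
for freq, idx in lowest_modes((5.0, 4.0, 3.0)):
    print(f"{idx}: {freq:.1f} Hz")
```

The lowest modes are axial modes along the longest dimensions, which is why small rooms show sparse, clearly separated resonances at low frequencies.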
M. Karjalainen22
Sound propagation by image source model
Solid line = real path; dotted line = virtual path
M. Karjalainen23
Electroacoustics: Loudspeaker
principle driver structure enclosure
Dynamic loudspeaker
M. Karjalainen24
Electroacoustics: Microphone
principle construction
Condenser microphone
M. Karjalainen1
Chapter 3: Sound and Voice as Signals
This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.
M. Karjalainen2
Sound and Voice as Signals
In signal representations a physical or abstract variable is typically represented as a function of time, such as:
• Signal as a mathematical function:
– Pure tone:
– Random signal:
• Discrete-time numeric sequence
Continues ...
M. Karjalainen3
Sound and Voice as Signals
Continues ...
• Graphical presentations (figure panels): sine wave, random noise, speech waveform, sample sequence, unit impulse, unit pulse
M. Karjalainen4
Linear and time-invariant (LTI) systems
Properties of LTI systems:
• Any (stable) LTI system can be fully represented by its impulse response
• Output cannot include any frequencies that are not in the input (no nonlinear distortion)
• Any bandlimited LTI system can be approximated by digital filters with arbitrary accuracy (theoretically)
M. Karjalainen5
Convolution
Signal processing algorithms
Fourier analysis
M. Karjalainen6
Signal processing algorithms
Fourier synthesis
Convolution vs. Fourier transform
M. Karjalainen7
Decomposition of sawtooth waveform
M. Karjalainen8
Spectrum analysis
Magnitude spectrum
Phase spectrum
Group delay
Phase delay
M. Karjalainen9
Fourier analysis with windowing
• Rectangular window
• Hamming window
• Hann(ing) window
• Kaiser window
• Blackman (Blackman-Harris) window
M. Karjalainen10
Spectrum analysis using Fourier analysis with windowing
Sine wave
Sine wave windowed synchronously
Sine wave windowed non-synchronously
Sine wave, Hamming-windowed
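The effect of these four cases can be sketched numerically. The example below uses a naive DFT from the standard library and a Hann window (one of the windows listed earlier) rather than the Hamming window of the figure. The frame length, the number of cycles and the 1 % threshold are arbitrary choices for illustration.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude of the discrete Fourier transform (naive O(N^2) version)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def sine(cycles, n):
    """n samples of a sine that completes `cycles` periods inside the frame."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

def significant_bins(x, rel=0.01):
    """Count bins 0..N/2 whose magnitude exceeds `rel` times the largest bin."""
    mag = dft_magnitude(x)[: len(x) // 2 + 1]
    peak = max(mag)
    return sum(1 for m in mag if m > rel * peak)

N = 64
synchronous = sine(8, N)       # whole number of periods in the frame
asynchronous = sine(8.5, N)    # half a period left over at the frame edge
windowed = [w * s for w, s in zip(hann(N), asynchronous)]

print(significant_bins(synchronous),
      significant_bins(asynchronous),
      significant_bins(windowed))
```

The synchronous frame concentrates all energy in one bin, the non-synchronous frame leaks across the whole spectrum, and the tapered window confines the leakage to a few bins around the peak at the cost of a wider main lobe.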
M. Karjalainen11
Vowel spectra
M. Karjalainen12
Time-frequency representations: Spectrogram
Word: /kaksi/
M. Karjalainen13
Auto- and cross-correlation
Cross-correlation
Autocorrelation
M. Karjalainen14
Cepstrum
• Compute Fourier transform
• Logarithm of (power) spectrum
• Inverse Fourier transform
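The three steps above can be sketched with a naive DFT. This is an illustrative power-cepstrum implementation, not necessarily the exact variant used in the course. The `floor` guard against log(0) and the test signal (an impulse plus a half-amplitude echo 8 samples later) are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform; the inverse includes the 1/N scaling."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def power_cepstrum(x, floor=1e-12):
    """Inverse DFT of the log power spectrum of x."""
    log_power = [math.log(max(abs(v) ** 2, floor)) for v in dft(x)]
    return [v.real for v in dft(log_power, inverse=True)]

# Impulse followed by an echo of amplitude 0.5 after 8 samples.
signal = [0.0] * 64
signal[0] = 1.0
signal[8] = 0.5
cepstrum = power_cepstrum(signal)
echo_quefrency = max(range(1, 33), key=lambda n: cepstrum[n])
```

The echo shows up as a peak at quefrency 8, which is the same mechanism that makes the pitch period of voiced speech visible in the cepstrum.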
M. Karjalainen15
Digital signal processing: DSP systems
• Analog-to-digital (A/D) converter
• Digital signal processor (+ software)
• Digital-to-analog (D/A) converter
M. Karjalainen16
Signal quantization: A/D conversion
• Linear quantization (PCM-coding)
• Discrete levels: 2^n (n = number of bits)
• 16–24 bits/sample in audio (16 bits gives about 96 dB SNR)
• Sample rate: 44100 or 48000 samples/sec
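The relation between word length and SNR can be checked numerically. The sketch below assumes a uniform mid-tread quantizer over [-1, 1) and compares a measured SNR with the common rule of thumb of 6.02 n + 1.76 dB for a full-scale sine. The test frequency and all names are illustrative.

```python
import math

def quantize(x, bits):
    """Round x (nominally in [-1, 1)) to the nearest of 2**bits uniform levels."""
    step = 2.0 / (2 ** bits)
    level = math.floor(x / step + 0.5)
    level = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, level))
    return level * step

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(2 ** bits)

def measured_sine_snr_db(bits, n=20000, cycles_per_sample=0.01234567):
    """SNR of a near-full-scale sine after quantization, measured over n samples."""
    signal_power = noise_power = 0.0
    for t in range(n):
        x = 0.99 * math.sin(2 * math.pi * cycles_per_sample * t)
        e = quantize(x, bits) - x
        signal_power += x * x
        noise_power += e * e
    return 10.0 * math.log10(signal_power / noise_power)

print(round(dynamic_range_db(16), 1), round(measured_sine_snr_db(8), 1))
```

Each added bit halves the step size and therefore adds roughly 6 dB, which is where the 96 dB figure for 16 bits comes from.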
M. Karjalainen17
Z-transform
Linear transform of sequence x(n) :
Unit delay as building element:
Digital filtering can be expressed as a rational function (or polynomial) of z^-1
M. Karjalainen18
Digital filtering: FIR filters
FIR = finite impulse response filter
M. Karjalainen19
Digital filtering: IIR filters
IIR = infinite impulse response filter
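Both filter types follow from one difference equation, a[0] y(n) = sum_k b[k] x(n-k) - sum_{k>=1} a[k] y(n-k). A minimal direct-form sketch, where the coefficient naming (`b` for the numerator, `a` for the denominator) is a common convention and an assumption here:

```python
def lti_filter(b, a, x):
    """Apply the difference equation with numerator b and denominator a to sequence x."""
    y = []
    for n in range(len(x)):
        # Feedforward part: weighted sum of current and past inputs.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: weighted sum of past outputs (absent for FIR filters).
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

impulse = [1.0] + [0.0] * 5
fir_response = lti_filter([0.5, 0.5], [1.0], impulse)   # two-tap averager
iir_response = lti_filter([1.0], [1.0, -0.5], impulse)  # one-pole lowpass
```

The FIR impulse response equals its coefficients and then stops, while the one-pole IIR response decays geometrically and never reaches exactly zero.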
M. Karjalainen20
Linear prediction (AR-modeling)
Modeling of signal generation with flat
spectrum excitation (impulse or noise)
and IIR (all-pole) filter. Speech example:
Signal
Windowed
FFT-spectrum
LP-spectra
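One common way to obtain the all-pole coefficients is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method the figure used, so the sketch below is an assumption. It returns the prediction-error filter A(z) = 1 + a1 z^-1 + ... so that the model is 1/A(z).

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a (windowed) signal frame."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err): coefficients a[0..order] with a[0] = 1, and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Ideal autocorrelation of a first-order process with pole at 0.9.
coeffs, residual = levinson_durbin([1.0, 0.9, 0.81], order=2)
```

For this input the recursion recovers the single pole (a1 = -0.9) and sets the second coefficient to zero, showing that extra model order is not used when the signal does not need it.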
M. Karjalainen21
Neural networks
MLF = multilayer feedforward network = multilayer perceptron
Input layer + hidden and output layer nodes with sigmoidal nonlinearity
Backpropagation algorithm for training
M. Karjalainen22
Hidden Markov models (HMM)
For probabilistic modeling of state sequences
Used especially in speech recognition
M. Karjalainen23
Audio reproduction: loudspeaker response
Magnitude response of a non-ideal loudspeaker
M. Karjalainen24
Group delay response of a loudspeaker
M. Karjalainen25
Reproduction quality: Distortion and SNR
Nonlinearity results in distortion: a sine wave input results in generation of harmonic components A(i)
Distortion (usually given in %):
Signal-to-noise ratio (SNR):
Distortion in general is discussed in later chapters
M. Karjalainen26
Response equalization
A non-flat magnitude response can be equalized (flattened) by digital filtering.
Example by so-called frequency-warped filters
M. Karjalainen1
Chapter 4: Speech and Music
• Speech communication
• Speech production:
– Speech production mechanism
– Vocal cords – phonation
– Vocal and nasal tract – articulation
– Units and notation of speech: vowels, consonants
– Prosody of speech
– Modeling of speech production
• Singing voice
• Speech processing: analysis, synthesis, coding, recognition
• Musical instruments as sound sources
• Music signal processing
– Sound synthesis techniques
– Physical modeling
– Digital audio vs. music
M. Karjalainen2
Speech communication chain
M. Karjalainen3
Speech production mechanism
M. Karjalainen4
Phonation and articulation
• Vocal cords (vocal folds): phonation
– Generation and control of voiced sound at the glottis
• Vocal tract and nasal tract: articulation
– Control of voice features by the articulation organs
• Concepts:
– Glottis (vocal cord opening)
– Voiced / unvoiced / combined
– Constriction
– Formant (and antiformant)
– Vowel / consonant
– Prosodic features
M. Karjalainen5
Units and notation of speech – Phonetics
• Phonetics: study and description of spoken language
• Languages and language families
– Indo-European, Finno-Ugric, …
• Phonetic alphabet:
– IPA (International Phonetic Alphabet)
– Computerized: SAMPA, Worldbet, ...
• Units of spoken language:
– Phoneme (smallest linguistic unit), abstract unit class
– Allophone (variant of a phoneme)
– Phone (äänne in Finnish), a concrete unit of speech
– Diphone (from the middle of one phone, via the transition, to the middle of the next one)
– Triphone (similar combination of three successive phones)
– Speech segment (typically subunit of a phone)
M. Karjalainen6
Vowels (Finnish)
• Front–back (etisyys: etu–taka)
• Closed–open (suppeus: suppea–väljä)
• Unrounded–rounded (lavea–pyöreä)
M. Karjalainen7
Consonants (Finnish)
• Articulation place (ääntämispaikka):
– Labial, dental, palatal, velar, laryngeal
• Articulation manner (ääntämistapa)
– Stop consonant (klusiili), fricative (frikatiivi), nasal (nasaali), tremulant (tremulantti), lateral (lateraali), semivowel (puolivokaali)
M. Karjalainen8
Prosody (suprasegmental features)
• Intonation (intonaatio)
– Primarily by fundamental frequency trajectory
• Stress (paino)
– Primarily by intensity (loudness) of pronunciation
• Timing (ajoitus)
– Rhythmic pattern (primarily by segment durations)
M. Karjalainen9
Modeling of speech production
• Simplification of the speech production mechanism
– Acoustic model
M. Karjalainen10
Circuit model (transmission-line model)
• Glottal oscillator
– Varying cross-section between vocal cords
• Vocal tract as a transmission line
– Two-directional wave propagation
• Lip radiation (acoustic load)
• Variables: pressure and volume velocity
M. Karjalainen11
Signal model = Source-Filter model
• Source = excitation
– (a) voiced = quasiperiodic excitation
– (b) unvoiced = noise-like excitation
• Filter = vocal and nasal tract
M. Karjalainen12
Glottal oscillation
• Phonation = vibration of vocal folds
– Glottal opening is a function of time:
• Open phase, closed phase
• Glottal closure event generates the main excitation to the vocal tract
M. Karjalainen13
Formants (tract resonances)
• Example: resonances of a homogeneous tube
– Volume velocity transfer function
– 17 cm tube corresponds to typical male vocal tract
– quarter-wavelength resonator with resonances at odd multiples of c/(4L), i.e. about 500, 1500, 2500 Hz, ...
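The resonances of a uniform tube closed at one end (glottis) and open at the other (lips) follow f_n = (2n - 1) c / (4L). A small sketch, assuming c = 340 m/s. The function name and the shorter example tract length are illustrative only.

```python
def tube_resonances_hz(length_m, count=3, c=340.0):
    """Resonances of a uniform tube closed at one end: odd multiples of c / (4 L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]

neutral_male = tube_resonances_hz(0.17)    # 17 cm tract from the slide
shorter_tract = tube_resonances_hz(0.14)   # hypothetical shorter tract
```

A shorter tube scales all resonances upward by the same factor, which is one reason formant frequencies differ between speakers with different vocal tract lengths.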
M. Karjalainen14
Vocal tract transfer functions: vowel /i/
• Inhomogeneous vocal tract area profile /i/
– Constriction in frontal tract
– Cavity in the rear part of tract
– First formant down from neutral position
– Second formant up from neutral position
M. Karjalainen15
Radiation directivity of speech
• Omnidirectional at low frequencies
• Increased frontal directivity at high frequencies
Azimuth Elevation
M. Karjalainen16
Singing voice
• Classical singing style
– ’Singer’s formant’ around 3 kHz makes the voice more audible
– In soprano singing the high fundamental frequency or a harmonic component should match a formant
• Singing in popular music
– Style and way of voice production are free since amplification makes it loud anyway
– Personality of voice is important
M. Karjalainen17
Speech processing
• Speech analysis
– Feature analysis of speech signals
• Speech synthesis
– Typically synthesis from text
• Speech recognition
– From speech to text or commands
• Speech coding
– Compression for transmission or storage
• Speech enhancement
– Improving degraded speech signals
M. Karjalainen18
Formant synthesis models
• Cascaded and parallel filter models
M. Karjalainen19
Synthesis by waveform concatenation
• Overlap-add reconstruction of voiced speech
– Fundamental frequency (pitch) can be changed
M. Karjalainen20
Text-to-speech synthesis
• Transforming text to speech signal
– Language-dependent text processing
– Speech signal production quite language-independent
M. Karjalainen21
Text-to-speech synthesis
M. Karjalainen22
Speech coding
• Speech signal analysis
– Typically model-based (linear prediction) where source and
filter parameters are analyzed from speech signal
• Quantization of the parameters (bit compression)
• Transmission or storage of parametrized speech
• Reconstruction of parameters
• Reconstruction of speech signal
• Encoding -> transmission -> decoding
M. Karjalainen23
Speech recognition
• Feature analysis of signal
– Typically mel cepstral coefficients
– Compression of data & redundancy removal
• Pattern recognition
– Comparison to speech units
– Typically by Hidden Markov Models (HMM)
• Possible postprocessing
– Language modeling
• Formal grammar
• Unlimited text is difficult
M. Karjalainen24
Musical instrument sounds
• String instruments
– Plucked string instruments
– Struck string instruments
– Bowed string instrument
• Wind instruments
– Brass instruments
– Woodwind instruments
• Percussion instruments
– Drums etc.
M. Karjalainen25
Modeling of musical instruments (string modeling)
• String model
– Two-directional waveguide (transmission line)
– Excitation (pluck) inserted to both delay lines
– Wave reflections at terminations modeled as filters
– Output is taken at bridge or pickup, sum of both lines
– The same model is applicable to wind instrument bores (but there is a nonlinear oscillating feedback in them)
M. Karjalainen26
Simplified string modeling
• String model reduction (signal model)
– Two delay lines can be combined to one
– Filters in the loop can be combined to a single loop filter
– Computation is more efficient
– The so-called Karplus-Strong model is a simplified case where initial random noise is inserted in the delay line before synthesis and the loop filter is a simple two-tap FIR filter
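The Karplus-Strong case described above can be sketched in a few lines: a delay line filled with random noise, fed back through a two-tap averaging filter. The sample rate, the seed and the rounding of the delay length to an integer are simplifications chosen for this illustration.

```python
import random

def karplus_strong(f0_hz, fs_hz, n_samples, seed=0):
    """Plucked-string tone: noise burst recirculated through a two-tap averager."""
    rng = random.Random(seed)
    delay = max(2, round(fs_hz / f0_hz))          # delay-line length in samples
    # Initial excitation: random noise filling the delay line.
    out = [rng.uniform(-1.0, 1.0) for _ in range(delay + 1)]
    while len(out) < n_samples:
        # Loop filter: average of the two samples one period back.
        out.append(0.5 * (out[-delay] + out[-delay - 1]))
    return out[:n_samples]

tone = karplus_strong(220.0, 11000, 2200)
```

The averaging filter attenuates high harmonics on every trip around the loop, so the tone starts bright and noisy and decays toward a mellow, nearly periodic signal. The integer delay limits the tuning accuracy, which more complete string models address with fractional-delay filters.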
M. Karjalainen27
Impulse response of a simple string model
• Impulse and magnitude responses of the previous model
M. Karjalainen28
Body response modeling
• String instrument body works like an LTI system (filter)
Figure panels: impulse response; magnitude response (low frequencies)
M. Karjalainen1
Chapter 5: Structure and Function of Hearing
• Peripheral hearing
– External ear
– Middle ear
– Inner ear (cochlea)
• Basilar membrane
• Hair cells
• Auditory nerve
• Active cochlea and nonlinearities
• Higher levels of the auditory system
• Basic properties of human hearing
– Effective hearing area (level vs. frequency)
– Equal loudness curves
– Technical measures related to hearing
• Sound level and frequency weighting functions
M. Karjalainen2
Approaches to hearing research
• Anatomy of hearing
– The structure of hearing organs is studied
• Physiology of hearing
– The (physiological) responses of hearing to physical
sound stimuli are studied
• Psychology of hearing
– Functional properties of auditory perception are studied
as subjects’ reactions to physical sound stimuli
• The main interest here is ’Engineering psychoacoustics’ and
computational models of auditory functions
M. Karjalainen3
Peripheral hearing
• External ear (outer ear) Middle ear Inner ear
M. Karjalainen4
Schematic of peripheral hearing
• External ear (outer ear) Middle ear Inner ear
M. Karjalainen5
External ear and ear canal transmission
• Transfer functions
– Frontal sound source to the eardrum (solid line)
– Entrance of ear canal to the eardrum (dotted line)
• Head-related transfer functions (HRTFs) discussed later
M. Karjalainen6
Middle ear: Bone conduction
• Ossicles
– Malleus (hammer-shaped bone)
– Incus (anvil-shaped bone)
– Stapes (stirrup-shaped bone)
• Impedance match from air to liquid (1:3000)
M. Karjalainen7
Animations of middle ear function
Animations: University of Wisconsin http://www.neurophys.wisc.edu/~ychen/auditory/fs-auditory.html
M. Karjalainen8
Middle ear conduction and features
• Signal transfer function is a bandpass filter
• Other middle ear features:
– Acoustic reflex
– Eustachian tube
M. Karjalainen9
Inner ear: the cochlea
• Cochlea is a spiral-shaped, liquid-filled tube of about 2.7 turns, about 35 mm long
• Stapes vibration enters the cochlea through the oval window
• Another window to the middle ear is called the round window
• Basilar membrane divides the cochlea into two parts
Cochlea linearized
M. Karjalainen10
Cross-section of the cochlea
• Basilar membrane between bony shelves
– Division to scala vestibuli and scala tympani
• Reissner’s membrane separates scala media
• Organ of Corti: hair cells
• Tectorial membrane
M. Karjalainen11
Basilar membrane motion: traveling waves
• Basilar membrane is a nonhomogeneous transmission line:
– Wider and more massive towards apex
– Sound pressure entering the liquid of cochlea generates a traveling wave along the basilar membrane
– Traveling wave has its maximum vibration amplitude at a position depending on the frequency of the wave (characteristic frequency = C.F.)
– High frequencies resonate close to the oval window and low frequencies close to the helicotrema
M. Karjalainen12
Animation of basilar membrane motion
M. Karjalainen13
Basilar membrane response to a square-wave signal
• Time–position–amplitude pattern of basilar membrane movement as a response to square-wave signal
M. Karjalainen14
Hair cells
• Inner hair cells, in one row
• Outer hair cells, in 3-5 rows
• Together about 15000 – 16000 hair cells
• Each hair cell is equipped on top with a u-, v-, or w-shaped bundle of filaments called stereocilia
• Neural fibers are connected to hair cells
M. Karjalainen15
Hair cells in the organ of Corti
M. Karjalainen16
Stereocilia (= ’hair bundles’ of hair cells)
M. Karjalainen17
Movement of the organ of Corti
M. Karjalainen18
Movement and activation of hair cells
M. Karjalainen19
Hair cells: neural conduction
• Vibration of the basilar membrane causes bending of stereocilia and this opens ion channels which modulate the potential within the cell
• Activation of the cell releases neurotransmitter to synaptic junctions between hair cell and neural fibers of the auditory nerve
• A neural spike is generated that propagates in the auditory nerve fiber
• Next spike possible only after at least 1 ms
M. Karjalainen20
Activation and inhibition of hair cells
• Asymmetrical effect of stereocilia bending on firing rate
• Cochlear potentials
M. Karjalainen21
Phase-locking and synchrony of neural firing
• Statistically phase-locked within half cycle
• Statistical synchrony of neural firing
M. Karjalainen22
Passive vs. active cochlea
• Georg von Békésy found basilar membrane behavior by
experimentation with ears from dead animals
=> reduced frequency resolution
• Explanation: second filter needed
• Now it is known that the cochlea is active:
– Especially at low signal levels the outer hair cells amplify
basilar membrane motion
• Outer hair cells receive many efferent neural fibers from
higher neural levels
• Outer hair cells are able to change their length very
rapidly (in synchrony with high audio frequencies)
• Otoacoustic emission (cochlear echo) as a response to
external stimulus, recordable in the ear canal, is related to
this phenomenon
M. Karjalainen23
Auditory nerve responses: firing rate
• Steady-state firing rate is a saturating function with
spontaneous rate (= without sound excitation)
• There are fibers with different sensitivity (and
spontaneous rate)
M. Karjalainen24
Poststimulus time histogram (PST)
• Firing rate overshoot and undershoot with onset and
offset of excitation
– Works like automatic gain control
M. Karjalainen25
PST with steady-state sinusoidal excitation
• Statistically, half-wave rectification appears along with
automatic gain control
M. Karjalainen26
Firing rate saturation for a vowel excitation
• For increasing level of excitation, the firing rate profile
(’neural activation spectrum’) saturates
M. Karjalainen27
Tuning curves for constant firing level
• If the firing rate of a neural fiber is kept constant for varying
excitation frequency, a tuning curve is obtained
• This characterizes the frequency selectivity of cochlea
M. Karjalainen28
Effects of active cochlea
• Low-level signals are amplified substantially by the active cochlea:
– Sensitivity of hearing is increased
– Due to AGC-like compression, the narrow dynamic range (about 25 dB) of hair cells is expanded to more than 100 dB
• Selectivity (frequency resolution) is increased (especially at low signal levels) due to active function
• If outer hair cells are damaged, the active amplification is degraded or disappears
– Loss of auditory sensitivity
– Tuning curves are broadened
– Otoacoustic emissions disappear
M. Karjalainen29
Cochlear nonlinearity: Two-tone suppression
• Addition of another tone (shaded area in figure below)
suppresses the activation due to probe tone at its characteristic
frequency (= kind of masking)
M. Karjalainen30
Cochlear nonlinearity: Combination tones
• Nonlinear interaction of two tones generates
new tones that are perceived:
– Difference tone: fdiff = f2 – f1
• E.g.: 1.1 kHz and 1.0 kHz => 100 Hz
– Cubic difference tone: fcubic = 2f1 – f2
• E.g.: 1.0 kHz and 1.1 kHz => 900 Hz
• Appears already at low level of excitation
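The two examples above can be reproduced with a few lines of arithmetic. The function name and the returned dictionary keys are illustrative only.

```python
def combination_tones_hz(f1, f2):
    """Difference tone and cubic difference tone for two primaries (any order)."""
    lower, upper = sorted((f1, f2))
    return {
        "difference": upper - lower,            # f2 - f1
        "cubic_difference": 2 * lower - upper,  # 2 f1 - f2
    }

tones = combination_tones_hz(1000, 1100)
```

For these primaries the cubic difference tone lies just below the lower primary, while the simple difference tone lies far below both.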
M. Karjalainen31
Central auditory system
• Higher-level functions not known well.
• Cochlear nucleus has specific cells such as ’chopper cells’ that do temporal processing. Spectral information is recovered unsaturated.
• Binaural hearing starts at superior olive level.
• Auditory cortex is the center for processing perceptions and integrating the sound scene.
• Interaction with other senses (vision) strong.
M. Karjalainen32
Dynamic range of hearing
Sound level ’thermometer’
6 dB steps, 3 dB steps, 1 dB steps
M. Karjalainen33
Equal loudness curves and threshold of hearing
• Equal loudness level perception, unit phon = SPL at 1 kHz
M. Karjalainen34
Sound level and frequency weighting curves
• Weighting filters for sound level measurement (A most common)
M. Karjalainen35
Recommended frequencies and bands
• Recommended frequencies and frequency bands for measurements and technical applications:
• Octave = 2:1
• 1/2 octave
• 1/3 octave
M. Karjalainen36
Filtered noise demo
• White noise
• Low-pass filtered noise, decreasing cutoff frequency
• High-pass filtered noise, increasing cutoff frequency
• 1/3 octave noise, increasing center frequency
• White and pink noise
M. Karjalainen1
Chapter 6: Fundamentals of Psychoacoustics
• Psychoacoustics = auditory psychophysics
• Sound events vs. auditory events
– Sound stimuli types, psychophysical experiments
– Psychophysical functions
• Basic phenomena and concepts
– Masking effect
• Spectral masking, temporal masking
– Pitch perception and pitch scales
• Different pitch phenomena and scales
– Loudness formation
• Static and dynamic loudness
– Timbre
• as a multidimensional perceptual attribute
– Subjective duration of sound
M. Karjalainen2
Psychophysical experimentation
• Sound events (s_i) = physical (objective) events
• Auditory events (h_i) = subject’s internal events
– Need to be studied indirectly from reactions (b_i)
• Psychophysical function h = f(s)
• Reaction function b = f(h)
M. Karjalainen3
Sound events: Stimulus signals
• Elementary sounds
– Sinusoidal tones
– Amplitude- and frequency-modulated tones
– Sinusoidal bursts
– Sine-wave sweeps, chirps, and warble tones
– Single impulses and pulses, pulse trains
– Noise (white, pink, uniform masking noise)
– Modulated noise, noise bursts
– Tone combinations (consisting of partials)
• Complex sounds
– Combination tones, noise, and pulses
– Speech sounds (natural, synthetic)
– Musical sounds (natural, synthetic)
– Reverberant sounds
– Environmental sounds (nature, man-made noise)
M. Karjalainen4
Sound generation and experiment environment
• Reproduction techniques
– Natural acoustic sounds (repeatability problems)
– Loudspeaker reproduction
– Headphone reproduction
• Reproduction environment
– Not critical in headphone reproduction
– Anechoic chamber (free field)
• Room effects minimized
• Not a natural environment
– Listening room
• Carefully designed, relatively normal acoustics
– Reverberation chamber
• Special experiments with diffuse sound field
M. Karjalainen5
Psychophysical functions
• Sound event property to auditory event property mapping
h = a log(s) (Weber, Weber-Fechner law)
h = c s^k (power law; e.g., loudness)
M. Karjalainen6
Experimental concepts: Thresholds
• Threshold values
– Absolute thresholds (e.g., threshold of hearing)
– Difference thresholds (just noticeable difference, JND)
Example: Threshold of perception:
- 50%, 75%, etc. thresholds
M. Karjalainen7
Experimental concepts
• Comparison of percepts
– Magnitude estimation
– Magnitude production
• Probe tone method
– Generation of a probe tone to make test tone audible/noticeable
– Modulation, canceling, interference
• Classification and scaling of percepts
– Nominal scale (rough, sharp, reverberant, …)
– Ordinal scale (percepts have ordering)
– Interval scale (numeric scale, no zero point defined)
– Ratio scale (numeric scale, zero point defined)
• Multidimensional scaling
– Semantic differentials: low – high, dull – sharp, ...
M. Karjalainen8
Psychoacoustic experiments
• Description of auditory events
– Oral or written description
• Method of adjustment
– Adjusting a stimulus to correspond to a reference
• Selection methods
– Forced choice methods (select one!):
• Two alternative forced choice (TAFC, 2AFC)
• Method of tracking
– Tracking with varying stimulus
• Békésy audiometry
• Bracketing method
– Descending and ascending bracketing
• Yes/no answering
• Reaction time measurement
– Indicates the difficulty of decision task
M. Karjalainen9
Békésy audiometry
• Slow frequency sweep and level tracking
M. Karjalainen10
Typical psychoacoustical test types
• AB test
– Set in preference order / select one
– AB hidden reference (one must be recognized)
• AB scale test
– As AB but assign numeric values for A and B
• ABC test
– A is fixed reference (anchor point) for assigning values for B and C
• ABX test
– Which one, A or B, is equal to X?
• TAFC (2AFC)
– Two alternative forced choice
• Formation of a listening test panel
• Formation of a description language
M. Karjalainen11
Masking effect
• ”A loud sound makes a weaker sound imperceptible”
• Categories and aspects of masking
– Frequency masking
– Temporal masking
– Time-frequency masking
– Frequency selectivity of the auditory system
– Psychophysical tuning curves
– Critical band
• Bark bandwidth
• ERB bandwidth
• Masking tone and test tone
M. Karjalainen12
Frequency masking
• Masking by white noise
M. Karjalainen13
Frequency masking
• Masking by narrow-band noise (0.25, 1, 4 kHz)
M. Karjalainen14
Frequency masking
• Frequency masking as a function of masker level
M. Karjalainen15
Frequency masking
• Frequency masking by lowpass and highpass noise
M. Karjalainen16
Frequency masking
• Frequency masking by 1 kHz sinusoidal signal
M. Karjalainen17
Frequency masking
• Frequency masking by a complex tone (harmonic complex)
M. Karjalainen18
Temporal masking
• Masking before and after a noise signal
M. Karjalainen19
Temporal masking
• Beginning of postmasking
M. Karjalainen20
Temporal masking
• Postmasking as a function of time
– For 200 ms long masker
– For 5 ms long masker
M. Karjalainen21
Time-frequency masking
• Masking of a tone burst in time and frequency by a time-frequency block of noise
M. Karjalainen22
Temporal masking
• Masking due to an impulse train
M. Karjalainen23
Frequency selectivity of hearing
• Masking curves tell much about auditory selectivity
• Psychophysical tuning curves match with physiological curves
M. Karjalainen24
Critical band experiment
• Experiment: loudness vs. bandwidth of noise
M. Karjalainen25
Critical band
• Loudness vs. bandwidth of noise
– Loudness increases when bandwidth exceeds a critical band
M. Karjalainen26
Critical band (Bark band) vs. frequency
• Critical band (Bark band) fG vs. mid frequency
• Ref: just noticeable tone frequency change vs. frequency
M. Karjalainen27
Critical band: 24 Bark bands (Zwicker)
M. Karjalainen28
ERB band experiment
• ERB = Equivalent Rectangular Bandwidth
• Loudness of a tone is measured as a function of frequency gap in masking noise around the test tone
• ERB band is narrower than Bark band, especially at low frequencies
M. Karjalainen29
Pitch scales
• Pitch = subjective measure of tone height
• Mel scale
• Bark scale
• ERB scale
or
or
Inverse function:
Inverse :
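The formulas on this slide did not survive in the transcript. As a stand-in, the sketch below uses widely quoted approximations, which may differ from the ones on the original slide: the 2595 log10(1 + f/700) mel formula, Traunmüller's Bark approximation with its inverse, and the Glasberg and Moore ERB-rate formula with its inverse. All function names are illustrative.

```python
import math

def hz_to_mel(f):
    """Mel scale, common log-based approximation (f in Hz)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark scale, Traunmueller's approximation (f in Hz)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def hz_to_erb_rate(f):
    """ERB-rate scale, Glasberg and Moore approximation (f in Hz)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_bandwidth_hz(f):
    """Equivalent rectangular bandwidth at centre frequency f (Hz)."""
    return 24.7 * (0.00437 * f + 1.0)

for f in (100.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel(f)), round(hz_to_bark(f), 2), round(hz_to_erb_rate(f), 2))
```

All three scales are roughly linear in frequency at the low end and roughly logarithmic at the high end. They differ mainly in where the transition occurs.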
M. Karjalainen30
Logarithmic pitch scale
• Logarithmic scale used in music and audio
• Frequency ratios more important than absolute frequencies
• Octave and ratios of small integers important
M. Karjalainen31
Comparison of pitch scales
• Pitch scales are related to place coding on the basilar
membrane, although they are measured by psychoacoustic
experiments
M. Karjalainen32
Comparison of pitch scales
• Comparison (log reference) of:
– logarithmic scale
– ERB scale
– Bark scale
– linear scale
M. Karjalainen33
Comparison of pitch scales
• Comparison (linear reference) of:
– logarithmic scale
– ERB scale
– Bark scale
– linear scale
M. Karjalainen34
Pitch
• Continues in file KA6b
M. Karjalainen1
Pitch phenomena
Cont’d from file 6a
• Pitch of a pure tone as a function of amplitude
– Individually varying property
M. Karjalainen2
JND of frequency modulation
• Frequency modulation JND threshold
– As a function of carrier frequency
– As a function of modulation frequency
– About 4 Hz modulation most easily perceivable
M. Karjalainen3
Minimum duration of a tone for pitch percept
• Duration to make pitch perceivable
– Duration in milliseconds
– Duration of two cycles as a reference
M. Karjalainen4
JND pitch change vs. tone duration
• Threshold of perceived pitch variation increases below 200 ms duration
M. Karjalainen5
Pitch strength
• How strong or weak is a pitch percept?
M. Karjalainen6
Pitch phenomena and theories
• Place (spectral) pitch vs. temporal pitch theories
• Spectral pitch (due to spectral peak)
• Temporal pitch (periodicity)
• Missing fundamental
• Virtual pitch
• Repetition pitch
• Pitch of inharmonic signals
• Absolute pitch (memory)
M. Karjalainen7
Loudness
• Loudness is the perceived subjective ’strength’ (’volume’, ’intensity’, etc.) of a sound
– Subjective scale defined in relation to physical scale
– Unit is sone: 1 sone = loudness of a 1 kHz tone at 40 dB SPL
M. Karjalainen8
Loudness of a sinusoidal tone
• Loudness N vs. SPL of a 1 kHz tone
– Power law found to match best
Power law:
More precisely:
Loudness vs. loudness level:
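The equations for this slide are missing from the transcript. A commonly used relation between loudness N (sone) and loudness level L_N (phon), valid roughly above 40 phon, is N = 2^((L_N - 40) / 10): every 10 phon doubles the loudness. The sketch below assumes this relation. It is not necessarily the exact form shown on the slide.

```python
import math

def phon_to_sone(loudness_level_phon):
    """Loudness in sone from loudness level in phon (rule of thumb above ~40 phon)."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)

def sone_to_phon(loudness_sone):
    """Inverse mapping: loudness level in phon from loudness in sone."""
    return 40.0 + 10.0 * math.log2(loudness_sone)

doubling_steps = [phon_to_sone(level) for level in (40, 50, 60, 70)]
```

Since the phon value equals the SPL of an equally loud 1 kHz tone, this also means that a 1 kHz tone must be raised by about 10 dB to sound twice as loud.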
M. Karjalainen9
Partial loudness (by noise masking)
• Partial loudness of 1 kHz tone in presence of masking noise
– As a function of tone level and masking noise level
M. Karjalainen10
Loudness example: two tones
• Loudness of a pair of tones as a function of frequency difference
– Slow beat range: loudness due to peaks (6 dB over 60 dB)
– Medium rate fluctuation: power doubled => 3 dB increase
– Fast fluctuation: wideband signal => loudness doubled (10 dB)
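The 6 dB and 3 dB figures for the first two cases follow from how two equal-level tones add. A short sketch of both summation rules, with illustrative function names:

```python
import math

def peak_sum_db(levels_db):
    """Level when amplitudes add in phase (beat maxima): sum of pressures."""
    return 20.0 * math.log10(sum(10.0 ** (level / 20.0) for level in levels_db))

def power_sum_db(levels_db):
    """Level when powers add (no stable phase relation): sum of intensities."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

slow_beats = peak_sum_db([60.0, 60.0])      # peaks of the beating envelope
medium_rate = power_sum_db([60.0, 60.0])    # average power doubled
```

The third case, where loudness itself doubles, cannot be derived from level arithmetic alone, because it depends on the tones falling into different critical bands.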
M. Karjalainen11
Loudness computation (Zwicker formulation)
• Excitation signal => power spectral density on the Bark scale
• Spreading function B(z), such as
• Convolution by spreading function
• Loudness density
• Total loudness
M. Karjalainen12
Loudness computation, examples
• Left: excitation level for sinusoidal tone and white noise
• Right: loudness density for sinusoidal and white noise
M. Karjalainen13
Loudness graphically
• Graphical chart determination of loudness (Zwicker)
M. Karjalainen14
JND of loudness level
• Just noticeable difference by amplitude modulation
– Modulation of 1 kHz tone
– Modulation of white noise
– Modulation frequency 4 Hz
M. Karjalainen15
JND of loudness level
• Just noticeable difference by amplitude modulation
– As a function of modulation frequency
– Modulation of 1 kHz tone
– Modulation of white noise
M. Karjalainen16
Modulation detection
• Detection of amplitude and frequency modulation
– Amplitude modulation easily detectable by ’off-band listening’ (loudness modulated due to upper spreading slope variation)
– No slope variation in frequency modulation
M. Karjalainen17
Loudness vs. duration
• Temporal integration of loudness for duration < 200 ms
– Loudness level decreases 10 phon for a 10-fold decrease in duration
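The rule of thumb above can be written as a small function. This is a sketch that assumes a purely logarithmic dependence below the 200 ms limit; the function name and the example levels are illustrative.

```python
import math

def loudness_level_for_duration(steady_level_phon, duration_ms, limit_ms=200.0):
    """Loudness level of a short tone burst, dropping 10 phon per 10-fold
    decrease in duration below the integration limit."""
    if duration_ms >= limit_ms:
        return steady_level_phon
    return steady_level_phon - 10.0 * math.log10(limit_ms / duration_ms)

for duration in (200.0, 20.0, 2.0):
    print(duration, "ms ->", loudness_level_for_duration(60.0, duration), "phon")
```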
M. Karjalainen18
Loudness formation temporally
• Loudness formation for different durations of a tone burst
– Peak value of total loudness is tracked in time-varying cases
M. Karjalainen19
Timbre (perceived ’sound color’)
• Timbre is a multidimensional attribute of sound
– For stationary sounds:
• Spectrum: (loudness spectrum)
• Periodicity (periodic, multiperiodic, noise-like)
• Repetitiveness (reflections, reverberation, spatialness)
– For time-varying signals
• Amplitude envelope important
– Amplitude envelope at each critical band
– For transients and onsets
• Changes are more prominent than steady-state parts, especially onsets
M. Karjalainen20
Subjective duration
• Subjective vs. objective duration
M. Karjalainen21
Auditory Demonstrations 1
1 Cancelled harmonics
2-6 Critical bands by masking
7 C.B. by loudness comparison
8-11 The decibel scale
12-16 Filtered noise
17-18 Frequency response of the ear
19-20 Loudness scaling
21 Temporal integration
22 Asymmetry of masking by pulsed tones
23-25 Backward and forward masking
26 Pulsation threshold
M. Karjalainen22
Auditory Demonstrations 2
27-28 Dependence of pitch on intensity
29 Pitch salience and tone duration
30 Influence of masking noise on pitch
31 Octave matching
32 Stretched and compressed scales
33 Frequency difference limen
34-35 Log and lin frequency scales
36 Pitch streaming
37 Virtual pitch (missing fundamental)
38-39 Shift of virtual pitch
40-42 Masking spectral and virtual pitch
M. Karjalainen23
Auditory Demonstrations 3
43-45 Virtual pitch with random harmonics
46-47 Strike note of chime
48 Analytic vs synthetic pitch
49-51 Scales with repetition pitch
52 Circularity in pitch judgment
53 Effect of spectrum on timbre
54-56 Effect of tone envelope on timbre
57 Change in timbre with transposition
58-61 Tones and tuning with stretched partials
62-63 Primary and secondary beats
M. Karjalainen1
Chapter 7: Other psychoacoustic concepts
• Sharpness
– Spectral center of gravity
• Fluctuation strength
– Perception of slow modulations (beats)
• Impulsiveness
• Roughness
– Perception of fast modulations
• Tonality
– Periodic vs. random excitation
• Sensory pleasantness
• Psychoacoustic concepts and music
– Sensory consonance and dissonance
– Intervals, scales, and tunings
– Rhythm, tempo, bar, measure
• Perceptual organization of sound
M. Karjalainen2
Sharpness
• Perceived sharpness is proportional to spectral center of gravity
• Unit of sharpness is 1 acum, defined for narrowband noise of 1 Bark width at 1 kHz and 60 dB
• Sharpness for 1 Bark wide noise, lowpass noise, and highpass noise
• Increase of level from 30 dB to 90 dB doubles the sharpness
Bandpass noises:
M. Karjalainen3
Computation of sharpness
• Sharpness can be estimated (without level effect) from
where the weighting function is defined by the curve:
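A hedged sketch of a sharpness estimate, assuming the widely quoted Zwicker-type form S = 0.11 · Σ N'(z) g(z) z / Σ N'(z) with a weighting g(z) that equals 1 up to 16 Bark and rises exponentially above it. The constants and names are assumptions and may not match the curve on the slide.

```python
import math

def weight(z_bark):
    """Assumed weighting g(z): flat up to 16 Bark, rising exponentially above."""
    if z_bark <= 16.0:
        return 1.0
    return 0.066 * math.exp(0.171 * z_bark)

def sharpness_acum(z_values, loudness_density):
    """Weighted first moment of the loudness density over the Bark scale."""
    numerator = sum(n * weight(z) * z for z, n in zip(z_values, loudness_density))
    denominator = sum(loudness_density)
    return 0.11 * numerator / denominator

# Hypothetical input: all loudness concentrated near 1 kHz (about 8.5 Bark).
print(sharpness_acum([8.5], [1.0]))
```

With all loudness near 8.5 Bark the estimate comes out close to 1 acum, in line with the unit definition. Because the density is normalized by its own sum, the overall level drops out.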
M. Karjalainen4
Fluctuation strength
• Perception of relatively slow modulations: fluctuation strength
• Highest sensitivity to modulation at 4 Hz
• Unit of fluctuation strength is 1 vacil, corresponding to a 1 kHz, 60 dB tone with 100 % amplitude modulation at 4 Hz
• Figure: (a) AM broadband noise, (b) AM sinusoidal tone, (c) FM sinusoidal tone
Examples: modulation frequencies 1 Hz, 4 Hz, 16 Hz
M. Karjalainen5
Fluctuation strength
• Left: fluctuation strength for AM (4 Hz) wideband noise (60 dB)
• Right: sine tone, 1.5 kHz, 70 dB, modulated at 4 Hz, as a function of FM deviation
M. Karjalainen6
Fluctuation strength
• Fluctuation strength computation:
M. Karjalainen7
Impulsiveness
• There is no clearly defined psychoacoustic concept of impulsiveness
• Impulsiveness is related to rapid onsets in signal
• If the repetition rate of impulses is > 10–15 Hz, roughness is perceived
• In noise control, impulsiveness is considered to increase hearing
damage risk compared to non-impulsive sound of same energy
M. Karjalainen8
Roughness
• Fast (> 15 Hz) modulation is perceived as roughness
• Addition of two tones of different frequencies creates envelope fluctuation
• When the frequency difference increases, tones start to segregate
• When the frequency difference is larger than a critical band, roughness disappears
Examples: 1 kHz + Δf, with Δf = 7 Hz, 70 Hz, 300 Hz
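The three regimes for a pair of tones can be illustrated with a small classifier. This sketch assumes a 15 Hz boundary between fluctuation and roughness and uses the Zwicker-Terhardt approximation for the critical bandwidth. Both are assumptions, and the real perceptual transitions are gradual rather than sharp.

```python
def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth (assumed Zwicker-Terhardt formula)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def two_tone_percept(f1_hz, f2_hz, rough_onset_hz=15.0):
    """Rough classification of the percept from the frequency difference alone."""
    delta = abs(f1_hz - f2_hz)
    centre = 0.5 * (f1_hz + f2_hz)
    if delta < rough_onset_hz:
        return "fluctuation (beats)"
    if delta < critical_bandwidth_hz(centre):
        return "roughness"
    return "two separate tones"

for delta_f in (7.0, 70.0, 300.0):
    print(delta_f, "Hz:", two_tone_percept(1000.0, 1000.0 + delta_f))
```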
M. Karjalainen9
Roughness
• Unit of roughness is 1 asper, corresponding to a 1 kHz, 60 dB tone with 100 % amplitude modulation at 70 Hz.
• Towards lower and higher modulation frequencies the roughness decreases
M. Karjalainen10
Roughness
• Roughness for different carrier frequencies as a function of AM
modulation frequency with 100 % modulation.
M. Karjalainen11
Tonality
• Tonality (tonalness) = sound exhibits voiced component(s), periodicity
• Non-tonal sound is noise-like, non-periodic
• Non-tonal (noisy) signal masks a tonal one more easily than vice versa
• For a given tonality index and critical band index i, the masking threshold is:
– (tonality index 0.0: non-tonal, 0.5: half-tonal, 1: fully tonal)
• Tonality with varying modal density, log. distribution of frequencies (approx. number per critical band): 80/CB, 40/CB, 20/CB, 10/CB
M. Karjalainen12
Sensory pleasantness
• Sensory pleasantness (example by Zwicker):
– P = sensory pleasantness
– S = sharpness
– R = roughness
– T = tonality
– N = loudness
– Product sound quality measures are often constructed by similar techniques.
M. Karjalainen13
Sensory consonance and dissonance
• Consonance and dissonance are closely related to roughness
• Consonance vs. dissonance of two partials:
M. Karjalainen14
Consonance and dissonance of harmonic tones
• Roughness due to interaction of partials in a sound contributes to dissonance
• Ratios of small integers are most consonant (just intonation)
• Consonance vs. dissonance of two harmonic complexes:
M. Karjalainen15
Examples of intervals
• Pythagoras noticed that intervals 2:1, 3:2, and 4:3 sound ”pleasant”
• Consonant intervals (decreasing order of consonance):
– 2:1 octave
– 3:2 perfect fifth
– 4:3 perfect fourth
– 5:3 major sixth
– 5:4 major third
– 8:5 minor sixth
– 6:5 minor third
– 16/15 (dissonant)
– 40/27 (dissonant)
• Equally tempered intervals: fifth 1.4983, third 1.2599
M. Karjalainen16
Examples of intervals
• Log and lin uniformly spaced scales
• Which one is the best octave ?
• Stretched and compressed scales
Octave and its partitioning
Circularity of pitch
• Shepard effect
M. Karjalainen17
Intervals, scales, tuning
• Just intonation, Pythagorean scale, (equally) tempered scale
• On a tempered scale a semitone is 1:1.05946
• 1 cent is 1/100 of a semitone
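The numbers on these slides can be checked directly. The sketch below computes the tempered semitone, the tempered fifth and major third, and how far they lie from the just ratios 3:2 and 5:4 in cents. The helper names are illustrative.

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)  # equal-tempered semitone ratio

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def tempered_ratio(semitones):
    """Frequency ratio of an equal-tempered interval of the given size."""
    return 2.0 ** (semitones / 12.0)

print(round(SEMITONE, 5))
print(round(tempered_ratio(7), 4), round(tempered_ratio(4), 4))
# Deviation of the tempered intervals from just intonation, in cents.
print(round(cents(tempered_ratio(7)) - cents(3.0 / 2.0), 2))
print(round(cents(tempered_ratio(4)) - cents(5.0 / 4.0), 2))
```

The tempered fifth is about 2 cents narrower than 3:2, while the tempered major third is nearly 14 cents wider than 5:4.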
M. Karjalainen18
Non-western scales and tunings
• The (tempered) western scale is adapted to a multitude of
harmonic timbres of western instruments
• For example the Balinese gamelan music is quite different
– W. A. Sethares: Tuning, Timbre, Spectrum, Scale. Springer 1998
• Example of tuning where octave is a very dissonant interval!
• Tunings and musical scales are strongly bound with spectral
properties of musical instruments
M. Karjalainen19
Temporal structures in music: Rhythm, tempo
• Rhythm: periodicity and repeated structure in music
• Tempo: rate of main events in music
• Beat: positioning of emphasis on some events
• Measure: basic rhythmic sequence
• Duration of a note or another basic unit
M. Karjalainen20
Perception of magnitude and phase spectrum
• Magnitude
– 1 dB deviation per critical band noticeable in direct comparison.
Even smaller deviations can be noticed by trained ”golden ears”
– Even ± 3...5 dB deviations are not easy to ”perceive” when there is
no immediate reference (except for well trained listeners)
– Magnitude response deviations = spectral coloration
• Phase and time differences
– The auditory system is relatively insensitive to phase (Helmholtz). In general the magnitude spectrum is more important than the phase spectrum, but sometimes phase is important
– Phase functions from Fourier analysis are circular and difficult to
analyze and interpret
– Group delay (phase derivative) is a relatively good perceptual
measure which describes the delay of modulation (not the carrier)
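Group delay is the negative derivative of phase with respect to angular frequency. For an FIR impulse response h[n] it can be evaluated without phase unwrapping as the real part of DTFT{n·h[n]} / DTFT{h[n]}. The sketch below uses that identity; the function name and the example filters are illustrative.

```python
import cmath

def group_delay_samples(h, omega):
    """Group delay (in samples) of an FIR filter h at normalized angular frequency omega.

    Uses Re{ DTFT(n*h[n]) / DTFT(h[n]) }, which avoids unwrapping the circular phase.
    Undefined where the frequency response is exactly zero.
    """
    numerator = sum(n * c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))
    denominator = sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))
    return (numerator / denominator).real

print(group_delay_samples([0.0, 0.0, 0.0, 1.0], 1.0))  # pure delay of three samples
print(group_delay_samples([1.0, 2.0, 1.0], 1.0))       # symmetric (linear-phase) filter
```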
M. Karjalainen21
Perception of phase: extreme cases
• Special phase effects:
– The following two signals have the same magnitude spectrum but sound (as well as look) different
– This is how the response looks in a single critical band
M. Karjalainen22
Perceptual organization of sound
• Streaming (sequential grouping) of pitch sequences:
– Slow repetition: one stream perceived
– Fast repetition: segregation into two separate streams
Figure: tone sequence A–F plotted against time; (a) one stream, (b) two streams (A-C-E and B-D-F)
M. Karjalainen23
Perceptual organization of sound
• Streaming may change also the perceived rhythm:
– Large separation: B-D-F vs. A-C-E
– Small separation: B-D vs. A-C-E-F
Figure: two panels plotted against time, each showing the tones A–F grouped into an upper stream and a lower stream
M. Karjalainen24
Perceptual organization of sound
• Streaming with increasing tempo
Figure: with increasing tempo or frequency difference the sequence segregates into multiple streams and finally merges into a timbre/texture (horizontal axis: time)
M. Karjalainen25
Perceptual organization of sound
• Streaming or segregation as a function of frequency difference and repetition period
Figure: frequency difference (vertical axis, 0–20) vs. repetition period (msec, 0–500); regions: always coherent, separated or coherent, always separated
M. Karjalainen26
Auditory scene analysis
• Auditory scene analysis
– Bregman: Auditory scene analysis (MIT Press, 1990)
• Sequential integration and segregation
– Spectral vs. temporal relations
– Spatial cues in segregation
• Integration and segregation of simultaneous auditory components
– Spectral vs. temporal relations
– The ”old-plus-new” heuristics
– Spatial cues in segregation
• Primitive auditory organization
– Built-in and low-level mechanisms
• Schema-based auditory organization
– Learning of stream integration and segregation
M. Karjalainen27
Computational auditory scene analysis (CASA)
• Computational auditory scene analysis (CASA) is an attempt to
computationally simulate and model human auditory scene analysis
– Sound source segregation (separation)
– Multipitch signal analysis of harmonic sound mixtures
– Bottom-up vs. top-down driven processing
– Prediction-driven processing
– Spatial source separation (cocktail-party effect)
– Applications:
• Audio content analysis and content-based coding
• Automatic music transcription
• Speech recognition
Spatial hearing
Ville Pulkki, Laboratory of Acoustics and Audio Signal Processing
Helsinki University of Technology, Espoo, Finland
http://www.acoustics.hut.fi/
Ville [email protected]
TKK, Laboratory of Acoustics and Audio Signal Processing, 26.3.2002
Sound in space
Spatial hearing
Directional hearing
• Accuracy of directional hearing
• Theory of directional hearing
Distance hearing
Perception of space
Spatial sound reproduction
Transfer function from the sound source to the ear canal
Head Related Impulse Response (HRIR)
Head Related Transfer Function (HRTF)
© Duda: http://interface.cipic.ucdavis.edu/CIL tutorial/
Measuring HRTFs
© Algazi et al.: http://interface.cipic.ucdavis.edu/
Dependence of the HRTF on the direction of the sound source
Figure: head-related impulse responses (time axis 0–2 ms, amplitude −0.2...0.2) for the left and right ear; panels a) to c) for the source directions ϕ = 0°, δ = 0°; ϕ = 60°, δ = 0°; and ϕ = 0°, δ = 60°
© M. Karjalainen
Dependence of the HRTF on the azimuth of the sound source
© Algazi et al.: http://interface.cipic.ucdavis.edu/
Dependence of the HRTF on the elevation of the sound source
© Algazi et al.: http://interface.cipic.ucdavis.edu/
Dependence of the HRTF on the direction of the sound source
Figure: HRTF magnitude responses (frequency axis 100 Hz to 10 kHz, level −40...0 dB) for the left and right ear; panels a) to c) for the source directions ϕ = 0°, δ = 0°; ϕ = 60°, δ = 0°; and ϕ = 0°, δ = 60°
© M. Karjalainen
Accuracy of directional hearing in the horizontal plane
Figure: direction of the sound event vs. direction of the auditory event (azimuth ϕ)
• Sound event at 0°: auditory event at 359° ± 3.6°
• Sound event at 90°: auditory event at 80.7° ± 9.2°
• Sound event at 180°: auditory event at 179.3° ± 5.5°
• Sound event at 270°: auditory event at 281.6° ± 10°
© M. Karjalainen
Accuracy of directional hearing in the median plane
Figure: direction of the sound event vs. direction of the auditory event for elevations δ = 0°, 36°, and 90°, in front (ϕ = 0°) and behind (ϕ = 180°); marked auditory event directions +68°, +74°, +27°, +30°, with localization blur ±22°, ±13°, ±9°, ±15°, ±10°
© M. Karjalainen
Lateralization experiments
Figure: headphone test setups a) and b); the signal is fed to the two ears through delay circuits (τph1, τph2) and attenuators (a1, a2)
© M. Karjalainen
Lateralization experiments, time delay
Figure: perceived lateral position (scale 6 left ... 0 ... 6 right) as a function of the interaural phase delay τph / μs (left earlier ... left later)
© M. Karjalainen
Lateralization experiments, properties
Advantages:
• Any ITD-ILD combination can be produced freely
• Basic results
Problems:
• Unnaturalness
• Inside-the-head localization
• High-frequency reproduction differs between listening sessions
Directional hearing
Cues:
• Binaural cues
– Interaural time difference
– Interaural level difference
• Monaural spectrum
• Effect of head turning on binaural cues
• Suppression of reflections
Binaural cues
• Interaural Time Difference
• ITD
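To give a feel for the size of the ITD, the sketch below uses the classical spherical-head (Woodworth) approximation ITD = (a/c)(θ + sin θ). This formula is not taken from the slides. The head radius of 8.75 cm and the speed of sound of 343 m/s are assumed typical values.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds for a distant source at 0...90 degrees azimuth,
    assuming a rigid spherical head (Woodworth model)."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    print(azimuth, "deg ->", round(woodworth_itd(azimuth) * 1e6), "microseconds")
```

Under these assumptions the largest ITD, for a source directly to the side, is somewhat under 0.7 ms.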
Frequency dependence of ITD
Figure: left-ear and right-ear waveforms with the ITD marked
• Low frequencies (~200 to ~1600 Hz): time delay of the carrier
• High frequencies (> ~1600 Hz): time delay of the envelope
Modeling of ITD
Figure: coincidence network of delay elements τ, fed from the left ear and from the right ear; outputs correspond to left, centre, and right
© M. Karjalainen
Modeling of ITD
Figure: block diagram; for each ear a gammatone filterbank (GTFB), half-wave rectification, and low-pass filtering, followed by the interaural cross-correlation (IACC) per band, giving the ITD and a composite IACC spectrum
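The cross-correlation idea behind these models can be sketched on a single broadband channel: the ITD estimate is the lag at which the interaural cross-correlation peaks. The filterbank, rectification, and low-pass stages of the block diagram are omitted here, and the test signal is hypothetical.

```python
import math
import random

def estimate_itd(left, right, fs_hz, max_itd_s=1.0e-3):
    """Return the lag (in seconds) that maximizes the interaural cross-correlation.

    A positive value means the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(max_itd_s * fs_hz))
    best_lag, best_value = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for n, sample in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                value += sample * right[m]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag / fs_hz

# Hypothetical test signal: noise that reaches the right ear 12 samples later.
fs = 48000
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
delay = 12
left_ear = source
right_ear = [0.0] * delay + source[:-delay]
print(estimate_itd(left_ear, right_ear, fs))
```

Restricting the search to about ±1 ms reflects the physically possible range of ITDs for a human head.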
Cross-correlation in ERB channels
Figure: band cross-correlation functions for ERB channels between about 200 Hz and 10 kHz, for source directions from 0° to 90° on both sides
Frequency dependence of ITD
Figure: ITD [ms] (−1...1) as a function of direction [degree] (−90...90) and frequency [kHz] (0.2–18.2)
Binaural cues
• Interaural Level Difference
• ILD
Modeling of ILD
Figure: block diagram; for each ear a gammatone filterbank (GTFB) and loudness level (LL) computation per band, giving a composite loudness level (CLL) spectrum; the ILD spectrum is the difference between the ears
Frequency dependence of ILD
Figure: ILD [phon] (−60...60) as a function of direction [degree] (−90...90) and frequency [kHz] (0.2–18.2)
Cone of confusion
Figure: cone of confusion and sound source, with cone angles ccφ and ccθ
• ITD and ILD determine on which cone of confusion the sound source lies
– effect of the pinna and the body
– head turning
Effect of head turning on binaural cues
Figure: head rotation; depending on the source position, ITD & ILD change a lot, stay constant, or change a lot in the opposite direction
- a coarse cue
Effect of the body
Pinna, head, torso
The spectrum changes, the ILD changes
Effect of the pinna
• The cavities of the pinna colour the sound depending on its direction of arrival
Effect of elevation on the spectrum
Figure (panels 1 and 2): loudness level spectrum [phon] (−30...30) as a function of frequency [kHz] (0.2–18.2) and elevation [degr] (−30...90)
Auditory spectrum in the median plane, with the direction-independent part averaged out.
Effect of elevation on the spectrum
Figure (panels 3 and 4): loudness level spectrum [phon] (−30...30) as a function of frequency [kHz] (0.2–18.2) and elevation [degr] (−30...90)
Auditory spectrum in the median plane, with the direction-independent part averaged out.
Reliability of cues
If the cues conflict:
• Signal spectrum < ~1000 Hz
– ITD usually the strongest
– ILD weak, trading?
• Higher frequencies
– ITD and ILD both strong
– ILD sometimes stronger
• The more consistent cue wins [Wightman]
• Several direction percepts may arise
• Size of the sound source
• Individuality
Physiology of directional hearing
© Kalat 1998
Precedence effect
The cues are relevant only when the direct sound dominates
Precedence effect
Figure: two sources So and ST at ϕ = 40° and ϕ = −40° (α = 80°); direction ϕ of the first auditory event as a function of the delay τph of ST (0–2 ms and 20–50 ms), with the echo threshold marked; beyond it an echo is heard
Echo detection thresholds
Figure: level difference LST − LSO (−40...40 dB) as a function of the delay of ST (0–100 ms); curves:
• first sound event no longer distinguishable (primary sound suppressed) (≥ 6 persons)
• first sound event and echo equally loud (≥ 6 persons)
• echo disturbing (80 persons)
• masking threshold (1-2 persons)
Effect of loudness in a free field
Figure: distance of the auditory event / m (0–8) as a function of the distance of the sound source / m (0–10); average of five persons
© M. Karjalainen
Perception of distance
Cues
• Loudness
• Binaural cues
• Ratio of direct sound to reverberant field
• Spectrum
Spatial sound reproduction methods
• Conventional reproduction
– Monophony
– Stereophony
– Multichannel 2-D
– Multichannel 3-D
• Binaural reproduction
– Headphones
– Loudspeakers, crosstalk cancellation
Monophonic reproduction
Stereophonic reproduction
“Surround” reproduction
3-D multichannel reproduction
Binaural reproduction
Figure: signal paths from the source xm through Hl and Hr to the binaural signals, and from two loudspeakers to the ears through the ipsilateral (Hi) and contralateral (Hc) transfer functions; ear signals yl, yr and reproduced signals ŷl, ŷr
© M. Karjalainen
In the simplest case, a dummy-head or real-head recording is listened to over headphones.
Binaural reproduction
Figure: filter structures (a) to (d) for conversion between mono (xm), stereo (xl, xr), binaural (yl, yr), and transaural (ŷl, ŷr) signals, using sum and difference branches with filters 1/(Hi + Hc), 1/(Hi − Hc), Hi + Hc, Hi − Hc, (Hl + Hr)/(Hi + Hc), (Hl − Hr)/(Hi − Hc), Hl, and Hr
© M. Karjalainen
In the simplest case, a dummy-head or real-head recording is listened to over headphones.
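The sum and difference filters 1/(Hi + Hc) and 1/(Hi − Hc) named in the figure correspond to a crosstalk canceller for a symmetric loudspeaker setup. The sketch below shows the idea at a single frequency with complex gains. The numerical values of Hi and Hc are hypothetical, and a practical design would also have to handle frequencies where Hi ± Hc becomes small.

```python
import cmath

def crosstalk_canceller(y_left, y_right, h_ipsi, h_contra):
    """Loudspeaker signals that deliver the binaural pair (y_left, y_right) to the ears."""
    s = (y_left + y_right) / (h_ipsi + h_contra)   # sum branch
    d = (y_left - y_right) / (h_ipsi - h_contra)   # difference branch
    return 0.5 * (s + d), 0.5 * (s - d)

def ear_signals(x_left, x_right, h_ipsi, h_contra):
    """Acoustic mixing: each ear hears both loudspeakers."""
    return (h_ipsi * x_left + h_contra * x_right,
            h_contra * x_left + h_ipsi * x_right)

# Hypothetical single-frequency transfer functions and target binaural signals.
h_i = 1.0 + 0.0j
h_c = 0.4 * cmath.exp(-0.8j)
target = (1.0 + 0.5j, -0.3 + 0.2j)

x_l, x_r = crosstalk_canceller(target[0], target[1], h_i, h_c)
heard = ear_signals(x_l, x_r, h_i, h_c)
print(heard)
```

Feeding the canceller output through the acoustic paths reproduces the target binaural pair, which is the transaural reproduction condition.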
M. Karjalainen1
Chapter 9: Auditory modeling
• Simple psychoacoustic models
– Psychoacoustic spectrum and spectrogram
– Mel spectrum and cepstrum
– Perceptual linear prediction
– Examples of auditory spectra
• Auditory filter bank models
– Gammatone filterbanks
– Inner ear simulation models
– Temporal dynamics and masking
• Cochlear models
– Basilar membrane models
– Hair cell models
• Modeling of higher level functions
– Pitch and periodicity analysis
– Speech specific models
– Computational auditory scene analysis
• Binaural auditory modeling
M. Karjalainen2
Simple psychoacoustic modeling
• Problems with Fourier spectrum from auditory perception viewpoint:
– Linear frequency scale vs. critical band scale
– Level (dB) vs. loudness scaling
– Frequency bins vs. spreading and masking
– Flat response vs. equal loudness sensitivity
– Windowing vs. temporal integration and masking
– Temporal adaptation in auditory perception
M. Karjalainen3
Auditory spectrum through FFT
M. Karjalainen4
Examples of psychoacoustic spectra
• Auditory spectra
– Sinewave (400 Hz)
– White noise
M. Karjalainen5
Examples of psychoacoustic spectra
• Vowel /a/ and fricative /s/
– Fourier spectrum vs. auditory spectrum
M. Karjalainen6
Mel frequency cepstral coefficients
• MFCC computation
– FFT, mel warping, logarithm, inverse cosine transform
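The four steps can be sketched for a single frame using only the standard library. This is an illustrative outline rather than a production front end: it uses a naive DFT in place of an FFT, omits windowing and pre-emphasis, and assumes a common mel formula, triangular filters, and a DCT-II as the cosine transform. Filter counts and names are arbitrary choices.

```python
import cmath
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (stands in for the FFT step)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, fs_hz):
    """Triangular filters with centres equally spaced on the mel scale."""
    top = hz_to_mel(fs_hz / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * fs_hz / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, fs_hz, n_filters=12, n_coeffs=6):
    spectrum = power_spectrum(frame)                      # FFT step
    bank = mel_filterbank(n_filters, len(frame), fs_hz)   # mel warping
    log_energy = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-12)
                  for row in bank]                        # logarithm
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)]                     # cosine transform (DCT-II)

# Hypothetical input frame: 128 samples of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(128)]
coefficients = mfcc(frame, 8000.0)
print(len(coefficients))
```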
M. Karjalainen7
Filterbank auditory models
• General principle of an auditory filterbank model
M. Karjalainen8
Response of a filterbank model (Bark-bank)
• Simple Bark-filterbank by warped filters (Karjalainen)
M. Karjalainen9
Gammatone filterbank
• Temporal and magnitude response of one channel
• Filterbank
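A minimal sketch of the impulse response of one gammatone channel, assuming the usual form t^(n−1)·exp(−2πbt)·cos(2πf_c t) with order n = 4 and a bandwidth tied to the ERB formula of Glasberg and Moore. These parameter choices are common defaults, not values taken from the slide.

```python
import math

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth at centre frequency fc (assumed Glasberg-Moore form)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_impulse_response(fc_hz, fs_hz, duration_s=0.025, order=4):
    """Peak-normalized gammatone impulse response sampled at fs_hz."""
    b = 1.019 * erb_hz(fc_hz)
    n_samples = int(duration_s * fs_hz)
    response = []
    for i in range(n_samples):
        t = i / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        response.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in response)
    return [v / peak for v in response]

ir = gammatone_impulse_response(1000.0, 16000.0)
print(len(ir), round(erb_hz(1000.0), 1))
```

A filterbank is obtained by repeating this for a set of centre frequencies, typically spaced uniformly on the ERB or Bark scale.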
M. Karjalainen10
Neural adaptation
• Neural adaptation model by Dau et al.
– Automatic gain control feedback loops
M. Karjalainen11
Temporal processing
• Adaptation, temporal integration, and masking model (Karjalainen)
– Neural feedback model
– Adaptation (AGC)
– Loudness (level) computation
– Temporal masking effect
M. Karjalainen12
Responses
• Excitation, firing rate response, and loudness level response
M. Karjalainen13
Basilar membrane traveling wave model
• Principle of approximating basilar membrane traveling wave propagation
M. Karjalainen14
Meddis hair cell model
• Processing of neurotransmitter in the hair cell
M. Karjalainen15
Periodicity analysis (Meddis)
• Computation of sum autocorrelation function (SACF)
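The SACF idea can be sketched as follows: each channel is half-wave rectified, autocorrelated, and the autocorrelations are summed across channels, after which the strongest peak gives the common period. The filterbank and hair cell stages are replaced here by two hypothetical channel signals (harmonics 2 and 3 of 100 Hz). The names and the lag range are illustrative.

```python
import math

def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def sacf(channels, max_lag):
    """Sum of the autocorrelations of the half-wave rectified channel signals."""
    total = [0.0] * (max_lag + 1)
    for channel in channels:
        rectified = [max(v, 0.0) for v in channel]
        for lag, value in enumerate(autocorrelation(rectified, max_lag)):
            total[lag] += value
    return total

def dominant_period(sacf_values, min_lag):
    """Lag (in samples) of the largest SACF value at or above min_lag."""
    return max(range(min_lag, len(sacf_values)), key=lambda lag: sacf_values[lag])

# Hypothetical channel outputs: 200 Hz and 300 Hz components, no 100 Hz fundamental.
fs = 8000
channel_a = [math.sin(2.0 * math.pi * 200.0 * n / fs) for n in range(800)]
channel_b = [math.sin(2.0 * math.pi * 300.0 * n / fs) for n in range(800)]
period = dominant_period(sacf([channel_a, channel_b], max_lag=100), min_lag=20)
print(fs / period, "Hz")
```

The summed function peaks at the period shared by both channels, which illustrates how a periodicity model can account for the missing fundamental.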
M. Karjalainen16
Periodicity analysis example
• Signal, filterbank responses, cochleagrams, sum autocorrelation for speech
M. Karjalainen17
Auditory spectrum vs. auditory formant spectrum
• Example of vowel /ä/ and fricative /s/
M. Karjalainen18
Auditory representation of speech
• Example of vowel transitions /...iaiai.../
– Auditory spectrogram
– Auditory formant spectrogram
M. Karjalainen19
Applications of auditory modeling
• Audio coding
– Psychoacoustic or perceptual models of masking
• Sound quality modeling
– Modeling of perceived differences
– Criteria for audio reproduction
– Binaural audio quality
• Speech recognition
– Advanced front-end models
• Advanced hearing aids
– Cochlear implants
M. Karjalainen1
Chapter 10: Sound quality
• Effects of sound:
– Physical effects (generally meaningless)
– Physiological effects (hearing loss)
– Information and knowledge effects (communication)
– Esthetic and emotional effects (communication)
• Concept of quality in general:
– Quality as contrast to quantity (categorical dissimilarity)
– Quality on scale low-Q vs. high-Q (measure of preference)
• Speech intelligibility and quality
• Sound quality of concert halls and auditoria
• Sound quality in audio reproduction
• Noise quality
• Product sound quality
M. Karjalainen2
Evaluation and measurement of sound quality
• Sound quality is a fundamentally subjective (perceptual) concept but it can be approximated by objective and computational criteria
• Subjective quality can be evaluated by listening experiments, for example:
– Compare to a ’perfect quality’ reference to find out if any degradation can be noticed
– Compare two or more sounds and sort them by quality preference
– Characterize sound quality by conceptual description (such as not annoying, slightly annoying, annoying, very annoying)
– Give an overall quality rating on a numerical scale
– Give a rating for a specific quality factor (numerical scale)
– Give quality ratings for several different quality factors (multidimensional scaling)
• Based on subjective experimentation, a computational (objective) measure and model can be derived to simulate the perceived quality
– Objective measures are less laborious and yield high repeatability
– It is important to check the validity range of a model
M. Karjalainen3
Development of sound quality models and theories
Theories and models in general
Computational models
Computational models with reference
M. Karjalainen4
Intelligibility and quality of speech
• Intelligibility of speech in general depends on:
– the ability of a speaker to produce an intelligible message and clear speech
– quality of the speech transmission medium (acoustic or technical)
– the ability of a listener to analyze and conceive the message
• Technical concept of speech intelligibility:
– related to the quality of the transmission channel
– developed since the 1920’s (Harvey Fletcher, Bell Labs)
• Articulation
– score of correct recognition of phones and (nonsense) phone sequences
– articulation index is a measure that is additive from frequency bands (like loudness adds from critical band specific loudnesses)
• Speaker identification score
– quality of channel to convey speaker identity
• Naturalness of speech
– particularly in speech synthesis (and coding)
M. Karjalainen5
Speech quality: subjective measures and methods
• Articulation tests and articulation score
– /CV/ or /CVC/ sequences used to measure recognition percentage
• Intelligibility test and intelligibility score
– recognition percentage using meaningful words or sentences
• Rhyme tests (RT)
– using ’rhyme’ words or syllables (in Finnish: /patti/, /tatti/, /katti/)
• Diagnostic rhyme tests (DRT)
– modifying a single distinctive feature at a time (nasality, voicing, etc.) in RT
• Speech interference tests (find a disturbing noise level of 50% articulation)
• Quality comparison method, including pairwise comparison methods
– ordering of sound examples by overall or specific quality factor
• Mean opinion score (MOS)
– overall rating on 1–5 scale
• Other methods
– Indirect judgement tests (PARM, QUART)
– Communicability tests (communicate a drawing task, measure the difficulty)
– Task recall tests (memorizing ability)
– Analytic measures (multidimensional scaling)
M. Karjalainen6
Speech quality: objective measures and methods
• Articulation index (AI)
– for measuring a (linear) speech transmission channel with additive noise
– articulation loss is assumed to be additive from 20 frequency band AI values
• Percentage articulation loss of consonants (%ALcons)
– measure of speech intelligibility, can be estimated from acoustic properties of a room
• Room acoustical indices, see below
• Speech transmission index (STI, RASTI)
– based on modulation transfer function, see below
• Signal-to-noise ratio (SNR)
– ratio of speech vs. noise (power) level (in dB)
– segmental SNR (SNRseg) based on short-time segmental SNRs
• Spectral distance measures (distance measures in the frequency domain)
• Auditory sound quality measures (based on auditory modeling)
• Other methods
– weighted spectral slope distance
– LPC (linear prediction) distance measure
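Of the measures listed, SNR and segmental SNR are simple enough to sketch directly. The version below assumes the clean reference signal is available, so that the noise is the sample-wise difference. It also clamps each frame to the range −10...35 dB, a common convention that is assumed here rather than taken from the slide.

```python
import math

def snr_db(clean, degraded):
    """Overall SNR in dB; the noise is taken as degraded minus clean."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((d - c) ** 2 for c, d in zip(clean, degraded))
    return 10.0 * math.log10(signal_power / noise_power)

def segmental_snr_db(clean, degraded, frame_len, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs, each clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        d = degraded[start:start + frame_len]
        signal_power = sum(v * v for v in c)
        noise_power = sum((b - a) ** 2 for a, b in zip(c, d))
        if noise_power == 0.0:
            frame_snr = ceil_db
        elif signal_power == 0.0:
            frame_snr = floor_db
        else:
            frame_snr = 10.0 * math.log10(signal_power / noise_power)
        values.append(min(max(frame_snr, floor_db), ceil_db))
    return sum(values) / len(values)

# Hypothetical signals: a unit-amplitude square wave with a constant 0.1 offset error.
clean_signal = [1.0, -1.0] * 50
degraded_signal = [c + 0.1 for c in clean_signal]
print(round(snr_db(clean_signal, degraded_signal), 2))
print(round(segmental_snr_db(clean_signal, degraded_signal, 20), 2))
```

Segmental SNR weights quiet and loud passages more evenly than the overall SNR, which is dominated by the high-energy parts of the signal.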
M. Karjalainen7
MOS (mean opinion score)
• A very popular technique to quantify overall quality in speech and audio
• Combines a quantitative scale and qualitative categorizations
• Three sorts of MOS measures used:
– MOS = (direct) evaluation on 1–5 scale
– DMOS = degradation MOS (how much signal is degraded)
– CMOS = comparative MOS (typically scale -3...+3)
• Sometimes a scale of 1–10 by step of 0.1 is used instead
• Basic MOS scaling:
Rating   Quality (MOS)   Degradation (DMOS)
5        Excellent       Not noticeable
4        Good            Just noticeable, not disturbing
3        Fair            Noticeable, slightly disturbing
2        Poor            Disturbing but tolerable
1        Bad             Very disturbing
M. Karjalainen8
Modulation transfer function
• The auditory system analyzes signals by critical bands
• Each band is analyzed by signal level, i.e., modulation envelope
• More important than the exact transfer function is the modulation transfer function, i.e., how signal modulations in each critical band are transmitted
• The auditory system is most sensitive to modulations of about 4 Hz
• Modulation transfer is degraded by:
– Reverberation (lowpass of modulation)
– Background noise (reduction of relative modulation)
– These effects are multiplicative (cascaded)
• Modulation transfer function is a mathematically motivated approximation of auditorily relevant signal transfer analysis
M. Karjalainen9
Modulation transfer function (2)
M. Karjalainen10
Modulation transfer function (3)
M. Karjalainen11
Modulation transfer function (4): STI
• Total effect on modulation transfer function
• Apparent SNRapp vs. modulation reduction
• Speech transmission index (STI), for each band:
– STI = 1.0 for SNRapp ≥ 15 dB
– STI = 0.0 for SNRapp ≤ -15 dB
– otherwise STI = m, see also next figure
– (Weighted) average of SNRapp values of bands is computed and converted to total STI
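A simplified single-band sketch of the STI idea, assuming the commonly cited Houtgast-Steeneken forms: reverberation acts as a lowpass on the modulation, noise reduces the modulation depth, and the two factors multiply. The apparent SNR is 10·log10(m/(1 − m)), clipped to ±15 dB and mapped linearly to 0...1. The modulation frequency list, the linear mapping, and the function names are assumptions and may differ from the mapping drawn in the slide's figure.

```python
import math

def modulation_index(mod_freq_hz, rt_s, snr_db):
    """Modulation reduction factor m for exponential reverberation and stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise  # the two effects are multiplicative

def apparent_snr_db(m):
    """Apparent signal-to-noise ratio corresponding to a modulation index m."""
    m = min(max(m, 1e-9), 1.0 - 1e-9)  # keep the ratio finite
    return 10.0 * math.log10(m / (1.0 - m))

def transmission_index(m):
    """Clip the apparent SNR to -15...+15 dB and map it linearly to 0...1."""
    snr = min(max(apparent_snr_db(m), -15.0), 15.0)
    return (snr + 15.0) / 30.0

def sti_single_band(rt_s, snr_db,
                    mod_freqs=(0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                               3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5)):
    """Unweighted mean of the transmission index over the modulation frequencies."""
    indices = [transmission_index(modulation_index(f, rt_s, snr_db)) for f in mod_freqs]
    return sum(indices) / len(indices)

print(round(sti_single_band(rt_s=1.5, snr_db=10.0), 2))
```

A full STI calculation repeats this per octave band and combines the bands with weights. The sketch stops at one band.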
M. Karjalainen12
Modulation transfer function (5)
M. Karjalainen13
STI vs. speech intelligibility
M. Karjalainen14
RASTI vs. STI
• RASTI = Rapid STI
• Partial evaluation of frequency bands & modulation bands used
• Specific RASTI instrument available for speech acoustics evaluation
M. Karjalainen15
Percentage articulation loss of consonants (%ALcons)
• Estimate of speech intelligibility
• %ALcons can be estimated
• where
– r = distance of source and listener
– RT = reverberation time
– V = room volume
– Q = directivity of a sound source
– k = constant (for individual listener) = 1.5 ... 12.5 %
• %ALcons can also be estimated from room measurements
• %ALcons up to 25...30% can be tolerated in meaningful speech due to information redundancy
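The estimation formula itself is missing from the transcript. A minimal sketch, assuming the commonly cited Peutz form %ALcons = 200 r² RT² / (V Q) + k in metric units (valid for distances below the critical distance), could look like this. The function name and the example room are hypothetical.

```python
def alcons_percent(r, rt, volume, q, k=1.5):
    """Estimate %ALcons with the assumed Peutz form (metric units).

    r      -- distance between source and listener in m
    rt     -- reverberation time in s
    volume -- room volume in m^3
    q      -- directivity of the sound source
    k      -- listener-dependent constant in percent (1.5 ... 12.5)
    """
    return 200.0 * r ** 2 * rt ** 2 / (volume * q) + k


# Hypothetical hall: 10 m distance, RT = 1.5 s, 5000 m^3, Q = 2, good listener
print(alcons_percent(10.0, 1.5, 5000.0, 2.0))
```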
M. Karjalainen16
Sound quality in concert halls (and performing spaces)
• Esthetic effects very important
– communication by esthetic and emotional factors
• ’Good acoustics’ depends on type of music
– for example tempo, mixture of instruments (size of orchestra)
• Many factors to be taken into account
– multidimensional scaling of quality needed
• Different proposed theories and models exist
– no full agreement upon indices and factors of quality
• Visual factors also very prominent in concert halls
– a concert is a multimodal experience to most listeners
• It is not only the audience but also the musicians
– stage acoustics is important as well
• Theaters and other performing spaces
– may require different acoustics
• Active (electroacoustically created or enhanced) acoustics
– used increasingly except for classical acoustic music
M. Karjalainen17
Sound quality in concert halls: (1) subjective indices
• Intimacy or presence
• Reverberation (subjective)
• Spaciousness (apparent source width, listener envelopment)
• Clarity (separation of sounds and sources)
• Warmth (level and reverberation at low frequencies)
• Loudness
• Acoustic glare (walls should not reflect like mirrors)
• Brilliance (due to long reverberation at high frequencies)
• Balance (how sound sources (instruments) are balanced)
• Blend (how instruments are mixed harmonically)
• Ensemble (how musicians can play together)
• Immediacy of response (from the hall back to musicians)
• Texture (how early reflections arrive to listeners)
• Freedom from echo (discrete echoes are highly undesirable)
• Dynamic range (useful range of playing levels)
• Extraneous effects on tonal quality (no extra sounds desired)
• Uniformity of sound (quality should be equal in all positions)
M. Karjalainen18
Sound quality in concert halls: (2) objective measures
• Loudness
– Gmid (sound level at mid frequencies)
• Reverberation time
– RT60 (decay time of 60 dB for full hall)
– EDT (early decay time, 0–10 dB scaled to correspond to 60 dB)
• Clarity
– Early vs. late energy ratio C80 (empty hall)
• Spaciousness
– IACCearly (interaural cross-correlation, early)
– LFearly (lateral energy fraction, early)
• Envelopment
– IACClate and visual inspection of surface irregularity
• Intimacy
– ITDG (initial time delay gap)
• Warmth
– BR (bass ratio, full hall)
• Stage support
– Early energy (20-100 ms), sound source on the stage 1 m from the microphone
M. Karjalainen19
Objective sound quality in concert halls: definitions
• Interaural cross-correlation function IACF_t(τ) from pressure signals of left and right ears
• Interaural cross-correlation (IACC), max of IACF_t(τ)
• Lateral energy fraction (LF or LEF)
• Gain factor (level vs. 10 m free-field distance level)
• Bass ratio
• Stage support
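The equations for these measures did not survive in the transcript. As a hedged reference, the forms commonly used in room acoustics (ISO 3382 style) are sketched here; the symbols are assumptions: p_l and p_r are the ear pressure signals, p_8 is the response of a figure-of-eight microphone, p_10 is the free-field response at 10 m, and RT_f is the reverberation time in the octave band centered at f Hz.

```latex
\begin{align*}
\mathrm{IACF}_t(\tau) &= \frac{\int_{t_1}^{t_2} p_l(t)\, p_r(t+\tau)\, dt}
  {\sqrt{\int_{t_1}^{t_2} p_l^2(t)\, dt \int_{t_1}^{t_2} p_r^2(t)\, dt}} \\
\mathrm{IACC}_t &= \max_{|\tau| \le 1\,\mathrm{ms}} \left| \mathrm{IACF}_t(\tau) \right| \\
\mathrm{LF} &= \frac{\int_{5\,\mathrm{ms}}^{80\,\mathrm{ms}} p_8^2(t)\, dt}
  {\int_{0}^{80\,\mathrm{ms}} p^2(t)\, dt} \\
G &= 10 \log_{10} \frac{\int_0^\infty p^2(t)\, dt}{\int_0^\infty p_{10}^2(t)\, dt} \\
\mathrm{BR} &= \frac{RT_{125} + RT_{250}}{RT_{500} + RT_{1000}} \\
\mathrm{ST} &= 10 \log_{10} \frac{\int_{20\,\mathrm{ms}}^{100\,\mathrm{ms}} p^2(t)\, dt}
  {\int_{0}^{10\,\mathrm{ms}} p^2(t)\, dt}
\end{align*}
```

The integration limits t_1 and t_2 select the early or late part of the response, which gives the IACCearly and IACClate variants.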
M. Karjalainen20
Early vs. late ratios
Clearness
Center time
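The formulas on this slide are missing from the transcript. A minimal sketch, assuming the usual definitions of the early-to-late ratio C80 (energy before vs. after 80 ms, in dB) and of center time (energy-weighted mean arrival time), could compute both from a sampled impulse response. The names h and fs are assumptions, and h is assumed to start at the arrival of the direct sound.

```python
import math


def clarity_c80(h, fs):
    """Early-to-late energy ratio in dB with the split at 80 ms."""
    n80 = int(round(0.080 * fs))
    early = sum(x * x for x in h[:n80])
    late = sum(x * x for x in h[n80:])
    if early == 0.0 or late == 0.0:
        raise ValueError("need non-zero energy both before and after 80 ms")
    return 10.0 * math.log10(early / late)


def center_time(h, fs):
    """Energy-weighted mean arrival time of the impulse response in seconds."""
    energy = [x * x for x in h]
    total = sum(energy)
    if total == 0.0:
        raise ValueError("impulse response has no energy")
    return sum((n / fs) * e for n, e in enumerate(energy)) / total


# Hypothetical response: direct sound at 0 ms and one equal reflection at 100 ms
fs = 1000
h = [0.0] * fs
h[0] = 1.0
h[100] = 1.0
print(clarity_c80(h, fs), center_time(h, fs))
```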
M. Karjalainen21
Audio sound quality
• HiFi (High Fidelity) vs. professional reproduction
• Good quality is defined indirectly by the absence of degradations
• Degradations & distortions:
– Linear distortion
– Nonlinear distortion
– Transient distortion
– Noise & quantization noise (SNR)
– Spatially poor reproduction
M. Karjalainen22
Perception of audio reproduction
• Phase in audio reproduction
– Group delay differences of about 1 ms are noticed in extreme cases
• In high-Q-value cases even much lower differences
– Group delay differences of about 2 ms become noticeable in critical listening (about 60 cm of propagation distance difference)
– 5-10 ms group delay differences may start to be disturbing
– Even 50-100 ms group delay errors may be tolerable sometimes
– In spatial sound perception (Chapter 8): precedence effect
• Perception of distortion
– Linear distortion = magnitude and phase distortion
– Nonlinear distortion = new spectral components are produced
M. Karjalainen23
Nonlinear distortion
• Nonlinear distortion
– In a nonlinear system a sine wave generates harmonics:
– If total rms level is:
– Then harmonic distortion (HM):
– HM is not a particularly good measure from a perceptual point of view
– Low-order HM may improve perceived quality
– JND: 1% for 2nd, 0.3% for 3rd, 0.1-0.3% for 4th harmonic
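The equations for the harmonic distortion measure are missing from the transcript. A minimal sketch, assuming HM is defined as the rms level of the harmonics relative to the total rms level of the signal, could be written as below. The function name and the amplitudes are hypothetical.

```python
import math


def harmonic_distortion(amplitudes):
    """Ratio of the rms level of harmonics to the total rms level.

    amplitudes[0] is the fundamental; amplitudes[1:] are the 2nd, 3rd, ...
    harmonics generated by the nonlinear system.
    """
    total = math.sqrt(sum(a * a for a in amplitudes))
    if total == 0.0:
        raise ValueError("signal has no energy")
    harmonics = math.sqrt(sum(a * a for a in amplitudes[1:]))
    return harmonics / total


# Hypothetical spectrum: fundamental 1.0, 2nd harmonic 0.01, 3rd harmonic 0.003
print(100.0 * harmonic_distortion([1.0, 0.01, 0.003]), "%")
```

Some definitions divide by the fundamental alone instead of the total rms level; for small distortion values the two give nearly the same result.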
M. Karjalainen24
Audio distortion mechanisms
• Other distortion mechanisms:
– Intermodulation distortion (IM)
• Sine waves of f1 and f2 generate f1 – f2, f1 + f2, etc.
• IM describes perceived distortion better than HM
– Transient intermodulation distortion (TIM)
• Distortion that is created in fast transients but not in steady-state signals
– Quantization noise in digital signal processing
• Perceived as distortion if correlated with the signal
• Perceived as noise if not correlated
– Pre-echo in audio coding
• Temporal spreading of a signal "backwards" in time
– Perceptual criteria needed in digital audio instead of simple distortion and SNR measures
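To illustrate which frequencies intermodulation produces, the following sketch lists the sum and difference components m·f1 ± n·f2 up to a chosen order. The function name, the order limit and the example frequencies are hypothetical.

```python
def intermodulation_products(f1, f2, max_order=3):
    """Frequencies |m*f1 - n*f2| and m*f1 + n*f2 with m, n >= 1 and m + n <= max_order."""
    products = set()
    for m in range(1, max_order):
        for n in range(1, max_order - m + 1):
            products.add(abs(m * f1 - n * f2))
            products.add(m * f1 + n * f2)
    products.discard(0)
    return sorted(products)


# Hypothetical two-tone test signal at 1000 Hz and 1100 Hz
print(intermodulation_products(1000, 1100))
```

For closely spaced tones the difference components fall far from both f1 and f2 and from their harmonics, which is one plausible reason why IM tends to be more audible than harmonic distortion.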
M. Karjalainen25
Perceptual (objective) sound quality models for audio
• Schroeder et al.:
• Karjalainen:
M. Karjalainen26
PAQM (perceptual audio quality measure)
M. Karjalainen27
Product sound quality
• Minimize negative effects and maximize positive effects of product sound
• Examples:
– Cars and work machines
– Home appliances
– Office equipment
– Personal devices
• Computational models of product sound quality
M. Karjalainen1
Chapter 11: Technical audiology
• How do we hear? (discussed already)
• What if we don’t hear?
– Why don’t we hear? (mechanisms)
– How to measure? (audiometry)
– How to improve hearing? (hearing aids)
• Technical devices:
– Audiometric equipment
– Hearing aids
– Cochlear implants
M. Karjalainen2
Hearing degradation I
• Hearing disabled population
– WHO: 270 million hearing disabled in the world (5 %)
– In Finland: ~740 000 with hearing degradation, 14 000 new hearing device fittings per year
• Categories of handicap
– Disease (sairaus)
– Impairment (vaurio)
– Disability (toimintavajavuus)
– Handicap (haitta)
• Hearing disorders: social classification
– Hard-of-hearing persons (huonokuuloinen)
– Deafened persons (kuuroutunut)
– Deaf persons (kuuro)
M. Karjalainen3
Hearing degradation II
• Medical classification of hearing impairments
– Conductive hearing loss (äänen johtumisvika)
• External and middle ear problems
• Attenuation of loudness
– Sensorineural hearing loss
• Inner ear and retrocochlear problems
• Attenuation or recruitment
• Tinnitus
– Central hearing loss
• Higher neural levels
• Problems in sound separation or speech analysis
• Problems in localization (spatial separation)
• Tinnitus
– Psychic hearing problems
• No clear physiological reason
M. Karjalainen4
Hearing threshold change
M. Karjalainen5
Audiometry
Audiometer and calibrated headphones
M. Karjalainen6
Audiogram behavior
Loud noise effect (impulse noise)
Effect of age (presbyacusis)
M. Karjalainen7
Degrees of hearing impairment
• Measure of hearing degradation
– Average of threshold values at 500, 1000, 2000, 4000 Hz
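The averaging described above can be sketched as follows. The function name and the example audiogram are hypothetical; thresholds are assumed to be given in dB HL per frequency.

```python
def pure_tone_average(audiogram, freqs=(500, 1000, 2000, 4000)):
    """Average hearing threshold (dB HL) over the given audiometric frequencies."""
    return sum(audiogram[f] for f in freqs) / len(freqs)


# Hypothetical audiogram of one ear: frequency in Hz -> threshold in dB HL
audiogram = {250: 15, 500: 20, 1000: 30, 2000: 40, 4000: 50, 8000: 60}
print(pure_tone_average(audiogram))
```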
M. Karjalainen8
Other hearing impairment problems
• Other effects of impairment
– Sound separation problems, particularly in noise and reverberation
– Speech communication problems
– Tinnitus
• Source at different levels
• No good treatment known
• Often like sinusoidal tone, but can be hum, broadband noise, pulsation, etc.
M. Karjalainen9
Ear drum impedance measurement
M. Karjalainen10
Noise and causes of hearing loss
• Noise measurement
– A-weighted equivalent level
– 85 dB long-term daily exposure limit
• Other factors:
– Vibration
– Smoking
– Drugs
– Diseases
– Genetic effects
– Combined = often more than their sum
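An equivalent level is the level of a steady sound that carries the same energy as the fluctuating noise over the measurement period. A minimal sketch, assuming the input is a series of A-weighted levels in dB measured over equal time intervals, is given below. The function name and the example values are hypothetical.

```python
import math


def equivalent_level(levels_db):
    """Energy-average (equivalent) level of equally long level samples in dB."""
    if not levels_db:
        raise ValueError("need at least one level sample")
    mean_energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)


# Hypothetical working day: four hours at 80 dB(A) and four hours at 90 dB(A)
leq = equivalent_level([80.0] * 4 + [90.0] * 4)
print(round(leq, 1), "above the 85 dB limit" if leq > 85.0 else "within the 85 dB limit")
```

Because the average is taken over energy, the louder periods dominate: the example gives about 87.4 dB, not the arithmetic mean of 85 dB.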
M. Karjalainen11
Inner ear damage
Inner hair cell damage
Outer hair cell partial damage
Outer hair cell full damage
M. Karjalainen12
Temporary threshold shift
M. Karjalainen13
Hearing protectors
Ear plugs
Ear muffs
Attenuation
M. Karjalainen14
Hearing aid types
M. Karjalainen15
Hearing aid response
Typical frequency response of a traditional hearing aid
Multichannel digital hearing aids:
- each frequency channel programmed separately
M. Karjalainen16
Hearing aid gain control
Linear gain + limiter
Automatic gain control
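The difference between the two gain control strategies can be sketched through their static input-output curves. All parameter values (gain, maximum output, compression threshold and ratio) are hypothetical and chosen only for illustration; a real AGC also has attack and release time constants that this sketch ignores.

```python
def linear_with_limiter(input_db, gain_db=30.0, max_output_db=110.0):
    """Constant gain until the output reaches the limiter ceiling."""
    return min(input_db + gain_db, max_output_db)


def agc_compression(input_db, gain_db=30.0, threshold_db=50.0, ratio=3.0):
    """Constant gain below the threshold, reduced level growth above it."""
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + gain_db + (input_db - threshold_db) / ratio


# Compare output levels for a range of hypothetical input levels (dB SPL)
for level in (40.0, 60.0, 80.0, 100.0):
    print(level, linear_with_limiter(level), agc_compression(level))
```

With the limiter the output grows 1:1 and then stops abruptly at the ceiling, whereas compression squeezes a wide input range smoothly into the reduced dynamic range of an impaired ear.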
M. Karjalainen17
Hearing aid AGC control
Feedback control
Feedforward control
M. Karjalainen18
Hearing aid output waveforms
M. Karjalainen19
Other issues in hearing aids
• Directional microphones
• Binaural processing
• Noise cancellation
• Wind noise cancellation
• Feedback cancellation
• Speech enhancement
M. Karjalainen20
Cochlear implants
• Electrical stimulation of auditory nerve
M. Karjalainen21
Cochlear implants II
• ~100 000 units fitted worldwide
• For deafened adults and deaf-born children
• Price about 50 000 $ in USA
• Multielectrode devices nowadays
– (e.g. 24 channels)
– Speech from microphone is divided into channels
– Inductive coupling through skin
– Multielectrode in the cochlea
– Different pulse modulations used