Communication Acoustics Karjalainen

M. Karjalainen1

• General introduction

• Communication by sound and voice

– Examples of communication situations

• Systems approach to communication

• Modeling and theory formation in research

Chapter 1: Introduction

Information Transmission by Sound

Environmental orientation by sound

Communication by Speech

Speech communication via acoustic medium

Communication by Music

Music via acoustic medium

Communication by Music

Origins of speech and music?

Speech has been important in evolution, but what about music?

Role of music: just a side product or an important factor?

- Charles Darwin: Important for mating etc.

Two interesting recent books:

Steven Mithen: “The Singing Neanderthals: The Origins of Music, Language, Mind, and Body”, Harvard University Press, 2006

Daniel J. Levitin: “This Is Your Brain on Music: The Science of a Human Obsession”, Plume, 2006

Speech Transmission

Speech communication via electronic medium

Virtual Acoustic Reality

Virtual instrument in virtual space

Man-Machine Communication by Speech

Speech synthesis and recognition

A Black-Box Approach

Input-output relationship

A Systems Approach

A multi-level system

Systemic Concepts

• Element (part of a whole, entity)

• Relation / property

• Structure (relatively permanent properties of a system)

• Function(ality) (relatively variant properties of a system)

• Event (a relatively discrete change, typically in time)

• State

• Object

• Type (class)

• System

• Control

• Process

• Organization

• Hierarchy / heterarchy

• Data / information / knowledge (communication, language)

Abstraction in Modeling and Theory Formation

Abstraction hierarchy

Communication by Sound and Voice

[Figure labels: hardware, software, functionware, contentware; physics, cognition, signals, information; analysis, synthesis]

Chapter 2: Acoustics

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.

Chapter 2: Acoustics

Sound as physical phenomenon

When a tree in a forest falls and there is no one to listen, does it make a sound?

• Vibration – generation of sound

• Sound radiation

• Sound propagation

• Reflection, absorption

• Diffraction, refraction

• Standing waves

• Resonance, resonators

Vibrating systems

• Simple vibration: mass–spring system

Vibrating systems

Undamped and damped oscillation

Resonance

Mass-spring resonator

Helmholtz resonator
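
The resonance frequencies of the two resonators can be estimated with a short sketch. It assumes the standard lumped-element formulas (f0 = sqrt(k/m) / 2π for the mass-spring system, f0 = (c / 2π) sqrt(A / (V L)) for the Helmholtz resonator, without end corrections); the function names and the default speed of sound are illustrative choices, not taken from the slides.

```python
import math

def mass_spring_frequency(stiffness_n_per_m, mass_kg):
    """Undamped natural frequency f0 = sqrt(k/m) / (2*pi), in Hz."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)

def helmholtz_frequency(neck_area_m2, neck_length_m, volume_m3, c=343.0):
    """f0 = c/(2*pi) * sqrt(A / (V * L)); neck end corrections are ignored (assumption)."""
    return c / (2.0 * math.pi) * math.sqrt(neck_area_m2 / (volume_m3 * neck_length_m))

# Example: a 1-litre cavity with a 1 cm^2, 1 cm long neck
f_bottle = helmholtz_frequency(1e-4, 0.01, 1e-3)
```

In practice the effective neck length is somewhat longer than the physical one, which lowers the real resonance compared with this estimate.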

Two-mass vibrating system

Transversal and longitudinal vibration of a two-mass system

Vibration modes of a string

Wave propagation

Wave equation:

D’Alembert:

Sound pressure, sound pressure level, decibel

Sound pressure: p [Pa]

Sound pressure level:

Reference:
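
A minimal sketch of the level computation, assuming the standard definition L_p = 20 log10(p / p0) with the reference pressure p0 = 20 µPa for sound in air. The function names are illustrative.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa (20 micropascals, standard for air)

def spl_db(p_rms):
    """Sound pressure level in dB for an RMS sound pressure given in Pa."""
    return 20.0 * math.log10(p_rms / P_REF)

def pressure_from_spl(level_db):
    """Inverse mapping: RMS sound pressure in Pa for a level in dB SPL."""
    return P_REF * 10.0 ** (level_db / 20.0)
```

For example, an RMS pressure of 1 Pa corresponds to about 94 dB SPL, and doubling the pressure adds about 6 dB.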

Wave phenomena: spherical wave

Sound velocity in the air:

Spherical wave:

Wave phenomena: planar wave

Planar wave in a tube:

Reflection (and transmission):

Lowest resonance modes in a tube

Open ends; one end closed
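
The two end conditions give different mode series. The sketch below assumes an ideal lossless tube without end corrections: f_n = n c / (2L) when both ends are open, and f_n = (2n − 1) c / (4L) when one end is closed. Function names and the default speed of sound are illustrative.

```python
def open_open_modes(length_m, count, c=343.0):
    """Half-wavelength resonances of a tube open at both ends: n * c / (2L)."""
    return [n * c / (2.0 * length_m) for n in range(1, count + 1)]

def closed_open_modes(length_m, count, c=343.0):
    """Quarter-wavelength resonances of a tube closed at one end: (2n - 1) * c / (4L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]
```

A tube closed at one end thus resonates only at odd multiples of its lowest mode, which is one octave below the lowest mode of an open tube of the same length.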

Spectral content of string vibration

Bar and membrane modes

Bar

Membrane

Reflection and refraction (bending)

Diffraction

Sound propagation paths in a room

Sound field decay in a room

Tapiola-sali

Sound field in a room, Computer simulation

Sound field level in a reverberant room

Modal behavior in a room

L_i = dimensions of a rectangular room

n_i = integer indices 0, 1, 2, ...

measured magnitude response in a room
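
The mode formula itself did not survive on this slide. The sketch below assumes the standard expression for a rigid-walled rectangular room, f = (c/2) sqrt((n_x/L_x)² + (n_y/L_y)² + (n_z/L_z)²), using the L_i and n_i defined above. Function names are illustrative.

```python
import math

def room_mode_frequency(indices, dimensions_m, c=343.0):
    """Mode frequency in Hz for integer indices (nx, ny, nz) and room size (Lx, Ly, Lz)."""
    total = sum((n / length) ** 2 for n, length in zip(indices, dimensions_m))
    return 0.5 * c * math.sqrt(total)

def lowest_modes(dimensions_m, count, max_index=3, c=343.0):
    """Return the `count` lowest (frequency, indices) pairs, excluding the (0, 0, 0) case."""
    modes = []
    for nx in range(max_index + 1):
        for ny in range(max_index + 1):
            for nz in range(max_index + 1):
                if (nx, ny, nz) != (0, 0, 0):
                    freq = room_mode_frequency((nx, ny, nz), dimensions_m, c)
                    modes.append((freq, (nx, ny, nz)))
    return sorted(modes)[:count]
```

In a 5 m × 4 m × 3 m room the lowest mode is the axial mode along the longest dimension, at about 34 Hz.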

Sound propagation by image source model

Solid line = real path; dotted line = virtual path
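
A first-order image source can be sketched in two dimensions. The example assumes a single rigid wall along the line x = wall_x: the source is mirrored across the wall, and the straight (virtual) path from the image to the receiver has the same length as the real reflected path. Names and geometry are illustrative.

```python
import math

def mirror_across_wall(source, wall_x):
    """Image source position for a wall along the vertical line x = wall_x."""
    x, y = source
    return (2.0 * wall_x - x, y)

def path_lengths(source, receiver, wall_x):
    """Return (direct, reflected) path lengths in metres."""
    image = mirror_across_wall(source, wall_x)
    direct = math.hypot(receiver[0] - source[0], receiver[1] - source[1])
    reflected = math.hypot(receiver[0] - image[0], receiver[1] - image[1])
    return direct, reflected

direct, reflected = path_lengths((1.0, 0.0), (2.0, 4.0), wall_x=0.0)
extra_delay_s = (reflected - direct) / 343.0  # reflection arrives this much later
```

Higher-order reflections follow by mirroring the image sources again across the other walls.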

Electroacoustics: Loudspeaker

Principle, driver structure, enclosure

Dynamic loudspeaker

Electroacoustics: Microphone

Principle, construction

Condenser microphone

Chapter 3: Sound and Voice as Signals

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.

Sound and Voice as Signals

In signal representations, a physical or abstract variable is typically represented as a function of time, such as:

• Signal as a mathematical function:

– Pure tone:

– Random signal:

• Discrete-time numeric sequence

Sound and Voice as Signals

• Graphical presentations:

Sine wave, random noise

Speech waveform, sample sequence

Unit impulse, unit pulse

Linear and time-invariant (LTI) systems

Properties of LTI systems:

• Any (stable) LTI system can be fully represented by its impulse response

• Output cannot include any frequencies that are not in the input (no nonlinear distortion)

• Any bandlimited LTI system can be approximated by digital filters with arbitrary accuracy (theoretically)

Signal processing algorithms

Convolution

Fourier analysis

Signal processing algorithms

Fourier synthesis

Convolution vs. Fourier transform
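
The relation between the two can be checked numerically: convolution in time corresponds to multiplication of spectra. The sketch below uses a naive DFT and circular convolution on equal-length sequences; it is meant as an illustration, not an efficient implementation (an FFT would be used in practice). All function names are illustrative.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def circular_convolution(x, h):
    """Direct circular convolution of two sequences of equal length."""
    n = len(x)
    return [sum(x[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]

def convolution_via_dft(x, h):
    """Same result obtained by multiplying the spectra and transforming back."""
    product = [a * b for a, b in zip(dft(x), dft(h))]
    return [value.real for value in idft(product)]

x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.5, 0.0, 0.0]
time_domain = circular_convolution(x, h)
frequency_domain = convolution_via_dft(x, h)
```

Linear (non-circular) convolution is obtained the same way after zero-padding both sequences to at least the combined length minus one.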

Decomposition of sawtooth waveform

Spectrum analysis

Magnitude spectrum

Phase spectrum

Group delay

Phase delay

Fourier analysis with windowing

• Rectangular window

• Hamming window

• Hann(ing) window

• Kaiser window

• Blackman (Blackman-Harris) window

Spectrum analysis using Fourier analysis with windowing

Sine wave

Sine wave windowed synchronously

Sine wave windowed non-synchronously

Sine wave, Hamming-windowed
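
The three cases can be reproduced with a small experiment. The sketch measures how much spectral energy falls outside a few bins around the spectral peak: a sine with an integer number of cycles in the analysis frame (synchronous), a sine with 8.5 cycles (non-synchronous, rectangular window), and the same sine with a Hamming window. The frame length, guard width, and function names are illustrative assumptions.

```python
import cmath
import math

def sine(cycles, n):
    """n samples of a sine wave with the given number of cycles in the frame."""
    return [math.sin(2.0 * math.pi * cycles * k / n) for k in range(n)]

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def leakage_fraction(x, guard=3):
    """Fraction of spectral power more than `guard` bins away from the peak bin."""
    n = len(x)
    power = []
    for m in range(n // 2 + 1):
        value = sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        power.append(abs(value) ** 2)
    peak = max(range(len(power)), key=lambda m: power[m])
    outside = sum(p for m, p in enumerate(power) if abs(m - peak) > guard)
    return outside / sum(power)

N = 64
synchronous = sine(8, N)
non_synchronous = sine(8.5, N)
hamming_windowed = [w * s for w, s in zip(hamming(N), non_synchronous)]
```

The synchronous case shows essentially no leakage, the non-synchronous rectangular case spreads energy over many bins, and the Hamming window confines most of it to a wider main lobe.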

Vowel spectra

Time-frequency representations: Spectrogram

Word: /kaksi/

Auto- and cross-correlation

Cross-correlation

Autocorrelation

Cepstrum

• Compute Fourier transform

• Logarithm of (power) spectrum

• Inverse Fourier transform
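
The three steps above can be sketched directly. The example uses a naive DFT and a small floor inside the logarithm to avoid log(0); both are implementation assumptions. The test signal is an impulse followed by an echo of half the amplitude 8 samples later, which should appear as a cepstral peak at quefrency 8.

```python
import cmath
import math

def power_cepstrum(x):
    """Inverse DFT of the log power spectrum of x (real part)."""
    n = len(x)
    log_power = []
    for m in range(n):
        value = sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        log_power.append(math.log(abs(value) ** 2 + 1e-12))  # floor avoids log(0)
    return [sum(log_power[m] * cmath.exp(2j * cmath.pi * m * k / n)
                for m in range(n)).real / n
            for k in range(n)]

signal = [0.0] * 64
signal[0] = 1.0   # direct sound
signal[8] = 0.5   # echo, 8 samples later
cepstrum = power_cepstrum(signal)
peak_quefrency = max(range(1, 33), key=lambda q: cepstrum[q])
```

The same idea is used for pitch detection in voiced speech, where the periodic excitation shows up as a peak at the pitch period.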

Digital signal processing: DSP systems

• Analog-to-digital (A/D) converter

• Digital signal processor (+ software)

• Digital-to-analog (D/A) converter

Signal quantization: A/D conversion

• Linear quantization (PCM-coding)

• Discrete levels: 2^n (n = number of bits)

• 16–24 bits/sample in audio (about 96 dB SNR or more)

• Sample rate: 44100 or 48000 samples/sec
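
The link between word length and signal-to-noise ratio can be sketched as follows. The example assumes a uniform quantizer over the range [−1, 1) and the usual rule of thumb of about 6 dB per bit (6.02 n + 1.76 dB for a full-scale sine wave); function names are illustrative.

```python
import math

def quantize(sample, bits):
    """Uniform (linear PCM) quantization of a sample in [-1, 1) to 2**bits levels."""
    step = 2.0 / (2 ** bits)
    index = round(sample / step)
    index = max(-(2 ** (bits - 1)), min(2 ** (bits - 1) - 1, index))  # clip to range
    return index * step

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB: 20*log10(2**bits)."""
    return 20.0 * math.log10(2 ** bits)

def sine_sqnr_db(bits):
    """Theoretical signal-to-quantization-noise ratio for a full-scale sine wave."""
    return 6.02 * bits + 1.76
```

With 16 bits the dynamic range is about 96 dB; each additional bit adds about 6 dB.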

Z-transform

Linear transform of sequence x(n): X(z) = Σ x(n) z^-n (sum over all n)

Unit delay as building element: z^-1

Digital filtering can be expressed as a rational function (or polynomial) of z^-1

Digital filtering: FIR filters

FIR = finite impulse response filter

Digital filtering: IIR filters

IIR = infinite impulse response filter
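
The difference between the two filter types is easy to see from their impulse responses. The sketch assumes a general FIR filter y(n) = Σ b_k x(n − k) and, as the simplest IIR case, a one-pole feedback filter y(n) = x(n) + a y(n − 1); the coefficient values are illustrative.

```python
def fir_filter(b, x):
    """FIR filter: weighted sum of the current and past input samples only."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def one_pole_iir(a, x):
    """IIR filter with feedback: y[n] = x[n] + a * y[n-1]."""
    y, previous = [], 0.0
    for sample in x:
        previous = sample + a * previous
        y.append(previous)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
fir_response = fir_filter([0.5, 0.5], impulse)   # ends after len(b) samples
iir_response = one_pole_iir(0.5, impulse)        # decays but never reaches exactly zero
```

The FIR response is simply the coefficient list, while the feedback in the IIR filter produces a geometrically decaying response of unlimited length.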

Linear prediction (AR-modeling)

Modeling of signal generation with a flat-spectrum excitation (impulse or noise) and an IIR (all-pole) filter. Speech example:

Signal

Windowed

FFT-spectrum

LP-spectra
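
One common way to estimate the all-pole filter is the autocorrelation method with the Levinson-Durbin recursion. The slides do not state which method was used for the figure, so this is an assumed, minimal sketch. The predictor polynomial is written as A(z) = 1 + a1 z^-1 + ... + ap z^-p, so that the model filter is 1 / A(z).

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns (a, error) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= (1.0 - k * k)                # remaining prediction error power
    return a, error

# Test signal: impulse response of the one-pole filter 1 / (1 - 0.9 z^-1)
signal = [0.9 ** n for n in range(200)]
coefficients, residual = levinson_durbin(autocorrelation(signal, 2), 2)
```

For this test signal the analysis recovers a1 ≈ −0.9 and a2 ≈ 0, i.e. the single pole that generated it. For speech, the signal is first windowed, as in the figure, and a higher order is used.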

Neural networks

MLF = multilayer feedforward network = multilayer perceptron

Input layer + hidden and output layer nodes with sigmoidal nonlinearity

Backpropagation algorithm for training

Hidden Markov models (HMM)

For probabilistic modeling of state sequences

Used especially in speech recognition

Audio reproduction: loudspeaker response

Magnitude response of a non-ideal loudspeaker

Group delay response of a loudspeaker

Reproduction quality: Distortion and SNR

Nonlinearity results in distortion: a sine wave input results in the generation of harmonic components A(i)

Distortion (usually given in %):

Signal-to-noise ratio (SNR):

Distortion in general is discussed in later chapters
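
The distortion and SNR formulas are missing from the slide text. The sketch below assumes one common definition of total harmonic distortion, the RMS sum of the harmonics A(2), A(3), ... relative to the fundamental A(1), and the usual amplitude-ratio form of SNR. Other definitions (for example, relative to the total signal) also exist, so treat this as one possible convention.

```python
import math

def thd_percent(harmonic_amplitudes):
    """THD in percent; harmonic_amplitudes = [A(1), A(2), A(3), ...]."""
    fundamental = harmonic_amplitudes[0]
    distortion = math.sqrt(sum(a * a for a in harmonic_amplitudes[1:]))
    return 100.0 * distortion / fundamental

def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)
```

For example, harmonics of 3 % and 4 % relative to the fundamental combine to a THD of 5 %.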

Response equalization

Non-flat magnitude response can be equalized (flattened) by digital filtering.

Example using so-called frequency-warped filters

Chapter 4: Speech and Music

• Speech communication

• Speech production:

– Speech production mechanism

– Vocal cords – phonation

– Vocal and nasal tract – articulation

– Units and notation of speech: vowels, consonants

– Prosody of speech

– Modeling of speech production

• Singing voice

• Speech processing: analysis, synthesis, coding, recognition

• Musical instruments as sound sources

• Music signal processing

– Sound synthesis techniques

– Physical modeling

– Digital audio vs. music

Speech communication chain

Speech production mechanism

Phonation and articulation

• Vocal cords (vocal folds): phonation

– Generation and control of voiced sound at the glottis

• Vocal tract and nasal tract: articulation

– Control of voice features by the articulation organs

• Concepts:

– Glottis (vocal cord opening)

– Voiced / unvoiced / combined

– Constriction

– Formant (and antiformant)

– Vowel / consonant

– Prosodic features

Units and notation of speech – Phonetics

• Phonetics: study and description of spoken language

• Languages and language families

– Indo-European, Finno-Ugric, …

• Phonetic alphabet:

– IPA (International Phonetic Alphabet)

– Computerized: SAMPA, Worldbet, ...

• Units of spoken language:

– Phoneme (smallest linguistic unit), abstract unit class

– Allophone (variant of a phoneme)

– Phone (äänne in Finnish), a concrete unit of speech

– Diphone (from the middle of a phone, via the transition, to the middle of the next one)

– Triphone (similar combination of three successive phones)

– Speech segment (typically subunit of a phone)

Vowels (Finnish)

• Front–back (etisyys: etu–taka)

• Open–closed (suppeus: väljä–suppea)

• Rounded–unrounded (pyöreä–lavea)

Consonants (Finnish)

• Articulation place (ääntämispaikka):

– Labial, dental, palatal, velar, laryngeal

• Articulation manner (ääntämistapa)

– Stop consonant (klusiili), fricative (frikatiivi), nasal (nasaali), trill (tremulantti), lateral (lateraali), semivowel (puolivokaali)

Prosody (suprasegmental features)

• Intonation (intonaatio)

– Primarily by fundamental frequency trajectory

• Stress (paino)

– Primarily by intensity (loudness) of pronunciation

• Timing (ajoitus)

– Rhythmic pattern (primarily by segment durations)

Modeling of speech production

• Simplification of the speech production mechanism

– Acoustic model

Circuit model (transmission-line model)

• Glottal oscillator

– Varying cross-section between vocal cords

• Vocal tract as a transmission line

– Two-directional wave propagation

• Lip radiation (acoustic load)

• Variables: pressure and volume velocity

Signal model = Source-Filter model

• Source = excitation

– (a) voiced = quasiperiodic excitation

– (b) unvoiced = noise-like excitation

• Filter = vocal and nasal tract

Glottal oscillation

• Phonation = vibration of vocal folds

– Glottal opening is a function of time:

• Open phase, closed phase

• Glottal closure event generates the main excitation to the vocal tract

Formants (tract resonances)

• Example: resonances of a homogeneous tube

– Volume velocity transfer function

– 17 cm tube corresponds to typical male vocal tract

– Quarter-wavelength resonator with resonances at f_n = (2n − 1) c / (4L), i.e. about 500, 1500, 2500 Hz, ... for L = 17 cm

Vocal tract transfer functions: vowel /i/

• Inhomogeneous vocal tract area profile /i/

– Constriction in frontal tract

– Cavity in the rear part of tract

– First formant down from neutral position

– Second formant up from neutral position

Radiation directivity of speech

• Omnidirectional at low frequencies

• Increased frontal directivity at high frequencies

Azimuth Elevation

Singing voice

• Classical singing style

– ’Singer’s formant’ around 3 kHz makes the voice more audible

– In soprano singing, the high fundamental frequency or a harmonic component should match a formant

• Singing in popular music

– Style and way of voice production are free, since amplification makes it loud anyway

– Personality of voice is important

Speech processing

• Speech analysis

– Feature analysis of speech signals

• Speech synthesis

– Typically synthesis from text

• Speech recognition

– From speech to text or commands

• Speech coding

– Compression for transmission or storage

• Speech enhancement

– Improving degraded speech signals

Formant synthesis models

• Cascaded and parallel filter models

Synthesis by waveform concatenation

• Overlap-add reconstruction of voiced speech

– Fundamental frequency (pitch) can be changed

Text-to-speech synthesis

• Transforming text to speech signal

– Language-dependent text processing

– Speech signal production quite language-independent

Text-to-speech synthesis

Speech coding

• Speech signal analysis

– Typically model-based (linear prediction), where source and filter parameters are analyzed from the speech signal

• Quantization of the parameters (bit compression)

• Transmission or storage of parametrized speech

• Reconstruction of parameters

• Reconstruction of speech signal

• Encoding -> transmission -> decoding

Speech recognition

• Feature analysis of signal

– Typically mel cepstral coefficients

– Compression of data & redundancy removal

• Pattern recognition

– Comparison to speech units

– Typically by Hidden Markov Models (HMM)

• Possible postprocessing

– Language modeling

• Formal grammar

• Unlimited text is difficult

Musical instrument sounds

• String instruments

– Plucked string instruments

– Struck string instruments

– Bowed string instruments

• Wind instruments

– Brass instruments

– Woodwind instruments

• Percussion instruments

– Drums etc.

Modeling of musical instruments (string modeling)

• String model

– Two-directional waveguide (transmission line)

– Excitation (pluck) inserted to both delay lines

– Wave reflections at terminations modeled as filters

– Output is taken at bridge or pickup, sum of both lines

– The same model is applicable to wind instrument bores (but there is a nonlinear oscillating feedback in them)

Simplified string modeling

• String model reduction (signal model)

– Two delay lines can be combined to one

– Filters in the loop can be combined to a single loop filter

– Computation is more efficient

– The so-called Karplus-Strong model is a simplified case where initial random noise is inserted in the delay line before synthesis and the loop filter is a simple two-tap FIR filter
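
The Karplus-Strong case can be sketched in a few lines: a delay line of about one period is filled with random noise, and each sample that leaves the line is replaced by the average of itself and the next sample (a two-tap FIR loop filter). The parameter names, the fixed random seed, and the rounding of the delay length to an integer are illustrative simplifications.

```python
import random

def karplus_strong(frequency_hz, duration_s, sample_rate=44100, seed=0):
    """Plucked-string tone: noise-filled delay line with an averaging loop filter."""
    rng = random.Random(seed)
    delay = max(2, int(round(sample_rate / frequency_hz)))   # loop length in samples
    line = [rng.uniform(-1.0, 1.0) for _ in range(delay)]    # initial random noise
    output = []
    for i in range(int(duration_s * sample_rate)):
        current = line[i % delay]
        following = line[(i + 1) % delay]
        output.append(current)
        line[i % delay] = 0.5 * (current + following)        # two-tap averaging filter
    return output

tone = karplus_strong(200.0, 0.5, sample_rate=8000, seed=1)
```

The averaging filter attenuates high frequencies on every round trip, so the tone starts bright and decays toward a mellow, nearly sinusoidal sound, much like a plucked string.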

Impulse response of a simple string model

• Impulse and magnitude responses of the previous model

Body response modeling

• String instrument body works like an LTI system (filter)

Impulse response

Magnitude response (low frequencies)

Chapter 5: Structure and Function of Hearing

• Peripheral hearing

– External ear

– Middle ear

– Inner ear (cochlea)

• Basilar membrane

• Hair cells

• Auditory nerve

• Active cochlea and nonlinearities

• Higher levels of the auditory system

• Basic properties of human hearing

– Effective hearing area (level vs. frequency)

– Equal loudness curves

– Technical measures related to hearing

• Sound level and frequency weighting functions

Approaches to hearing research

• Anatomy of hearing

– The structure of hearing organs is studied

• Physiology of hearing

– The (physiological) responses of hearing to physical sound stimuli are studied

• Psychology of hearing

– Functional properties of auditory perception are studied as subjects’ reactions to physical sound stimuli

• The main interest here is ’Engineering psychoacoustics’ and computational models of auditory functions

Peripheral hearing

• External ear (outer ear), middle ear, inner ear

Schematic of peripheral hearing

• External ear (outer ear), middle ear, inner ear

External ear and ear canal transmission

• Transfer functions

– Frontal sound source to the eardrum (solid line)

– Entrance of ear canal to the eardrum (dotted line)

• Head-related transfer functions (HRTFs) discussed later

Middle ear: Bone conduction

• Ossicles

– Malleus (hammer-shaped bone)

– Incus (anvil-shaped bone)

– Stapes (stirrup-shaped bone)

• Impedance match from air to liquid (1:3000)

Animations of middle ear function

Animations: University of Wisconsin http://www.neurophys.wisc.edu/~ychen/auditory/fs-auditory.html

Middle ear conduction and features

• Signal transfer function is a bandpass filter

• Other middle ear features:

– Acoustic reflex

– Eustachian tube

Inner ear: the cochlea

• Cochlea is a spiral-shaped, liquid-filled tube of about 2.7 turns, about 35 mm long

• Stapes vibration enters the cochlea through the oval window

• Another window to the middle ear is called the round window

• Basilar membrane divides the cochlea into two parts

Cochlea linearized

Cross-section of the cochlea

• Basilar membrane between bony shelves

– Division to scala vestibuli and scala tympani

• Reissner’s membrane separates scala media

• Organ of Corti: hair cells

• Tectorial membrane

Basilar membrane motion: traveling waves

• Basilar membrane is a nonhomogeneous transmission line:

– Wider and more massive towards apex

– Sound pressure entering the liquid of cochlea generates a traveling wave along the basilar membrane

– Traveling wave has maximum vibration amplitude at a position depending on the frequency of the wave (characteristic frequency = C.F.)

– High frequencies resonate close to the oval window and low frequencies close to the helicotrema

Animation of basilar membrane motion

Basilar membrane response to a square-wave signal

• Time–position–amplitude pattern of basilar membrane movement as a response to square-wave signal

Hair cells

• Inner hair cells, in one row

• Outer hair cells, in 3-5 rows

• Together about 15000 – 16000 hair cells

• Each hair cell is equipped on top with a u-, v-, or w-shaped bundle of filaments called stereocilia

• Neural fibers are connected to hair cells

Hair cells in the organ of Corti

Stereocilia (= ’hair bundles’ of hair cells)

Movement of the organ of Corti

Movement and activation of hair cells

Hair cells: neural conduction

• Vibration of the basilar membrane causes bending of stereocilia and this opens ion channels which modulate the potential within the cell

• Activation of the cell releases neurotransmitter to synaptic junctions between hair cell and neural fibers of the auditory nerve

• A neural spike is generated that propagates in the auditory nerve fiber

• Next spike possible only after at least 1 ms

Activation and inhibition of hair cells

• Asymmetrical effect of stereocilia bending on firing rate

• Cochlear potentials

Phase-locking and synchrony of neural firing

• Statistically phase-locked within half cycle

• Statistical synchrony of neural firing

Passive vs. active cochlea

• Georg von Békésy found basilar membrane behavior by experimentation with ears from dead animals

=> reduced frequency resolution

• Explanation: second filter needed

• Now it is known that the cochlea is active:

– Especially at low signal levels the outer hair cells amplify basilar membrane motion

• Outer hair cells receive many efferent neural fibers from higher neural levels

• Outer hair cells are able to change their length very rapidly (in synchrony with high audio frequencies)

• Otoacoustic emission (cochlear echo) as a response to external stimulus, recordable in the ear canal, is related to this phenomenon

Auditory nerve responses: firing rate

• Steady-state firing rate is a saturating function with spontaneous rate (= without sound excitation)

• There are fibers with different sensitivity (and spontaneous rate)

Poststimulus time histogram (PST)

• Firing rate overshoot and undershoot with onset and offset of excitation

– Works like automatic gain control

PST with steady-state sinusoidal excitation

• Statistically, half-wave rectification appears along with automatic gain control

Firing rate saturation for a vowel excitation

• For increasing level of excitation, the firing rate profile (’neural activation spectrum’) saturates

Tuning curves for constant firing level

• If the firing rate of a neural fiber is kept constant for varying excitation frequency, a tuning curve is obtained

• This characterizes the frequency selectivity of cochlea

Effects of active cochlea

• Low-level signals are amplified substantially by active cochlea:

– Sensitivity of hearing is increased

– Due to AGC-like compression, the narrow dynamic range (about 25 dB) of hair cells is expanded to more than 100 dB

• Selectivity (frequency resolution) is increased (especially at low signal levels) due to active function

• If outer hair cells are damaged, the active amplification is degraded or disappears

– Loss of auditory sensitivity

– Tuning curves are broadened

– Otoacoustic emissions disappear

Cochlear nonlinearity: Two-tone suppression

• Addition of another tone (shaded area in figure below) suppresses the activation due to probe tone at its characteristic frequency (= kind of masking)

Cochlear nonlinearity: Combination tones

• Nonlinear interaction of two tones generates new tones that are perceived:

– Difference tone: f_diff = f2 – f1

• E.g.: 1.1 kHz and 1.0 kHz => 100 Hz

– Cubic difference tone: f_cubic = 2f1 – f2

• E.g.: 1.0 kHz and 1.1 kHz => 900 Hz

• Appears already at low level of excitation
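
The two examples can be reproduced with a small helper, which takes f1 as the lower and f2 as the higher of the two primary tones, as in the formulas above. The function name and return format are illustrative.

```python
def combination_tones(tone_a_hz, tone_b_hz):
    """Difference tone (f2 - f1) and cubic difference tone (2*f1 - f2), with f1 < f2."""
    f1, f2 = sorted((tone_a_hz, tone_b_hz))
    return {"difference": f2 - f1, "cubic_difference": 2 * f1 - f2}

example = combination_tones(1000, 1100)
```

For primaries at 1000 Hz and 1100 Hz this gives 100 Hz and 900 Hz, matching the slide.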

Central auditory system

• Higher-level functions not known well.

• Cochlear nucleus has specific cells such as ’chopper cells’ that do temporal processing. Spectral information is recovered unsaturated.

• Binaural hearing starts at superior olive level.

• Auditory cortex is the center for processing perceptions and integrating the sound scene.

• Interaction with other senses (vision) strong.

Dynamic range of hearing

Sound level ’thermometer’

6 dB steps

3 dB steps

1 dB steps

Equal loudness curves and threshold of hearing

• Equal loudness level perception, unit phon = SPL at 1 kHz

Sound level and frequency weighting curves

• Weighting filters for sound level measurement (A most common)
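
The A-weighting curve can be evaluated from its analytic magnitude response. The sketch assumes the commonly cited form with corner frequencies 20.6 Hz, 107.7 Hz, 737.9 Hz and 12194 Hz and a +2.00 dB normalization, so that the weighting is 0 dB at 1 kHz. The function name is illustrative.

```python
import math

def a_weighting_db(frequency_hz):
    """A-weighting gain in dB at the given frequency (analytic magnitude response)."""
    f2 = frequency_hz * frequency_hz
    numerator = (12194.0 ** 2) * f2 * f2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(numerator / denominator) + 2.00
```

At 100 Hz the weighting is about −19 dB, reflecting the reduced sensitivity of hearing at low frequencies and moderate levels.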

Recommended frequencies and bands

• Recommended frequencies and frequency bands for measurements and technical applications:

• Octave = 2:1

• 1/2 octave

• 1/3 octave

Filtered noise demo

• White noise

• Low-pass filtered noise, decreasing cutoff frequency

• High-pass filtered noise, increasing cutoff frequency

• 1/3 octave noise, increasing center frequency

• White and pink noise

Chapter 6: Fundamentals of Psychoacoustics

• Psychoacoustics = auditory psychophysics

• Sound events vs. auditory events

– Sound stimuli types, psychophysical experiments

– Psychophysical functions

• Basic phenomena and concepts

– Masking effect

• Spectral masking, temporal masking

– Pitch perception and pitch scales

• Different pitch phenomena and scales

– Loudness formation

• Static and dynamic loudness

– Timbre

• as a multidimensional perceptual attribute

– Subjective duration of sound

Psychophysical experimentation

• Sound events (s_i) = physical (objective) events

• Auditory events (h_i) = subject’s internal events

– Need to be studied indirectly from reactions (b_i)

• Psychophysical function h = f(s)

• Reaction function b = f(h)

Sound events: Stimulus signals

• Elementary sounds

– Sinusoidal tones

– Amplitude- and frequency-modulated tones

– Sinusoidal bursts

– Sine-wave sweeps, chirps, and warble tones

– Single impulses and pulses, pulse trains

– Noise (white, pink, uniform masking noise)

– Modulated noise, noise bursts

– Tone combinations (consisting of partials)

• Complex sounds

– Combination tones, noise, and pulses

– Speech sounds (natural, synthetic)

– Musical sounds (natural, synthetic)

– Reverberant sounds

– Environmental sounds (nature, man-made noise)

Sound generation and experiment environment

• Reproduction techniques

– Natural acoustic sounds (repeatability problems)

– Loudspeaker reproduction

– Headphone reproduction

• Reproduction environment

– Not critical in headphone reproduction

– Anechoic chamber (free field)

• Room effects minimized

• Not a natural environment

– Listening room

• Carefully designed, relatively normal acoustics

– Reverberation chamber

• Special experiments with diffuse sound field

Psychophysical functions

• Sound event property to auditory event property mapping

h = a log(s) (Weber-Fechner law)

h = c s^k (e.g., loudness)

Experimental concepts: Thresholds

• Threshold values

– Absolute thresholds (e.g., threshold of hearing)

– Difference thresholds (just noticeable difference, JND)

Example: Threshold of perception:

- 50%, 75%, etc. thresholds

Experimental concepts

• Comparison of percepts

– Magnitude estimation

– Magnitude production

• Probe tone method

– Generation of a probe tone to make test tone audible/noticeable

– Modulation, canceling, interference

• Classification and scaling of percepts

– Nominal scale (rough, sharp, reverberant, …)

– Ordinal scale (percepts have ordering)

– Interval scale (numeric scale, no zero point defined)

– Ratio scale (numeric scale, zero point defined)

• Multidimensional scaling

– Semantic differentials: low – high, dull – sharp, ...

Psychoacoustic experiments

• Description of auditory events

– Oral or written description

• Method of adjustment

– Adjusting a stimulus to correspond to a reference

• Selection methods

– Forced choice methods (select one!):

• Two alternative forced choice (TAFC, 2AFC)

• Method of tracking

– Tracking with varying stimulus

• Békésy audiometry

• Bracketing method

– Descending and ascending bracketing

• Yes/no answering

• Reaction time measurement

– Indicates the difficulty of decision task

Békésy audiometry

• Slow frequency sweep and level tracking

Typical psychoacoustical test types

• AB test

– Set in preference order / select one

– AB hidden reference (one must be recognized)

• AB scale test

– As AB but assign numeric values for A and B

• ABC test

– A is fixed reference (anchor point) for assigning values for B and C

• ABX test

– Which one, A or B, is equal to X?

• TAFC (2AFC)

– Two alternative forced choice

• Formation of a listening test panel

• Formation of a description language

Masking effect

• ”A loud sound makes a weaker sound imperceptible”

• Categories and aspects of masking

– Frequency masking

– Temporal masking

– Time-frequency masking

– Frequency selectivity of the auditory system

– Psychophysical tuning curves

– Critical band

• Bark bandwidth

• ERB bandwidth

• Masking tone and test tone

Frequency masking

• Masking by white noise

Frequency masking

• Masking by narrow-band noise (0.25, 1, 4 kHz)

Frequency masking

• Frequency masking as a function of masker level

Frequency masking

• Frequency masking by lowpass and highpass noise

Frequency masking

• Frequency masking by 1 kHz sinusoidal signal

Frequency masking

• Frequency masking by a complex tone (harmonic complex)

Temporal masking

• Masking before and after a noise signal

Temporal masking

• Beginning of postmasking

Temporal masking

• Postmasking as a function of time

– For 200 ms long masker

– For 5 ms long masker

Time-frequency masking

• Masking of a tone burst in time and frequency by a time-frequency block of noise

Temporal masking

• Masking due to an impulse train

Frequency selectivity of hearing

• Masking curves tell much about auditory selectivity

• Psychophysical tuning curves match with physiological curves

Critical band experiment

• Experiment: loudness vs. bandwidth of noise

Critical band

• Loudness vs. bandwidth of noise

– Loudness increases when bandwidth exceeds a critical band

Critical band (Bark band) vs. frequency

• Critical band (Bark band) fG vs. mid frequency

• Ref: just noticeable tone frequency change vs. frequency

Critical band: 24 Bark bands (Zwicker)

ERB band experiment

• ERB = Equivalent Rectangular Bandwidth

• Loudness of a tone is measured as a function of frequency gap in masking noise around the test tone

• ERB band is narrower than Bark band, especially at low frequencies

Pitch scales

• Pitch = subjective measure of tone height

• Mel scale

• Bark scale

• ERB scale

[Equations: formulas for the scales above, alternative forms, and inverse functions]
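
The scale formulas did not survive on this slide. The sketch below uses commonly cited approximations, which may differ in detail from the ones originally shown: the mel scale as 2595 log10(1 + f/700), the Bark scale in the Zwicker and Terhardt arctangent form, and the ERB-rate scale as 21.4 log10(4.37 f/1000 + 1) with bandwidth 24.7 (4.37 f/1000 + 1) Hz. Function names are illustrative.

```python
import math

def hz_to_mel(f):
    """Mel scale approximation (about 1000 mel at 1 kHz)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_bark(f):
    """Bark (critical band rate) approximation, arctangent form."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def hz_to_erb_rate(f):
    """ERB-rate scale approximation."""
    return 21.4 * math.log10(4.37e-3 * f + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 4.37e-3

def erb_bandwidth_hz(f):
    """Equivalent rectangular bandwidth in Hz at center frequency f."""
    return 24.7 * (4.37e-3 * f + 1.0)
```

At 1 kHz these give about 1000 mel, 8.5 Bark, and an ERB of about 133 Hz, compared with a Bark-band width of roughly 160 Hz.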

Logarithmic pitch scale

• Logarithmic scale used in music and audio

• Frequency ratios more important than absolute frequencies

• Octave and ratios of small integers important

Comparison of pitch scales

• Pitch scales are related to place coding on the basilar membrane, although they are measured by psychoacoustic experiments


• Comparison (log reference) of:

– logarithmic scale

– ERB scale

– Bark scale

– linear scale


• Comparison (linear reference) of:

– logarithmic scale

– ERB scale

– Bark scale

– linear scale

Pitch phenomena

• Pitch of a pure tone as a function of amplitude

– Individually varying property

JND of frequency modulation

• Frequency modulation JND threshold

– As a function of carrier frequency

– As a function of modulation frequency

– About 4 Hz modulation most easily perceivable

Minimum duration of a tone for pitch percept

• Duration to make pitch perceivable

– Duration in milliseconds

– Duration of two cycles as a reference

JND pitch change vs. tone duration

• Threshold of perceived pitch variation increases below 200 ms duration

Pitch strength

• How strong or weak is a pitch percept?

Pitch phenomena and theories

• Place (spectral) pitch vs. temporal pitch theories

• Spectral pitch (due to spectral peak)

• Temporal pitch (periodicity)

• Missing fundamental

• Virtual pitch

• Repetition pitch

• Pitch of inharmonic signals

• Absolute pitch (memory)

Loudness

• Loudness is the perceived subjective ’strength’ (’volume’, ’intensity’, etc.) of a sound

– Subjective scale defined in relation to physical scale

– Unit is sone: 1 sone = loudness of a 1 kHz tone at 40 dB SPL

Loudness of a sinusoidal tone

• Loudness N vs. SPL of a 1 kHz tone

– Power law found to match best

Power law:

More precisely:

Loudness vs. loudness level:
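
The formulas on this slide did not survive extraction. A commonly used relation between loudness N (in sone) and loudness level L_N (in phon), valid above about 40 phon, is N = 2^((L_N − 40) / 10). The sketch assumes this textbook form rather than the slide's exact expressions; function names are illustrative.

```python
import math

def phon_to_sone(loudness_level_phon):
    """Loudness in sone; loudness doubles for every 10 phon (valid above about 40 phon)."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)

def sone_to_phon(loudness_sone):
    """Inverse mapping: loudness level in phon for a loudness in sone."""
    return 40.0 + 10.0 * math.log2(loudness_sone)
```

For example, 40 phon corresponds to 1 sone, 50 phon to 2 sone, and 60 phon to 4 sone.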

Partial loudness (by noise masking)

• Partial loudness of 1 kHz tone in presence of masking noise

– As a function of tone level and masking noise level

Loudness example: two tones

• Loudness of a pair of tones as a function of frequency difference

– Slow beat range: loudness due to peaks (6 dB over 60 dB)

– Medium rate fluctuation: power doubled => 3 dB increase

– Fast fluctuation: wideband signal => loudness doubled (10 dB)


Loudness computation (Zwicker formulation)

• Excitation signal => power spectral density on the Bark scale

• Spreading function B(z), such as

• Convolution by spreading function

• Loudness density

• Total loudness
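The steps listed above can be sketched in a few lines. This is a simplified illustration, not the slide's formulation: the spreading function is assumed to be the Schroeder form, the loudness density is assumed to be a plain power law of excitation with exponent 0.23, and threshold and absolute calibration are ignored, so the result is in arbitrary units. All names are hypothetical.

```python
import numpy as np

def schroeder_spreading_db(dz):
    """Spreading function in dB for a Bark distance dz (assumed Schroeder form)."""
    dz = np.asarray(dz, dtype=float)
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)

def loudness_sketch(band_power, dz=1.0, exponent=0.23):
    """Return (total loudness, loudness density) in arbitrary units.

    band_power: linear power per Bark band, one value per band.
    """
    band_power = np.asarray(band_power, dtype=float)
    n = len(band_power)
    offsets = np.arange(-(n - 1), n) * dz
    kernel = 10.0 ** (schroeder_spreading_db(offsets) / 10.0)
    # Convolution by the spreading function gives the excitation pattern.
    excitation = np.convolve(band_power, kernel, mode="full")[n - 1:2 * n - 1]
    # Compressive power law gives the loudness density per Bark.
    density = excitation ** exponent
    # Total loudness is the integral of the density over the Bark scale.
    return density.sum() * dz, density

n_bands = 24
tone = np.zeros(n_bands)
tone[8] = 1.0                                # all power in one band
noise = np.full(n_bands, 1.0 / n_bands)      # same power spread over all bands
tone_loudness, _ = loudness_sketch(tone)
noise_loudness, _ = loudness_sketch(noise)
print(tone_loudness, noise_loudness)
```

With equal total power, the wideband signal comes out louder than the tone, which is the qualitative behavior described for the two-tone example.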


Loudness computation, examples

• Left: excitation level for sinusoidal tone and white noise

• Right: loudness density for sinusoidal and white noise


Loudness graphically

• Graphical chart determination of loudness (Zwicker)


JND of loudness level

• Just noticeable difference by amplitude modulation

– Modulation of 1 kHz tone

– Modulation of white noise

– Modulation frequency 4 Hz


JND of loudness level

• Just noticeable difference by amplitude modulation

– As a function of modulation frequency

– Modulation of 1 kHz tone

– Modulation of white noise


Modulation detection

• Detection of amplitude and frequency modulation

– Amplitude modulation easily detectable by 'off-band listening' (loudness modulated due to upper spreading slope variation)

– No slope variation in frequency modulation


Loudness vs. duration

• Temporal integration of loudness for duration < 200 ms

– Loudness level decreases by 10 phon for a 10-fold decrease in duration


Loudness formation temporally

• Loudness formation for different durations of a tone burst

– Peak value of total loudness is tracked in time-varying cases


Timbre (perceived ’sound color’)

• Timbre is a multidimensional attribute of sound

– For stationary sounds:

• Spectrum: (loudness spectrum)

• Periodicity (periodic, multiperiodic, noise-like)

• Repetitiveness (reflections, reverberation, spatialness)

– For time-varying signals

• Amplitude envelope important

– Amplitude envelope at each critical band

– For transients and onsets

• Changes are more prominent than steady-state parts, especially onsets


Subjective duration

• Subjective vs. objective duration


Auditory Demonstrations 1

1 Cancelled harmonics

2-6 Critical bands by masking

7 C.B. by loudness comparison

8-11 The decibel scale

12-16 Filtered noise

17-18 Frequency response of the ear

19-20 Loudness scaling

21 Temporal integration

22 Asymmetry of masking by pulsed tones

23-25 Backward and forward masking

26 Pulsation threshold



27-28 Dependence of pitch on intensity

29 Pitch salience and tone duration

30 Influence of masking noise on pitch

31 Octave matching

32 Stretched and compressed scales

33 Frequency difference limen

34-35 Log and lin frequency scales

36 Pitch streaming

37 Virtual pitch (missing fundamental)

38-39 Shift of virtual pitch

40-42 Masking spectral and virtual pitch



43-45 Virtual pitch with random harmonics

46-47 Strike note of chime

48 Analytic vs synthetic pitch

49-51 Scales with repetition pitch

52 Circularity in pitch judgment

53 Effect of spectrum on timbre

54-56 Effect of tone envelope on timbre

57 Change in timbre with transposition

58-61 Tones and tuning with stretched partials

62-63 Primary and secondary beats


Chapter 7: Other psychoacoustic concepts

• Sharpness

– Spectral center of gravity

• Fluctuation strength

– Perception of slow modulations (beats)

• Impulsiveness

• Roughness

– Perception of fast modulations

• Tonality

– Periodic vs. random excitation

• Sensory pleasantness

• Psychoacoustic concepts and music

– Sensory consonance and dissonance

– Intervals, scales, and tunings

– Rhythm, tempo, bar, measure

• Perceptual organization of sound


Sharpness

• Perceived sharpness is proportional to spectral center of gravity

• Unit of sharpness is 1 acum: narrowband noise, 1 Bark wide, centered at 1 kHz, at 60 dB

• Sharpness for 1 Bark wide noise, lowpass noise, and highpass noise

• Increase of level from 30 dB to 90 dB doubles the sharpness

Bandpass noises:


Computation of sharpness

• Sharpness can be estimated (without level effect) from

where the weighting function is defined by the curve:
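The slide's equation and weighting curve were not preserved. A minimal sketch of a sharpness estimate as a weighted spectral center of gravity follows. It assumes the Zwicker-style form S = 0.11 * sum(N' g(z) z) / sum(N'), with the weighting g(z) approximated as in DIN 45692; both the constant and the weighting are assumptions to check against the original figure.

```python
import numpy as np

def sharpness_weight(z):
    """Weighting g(z): 1 up to 15.8 Bark, rising above (assumed DIN 45692 form)."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 15.8, 1.0, 0.15 * np.exp(0.42 * (z - 15.8)) + 0.85)

def sharpness_acum(loudness_density, z):
    """Weighted first moment of the loudness density over the Bark scale."""
    n = np.asarray(loudness_density, dtype=float)
    z = np.asarray(z, dtype=float)
    return 0.11 * np.sum(n * sharpness_weight(z) * z) / np.sum(n)

z = np.arange(0.5, 24.0, 1.0)        # band centers in Bark
low = np.zeros_like(z)
low[8] = 1.0                         # energy around 8.5 Bark (about 1 kHz)
high = np.zeros_like(z)
high[20] = 1.0                       # energy around 20.5 Bark
print(sharpness_acum(low, z), sharpness_acum(high, z))
```

A band near 1 kHz gives a value close to 1 acum, consistent with the unit definition, and moving the energy upward raises the estimate.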


Fluctuation strength

• Perception of relatively slow modulations: fluctuation strength

• Highest sensitivity to modulation at 4 Hz

• Unit of fluctuation strength is 1 vacil: a 1 kHz tone at 60 dB, 100 % modulated at 4 Hz

• Figure: (a) AM broadband noise, (b) AM sinusoidal tone, (c) FM sinusoidal tone

(Examples: modulation at 1 Hz, 4 Hz, and 16 Hz)



• Left: fluctuation strength for AM (4 Hz) wideband noise (60 dB)

• Right: sine tone, 1.5 kHz, 70 dB, modulated at 4 Hz, as a function of FM deviation



• Fluctuation strength computation:


Impulsiveness

• There is no clearly defined psychoacoustic concept of impulsiveness

• Impulsiveness is related to rapid onsets in signal

• If the repetition rate of impulses is > 10–15 Hz, roughness is perceived

• In noise control, impulsiveness is considered to increase hearing damage risk compared to non-impulsive sound of the same energy


Roughness

• Fast (> 15 Hz) modulation is perceived as roughness

• Addition of two tones of different frequencies creates envelope fluctuation

• When the frequency difference increases, tones start to segregate

• When the frequency difference is larger than a critical band, roughness disappears

(Examples: 1 kHz + Δf, with Δf = 7 Hz, 70 Hz, and 300 Hz)
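The three regimes described above can be written as a small decision rule. This is only a sketch: the 15 Hz boundary is taken from the slide, while the critical bandwidth uses the Zwicker-Terhardt approximation, which is an assumption, and real percepts change gradually rather than at sharp limits. The function names are hypothetical.

```python
def critical_bandwidth_hz(f):
    """Critical bandwidth in Hz at center frequency f (assumed Zwicker-Terhardt form)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69

def two_tone_percept(f1, f2, roughness_onset_hz=15.0):
    """Rough classification of the percept of two simultaneous sine tones."""
    df = abs(f2 - f1)                 # the envelope fluctuates at the difference frequency
    fc = 0.5 * (f1 + f2)
    if df < roughness_onset_hz:
        return "fluctuation"
    if df < critical_bandwidth_hz(fc):
        return "roughness"
    return "separate tones"

for df in (7.0, 70.0, 300.0):
    print(df, "Hz:", two_tone_percept(1000.0, 1000.0 + df))
```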


Roughness

• Unit of roughness is 1 asper: a 1 kHz tone at 60 dB, 100 % amplitude modulated at 70 Hz

• Towards lower and higher modulation frequencies the roughness decreases


Roughness

• Roughness for different carrier frequencies as a function of AM modulation frequency with 100 % modulation

(Examples: 1 kHz + Δf, with Δf = 7 Hz, 70 Hz, and 300 Hz)


Tonality

• Tonality (tonalness) = sound exhibits voiced component(s), periodicity

• Non-tonal sound is noise-like, non-periodic

• Non-tonal (noisy) signal masks a tonal one more easily than vice versa

• For tonality index α and critical band index i, the masking threshold is:

– (α = 0.0: non-tonal, α = 0.5: half-tonal, α = 1: fully tonal)

• Tonality with varying modal density, log. distribution of frequencies (approx. per critical band):

80/CB, 40/CB, 20/CB, 10/CB
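The threshold formula itself is missing from the extracted slide. One widely used form that fits the description is the offset from Johnston's perceptual masking model, sketched below with the tonality index written as `alpha`. It is an assumption that the slide used this exact expression.

```python
def masking_offset_db(alpha, band_index):
    """Offset in dB of the masking threshold below the excitation level.

    alpha: tonality index, 0 (noise-like) to 1 (fully tonal).
    band_index: critical band index i.
    Assumed form: a tonal masker is offset by (14.5 + i) dB, a noise masker by 5.5 dB.
    """
    return alpha * (14.5 + band_index) + (1.0 - alpha) * 5.5

for alpha in (0.0, 0.5, 1.0):
    print(alpha, masking_offset_db(alpha, band_index=10))
```

The smaller offset for noise-like maskers reflects the statement above that a noisy signal masks a tonal one more easily than vice versa.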


Sensory pleasantness

• Sensory pleasantness (example by Zwicker):

– P = sensory pleasantness

– S = sharpness

– R = roughness

– T = tonality

– N = loudness

– Product sound quality measures are often constructed by similar techniques.
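The example equation was lost in extraction. The sketch below uses the relation as it is commonly quoted from Zwicker and Fastl, with all quantities expressed relative to reference values; the coefficients are reproduced from memory of that source and should be treated as an assumption to verify.

```python
import math

def relative_pleasantness(r, s, t, n):
    """P/P0 from relative roughness r, sharpness s, tonality t, and loudness n.

    Assumed form:
    P/P0 = exp(-0.7 r) * exp(-1.08 s) * (1.24 - exp(-2.43 t)) * exp(-(0.023 n)^2)
    """
    return (math.exp(-0.7 * r)
            * math.exp(-1.08 * s)
            * (1.24 - math.exp(-2.43 * t))
            * math.exp(-(0.023 * n) ** 2))

base = relative_pleasantness(1.0, 1.0, 1.0, 1.0)
rougher = relative_pleasantness(2.0, 1.0, 1.0, 1.0)
more_tonal = relative_pleasantness(1.0, 1.0, 2.0, 1.0)
print(base, rougher, more_tonal)
```

In this form roughness, sharpness, and loudness reduce pleasantness while tonality increases it, which is the general pattern such product sound quality measures follow.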


Sensory consonance and dissonance

• Consonance and dissonance are closely related to roughness

• Consonance vs. dissonance of two partials:


Consonance and dissonance of harmonic tones

• Roughness due to interaction of partials in a sound contributes to dissonance

• Ratios of small integers are most consonant (just intonation)

• Consonance vs. dissonance of two harmonic complexes:


Examples of intervals

• Pythagoras noticed that intervals 2:1, 3:2, and 4:3 sound "pleasant"

• Consonant intervals (decreasing order of consonance):

– 2:1 octave

– 3:2 perfect fifth

– 4:3 perfect fourth

– 5:3 major sixth

– 5:4 major third

– 8:5 minor sixth

– 6:5 minor third

– 16:15 (dissonant)

– 40:27 (dissonant)

• Equally tempered intervals: fifth 1.4983, third 1.2599


Examples of intervals

• Log and lin uniformly spaced scales

• Which one is the best octave?

• Stretched and compressed scales

Octave and its partitioning

Circularity of pitch

• Shepard effect


Intervals, scales, tuning

• Just intonation, Pythagorean scale, (equally) tempered scale

• On a tempered scale a semitone is 1:1.05946

• 1 cent is 1/100 of a semitone
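The numbers above follow directly from dividing the octave into 12 equal ratio steps. A short worked example, with a hypothetical helper `cents`, compares just and equally tempered intervals:

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)        # equally tempered semitone ratio

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

just_fifth = 3.0 / 2.0
tempered_fifth = SEMITONE ** 7        # 7 semitones
just_third = 5.0 / 4.0
tempered_third = SEMITONE ** 4        # 4 semitones

print(round(SEMITONE, 5))
print(round(tempered_fifth, 4), round(cents(just_fifth) - cents(tempered_fifth), 2))
print(round(tempered_third, 4), round(cents(just_third) - cents(tempered_third), 2))
```

The tempered fifth is within about 2 cents of the just fifth, whereas the tempered major third is roughly 14 cents wide of the just 5:4 ratio.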


Non-western scales and tunings

• The (tempered) western scale is adapted to a multitude of harmonic timbres of western instruments

• For example the Balinese gamelan music is quite different

– W. A. Sethares: Tuning, Timbre, Spectrum, Scale. Springer 1998

• Example of tuning where octave is a very dissonant interval!

• Tunings and musical scales are strongly bound with spectral properties of musical instruments


Temporal structures in music: Rhythm, tempo

• Rhythm: periodicity and repeated structure in music

• Tempo: rate of main events in music

• Beat: positioning of emphasis on some events

• Measure: basic rhythmic sequence

• Duration of a note or another basic unit


Perception of magnitude and phase spectrum

• Magnitude

– A 1 dB deviation per critical band is noticeable in direct comparison. Even smaller deviations can be noticed by trained "golden ears"

– Even ± 3...5 dB deviations are not easy to "perceive" when there is no immediate reference (except for well trained listeners)

– Magnitude response deviations = spectral coloration

• Phase and time differences

– The auditory system is relatively insensitive to phase (Helmholtz). In general, the magnitude spectrum is more important than the phase spectrum, but sometimes phase is important

– Phase functions from Fourier analysis are circular and difficult to analyze and interpret

– Group delay (phase derivative) is a relatively good perceptual measure which describes the delay of modulation (not the carrier)
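Group delay is the negative derivative of the unwrapped phase with respect to angular frequency. The sketch below, assuming NumPy and a hypothetical helper `group_delay_s`, checks this on a pure delay, where the result should equal the delay at every frequency.

```python
import numpy as np

def group_delay_s(freq_response, freqs_hz):
    """Group delay in seconds: minus d(phase)/d(omega), from sampled complex response."""
    phase = np.unwrap(np.angle(freq_response))   # remove the 2*pi jumps first
    omega = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float)
    return -np.gradient(phase, omega)

freqs = np.linspace(0.0, 4000.0, 801)            # 5 Hz spacing
delay = 0.002                                    # a pure 2 ms delay
response = np.exp(-1j * 2.0 * np.pi * freqs * delay)
tau = group_delay_s(response, freqs)
print(tau.min(), tau.max())
```

The unwrapping step is what makes the circular Fourier phase usable; the frequency grid must be dense enough that the phase changes by less than pi between neighboring points.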


• Special phase effects:

– The following two signals have the same magnitude spectrum but sound (as well as look) different

Perception of phase: extreme cases

This is what the response looks like in a single critical band


Perceptual organization of sound

• Streaming (sequential grouping) of pitch sequences:

– Slow repetition: one stream perceived

– Fast repetition: segregation into two separate streams

[Figure: tone sequence A to F over time; (a) slow repetition, heard as one stream; (b) fast repetition, heard as two streams (A-C-E and B-D-F)]



• Streaming may change also the perceived rhythm:

– Large separation: B-D-F vs. A-C-E

– Small separation: B-D vs. A-C-E-F

[Figure: upper and lower streams over time for large and small frequency separation of the tones A to F]



• Streaming with increasing tempo

[Figure: increasing tempo or frequency difference leads to segregation of multiple streams and, further, to timbre/texture; horizontal axis: time]



• Streaming or segregation as a function of frequency difference and repetition period

[Figure: regions "always coherent", "separated or coherent", and "always separated"; horizontal axis: repetition period (msec)]


Auditory scene analysis

• Auditory scene analysis

– Bregman: Auditory scene analysis (MIT Press, 1990)

• Sequential integration and segregation

– Spectral vs. temporal relations

– Spatial cues in segregation

• Integration and segregation of simultaneous auditory components

– Spectral vs. temporal relations

– The ”old-plus-new” heuristics

– Spatial cues in segregation

• Primitive auditory organization

– Built-in and low-level mechanisms

• Schema-based auditory organization

– Learning of stream integration and segregation


Computational auditory scene analysis (CASA)

• Computational auditory scene analysis (CASA) is an attempt to computationally simulate and model human auditory scene analysis

– Sound source segregation (separation)

– Multipitch signal analysis of harmonic sound mixtures

– Bottom-up vs. top-down driven processing

– Prediction-driven processing

– Spatial source separation (cocktail-party effect)

– Applications:

• Audio content analysis and content-based coding

• Automatic music transcription

• Speech recognition

Spatial hearing

Ville Pulkki, Laboratory of Acoustics and Audio Signal Processing

Helsinki University of Technology, Espoo, Finland

http://www.acoustics.hut.fi/

TKK, Laboratory of Acoustics and Audio Signal Processing, 26.3.2002

Sound in space


Spatial hearing

Directional hearing

• Accuracy of directional hearing

• Theory of directional hearing

Distance hearing

Perception of space

Spatial sound reproduction



Transfer function from the sound source to the ear canal

Head Related Impulse Response (HRIR)

Head Related Transfer Function (HRTF)

© Duda: http://interface.cipic.ucdavis.edu/CIL tutorial/



Measuring HRTFs

© Algazi et al.: http://interface.cipic.ucdavis.edu/



Dependence of the HRTF on the direction of the sound source

[Figure: head-related impulse responses of the left and right ear over 0 to 2 ms, panels a) to c), for the directions (φ = 0, δ = 0), (φ = 60, δ = 0), and (φ = 0, δ = 60)]

© M. Karjalainen



Dependence of the HRTF on the azimuth of the sound source

Dependence of the HRTF on the elevation of the sound source




Dependence of the HRTF on the direction of the sound source

[Figure: HRTF magnitude responses of the left and right ear in dB (0 to −40 dB) over 100 Hz to 10 kHz, panels a) to c), for the directions (φ = 0, δ = 0), (φ = 60, δ = 0), and (φ = 0, δ = 60)]

© M. Karjalainen



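Accuracy of directional hearing in the horizontal plane

[Figure: direction of the auditory event for sound events at azimuth φ = 0°, 90°, 180°, and 270°]

• Sound event at 0°: auditory event at 359° ± 3.6°

• Sound event at 90°: auditory event at 80.7° ± 9.2°

• Sound event at 180°: auditory event at 179.3° ± 5.5°

• Sound event at 270°: auditory event at 281.6° ± 10°

© M. Karjalainen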



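Accuracy of directional hearing in the median plane

[Figure: direction of the sound event vs. direction of the auditory event in the median plane, for sound events at elevation δ = 0°, 36°, and 90° in front (φ = 0°) and behind (φ = 180°); values marked in the figure: +68°, +74°, ±22°, ±13°, ±9°, +27° ± 15°, +30° ± 10°]

© M. Karjalainen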



Lateralization experiments

[Figure: a), b) test setups in which the signal is fed to the two ears through delay circuits (τph1, τph2) and attenuators (a1, a2)]

© M. Karjalainen



Lateralization experiments, time delay

[Figure: perceived lateral position (left to right) as a function of the interaural phase delay τph in μs, from left ear earlier to left ear later]

© M. Karjalainen



Lateralization experiments, properties

Advantages:

• Any ITD-ILD combination can be produced freely

• Basic results

Problems:

• Unnaturalness

• Inside-the-head localization

• Reproduction of high frequencies differs between listening sessions



Directional hearing

Cues:

• Binaural cues

– Interaural time difference

– Interaural level difference

• Monaural spectrum

• Effect of head rotation on binaural cues

• Suppression of reflections



Binaural cues

• Interaural time difference (ITD)



Frequency dependence of the ITD

[Figure: left and right ear signals with the ITD marked]

• Low frequencies, ~200 to ~1600 Hz: time delay of the carrier

• High frequencies, > ~1600 Hz: time delay of the envelope



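Modeling of the ITD

[Figure: block diagram. Each ear signal passes through a gammatone filterbank (GTFB), half-wave rectification, and low-pass filtering; the interaural cross-correlation (IACC) is computed per band, giving the IACC spectrum, the composite IACC, and the ITD]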
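The core of such an ITD model is a cross-correlation between the two ear signals. The broadband sketch below, assuming NumPy, leaves out the gammatone filterbank, rectification, and low-pass stages and only shows the lag search; the function name and the sign convention (positive when the left ear leads) are choices made here for illustration.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd_s=1e-3):
    """Estimate the ITD in seconds as the lag maximizing the cross-correlation.

    Positive result: the right-ear signal is delayed, i.e. the left ear leads.
    """
    max_lag = int(round(max_itd_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    core = slice(max_lag, len(left) - max_lag)   # avoid wrap-around from np.roll
    xcorr = [np.dot(left[core], np.roll(right, -lag)[core]) for lag in lags]
    return lags[int(np.argmax(xcorr))] / fs

fs = 48000
rng = np.random.default_rng(0)
left = rng.standard_normal(4800)
delay_samples = 24                               # 0.5 ms at 48 kHz
right = np.concatenate([np.zeros(delay_samples), left[:-delay_samples]])
itd = estimate_itd(left, right, fs)
print(itd)
```

Restricting the search to about ±1 ms reflects the range of delays a human head can produce; a per-band version would run the same search on each filterbank channel.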



Cross-correlation on ERB channels

[Figure: band cross-correlation functions on ERB channels between 200 Hz and 10 kHz, for source directions from 0° to 90°]



Frequency dependence of the ITD

[Figure: ITD [ms] as a function of direction [degree], −90 to 90, and frequency [kHz], 0.2 to 18.2]



Binaural cues

• Interaural level difference (ILD)



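Modeling of the ILD

[Figure: block diagram. Each ear signal passes through a gammatone filterbank (GTFB) to give a loudness level (LL) spectrum and a composite loudness level (CLL); the ILD spectrum and the composite ILD are obtained from the left and right values]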



Frequency dependence of the ILD

[Figure: ILD [phon] as a function of direction [degree], −90 to 90, and frequency [kHz], 0.2 to 18.2]



Cone of confusion

[Figure: cone of confusion with the sound source and the cone angles ccφ and ccθ]

• ITD and ILD determine which cone of confusion the sound source lies on

– Effect of the pinna and the body

– Head rotation



Effect of head rotation on binaural cues

[Figure: effect of rotating the head for different source positions: ITD & ILD change a lot; ITD & ILD change a lot in the opposite direction; ITD & ILD constant]

– A coarse cue



Effect of the body

Pinna, head, body

The spectrum changes, the ILD changes



Effect of the pinna

• The cavities of the pinna color the sound depending on the direction of arrival



Effect of elevation on the spectrum

[Figure: panels 1 and 2, loudness level spectrum [phon], −30 to 30, as a function of frequency [kHz], 0.2 to 18.2, and elevation [degr], −30 to 90]

Auditory spectrum in the median plane, with the direction-independent part averaged out.



Effect of elevation on the spectrum

[Figure: panels 3 and 4, loudness level spectrum [phon], −30 to 30, as a function of frequency [kHz], 0.2 to 18.2, and elevation [degr], −30 to 90]

Auditory spectrum in the median plane, with the direction-independent part averaged out.



Reliability of cues

If the cues conflict:

• Signal spectrum < ~1000 Hz

– ITD usually the strongest

– ILD weak, trading?

• Higher frequencies

– ITD and ILD both strong

– ILD sometimes stronger

• The more consistent cue wins [Wightman]

• Several directional percepts may arise

• Size of the sound source

• Individuality



Precedence effect

The cues are relevant only when the direct sound dominates



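Precedence effect

[Figure: loudspeakers SO and ST at φ = ±40° (α = 80°); direction of the first auditory event as a function of the delay τph of ST, over 0 to 2 ms and 20 to 50 ms, with the echo threshold and the region where an echo is heard]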



Echo detection thresholds

[Figure: level difference L_ST − L_SO (−40 to 40 dB) as a function of the delay of ST (0 to 100 ms), with the following curves]

• First sound event no longer distinguishable (primary sound suppressed) (≥ 6 persons)

• First sound event and echo equally loud (≥ 6 persons)

• Echo disturbing (80 persons)

• Masking threshold (1-2 persons)



Effect of loudness in a free field

[Figure: distance of the auditory event / m as a function of the distance of the sound source / m, 0 to 10 m; average of five persons]

© M. Karjalainen



Distance perception

Cues

• Loudness

• Binaural cues

• Ratio of direct sound to reverberant field

• Spectrum



Spatial sound reproduction methods

• Traditional reproduction

– Monophony

– Stereophony

– Multichannel 2-D

– Multichannel 3-D

• Binaural reproduction

– Headphones

– Loudspeakers, crosstalk cancellation



Monophonic reproduction

Stereophonic reproduction

"Surround" reproduction

3-D multichannel reproduction



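Binaural reproduction

[Figure: signal paths in binaural recording and reproduction, with the transfer functions Hl, Hr, Hi, Hc and the signals xm, yl, yr, ŷl, ŷr]

© M. Karjalainen

In the simplest case, a dummy-head or real-head recording is listened to over headphones.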



Binaural reproduction

[Figure: conversion structures (a) to (d) between mono, stereo, binaural, and transaural signals, built from the filters Hl, Hr, Hl + Hr, Hl − Hr, Hi + Hc, Hi − Hc, 1/(Hi + Hc), and 1/(Hi − Hc); inputs xm, xl, xr; outputs yl, yr, ŷl, ŷr]

© M. Karjalainen
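The filters 1/(Hi + Hc) and 1/(Hi − Hc) in the figure are the sum and difference form of crosstalk cancellation for a symmetric loudspeaker setup. The sketch below, assuming NumPy, checks this at a single frequency. Reading Hi as the same-side (ipsilateral) path and Hc as the crosstalk (contralateral) path is an interpretation of the labels, and the numeric values are made up.

```python
import numpy as np

def crosstalk_canceller(h_i, h_c):
    """2x2 canceller for a symmetric setup, built from sum and difference filters."""
    s = 1.0 / (h_i + h_c)            # filter for the sum (mid) signal
    d = 1.0 / (h_i - h_c)            # filter for the difference (side) signal
    direct = 0.5 * (s + d)
    cross = 0.5 * (s - d)
    return np.array([[direct, cross], [cross, direct]])

# Hypothetical transfer function values at one frequency.
h_i = 1.0 + 0.0j
h_c = 0.4 * np.exp(-1j * 0.8)
acoustic = np.array([[h_i, h_c], [h_c, h_i]])    # loudspeakers to ears
canceller = crosstalk_canceller(h_i, h_c)
product = acoustic @ canceller
print(np.round(product, 6))
```

The product is the identity matrix, so each ear receives only its own binaural channel. In practice the inversion is done per frequency and becomes ill-conditioned where Hi and Hc are nearly equal.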



Chapter 9: Auditory modeling

• Simple psychoacoustic models

– Psychoacoustic spectrum and spectrogram

– Mel spectrum and cepstrum

– Perceptual linear prediction

– Examples of auditory spectra

• Auditory filter bank models

– Gammatone filterbanks

– Inner ear simulation models

– Temporal dynamics and masking

• Cochlear models

– Basilar membrane models

– Hair cell models

• Modeling of higher level functions

– Pitch and periodicity analysis

– Speech specific models

– Computational auditory scene analysis

• Binaural auditory modeling


Simple psychoacoustic modeling

• Problems with Fourier spectrum from auditory perception viewpoint:

– Linear frequency scale vs. critical band scale

– Level (dB) vs. loudness scaling

– Frequency bins vs. spreading and masking

– Flat response vs. equal loudness sensitivity

– Windowing vs. temporal integration and masking

– Temporal adaptation in auditory perception


Auditory spectrum through FFT


Examples of psychoacoustic spectra

• Auditory spectra

– Sinewave (400 Hz)

– White noise


Examples of psychoacoustic spectra

• Vowel /a/ and fricative /s/

– Fourier spectrum vs. auditory spectrum


Mel frequency cepstral coefficients

• MFCC computation

– FFT, mel warping, logarithm, inverse cosine transform
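The four steps named above can be sketched for a single frame with NumPy. The mel formula, the triangular filters, the Hann window, and the default sizes are common choices assumed here, not details taken from the slide; the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with centers equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    bank = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, center, hi = edges_hz[i:i + 3]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def mfcc(frame, fs, n_filters=24, n_coeffs=13):
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2      # FFT
    mel_energy = mel_filterbank(n_filters, n_fft, fs) @ power        # mel warping
    log_energy = np.log(mel_energy + 1e-12)                          # logarithm
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)                # cosine transform
    return basis @ log_energy

fs = 16000
t = np.arange(512) / fs
coeffs = mfcc(np.sin(2.0 * np.pi * 400.0 * t), fs)
print(coeffs.shape)
```

Coefficient 0 tracks the overall log energy, and the higher coefficients describe progressively finer detail of the mel-warped spectral envelope.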


Filterbank auditory models

• General principle of an auditory filterbank model


Response of a filterbank model (Bark-bank)

• Simple Bark-filterbank by warped filters (Karjalainen)


Gammatone filterbank

• Temporal and magnitude response of one channel

• Filterbank
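One channel of such a filterbank can be sketched from its impulse response, a gamma-shaped envelope times a cosine carrier. The order 4, the bandwidth factor 1.019, and the Glasberg-Moore ERB formula are the usual textbook values and are assumptions here; the function names are hypothetical.

```python
import numpy as np

def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg-Moore form)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration_s=0.05, order=4, bandwidth_factor=1.019):
    """Impulse response t^(n-1) exp(-2 pi b t) cos(2 pi fc t), peak-normalized."""
    t = np.arange(int(duration_s * fs)) / fs
    b = bandwidth_factor * erb_hz(fc)
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.max(np.abs(g))

fs = 16000
ir = gammatone_ir(1000.0, fs)                       # temporal response
magnitude = np.abs(np.fft.rfft(ir))                 # magnitude response
freqs = np.fft.rfftfreq(len(ir), 1.0 / fs)
peak_hz = freqs[int(np.argmax(magnitude))]
print(peak_hz)
```

A full filterbank repeats this for center frequencies spaced evenly on the ERB-rate scale and filters the input with each impulse response.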


Neural adaptation

• Neural adaptation model by Dau et al.

– Automatic gain control feedback loops


Temporal processing

• Adaptation, temporal integration, and masking model (Karjalainen)

– Neural feedback model

– Adaptation (AGC)

– Loudness (level) computation

– Temporal masking effect


Responses

• Excitation, firing rate response, and loudness level response


Basilar membrane traveling wave model

• Principle of approximating basilar membrane traveling wave propagation


Meddis hair cell model

• Processing of neurotransmitter in the hair cell


Periodicity analysis (Meddis)

• Computation of sum autocorrelation function (SACF)


Periodicity analysis example

• Signal, filterbank responses, cochleagrams, sum autocorrelation for speech


Auditory spectrum vs. auditory formant spectrum

• Example of vowel /ä/ and fricative /s/


Auditory representation of speech

• Example of vowel transitions /...iaiai.../

– Auditory spectrogram

– Auditory formant spectrogram


Applications of auditory modeling

• Audio coding

– Psychoacoustic or perceptual models of masking

• Sound quality modeling

– Modeling of perceived differences

– Criteria for audio reproduction

– Binaural audio quality

• Speech recognition

– Advanced front-end models

• Advanced hearing aids

– Cochlear implants


Chapter 10: Sound quality

• Effects of sound:

– Physical effects (generally meaningless)

– Physiological effects (hearing loss)

– Information and knowledge effects (communication)

– Esthetic and emotional effects (communication)

• Concept of quality in general:

– Quality as contrast to quantity (categorical dissimilarity)

– Quality on scale low-Q vs. high-Q (measure of preference)

• Speech intelligibility and quality

• Sound quality of concert halls and auditoria

• Sound quality in audio reproduction

• Noise quality

• Product sound quality


Evaluation and measurement of sound quality

• Sound quality is a fundamentally subjective (perceptual) concept but it can be approximated by objective and computational criteria

• Subjective quality can be evaluated by listening experiments, for example:

– Compare to a 'perfect quality' reference to find out if any degradation can be noticed

– Compare two or more sounds and sort them by quality preference

– Characterize sound quality by conceptual description (such as not annoying, slightly annoying, annoying, very annoying)

– Give an overall quality rating on a numerical scale

– Give a rating for a specific quality factor (numerical scale)

– Give quality ratings for several different quality factors (multidimensional scaling)

• Based on subjective experimentation, a computational (objective) measure and model can be derived to simulate the perceived quality

– Objective measures are less laborious and yield high repeatability

– It is important to check the validity range of a model


Development of sound quality models and theories

Theories and models in general

Computational models

Computational models with reference


Intelligibility and quality of speech

• Intelligibility of speech in general depends on:

– the ability of a speaker to produce an intelligible message and clear speech

– quality of the speech transmission medium (acoustic or technical)

– the ability of a listener to analyze and conceive the message

• Technical concept of speech intelligibility:

– related to the quality of the transmission channel

– developed since the 1920s (Harvey Fletcher, Bell Labs)

• Articulation

– score of correct recognition of phones and (nonsense) phone sequences

– articulation index is a measure that is additive from frequency bands (like loudness adds from critical band specific loudnesses)

• Speaker identification score

– quality of channel to convey speaker identity

• Naturalness of speech

– particularly in speech synthesis (and coding)


Speech quality: subjective measures and methods

• Articulation tests and articulation score

– /CV/ or /CVC/ sequences used to measure recognition percentage

• Intelligibility test and intelligibility score

– recognition percentage using meaningful words or sentences

• Rhyme tests (RT)

– using 'rhyme' words or syllables (in Finnish: /patti/, /tatti/, /katti/)

• Diagnostic rhyme tests (DRT)

– modifying a single distinctive feature at a time (nasality, voicing, etc.) in RT

• Speech interference tests (find the disturbing noise level that gives 50% articulation)

• Quality comparison method, including pairwise comparison methods

– ordering of sound examples by overall or specific quality factor

• Mean opinion score (MOS)

– overall rating on 1–5 scale

• Other methods

– Indirect judgement tests (PARM, QUART)

– Communicability tests (communicate a drawing task, measure the difficulty)

– Task recall tests (memorizing ability)

– Analytic measures (multidimensional scaling)


Speech quality: objective measures and methods

• Articulation index (AI)

– for measuring a (linear) speech transmission channel with additive noise

– articulation loss is assumed to be additive from 20 frequency band AI values

• Percentage articulation loss of consonants (%ALcons)

– measure of speech intelligibility, can be estimated from acoustic properties of a room

• Room acoustical indices, see below

• Speech transmission index (STI, RASTI)

– based on modulation transfer function, see below

• Signal-to-noise ratio (SNR)

– ratio of speech vs. noise (power) level (in dB)

– segmental SNR (SNRseg) based on short-time segmental SNRs

• Spectral distance measures (distance measures in the frequency domain)

• Auditory sound quality measures (based on auditory modeling)

• Other methods

– weighted spectral slope distance

– LPC (linear prediction) distance measure
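Of the measures listed above, SNR and segmental SNR are simple enough to sketch directly. The version below assumes NumPy and time-aligned clean and degraded signals; the frame length and the clipping limits of −10 and 35 dB are common practice rather than values from the slides.

```python
import numpy as np

def snr_db(clean, degraded):
    """Overall SNR in dB, treating (degraded - clean) as the noise."""
    noise = degraded - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def segmental_snr_db(clean, degraded, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of short-time SNRs, each clipped to [floor_db, ceil_db] (assumed limits)."""
    values = []
    for k in range(len(clean) // frame_len):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        noise_energy = np.sum((degraded[seg] - clean[seg]) ** 2)
        frame_snr = 10.0 * np.log10((signal_energy + 1e-12) / (noise_energy + 1e-12))
        values.append(np.clip(frame_snr, floor_db, ceil_db))
    return float(np.mean(values))

fs = 16000
t = np.arange(4096) / fs
clean = np.sin(2.0 * np.pi * 440.0 * t)
degraded = 1.1 * clean            # error is 0.1 * clean, i.e. 20 dB below the signal
print(snr_db(clean, degraded), segmental_snr_db(clean, degraded))
```

For stationary signals the two measures agree; they diverge for speech, where quiet frames with poor local SNR pull the segmental value down.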


MOS (mean opinion score)

• A very popular technique to quantify overall quality in speech and audio

• Combines a quantitative scale and qualitative categorizations

• Three sorts of MOS measures used:

– MOS = (direct) evaluation on 1–5 scale

– DMOS = degradation MOS (how much signal is degraded)

– CMOS = comparative MOS (typically scale -3...+3)

• Sometimes a scale of 1–10 by step of 0.1 is used instead

• Basic MOS scaling:

Rating   Quality (MOS)   Degradation (DMOS)

5        Excellent       Not noticeable

4        Good            Just noticeable, not disturbing

3        Fair            Noticeable, slightly disturbing

2        Poor            Disturbing but tolerable

1        Bad             Very disturbing


Modulation transfer function

• The auditory system analyzes signals by critical bands

• Each band is analyzed by signal level, i.e., modulation envelope

• More important than the exact transfer function is the modulation transfer function, i.e., how signal modulations in each critical band are transmitted

• The auditory system is most sensitive to modulations of about 4 Hz

• Modulation transfer is degraded by:

– Reverberation (lowpass of modulation)

– Background noise (reduction of relative modulation)

– These effects are multiplicative (cascaded)

• Modulation transfer function is a mathematically motivated approximation of auditorily relevant signal transfer analysis


Modulation transfer function (2)


Modulation transfer function (4): STI

• Total effect on modulation transfer function

• Apparent SNRapp vs. modulation reduction

• Speech transmission index (STI), for each band:

– STI = 1.0 for SNRapp ≥ 15 dB

– STI = 0.0 for SNRapp ≤ -15 dB

– otherwise STI = m, see also next figure

– (Weighted) average of SNRapp values of bands is computed and converted to total STI
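The chain from modulation reduction to a band index can be sketched as follows. The formulas are the standard ones from the STI literature and are assumed to correspond to the missing slide equations: an exponential reverberation decay acts as a first-order lowpass on modulation, noise scales the modulation depth, and the apparent SNR is clipped to ±15 dB and mapped linearly to 0...1. Function names are hypothetical.

```python
import math

def mtf_reverberation(mod_freq_hz, rt60_s):
    """Modulation reduction by reverberation (lowpass of modulation)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)

def mtf_noise(snr_db):
    """Modulation reduction by stationary background noise."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

def transmission_index(m):
    """Band index from modulation reduction m via the apparent SNR."""
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    snr_app = 10.0 * math.log10(m / (1.0 - m))
    snr_app = max(-15.0, min(15.0, snr_app))
    return (snr_app + 15.0) / 30.0

# The two degradations are cascaded, so their modulation factors multiply.
m = mtf_reverberation(4.0, rt60_s=1.5) * mtf_noise(10.0)
print(round(m, 3), round(transmission_index(m), 3))
```

A full STI evaluates this for several octave bands and modulation frequencies and then forms the weighted average described above.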




STI vs. speech intelligibility


RASTI vs. STI

• RASTI = Rapid STI

• Partial evaluation of frequency bands & modulation bands used

• Specific RASTI instrument available for speech acoustics evaluation


Percentage articulation loss of consonants (%ALcons)

• Estimate of speech intelligibility

• %ALcons can be estimated

• where

– r = distance of source and listener

– RT = reverberation time

– V = room volume

– Q = directivity of a sound source

– k = constant (for individual listener) = 1.5 ... 12.5 %

• %ALcons can also be estimated from room measurements

• %ALcons up to 25...30% can be tolerated in meaningful speech due to information redundancy
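The estimation formula itself is missing from the extracted slide. The variables listed match Peutz's empirical relation, which is usually quoted as %ALcons = 200 r² RT² / (V Q) + k for listener distances inside a limiting distance. The sketch below assumes that form; it is not confirmed to be the slide's equation, and the example room is made up.

```python
def alcons_percent(r_m, rt_s, volume_m3, q=1.0, k_percent=1.5):
    """Estimated %ALcons (assumed Peutz form, valid inside the limiting distance).

    r_m: source to listener distance in m, rt_s: reverberation time in s,
    volume_m3: room volume in m^3, q: source directivity,
    k_percent: listener-dependent constant, 1.5 ... 12.5 %.
    """
    return 200.0 * r_m ** 2 * rt_s ** 2 / (volume_m3 * q) + k_percent

# Hypothetical hall: 4000 m^3, RT = 2 s, listener at 10 m, source directivity 2.
print(alcons_percent(10.0, 2.0, 4000.0, q=2.0))
```

The estimate grows with the square of both distance and reverberation time, so halving the reverberation time helps as much as halving the listening distance.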


Sound quality in concert halls (and performing spaces)

• Esthetic effects very important

– communication by esthetic and emotional factors

• 'Good acoustics' depends on type of music

– for example tempo, mixture of instruments (size of orchestra)

• Many factors to be taken into account

– multidimensional scaling of quality needed

• Different proposed theories and models exist

– no full agreement upon indices and factors of quality

• Visual factors also very prominent in concert halls

– a concert is a multimodal experience to most listeners

• It is not only the audience but also the musicians

– stage acoustics is important as well

• Theaters and other performing spaces

– may require different acoustics

• Active (electroacoustically created or enhanced) acoustics

– used increasingly except for classical acoustic music


Sound quality in concert halls: (1) subjective indices

• Intimacy or presence

• Reverberation (subjective)

• Spaciousness (apparent source width, listener envelopment)

• Clarity (separation of sounds and sources)

• Warmth (level and reverberation at low frequencies)

• Loudness

• Acoustic glare (walls should not reflect like mirrors)

• Brilliance (due to long reverberation at high frequencies)

• Balance (how sound sources (instruments) are balanced)

• Blend (how instruments are mixed harmonically)

• Ensemble (how musicians can play together)

• Immediacy of response (from the hall back to musicians)

• Texture (how early reflections arrive to listeners)

• Freedom from echo (discrete echoes are highly undesirable)

• Dynamic range (useful range of playing levels)

• Extraneous effects on tonal quality (no extra sounds desired)

• Uniformity of sound (quality should be equal in all positions)


Sound quality in concert halls: (2) objective measures

• Loudness

– Gmid (sound level at mid frequencies)

• Reverberation time

– RT60 (decay time of 60 dB for full hall)

– EDT (early decay time, 0–10 dB scaled to correspond to 60 dB)

• Clarity

– Early vs. late energy ratio C80 (empty hall)

• Spaciousness

– IACCearly (interaural cross-correlation, early)

– LFearly (lateral energy fraction, early)

• Envelopment

– IACClate and visual inspection of surface irregularity

• Intimacy

– ITDG (initial time delay gap)

• Warmth

– BR (bass ratio, full hall)

• Stage support

– Early energy (20-100 ms), sound source on the stage 1 m from the microphone


Objective sound quality in concert halls: definitions

• Interaural cross-correlation function IACFt(τ) from pressure signals of left and right ears

• Interaural cross-correlation (IACC), maximum of IACFt(τ)

• Lateral energy fraction (LF or LEF)

• Gain factor (level vs. 10 m free-field distance level)

• Bass ratio

• Stage support
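The formulas belonging to this slide did not survive in the transcript. As a reference, the definitions below follow the commonly used room-acoustics forms (as in ISO 3382-1 and Beranek's bass ratio); they are an assumption about what the slide showed, not a reconstruction of it. Here p_l and p_r are the ear signals, p is the omnidirectional impulse response, p_L is the response of a figure-of-eight microphone, p_10 is the free-field response at 10 m, and times are in seconds.

```latex
\begin{align}
\mathrm{IACF}_{t_1,t_2}(\tau) &=
  \frac{\int_{t_1}^{t_2} p_l(t)\,p_r(t+\tau)\,dt}
       {\sqrt{\int_{t_1}^{t_2} p_l^2(t)\,dt \int_{t_1}^{t_2} p_r^2(t)\,dt}} \\
\mathrm{IACC}_{t_1,t_2} &= \max_{|\tau| \le 1\,\mathrm{ms}}
  \left|\mathrm{IACF}_{t_1,t_2}(\tau)\right| \\
\mathrm{LF} &= \frac{\int_{0.005}^{0.080} p_L^2(t)\,dt}{\int_{0}^{0.080} p^2(t)\,dt} \\
G &= 10\log_{10}\frac{\int_{0}^{\infty} p^2(t)\,dt}{\int_{0}^{\infty} p_{10}^2(t)\,dt}
  \ \mathrm{dB} \\
\mathrm{BR} &= \frac{RT_{125\,\mathrm{Hz}} + RT_{250\,\mathrm{Hz}}}
                    {RT_{500\,\mathrm{Hz}} + RT_{1000\,\mathrm{Hz}}} \\
\mathrm{ST}_{\mathrm{early}} &=
  10\log_{10}\frac{\int_{0.020}^{0.100} p^2(t)\,dt}{\int_{0}^{0.010} p^2(t)\,dt}
  \ \mathrm{dB}
\end{align}
```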

M. Karjalainen20

Early vs. late ratios

Clearness

Center time
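The formulas for these measures are missing from the transcript. The sketch below uses the usual energy-ratio forms, which are an assumption here: clarity C80 as the early-to-late energy ratio in dB with an 80 ms split, definition D50 as the early fraction of the total energy with a 50 ms split, and center time as the energy-weighted mean arrival time. The impulse response is synthetic and the function names are hypothetical.

```python
import math

def clarity_db(h, fs, split_ms):
    """Early-to-late energy ratio in dB (C80 when split_ms = 80)."""
    k = int(round(split_ms * 1e-3 * fs))
    early = sum(x * x for x in h[:k])
    late = sum(x * x for x in h[k:])
    return 10.0 * math.log10(early / late)

def definition(h, fs, split_ms):
    """Early energy as a fraction of total energy (D50 when split_ms = 50)."""
    k = int(round(split_ms * 1e-3 * fs))
    early = sum(x * x for x in h[:k])
    total = sum(x * x for x in h)
    return early / total

def center_time(h, fs):
    """Energy-weighted mean arrival time in seconds."""
    total = sum(x * x for x in h)
    return sum((n / fs) * x * x for n, x in enumerate(h)) / total

# Synthetic exponential decay with an assumed reverberation time of 2 s.
FS = 8000
RT60 = 2.0
h = [math.exp(-math.log(1000.0) * (n / FS) / RT60) for n in range(3 * FS)]

c80 = clarity_db(h, FS, 80.0)
d50 = definition(h, FS, 50.0)
ts = center_time(h, FS)
print(f"C80 = {c80:.2f} dB, D50 = {d50:.2f}, center time = {ts * 1000:.0f} ms")
```

A long reverberation pushes energy into the late part of the response, so C80 drops and the center time grows.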

M. Karjalainen21

Audio sound quality

• HiFi (High Fidelity) vs. professional reproduction

• Good quality is defined indirectly by the absence of degradations

• Degradations & distortions:

– Linear distortion

– Nonlinear distortion

– Transient distortion

– Noise & quantization noise (SNR)

– Spatially poor reproduction

M. Karjalainen22

• Phase in audio reproduction

– Group delay differences of about 1 ms are noticed in extreme cases

• In high-Q cases even much smaller differences are noticed

– Group delay differences of about 2 ms become noticeable in critical listening (about 70 cm of propagation distance difference)

– 5-10 ms group delay differences may start to be disturbing

– Even 50-100 ms group delay errors may be tolerable sometimes

– In spatial sound perception (Chapter 8): precedence effect

• Perception of distortion

– Linear distortion = magnitude and phase distortion

– Nonlinear distortion = new spectral components are produced

Perception of audio reproduction

M. Karjalainen23

• Nonlinear distortion

– In a nonlinear system a sine wave generates harmonics:

– If total rms level is:

– Then harmonic distortion (HM):

– HM is not a particularly good measure from a perceptual point of view

– Low-order HM may improve perceived quality

– JND: 1% for 2nd, 0.3% for 3rd, 0.1-0.3% for 4th harmonic
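The formulas on this slide are missing from the transcript. The sketch below assumes the common definition: if a sine wave produces harmonics with rms levels A1, A2, A3, ..., the total rms level is sqrt(A1² + A2² + ...) and the harmonic distortion is sqrt(A2² + A3² + ...) divided by that total. The polynomial nonlinearity and its coefficients are hypothetical, chosen so that the 2nd and 3rd harmonics land near the JND values quoted above.

```python
import math

def harmonic_amplitudes(y, samples_per_period, n_harmonics):
    """Amplitude of harmonics 1..n_harmonics by correlation with sin/cos.
    Assumes len(y) is a whole number of fundamental periods."""
    n = len(y)
    amps = {}
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / samples_per_period
        re = sum(v * math.cos(w * i) for i, v in enumerate(y))
        im = sum(v * math.sin(w * i) for i, v in enumerate(y))
        amps[k] = 2.0 * math.hypot(re, im) / n
    return amps

def harmonic_distortion_percent(amps):
    """Harmonics (k >= 2) relative to the total level, in percent."""
    harmonics = math.sqrt(sum(a * a for k, a in amps.items() if k >= 2))
    total = math.sqrt(sum(a * a for a in amps.values()))
    return 100.0 * harmonics / total

SAMPLES_PER_PERIOD = 64
x = [math.sin(2.0 * math.pi * i / SAMPLES_PER_PERIOD)
     for i in range(8 * SAMPLES_PER_PERIOD)]

# Hypothetical weakly nonlinear system: y = x + A2*x^2 + A3*x^3
A2, A3 = 0.02, 0.012
y = [v + A2 * v ** 2 + A3 * v ** 3 for v in x]

amps = harmonic_amplitudes(y, SAMPLES_PER_PERIOD, 5)
hd = harmonic_distortion_percent(amps)
for k in (2, 3):
    print(f"harmonic {k}: {100.0 * amps[k] / amps[1]:.2f} % of fundamental")
print(f"harmonic distortion: {hd:.2f} %")
```

A single percentage hides which harmonics are present, which is one reason the measure correlates poorly with perception.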

Nonlinear distortion

M. Karjalainen24

• Other distortion mechanisms:

– Intermodulation distortion (IM)

• Sine waves of frequencies f1 and f2 generate f1 – f2, f1 + f2, etc.

• IM describes perceived distortion better than HM

– Transient intermodulation distortion (TIM)

• Distortion that is created in fast transients but not in steady-state signals

– Quantization noise in digital signal processing

• Perceived as distortion if correlated with the signal

• Perceived as noise if not correlated

– Pre-echo in audio coding

• Temporal spreading of a signal "backwards" in time

– Perceptual criteria needed in digital audio instead of simple distortion and SNR measures
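As an illustration of intermodulation, the sketch below passes two sine waves through a hypothetical second-order nonlinearity and measures the level at the difference, sum, and second-harmonic frequencies. The frequencies, sample rate, and coefficient are assumptions for the demonstration only.

```python
import math

def amplitude_at(y, freq_hz, fs):
    """Amplitude of the component at freq_hz (must complete whole cycles in y)."""
    n = len(y)
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(v * math.cos(w * i) for i, v in enumerate(y))
    im = sum(v * math.sin(w * i) for i, v in enumerate(y))
    return 2.0 * math.hypot(re, im) / n

FS = 8000                 # one second of signal, so every integer Hz fits exactly
F1, F2 = 1000.0, 1100.0   # assumed test frequencies
A2 = 0.05                 # hypothetical second-order coefficient

x = [math.sin(2.0 * math.pi * F1 * i / FS) + math.sin(2.0 * math.pi * F2 * i / FS)
     for i in range(FS)]
y = [v + A2 * v * v for v in x]

products = {
    "f2 - f1": amplitude_at(y, F2 - F1, FS),
    "f1 + f2": amplitude_at(y, F1 + F2, FS),
    "2 f1": amplitude_at(y, 2.0 * F1, FS),
    "2 f2": amplitude_at(y, 2.0 * F2, FS),
}
for name, amp in products.items():
    print(f"{name}: {amp:.4f}")
```

The difference tone at 100 Hz is far from both input frequencies and is not harmonically related to them, which helps explain why IM tends to be more audible than harmonic distortion of the same size.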

Audio distortion mechanisms

M. Karjalainen25

Perceptual (objective) sound quality models for audio

• Schroeder et al.:

• Karjalainen:

M. Karjalainen26

PAQM (perceptual audio quality measure)

M. Karjalainen27

Product sound quality

• Minimize negative effects and maximize positive effects of product sound

• Examples:

– Cars and work machines

– Home appliances

– Office equipment

– Personal devices

• Computational models of product sound quality

M. Karjalainen1

Chapter 11: Technical audiology

• How do we hear? (discussed already)

• What if we don’t hear?

– Why don’t we hear? (mechanisms)

– How to measure? (audiometry)

– How to improve hearing? (hearing aids)

• Technical devices:

– Audiometric equipment

– Hearing aids

– Cochlear implants

M. Karjalainen2

Hearing degradation I

• Hearing disabled population

– WHO: 270 million hearing disabled in the world (5 %)

– In Finland: ~740 000 with hearing degradation, 14 000 new hearing device fittings per year

• Categories of handicap

– Disease (sairaus)

– Impairment (vaurio)

– Disability (toimintavajavuus)

– Handicap (haitta)

• Hearing disorders: social classification

– Hard-of-hearing persons (huonokuuloinen)

– Deafened persons (kuuroutunut)

– Deaf persons (kuuro)

M. Karjalainen3

Hearing degradation II

• Medical classification of hearing impairments

– Conductive hearing loss (äänen johtumisvika)

• External and middle ear problems

• Attenuation of loudness

– Sensorineural hearing loss

• Inner ear and retrocochlear problems

• Attenuation or recruitment

• Tinnitus

– Central hearing loss

• Higher neural levels

• Problems in sound separation or speech analysis

• Problems in localization (spatial separation)

• Tinnitus

– Psychic hearing problems

• No clear physiological reason

M. Karjalainen4

Hearing threshold change

M. Karjalainen5

Audiometry

Audiometer and calibrated headphones

M. Karjalainen6

Audiogram behavior

Loud noise effect (impulse noise)

Effect of age (presbyacusis)

M. Karjalainen7

Degrees of hearing impairment

• Measure of hearing degradation

– Average of threshold values at 500, 1000, 2000, 4000 Hz
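The averaging rule above is simple enough to show directly. In this minimal sketch the audiogram values are invented for illustration, and the function name is hypothetical; thresholds are assumed to be given in dB HL per frequency.

```python
def pure_tone_average(thresholds_db_hl, freqs_hz=(500, 1000, 2000, 4000)):
    """Mean hearing threshold over the given audiometric frequencies."""
    return sum(thresholds_db_hl[f] for f in freqs_hz) / len(freqs_hz)

# Hypothetical audiogram for one ear: frequency in Hz -> threshold in dB HL
audiogram = {250: 15, 500: 20, 1000: 30, 2000: 45, 4000: 65, 8000: 70}

pta = pure_tone_average(audiogram)
print(f"Average threshold at 500, 1000, 2000, 4000 Hz: {pta:.1f} dB HL")
```

Because it is an average, the figure can understate a steep high-frequency loss such as the one in this example.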

M. Karjalainen8

Other hearing impairment problems

• Other effects of impairment

– Sound separation problems, particularly in noise and reverberation

– Speech communication problems

– Tinnitus

• Source at different levels

• No good treatment known

• Often like a sinusoidal tone, but can be hum, broadband noise, pulsation, etc.

M. Karjalainen9

Ear drum impedance measurement

M. Karjalainen10

Noise and causes of hearing loss

• Noise measurement

– A-weighted equivalent level

– 85 dB long-term daily exposure limit

• Other factors:

– Vibration

– Smoking

– Drugs

– Diseases

– Genetic effects

– Combined effect is often more than the sum of the individual effects
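The equivalent level used in noise measurement is an energy average, which can be sketched as follows. The sketch assumes the levels are already A-weighted and that daily exposure is normalized to an 8-hour reference; the 8-hour reference and the example working day are assumptions, not taken from the slide.

```python
import math

def equivalent_level(segments):
    """Energy-averaged level over segments given as (level_dBA, hours)."""
    total_hours = sum(hours for _, hours in segments)
    mean_energy = sum(hours * 10.0 ** (level / 10.0)
                      for level, hours in segments) / total_hours
    return 10.0 * math.log10(mean_energy)

def daily_exposure_level(segments, reference_hours=8.0):
    """Same energy sum, normalized to a fixed reference duration."""
    energy = sum(hours * 10.0 ** (level / 10.0)
                 for level, hours in segments) / reference_hours
    return 10.0 * math.log10(energy)

# Hypothetical working day: 1 h at 92 dB(A), 3 h at 85 dB(A), 4 h at 70 dB(A)
day = [(92.0, 1.0), (85.0, 3.0), (70.0, 4.0)]
day_level = daily_exposure_level(day)
print(f"Daily exposure level: {day_level:.1f} dB(A)")
print("Above 85 dB limit" if day_level > 85.0 else "Within 85 dB limit")
```

Energy averaging means a short loud period dominates the result: halving the exposure time at a given level lowers the equivalent level by only about 3 dB.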

M. Karjalainen11

Inner ear damage

Inner hair cell damage

Outer hair cell partial damage

Outer hair cell full damage

M. Karjalainen12

Temporary threshold shift

M. Karjalainen13

Hearing protectors

Ear plugs

Ear muffs

Attenuation

M. Karjalainen14

Hearing aid types

M. Karjalainen15

Hearing aid response

Typical frequency response of a traditional hearing aid

Multichannel digital hearing aids:

- each frequency channel programmed separately

M. Karjalainen16

Hearing aid gain control

Linear gain + limiter

Automatic gain control
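The two gain-control schemes can be compared through their static input-output curves. This is a minimal sketch working on levels in dB; the gain, limit, threshold, and compression ratio are hypothetical values, and the slide's own curves are not reproduced here.

```python
def linear_gain_with_limiter(level_in_db, gain_db=30.0, limit_db=100.0):
    """Constant gain, with the output clamped at a maximum level."""
    return min(level_in_db + gain_db, limit_db)

def compressive_agc(level_in_db, gain_db=30.0, threshold_db=60.0, ratio=3.0):
    """Full gain below the threshold, reduced slope (1/ratio) above it."""
    if level_in_db <= threshold_db:
        return level_in_db + gain_db
    return threshold_db + gain_db + (level_in_db - threshold_db) / ratio

print("input  limiter  AGC")
for level in range(40, 101, 10):
    print(f"{level:5d}  {linear_gain_with_limiter(level):7.1f}  "
          f"{compressive_agc(level):5.1f}")
```

With these assumed settings both schemes give the same output for soft sounds, but the AGC starts reducing gain at moderate levels instead of clamping abruptly at the maximum output.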

M. Karjalainen17

Hearing aid AGC control

Feedback control

Feedforward control
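The difference between the two AGC structures is where the level detector sits: on the input (feedforward) or on the output (feedback). The sketch below simulates both on a sequence of block levels in dB rather than on audio samples. All parameter values are hypothetical, chosen so that both structures settle at the same output level.

```python
def feedforward_agc(levels_in_db, gain_db=30.0, threshold_db=60.0,
                    ratio=3.0, smooth=0.5):
    """Detector on the INPUT: gain reduction is computed from the input level."""
    detector = levels_in_db[0]
    out = []
    for level in levels_in_db:
        detector += smooth * (level - detector)          # smoothed input level
        over = max(0.0, detector - threshold_db)
        out.append(level + gain_db - over * (1.0 - 1.0 / ratio))
    return out

def feedback_agc(levels_in_db, gain_db=30.0, threshold_db=90.0,
                 loop_gain=2.0, smooth=0.2):
    """Detector on the OUTPUT: gain reduction is driven by the output level."""
    reduction = 0.0
    out = []
    for level in levels_in_db:
        y = level + gain_db - reduction
        out.append(y)
        target = loop_gain * max(0.0, y - threshold_db)  # wanted reduction
        reduction += smooth * (target - reduction)
    return out

# Input level steps from 50 dB to 80 dB after 50 blocks.
levels = [50.0] * 50 + [80.0] * 200
ff = feedforward_agc(levels)
fb = feedback_agc(levels)
print(f"before step: ff = {ff[49]:.1f} dB, fb = {fb[49]:.1f} dB")
print(f"at step:     ff = {ff[50]:.1f} dB, fb = {fb[50]:.1f} dB")
print(f"settled:     ff = {ff[-1]:.1f} dB, fb = {fb[-1]:.1f} dB")
```

In the feedback structure the effective compression ratio is 1 + loop_gain, so a loop gain of 2 matches the feedforward ratio of 3; the two differ mainly in how they overshoot right after the level step.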

M. Karjalainen18

Hearing aid output waveforms

M. Karjalainen19

Other issues in hearing aids

• Directional microphones

• Binaural processing

• Noise cancellation

• Wind noise cancellation

• Feedback cancellation

• Speech enhancement

M. Karjalainen20

Cochlear implants

• Electrical stimulation of the auditory nerve

M. Karjalainen21

Cochlear implants II

• ~100 000 units fitted worldwide

• For deafened adults and deaf-born children

• Price about $50 000 in the USA

• Multielectrode devices nowadays (e.g. 24 channels)

– Speech from the microphone is divided into channels

– Inductive coupling through skin

– Multielectrode in the cochlea

– Different pulse modulations used
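The step of dividing speech into channels can be sketched as a mapping from frequency bands to electrodes. The logarithmic band spacing and the 200 to 8000 Hz range below are assumptions for illustration; real devices use manufacturer-specific filter banks, and the function names are hypothetical.

```python
def band_edges(n_channels, f_low_hz=200.0, f_high_hz=8000.0):
    """Edges of n_channels logarithmically spaced analysis bands."""
    step = (f_high_hz / f_low_hz) ** (1.0 / n_channels)
    return [f_low_hz * step ** i for i in range(n_channels + 1)]

def channel_of(freq_hz, edges):
    """Index of the band (0-based) containing freq_hz, or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= freq_hz < edges[i + 1]:
            return i
    return None

edges = band_edges(24)   # 24 channels, as in the example on the slide
for tone_hz in (250.0, 1000.0, 4000.0):
    print(f"{tone_hz:6.0f} Hz -> channel {channel_of(tone_hz, edges)}")
```

Each band's envelope would then set the pulse amplitude on one electrode, with low-frequency bands routed to the apical end of the electrode array.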
