The Physics and Psycho-Acoustics of Surround Recording Part 2

David Griesinger, Lexicon

http://www.world.std.com/~griesngr

Introduction

• We all know how to make a good recording
  – We need good music
  – A very good performance
  – And satisfactory balance between the soloists and the instruments.
• But we want to make a great recording
  – How do we do it?
  – How do we know when a recording is great?
• We must learn how to hear the technical quality of a great recording,
  – And learn how to achieve the best result.
• The talk is based on classical music – but the techniques and perceptions apply to all recordings.

The recording space is very important!

• It is much easier to achieve a great result in a large hall.
  – But large halls with great acoustics are rare.
  – Our job is to make a great result in the hall we have available (usually small).
• This talk will tell you how to do it.
  – And help you hear the difference.
• We will not talk about issues such as instrumental balance
  – or the differences between microphones or sample rates.
  – We will talk about basic sound properties:
    • The clarity and localization of the direct sound
    • The perceived distance between the sound source and the listener (depth)
    • The recording and reproduction of the sound of the hall.

Major Goals

• To review the physical and psychoacoustic properties that make a great recording (or a great performance space).
  – The clarity of the direct sound (the absence of muddiness)
  – The creation of a large listening area and a stable front image – using three front speakers in a 5.1 recording.
  – The blending together of the different instruments into a whole acoustic scene through early reflections.
  – The re-creation of the acoustic space of the performance, through late reflections and envelopment.
• To show how muddiness occurs when there are too many early reflections.
• To show how we perceive muddiness through our perception of pitch.
• To show how the loudspeaker positions in the playback room influence the envelopment at low frequencies.
• To play as many musical examples as possible!

Localization – a stable front image over a large listening area

• In a high-quality recording the front image does not greatly change when a listener moves away from the sweet spot.
  – Image stability requires using the center channel speaker in a 5.1 recording.
  – Even without the center speaker some two channel recordings are more stable than others.
  – Popular music recordings are often better than classical recordings in image stability.
• The secret is Amplitude Panning
  – Which is almost universally used in popular music recording.

Time delay panning

• Many engineers attempt to record a broad sound source with closely spaced microphones
  – Omni microphones are often used in a so-called “Decca Tree”.
  – Cardioid microphones are often used in the “ORTF” configuration
• Both these techniques rely on time delay differences to spread the front image
  – Time delay spreading only works when the listener is in the sweet spot.
  – The front image is not stable over a large area.

Training to hear localization

• The importance of ignoring the sweet spot
  – Most research tests of localization use a single listener, who is strictly restricted to the sweet spot.
  – Your customers will not listen this way!
• How do you know if the recording has a stable front image?
  – Move laterally in front of the loudspeakers. Does the sound image stay wide and fixed to the loudspeakers, or does it follow you?
  – Do the soloists in the center follow you left or right? If they do they are recorded with too much phantom center.

• Since most 5 channel recording methods are derived from stereo techniques almost all have too much phantom center.

• A center image that follows a listener who moves laterally out of the sweet spot is the most common failing of even the best five channel recordings.

» Play examples

Example: Time delay panning outside the sweet spot.

Record the orchestra with a “Decca Tree” - three omni microphones separated by one meter. A source on the left will be picked up with nearly equal level in all three microphones. The arrival times will differ by about ±3ms.

On playback, a listener on the far right will hear this instrument coming from the right loudspeaker. This listener will hear every instrument coming from the right.

Amplitude panning outside the sweet spot.

If you record with three widely spaced microphones, an instrument on the left will have high amplitude in the left microphone. The time delay will also be much shorter.

A listener on the far right will still hear the instrument on the left. Now the orchestra spreads out across the entire loudspeaker base, even when the listener is not in the sweet spot.
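The contrast between the two examples can be checked with a little geometry. The sketch below is only an illustration: the stage positions, the 6 m wide spacing and the 343 m/s speed of sound are assumptions, not values from the talk, and the microphones are treated as ideal omnis in free field.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def relative_pickup(source, mics):
    """For each omni mic, return (extra delay in ms, level in dB) relative to the nearest mic."""
    dists = [math.dist(source, m) for m in mics]
    nearest = min(dists)
    return [
        (1000.0 * (d - nearest) / SPEED_OF_SOUND, -20.0 * math.log10(d / nearest))
        for d in dists
    ]

# Hypothetical stage geometry in metres: an instrument far to the left.
source = (-8.0, 2.0)
close_array = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]  # mics 1 m apart
wide_array = [(-6.0, 0.0), (0.0, 0.0), (6.0, 0.0)]   # mics 6 m apart (assumed)

for name, mics in (("close", close_array), ("wide", wide_array)):
    for delay_ms, level_db in relative_pickup(source, mics):
        print(f"{name}: +{delay_ms:5.2f} ms, {level_db:6.1f} dB")
```

With these assumed positions the closely spaced array differs by only a decibel or two between microphones but by roughly 3 ms per metre of spacing, so the image is carried by time delay. The widely spaced array shows level differences of more than 9 dB, so the image is carried by amplitude.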

WARNING!!!

• In the author’s experience a front image that is not stable when you walk in front of the speakers will never make a great recording.
  – regardless of how beautiful it is in the sweet spot.

• This is my FIRST test of a recording, either two channel or surround.

Summary of acoustic perceptions in a recording

• 1. Clarity – the lack of muddiness
  – Clarity is perceived through the direct sound – sound that travels directly from the instrument to the microphone.
    • A clear direct sound requires that the microphone be relatively close to the instrument!
• 2. Blend and depth
  – Blend and depth are perceived through early reflections that arrive from all around the listener.
    • The total energy in these early reflections must be less than the energy in the direct sound!
    • In a surround recording these reflections should come equally from all the loudspeakers (except the center), and they must be decorrelated (different).
• 3. Envelopment (reverberation)
  – Envelopment is perceived through late reflected energy that arrives from all around the listener. (Not just from the rear!)
    • The energy must be decorrelated in each loudspeaker

Clarity

• Clarity to an acoustician is determined through intelligibility – the ability to understand speech or a musical line.

• For this talk I will use a different meaning:
  – For me clarity is the perception that the sound source is acoustically close to the listener.
  – While this definition may seem vague, almost everyone agrees on the optimal acoustic distance for a recorded sound source.

– We can demonstrate this perception:

Muddiness: Dry Speech + 40ms reflections

Mono speech: The sound is clear, but much too close to the loudspeaker.

Speech with ~40ms allpass reflections and no direct sound.

Mono:

Stereo:

Note both the mono and the stereo version sound muddy and distant.

There is no phantom image in the stereo version.

Reflections used in these experiments

The reflections used in these experiments form a decaying burst which peaks about 25ms after the direct sound, and has largely decayed away by 50ms.

The reflections are different in the two channels, and have a flat frequency response.

Optimum level for Early Reflections

• Recorded sound consists of a mix of direct sound and reflections
  – Too many reflections and muddiness results.
  – But reflections add a sense of blend and depth.
  – An optimum mix must be found.
• The optimum level for early reflections is -4 to -6dB relative to the direct sound.
  – This level is preferred by almost every listener.
• In a surround recording the reflections should come equally from all directions (except the center), and be decorrelated.
• The perceived result is independent of the precise delay time and the pattern of the reflections.
  – It is the total energy which determines the perception.
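Because it is the total energy that matters, a reflection pattern can be set to a target level by scaling its summed energy against the direct sound. The sketch below is a minimal illustration, not the generator used for the demos: it assumes a unit-energy direct sound, shapes white noise with a Hann envelope so the burst peaks at 25ms and is gone by 50ms, and uses a different random seed per channel for decorrelation. All function names are hypothetical.

```python
import math
import random

def reflection_burst(fs, length_ms=50.0, seed=0):
    """White noise under a Hann envelope: peaks at length_ms / 2, gone by length_ms."""
    rng = random.Random(seed)
    n = int(round(fs * length_ms / 1000.0))
    return [
        rng.gauss(0.0, 1.0) * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / n))
        for i in range(n)
    ]

def scale_to_level(burst, level_db, direct_energy=1.0):
    """Scale the burst so its total energy sits level_db relative to the direct sound."""
    energy = sum(x * x for x in burst)
    gain = math.sqrt(direct_energy * 10.0 ** (level_db / 10.0) / energy)
    return [gain * x for x in burst]

def energy_db(samples, direct_energy=1.0):
    return 10.0 * math.log10(sum(x * x for x in samples) / direct_energy)

def correlation(a, b):
    """Normalized zero-lag cross-correlation between two channels."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

# Two channels at -5 dB total energy, decorrelated by using different seeds.
left = scale_to_level(reflection_burst(48000, seed=1), -5.0)
right = scale_to_level(reflection_burst(48000, seed=2), -5.0)
print(energy_db(left), energy_db(right), correlation(left, right))
```

Each channel is then added to the dry signal after convolution. The level is fixed by energy, so changing the seed or the envelope shape changes the pattern but not the direct-to-reflected ratio.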

Depth without Muddiness

• Dry speech
  – Note the sound is uncomfortably close
• Mix of dry with early reflections at -5dB.
  – The mix has distance (depth), and is not muddy!
  – Note there is no apparent reverberation, just depth.
• Same but with the reflections delayed 20ms at -5dB.
  – Note also that with the additional delay the reflections begin to be heard as discrete echoes.
    • But the apparent distance remains the same.
• Same but with the reflections delayed 50ms at -3dB
  – Now the sound is becoming garbled. These reflections are undesirable!
  – If the speech were faster it would be difficult to understand.
• Same but with reflections delayed 150ms at -12dB
  – I also added a few reflections between 20 and 80ms at a level of -8dB to smooth the decay.
  – Note the strong hall sense, and the lack of muddiness.

The ideal mix

• We see from the previous slide that the ideal acoustic mix has three independent perceptual requirements:

– 1. The direct sound dominates the total energy by at least 4dB.

– 2. There are early reflections that add blend, distance, and depth to the sound.

• These should come equally from all directions in a surround recording
• And they should avoid adding energy in the 50ms to 100ms time region.

– 3. There should be reflections (reverberation) with time delays greater than 150ms to provide the impression of the hall.

• To make a great recording we must separately capture all three!
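One way to check a mix against these three requirements is to sum the energy of its impulse response in time windows and express each window relative to the direct sound. The sketch below is an illustration under assumptions: the 50ms and 150ms boundaries follow the talk, while the 5ms window that defines "direct sound", the function names and the dictionary keys are hypothetical choices.

```python
import math

def _db(energy, ref):
    """Energy ratio in dB; an empty window reports minus infinity."""
    return 10.0 * math.log10(energy / ref) if energy > 0.0 else float("-inf")

def window_energies_db(ir, fs, direct_ms=5.0):
    """Energy of each time window of an impulse response, in dB re the direct sound.

    ir is a list of samples with the direct sound at index 0.
    """
    def idx(ms):
        return int(round(fs * ms / 1000.0))

    def energy(start, stop=None):
        return sum(x * x for x in ir[start:stop])

    d, i50, i150 = idx(direct_ms), idx(50.0), idx(150.0)
    ref = energy(0, d)
    return {
        "early": _db(energy(d, i50), ref),       # blend and depth
        "medial": _db(energy(i50, i150), ref),   # the region to keep empty
        "late": _db(energy(i150), ref),          # hall impression
        "all_reflections": _db(energy(d), ref),  # should be at or below -4 dB
    }

# Toy response at 1 kHz sampling: direct sound, one reflection at 30 ms (-5 dB),
# one late arrival at 200 ms (-12 dB).
ir = [0.0] * 300
ir[0] = 1.0
ir[30] = 10.0 ** (-5.0 / 20.0)
ir[200] = 10.0 ** (-12.0 / 20.0)
print(window_energies_db(ir, 1000))
```

For this toy response the early window sits at -5dB, the 50 to 150ms window is empty, the late window sits at -12dB, and all reflections together are about 4.2dB below the direct sound.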

Direction of early reflections

• It is not possible to detect whether the reflections come from the front or the rear when they arrive between 20ms and 50ms after the end of a sound.

• But it is more natural if they come from both front and rear.
• Using all four speakers also results in the largest sweet spot - demo

Muddiness is hard to avoid in small spaces!

• We are attempting to show that the optimum total energy for all reflections is at least 4dB less than the direct sound.

• The total reflected energy sum does not include the floor reflection. – I will explain why later if there is time.

• The direct sound must dominate the total sound picture
  – The reverberation radius of a small hall or church is usually below 2m, and may be as low as 1m.
  – Every microphone used in the recording picks up both direct sound and reverberation.
    • But only the microphone closest to the sound source picks up true direct sound.
    • Direct sound into all the other microphones is perceived as a reflection, and adds to the potential distance and muddiness.

Muddiness also comes from the playback room!

In this room there is no absorption in the front, and thus the reverberation radius is small, perhaps as low as 2.5m.

The distance from the front loudspeakers to the listeners is greater than the reverberation radius.

So the reverberation will be stronger than the direct sound.

We are trying to keep the direct sound stronger than the reflections by 4dB.

This goal is probably not possible to achieve in this room! (Except at frequencies above 1000Hz, where the side curtains begin to be absorptive.)

Always mix your recordings in an absorbent space!

Boston Cantata Singers, Cantata #76, “Die Himmel erzählen die Ehre Gottes”. Performance in Jordan Hall, January 23, 2004.

Reverberation time in Jordan ~1.4 seconds at 1000Hz. This is similar to the Semperoper Dresden.

The typical audience member is ~ 3 reverb radii from this singer.

The dramatic consequences are highly audible.

Although Jordan is beloved as a chamber music hall, the stage house is deep and reverberant. When the hall is full, the sound in the audience can be dry and muddy.

The recording engineer must overcome these obstacles.
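The figure of about three reverberation radii translates directly into a direct/reverberant ratio. The sketch below uses the usual idealized model, which is an assumption and not something derived in the talk: the direct sound falls 6dB per doubling of distance, the reverberant level is the same everywhere, and the two are equal at the reverberation radius. Source and microphone directivity are ignored.

```python
import math

def direct_to_reverb_db(distance, reverb_radius):
    """Direct/reverberant ratio in dB at a given distance (same units as the radius)."""
    return 20.0 * math.log10(reverb_radius / distance)

def max_distance_for_ratio(reverb_radius, ratio_db):
    """Largest distance at which the direct sound still leads by ratio_db."""
    return reverb_radius * 10.0 ** (-ratio_db / 20.0)

# A listener three reverberation radii from the singer.
print(round(direct_to_reverb_db(3.0, 1.0), 1))

# Distance that keeps the direct sound 4 dB ahead when the radius is 1.5 m.
print(round(max_distance_for_ratio(1.5, 4.0), 2))
```

Under this model a listener at three radii hears the direct sound about 9.5dB below the reverberation, and keeping the direct sound 4dB ahead means staying within roughly 0.63 of the reverberation radius.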

Cantata Singers Bach BWV 76

Multimiked recording. Note the clarity of vocal timbre (low sonic distance).

Recording simulating the sound in the hall. Note the timbre coloration and the sense of distance to the performers.

With the picture and after adaptation the performance is quite enjoyable.

The Ideal Reverberation

– has 20ms to 50ms reflections with a total energy of -4dB to -6dB
– has relatively little energy from 50 to 150ms.

Most small rooms (including playback rooms)

– Have exponential decay
– If we pick up enough late reflections to hear the hall, we will get too many early reflections.
  • We will get coloration and poor intelligibility.

Example of a small recording space: Swedenborg Chapel, Cambridge

Oriana Consort in Swedenborg Chapel

Oriana Setup

Recording in Swedenborg Chapel, Cambridge

• The chapel holds perhaps 200 people, but when it is empty the RT is ~ 1.8 seconds.
  – And the reverberation radius is ~ 1.5m
• The picture shows four supercardioid microphones about 1m from the chorus. These provide the direct sound.
  – With the supercardioid pattern we have a 6dB direct/reverberant ratio, so the reverberation is less than the direct sound by about 6dB.
  – Note that in this space we must add hall sound and early reflections very carefully, or the sound will become muddy!
  – In addition the early reflections and reverberation arrive soon after the direct sound. The sound seems small and cramped. There is no sense of space around the direct sound.
• The chorus microphones are as close as they can be to the chorus without creating balance problems.
  – We cannot exclude the early reverberation by moving the mikes closer.

Main microphones in Swedenborg Chapel

• The picture also shows two variable pattern microphones about 2m from the chorus.
  – I put these there for an experiment. The sound is not very good…
• The problem with a “main microphone” pair in this space is that it must be placed too far from the singers!
• A main pair must be at least 2.5m away or there will be balance problems.
  – This distance is beyond the reverberation radius, and the sound will be muddy.

Hall Sound in Swedenborg

• The chapel is reverberant – with a high reverberation level

• But the reverberation is too strong in the 10-150ms time range.

• Using cardioid microphones pointing away from the sound source reduces the early reverberation energy and maximizes the late energy.

• The hall sounds larger and better.

Distance Perception and MUD

• Reflections during the sound event and up to 150ms after it ends create the perception of distance

• But there is a price to pay:
  – Reflections from 10-50ms do not impair intelligibility.
    • The fluctuations they produce are perceived as an acoustic “halo” or “air” around the original sound stream. (ESI)

– Reflections from 50-150ms contribute to the perception of distance – but they degrade both timbre and intelligibility, producing the perception of sonic MUD.

• We will have many examples of mud in this talk!

Training to hear MUD

• Mud occurs when the reverberant decay of the recording venue has too much reflected energy in the 10-150ms region of the decay curve.

– This is true of nearly all sound stages, small auditoria, and churches.

• If you are recording in such a space with a relatively large ensemble, you are in trouble.

• The perception of mud can be tricky, because our hearing mechanism adapts to a muddy environment, and the sonic degradation becomes inaudible after about 10 minutes.

– It is easy to convince yourself the recording is excellent when you have been listening to it all day.

• This is why we can enjoy a concert even when we are sitting far from the instruments.

– You MUST compare your recording to a reference recording in a short time A/B test.

Example: John Eargle at Skywalker Ranch

• John Eargle has made wonderful recordings, particularly those with the Dallas Symphony on Delos Records
  – But even he can be fooled by a small space
  – As I said, you adapt quickly to such a space, and no longer hear the mud that it produces.
• John Eargle recently made a 5.1 channel DVD audio recording at Skywalker Ranch in California.
  – He was very excited by it – but listen and compare to Dallas.
• Skywalker is a large sound stage with controllable acoustics. It is not a concert hall.
• As a consequence the reverberation radius is relatively short. By my estimate (without having seen it) the radius is less than 3.5 meters.
• It is very easy to record mud in such a space.
  – Many instruments are beyond the reverb radius.
  – Adding more microphones only increases the reverberant pickup.

Recording in a large space is much easier!

Covenant church is a very large space, holding more than 1000 people. It is damped by pew cushions and acoustic treatment on the walls, yielding a RT of 2.5 seconds and a large reverberation radius – probably above 3m.

The microphones can be quite distant without picking up early reflections or reverberation. It is a very good place to record! (And it is exceptionally beautiful visually…)

Example – depth perspective through mike technique:

Direct sound:

Early reflection:

Late reverberation:

Direct + Early -5dB:

Direct + Early + Late -8dB:

• When the reverberation radius is large enough we can use an extra pair of microphones to create a single early reflection.
  – This can provide the needed perspective and depth

Mike 480L

The depth impression is greatly improved in surround

• I will run the same experiment, but use all five speakers.

• The early reflections will come from both the front and rear equally, but different delay patterns will be used for each speaker.
  – This means the reflections are decorrelated.
• The late (hall) reflections will also come equally but decorrelated in the front and rear speakers.
  – This will create a large and uniform sweet spot for the acoustics.

The Polyhymnia Pentangle

• The Polyhymnia engineers employ a surround array of spaced omni microphones, at a spacing similar to the ITU playback array.
• The technique works well in spaces where the reverberation radius is equal to or greater than the microphone spacing!
• In this case the direct sound picked up by the rear microphones is perceived as an early lateral reflection and adds distance to the front image.
• Caution!! In a small hall this array will be TOO MUDDY!!!

In practice the Polyhymnia engineers often pick up the direct sound with accent microphones.

In this case the front microphones provide a first reflection to the front speakers.

The center microphone is also often moved closer to the sound sources, so it picks up mostly direct sound.

Boston Symphony Hall

• 2631 seats, 662,000ft^3, 18700m^3, RT 1.9s
  – It’s enormous!
  – One of the greatest concert halls in the world – maybe the best.
  – Recording here is almost too easy!
  – Working here is a rare privilege
    • Sufficiently rare I do not do it. (It’s a union shop.)
  – The recording in this talk is courtesy of Alan McClellan of WGBH Boston. (Mixed from 16 tracks by the presenter)
  – Reverb Radius is >20’ (>6.6m) even on stage.
  – The stage house is enormous and NOT reverberant. With the orchestra in place, stage house RT = ~1 sec
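The reverberation radius can be estimated from the hall volume and reverberation time. The sketch below uses the standard diffuse-field approximation, r ≈ 0.057·sqrt(Q·V/RT60), which is textbook acoustics rather than a formula given in the talk. The function name and the directivity factor `directivity_q` are illustrative assumptions.

```python
import math

def reverberation_radius_m(volume_m3, rt60_s, directivity_q=1.0):
    """Diffuse-field estimate of the reverberation radius in metres.

    directivity_q = 1 models an omnidirectional source; larger values model
    a source (or microphone) that favours the direct path.
    """
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)

# Boston Symphony Hall figures from the slide: 18700 m^3, RT 1.9 s.
print(round(reverberation_radius_m(18700.0, 1.9), 2))
```

For an omnidirectional source this gives a little under 6m for the hall as a whole. That is the same order as the figure quoted on the slide, which refers to the stage and is the presenter's own estimate.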

Boston Symphony Hall, occupied, stage to front of balcony, 1000Hz

This picture compares favorably to our picture of the ideal reverberation on a recording. But this is what an audience member hears 100 feet from the stage!

Boston Symphony Orchestra in Symphony Hall

Boston Cantata Singers in Symphony Hall. March 17, 2002

Microphone Array (WGBH)

Beware the “main microphone” array

• Nearly all engineers will provide a “main microphone” usually a “Decca Tree”, or a pair of omni or cardioid microphones.

• Almost always the sound from this array is only acceptable for instruments close to the microphones.

– Most of the instruments are far beyond the reverberation radius.
– The more distant instruments must be spot-miked.

• A cardioid pair (ORTF) has too much phantom center for an acceptable surround recording. (this is a two-channel technique only.)

• Very frequently time delay panning (for a Decca Tree or spaced omnis) makes the sound unusable in a high-quality mix.

– Time delay panning makes the front image unstable
– Closely spaced microphones yield high correlation at low frequencies, which degrades the sense of space.

• It is better to simply turn off the main microphone (even if your instructor insists you install one.)

• In our Boston Symphony Hall recording a pair of B&K omnis spaced ~25cm was hung behind the conductor by the WGBH engineer.

[Figure panels: Front pair; Front pair LF]

Correlation in the “main microphone:” two omnis spaced by ~25cm, just behind the conductor.

___ = measured correlation; - - - = calculated, assuming d=25cm

The high correlation in this pair makes the sound unusable in a stereo or surround mix. It sounds unpleasant even in this lecture room, as the audio demo makes clear.
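The calculated curve for a spaced omni pair is usually taken from the diffuse-field model, in which the correlation between two omnidirectional microphones is sin(kd)/(kd) with k = 2πf/c. The sketch below assumes that this is the model behind the dashed curve; the talk does not say so explicitly, and the 343 m/s speed of sound is also an assumption.

```python
import math

def diffuse_correlation(freq_hz, spacing_m, c=343.0):
    """Correlation of two omni microphones in an ideal diffuse field."""
    x = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x

# Pair spaced 25 cm, as in the recording described above.
for f in (50, 100, 200, 400, 686, 1000):
    print(f, round(diffuse_correlation(f, 0.25), 3))
```

With d = 25cm the model stays above 0.95 up to about 100Hz and does not reach its first zero until c/(2d), about 686Hz. The pair is therefore almost fully correlated through the bass, which is the low-frequency correlation the slide warns about.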

Beware the exclusive use of spaced front microphones

• In our recording the wide front orchestra pick-up is fine for the first row of the strings.

• But nearly all the orchestra is beyond the reverberation radius for these microphones.

• If we want good balance and clarity, we must use additional microphones over the orchestra
  – And treat these microphones as part of our “main” array.
• Using cardioid microphones in front will help a lot.
  – The cardioid is 4.7 dB less sensitive to reverberation, which will pick out more distant instruments with clarity.
• Using super cardioid microphones will help a little bit more.
  – But if the stage house is reverberant the improvement is minimal.
• The author greatly prefers to use (equalized) directional microphones for orchestra and chorus pick-up.
  – After equalization the bass performance is adequate.
  – There is better control of leakage, and less MUD.
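The reverberation rejection of a directional microphone follows from its polar pattern. The sketch below assumes ideal first-order patterns of the form a + (1 − a)·cos θ in a perfectly diffuse field. The pattern coefficients are common textbook approximations, not measurements of any microphone used in these recordings.

```python
import math

def random_energy_efficiency(a):
    """Diffuse-field pickup of the pattern a + (1 - a) * cos(theta), relative to an omni."""
    return a * a + (1.0 - a) ** 2 / 3.0

def reverb_rejection_db(a):
    """How many dB less reverberation the pattern picks up than an omni."""
    return -10.0 * math.log10(random_energy_efficiency(a))

# Assumed textbook coefficients for ideal first-order patterns.
PATTERNS = {"omni": 1.0, "cardioid": 0.5, "supercardioid": 0.366, "hypercardioid": 0.25}

for name, a in PATTERNS.items():
    print(f"{name}: {reverb_rejection_db(a):.2f} dB")
```

The cardioid comes out at about 4.8dB, matching the figure on the slide, and the supercardioid at about 5.7dB. The extra decibel is why the supercardioid helps only "a little bit more".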

Balance and distance come first

• In any recording the balance between the musical forces should reflect the needs of the music.

• In this recording, even with 120 singers the chorus is nearly inaudible in the hall.
  – So we must heavily use the chorus accent microphones.
  – In the final mix MOST of the energy in the recording will come from these. In practice, these are our MAIN microphones!
• However, if we heavily use the chorus microphones, the chorus will sound too close to the loudspeakers
  – And in front of the orchestra.
• To correct this distance problem we MUST use electronic early reflections.
  – There is no other possible solution.

» Play example

Let’s build the hall sound

• We need decorrelated reverberation in both the front and the rear with equal level

• Test just the hall microphones to see if the reverberation is enveloping and uniform.

• Then add the front microphones for the direct sound.
  – Where the hall balance is not correct you MUST augment the natural reverberation with electronics.
• In this recording the orchestra is much stronger than the chorus – even with 120 singers – and there is too little chorus in the natural reverberation!!
  – When we add the accent microphones the chorus will sound as if they are in a smaller space.

• So we add electronic reverberation from the chorus (equally in all four outer speakers) from the surround reverberator.

Final Mix

• The final mix uses the three omni microphones over the chorus as the main microphones. They are simply patched to left, center, and right.
• The spot microphones for the soloists are mostly mixed to the center, with some panning to the left or right. (No divergence was used.)
• The orchestra is a combination of two wide spaced omnis patched to left front and right front.
  – Augmented by spot microphones over the woodwinds and the more distant strings.
  – The center channel was provided automatically through leakage from the soloists’ microphones.
• The rear channels come from a widely spaced pair of omnis about 20 feet behind the conductor,
  – Extensively augmented by electronic early reflections and late reverberation.

Hall sound: decorrelation at low frequencies.

• It is widely believed that localization is impossible below 100Hz.

• So a single subwoofer has become the standard for reproducing low frequencies.

• Although localization below 100Hz is difficult in a small room, there is a large difference between a single subwoofer and an independently driven pair.
  – We have turned off the subwoofer in this room and we are running the other speakers full-range.
• A great recording will easily demonstrate the difference between a single subwoofer and full-range discrete speakers.
• As a consequence you must be sure the hall sound in your recordings is decorrelated at low frequencies!
  – Both in the front and in the rear of a surround recording.
  – Most single microphone array surround techniques fail for this reason.

Conclusions

• A great recording:
  – Has a stable front image over a large listening area.
  – Has direct sound stronger than early reflections, microphone leakage, and reverberation.
    • So it is not MUDDY!
  – Has decorrelated early reflections both in the front speakers and in the rear speakers.
    • These provide a sense of blend and depth to the recording. But be sure to mix in an absorbent space!
  – Has decorrelated late reverberation in both the front and the back speakers.
    • The decorrelation must be active for low frequencies
  – It is possible to make a great recording in a small space
    • But if the group is physically larger than the reverberation radius, electronic early reflections and reverberation will probably be necessary.

Medial Reflections – the detection of muddiness.

• Medial reflections can cause clear differences in quality.
• We can measure medial energy through an analysis of pitch.
• Pitch information is available in each critical band, even those above the frequency of auditory phase-locking.
• Here is an example of speech filtered into a 1000Hz 1/3 octave band.

The waveform appears to be a series of decaying tone bursts, repeating at the fundamental frequency.

When this signal is rectified, there is substantial energy at the fundamental frequency.

Waveform of speech formants

The waveform of the word “five” in the 2kHz 1/3 octave band.

The same, but convolved with a 20ms windowed burst of white noise, simulating a diffuse reflection, or the sound of a small reverberant room.

Non-reverberant speech has a clear repeating pattern in the waveform. Reverberant speech does not. We can devise a measurement system around this difference.

The plus/minus pitch detector

The pitch detector operates separately on each third octave band. Each band is rectified and low-pass filtered. The output is delayed, and then added and subtracted from the undelayed signal. The logs of the “plus” signal and the “minus” signal are then subtracted from each other. The result has a high sensitivity to fundamental pitch.
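The description above can be sketched in a few lines. This is one possible reading, not the presenter's implementation: it assumes the band filtering has already been done, uses a one-pole low-pass filter with an assumed 400Hz cutoff, and compares the summed power of the "plus" and "minus" signals in dB for each candidate delay. The test signal is a synthetic train of decaying tone bursts, and all names are hypothetical.

```python
import math

def envelope(band_signal, fs, cutoff_hz=400.0):
    """Full-wave rectify one band, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for x in band_signal:
        y += alpha * (abs(x) - y)
        out.append(y)
    return out

def plus_minus_db(env, delay, eps=1e-12):
    """log(plus) - log(minus), in dB, for one delay in samples."""
    plus = minus = 0.0
    for n in range(delay, len(env)):
        a, b = env[n], env[n - delay]
        plus += (a + b) ** 2
        minus += (a - b) ** 2
    return 10.0 * math.log10((plus + eps) / (minus + eps))

def best_period(env, min_delay, max_delay):
    """Delay (in samples) with the strongest plus/minus output."""
    return max(range(min_delay, max_delay + 1), key=lambda d: plus_minus_db(env, d))

def tone_bursts(fs, f0, carrier_hz, n_samples, decay_samples=20.0):
    """Synthetic band signal: decaying bursts of a carrier, repeating at f0."""
    period = int(round(fs / f0))
    return [
        math.exp(-(n % period) / decay_samples) * math.sin(2.0 * math.pi * carrier_hz * n / fs)
        for n in range(n_samples)
    ]

fs = 16000
band = tone_bursts(fs, 125.0, 2500.0, 2048)
env = envelope(band, fs)[512:]  # drop the filter's start-up transient
print(best_period(env, 40, 200), "samples =", fs / best_period(env, 40, 200), "Hz")
```

When the delay equals the pitch period the delayed and undelayed envelopes cancel in the "minus" path and reinforce in the "plus" path, so the log difference peaks sharply. Added reflections fill in the gaps between bursts and lower that peak, which is the effect the following slides show.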

Example – “one, two” 2500Hz 1/3 octave band.

Pitch detector output with dry speech – the syllables “one, two” with no added reverberation. Note the high accuracy of the fundamental extraction and the >15dB S/N

Same – but convolved with 20ms of white noise

Convolving with white noise does not change the intelligibility, nor the C80, but dramatically changes the sound – and the pitch coherence. By chance the second syllable is not seriously degraded, but the first one is – at least in this 1/3 octave band

The sound quality is markedly degraded. We need a measure for this perception.

“one,two” 2500Hz band – equal mix of direct and one diffuse reflection at 30ms.

The high pitch coherence and high direct/reverberant ratio in the first 30ms is easily seen at the start of each syllable.

Segment of opera – old Bolshoi

Segment of Verdi – pitch coherence of the 2500Hz 1/3 octave band. F, F, glide to A. Recording from the back of the first balcony. There is no obvious gap before reflections arrive, and the pitch coherence appears relatively high.

Segment from the old Bolshoi

Segment from the new Bolshoi. (I was unable to produce a similar plot.)

Sound examples – syllables “one,two,three” with no reverberation

[Figure panels: 1/3 octave bands at 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz and 3.2kHz]

Note the height and frequency of the pitch coherence peaks are (almost) uniform through all bands.

Maximum pitch coherence vs 1/3 octave band for non-reverberant speech

The syllables “one two three four five six seven” are analyzed.

Note that the maximum pitch coherence is relatively constant across all 1/3 octave bands, although the value depends on the particular vowel

“one,two,three” convolved with 20ms noise

[Figure panels: 1/3 octave bands at 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz and 3.2kHz]

Note that most of the pitch coherence has been eliminated

Maximum pitch coherence vs 1/3 octave bands for speech convolved with 20ms noise.

The syllables “one two three four five six seven” are analyzed.

Note the pitch coherence is low and not constant across third octave bands.

Pitch coherence of speech with a diffuse reflection at a level of 0dB

[Figure panels: 1/3 octave bands at 1kHz, 1.25kHz, 1.6kHz, 2kHz and 2.5kHz]

Note the low pitch coherence for some of the syllables in several bands

Maximum pitch coherence vs 1/3 octave bands for direct + reverb at 0dB

Analysis of the syllables “one two three four five six seven.”

Note the low and noise-like coherence for most of the syllables.

Pitch coherence of speech with a diffuse reflection at a level of -4dB (optimum)

[Figure panels: 1/3 octave bands at 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz and 3.2kHz]

Note the high pitch coherence on most syllables in most bands. This reflection level is usually chosen as optimum.

Max pitch coherence vs 1/3 octave band for direct and reflected at -4dB

Analysis of the syllables “one two three four five six seven.”

Note the pitch coherence is both high and uniform across 1/3 octave bands

Teatro alla Scala, Milan

Echograms from La Scala (from Hidaka and Beranek) illustrate these profiles:

Top curve - 2kHz octave band, 0-200ms

At 2kHz note the high direct sound and low level of reflections in the 50-150ms time range.

Bottom curve - 500Hz octave band 0-200ms

Note the high reverberation level – and short critical distance.

Let’s listen to Alla Scala!

• Matlab can be used to read these printed impulse responses and convert them into real impulse responses.
  – 1. First we read the .bmp file from a scan, and convert the peaks in the file to delta functions with identical time delay, and an amplitude equivalent to the peak height.
    • All the direct sound energy is combined into a single delta function, and the level of the direct sound is normalized (relative to the rest of the decay), so the 2kHz and 500Hz impulses can be accurately combined.
  – 2. We then apply a random variable of about ±5ms to the delay time to correct for the quantization in the scan.
  – 3. We then extend the echogram to higher times by tacking on an exponentially decaying segment of white noise, with a decay rate equal to the published data for the hall.
  – 4. We then filter the 2kHz echogram with a 1kHz high-pass filter, and combine it with the 500Hz echogram low-pass filtered at 1kHz.
  – 5. If desired we can create a “right channel” and a “left channel” reverberation by using a different set of random variables in steps 2 and 3.
  – 6. We convolve a segment of dry sound with the new impulse response.
  – The result is sonically quite convincing!
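Steps 2, 3, 5 and 6 can be sketched as follows. This is not the presenter's Matlab code. It assumes the peaks have already been read from the scan (step 1) as a list of (delay in ms, amplitude) pairs sorted by delay with the direct sound at 0ms, it omits the crossover filtering of step 4, and it starts the noise tail at the level of the last scanned peak, which is an assumption. All names are hypothetical.

```python
import math
import random

def build_impulse_response(peaks, fs, rt60_s, total_s, jitter_ms=5.0, seed=0):
    """Turn scanned echogram peaks into a sampled impulse response for one channel."""
    rng = random.Random(seed)
    ir = [0.0] * int(round(fs * total_s))
    for t_ms, amp in peaks:
        # Step 2: jitter each reflection delay; the direct sound stays at t = 0.
        if t_ms > 0.0:
            t_ms = max(0.0, t_ms + rng.uniform(-jitter_ms, jitter_ms))
        idx = int(round(fs * t_ms / 1000.0))
        if idx < len(ir):
            ir[idx] += amp
    # Step 3: extend with white noise decaying by 60 dB over rt60_s.
    start = int(fs * peaks[-1][0] / 1000.0)
    tail_amp = abs(peaks[-1][1])  # assumed starting level for the tail
    for n in range(start, len(ir)):
        t = (n - start) / fs
        ir[n] += tail_amp * rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * t / rt60_s)
    return ir

def convolve(x, h):
    """Step 6: direct convolution of a dry signal with the impulse response."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Illustrative peaks only, not data from the published echograms.
peaks = [(0.0, 1.0), (30.0, 0.5), (60.0, 0.3)]
# Step 5: different seeds give decorrelated left and right channels.
left = build_impulse_response(peaks, fs=1000, rt60_s=1.2, total_s=0.25, seed=1)
right = build_impulse_response(peaks, fs=1000, rt60_s=1.2, total_s=0.25, seed=2)
wet_left = convolve([1.0, 0.0, 0.5], left)
```

The direct convolution is slow for real audio. An FFT-based convolution would be the practical choice, but the plain loop keeps the sketch dependency-free.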

Alla Scala at 500Hz – reading the plot

Top curve – 500Hz measured impulse response as given by Beranek. JASA Vol. 107 #1, Jan 2000, pp 356-367

Bottom curve – impulse response as regenerated from delta functions, passed through a 500Hz 6th order 1 octave filter.

Note the correspondence is more than plausible.

Alla Scala 500Hz – randomizing and extending

Top graph: Alla Scala published data

Bottom graph: regenerated impulse response after randomization and extension.

Pitch coherence of speech in La Scala

[Figure panels: 1/3 octave bands at 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz and 3.2kHz]

Note the excellent sharpness of the pitch peaks, and good consistency across bands.

Maximum coherence vs 1/3 octave bands La Scala, Milan

Pitch coherence is similar to our example where the direct/reverberant ratio ~=4dB

While not as clear as in some examples, fundamental pitch is easily extracted using this simple detector.

Listen to Alla Scala, NNT Tokyo, Semperoper

2kHz and 500Hz Impulse responses from Scala Milan

NNT Theater Tokyo

Semper Oper Dresden

(All data from Hidaka and Beranek)

Original Sound

[Figure panels: 2kHz; 500Hz]

Pitch Coherence – NNT opera house, Tokyo

[Figure panels: 1/3 octave bands at 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz and 3.2kHz]

Note the peaks – where they exist – are very broad, indicating inexact pitch extraction. For most bands, there is no extracted pitch for all syllables.

Maximum coherence vs 1/3 octave band NNT Opera Theater, Tokyo

Fundamental pitch is not extractable using this simple detector.

Binaural Examples in Opera Houses

It is very difficult to study opera acoustics, as the sound changes drastically depending on:

1. the set design,

2. the position of the singers (actors),

3. the presence of the audience, and

4. the presence of the orchestra.

Binaural recordings made during performances give us the only clues.

Here is a sound bite from a famous German opera house: Note the excessive distance of the singers, and the low intelligibility. This is MUD in action!

And here is an example from another famous German opera house: Note the increase in intelligibility, reduced distance, and the improvement in dramatic connection between the singer and the audience.

Synthetic Opera House Study

• We can use MC12 Logic 7 to separate the orchestra from the singers on commercial recordings, and test different theories of balance and reverberation.

• From Elektra – Barenboim. Balance in original is OK by Barenboim.

Original

Orchestra Left&Right

Vocals

Downmix - No reverb on the singers

Reverb from orchestra

Reverb from singers

Downmix with reverb on the singers.

Muddiness: Dry Speech + 20ms noise

Mono speech signal

Convolved with noise – (diffuse reflections)

Mono:

Stereo:

Note the reflections increase muddiness and distance.

The stereo version is more natural than the mono, but equally distant.

Recorded speech in Covenant

Voice segment recorded at 1.5m with a supercardioid mike

The same segment with the reflections below.

Note the muddiness increases dramatically

A frequency-flat reflection pattern with peak energy about 30ms after the direct sound

Demo 1: Clarity

• Demonstrate dry sound
• Demonstrate muddy sound by adding reflections in monaural.
  – Note that adding only very early reflections does not decrease the intelligibility.
  – But it increases the perceived distance of the source.
• Demonstrate adding reflections in surround
  – Note that adding the reflections in surround increases the perceived distance more effectively.
  – Less reflected energy is needed, and the direct sound remains clear.
• The optimum early energy is between -4dB and -6dB
