Digital Audio Effects and Simulations

420 J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May

IIt is increasingly possible either toemulate legacy audio devices andeffects or to create new ones using

digital signal processing. Often theseare implemented as plug-ins to digitalaudio workstation packages, using oneof the proprietary systems such asVST, DirectX, Audio Suite, or AudioUnits. Papers from a session chairedby David Berners at the AES 127thConvention last year in New York cov-ered a wide range of recent innovationsin this field, including Leslie speaker,plate reverb, and guitar amplifier emu-lation.

ROLL ME A LESLIE SPEAKERInvented by Don Leslie in 1949, theclassic Leslie speaker has created aneffect that defines the sound of certainbands, having been used to add a char-acteristic tremolo or chorus effect tothe Hammond organ for many years.The whole process is achievedmechanically, which makes for aninteresting challenge to emulate it inthe digital domain. The effect resultsfrom rotating loudspeaker elementsthat have amplitude-, spectrum-, andphase-modulating influences on theaudio radiated by the system. A smallamount of varying Doppler shift is alsosaid to be introduced by the system.While the mid-range horns actuallyrotate, the bass loudspeaker remainsphysically static, and a rotating baffleis used to modulate the acoustic out-put. A picture of a Leslie 44W modelis shown in Fig. 1, in which the loud-speaker and baffle elements can clearlybe seen. The two rotating elements goaround at different rates to create amore interesting result, and there is aslow chorale effect as well as a fasttremolo effect.

In “Discrete Time Emulation of theLeslie Speaker” (AES 127th paper7925), Herrera et al. describe anattempt to emulate the classic designusing a novel method. In a previousattempt at emulation, they included themodeling of direct sound and time-varying first-order reflections from the

inside cabinet walls, aspicked up by two micro-phones a short distanceaway (see Fig. 2). Virtualsource images of thereflections were added tothe direct sound usingfiltered versions passedthrough a delay line, fordifferent rotational statesof the horn. While thiswas computationally effi-cient, it did not necessar-ily capture all therequired features of thesound. In the currentattempt, impulse respon-ses picked up from the horn and bafflearrangement by a spaced pair of B&K4006 microphones were captured fornumerous rotation angles, using smallmanual rotations of the loudspeakersystem. A sine-sweep method was usedduring this process. Rotation incre-

ments of about 0.9 degrees were usedfor the horn and 2.7 degrees for thebass baffle. As can be seen from Fig. 3,a typical impulse response contains adense tail of reflections. The signal wassplit into two frequency regions using acrossover, so that the bass and mid-

By Francis RumseyStaff Technical Writer

Digital Audio Effects and Simulations

Fig. 1. The Leslie 44W loudspeaker system (Figs. 1–3 courtesy Herrera et al.)

Fig. 2. Picking up first-order reflections from the Lesliehorn–cabinet combination using a pair ofmicrophones.

➥


range components could be processedseparately. The acceleration and decel-eration dynamics of the rotating loud-speaker were carefully measured andimplemented using separate filters inorder to be able to emulate the effectsof speeding up and slowing down, suchas during the transition between chorusand tremolo modes.

Three different FIR filtering tech-niques used for processing the relevantimpulse responses were compared. Thefirst used a scaled and interpolatedversion of the input signal based onmeasurements made at numerous rota-tion angles, which was then added tothe output. The second worked bycycling through taps in an FIR filterthat accorded to the relevant rotationangle. The third worked in a similarway to an HRTF filtering techniqueused in binaural processing, wherebyone of a small number of fixed filterswas selected depending on the rotationangle, with a form of weighted mixing(or panning) between the adjacentfilter outputs according to the rotationangle. The authors found that approxi-mately 20 filters were sufficient to beable to implement convincing repro-ductions of the original impulseresponses. (This last method is said tobe very efficient in terms of memoryrequirements and can render multipleinputs at different angles.)

During testing it was found that thefirst two time-varying filter methods

produced a low-level “hashy” sound atrotation speeds above 6 Hz, which wasthought to be due to spatial aliasingand linear interpolation between irreg-ularly distributed impulse responses.By projecting the impulse responsesonto a sinusoidal rotation pattern, thenrepresenting them as a set ofuniformly spaced (in angle) fixedfilters with spatially band-limitedresponses, the authors were able toeliminate this unpleasant effect even athigh rotation rates.

A TRIODE GUITAR AMPIn “Simulation of a Guitar AmplifierStage for Several Triode Models:Examination of Some Relevant Phe-nomena and Choice of AdaptedNumerical Schemes” (AES 127thpaper 7929), Cohen and Hélie lookinto the matter of simulating the high-gain triode stage in a guitar preampli-fier, in particular considering the non-linear distortions that can give avacuum tube-based guitar amp itsunique sound. They choose to investi-gate the commonly used 12AX7 triodevalve, also known as the ECC83,which has certain saturation distortioncharacteristics that are said to be val-ued by musicians and audiophilesalike.

Mathematical models aiming tosimulate static triode behavior weretried and compared with those foundon available datasheets for the valve in

question. A simulation model byNorman Koren was found to be moreaccurate than one by Leach, as itmodeled the triode behavior better forhigh grid and plate voltages, as well aslimiting the behavior of the model tomore realistic values in the case ofextreme grid voltages. A couple ofdynamic aspects of triode modelingwere also tried, including the parasiticcapacitances exhibited by the valve inquestion that tend to affect thefrequency response of the amplifier(see Fig. 4). The combined resultappeared to be successful both from ameasured point of view and perceptu-ally, and it was possible to implementit within a VST plug-in architecture.

The authors briefly discuss the effectof sampling rate on their simulations,suggesting that oversampling (at 96kHz for example) is generally consid-ered desirable in nonlinear simulations.Oversampling in such situationsreduces the likelihood of aliasingeffects becoming audible. It also seemsto improve the accuracy of the variousfilters and the speed of convergence ofan algorithm designed to solve nonlin-ear and differential equations.

OVER-THRESHOLD FEEDBACKDISTORTION SYNTHESISStaying with the “triode sound,” TomRutt describes an algorithm he devel-oped to synthesize nonlinear distortionusing over-threshold power functionfeedback (AES paper 7930). This issaid to emulate closely the effect ofvacuum tube triode grid limit distor-tion. The process acts as an instanta-neous soft limiter for both the positiveand negative half cycles of the signaland never clips the output signal peaks,even at the maximum peak input level.

An electronic circuit implementationof this algorithm is described, as wellas a DirectX software plug-in versionknown as Tuboid (www.tuboid.com).In the latter a mapping array is usedthat relates certain input signal valuesto modified (distorted) output signalvalues, in an attempt to emulate thecharacteristics of the electronic circuit.While a 1:1 mapping function wouldrequire an array of 65,536 sixteen-bitintegers, in fact the input signal isscaled by 256, and an array of only256 values is used. The intervening


Fig. 3. Example of a stereo impulse response picked up from the Leslie horn

J. Audio Eng. Soc., Vol. 58, No. 5, 2010 May 423

output values are linearly interpolated.The positive- and negative-going partsof the signal are separately processedto enable the plug-in to be adjusted in aflexible fashion to emulate a variety ofdifferent kinds of both symmetric andasymmetric tube distortion. The authorhas made a number of audio examplesavailable for audition at www.coastenterprises.com/OTPF/demoClips.

CLASSIC EMT 140 PLATEREVERBERATIONThe EMT 140 was a classic platereverberation system introduced in1957 that was used widely before thedays of digital reverb units. Its sound isstill sought by connoisseurs of a cer-

tain type of reverb that can be usefulfor vocals in a mix. In “An Emulationof the EMT 140 Plate ReverberatorUsing a Hybrid Reverberator Struc-ture” (AES 127th paper 7928), Green-blatt et al. consider how its characteris-tics can be emulated digitally, so as to avoid the need to keep and maintaina large physical unit such as this in astudio.

Essentially plate reverbs consist of asuspended, thin steel plate to which adriving transducer and one or morepickup transducers are attached.Decaying vibrations are set up in theplate that behave to some extent likeroom reverberation, while a dampingplate located close to the reverberation

plate controls the rate at which energy isdissipated (and hence the reverb time).According to the authors’ analysis, theEMT 140 unit behaves in an almostlinear and time-invariant fashion. It hasa very fast onset transient response and ahigh echo density, little influenced bythe damping plate, while the low-frequency decay can be controlled torange from around half a second to overeight seconds. When considering how toimplement a digital emulation it issuggested that, although an obvioussuggestion, the idea of storing, inter-polating, and convolving impulseresponses of the plate up to eightseconds is computationally very expen-sive. Physical models have been tried inwhich the mechanical characteristics ofthe plate are modeled numerically usingpartial differential equations, but so farthese have been unable to emulate therapid onset response of a real plate.They are, however, good for emulatingthe high-level nonlinear behavior ofsuch a plate.

In the emulation described here, theauthors explain that they haveemployed a hybrid reverberator thatcombines two elements: a short convo-lution stage to deal with the onset tran-sient and a computationally efficientfeedback-delay network (FDN) toemulate the tail, with access to a rangeof control parameters. Because thetransition to a “late field” (the timeafter the initial impulse at which theecho density becomes perceptuallydense) occurs after only a few millisec-onds with the EMT plate, the initialconvolution period can be short. Itdepends to some extent on the settingof the damping plate. In fact, in thisimplementation a single boundary pointbetween the initial convolution andFDN parts is introduced at 300 ms,regardless of the damping setting. As the energy in the FDN builds upduring the first few hundred millisec-onds, its output can then be crossfadedinto the end of the initial convolutionsection using a crossfade with energymatching. Measured and modeleddecay traces and spectrograms for this reverb emulation are shown in Figs. 5 and 6.

A stereo emulation of the EMT platecan be implemented by introducing asecond FDN signal path for the late


Fig. 4. Frequency response of a triode valve simulation either with (top) or without(bottom) modeling of parasitic capacitances. The roll-off in HF response can beseen in the lower plot (courtesy Cohen and Hélie).

➥

www.coastenterprises.com/OTPF/demoClips



reverberation, which will produce anoutput that is uncorrelated with thefirst. An identical window is appliedduring the convolution of the early partof the response for left and right chan-nels in order to maintain the stereoimage. The stereo width of the late partis adjusted to match that of the originalplate by varying the panning of the twosignals between the outputs so that thecross-correlation of the channelsmatches that of the original.

A SWITCHED CONVOLUTIONREVERBERATORMoving into the modern age and dis-cussing the implementation of neweffects algorithms, Lee et al. continuethe reverberation theme in relation to alow-cost algorithm for use in mobiledevices in ‘The Switched ConvolutionReverberator” (AES paper 7927). Con-ventional convolution and FDN-basedreverberators need considerableamounts of processing and memory,which makes them unsuitable formobile devices where these resourcesare at a premium. Consequently theauthors propose the reverberator struc-ture shown in Fig. 7, which consists ofa recursive filter (comb filter) driving aprocess that convolves the comb filteroutput with a short, exponentiallydecaying noise sequence. The reverber-ator’s equalization is controlled usinglow-order IIR filters at the input and inthe feedback path of the comb filter.

The impulse responses at differentpoints in the chain are shown in Fig. 8,where it can be seen that the output ofthe comb filter is a simple decayingtrain of pulses, whereas that of the filteris an infinitely long decaying noisesequence. The time constant of thecomb filter is supposed to be the samelength as the noise sequence, which inthis case is only in the region of 100ms, leading to low computation andmemory requirements. This structurecreates a decay that can have similarproperties, both spectrally and in termsof echo density, to typical late reverber-ation. The authors found that this basicstructure worked well for stationarysignals but less well for transient sig-nals, having audible periodicity.Because the noise sequence is

repeated in a cyclical fashion, as theauthors found above, there can be audi-ble periodicity in the perceived reverber-ation decay. For this reason the noisesequence has somehow to be varied. In aswitched convolution reverberator adifferent noise sequence can be switchedin at every period of the comb filter,which helps to reduce periodicity.However, a basic switched convolutionreverberator was found to have theopposite problem to the simple filterstructure described above, in that itperformed quite well with transientsignals but less well for stationarysignals. It was clear that time-varyingartifacts in the reverberation had to be

Fig. 5. Measured (blue) and modeled (red) decay traces foremulation of EMT 140 reverberator (Figs. 5 and 6 courtesyGreenblatt et al.)

Fig. 7. Comb filter reverberatorstructure using a short noise sequenceoutput filter (Figs. 7–11 courtesy Leeet al.)

Fig. 8. Impulse response of thereverberator structure shown in Fig. 7:(a) input signal x(t), (b) comb filteroutput, (c) reverberator output y(t).

Fig. 6. Measured (left) and modeled (right) impulse responsespectrograms of the EMT reverb emulation

minimized, and this became a key goalof the authors’ research. One proposedsolution, as shown in Fig. 9, employed acrossfade between the noise sequencesto be convolved, having a length oftwice the comb filter impulse peri- �

AudioEffectsMAY2010_MAY 2010 5/13/10 11:38 AM



A transient detection algorithmcould be used to assist theprocess of selecting the updaterate of the noise sequence. Forcomplex music signals it wasfound to be helpful to make theleaky integrator’s time constantfrequency-dependent. Sparsenoise sequences known asvelvet noise were tried to goodeffect, these having less denseimpulse responses but similarperceptual characteristics toGaussian noise.

HANDLING TRANSIENTSIN TIME-STRETCHINGALGORITHMSTime-stretching algorithms areused to alter the speed of audiosignals without altering their

pitch. Ideally this process should avoidside effects. While sustained signals canbe treated with relative success usingvarious types of overlap-and-add algo-rithms, transient components are notalways so easy to deal with. In “A NovelTransient Handling Scheme for TimeStretching Algorithms” (AES 127thpaper 7926), Nagel and Walther explainthat it is usually necessary to extract thetransient components and only processthe sustained elements of a signal,adding the transients back in after pro-cessing. In attempting to address thisproblem they tackle the challenge ofstretching a sustained note such as thatfrom a pitch pipe, combined with a

highly transient sound such as that froma castanet.

The authors cite a 1994 paper on timestretching by Quatieri et al. in which thealgorithm attempts to preserve the over-all envelope of the signal in its time-stretched version. However, this leads tothe side effect that a time-dilated percus-sive event will decay more slowly thanthe original. In musical signals, theyargue, it is preferable to preserve theenvelope of transient events. In theauthors’ new method, transients areextracted from the composite signal andreplaced with a stationary interpolatedsignal to fill the gap, as shown in Fig.12. (The reason for this is to avoid theartifacts that can arise when stretchingstationary signals that have gaps inthem.) The resulting quasistationarysignal is then well suited to time stretch-ing, and the transients are subsequentlyused to replace parts of the interpolatedsignal after it has been stretched. Aphase vocoding technique was used tostretch the stationary parts of the signal.As shown in Fig. 13, the signal was FFTanalyzed in grains of around 10 ms andthen resynthesized at the new time scaleusing a different hop size and overlap-add processing (in this example, astretching factor of two is shown).

The main challenges of this methodwere to detect transients accurately andto achieve a perceptually satisfactoryinterpolation of the stationary part ofthe signal. In particular there was theproblem of what would happen if the

ods. The system’s impulse response isshown in Fig. 10. The perceived resultwas found to represent an improvement;however, it cost more in terms ofcomputation and memory requirementsthan the basic structure described earlier.

Alternative structures were tried thatused only a single convolution stage,based on the idea of gradually changingthe noise signal over time so as toreduce artifacts. A leaky integratorconcept was used to control the rate atwhich the noise sequence was changed,and proposals were made to vary thisrate depending on the time-varyingnature of the input signal. A versionbased on this idea is shown in Fig. 11.

Fig. 9. Switched convolution reverberatorstructure with cross-faded noise sequences

Fig. 10. Impulse response of the crossfadedswitched convolution reverberator shown in Fig. 9

Fig. 12. Transient removal and interpolation in a novel time-stretchingalgorithm: (a) The transient is detected and windowed, then it is subtractedfrom the signal, finally the gap is interpolated; (b) upon resynthesis thestationary signal is stretched, then a gap is prepared for the transient bymultiplying with the inverse window used for its original extraction, then thetransient is added back in (Figs. 12 and 13 courtesy Nagel and Walther).

Fig. 11. Switched convolution reverberatorusing frequency-dependent noise update

a

b

stationary signal’s compositionchanged substantially over the durationof the transient, which the authorsproposed to address by using forwardand backward prediction with a cross-fade. Detection of the ends of transientsections was complicated by the factthat the reverberant decays of the tran-sient part typically merge with thefollowing stationary part, and thisrequired some careful handling. Resultswere promising, but did not appear tooutperform state-of-the-art softwarecurrently available. Interestingly, somelisteners seemed to prefer versions ofthe time stretch in which the reverb ofthe transient was stretched along withthe stationary part of the signal, whichrather contradicted the authors’ originalassumption that it would be preferableto treat the transients and their associ-ated decays as an unchanged andcombined entity. They concluded thatmore insight into listener preferencesare needed. The novel approach ispromising, they said, and its essentialprinciples had been demonstrated in thecase of a special signal. Anyone inter-ested in hearing the results of thisapproach are invited to ask the authors([email protected]) forsample audio files.

SUMMARYUsing today’s signal processing toolsone can both emulate the sounds ofyesteryear and implement new effectsthat were not possible in the days ofanalog processing. The sound of classicsystems such as the Leslie speaker andtube guitar amp can be made availablein the world of plug-ins, for use in mod-ern digital audio production applica-tions. Although plug-ins may not havesome of the raw physical appeal or nos-talgic value of the classic hardware they

emulate, they can bring the sounds ofyesterday to a contemporary audiencein a relatively convenient fashion.

Fig. 13 The phasevocoder algorithmused for timestretchingresynthesizes theoriginal signal byusing a hop sizethat is different tothat used in theoriginal FFTanalysis (hereshown as twotimes longer)

Editor’s note: The papers reviewed in this article, and all AES papers, can be purchasedonline at www.aes.org/publications/preprints/search.cfm and www.aes.org/journal/search.cfm.AES members also have free access to past tech-nical review articles such as this one and othertutorials from AES conventions and conferencesat www.aes.org/tutorials.


...integrated

electronic test, acoustic test...

Prism Sound is now a globalsales & support agent for:

dScope Series IIIanalog and digital

audio analyzers

Loudspeaker design and analysis solutions


Digital Audio Effects and Simulations

Documents

Transcript of Digital Audio Effects and Simulations