ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 9: Time & …dpwe/e4896/lectures/E4896-L09.pdfE4896 Music...

E4896 Music Signal Processing (Dan Ellis) 2013-03-25 - /16

Lecture 9:Time & Pitch Scaling

Dan EllisDept. Electrical Engineering, Columbia University

[email protected] http://www.ee.columbia.edu/~dpwe/e4896/1

1. Time Scale Modification (TSM)2. Time-Domain Approaches3. The Phase Vocoder4. Sinusoidal Approach

ELEN E4896 MUSIC SIGNAL PROCESSING

mailto:[email protected]

mailto:[email protected]

http://labrosa.ee.columbia.edu

http://labrosa.ee.columbia.edu


1. Time Scale Modification (TSM)

• The basic problem in time scaling:Modify a sound to make it “quicker/slower”?to examine “detail” in speech/performanceto synchronize tracks

2

xs(t) = x(t/r)

• Why not just change the sampling rate?“slowing the tape”e.g. for r times longer (slower),

2.35 2.4 2.45 2.5 2.55 2.6-0.1

-0.05

0

0.05

0.1

2.35 2.4 2.45 2.5 2.55 2.6-0.1

-0.05

0

0.05

0.1

time/s

Original

2x slower


Time & Pitch• Changing speed alters time AND pitch

how do we change just one?

• Preserve pitch, change timepreserve pitch = keep local time structurebut changing global time course?

• Preserve timing, change pitchanalogous problemchange time scale, then change sampling rate

3


2. Time-Domain TSM• Pre-digital time/pitch scaling:

Revolving tape heads

4

Fairb

anks

et a

l., 19

54


Simple OLA TSM• The digital equivalent of revolving tape heads

5

2.35 2.4 2.45 2.5 2.55 2.6-0.1

0

0.1

4.7 4.75 4.8 4.85 4.9 4.95-0.1

0

0.1

1

1

1 1 2 2 3 3 4 4 5 5 6

2

2

3

3

4

4

5 6

65

time / s

time / s

ym[mL + n] = ym�1[mL + n] + w[n] · x��m

r

�L + n

�


SOLA / SOLAFS• Simple OLA leads to

random phase interactions during overlapso.. search for optimal alignment Km via small offset

find best Km with cross-correlation:

6

12

Km maximizes alignment of 1 and 2

ym[mL + n] = ym�1[mL + n] + w[n] · x��m

r

�L + n + Km

�

Km = arg max0�K�Ku

�Nov

n=0 ym�1[mL + n] · x��

mr

�L + n + K

��

(ym�1[mL + n])2�

(x��

mr

�L + n + K

�)2

Roucos & Wilgus 1985Hejna & Musicus 1991


The Importance of Time Window• Preserved “local structure”

= structure within each windowshorter windows → less structure preserved

7

2.4 2.45 2.5 2.55 2.6 2.65 2.7 2.75 2.8-0.1

0

0.1

4.8 4.85 4.9 4.95 5 5.05 5.1 5.15-0.1

0

0.1

2.4 2.45 2.5 2.55 2.6 2.65 2.7 2.75 2.8-0.1

0

0.1

4.8 4.85 4.9 4.95 5 5.05 5.1 5.15-0.1

0

0.1

tw = 25ms tw = 5ms


Source-Filter TSM• Decompose into excitation + filter

before TSMfilter (e.g. LPC) easy to scale - change frame rateexcitation scaled e.g. by SOLAFS.. then reapplying filter can reduce artifacts

8

f

Filteranalysis

Resamplein time

Time scalemodification

Applyfilter

Filter coefficients

Residual t

Kawahara 1997


3. The Phase Vocoder• TSM based on the spectrogram

just stretching it out it time gives “what we want”:but: STFT magnitude isn’t quite enough

9

Time

Frequency

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1000

2000

3000

4000

Time

Frequency

0 0.2 0.4 0.6 0.8 1 1.2 1.40

1000

2000

3000

4000

Flanagan & Golden 1966Dolson 1986


Magnitude-only reconstruction• How to recover ?

need ...iterate

converges to a solution, but SLOWLY

10

STFT�1{|Y (ej�, t)|}�{Y (ej�, t)}

yn(t) = STFT�1{|Y (ej�, t)|�n(�, t)}�n+1(�, t) = �{STFT{yn(t)}}

0 1 2 3 4 5 6 7 83

4

5

6

7

8

Iteration number

SNR

of r

econ

stru

ctio

n (d

B)

Iterative phase estimation

Starting from quantized phaseMagnitude quantization only

Starting from random phase

7.6 dB

4.7 dB

3.4 dB

Griffin & Lim 1984


Phase Correction• Essential idea of the phase vocoder

phase advance will stretch by time scaling factor... preserve rate of phase change between slices

= instantaneous frequency

(from differencing adjacent frames, or directly...)

• Makes sense for a single sinusoid...

11

�̇(�, t) =d

dt�{Y (ej�, t)}

time

θ = ΔθΔT

.

Δθ' = θ·2ΔT .

ΔT


Phase Vocoder Results• Reconstructed phase

gives smooth evolution

“metallic” extended noise...

12

freq

/ kHz

0

2

4

6

8

freq

/ kHz

0

2

4

6

8

time / s0.5 1 1.5 2 2.5 3


Time Window (again)• STFT time-frequency tradeoff

determines what “scaling” means

is pitch structure within DFT window?

13

time / s0.5 1 1.5 2 2.5 3

freq

/ kHz

0

2

4

6

8

freq

/ kHz

0

2

4

6

8


4. Sinusoidal TSM• Phase Vocoder essentially

treats each STFT bin as a sinusoidwith magnitude and frequency time scaling simply projects that sinusoid forward

• But: bins around peak don’t behave that wayneed phase alignment to achieve interference

• Model as sinusoids explicitlypeak-picking, magnitude & frequency estimationseparate noise signal?

14

�̇(�, t)|Y (ej�, t)|

Time

Frequency

0.5 1 1.5 2 2.5 30

2000

4000

6000

8000


Summary• Time / Pitch scale modification

Intuitive but difficult to define

• Time domain methodsPreserve local structure

• Spectral methodsElongate features in spectrogram

15


References• M. Covell, M.Withgott, M., Slaney, “Mach1: Nonuniform Time-Scale Modification

of Speech,” Proc. ICASSP, vol 1, pp. 349-352,1998.

• Mark Dolson, “The phase vocoder: A tutorial,” Computer Music Journal, vol. 10, no. 4, pp. 14-27, 1986.

• G. Fairbanks, W. Everitt, and R. Jaeger, “Method for time of frequency compression-expansion of speech,” Transactions of the IRE Professional Group on Audio, vol.2, no.1, pp. 7-12, Jan 1954.

• J. L. Flanagan, R. M. Golden, “Phase Vocoder,” Bell System Technical Journal, pp. 1493-1509, November 1966.

• D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Tr. Acous,, Speech, and Sig. Proc., vol. 32, pp. 236–242, 1984.

• Don Hejna and Bruce R. Musicus, “The SOLAFS Time-Scale Modification Algorithm,” BBN Technical Report, July 1991.

• Hideki Kawahara, “Speech Representation and Transformation using Adaptive Interpolation of Weighted Spectrum: VOCODER Revisited,'' Proc. ICASSP, vol.2, pp.1303-1306, 1997.

• Jean Laroche, “Time and Pitch Scale Modification of Audio Signals.” chapter 7 in Applications of Digitial Signal Processing to Audio and Acoustics, ed. M. Kahrs & K. Brandenburg, Kluwer, 1998.

16

http://www.mangolassi.org/covell/1997-060

http://www.mangolassi.org/covell/1997-060

ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 9: Time & …dpwe/e4896/lectures/E4896-L09.pdfE4896 Music...

Documents

Transcript of ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 9: Time & …dpwe/e4896/lectures/E4896-L09.pdfE4896 Music...