E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...
Transcript of E85.2607: Lecture 6 -- Time-segment Processingronw/adst-spring2010/...E85.2607: Lecture 6 {...
E85.2607: Lecture 6 – Time-segment Processing
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 1 / 16
Variable speed replay
How can we modify a sound to make ‘faster’ or ‘slower’?i.e. song played at half tempoWhy not just slow it down?
xs(t) = xo(vt), v = speedup factor (> 1→ faster)
equivalent to playback at a different sampling rate(resample in Matlab)
2.35 2.4 2.45 2.5 2.55 2.6-0.1
-0.05
0
0.05
0.1
2.35 2.4 2.45 2.5 2.55 2.6-0.1
-0.05
0
0.05
0.1
time/s
Original
2x slower
v = 12
Source: Dan Ellis, EE6820 Lecture 1
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 2 / 16
Variable speed replay in the frequency domain
Compress in time → shiftfrequencies higher
“chipmunk” effect
Beware aliasing
LPF before resampling
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
Variable speed replay (v=1), time domain signals
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(v=0.5)
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(v=2)
n !
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
Variable speed replay (v=1), spectra
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(v=0.5)
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(v=2)
f in Hz !
Source: DAFX: Fig. 7.1
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 3 / 16
Timescale modification
Want to change timeaxis
Leave pitch alone
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
Time stretching (!=1), time domain signals
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(!=0.5)
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(!=2)
n "
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
Time stretching (!=1), spectra
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(!=0.5)
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(!=2)
f in Hz "
Source: DAFX: Fig. 7.4
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 4 / 16
Block processing
time / s
4
4
1
1
3
3
2
2
5
5
Chop signal into short segments
Resample individual segments,but keep time axis
Will this work?
c = 1;for i = 0:N:length(x)y(:,c) = x(i + [1:N]);c = c + 1;
end
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 5 / 16
TSM in the time-domain
Problem: want to preserve local time structurebut alter global time structureRepeat or discard segments
but: artifacts from abrupt edges
Cross-fade & overlap-add (OLA)
ym[mL + n] = ym−1[mL + n] + w [n] · x [bvmc L + n]
2.35 2.4 2.45 2.5 2.55 2.6-0.1
0
0.1
4.7 4.75 4.8 4.85 4.9 4.95-0.1
0
0.1
1
1
1 1 2 2 3 3 4 4 5 5 6
2
2
3
3
4
4
5 6
6
5
time / s
time / sSource: Dan Ellis, EE6820 Lecture 1
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 6 / 16
Aside: Fast convolution (fftfilt)
Source: Dan Ellis, EE4810: Lecture 3
Use FFT to implement block-wise convolution: O(N log N) vs O(N2)
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 7 / 16
Synchronous overlap-add (SOLA)
Idea: allow some leeway in placing window to optimize alignment ofwaveforms to minimize artifacts
1
2
Km maximizes alignment of 1 and 2
Hence,
ym[mL + n] = ym−1[mL + n] + w [n] · x [bvmc L + n + Km]
Where Km chosen by cross-correlation (xcorr):
Km = argmax0≤K≤Ku
∑Novn=0 ym−1[mL + n] · x [bvmc L + n + K ]√∑(ym−1[mL + n])2
∑(x [bvmc L + n + K ])2
Source: Dan Ellis, EE6820 Lecture 1
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 8 / 16
Pitch-synchronous OLA (PSOLA): Analysis
Assuming input signal has pitch, can get even cleaner TSM
1 Identify pitch period T usingauto-correlation
2 Find peaks corresponding topitch pulse
3 Segment signal using length 2Twindow centered on each pitchpulse
Pitch Synchronous Analysis
P
pitch marks
PSOLA analysis
segments
Source: DAFX: Fig. 7.9
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 9 / 16
PSOLA: Synthesis
1 Set up grid of pitch marks foroutput segments
2 Match up closest input segmentto each output pitch mark
3 Overlap-add...
pitch marks
segments
PSOLA time stretching
Synthesispitch marks
Overlap and add
Source: DAFX: Fig. 7.10
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 10 / 16
PSOLA gotchas
Pitch Synchronous Analysis
P
pitch marks
PSOLA analysis
segments
Source: DAFX: Fig. 7.8
PSOLA analysis assumes constant pitchUse more sophisticated pitch tracker
What if there is no pitch?Fall back to OLA/SOLAComplicates analysis – how to decide whether given section has pitch ornot?
Tends to introduce artifacts on noisy sectionsRepeating noisy segments introduces artificial periodicity (e.g. repeatedtransients)Fix by reversing repeated segments
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 11 / 16
Pitch-shifting
Leave the time-axis alone, but shift frequency-axis
Approach: Resample to shift both, then correct time axis
or vice-versa
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
Pitch scaling (!=1), time domain signals
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(!=0.5)
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(!=2)
n "
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
Pitch scaling (!=1), spectra
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(!=0.5)
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(!=2)
f in Hz "
Resampling(ratio N /N )1 2
x(n) y(n)Time Scaling(ratio N /N )2 1
Source: DAFX: Fig. 7.13
v =N2
N1
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 12 / 16
Pitch-shifting example
0 2000 4000 6000 8000
−0.5
0
0.5x(
n)
Pitch scaling (!=1), time domain signals
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(!=0.5)
0 2000 4000 6000 8000
−0.5
0
0.5
x(n)
(!=2)
n "
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
Pitch scaling (!=1), spectra
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(!=0.5)
0 1000 2000 3000 4000 5000−80
−60
−40
−20
0
X(f)
(!=2)
f in Hz "
Resampling(ratio N /N )1 2
x(n) y(n)Time Scaling(ratio N /N )2 1
Source: DAFX: Fig. 7.12
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 13 / 16
TSM with the Spectrogram
Just stretch out the spectrogram?
Time
Fre
quen
cy
0 0.2 0.4 0.6 0.8 1 1.2 1.40
1000
2000
3000
4000
Time
Fre
quen
cy
0 0.2 0.4 0.6 0.8 1 1.2 1.40
1000
2000
3000
4000
how to resynthesize?spectrogram is only |Y [k,m]|Source: Dan Ellis, EE6820 Lecture 1
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 14 / 16
A look ahead: Phase vocoder
Time/pitch modification in thetime-frequency domain
Like OLA, but also take DFT of eachsegment
Like STFT, but special attention paid tophase...
Magnitude from ‘stretched’ spectrogram:
|Y [k,m]| = |X [k , vr ]|
But preserve phase increment between slices:
θ̇Y [k ,m] = θ̇X [k , vm]
Magnitude Phase
Sh
ort
-tim
eF
ou
rie
rtr
an
sfo
rm
FFT
Windowing
Time-domainsignal
An
aly
sis
Sta
ge
Syn
the
sis
Sta
ge
IFFT
Magnitude and Phasecalculation
Overlap-add
Time-domainsignal
Time-frequencyProcessing
Inve
rse
sho
rt-t
ime
Fo
urie
rtr
an
sfo
rm
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 15 / 16
Reading
DAFX 7.1–7.4 TSM, SOLA, PSOLA
DAFX 8.1–8.2 Phase vocoder basics
E85.2607: Lecture 6 – Time-segment Processing 2010-03-04 16 / 16