Multiresolution STFT for Analysis and Processing of Audio
Alexey Lukin
Moscow State University, Russia; iZotope Inc., Cambridge, MA
Talk at B.U., Sept. 2010
A. Lukin, J. Todd “Adaptive Time-Frequency Resolution”
Short-Time Fourier Transform
Most commonly used transform for audio:
► Spectral analysis
► Noise reduction (spectral subtraction algorithm)
► Time-variable filters and other effects
+ Very fast implementation for a large number of bands via FFT
+ Good energy compaction for many musical signals
– Many oscillations in basis functions → ringing (Gibbs phenomenon)
– Uniform frequency resolution → inadequate resolution at low frequencies

$$\mathrm{STFT}[n, \omega] = \sum_{m} w[m] \, x[n+m] \, e^{-j\omega m}$$
Short-Time Fourier Transform
Spectrogram: displays evolution of spectrum in time

$$\mathrm{STFT}[n, \omega] = \sum_{m} w[m] \, x[n+m] \, e^{-j\omega m}$$
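A minimal sketch of how a spectrogram could be computed directly from the STFT sum, for illustration only. The Hann window, the hop size and all function names are assumptions that do not come from the slides, and the naive DFT loop stands in for the FFT that a practical implementation would use.

```python
import cmath
import math

def hann(length):
    """Hann window (an assumed choice; the slides do not name a window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / length) for m in range(length)]

def stft_frame(x, n, window, num_bins):
    """One STFT column: sum over m of w[m] * x[n+m] * exp(-j*omega_k*m),
    with omega_k = 2*pi*k/num_bins. Samples outside x count as zero."""
    column = []
    for k in range(num_bins // 2 + 1):
        omega = 2 * math.pi * k / num_bins
        acc = 0j
        for m, w in enumerate(window):
            if 0 <= n + m < len(x):
                acc += w * x[n + m] * cmath.exp(-1j * omega * m)
        column.append(acc)
    return column

def spectrogram(x, window_len, hop, num_bins):
    """Magnitude spectrogram: one list of |STFT| values per analysis frame."""
    window = hann(window_len)
    return [[abs(c) for c in stft_frame(x, n, window, num_bins)]
            for n in range(0, len(x), hop)]
```

Each inner list is one column of the spectrogram picture: frequency runs along the list and time runs across the frames.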
Spectrograms
Problems:
► Most perceptually meaningful energy is concentrated in a narrow band below 4 kHz → can’t see enough details
► Time/frequency resolution trade-off
Figure: conventional STFT spectrogram (linear frequency scale)
Spectrograms
Problems:
► Poor frequency resolution at low frequencies → can’t separate bass harmonics from the bass drum
► Time/frequency resolution trade-off
Figure: Mel-scale STFT spectrogram (window size = 12 ms)
Spectrograms
Problems:
► Poor time resolution at transients → time-smearing of drums and other percussive sounds
Figure: Mel-scale STFT spectrogram (window size = 93 ms)
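The trade-off behind the 12 ms and 93 ms windows can be put in numbers. The sketch below assumes a 44.1 kHz sample rate, which the slides do not state; under that assumption the two window sizes would correspond to 512 and 4096 samples. The function name is hypothetical.

```python
SAMPLE_RATE = 44100  # Hz; an assumption, the slides do not give the sample rate

def window_tradeoff(window_samples, sample_rate=SAMPLE_RATE):
    """Return (window duration in ms, spacing between DFT bins in Hz)."""
    duration_ms = 1000.0 * window_samples / sample_rate
    bin_spacing_hz = sample_rate / window_samples
    return duration_ms, bin_spacing_hz

for n in (512, 4096):
    ms, hz = window_tradeoff(n)
    print(f"{n} samples: {ms:.1f} ms window, {hz:.1f} Hz between bins")
# 512 samples: 11.6 ms window, 86.1 Hz between bins
# 4096 samples: 92.9 ms window, 10.8 Hz between bins
```

The short window localizes events to about 12 ms but cannot separate components much closer than 86 Hz, which is coarse for bass notes. The long window separates them at about 11 Hz but smears every event over roughly 93 ms.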
Filter banks
Idea: x[n] → Decomposition → Processing of subband signals → Synthesis → y[n]
Decompositions of a time-frequency plane
Figure: tilings of the time-frequency (t, f) plane for STFT and DWT; label: uncertainty principle
Filter banks
Perceptual coding of audio
Figure: diagram of an mp3 encoder: x[n] → Filter bank → Q → Huffman → mp3 file, with a side branch x[n] → FFT → Psychoacoustic model
Filter banks
Window size switching (guided by transient detection)
Figure: waveforms around a transient, showing pre-echo and the reduced pre-echo obtained with window size switching
Proposed approach
Transforms should vary their time-frequency resolution in a perceptually motivated way:
► Imitation of time-frequency resolution of human hearing
► Adaptation of resolution to local signal features
Spectrograms
Simple solution:
► Combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution
Figure: combined resolution spectrogram (window sizes from 12 to 93 ms)
Spectrograms
Simple solution: combine spectrograms with different resolutions
Each spectrogram is computed on the same grid of time-frequency points (using zero padding)
Figure: x[t] → Filter bank 1 → a_{f,t,1} and x[t] → Filter bank 2 → a_{f,t,2}; a mixer of coefficients, controlled by analysis of x[t], combines them into a_{f,t}
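A minimal sketch of the combination step, assuming only two window lengths and a hard crossover between them. The crossover bin, the centered Hann windows, the normalization by the window sum and all function names are illustrative assumptions; the slides specify none of them.

```python
import cmath
import math

def stft_mag(x, center, window_len, num_bins):
    """Magnitudes of one STFT frame whose Hann window is centered on `center`.
    The frame is implicitly zero-padded to `num_bins` points, so every window
    length yields the same frequency grid; dividing by the window sum keeps
    levels comparable between window lengths."""
    start = center - window_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * m / window_len)
              for m in range(window_len)]
    norm = sum(window)
    mags = []
    for k in range(num_bins // 2 + 1):
        acc = 0j
        for m, w in enumerate(window):
            if 0 <= start + m < len(x):
                acc += w * x[start + m] * cmath.exp(-2j * math.pi * k * m / num_bins)
        mags.append(abs(acc) / norm)
    return mags

def combined_spectrogram(x, hop, long_len, short_len, crossover_bin):
    """Bass bins come from the long-window STFT, treble bins from the short one.
    Both use the same hop and the same number of bins (long_len)."""
    frames = []
    for center in range(0, len(x), hop):
        long_mags = stft_mag(x, center, long_len, long_len)
        short_mags = stft_mag(x, center, short_len, long_len)
        frames.append(long_mags[:crossover_bin] + short_mags[crossover_bin:])
    return frames
```

Centering both windows on the same sample keeps the two spectrograms aligned in time, and zero-padding the short window to the long window's length aligns them in frequency, so coefficients can be swapped point by point.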
Spectrograms
Better approach: select best resolution for each time-frequency neighborhood
Criteria?
► Better frequency resolution at bass (reflects a-priori psychoacoustical knowledge)
► Maximal energy compaction (to minimize spectral smearing in both time and frequency, i.e. maximize sparsity)
Figure: the same block at STFT window sizes of 6, 12, 24, 48 and 96 ms, with the best one marked
Spectrograms
Calculation of sparsity (in a given block, for all T/F resolutions r)
Figure: the same block at STFT window sizes of 6, 12, 24, 48 and 96 ms, with the best one marked

$$S_r = \frac{\sqrt{n}\,\mathrm{norm}_{L_2}(a_{i,r})}{\mathrm{norm}_{L_1}(a_{i,r})} = \frac{\sqrt{n \sum_{i=1}^{n} a_{i,r}^2}}{\sum_{i=1}^{n} a_{i,r}}$$

$$r_0 = \arg\max_r S_r$$

Here a_{i,r} (i = 1, …, n) are the STFT magnitudes in the block, S_r is the spectrum sparsity for the given resolution r, and r_0 is the resolution with the best sparsity.
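A minimal sketch of the selection rule, assuming the sparsity measure is the ratio of the L2 norm to the L1 norm of the block's magnitudes, scaled by √n. The function names, the dictionary layout and the handling of silent blocks are conventions of this sketch rather than something the slides define.

```python
import math

def sparsity(magnitudes):
    """S = sqrt(n) * L2 norm / L1 norm of the block's STFT magnitudes.
    Ranges from 1 (flat block) to sqrt(n) (all energy in one coefficient)."""
    n = len(magnitudes)
    l1 = sum(magnitudes)
    if l1 == 0:
        return 1.0  # silent block: treated as flat (a convention of this sketch)
    l2 = math.sqrt(sum(a * a for a in magnitudes))
    return math.sqrt(n) * l2 / l1

def best_resolution(blocks_by_resolution):
    """blocks_by_resolution maps a resolution label r to the magnitudes a_{i,r}
    of the same time-frequency block; returns r0 = argmax over r of S_r."""
    return max(blocks_by_resolution,
               key=lambda r: sparsity(blocks_by_resolution[r]))
```

For example, `best_resolution({"12 ms": [1, 1, 1, 1], "93 ms": [0, 4, 0, 0]})` picks `"93 ms"`, because that window concentrates the block's energy into a single coefficient.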
Spectrograms
Benefits:
► Sharper bass drum hits and other transients, even in mid-frequency range
► Sharper guitar harmonics at high frequencies
Figure: adaptive resolution spectrogram (window sizes from 12 to 93 ms)
Spectrograms
Simple solution:
► Combine spectrograms with different resolutions: take bass from a spectrogram with good frequency resolution, take treble from a spectrogram with good time resolution
Figure: combined resolution spectrogram (window sizes from 12 to 93 ms)
Spectrograms
More examples
Figure: tone onset waveform
Figure: conventional STFT spectrogram
Spectrograms
More examples
Figure: combined resolution spectrogram
Figure: adaptive resolution spectrogram
Processing framework
General framework for multi-resolution processing
► Perform processing with several different resolutions
► Adaptively combine (mix) results in a time-frequency space
► Mixing is controlled by a-priori knowledge of psychoacoustics and analysis of local signal features (e.g. transience or sparsity)
Figure: x[t] → Processing 1 → x1[t] → Filter bank and x[t] → Processing 2 → x2[t] → Filter bank; a mixer of coefficients, controlled by analysis of x[t], feeds an inverse filter bank that outputs y[t]
Noise reduction
Spectral subtraction algorithm
1. STFT of a noisy signal
2. Estimate power spectrum of noise (manually or automatically)
3. Subtract noise power spectrum from a signal power spectrum
4. Inverse STFT
Figure: x[t] → STFT → X[f,t] → subtraction of W[f] (from noise spectrum estimation) → S[f,t] → Inverse STFT → s[t]
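Steps 2 and 3 can be sketched per STFT frame as follows. Clamping negative power differences to zero and reusing the phase of the noisy coefficient are common choices assumed here, not details given on the slide; the function names are hypothetical, and the STFT and inverse STFT around this step are left out.

```python
import math

def estimate_noise_power(noise_frames):
    """Average |X[f,t]|^2 over frames assumed to contain only noise -> W[f]."""
    num_bins = len(noise_frames[0])
    return [sum(abs(frame[f]) ** 2 for frame in noise_frames) / len(noise_frames)
            for f in range(num_bins)]

def spectral_subtract(frame, noise_power):
    """Subtract W[f] from |X[f,t]|^2, clamp at zero, keep the noisy phase."""
    cleaned = []
    for value, w in zip(frame, noise_power):
        power = abs(value) ** 2
        if power == 0.0:
            cleaned.append(0j)
            continue
        # Real-valued gain, so the phase of `value` is left unchanged.
        gain = math.sqrt(max(power - w, 0.0) / power)
        cleaned.append(value * gain)
    return cleaned
```

`frame` is one column X[f,t] of complex STFT coefficients; applying `spectral_subtract` to every column gives S[f,t], which then goes through the inverse STFT.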
Noise reduction
Example of adaptive resolution
► Better frequency resolution at low frequencies (according to the resolution of human hearing)
► Better temporal resolution near signal transients (for reduction of Gibbs phenomenon)
Figure: y[t] → Spectral subtraction (short windows) → x1[t] → STFT and y[t] → Spectral subtraction (long windows) → x2[t] → STFT; a mixer of coefficients, controlled by transience analysis, feeds synthesis, which outputs x3[t]
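One way the mixer of coefficients could be driven is a crossfade between the short-window and long-window results, weighted by a transience estimate. The energy-ratio detector below, its threshold and both function names are hypothetical stand-ins; the slides do not say how transience is measured.

```python
def transience_weight(prev_energy, cur_energy, threshold=4.0):
    """Hypothetical transient detector: 0.0 for steady energy, rising linearly
    to 1.0 when frame energy jumps by `threshold` times or more."""
    if prev_energy <= 0.0:
        return 1.0 if cur_energy > 0.0 else 0.0
    ratio = cur_energy / prev_energy
    return min(max((ratio - 1.0) / (threshold - 1.0), 0.0), 1.0)

def mix_coefficients(short_frame, long_frame, weight):
    """Crossfade two aligned frames of coefficients: weight = 1.0 takes the
    short-window result, weight = 0.0 takes the long-window result."""
    return [weight * s + (1.0 - weight) * l
            for s, l in zip(short_frame, long_frame)]
```

Near a transient the weight approaches 1, so the output follows the short-window branch and pre-ringing is reduced. A fuller version would also force low-frequency bins toward the long-window branch, in line with the first bullet above.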
Noise reduction
Results of single-resolution and multi-resolution algorithms
Figure: noisy recording (guitar + castanets)
Noise reduction
Results of single-resolution and multi-resolution algorithms
Figure: single resolution
Figure: multi-resolution (notice less pre-ringing on transients)
Conclusion
When using STFT, do care about the window size!
Choose the size wisely:
► Maximize sparsity (spectrogram sharpness)
► Account for human perception
Your questions?
Demo web page: http://www.izotope.com/tech/aes_adapt/