ICME 2004 Tutorial: Audio Feature...
Transcript of ICME 2004 Tutorial: Audio Feature...
![Page 1: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/1.jpg)
1
ICME 2004 Tutorial: Audio Feature Extraction
George TzanetakisAssistant ProfessorComputer Science DepartmentUniversity of Victoria, Canada
[email protected]://www.cs.uvic.ca/~gtzan
![Page 2: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/2.jpg)
2
Bits of the history of bits
01011110101010 Hello world
Understanding multimedia content ->
Web Multimedia
![Page 3: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/3.jpg)
3
Tutorial Goals
� Overview of state of the art
� Fundamentals
� Technical Background
� Some math, computer science, music
� Shift emphasis from audio coding/compression to audio analysis
� There is more to audio analysis than MFCCs
![Page 4: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/4.jpg)
4
Some simple but important observations
� Analysis/Understanding require multiple representations – no “best” one
� Coding/Compression/Processing typically search for the “best” “optimal” way to do things
� Paradigm shift is necessary to make multimedia more than just lots of numbers
� !!! MACHINE LEARNING
![Page 5: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/5.jpg)
5
My background – MIR
� Database of all recorded music
� Tasks: organize, search, retrieve, classify recommend, browse, listen, annotate
� Examples:
![Page 6: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/6.jpg)
6
Feature extraction
FeatureSpace
Feature vector
![Page 7: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/7.jpg)
7
Outline
� Introduction 10 min
� Signal Processing 30 min
� Source-based 20 min
� Perception-based 30 min
� Music-specific 20 min
� Fingerprinting and watermarking 25 min
� Sound Separation and CASA 25 min
� Future work and challenges 20 min
![Page 8: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/8.jpg)
8
Signal Processing
� Sound and Sine Waves
� Short Time Fourier Transform
� Discrete Wavelet Transform
� Fundamental Frequency Detection
![Page 9: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/9.jpg)
9
Understanding Sound
� Longitudinal wave – pulsating expanding sphere
� 344 m (1128 feet) / second (at 20 Celcius)
� Reflections
� Sound production, propagation and perception are to a certain degree linear phenomena
![Page 10: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/10.jpg)
10
Time-domain waveform
Input
Time
?
Decompose to building blocksthat are created in a regular fashion
Frequency
Time
![Page 11: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/11.jpg)
11
SpectrumM
FM
F
t
t+1
Recipe for how to combine the building blocks to form the signal
Different view of the same information
![Page 12: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/12.jpg)
12
Linear Systemsand Sine Waves
in1
in2
in1 + in2
out1
out2
out1 + out2
Amplitude
Period = 1 / Frequency
0 180 360
Phase True sine waves last forever
sine wave -> LTI -> new sine wave
![Page 13: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/13.jpg)
13
Time-Frequency Analysis Fourier Transform
f t � 12 �
�
�
f � e
i t dt
f � � �� �
�
f t e i � t dt
e i
� � cos
� � i � sin
�
f x �
n � 0
�
a n cos n � x �
n � 0
�
b n sin n � x
Any periodic waveform can be approximated by a number of sinusoidal components that are harmonics (integer multiples) of a fundamental frequency
![Page 14: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/14.jpg)
14
Non-periodic signals
P=1/f=2
� � !
We force them to be periodic by repeating them to infinity
![Page 15: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/15.jpg)
15
Short TimeFourier Transform
Time-varying spectraFast Fourier Transform FFT
Input
Time
t
t+1
t+2
Filters Oscillators
Output
Amplitude
Frequency
Vocoder
![Page 16: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/16.jpg)
16
Short Time Fourier Transform II
FT = global representation of frequency content
"
t
#%$ & 1 '
4
Heisenberg uncertainty
Time – Frequency
L2
Sf u , () *+ ,
,f t g t - u e
. i / t dt
![Page 17: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/17.jpg)
17
STFT- Wavelets
0
t
1%2 3 1 4
4Heisenberg uncertaintyTime – Frequency
![Page 18: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/18.jpg)
18
A filterbank viewof STFT and DWT
BW = 1KHzCF = 500Hz
BW = 1KHzCF = 1500Hz
BW = 1KHzCF = 2500Hz
Impulseinput
Impulseinput
BW=200
BW=400
BW=800
STFT DWT
![Page 19: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/19.jpg)
19
The Discrete Wavelet Transform
Analysis
Synthesis
Octave filterbank
![Page 20: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/20.jpg)
20
Sinusoids + noise modeling
Oscillators
Amplitude
Frequency
FilterNoise
outputDeterministic Part
StochasticPart
![Page 21: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/21.jpg)
21
Spectral Shape FeaturesM
FM
F
t
t+1
Centroid = center of gravity of spectrumRolloff = energy distribution low/highFlux = short time change
Grey's timbrespace (1975)
![Page 22: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/22.jpg)
22
Spectral Flatness
¼ octave
¼ octave
¼ octave
Logarithmically-spaced overlapping frequency bands
ratio of geometric mean to arithmetic mean of spectral power (presence of tonal component in band)
Tonality coefficients = how much tone vs noise is that particular band
![Page 23: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/23.jpg)
23
Analysis and Texture Windows
Texture windows
Speech Orchestra
20 milliseconds40 analysis windows
Piano
Analysis windows
Running multidimensional Gaussian distribution(means, variances over texture window)
![Page 24: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/24.jpg)
24
Fundamental Frequency Detection
P
r x5
n 6 0
N 7 1x n x n 8 l , l 9 0,1,.. L : 1
Autocorrelation Peaks at multiple of the fundamental frequency
ZeroCrossings
Time-domainFrequency-domainPerceptual
Rhythm -> ~20 Hz Pitch(created by Roger Dannenberg)
![Page 25: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/25.jpg)
25
Demos I
; Phase vocoder
< Spectrograms – time frequency tradeoffs
= Wavelet decomposition
![Page 26: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/26.jpg)
26
Source-based approaches
> Linear Prediction
? CELP, GSM
@ Isolated tone musical instrument recognition
![Page 27: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/27.jpg)
27
Harmonic Partials
A Instruments and the voice are harmonic oscillators (solution to pdf)
B Partials (peaks in the spectrum)
C Harmonic Partials are integer multiples of the first partial or fundamental frequency
frequencyf0
3f0
Helmholz – timbre is basedon the relative weights of the harmonics
![Page 28: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/28.jpg)
28
Voice production
D Vocal Folds
E breath pressure from lungs causes the folds to oscillate
F oscillator driven by breath pressure acts as excitation to the vocal folds
G Vocal Tract
H tube(s) of varying cross-section exhibiting modes of vibration (resonances)
I resonances “shape” the excitation
![Page 29: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/29.jpg)
29
Formant Peaks - Resonances
Modes/resonances are the result of standing waves constructive interference – boosted regions of frequency
Vocal tract is essentially a tubethat is closed at the vocal fold and open at the lips
modes = odd-multiples of ¼ cycleof a sine wave (F1 = c/l/4) l=9in375 Hz, 1125 Hz, 2075 Hz
![Page 30: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/30.jpg)
30
Formants
From “Real Sound Synthesis for Interactive Applications”P. Cook, A.K Peters Press, used by permission
![Page 31: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/31.jpg)
31
Linear Prediction Coefficients
Lossless Tube SpeechImpulses @ f0
White Noise
Source Filter
s ' n
J K
i L 1p
a i s n M 1 H z N 1
1 O Pi Q 1
p
a i z
R iOriginal
Resynthesizedwith impulses/noise
![Page 32: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/32.jpg)
32
Variations
S Perceptual Linear Prediction
T RASTA – Relative Spectral Transform -Perceptual Linear Prediction
U Take advantage of HAS characteristics
V CELP (Code Excited Linear Prediction)
W better modeling of excitation
X GSM
![Page 33: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/33.jpg)
33
CELP
Y Problems with LPC
Z tube is not one tube but two (nose)
[ buzz is not buzz
\ everything goes into residue
] Codebook Excitation
^ table of typical residue signals
_ one fixed
` one adaptive
![Page 34: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/34.jpg)
34
Isolated ToneInstrument Classification
a Important step for music transcription
b Hierarchical classification
c Family: bowed, wind etc
d Instrument: violin, flute, piano etc
e Spectral
f Temporal
g temporal centroid, onset time
![Page 35: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/35.jpg)
35
MPEG-7Audio Descriptors
h Low-Level Audio Descriptors
i Waveform, Spectral
j Spectral Timbral (centroid, spread)
k Temporal Timbral (temporal cntrd, log-attack)
l High-Level Description Tools
m Sound recognition and indexing
n Spoken content
o Musical instrument timbre
p Melody description
![Page 36: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/36.jpg)
36
Principal Component Analysis
Projection matrix
PCAEigenanalysisof collection correlation matrix
![Page 37: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/37.jpg)
37
MPEG-7 Spectral BasisFunctions
SVD
collection of spectra
Basis Functions (eigenvectors)
Projection coefficients(eigenvalues)
typical: 70% of original 32-dimensional data is captured by 4 sets of basisfunctions and projection coefficients
Each spectrum can be expressed as a linear combination of the basis
![Page 38: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/38.jpg)
38
Perception-based approaches
q Pitch perception
r Loudness percetion
s Critical Bands
t Mel-Frequency Cepstral Coefficients
u Masking
v Perceptual Audio Compression (MPEG)
![Page 39: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/39.jpg)
39
The Human Ear
PinnaAuditory canalEar Drum Stapes-Malleus-Incus (gain control)Cochlea (freq. analysis)Auditory Nerve ?
Wave travels to cutoff slowing down increasing in amplitudepower is absorbed
Each frequency has a position of maximum displacement
![Page 40: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/40.jpg)
40
Masking
Two frequencies -> beats -> harsh -> seperate
Inner Hair Cell excitation
Frequency Masking Temporal Masking
Pairs of sine waves (one softer) – how much weaker in order to be masked ?(masking curves) wave of high frequency can not mask a wave of lower frequency
![Page 41: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/41.jpg)
41
Masking Demo
From “Music, Cognition and Computerized Sound”P. Cook (Editor) MIT Press
High sine waves mask low: 500 Hz tone at 0dB with lowertones at -40dB, 300, 320, 340, 360, 380, 400, 420, 440, 460, 480 Hz
Low sine waves mask hih: 500Hz tone at 0dB with higher tones at -40dB, 1700, 1580, 1460, 1340, 1220, 1100, 980, 860, 740, 620 Hz
![Page 42: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/42.jpg)
42
Critical Bands
w Critical bandwidth = two sinusoidal signals interact or mask one another
x Bark scale (24 critical bands)
y [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700]
z samplings of a continuous variation in the frequency response of the ear to a sinusoid or narrow band process
{ there is no discrete filterbank in the ear
![Page 43: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/43.jpg)
43
Fletcher-Munson Curves
Loudness is a perceptual (not physical)quantity i.e two sound with same SPL differentfrequencies are perceivedto have different loudness
(used in PLP)
for a soft sound at 50Hzto sound as loud as one at 2000 Hz 50dB more intense(100,000 times more power)
![Page 44: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/44.jpg)
44
Pitch Perception I
| Pitch is not just fundamental frequency
} Periodicity or harmonicity or both ?
~ Human judgements (adjust sine method)
� 1924 Fletcher – harmonic partials missing fundamental (pitch is still heard)
� Examples: phone, small radio
� Terhardt (1972), Licklider (1959)
� duplex theory of pitch (virtual & spectral pitch)
![Page 45: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/45.jpg)
45
Pitch Perception II
� One perception – two overlapping mechanisms
� Counting cycles of period < 800Hz
� Place of excitation along basilar membrane > 1600 Hz
FFT Pick sinusoids weighting / maskingcandidate generation
common divisorsmost likely pitch
![Page 46: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/46.jpg)
46
Mel Frequency Cepstral Coefficients
Mel-scale13 linearly-spaced filters 27 log-spaced filters
CFCF-130CF / 1.0718
CF+130CF * 1.0718
Mel-filtering
Log
DCT
MFCCs
![Page 47: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/47.jpg)
47
Cepstrum
Measure of periodicity of frequency response plot
S(ej
�
) = H(ej
�
) E (e j
�
) H is linear filter, E is excitation
log(|S(ej
�
)|) = log (|H(ej
�
)|) + log(|E (e j
�
)|)
(homomorphic transformation – the convolution of two signals becomesequivalent to the sum of their cepstra)
Aims to deconvolve the signal (low order coefficients filter shape – highorder coefficients excitation with possible F0) Cepstral coefficients can also be derived from LPC analysis
![Page 48: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/48.jpg)
48
Discrete Cosine Transform
� Strong energy compaction
� For certain types of signals approximates KL transform (optimal)
� Low coefficients represent most of the signal
� Can throw high coefficients
� MFCCs keep first 13-20
� MDCT (overlap-based) used in MP3, AAC, Vorbis audio compression
![Page 49: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/49.jpg)
49
Short MPEG Audio Coding Overview (mp3)
Signal AnalysisFilterbank
Psychoacoustics Model
32-linearlyspaced bands
available bits
MPEG Perceptual Audio Coding
Encoder: slow, complicatedDecoder: fast, simple
![Page 50: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/50.jpg)
50
Psychoacoustic Model
� Each band is quantized
� Quantization introduces noise
� Adapt the quantization so that it is inaudible
� Take advantage of masking
� Hide quantization noise where it is masked
� MPEG standarizes how the quantized bits are transmitted not the psychoacoustic model – (only recommended)
![Page 51: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/51.jpg)
51
MP3 Feature Extraction
� Feature extraction while decoding MPEG audio compressed data (mp3 files)
� Free analysis for encoding
� Space and time savings
EncoderAnalysis for compression
Analysis for featureextraction
Decoder
Feature extractionFeature extraction
Pye ICASSP 00 Tzanetakis & Cook ICASSP 00
![Page 52: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/52.jpg)
52
Music-specificAudio Features
� Beat extraction and rhythm representation
� Multi-pitch analysis and transcription
� Chroma
� MPEG-4 Structured Audio
� Similarity Retrieval
Genre Classification
¡ Score following
![Page 53: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/53.jpg)
53
Importance of Music
¢ 4 m CD tracks
£ 4000 CDs / month
¤ 60-80% ISP bandwidth
¥ Napster- 1.57m sim.users (00)
¦ 61.3m downloaded music (01)
§ Kazaa – 230 m downloads (03)
¨ Global, Pervasive, Complex
![Page 54: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/54.jpg)
54
Traditional Music Representations
![Page 55: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/55.jpg)
55
Rhythm
© Rhythm = movement in time
ª Origins in poetry (iamb, trochaic...)
« Foot tapping definition
¬ Hierarchical semi-periodic structure at multiple levels of detail
Links to motion, other sounds
® Running vs global
![Page 56: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/56.jpg)
56
Event-based Alghoniemy, Tewfik WMSP99Dixon ICMC02Goto, Muraoka IJCA97 Gouyon et al DAFX 00Laroche WASPAA 01Seppanen WASPAA 01
Event onset detection IOI computation Beat model
Single bandMultiple band ThresholdingChord changes
Successive onset pairsAll onset pairsQuantization
Constant-tempoMeasure changesComputational costMultiple hyphotheses
![Page 57: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/57.jpg)
57
Self-similarity Goto, Muraoka CASA98Foote, Uchihashi ICME01Scheirer JASA98Tzanetakis et al AMTA01
InputSignal
Full Wave Rectification - Low Pass Filtering - Normalization
+
Peak Picking
Beat Histogram
Envelope Extraction
DWT
Autocorrelation
![Page 58: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/58.jpg)
58
Beat HistogramsTzanetakis et al AMTA01
max(h(i)), argmax(h(i))
![Page 59: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/59.jpg)
59
Beat SpectrumFoote, Uchihashi 01
![Page 60: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/60.jpg)
60
Rhythmic content features
¯ Main tempo
° Secondary tempo
± Time signature
² Beat strength
³ Regularity
![Page 61: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/61.jpg)
61
Multiple Pitch Detection
> 1kHz
< 1kHz
Half-wave RectifierLowPass
PeriodicityDetection
PeriodicityDetection
SACF Enhancer
Pitch Candidates
C GC G
(7 * c ) mod 12
Circle of 5s
Tzanetakis et al, ISMIR 01Tolonen and Karjalainen, TSAP 00
![Page 62: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/62.jpg)
62
Chroma – Pitch perception
![Page 63: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/63.jpg)
63
MIDI
´ Musical Instrument Digital Interfaces
µ Hardware interface
¶ File Format
· Note events
¸ Duration, discrete pitch, "instrument"
¹ Extensions
º General MIDI
» Notation, OMR, continuous pitch
![Page 64: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/64.jpg)
64
Structured Audio
MPEG-4 SA Eric Scheirer
Instead of samples store sound as a computerprogram that generates audio samples
0.25 tone 4.04.50 end
SASL
instr tone (){ asig x, y, init; if (init = 0) { init=1; x=0;}x=x - 0.196307* y;y=y + 0.196307* x;output(y);}
SAOL
![Page 65: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/65.jpg)
65
Query-by-Example Content-based Retrieval
Rank List Collection of 3000 clips
![Page 66: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/66.jpg)
66
Automatic Musical Genre Classification
¼ Categorical music descriptions created by humans
½ Fuzzy boundaries
¾ Statistical properties
¿ Timbral texture, rhythmic structure, harmonic content
À Automatic Musical Genre Classification
Á Evaluate musical content features
 Structure audio collections
![Page 67: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/67.jpg)
67
GenreGram DEMO
Dynamic real time 3D display for classification of radio signals
![Page 68: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/68.jpg)
68
Structural Analysis
à Similarity matrix
Ä Representations
Å Notes
Æ Chords
Ç Chroma
È Greedy hill-climbing algorithm
É Recognize repeated patterns
Ê Result = AABA (explanation)
Dannenberg & Hu, ISMIR 2002Tzanetakis, Dannenberg & Hu, WIAMIS 03
![Page 69: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/69.jpg)
69
An example – Naima(demo ?)
![Page 70: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/70.jpg)
70
Music Representations
Ë Symbolic Representation– easy to manipulate– “flat” performance
Ì Audio Representation– expressive performance– opaque & unstructured
Align
POLYPHONIC AUDIO AND MIDIALIGNMENT
![Page 71: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/71.jpg)
71
Similarity Matrix
Similarity Matrix for Beethoven’s 5th Symphony, first movement
Optimal Alignment
Path
Oboe solo:•Acoustic Recording•Audio from MIDI
(Duration: 6:17)
(Dur
atio
n: 7
:49)
POLYPHONIC AUDIO AND MIDIALIGNMENT
![Page 72: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/72.jpg)
72
Audio Fingerprinting and Watermarking
Í Watermarking
Î Copyright protection
Ï Proof of ownership
Ð Usage policies
Ñ Metadata hiding
Ò Fingerprinting
Ó Tracking
Ô Copyright protection
Õ Metadata linking
![Page 73: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/73.jpg)
73
Watermarking
Ö Steganography (hiding information in messages – invisible ink )
WatermarkEmbedder
watermark data
key
signalrepres.
transmission attacks
WatermarkExtractor
key
watermark data
(original music)
![Page 74: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/74.jpg)
74
Desired Properties
× Perceptually hidden (inaudible)
Ø Statistically invisible
Ù Robust against signal processing
Ú Tamper resistant
Û Spread in the music, not in header
Ü key dependent
![Page 75: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/75.jpg)
75
Representationsfor Watermarking
Ý Basic Principles
Þ Psychoacoustics
ß Spread Spectrum
à redundant spread of information in TF plane
á Representations
â Linear PCM
ã Compressed bitstreams
ä Phase, stereo
å Parametric representations
![Page 76: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/76.jpg)
76
Watermarkingon parametric representations
Yi-Wen LiuJ. Smith 2004
message
parameters p
QIMp' Audio
SynthesisAttack
ParameterEstimation
Min distancedecoding
W
W' p''Choose attack tolerance quantize so perceptual distortion < t and lattice finding possible
![Page 77: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/77.jpg)
77
Problems with watermarking
æ The security of the entire system depends on devices available to attackers
ç Breaks Kerckhoff's Criterion: A security system must work even if reverse-engineered
è Mismatch attacks
é Time stretch audio – stretch it back (invertible)
ê Oracle attacks
ë Poll watermark detector
![Page 78: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/78.jpg)
78
Audio Fingerprinting
ì Each song is represent as a fingerprint (small robust representation)
í Search database based on fingerprint
î Main challenges
ï highly robust fingerprint extraction
ð efficient fingerprint search strategy
ñ Information is summarized from the whole song – attacks degrade unlike watermarking
![Page 79: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/79.jpg)
79
Hash functions
ò H(X) -> maps large X to small hash value
ó compare by comparing hash value
ô Perceptual hash function ?
õ impossible to get exact matching
ö Perceptually similar objects result in similar fingerprints
÷ Detection/false alarm tradeoff
![Page 80: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/80.jpg)
80
Properties
ø Robustness
ù Reliability
ú Fingerprint size
û Granularity
ü Search speed and scalability
![Page 81: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/81.jpg)
81
Fraunhofer
ý LLD Mpeg-7 framework (SFM)
þ Vector quantization (k-means)
ÿ Codebook of representative vectors
� Database target signature is the codebook
� Query -> sequence of feature vectors
� Matching by finding “best” codebook
� Robust not very scalable (O(n) search))
Allamanche Ismir2001
![Page 82: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/82.jpg)
82
Philips Research
� 32-bit subfingerprints for every 11.6 msec
� overlapping frames of 0.37 seconds (31/32 overlap)
� PSD -> logarithmic band spacing (bark)
� bits 0-1 sign of energy
� looks like a fingerprint
assume one fingerprint perfect – hierarchical database layout (works ok)
Haitsa & Kalker Ismir 2002
![Page 83: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/83.jpg)
83
Shazam Entertainment
Pick landmarks on audio – calculate fingerprint
� histogram of relative time differences for filtering
� Spectrogram peaks (time, frequency)
![Page 84: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/84.jpg)
84
Spectrogram Peaks
Very robust – even over noisy cell phones
![Page 85: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/85.jpg)
85
Audio Fingerprinting
Music piece
SignatureDatabase
Signature 1.5 millionMatching
Robustness
300 bytes
1sec
20msec
Copyright, metadata
moodlogic.net
![Page 86: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/86.jpg)
86
Auditory Scene Analysis
Music and Sound Cognition
� Onset detection
� Toward Transcription
![Page 87: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/87.jpg)
87
Auditory Scene Analysis
� Auditory stream
� perceptual grouping of parts of the neural spectrogram that go together
� Sound is a mixture and is transparent
� Primitive process of streaming
� Schemas for particular classes of sounds
� Grouping
� across time (sequential)
� across freq (simultaneous)
Bregman
![Page 88: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/88.jpg)
88
Onset detection
Naive: peaks in powerMultiband (wavelet, filterbanks)
Synchronicity Temporal Continuity Common Fate Proximity
![Page 89: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/89.jpg)
89
Polyphonic Transcription
Original Transcribed
Mixture signal Noise Suppression
Klapuri et al, DAFX 00
Predominant pitch estimation
Remove detected sound
Estimate # voicesiterate
![Page 90: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/90.jpg)
90
Summary
� Applications and especially analysis have different requirements -> different features
� wide variety of proposed audio features
� still many to be found .... hopefully by you :-)
![Page 91: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/91.jpg)
91
Future Challenges
� Main challenges
� escape HMM and MFCC
� tackle the general problem of auditory scene analysis
� “real learning”
� active audition - search for evidence rather than try to find
![Page 92: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/92.jpg)
92
Implementation
MARSYAS : free software framework for computer audition research
! marsyas.sourceforge.net
" Server in C++ (numerical signal processing and machine learning)
# Client in JAVA (GUI)
$ Linux, Solaris, Irix and Wintel (VS , Cygwin)
Tzanetakis & Cook Organized Sound 4(3) 00
![Page 93: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/93.jpg)
93
Marsyas users
Desert Island
Jared HoberockDan KellyBen Tietgen
Marc Cardle
Music-drivenmotion editing
Real timemusic-speechdiscrimination
![Page 94: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/94.jpg)
94
Auditory Scene Analysis
Albert Bregman
![Page 95: ICME 2004 Tutorial: Audio Feature Extractionwebhome.cs.uvic.ca/.../work/talks/icme04/icme04tutorial.pdf · 2004. 10. 22. · ICME 2004 Tutorial: Audio Feature Extraction George Tzanetakis](https://reader036.fdocuments.us/reader036/viewer/2022071418/6115d9a63f9bcf50245e4de3/html5/thumbnails/95.jpg)
95
THE END
% Perry Cook, Robert Gjerdingen, Ken Steiglitz
& Malcolm Slaney, Julius Smith, Richard Duda
' Georg Essl, John Forsyth
( Andreye Ermolinskiy, Doug Turnbull, George Tourtellot, Corrie Elder
) ISMIR, WASPAA, ICMC, DAFX, ICASSP , ICME