Robust Temporal and Spectral Modeling for Query By Melody
Shai Shalev, Hebrew University
Yoram Singer, Hebrew University
Nir Friedman, Hebrew University
Shlomo Dubnov, Ben-Gurion University
Problem Setting
Database of real recordings
Query: a melody
Find: performances of the queried melody
Challenge
• Find performances of the queried melody independent of:
  – Tempo
  – Performing instrument
  – Dynamics
  – Expression
  – Accompaniment
Related Work
• A. Ghias, et al. “Query by humming”
• A. S. Durey and M. A. Clements. “Melody spotting using hidden Markov models”
• C. Raphael. “Automatic segmentation of acoustic musical signals using HMMs”
• B. Doval and X. Rodet. “Fundamental frequency estimation using a new harmonic matching method”
Overview of Solution
• Employ a statistical framework
• Align a melody to a performance using explicit tempo modeling
• Employ a maximum likelihood model for the spectrum of a note given the note’s pitch value
• Find the best alignment of a melody to a performance using dynamic programming
Statistical Framework
• Input: a database of real recordings $S_1, \ldots, S_L$ and a melody query $M = (d_1, p_1), \ldots, (d_k, p_k)$
• Query engine: for each recording $S_i$, find $P(S_i \mid M)$
• Output: a ranked list of $S_1, \ldots, S_L$ according to $P(S_i \mid M)$
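Melody Modeling
$P(S \mid M) = \sum_T P(S, T \mid M) = \sum_T P(T)\, P(S \mid A(T, M)) \approx \max_T P(T)\, P(S \mid A(T, M))$
Variables in the model (the figure's legend distinguishes hidden from observed variables):
• Melody: $(d_1, p_1), \ldots, (d_k, p_k)$
• Tempo: $(t_1, \ldots, t_k)$
• Aligned melody: $(d_1 t_1, p_1), \ldots, (d_k t_k, p_k)$
• Sound: $s_1, \ldots, s_n$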
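The aligned melody above pairs each note with its own tempo factor. A minimal sketch of that step follows, assuming each melody entry is a (duration, pitch) pair and that alignment simply multiplies the duration by the note's tempo factor; the function name `align_melody` and the sample values are hypothetical, not taken from the slides.

```python
def align_melody(melody, tempo):
    """Scale each note's duration by its tempo factor: (d_i, p_i) -> (d_i * t_i, p_i)."""
    if len(melody) != len(tempo):
        raise ValueError("need exactly one tempo factor per note")
    return [(d * t, p) for (d, p), t in zip(melody, tempo)]


# Hypothetical three-note melody: (duration in seconds, MIDI pitch number).
melody = [(1.0, 60), (0.5, 62), (0.5, 64)]
tempo = [1.0, 1.2, 0.9]          # one scaling factor per note
aligned = align_melody(melody, tempo)
```

Pitches pass through unchanged; only the durations are stretched or compressed.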
Tempo Modeling
• Sequence of scaling factors (one per note)
• Model tempo as a first-order Markov model:
  $P(T_1, \ldots, T_k) = P(T_1) \prod_{i=2}^{k} P(T_i \mid T_{i-1})$
• Use a log-normal distribution to model the conditional probability of tempo:
  $\log(T_i) \mid T_{i-1} \sim \mathcal{N}(\log(T_{i-1}), \rho)$
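The first-order Markov tempo model with log-normal transitions can be scored in a few lines. The sketch below is an illustration under stated assumptions: it treats ρ as the variance of the Gaussian over log-tempo, evaluates the density over log T_i rather than T_i, and uses a flat (zero log-probability) prior on the first tempo unless one is supplied. The function names are hypothetical.

```python
import math


def gaussian_log_density(x, mean, var):
    """Log density of a Gaussian with the given mean and variance, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def tempo_log_prob(tempo, rho, log_prior=lambda t: 0.0):
    """log P(T_1..T_k) = log P(T_1) + sum over i >= 2 of log P(T_i | T_{i-1})."""
    total = log_prior(tempo[0])          # assumption: flat prior by default
    for prev, cur in zip(tempo, tempo[1:]):
        # log T_i is Gaussian around log T_{i-1}; rho is assumed to be the variance
        total += gaussian_log_density(math.log(cur), math.log(prev), rho)
    return total


steady = tempo_log_prob([1.0, 1.0, 1.0], rho=0.1)
jumpy = tempo_log_prob([1.0, 2.0, 0.5], rho=0.1)
```

Under this model a performance that keeps a steady tempo scores higher than one that jumps between tempi, which is the smoothness preference the Markov assumption encodes.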
Spectral Modeling
$F(\omega) = S(\omega_0) + N(\omega_0)$
[Figure: spectrum $F(\omega)$ against frequency (Hz), with its signal and noise components labeled and $\omega_0$ marked on the frequency axis]
Spectral Modeling (cont.)
[Figure: spectrum $F(\omega)$ against frequency (Hz), with its signal and noise components labeled and $\omega_0$ marked on the frequency axis]
Harmonic model: $F(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$
Spectral Modeling (cont.)
• Estimate the amplitude at each harmonic and the global variance of the noise using the maximum likelihood principle
• Resulting signal-to-noise likelihood function:
  $\log P(F \mid \omega_0) \propto \log \frac{\|S(\omega_0)\|^2}{\|N(\omega_0)\|^2}$
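To make the signal-to-noise likelihood concrete, here is a minimal sketch of how such a score could be computed for one candidate pitch. It rests on assumptions that are not in the slides: the spectrum is a list of magnitudes on a uniform frequency grid, each harmonic is represented by the single nearest bin, the amplitude estimate at a harmonic is the observed magnitude in that bin, and everything else counts as noise. The name `harmonic_snr_score` and the toy spectrum are hypothetical.

```python
import math


def harmonic_snr_score(spectrum, bin_hz, f0, num_harmonics):
    """Return log(||S||^2 / ||N||^2) for a magnitude spectrum and a candidate pitch f0."""
    signal = [0.0] * len(spectrum)
    for h in range(1, num_harmonics + 1):
        idx = round(h * f0 / bin_hz)         # nearest bin to the h-th harmonic
        if idx < len(spectrum):
            signal[idx] = spectrum[idx]      # assumed amplitude estimate at that harmonic
    noise = [f - s for f, s in zip(spectrum, signal)]
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(n * n for n in noise)
    eps = 1e-12                              # guards against log(0) and division by zero
    return math.log((signal_energy + eps) / (noise_energy + eps))


# Toy spectrum: 40 bins of 100 Hz, a low noise floor, peaks at 200, 400 and 600 Hz.
spectrum = [0.05] * 40
for peak_bin in (2, 4, 6):
    spectrum[peak_bin] = 1.0

score_200 = harmonic_snr_score(spectrum, bin_hz=100.0, f0=200.0, num_harmonics=3)
score_300 = harmonic_snr_score(spectrum, bin_hz=100.0, f0=300.0, num_harmonics=3)
```

The candidate whose harmonics line up with the spectral peaks (200 Hz here) leaves little energy in the residual and so receives the higher score.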
Finding the best melody-performance alignment
• Dynamic programming procedure: recurse over the tempo and end time of the previous note
• Complexity: $O(k\,T\,M^2)$, where $k$ is the number of notes, $T$ the length of the signal, and $M$ the number of possible tempo values
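The recursion over the tempo and end time of the previous note can be sketched as a small dynamic program. This is an illustrative simplification, not the authors' implementation: time is discretized into frames, tempo takes values from a finite list, and the per-note acoustic score and the tempo transition score are supplied as callables. All names (`best_alignment_score`, `note_score`, `log_trans`) and the toy data are hypothetical.

```python
import math


def best_alignment_score(durations, tempos, n_frames, note_score, log_trans):
    """Best log score over all alignments of the notes to a signal of n_frames frames."""
    k, m = len(durations), len(tempos)
    neg_inf = -math.inf
    # best[i][t][a]: best score with note i ending at frame t under tempo index a
    best = [[[neg_inf] * m for _ in range(n_frames + 1)] for _ in range(k)]
    for i in range(k):
        for t in range(n_frames + 1):
            for a in range(m):
                length = max(1, round(durations[i] * tempos[a]))
                start = t - length            # end time of the previous note
                if start < 0:
                    continue
                local = note_score(i, start, t)
                if i == 0:
                    best[i][t][a] = local     # the first note may start anywhere
                else:
                    prev = max(best[i - 1][start][b] + log_trans(tempos[b], tempos[a])
                               for b in range(m))
                    best[i][t][a] = prev + local
    return max(max(row) for row in best[k - 1])


# Toy data: two notes of 2 frames each, and a signal in which each note lasts 3 frames.
pitches = [60, 62]
frame_pitch = [0, 60, 60, 60, 62, 62, 62, 0]


def note_score(i, start, end):
    # +1 for every frame matching the note's pitch, -1 otherwise (toy stand-in score)
    return sum(1 if frame_pitch[j] == pitches[i] else -1 for j in range(start, end))


score = best_alignment_score([2, 2], [1.0, 1.5], len(frame_pitch),
                             note_score, lambda prev, cur: 0.0)
```

The three nested loops visit every (note, end frame, tempo) triple, and the inner maximization scans all previous tempo values, so the number of recurrence evaluations grows with the number of notes, the signal length, and the square of the number of tempo values. In the toy run the best alignment stretches both notes by the factor 1.5 so that each covers its three matching frames.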
Experimental Results
• Queries: 50 melodies from opera arias (from MIDI files)
• Database: over 800 performances of opera arias performed by over 50 tenors with full orchestral accompaniment
• Compared our variable-tempo (VT) model with fixed-tempo (FT) and locally-fixed-tempo (LFT) models
• Compared our Harmonic with Scaled Noise (HSN) spectral model with the Harmonic with Independent Noise (HIN) model
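Evaluation Measures
[Figure: likelihood value against index of performance in the ranked list (1 to 5), with each performance marked + or -]
Values for the example in the figure:
• Oerr = 0
• Cov = 3 - 2
• AvgP = $\frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right)$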
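The three measures can be reproduced on a five-item ranked list. The sketch below assumes the usual definitions, which the slide does not spell out: one error is 1 when the top-ranked item is not a performance of the query, coverage is the rank of the last correct performance minus the number of correct performances, and average precision averages the precision at each correct performance. It also assumes the correct performances in the slide's example sit at ranks 1 and 3, which is what Cov = 3 - 2 suggests.

```python
def one_error(relevant):
    """0 if the top-ranked item is relevant, 1 otherwise."""
    return 0 if relevant[0] else 1


def coverage(relevant):
    """Rank of the last relevant item minus the number of relevant items."""
    ranks = [r for r, rel in enumerate(relevant, start=1) if rel]
    return ranks[-1] - len(ranks)


def average_precision(relevant):
    """Mean of the precision values measured at each relevant item's rank."""
    ranks = [r for r, rel in enumerate(relevant, start=1) if rel]
    return sum((j + 1) / r for j, r in enumerate(ranks)) / len(ranks)


# Assumed example: relevant performances at ranks 1 and 3 of a 5-item list.
ranked = [True, False, True, False, False]
oerr = one_error(ranked)            # 0
cov = coverage(ranked)              # 3 - 2 = 1
avgp = average_precision(ranked)    # (1/1 + 2/3) / 2
```

Reported figures for a whole experiment would then be these per-query values averaged over all queries.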
Summary of Results
• One Error of VT+HSN: 8%
• Average Precision of VT+HSN: 95%
• Coverage of VT+HSN: 0.21
Results
                     Spectral Distribution Model
                     HSN                     HIN
Sec.  Tempo model    AvgP   Cov    Oerr      AvgP   Cov    Oerr
5     FT             0.38   22.96  0.69      0.35   21.67  0.75
5     LFT            0.43   17.33  0.69      0.37   17.94  0.75
5     VT             0.51   10.67  0.65      0.46   11.83  0.69
15    FT             0.38   19.83  0.71      0.36   19.08  0.73
15    LFT            0.66   8.10   0.44      0.66   8.15   0.42
15    VT             0.86   1.75   0.19      0.83   3.02   0.19
25    FT             0.34   20.69  0.77      0.33   22.46  0.79
25    LFT            0.66   5.90   0.46      0.63   5.98   0.48
25    VT             0.95   0.21   0.08      0.92   0.40   0.10