1 Robust Temporal and Spectral Modeling for Query By Melody

Shai Shalev, Hebrew University
Yoram Singer, Hebrew University
Nir Friedman, Hebrew University
Shlomo Dubnov, Ben-Gurion University


2 Prelude

3 Problem Setting

Database of real recordings

Query: a melody

Find: performances of the queried melody

4 Challenge

• Find performances of the queried melody independent of:
  – Tempo
  – Performing instrument
  – Dynamics
  – Expression
  – Accompaniment

5 Related Work

• A. Ghias, et al. “Query by humming”

• A. S. Durey and M. A. Clements. “Melody spotting using hidden Markov models”

• C. Raphael. “Automatic segmentation of acoustic musical signals using HMMs”

• B. Doval and X. Rodet. “Fundamental frequency estimation using a new harmonic matching method”

6 Overview of Solution

• Employ a statistical framework

• Align a melody to a performance using explicit tempo modeling

• Employ a maximum likelihood model for the spectrum of a note given the note’s pitch value

• Find the best alignment of a melody to a performance using dynamic programming

7 Statistical Framework

• Input: a database of real recordings $S_1, \ldots, S_L$

• Input: a melody query $M = (d_1, p_1), \ldots, (d_k, p_k)$

• Query engine: for each recording $S_i$, find $P(S_i \mid M)$

• Output: a ranked list of $S_1, \ldots, S_L$ according to $P(S_i \mid M)$

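8 Melody Modeling

$$P(S \mid M) = \sum_{T} P(S, T \mid M) = \sum_{T} P(T)\, P(S \mid A(T, M))$$

$$P(S \mid M) \approx \max_{T} P(T)\, P(S \mid A(T, M))$$

[Figure: graphical model over Melody, Tempo, Aligned Melody, and Sound; the legend distinguishes hidden from observed variables]

Melody: $M = (d_1, p_1), \ldots, (d_k, p_k)$

Tempo: $T = (t_1, \ldots, t_k)$

Aligned Melody: $A(T, M) = (d_1 t_1, p_1), \ldots, (d_k t_k, p_k)$

Sound: $S = s_1, \ldots, s_n$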
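The aligned melody can be illustrated with a minimal sketch. It assumes that alignment multiplies each note's $d_i$ by its per-note scaling factor $t_i$ and leaves $p_i$ unchanged. The function names `align_melody` and `note_boundaries` are hypothetical and do not come from the slides.

```python
def align_melody(melody, tempo):
    """Return the aligned melody: each d_i scaled by its tempo factor t_i.

    melody: list of (d_i, p_i) pairs.
    tempo:  list of scaling factors t_i, one per note (assumed convention).
    """
    if len(melody) != len(tempo):
        raise ValueError("need exactly one tempo factor per note")
    return [(d * t, p) for (d, p), t in zip(melody, tempo)]


def note_boundaries(aligned, start=0.0):
    """Cumulative end times of the aligned notes, beginning at `start`."""
    ends = []
    current = start
    for d, _ in aligned:
        current += d
        ends.append(current)
    return ends


aligned = align_melody([(1.0, 60), (0.5, 62)], [2.0, 1.0])
boundaries = note_boundaries(aligned)
```

The boundaries are what a decoder would compare against the sound frames $s_1, \ldots, s_n$.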

9 Tempo Modeling

• Sequence of scaling factors (one per note)

• Model tempo as a first order Markov model

$$P(T_1, \ldots, T_k) = P(T_1) \prod_{i=2}^{k} P(T_i \mid T_{i-1})$$

• Use log-normal distribution to model conditional probability of tempo

$$\log(T_i) \mid T_{i-1} \sim \mathcal{N}(\log(T_{i-1}), \rho)$$
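The first-order Markov tempo model with log-normal transitions can be sketched as follows. Two points are assumptions rather than statements from the slides: $\rho$ is treated as a variance, and the prior on the first tempo value is taken to be flat (log-probability 0). The function names are hypothetical.

```python
import math


def log_normal_step(t_curr, t_prev, rho):
    """Log density of log(t_curr) under N(log(t_prev), rho).

    rho is treated as a variance here (an assumption).
    """
    diff = math.log(t_curr) - math.log(t_prev)
    return -0.5 * math.log(2.0 * math.pi * rho) - diff * diff / (2.0 * rho)


def tempo_log_prior(tempo, rho, log_p_first=0.0):
    """Log prior of a tempo sequence under the first-order Markov model.

    log_p_first stands in for log P(T_1), which the slides do not specify.
    """
    total = log_p_first
    for prev, curr in zip(tempo, tempo[1:]):
        total += log_normal_step(curr, prev, rho)
    return total


steady = tempo_log_prior([1.0, 1.0, 1.0], rho=0.1)
wobbly = tempo_log_prior([1.0, 1.5, 0.8], rho=0.1)
```

Under this model a steady tempo sequence scores higher than one with large jumps between consecutive notes, which is the behaviour the tempo prior is meant to encode.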

10 Spectral Modeling

[Figure: spectrum of a note with the 1st, 2nd, 3rd, and 4th harmonics marked]

$$S_{\omega_0}(\omega) = \sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$$

11 Spectral Modeling

$$F(\omega) = S_{\omega_0}(\omega) + N_{\omega_0}(\omega)$$

[Figure: spectrum $F(\omega)$ on a 0 to 1 vertical scale against frequency from 0 to 4500 Hz, with the signal and noise components labeled]

12 Spectral Modeling (cont.)

[Figure: spectrum $F(\omega)$ on a 0 to 1 vertical scale against frequency from 0 to 4500 Hz, with the signal and noise components labeled and the harmonic term $\sum_{h=1}^{H} A_h\, \delta(\omega - h\omega_0)$ annotated]

13 Spectral Modeling (cont.)

• Estimate the amplitude at each harmonic and the global variance of the noise using the maximum likelihood principle

• Resulting signal-to-noise likelihood function:

$$\log P(F \mid \omega_0) \propto \log \frac{\|S_{\omega_0}\|^2}{\|N_{\omega_0}\|^2}$$
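A rough sketch of a signal-to-noise score for one spectral frame follows. It is not the authors' estimator. It assumes a magnitude spectrum sampled on equally spaced bins, a candidate fundamental given as a bin index, and amplitude estimates taken directly from the observed values at the harmonic bins, with everything else counted as noise. The name `harmonic_snr_score` and the `eps` guard are hypothetical additions.

```python
import math


def harmonic_snr_score(spectrum, f0_bin, num_harmonics, eps=1e-12):
    """Log ratio of energy at harmonic bins to energy elsewhere.

    spectrum:      list of non-negative magnitudes, one per frequency bin.
    f0_bin:        candidate fundamental, as a positive bin index (assumed).
    num_harmonics: number of harmonics H to include.
    eps:           small constant that keeps the logarithm finite.
    """
    if f0_bin <= 0:
        raise ValueError("f0_bin must be a positive bin index")
    harmonic_bins = {
        h * f0_bin
        for h in range(1, num_harmonics + 1)
        if h * f0_bin < len(spectrum)
    }
    signal_energy = sum(spectrum[b] ** 2 for b in harmonic_bins)
    noise_energy = sum(
        v ** 2 for b, v in enumerate(spectrum) if b not in harmonic_bins
    )
    return math.log((signal_energy + eps) / (noise_energy + eps))


# Toy frame: harmonics of bin 10 over a low noise floor.
frame = [0.01] * 64
frame[10], frame[20], frame[30] = 1.0, 0.5, 0.25
score_right = harmonic_snr_score(frame, f0_bin=10, num_harmonics=3)
score_wrong = harmonic_snr_score(frame, f0_bin=7, num_harmonics=3)
```

In the toy frame the correct fundamental gets a positive score and the wrong one a negative score, so the score separates pitch hypotheses even though the noise floor is the same in both cases.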

14 Finding the best melody-performance alignment

• Dynamic programming procedure: recurse over tempo and end-time of the previous note

• Complexity: $O(k\,T\,M^2)$, where $k$ is the number of notes, $T$ is the length of the signal, and $M$ is the number of possible tempo values
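The recursion over tempo and end-time of the previous note can be sketched as a Viterbi-style table. This is an illustrative reading of the slide, not the authors' code. It assumes integer frame indices, a discrete set of candidate tempo factors, a flat prior on the first note's start and tempo, and two hypothetical callbacks: `note_score(i, start, end)` for the log-likelihood of frames `[start, end)` given note `i`, and `log_trans(prev_tempo, tempo)` for the tempo transition.

```python
NEG_INF = float("-inf")


def best_alignment_score(durations, tempos, num_frames, note_score, log_trans):
    """Best log-score of aligning all notes somewhere in the signal.

    durations: nominal note lengths in frames at tempo factor 1.
    tempos:    candidate tempo scaling factors.
    """
    if not durations or not tempos:
        raise ValueError("need at least one note and one tempo value")
    num_tempos = len(tempos)
    prev = None  # prev[t][m]: best score with the previous note ending at t
    for i, duration in enumerate(durations):
        curr = [[NEG_INF] * num_tempos for _ in range(num_frames + 1)]
        for end in range(1, num_frames + 1):
            for m, tempo in enumerate(tempos):
                length = max(1, round(duration * tempo))
                start = end - length
                if start < 0:
                    continue
                if i == 0:
                    best_prev = 0.0  # assumed flat prior on start and tempo
                else:
                    best_prev = max(
                        prev[start][mp] + log_trans(tempos[mp], tempo)
                        for mp in range(num_tempos)
                    )
                if best_prev == NEG_INF:
                    continue
                curr[end][m] = best_prev + note_score(i, start, end)
        prev = curr
    return max(max(row) for row in prev)


# Toy example: frames labeled by pitch; the melody is played at half speed.
signal = [60, 60, 60, 60, 62, 62]
pitches = [60, 62]


def toy_note_score(i, start, end):
    return float(sum(1 if signal[f] == pitches[i] else -1
                     for f in range(start, end)))


def toy_log_trans(prev_tempo, tempo):
    return 0.0 if prev_tempo == tempo else -0.5


best = best_alignment_score([2, 1], [1.0, 2.0], len(signal),
                            toy_note_score, toy_log_trans)
```

The four nested loops run over notes, end frames, tempo values, and previous tempo values, which matches a cost that is linear in the number of notes and the signal length and quadratic in the number of tempo values.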

15 Experimental Results

• Queries: 50 melodies from opera arias (from MIDI files)

• Database: over 800 performances of opera arias performed by over 50 tenors with full orchestral accompaniment

• Compared our variable-tempo (VT) model vs. fixed-tempo (FT) and locally-fixed-tempo (LFT) models

• Compared our Harmonic with Scaled Noise (HSN) spectral model vs. Harmonic with Independent Noise (HIN) model

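16 Evaluation Measures

[Figure: likelihood value versus index of performance in the ranked list (1 to 5); each performance is marked + or -]

For the ranked list shown:

• Oerr = 0

• Cov = 3 - 2

• $\mathrm{AvgP} = \frac{1}{2}\left(\frac{1}{1} + \frac{2}{3}\right)$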
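The three measures can be reproduced with a short sketch. The definitions are inferred from the slide's example values, so treat them as assumptions: one error is 0 when the top-ranked performance is correct, coverage is the rank of the last correct performance minus the number of correct performances, and average precision averages the precision at each correct rank. The label sequence +, -, +, -, - is likewise inferred from those values.

```python
def one_error(labels):
    """0 if the top-ranked item is correct, else 1."""
    return 0 if labels[0] else 1


def coverage(labels):
    """Rank of the last correct item minus the number of correct items."""
    num_correct = sum(1 for is_correct in labels if is_correct)
    last_rank = max(
        rank for rank, is_correct in enumerate(labels, start=1) if is_correct
    )
    return last_rank - num_correct


def average_precision(labels):
    """Mean of precision-at-rank over the ranks of the correct items."""
    hits = 0
    total = 0.0
    for rank, is_correct in enumerate(labels, start=1):
        if is_correct:
            hits += 1
            total += hits / rank
    return total / hits


# Ranked list with correct performances at ranks 1 and 3 (assumed example).
ranked = [True, False, True, False, False]
oerr = one_error(ranked)
cov = coverage(ranked)
avgp = average_precision(ranked)
```

Both `coverage` and `average_precision` assume the list contains at least one correct performance.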

17 Summary of Results

Results for VT+HSN with 25-second queries:

• One Error of VT+HSN: 8%

• Average Precision of VT+HSN: 95%

• Coverage of VT+HSN: 0.21

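18 Results

Spectral Distribution Model: HSN (left columns), HIN (right columns)

Query length  Tempo model | HSN AvgP  HSN Cov  HSN Oerr | HIN AvgP  HIN Cov  HIN Oerr
5 Sec.        FT          | 0.38      22.96    0.69     | 0.35      21.67    0.75
5 Sec.        LFT         | 0.43      17.33    0.69     | 0.37      17.94    0.75
5 Sec.        VT          | 0.51      10.67    0.65     | 0.46      11.83    0.69
15 Sec.       FT          | 0.38      19.83    0.71     | 0.36      19.08    0.73
15 Sec.       LFT         | 0.66      8.10     0.44     | 0.66      8.15     0.42
15 Sec.       VT          | 0.86      1.75     0.19     | 0.83      3.02     0.19
25 Sec.       FT          | 0.34      20.69    0.77     | 0.33      22.46    0.79
25 Sec.       LFT         | 0.66      5.90     0.46     | 0.63      5.98     0.48
25 Sec.       VT          | 0.95      0.21     0.08     | 0.92      0.40     0.10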

19 Precision-Recall

[Figure: precision-recall curves for FT/25, LFT/25, and VT/25; recall (0 to 1) on the horizontal axis, precision (0 to 1) on the vertical axis]

20 Illustration of Segmentation

21 Future Work

• More data

• Other genres of music

• Alternative spectral distribution models using supervised learning methods

• Use alignment results for separating a soloist from the accompaniment