Harmonically Informed Multi-pitch Tracking Zhiyao Duan, Jinyu Han and Bryan Pardo EECS Dept.,...

22
Harmonically Informed Multi-pitch Tracking Zhiyao Duan, Jinyu Han and Bryan Pardo EECS Dept., Northwestern Univ. Interactive Audio Lab, http://music.cs.northwestern.edu For presentation in ISMIR 2009, Kobe, Japan.

Transcript of Harmonically Informed Multi-pitch Tracking Zhiyao Duan, Jinyu Han and Bryan Pardo EECS Dept.,...

Harmonically Informed Multi-pitch Tracking

Zhiyao Duan, Jinyu Han and Bryan PardoEECS Dept., Northwestern Univ.

Interactive Audio Lab, http://music.cs.northwestern.edu

For presentation in ISMIR 2009, Kobe, Japan.

• Given polyphonic music played by several monophonic harmonic instruments

• Estimate a pitch trajectory for each instrument

The Multi-pitch Tracking Task

2Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Potential Applications

• Automatic music transcription• Harmonic source separation• Other applications

– Melody-based music search– Chord recognition– Music education– ……

3Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

The 2-stage Standard Approach

• Stage 1: Multi-pitch Estimation (MPE) in each single frame

• Stage 2: Connect pitch estimates across frames into pitch trajectories

4Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Time

Fre

quen

cy

State of the Art

5Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

• How far has existing work gone?– MPE is not very robust– Form short pitch trajectories (within a

note) according to local time-frequency proximity of pitch estimates

• Our contribution– A new MPE algorithm– A constrained clustering approach to

estimate pitch trajectories across multiple notes

System Overview

6Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

0 10 20 30 40

40

50

60

70

80

90

Time (second)

Fre

quency (

MID

I num

ber)

0 10 20 30 40

40

50

60

70

80

90

Time (second)

Fre

que

ncy (

MID

I n

um

be

r)

0 10 20 30 40-1

0

1

Time (second)

Am

plit

ud

e

Frequency

Am

plitu

de

Multi-pitch Estimation in Single Frame

• A maximum likelihood estimation method

• Spectrum: peaks & the non-peak region

Best F0 estimate (a set of F0s)

Observed power spectrum

F0 hypothesis, (a set of F0s)

7Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

True F0True F0

Likelihood Definition

Likelihood of observing these peaks

Likelihood of not having any harmonics in the NP region

8Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

F0 Hyp F0 Hyp

is large is small

is large is small

Likelihood Definition

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 9

True F0

F0 Hyp

Likelihood of observing these peaks

Likelihood of not having any harmonics in the NP region

is large is large

Pitch Trajectory Formation

10

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

• How to form pitch trajectories ?– View it as a constrained clustering problem!

• We use two clustering cues– Global timbre consistency– Local time-frequency locality

Global Timbre Consistency

• Objective function– Minimize intra-cluster distance

• Harmonic structure feature– Normalized relative amplitudes of harmonics

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 111st Component

2nd

Com

pone

nt

PCA of Harmonic Structures

ViolinClarinetSaxophoneBassoon

0 10 20 30 40 500

20

40

60

80

100

Harmonic number

Am

plitu

de (

dB)

Harmonic Structure

Local Time-frequency Locality

• Constraints– Must-link: similar pitches in adjacent frames– Cannot-link: simultaneous pitches

• Finding a feasible clustering is NP-hard!

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 12

Time

Fre

quen

cy

Our Constrained Clustering Process

• 1) Find an initial clustering– Labeling pitches according to pitch order in

each frame: First, second, third, fourth

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 13

Time

Fre

quen

cy

Time

Fre

quen

cy

Our Constrained Clustering Process

• 2) Define constraints– Must-link: similar pitches in adjacent

frames and the same initial cluster: Notelet

– Cannot-link: simultaneous notelets

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 14

Time

Fre

quen

cy

Our Constrained Clustering Process• 3) Update clusters to minimize objective

function– Swap set: A set of notelets in two clusters

connected by cannot-links– Swap notelets in a swap set between clusters if

it reduces objective function– Iteratively traverse all the swap sets

Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 15

Time

Fre

quen

cy

Data Set

• Data set– 10 J.S. Bach chorales (quartets, played by

violin, clarinet, saxophone and bassoon)– Each instrument is recorded individually, then

mixed

• Ground-truth pitch trajectories– Use YIN on monophonic tracks before mixing

16Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Experimental Results

17Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Mean +- Std Precision (%) Recall (%)

How many pitches are correctly estimated?

Klapuri, ISMIR2006

87.2 +- 2.0 66.2 +- 3.4

Ours 88.6 +- 1.7 77.0 +- 3.5

How many pitches are correctly estimated and put into the correct trajectory?

Chance Approx 0.0 Approx 0.0

Ours 76.9 +- 11.0 67.1 +- 11.9

How many notes are correctly estimated?

Chance Approx 0.0 Approx 0.0

Ours 46.0 +- 5.5 54.3 +- 5.5

Ground Truth Pitch Trajectories

18Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

J.S. Bach, “Ach lieben Christen, seid getrost”

Our System’s Output

19Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

J.S. Bach, “Ach lieben Christen, seid getrost”

Conclusion

• Our multi-pitch tracking system– Multi-pitch estimation in single frame

• Estimate F0s by modeling peaks and the non-peak region

• Estimate polyphony, refine F0s estimates

– Pitch trajectory formation• Constrained clustering

– Objective: timbre (harmonic structure) consistency– Constraints: local time-frequency locality of pitches

• A clustering algorithm by swapping labels

• Results on music recordings are promising

20Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Thanks you!Q & A

21Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

Possible Questions

• How much does our constrained clustering algorithm improve from the initial pitch trajectory (label pitches by pitch order)?

22Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu

1 2 3 40

20

40

60

80

100

Track No.

Acc

urac

y (%

)

Initial trajectoryFinal trajectory