Song-level Multi-pitch Tracking by Heavily Constrained Clustering

Zhiyao Duan, Jinyu Han and Bryan Pardo

EECS Dept., Northwestern Univ.

Interactive Audio Lab, http://music.cs.northwestern.edu

For presentation in ICASSP 2010, Dallas, Texas, USA.


Multi-pitch Estimation & Tracking Task

• Given polyphonic music played by several monophonic harmonic instruments (the number of instruments is known)

• Estimate a pitch trajectory for each instrument


Potential Applications

• Automatic music transcription
• Harmonic source separation
• Other applications
– Melody-based music search
– Chord recognition
– Source localization
– Music education
– …


The 2-stage Standard Approach

• Stage 1: Multi-pitch Estimation (MPE): estimate pitches in each single time frame
– Z. Duan, B. Pardo and C. Zhang, “Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions”, IEEE Trans. Audio Speech Language Process., in press.

• Stage 2: Multi-pitch Tracking (MPT): connect pitch estimates across frames into pitch trajectories

[Figure: pitch estimates plotted on time (horizontal) and frequency (vertical) axes]


State of the Art of MPT

• What existing MPT methods do
– Form short pitch trajectories within a note (note-level), according to local time-frequency proximity of pitch estimates
• Our contribution
– Form long pitch trajectories through multiple notes (song-level) using a new constrained clustering algorithm


Try Clustering by Timbre

• Each trajectory is a cluster of pitch estimates
• One cluster per instrument
• Clustering principle: maintain timbre consistency in each cluster


Timbre Feature of Pitch Estimates

• Harmonic structure: relative amplitudes of first 50 harmonics

[Figure: time-frequency plot of pitch estimates; plot titled "Harmonic Structure" showing amplitude (dB) against harmonic number]
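
As a rough illustration of the timbre feature, the sketch below reads the dB amplitude near each of the first 50 harmonics of a pitch estimate from one audio frame. The function name, the Hann window, the nearest-bin peak picking, the normalization to the strongest harmonic, and the `floor_db` value are all assumptions made for this example; the slides do not specify how the amplitudes are measured.

```python
import numpy as np

def harmonic_structure(frame, sample_rate, f0, num_harmonics=50, floor_db=-100.0):
    """Relative dB amplitudes of the first `num_harmonics` harmonics of f0.

    Hypothetical helper: peak picking and normalization are assumptions.
    """
    window = np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(frame * window))
    bin_hz = sample_rate / len(frame)

    amps_db = np.full(num_harmonics, -np.inf)
    for h in range(1, num_harmonics + 1):
        k = int(round(h * f0 / bin_hz))      # FFT bin nearest the h-th harmonic
        if k >= len(spectrum):
            break                            # harmonic lies above Nyquist
        lo, hi = max(k - 1, 0), min(k + 2, len(spectrum))
        amps_db[h - 1] = 20 * np.log10(spectrum[lo:hi].max() + 1e-12)

    # "Relative": express every harmonic relative to the strongest one,
    # and clip very weak or missing harmonics to a fixed floor.
    relative = amps_db - amps_db.max()
    return np.maximum(relative, floor_db)
```

Each pitch estimate then yields one 50-dimensional vector, which is the kind of point the clustering operates on.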


Minimize This Objective Function

$$f(\Pi) = \sum_{k=1}^{K} \sum_{x_i \in T_k} \lVert x_i - c_k \rVert^2$$

• $\Pi$: a partition into $K$ clusters
• $x_i$: the 50-d harmonic structure of the i-th pitch estimate
• $K$: number of clusters
• $c_k$: center of the k-th cluster
• $x_i \in T_k$: for all pitch estimates in the k-th cluster
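
A minimal sketch of this objective, assuming it is the sum of squared Euclidean distances from each harmonic-structure vector to the center of its cluster, and assuming the center is the cluster mean (the slide only says "center"). The function and argument names are hypothetical.

```python
import numpy as np

def clustering_objective(features, labels, num_clusters):
    """Sum over clusters of squared distances from members to the cluster mean."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in range(num_clusters):
        members = features[labels == k]
        if len(members) == 0:
            continue                      # an empty cluster contributes nothing
        center = members.mean(axis=0)     # assumed definition of the center
        total += float(((members - center) ** 2).sum())
    return total
```

For example, with points `[[0, 0], [2, 0], [10, 0], [10, 4]]` and labels `[0, 0, 1, 1]`, the two clusters contribute 2 and 8, so the objective is 10.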


Objective Function Is Not Enough


Add Pitch-locality Constraints

• Must-link: pitch estimates close in both time and frequency should be in the same cluster

• Cannot-link: simultaneous pitches should not be in the same cluster (only for monophonic instruments)

[Figure: pitch estimates plotted on time (horizontal) and frequency (vertical) axes]
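
The two constraint types can be sketched as follows. Each pitch estimate is represented here as a `(frame_index, midi_pitch)` pair, and the thresholds `max_frame_gap` and `max_semitone_diff` are illustrative assumptions; the slides do not state what counts as "close" in time and frequency.

```python
from itertools import combinations

def build_constraints(estimates, max_frame_gap=1, max_semitone_diff=0.3):
    """Return (must_links, cannot_links) as lists of index pairs.

    estimates: list of (frame_index, midi_pitch) tuples (assumed representation).
    """
    must, cannot = [], []
    for i, j in combinations(range(len(estimates)), 2):
        (ti, pi), (tj, pj) = estimates[i], estimates[j]
        if ti == tj:
            # Simultaneous pitches: a monophonic instrument cannot play both.
            cannot.append((i, j))
        elif abs(ti - tj) <= max_frame_gap and abs(pi - pj) <= max_semitone_diff:
            # Close in both time and frequency: likely the same note.
            must.append((i, j))
    return must, cannot
```

This pairwise loop is quadratic in the number of estimates; a real system would probably only compare estimates in neighboring frames.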


Properties of Our Problem

• Objective: timbre consistency
• Constraints: pitch locality
• Previous constrained clustering algorithms do not apply due to the following properties:
– Inconsistent constraints: pitch estimates are sometimes erroneous, which may make the constraints unsatisfiable
– Heavily constrained: nearly every pitch estimate is involved in at least one constraint


The Proposed Clustering Algorithm

$\Pi_n$: clustering in the n-th iteration;

$C_n$: {all constraints satisfied by $\Pi_n$};

1. Start from an initial clustering $\Pi_0$, which satisfies $C_0$, a subset of all constraints; n=1;

2. Find a new clustering $\Pi_n$ that decreases the objective $f$ and also satisfies $C_{n-1}$;

3. $C_n$ = {all constraints satisfied by $\Pi_n$};

4. Increment n and repeat from step 2 until the objective can (nearly) no longer be decreased.

Since each new clustering keeps every previously satisfied constraint and lowers the objective:

$$C_0 \subseteq C_1 \subseteq \cdots, \qquad f(\Pi_0) > f(\Pi_1) > \cdots$$
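
The iteration can be sketched as a generic loop. This is an illustration under assumptions, not the authors' implementation: `objective` and `candidates` are hypothetical callables supplied by the caller (the latter proposes alternative clusterings to try), and `tol` is an assumed stopping tolerance.

```python
def satisfied(labels, must, cannot):
    """Set of constraints that the clustering `labels` currently satisfies."""
    sat = {("must", i, j) for i, j in must if labels[i] == labels[j]}
    sat |= {("cannot", i, j) for i, j in cannot if labels[i] != labels[j]}
    return sat

def constrained_clustering(labels, must, cannot, objective, candidates, tol=1e-9):
    """Accept a candidate only if it keeps every satisfied constraint
    and lowers the objective; stop when no candidate qualifies."""
    current = satisfied(labels, must, cannot)        # constraints met initially
    best = objective(labels)
    improved = True
    while improved:
        improved = False
        for new_labels in candidates(labels):
            new_sat = satisfied(new_labels, must, cannot)
            new_obj = objective(new_labels)
            if current <= new_sat and new_obj < best - tol:
                labels, current, best = new_labels, new_sat, new_obj
                improved = True
                break                                # restart from the new clustering
    return labels, current
```

Because a candidate is accepted only when `current <= new_sat`, the set of satisfied constraints never shrinks, while the objective strictly decreases at every accepted step.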


Initial Clustering

• Trivial one
– $\Pi_0$: a random partition
– $C_0$: constraints satisfied by $\Pi_0$; may be empty
• A more informative one for MPT
– $\Pi_0$: label pitches according to pitch order in each frame: highest, second-highest, third-highest, fourth-highest, …
– $C_0$: will contain all cannot-links

[Figure: two plots of pitch estimates on time and frequency axes]
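
The pitch-order initialization can be sketched as below, again assuming each estimate is a `(frame_index, midi_pitch)` pair. Label 0 goes to the highest pitch in a frame, label 1 to the second-highest, and so on; the function name is hypothetical.

```python
from collections import defaultdict

def pitch_order_labels(estimates):
    """Label each estimate by its pitch rank within its own frame."""
    by_frame = defaultdict(list)
    for idx, (frame, pitch) in enumerate(estimates):
        by_frame[frame].append((pitch, idx))

    labels = [None] * len(estimates)
    for members in by_frame.values():
        # Sort descending by pitch so rank 0 is the highest pitch in the frame.
        for rank, (_, idx) in enumerate(sorted(members, reverse=True)):
            labels[idx] = rank
    return labels
```

Estimates in the same frame always receive different labels, which is why this starting point satisfies every cannot-link between simultaneous pitches.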


Find A New Clustering

• 1. Satisfy current constraints
• 2. Decrease the objective function
• Swap set: a connected subgraph between two clusters.
• Traverse all swap sets until finding a new clustering that decreases the objective function

[Figure: graph of pitch estimates (nodes 1 to 8) in two clusters, shown before and after swaps. Legend: satisfied cannot-link, unsatisfied cannot-link, satisfied must-link, unsatisfied must-link.]

$$C_0 \subseteq C_1 \subseteq \cdots, \qquad f(\Pi_0) > f(\Pi_1) > \cdots$$
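
One way to read the swap-set idea is sketched below, under the assumption that a swap set is a connected component of the graph whose nodes are the pitch estimates in two chosen clusters and whose edges are the currently satisfied must-links and cannot-links. Function names are hypothetical.

```python
def swap_sets(labels, satisfied_links, a, b):
    """Connected components among nodes in clusters a and b,
    linked by currently satisfied constraints (must or cannot)."""
    nodes = [i for i, lab in enumerate(labels) if lab in (a, b)]
    neighbors = {i: set() for i in nodes}
    for i, j in satisfied_links:
        if i in neighbors and j in neighbors:
            neighbors[i].add(j)
            neighbors[j].add(i)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                       # depth-first traversal
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(neighbors[node] - comp)
        seen |= comp
        components.append(comp)
    return components

def apply_swap(labels, swap_set, a, b):
    """Exchange labels a and b for every node in the swap set."""
    flip = {a: b, b: a}
    return [flip[lab] if i in swap_set else lab for i, lab in enumerate(labels)]
```

Swapping a whole component at once keeps linked nodes together or apart exactly as before, so constraints that were satisfied stay satisfied; whether the swap is kept then depends only on the objective.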


Algorithm Review

$\Pi_k$: partition of points into clusters

$S_k$: feasible solution space under constraints $C_k$


Experiments

• Data set
– 10 J.S. Bach chorales (quartets, played by violin, clarinet, saxophone and bassoon)
– Each instrument is recorded individually, then mixed
• Ground-truth pitch trajectories
– Use YIN on monophonic tracks before mixing
• Input pitch estimates
– Our previous work in [1]
– Input accuracy: 70.0 ± 3.1%

[1] Zhiyao Duan, Bryan Pardo and Changshui Zhang, “Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-peak Regions”, IEEE Trans. Audio Speech Language Process., in press.


Overall Multi-pitch Tracking Results

[Figure: mean % of correct pitch estimates]


Among Correctly Estimated Pitches


An Example

[Figure: Ground-truth Pitch Trajectories. Horizontal axis: Time (second); vertical axis: Pitch (MIDI number).]

An Example

[Figure: Our Results. Horizontal axis: Time (second); vertical axis: Pitch (MIDI number).]

Conclusion

• Formulate the song-level Multi-pitch Tracking problem as a constrained clustering problem
– Objective: timbre consistency
– Constraints: pitch locality

• Existing constrained clustering algorithms do not apply due to problem properties

• Propose a new constrained clustering algorithm

• Experimental results are promising

Thank you!
