Music Information Retrieval in Polyphonic Mixtures

62
Music Information Retrieval in Polyphonic Mixtures Steve Tjoa MIR Workshop CCRMA, Stanford University iZotope, Inc. San Francisco, CA, USA June 27, 2012

Transcript of Music Information Retrieval in Polyphonic Mixtures

Page 1: Music Information Retrieval in Polyphonic Mixtures

Music Information Retrieval in PolyphonicMixtures

Steve Tjoa

MIR WorkshopCCRMA, Stanford University

iZotope, Inc.San Francisco, CA, USA

June 27, 2012

Page 2: Music Information Retrieval in Polyphonic Mixtures

A bit about myself...

Page 3: Music Information Retrieval in Polyphonic Mixtures

A bit about myself...

Page 4: Music Information Retrieval in Polyphonic Mixtures

A bit about myself...

Page 5: Music Information Retrieval in Polyphonic Mixtures

Quick Review

What are the three main components of any classificationsystem?

What are some useful features for MIR?

What are some problems and applications addressed by MIR?

Page 6: Music Information Retrieval in Polyphonic Mixtures

Quick Review

What are the three main components of any classificationsystem?

What are some useful features for MIR?

What are some problems and applications addressed by MIR?

Page 7: Music Information Retrieval in Polyphonic Mixtures

Quick Review

What are the three main components of any classificationsystem?

What are some useful features for MIR?

What are some problems and applications addressed by MIR?

Page 8: Music Information Retrieval in Polyphonic Mixtures

Music Transcription

From this song...

get the “piano roll”:

e3

f3

f#3

g3

g#3

a3

hb3

h3

c4

c#4

d4

eb4

e4

f4

f#4

g4

g#4

a4

hb4

h4

c5

c#5

d5

eb5

e5

f5

f#5

g5

g#5

a5

hb5

h5

c6

c#6

d6

Time (ticks)

Pitch

Beatles - No Reply

100 200 300 400 500 600 700 800 900 1000

140

150

160

170

180

190

200

210

Page 9: Music Information Retrieval in Polyphonic Mixtures

Music Transcription

From this song... get the “piano roll”:

e3

f3

f#3

g3

g#3

a3

hb3

h3

c4

c#4

d4

eb4

e4

f4

f#4

g4

g#4

a4

hb4

h4

c5

c#5

d5

eb5

e5

f5

f#5

g5

g#5

a5

hb5

h5

c6

c#6

d6

Time (ticks)

Pitch

Beatles - No Reply

100 200 300 400 500 600 700 800 900 1000

140

150

160

170

180

190

200

210

Page 10: Music Information Retrieval in Polyphonic Mixtures

Music Source Separation

Isolate, amplify, or suppress a musical voice/instrument.

Example: From these beats...

isolate the kick drum andsnare drum.

Page 11: Music Information Retrieval in Polyphonic Mixtures

Music Source Separation

Isolate, amplify, or suppress a musical voice/instrument.

Example: From these beats... isolate the kick drum andsnare drum.

Page 12: Music Information Retrieval in Polyphonic Mixtures

A Really Special Tool

Nonnegative Matrix Factorization (NMF):

Given X nonnegative, find W and H, both nonnegative, thatminimize some distance d(X,WH).

Easy! And it works.

Meaningful to humans.

Widely used.

Page 13: Music Information Retrieval in Polyphonic Mixtures

A Really Special Tool

Nonnegative Matrix Factorization (NMF):

Given X nonnegative, find W and H, both nonnegative, thatminimize some distance d(X,WH).

Easy! And it works.

Meaningful to humans.

Widely used.

Page 14: Music Information Retrieval in Polyphonic Mixtures

Why NMF?

Energy of musical events are nonnegative.

Page 15: Music Information Retrieval in Polyphonic Mixtures

Brief Refresher: Matrix Multiplication

[1 2

] [ ab

]= a + 2b

[34

] [a b c

]=

[3a 3b 3c4a 4b 4c

]w[a b c

]=[aw bw cw

][

34

]h =

[3h4h

]

Page 16: Music Information Retrieval in Polyphonic Mixtures

Brief Refresher: Matrix Multiplication

[1 2

] [ ab

]= a + 2b

[34

] [a b c

]=

[3a 3b 3c4a 4b 4c

]

w[a b c

]=[aw bw cw

][

34

]h =

[3h4h

]

Page 17: Music Information Retrieval in Polyphonic Mixtures

Brief Refresher: Matrix Multiplication

[1 2

] [ ab

]= a + 2b

[34

] [a b c

]=

[3a 3b 3c4a 4b 4c

]w[a b c

]=[aw bw cw

]

[34

]h =

[3h4h

]

Page 18: Music Information Retrieval in Polyphonic Mixtures

Brief Refresher: Matrix Multiplication

[1 2

] [ ab

]= a + 2b

[34

] [a b c

]=

[3a 3b 3c4a 4b 4c

]w[a b c

]=[aw bw cw

][

34

]h =

[3h4h

]

Page 19: Music Information Retrieval in Polyphonic Mixtures

Nonnegative Matrix Factorizaton

Top right: X. Left: W. Bottom: H. Three piano notes:

1 2 3

Fre

qu

en

cy

3

2

1

Time

Page 20: Music Information Retrieval in Polyphonic Mixtures

Nonnegative Matrix Factorizaton

Top right: X. Left: W. Bottom: H. Kick and snare:

1

Fre

quen

cy

12

Time

2

Spectrogram

Page 21: Music Information Retrieval in Polyphonic Mixtures

NMF Algorithms

Multiplicative update rules:

W←W · XHT

WHHTH← H · WTX

WTWH

See [Lee and Seung, NIPS 2001].

Page 22: Music Information Retrieval in Polyphonic Mixtures

NMF Algorithms

Easy to implement!Python:

1 for iter in range(maxiter):

2 W = multiply(W, (X*H.T)/(W*H*H.T))

3 H = multiply(H, (W.T*X)/(W.T*W*H))

Matlab:

1 for iter=1:maxiter

2 W = W.*(X*H’)./(W*H*H’);

3 H = H.*(W’*X)./(W’*W*H);

4 end

Page 23: Music Information Retrieval in Polyphonic Mixtures

NMF Algorithms

Easy to implement!Python:

1 for iter in range(maxiter):

2 W = multiply(W, (X*H.T)/(W*H*H.T))

3 H = multiply(H, (W.T*X)/(W.T*W*H))

Matlab:

1 for iter=1:maxiter

2 W = W.*(X*H’)./(W*H*H’);

3 H = H.*(W’*X)./(W’*W*H);

4 end

Page 24: Music Information Retrieval in Polyphonic Mixtures

Example: Source Separation

kick and snare:

[kick drum] and [snare drum]

oboe and horn:

Duan et. al: [oboe] and [horn]

Wang et. al: [oboe] and [horn]

Tjoa and Liu: [oboe] and [horn]

Vivaldi, Winter, Four Seasons:

[solo] and [accompaniment]

Page 25: Music Information Retrieval in Polyphonic Mixtures

Example: Source Separation

kick and snare:

[kick drum] and [snare drum]

oboe and horn:

Duan et. al: [oboe] and [horn]

Wang et. al: [oboe] and [horn]

Tjoa and Liu: [oboe] and [horn]

Vivaldi, Winter, Four Seasons:

[solo] and [accompaniment]

Page 26: Music Information Retrieval in Polyphonic Mixtures

Example: Source Separation

kick and snare:

[kick drum] and [snare drum]

oboe and horn:

Duan et. al: [oboe] and [horn]

Wang et. al: [oboe] and [horn]

Tjoa and Liu: [oboe] and [horn]

Vivaldi, Winter, Four Seasons:

[solo] and [accompaniment]

Page 27: Music Information Retrieval in Polyphonic Mixtures

Example: Instrument Recognition

Use NMF to identify the instruments in a musical signal.Observe these atoms:

1

Fre

quen

cy

12

Time

2

Spectrogram

Page 28: Music Information Retrieval in Polyphonic Mixtures

Filter the temporal atoms from NMF [Tjoa and Liu, 2010]:

ta = 0.010

ta = 0.020

ta = 0.040

ta = 0.080

ta = 0.160

ta = 0.320

ta = 0.640

ta = 1.280

0 1 2 3 4 5Time (seconds)

Use support vector machine (SVM) to classify the processedspectral and temporal atoms.

Page 29: Music Information Retrieval in Polyphonic Mixtures

Feature Vector of Kick Drum

1.0000.8000.6000.4000.2000.1000.0500.020

n = 1.2

n = 1.5

n = 2.0

n = 3.0

max

Attack Time (seconds)

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40Time (seconds)

InputAtom

Page 30: Music Information Retrieval in Polyphonic Mixtures

Feature Vector of Snare Drum

1.0000.8000.6000.4000.2000.1000.0500.020

n = 1.2

n = 1.5

n = 2.0

n = 3.0

max

Attack Time (seconds)

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40Time (seconds)

InputAtom

Page 31: Music Information Retrieval in Polyphonic Mixtures

Feature Vector of Trumpet

1.0000.8000.6000.4000.2000.1000.0500.020

n = 1.2

n = 1.5

n = 2.0

n = 3.0 max

Attack Time (seconds)

0.0 0.5 1.0 1.5 2.0Time (seconds)

InputAtom

Page 32: Music Information Retrieval in Polyphonic Mixtures

Feature Vector of Violin

1.0000.8000.6000.4000.2000.1000.0500.020

n = 1.2

n = 1.5

n = 2.0

n = 3.0

max

Attack Time (seconds)

0 1 2 3 4 5 6Time (seconds)

InputAtom

Page 33: Music Information Retrieval in Polyphonic Mixtures

Results: Isolated Instrument Recognition

Experiments on isolated instrument sounds:

Accuracy: 92.3%

Reflect state-of-the-art performance for isolated instrumentrecognition among as many as 24 classes.

Page 34: Music Information Retrieval in Polyphonic Mixtures

Bas

soon

Cla

rine

tF

lute

Obo

eSa

xoph

one

Hor

nT

rom

bone

Tru

mpe

tT

uba

Cel

loV

iola

Vio

linC

ello

Piz

zV

iola

Piz

zV

iolin

Piz

zG

lock

ensp

iel

Gui

tar

Mar

imba

Pia

noX

ylop

hone

Kic

kSn

are

Tim

pani

Tom

s

BassoonClarinet

FluteOboe

SaxophoneHorn

TromboneTrumpet

TubaCelloViola

ViolinCello PizzViola Pizz

Violin PizzGlockenspiel

GuitarMarimba

PianoXylophone

KickSnare

TimpaniToms

0.01

0.10

0.20

0.40

0.60

0.80

1.00

Page 35: Music Information Retrieval in Polyphonic Mixtures

Results: Solo Melodic Phrases

Instrument classifications. One decision per signal.Accuracy: 96.2%.

bass

oon

clar

inet

flute

oboe

horn

trom

bone

trum

pet

tuba

cello

viol

a

viol

in

bassoon

clarinet

flute

oboe

horn

trombone

trumpet

tuba

cello

viola

violin0.00

0.20

0.40

0.60

0.80

1.00

Page 36: Music Information Retrieval in Polyphonic Mixtures

Results: Solo Melodic Phrases

Family classifications. One decision per signal.Accuracy: 97.4%.

win

d

bras

s

stri

ngs

wind

brass

strings

0.00

0.20

0.40

0.60

0.80

1.00

Page 37: Music Information Retrieval in Polyphonic Mixtures

Current and Future Work

Existing algorithms cannot handle “complicated” music.

Page 38: Music Information Retrieval in Polyphonic Mixtures

Related work:

smoothness

harmonicity

statistical priors

Page 39: Music Information Retrieval in Polyphonic Mixtures

Sparse Coding

What if you already have a large dictionary?

mins

d(x,As)

Solution: Impose sparsity on s.

Benefits: guaranteed spectral structure; labels already known.

Page 40: Music Information Retrieval in Polyphonic Mixtures

Sparse Coding

Related work:

matching pursuit (MP)

orthogonal matching pursuit (OMP)

basis pursuit (BP)

Disadvantages:

Complexity that is linear in the dictionary size.

Neither fast nor scalable.

Page 41: Music Information Retrieval in Polyphonic Mixtures

Example: Orthogonal Matching Pursuit

OMP [Pati et al., 1993]:

Input: x ∈ RM ; A = [a1, a2, ..., aK ] ∈ RM×K s.t. ||ak ||2 = 1for all k .

Output: s ∈ RK

Initialize: S ← ∅; s← 0; r← x; ε > 0.

While ||r|| > ε:

1. k ← argmaxj aTj r

2. S ← S ∪ k3. Solve for {sj |j ∈ S}: minsj |j∈S ||x−

∑j∈S ajsj ||

4. r← x− As

s← s

Page 42: Music Information Retrieval in Polyphonic Mixtures

Proposed Algorithm: Approximate Matching Pursuit

AMP [Tjoa and Liu]:

Input: x ∈ RM ; A = [a1, a2, ..., aK ] ∈ RM×K s.t. ||ak ||2 = 1for all k .

Output: s ∈ RK

Initialize: S ← ∅; s← 0; r← x; ε > 0.

While ||r|| > ε:

1. Find any k such that ak and r are near neighbors.2. S ← S ∪ k3. Solve for {sj |j ∈ S}: minsj |j∈S ||x−

∑j∈S ajsj ||

4. r← x− As

s← s

Page 43: Music Information Retrieval in Polyphonic Mixtures

Locality Sensitive Hashing

Idea: Hash nearby points into the same bin.

0.20.4

0.60.8

0.20.4

0.60.8

0.2

0.4

0.6

0.8

Page 44: Music Information Retrieval in Polyphonic Mixtures

Experiments: Music Transcription

0 1 2 3 4 5 6 7 8 9Time (seconds)

20

30

40

50

60

70

80

90

100

110

Pit

ch(M

IDI

num

ber)

C Major Scale (OMP)

0 1 2 3 4 5 6 7 8 9Time (seconds)

20

30

40

50

60

70

80

90

100

110

Pit

ch(M

IDI

num

ber)

C Major Scale (AMP, L=8, k=8)

Page 45: Music Information Retrieval in Polyphonic Mixtures

Experiments: Music Transcription

0 2 4 6 8 10 12Time (seconds)

20

30

40

50

60

70

80

90

100

110

Pit

ch(M

IDI

num

ber)

Debussy Clair de Lune, mm. 1-4 (OMP)

0 2 4 6 8 10 12Time (seconds)

20

30

40

50

60

70

80

90

100

110

Pit

ch(M

IDI

num

ber)

Debussy Clair de Lune, mm. 1-4 (AMP, L=8, k=8)

Page 46: Music Information Retrieval in Polyphonic Mixtures

Experiments: Music Transcription

0 2 4 6 8 10 12Time (seconds)

20

30

40

50

60

70

80

90

100

110

Pit

ch(M

IDI

num

ber)

Debussy Clair de Lune, mm. 5-8 (OMP)

0 2 4 6 8 10 12Time (seconds)

20

30

40

50

60

70

80

90

100

110

Pit

ch(M

IDI

num

ber)

Debussy Clair de Lune, mm. 5-8 (AMP, L=8, k=8)

Page 47: Music Information Retrieval in Polyphonic Mixtures

Experiments: Music Transcription

Execution times in seconds.

Song OMP AMP8,8 AMP10,10

C-major scale 81.05 43.63 21.03Debussy mm. 1-4 118.57 88.45 29.01Debussy mm. 5-8 123.05 121.73 121.84

Page 48: Music Information Retrieval in Polyphonic Mixtures

Where to Learn More

Conferences:

Int. Society of Music Information Retrieval (ISMIR)

MIR Evaluation Exchange (MIREX)

Int. Computer Music Conference (ICMC)

IEEE Int. Conf. Audio, Speech, Signal Processing (ICASSP)

ACM Multimedia

Journals:

IEEE Trans. Audio, Speech, Language, Processing

Journal of New Music Research

Computer Music Journal

Page 49: Music Information Retrieval in Polyphonic Mixtures

Lab 3

Page 50: Music Information Retrieval in Polyphonic Mixtures

Lab 3: Summary

Summary:

3.1 Separate sources.

3.2 Separate noisy sources.

3.3 Classify separated sources.

Page 51: Music Information Retrieval in Polyphonic Mixtures

Lab 3: Matlab Programming Tips

Pressing the up and down arrows let you scroll throughcommand history.

A semicolon at the end of a line simply means “suppressoutput”.

Type help <command> for instant documentation. Forexample, help wavread, help plot, help sound. Usehelp liberally!

Page 52: Music Information Retrieval in Polyphonic Mixtures

Lab 3.1: Source Separation

1. In Matlab: Select File → Set Path.Select “Add with Subfolders”.Select /usr/ccrma/courses/mir2011/lab3skt.

2. As in Lab 1, load the file, listen to it, and plot it.

1 [x, fs] = wavread(’simpleLoop.wav’);

2 sound(x, fs)

3 t = (0:length(x)-1)/fs;

4 plot(t, x)

5 xlabel(’Time (seconds)’)

Page 53: Music Information Retrieval in Polyphonic Mixtures

Lab 3.1: Source Separation

3. Compute and plot a short-time Fourier transform, i.e., theFourier transform over consecutive frames of the signal.

1 frame_size = 0.100;

2 hop = 0.050;

3 X = parsesig(x, fs, frame_size, hop);

4 imagesc(abs(X(200:-1:1,:)))

Type help parsesig, help imagesc, and help abs formore information.This step gives you some visual intuition about how sounds(might) overlap.

Page 54: Music Information Retrieval in Polyphonic Mixtures

Lab 3.1: Source Separation

4. Let’s separate sources!

1 K = 2;

2 [y, W, H] = sourcesep(x, fs, K);

Type help sourcesep for more information.

5. Plot and listen to the separated signals.

1 plot(t, y)

2 xlabel(’Time (seconds)’)

3 legend(’Signal 1’, ’Signal 2’)

4 sound(y(:,1), fs)

5 sound(y(:,2), fs)

Feel free to replace Signal 1 and Signal 2 with Kick andSnare (depending upon which is which).

Page 55: Music Information Retrieval in Polyphonic Mixtures

Lab 3.1: Source Separation

6. Plot the outputs from NMF.

1 figure

2 plot(W(1:200,:))

3 legend(’Signal 1’, ’Signal 2’)

4 figure

5 plot(H’)

6 legend(’Signal 1’, ’Signal 2’)

What do you observe from W and H?Does it agree with the sounds you heard?

Page 56: Music Information Retrieval in Polyphonic Mixtures

Lab 3.1: Source Separation

7. Repeat the earlier steps for different audio files.

125BOUNC-mono.WAV

58BPM.WAV

CongaGroove-mono.wav

Cstrum chord mono.wav

... and more.Experiment with different values for the number of sources, K.Where does this separation method succeed?Where does it fail?

Page 57: Music Information Retrieval in Polyphonic Mixtures

Lab 3.2: Noise Robustness

Begin with simpleLoop.wav. Then try others.

1. Add noise to the input signal, plot, and listen.

1 xn = x + 0.01*randn(length(x),1);

2 plot(t, xn)

3 sound(xn, fs)

Page 58: Music Information Retrieval in Polyphonic Mixtures

Lab 3.2: Noise Robustness

2. Separate, plot, and listen.

1 [yn, Wn, Hn] = sourcesep(xn, fs, K);

2 plot(t, yn)

3 sound(yn(:,1), fs)

4 sound(yn(:,2), fs)

How robust to noise is this separation method?Compared to the noisy input signal, how much noise is left inthe output signals?Which output contains more noise? Why?

Page 59: Music Information Retrieval in Polyphonic Mixtures

Lab 3.3: Classification

Follow the K-NN example in Lab 1, but classify the separatedsignals.

1. As in Lab 1, extract features from each training sample in thekick and snare drum directories.

2. Train a K-NN model using the kick and snare drum samples.

1 labels=[[ones(10,1) zeros(10,1)];

2 [zeros(10,1) ones(10,1)]];

3 model_snare =

4 knn(5, 2, 1, trainingFeatures, labels);

5 [voting, model_output] =

6 knnfwd(model_snare, featuresScaled)

Page 60: Music Information Retrieval in Polyphonic Mixtures

Lab 3.3: Classification

3. Extract features from the drum signals that you separated inLab 3.1.Classify them using the K-NN model that you built.Does K-NN accurately classify the separated signals?Repeat for different numbers of separated signals (i.e., theparameter K in NMF).

4. Overseparate the signal using K = 20 or more. For thoseseparated components that are classified as snare, add themtogether using sum. The listen to the sum signal. Is itcoherent, i.e., does it sound like a single separated drum?

Page 61: Music Information Retrieval in Polyphonic Mixtures

...and more!

If you have another idea that you would like to try out, pleaseask me!

Please collaborate with a partner.Together, brainstorm your own problems, if you want!

Page 62: Music Information Retrieval in Polyphonic Mixtures

Good luck!

Music Information Retrieval in PolyphonicMixtures

Steve Tjoa

MIR WorkshopCCRMA, Stanford University

iZotope, Inc.San Francisco, CA, USA

June 27, 2012