Fast algorithms for informed source separation

Augustin Lefèvre [email protected]

September 10th, 2013



Source separation in 5 minutes

◮ Recover source estimates from a mixed signal

◮ We consider the single-channel setting: $x_t = s^{(1)}_t + s^{(2)}_t$.

Ill-posed problem, need prior information.


Read mix waveform

Figure: waveform of the mix, amplitude x against time (seconds).


Short time Fourier transform

$$C_{fn} = \sum_{t=1}^{F} x_{t+(n-1)H}\, w_t \exp\left(-\frac{i\,2(f-1)\pi(t-1)}{F}\right)$$
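The transform above can be sketched in NumPy as follows. This is a minimal illustration, not the author's code: the function name `stft`, the choice of window, and the hop size `hop` (the slide's H) are assumptions, and indices are 0-based, so frame `n` starts at sample `n * hop`.

```python
import numpy as np

def stft(x, window, hop):
    """Short-time Fourier transform C[f, n] of a 1-D signal x.

    window: analysis window w of length F (hypothetical choice, e.g. Hann).
    hop:    hop size H between consecutive frames.
    """
    F = len(window)
    n_frames = 1 + (len(x) - F) // hop
    # Column n holds the windowed samples x[n*hop : n*hop + F] * w.
    frames = np.stack(
        [x[n * hop:n * hop + F] * window for n in range(n_frames)], axis=1
    )
    # np.fft.fft computes sum_t a_t * exp(-2i*pi*f*t/F) along axis 0,
    # which is the slide's formula written with 0-based f and t.
    return np.fft.fft(frames, axis=0)
```

Taking `np.abs(stft(x, window, hop))` discards the phase and keeps only the magnitudes.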


Remove phase information

Figure: spectrogram of the mix (horizontal axis: time in seconds; vertical axis: frequency in Hz, 0 to 7000).


Output of source separation program

Figure: two spectrogram panels (horizontal axis: time in seconds; vertical axis: frequency in Hz, 0 to 7000).


Time-frequency masking

Estimates of each source's complex STFT are obtained by:

$$S_{g,fn} = \frac{X_{g,fn}}{\sum_{l} X_{l,fn}}\, C_{fn}$$
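A minimal NumPy sketch of this masking step, under stated assumptions: `C` is the complex STFT of the mix, `X_list` holds one nonnegative spectrogram estimate per source, and the function name `apply_tf_masks` and the small constant `eps` (added to avoid division by zero) are hypothetical.

```python
import numpy as np

def apply_tf_masks(C, X_list, eps=1e-12):
    """Split the mix STFT C between sources in proportion to their spectrograms."""
    total = sum(X_list) + eps  # sum over sources l of X_l, entrywise
    # Each mask X_g / total is real and lies in [0, 1], so every estimate
    # keeps the phase of the mix and the estimates add back up to C.
    return [(X / total) * C for X in X_list]
```

Because the masks sum to one in every time-frequency bin, the source estimates sum to the mix STFT (up to `eps`).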


Estimate waveforms from STFT

Figure: two waveform panels, amplitude x against time (seconds).


Annotation informed source separation

[Lefevre et al., 2012, Bryan and Mysore, 2013]: interaction between user and source separation software.

[Lefevre et al., 2012]: detector trained on development database (random forest, SVM, nearest-neighbour, etc.).

Figure: Detections in the spectrogram


AISS nmf: non-convex

Annotation informed source separation.

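Information is used as additional constraints: $M_g \odot X_g = M_g \odot T_g$.

[Lefevre et al., 2012]: nonnegative matrix factorization (nmf) with constraints:

$$\min_{D,A} \; \Big\|Y - \sum_{g} D_g A_g\Big\|_F^2 \quad \text{s.t.} \quad D \in \mathbb{R}_+^{F \times K},\; A \in \mathbb{R}_+^{K \times N},\; M_g \odot (D_g A_g) = M_g \odot T_g$$

$Y \in \mathbb{R}_+^{F \times N}$ is the input spectrogram.

Need only $D_1 A_1 \geq 0$, but impose stronger constraint: $D_1 \geq 0$, $A_1 \geq 0$ (NMF).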

nmf is a hard problem.
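To illustrate why nmf is sensitive to its starting point, here is a sketch of plain NMF with the classical multiplicative updates for the Frobenius loss. It is an assumption-laden stand-in: it ignores the annotation constraints $M_g \odot (D_g A_g) = M_g \odot T_g$, it is not the algorithm of [Lefevre et al., 2012], and the name `nmf_multiplicative` and its parameters are hypothetical.

```python
import numpy as np

def nmf_multiplicative(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative Y (F x N) by D @ A with D, A >= 0."""
    rng = np.random.default_rng(seed)
    F, N = Y.shape
    # Random positive initialization: the result depends on it,
    # because the problem is non-convex in (D, A).
    D = rng.random((F, K)) + eps
    A = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep D and A nonnegative by construction.
        A *= (D.T @ Y) / (D.T @ D @ A + eps)
        D *= (Y @ A.T) / (D @ A @ A.T + eps)
    return D, A
```

Running it with different values of `seed` generally returns different factorizations, which is the practical face of non-convexity.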


AISS lownuc : convex

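Informed source separation: $X_1, \dots, X_G \in \mathbb{R}^{F \times N}$.

$$\min_{X} \; \frac{1}{2}\Big\|Y - \sum_{g=1}^{G} X_g\Big\|_F^2 + \lambda \sum_{g=1}^{G} \|X_g\|_* \quad \text{s.t.} \quad M_g \odot X_g = M_g \odot T_g,\; X_g \geq 0$$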

The rank of a matrix is revealed in its SVD: $X = P \Sigma Q^\top$.

$\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_F \geq 0$ are the singular values.

$\|X_g\|_* = \sum_{f=1}^{F} \sigma_f$.

Projecting on $X_g \geq 0$ is straightforward.

Instead of one nmf, we will make repeated calls to svd to compute $\|X_g\|_*$ and additional information.
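The two building blocks named on this slide can be sketched in a few lines of NumPy. The function names are hypothetical, and the projection assumes that the mask `M` is boolean and that the annotated targets `T` are nonnegative wherever `M` is true.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X, computed with one SVD call."""
    return np.linalg.svd(X, compute_uv=False).sum()

def project_constraints(X, M, T):
    """Euclidean projection onto {X >= 0, M * X = M * T}.

    The constraints act entry by entry, so the projection does too:
    annotated entries are set to T, the others are clipped at zero.
    """
    return np.where(M, T, np.maximum(X, 0.0))
```

For a rank-one matrix `np.outer(u, v)` the nuclear norm equals the product of the Euclidean norms of `u` and `v`, which gives a quick sanity check.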


Algorithms for informed source separation

Convex but nonsmooth problem.

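Related approaches if no noise and no inequality constraints (Recht et al., 2010):

$$\min \; \|X\|_* \quad \text{s.t.} \quad \mathcal{A}(X) = b$$

where $\mathcal{A} : \mathbb{R}^{F \times NG} \to \mathbb{R}^p$ is linear and $b \in \mathbb{R}^p$ ($p \ll m \times n$).

Link with SDP optimization:

$$\min \; t \quad \text{s.t.} \quad \mathcal{A}(X) = b, \quad \begin{pmatrix} tI & X \\ X^\top & tI \end{pmatrix} \succeq 0$$

Use interior-point solver, which has superlinear convergence rate.

BUT the Hessian has size $O(F^2 N^2)$, i.e. $10^{10}$ for a ten-second audio track. This is too large!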


Subgradient descent

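Objective function $f$ is convex so it admits derivatives in all directions:

$$f'(X; D) = \lim_{t \downarrow 0} \frac{f(X + tD) - f(X)}{t}$$

Subgradients generalize the gradient:

$$Z \in \partial f(X) \;\Leftrightarrow\; f'(X; D) \geq \langle Z, D \rangle \text{ for all } D$$

$$\langle Z, D \rangle = \sum_{g} \operatorname{Tr} Z_g^\top D_g$$

Projected subgradient descent: $X^{(t+1)} = \Pi(X^{(t)} - \mu_t Z^{(t)})$.

Warning: $f(X^{(t+1)}) \nleq f(X^{(t)})$, i.e. the objective need not decrease at every iteration.

Guarantee: $\mu_t = \mu_0 (1 + t)^{-\frac{1}{2}} \Rightarrow \|X^{(t)} - X^*\| \searrow 0$.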
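The iteration above can be sketched for the lownuc objective as follows. This is an illustration under assumptions, not the author's implementation: the function names, the initialization at `Y / G`, and the default step `mu0` are hypothetical, masks are boolean, and targets are nonnegative on the masks. Since the objective is not monotone along the iterates, the sketch keeps the best point seen so far.

```python
import numpy as np

def project(X, M, T):
    """Projection onto {X >= 0, M * X = M * T}; M is boolean, T >= 0 on M."""
    return np.where(M, T, np.maximum(X, 0.0))

def nuclear_subgradient(X, tol=1e-10):
    """One subgradient of the nuclear norm: P Q^T over nonzero singular values."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol))
    return P[:, :r] @ Qt[:r, :]

def objective(Y, Xs, lam):
    resid = Y - sum(Xs)
    return 0.5 * np.sum(resid ** 2) + lam * sum(np.linalg.norm(X, "nuc") for X in Xs)

def projected_subgradient(Y, Ms, Ts, lam, mu0=0.1, n_iter=200):
    G = len(Ms)
    Xs = [project(Y / G, M, T) for M, T in zip(Ms, Ts)]
    best, best_val = Xs, objective(Y, Xs, lam)
    for t in range(n_iter):
        resid = Y - sum(Xs)                  # shared by all sources
        mu_t = mu0 / np.sqrt(1.0 + t)        # step size mu0 * (1 + t)^(-1/2)
        # Subgradient for source g: -(Y - sum X) + lam * P Q^T, then project.
        Xs = [project(X - mu_t * (lam * nuclear_subgradient(X) - resid), M, T)
              for X, M, T in zip(Xs, Ms, Ts)]
        val = objective(Y, Xs, lam)
        if val < best_val:                   # the objective may go up, keep the best
            best, best_val = Xs, val
    return best, best_val
```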


Controlled experiments

Figure: (Left) Evolution of SDR as a function of CPU time (in seconds), for (green) our method (lownuc) and (red) NMF started from several initial points. Panels: (a) Overall, (b) First few seconds.

SDR is a measure of how well we have separated sources (the higher the better).
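The slides do not define SDR. As a rough illustration only, the sketch below computes a simplified signal-to-distortion ratio in decibels, treating all of the estimation error as distortion. The function name `simple_sdr` is hypothetical, and this is not necessarily the exact measure used in the experiments.

```python
import numpy as np

def simple_sdr(reference, estimate):
    """Simplified SDR in dB: energy of the true source over energy of the error."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    distortion = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(distortion ** 2))
```

With this definition, an error ten times smaller in amplitude than the source gives 20 dB.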


Shrinkage of singular values

Figure: Magnitude of singular values (log scale, 1e−4 to 1e+4) in decreasing order, for various values of λ (1e−4, 1e−2, 1, 1e+2, 1e+4). Dotted line is the true singular value profile.


Smoothing technique [Nesterov, 2003]

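$$\min_{X} \; \frac{1}{2}\Big\|Y - \sum_{g=1}^{G} X_g\Big\|_F^2 + \lambda \sum_{g=1}^{G} \|X_g\|_{*,\mu} \quad \text{s.t.} \quad M_g \odot X_g = M_g \odot T_g,\; X_g \geq 0$$

$\|\cdot\|_{*,\mu}$ is $C^{1,1}$ with Lipschitz constant $\frac{1}{\mu}$ and:

$$\|X\|_{*,\mu} \leq \|X\|_* \leq \|X\|_{*,\mu} + \mu C \quad \forall X \in \mathbb{R}^{F \times N}$$

$$\|X\|_* = \max\{\operatorname{Tr} U^\top X : \sigma_1(U) \leq 1\}$$

$$\|X\|_{*,\mu} = \max\{\operatorname{Tr} U^\top X - \tfrac{\mu}{2}\|U\|_F^2 : \sigma_1(U) \leq 1\}$$

Apply accelerated gradient descent to the smooth minimization problem.

$\mu = 0$: slow convergence but accurate solutions.

Large $\mu$: fast but inaccurate solutions.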
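The smoothed norm has a closed form in terms of the singular values, sketched below. The sketch assumes the quadratic term is $\frac{\mu}{2}\|U\|_F^2$, the usual choice in Nesterov smoothing and the one consistent with a Lipschitz constant of $\frac{1}{\mu}$. Under that assumption the maximizing $U$ shares the singular vectors of $X$ and has singular values $\min(\sigma_f / \mu, 1)$. The function name is hypothetical.

```python
import numpy as np

def smoothed_nuclear_norm(X, mu):
    """Value and gradient of max{Tr U^T X - (mu/2)||U||_F^2 : sigma_1(U) <= 1}."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    u = np.minimum(s / mu, 1.0)              # singular values of the maximizing U
    value = np.sum(u * s - 0.5 * mu * u ** 2)
    grad = (P * u) @ Qt                      # P diag(u) Q^T, i.e. the maximizing U
    return value, grad
```

Each singular value contributes at most $\mu / 2$ to the gap between the two norms, so under this assumption the constant in the bound can be taken as $C = \min(F, N) / 2$.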


Comparison with subgradient

Figure: Decrease of the objective function as a function of the allowed CPU time, for various algorithms. Panels: objective value against CPU time; SDR against CPU time. Curves: subG, smG 1e−1, smAG 1e−1.


Effect of µ

Figure: Decrease of the objective function as a function of the allowed CPU time, for various values of µ. Panels: objective value against CPU time; SDR against CPU time. Curves: mu 1e+00, mu 1e−01, mu 1e−02, mu 1e−03.

We display the original objective function:

$$\frac{1}{2}\Big\|Y - \sum_{g=1}^{G} X_g\Big\|_F^2 + \lambda \sum_{g=1}^{G} \|X_g\|_*.$$


Conclusion

Our formulation contributes to the field of informed source separation methods, where knowledge is directly relevant to the query audio track, and involves interaction with the user.

These methods are the state of the art in single-channel source separation benchmarks.

Our convex formulation compares well with its NMF counterpart, even with a subgradient algorithm.

The smoothing technique makes it possible to retrieve more accurate solutions for a given CPU budget.

More complex constraints? E.g., source estimates should classify correctly: $\langle W, X_g \rangle + b \leq 0$.


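Proximal operator:

$$\operatorname{prox}(\bar{X}) = \arg\min_{X} \; \frac{1}{2}\|X - \bar{X}\|_F^2 + \lambda \|X\|_* \quad \text{s.t.} \quad M \odot X = M \odot T$$

Necessary and sufficient conditions:

$$\begin{aligned}
0 &\in X - \bar{X} + \lambda (P Q^\top + W) + M \odot E \\
W^\top X &= 0 \\
W X^\top &= 0 \\
M \odot X &= M \odot T \\
\|W\|_{op} &\leq 1
\end{aligned}$$

where $E \in \mathbb{R}^{F \times N}$ are Lagrange multipliers associated with the constraint $M \odot X = M \odot T$. Note that here, $X = P \Sigma Q^\top$ is an economy-size SVD of $X$ and not $\bar{X}$, so $P$ and $Q$ depend on $X$.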
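In the special case where nothing is annotated (the mask is zero everywhere, so the constraint and the multiplier term vanish), the proximal operator has a well-known closed form: soft-threshold the singular values of the input by λ. The sketch below covers only that unconstrained case. It does not handle the masked constraint on this slide, and the function name `svt` is hypothetical.

```python
import numpy as np

def svt(X_in, lam):
    """Unconstrained prox of lam * nuclear norm: singular value thresholding."""
    P, s, Qt = np.linalg.svd(X_in, full_matrices=False)
    # Shrink every singular value by lam and clip at zero,
    # then rebuild the matrix with the same singular vectors.
    return (P * np.maximum(s - lam, 0.0)) @ Qt
```

Singular values below λ are set exactly to zero, which is how the nuclear norm penalty lowers the rank of the solution.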


N.J. Bryan and G.J. Mysore. Interactive Refinement of Supervised and Semi-supervised Sound Source Separation Estimates. In ICASSP, 2013.

A. Lefevre, F. Bach, and C. Fevotte. Semi-supervised NMF with time-frequency annotations for single-channel source separation. In International Conference on Music Information Retrieval (ISMIR), 2012.

Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer, 2003.