Fast algorithms for informed source separation

Augustin Lefevre

September 10th, 2013

Source separation in 5 minutes

◮ Recover source estimates from a mixed signal

◮ We consider the single-channel setting:

$$x_t = s^{(1)}_t + s^{(2)}_t.$$

Ill-posed problem, need prior information.

Read mix waveform

[Figure: waveform of the mix x over time (seconds), from 0 to 25 s, with amplitude between −0.3 and 0.3.]

Short time Fourier transform

$$C_{fn} = \sum_{t=1}^{F} x_{t+(n-1)H}\, w_t \exp\left(-\frac{i 2\pi (f-1)(t-1)}{F}\right)$$
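The transform above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the slides: the function name `stft`, the Hann window and the hop size in the usage note are assumptions. Indices are 0-based, so `f` and `t` here play the role of f − 1 and t − 1 in the formula.

```python
import numpy as np

def stft(x, w, H):
    """Return C with C[f, n] = sum_t x[t + n*H] * w[t] * exp(-2j*pi*f*t/F).

    x: 1-D signal, w: analysis window of length F, H: hop size in samples.
    """
    F = len(w)
    n_frames = 1 + (len(x) - F) // H
    # Cut the signal into overlapping windowed frames, one frame per column.
    frames = np.stack([x[n * H : n * H + F] * w for n in range(n_frames)], axis=1)
    # A length-F DFT of each column gives the complex STFT, shape (F, n_frames).
    return np.fft.fft(frames, axis=0)
```

Calling `np.abs(stft(x, np.hanning(512), 256))` would then give a magnitude spectrogram, which is the "remove phase information" step of the next slide.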

Remove phase information

[Figure: magnitude spectrogram of the mix, with time (sec) from 0 to 25 on the horizontal axis and frequency (Hz) from 0 to 7000 on the vertical axis.]

Output of source separation program

[Figure: two spectrograms, one per estimated source, with the same axes: time (sec) from 0 to 25 and frequency (Hz) from 0 to 7000.]

Time-frequency masking

Estimates of each source's complex STFT are obtained by:

$$S_{g,fn} = \frac{X_{g,fn}}{\sum_l X_{l,fn}}\, C_{fn}$$
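The masking rule above amounts to an elementwise ratio followed by a multiplication with the mix STFT. A minimal NumPy sketch follows; the function name `apply_masks`, the array layout `(G, F, N)` and the small constant `eps` that guards against empty time-frequency bins are assumptions made for illustration.

```python
import numpy as np

def apply_masks(X, C, eps=1e-12):
    """X: nonnegative spectrogram estimates, shape (G, F, N).
    C: complex STFT of the mix, shape (F, N).
    Returns the complex STFT estimate of each source, shape (G, F, N).
    """
    total = X.sum(axis=0)                 # sum over sources l of X[l, f, n]
    masks = X / np.maximum(total, eps)    # each mask lies in [0, 1]
    return masks * C                      # C is broadcast over the source axis
```

Wherever the summed estimates are positive, the masks add up to one, so the source STFTs add up to the STFT of the mix.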

Estimate waveforms from STFT

[Figure: waveforms x of the two estimated sources over time (seconds), from 0 to 25 s.]

Annotation informed source separation

[Lefevre et al., 2012, Bryan and Mysore, 2013]: interaction between user and source separation software.

[Lefevre et al., 2012]: detector trained on a development database (random forest, SVM, nearest-neighbour, etc.).

Figure: Detections in the spectrogram

AISS nmf: non-convex

Annotation informed source separation.

Information is used as additional constraints: $M_g \odot X_g = M_g \odot T_g$.

[Lefevre et al., 2012]: nonnegative matrix factorization (nmf) with constraints:

$$\min_{D,A} \Big\| Y - \sum_g D_g A_g \Big\|_F^2 \quad \text{s.t.} \quad D \in \mathbb{R}_+^{F \times K},\ A \in \mathbb{R}_+^{K \times N},\ M_g \odot (D_g A_g) = M_g \odot T_g$$

$Y \in \mathbb{R}_+^{F \times N}$ is the input spectrogram.

Need only $D_1 A_1 \geq 0$, but impose a stronger constraint: $D_1 \geq 0$, $A_1 \geq 0$ (NMF).

nmf is a hard problem.
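For readers who have not seen NMF before, here is a sketch of the classical multiplicative updates for the unconstrained Frobenius objective, where the sum of the products D_g A_g is written as a single product D A. It is not the constrained algorithm of [Lefevre et al., 2012]: the annotation constraints are left out, and the function name `nmf` and its parameters are assumptions. Because the problem is non-convex, the result depends on the random initial point, which is why NMF is typically restarted from several initial points.

```python
import numpy as np

def nmf(Y, K, n_iter=200, seed=0, eps=1e-12):
    """Plain NMF of a nonnegative matrix Y (F x N) into D (F x K) and A (K x N)."""
    rng = np.random.default_rng(seed)
    F, N = Y.shape
    D = rng.random((F, K)) + eps
    A = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep D and A nonnegative by construction.
        A *= (D.T @ Y) / (D.T @ D @ A + eps)
        D *= (Y @ A.T) / (D @ A @ A.T + eps)
    return D, A
```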

AISS lownuc : convex

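Informed source separation: $X_1, \dots, X_G \in \mathbb{R}^{F \times N}$.

$$\min_X \frac{1}{2} \Big\| Y - \sum_{g=1}^G X_g \Big\|_F^2 + \lambda \sum_{g=1}^G \|X_g\|_* \quad \text{s.t.} \quad M_g \odot X_g = M_g \odot T_g,\ X_g \geq 0$$

The rank of a matrix is revealed in its SVD: $X = P \Sigma Q^\top$.

$\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_F \geq 0$ are the singular values.

$\|X_g\|_* = \sum_{f=1}^F \sigma_f$.

Projecting on $X_g \geq 0$ is straightforward.

Instead of one nmf, we will make repeated calls to svd to compute $\|X_g\|_*$ and additional information.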
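The two building blocks named on this slide, the nuclear norm via an SVD and the projection on the constraints, can be sketched as follows. The function names are assumptions, and the projection assumes that each mask is binary and that the annotated targets are nonnegative, so that the equality and nonnegativity constraints decouple entry by entry.

```python
import numpy as np

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

def project(X, M, T):
    """Euclidean projection onto {X >= 0, M * X = M * T}.

    Assumes M is binary (1 = annotated entry) and T >= 0 on annotated entries:
    annotated entries are set to T, the others are clipped at zero.
    """
    return np.where(M.astype(bool), T, np.maximum(X, 0.0))
```

Both functions work on a single matrix; `project` also works unchanged on a stack of shape `(G, F, N)` since it is purely elementwise.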

Algorithms for informed source separation

Convex but nonsmooth problem.

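Related approaches if no noise and no inequality constraints (Recht et al., 2010):

$$\min \|X\|_* \quad \text{s.t.} \quad \mathcal{A}(X) = b$$

where $\mathcal{A} : \mathbb{R}^{F \times NG} \to \mathbb{R}^p$ is linear and $b \in \mathbb{R}^p$ ($p \ll F \times NG$).

Link with SDP optimization:

$$\min t \quad \text{s.t.} \quad \mathcal{A}(X) = b, \quad \begin{pmatrix} tI & X \\ X^\top & tI \end{pmatrix} \succeq 0$$

Use interior-point solver, which has superlinear convergence rate.

But the Hessian has size $O(F^2 N^2)$, i.e. $10^{10}$ for a ten-second audio track. This is too large!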
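A back-of-the-envelope count shows where this order of magnitude comes from. The values of F and N below are illustrative assumptions, not figures from the slides; only the product FN ≈ 10^5 matters.

```latex
F \approx 250,\quad N \approx 400
\;\Rightarrow\; FN \approx 10^{5}
\;\Rightarrow\; (FN)^2 \approx 10^{10}\ \text{Hessian entries}
\;\approx\; 8 \times 10^{10}\ \text{bytes} = 80\ \text{GB in double precision.}
```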

Subgradient descent

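Objective function f is convex so it admits derivatives in all directions:

$$f'(X; D) = \lim_{t \downarrow 0} \frac{f(X + tD) - f(X)}{t}$$

Subgradients generalize the gradient:

$$Z \in \partial f(X) \iff f'(X; D) \geq \langle Z, D \rangle \ \text{for all } D, \qquad \langle Z, D \rangle = \sum_g \operatorname{Tr} Z_g^\top D_g$$

Projected subgradient descent: $X^{(t+1)} = \Pi(X^{(t)} - \mu_t Z^{(t)})$.

Warning: $f(X^{(t+1)}) \nleq f(X^{(t)})$ in general, i.e. the objective need not decrease at each iteration.

Guarantee: $\mu_t = \mu_0 (1 + t)^{-1/2} \Rightarrow \|X^{(t)} - X^*\| \searrow 0$.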
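The iteration above can be sketched for the low-rank objective as follows. This is an illustrative implementation under stated assumptions, not the author's code: the masks are taken to be binary, the targets nonnegative, all function names are hypothetical, and P Q^T from an economy-size SVD is used as one valid subgradient of the nuclear norm. Since the objective is not guaranteed to decrease, the sketch keeps track of the best iterate seen so far.

```python
import numpy as np

def objective(Y, X, lam):
    """0.5 * ||Y - sum_g X_g||_F^2 + lam * sum_g ||X_g||_*, with X of shape (G, F, N)."""
    residual = Y - X.sum(axis=0)
    nuc = sum(np.linalg.svd(Xg, compute_uv=False).sum() for Xg in X)
    return 0.5 * np.sum(residual ** 2) + lam * nuc

def project(X, M, T):
    """Projection onto {X >= 0, M * X = M * T}, assuming binary M and T >= 0."""
    return np.where(M.astype(bool), T, np.maximum(X, 0.0))

def subgradient(Y, X, lam):
    """One element Z of the subdifferential of the objective at X."""
    residual = X.sum(axis=0) - Y          # gradient of the data term for every g
    Z = np.empty_like(X)
    for g, Xg in enumerate(X):
        P, _, Qt = np.linalg.svd(Xg, full_matrices=False)
        Z[g] = residual + lam * (P @ Qt)  # P Q^T is a subgradient of ||.||_*
    return Z

def projected_subgradient(Y, M, T, lam, mu0=1.0, n_iter=100):
    X = project(np.zeros_like(T), M, T)   # feasible starting point
    best, best_val = X, objective(Y, X, lam)
    for t in range(n_iter):
        step = mu0 / np.sqrt(1.0 + t)     # mu_t = mu_0 (1 + t)^(-1/2)
        X = project(X - step * subgradient(Y, X, lam), M, T)
        val = objective(Y, X, lam)
        if val < best_val:                # not a descent method: keep the best
            best, best_val = X, val
    return best
```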

Controlled experiments

Figure: Evolution of SDR as a function of CPU time (in seconds), for our method (lownuc, green) and NMF (nmf, red) started from several initial points. (a) Overall. (b) First few seconds.

SDR is a measure of how well we have separated sources (the higher the better).
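The slides do not spell out how SDR is computed. As a rough intuition only, a simplified signal-to-distortion ratio compares the energy of the true source with the energy of the estimation error, in decibels. The function below is a hypothetical helper written for illustration; evaluation toolkits such as BSS Eval use a finer decomposition of the error, so their values need not match this one.

```python
import numpy as np

def simple_sdr(reference, estimate):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))
```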

Shrinkage of singular values

Figure: Magnitude of singular values in decreasing order, for various values of λ (1e−4, 1e−2, 1, 1e+2, 1e+4). Dotted line is the true singular value profile.

Smoothing technique [Nesterov, 2003]

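$$\min_X \frac{1}{2} \Big\| Y - \sum_{g=1}^G X_g \Big\|_F^2 + \lambda \sum_{g=1}^G \|X_g\|_{*,\mu} \quad \text{s.t.} \quad M_g \odot X_g = M_g \odot T_g,\ X_g \geq 0$$

$\|\cdot\|_{*,\mu}$ is $C^{1,1}$ (its gradient is Lipschitz with constant $1/\mu$) and:

$$\|X\|_{*,\mu} \leq \|X\|_* \leq \|X\|_{*,\mu} + \mu C \quad \forall X \in \mathbb{R}^{F \times N}$$

$$\|X\|_* = \max\{\operatorname{Tr} U^\top X : \sigma_1(U) \leq 1\}$$

$$\|X\|_{*,\mu} = \max\{\operatorname{Tr} U^\top X - \tfrac{\mu}{2}\|U\|_F^2 : \sigma_1(U) \leq 1\}$$

Apply accelerated gradient descent to the smooth minimization problem.

Small µ (µ → 0): slow convergence but accurate solutions.

Large µ: fast but inaccurate solutions.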
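The smoothed norm and its gradient have a closed form in terms of the SVD, which is what makes gradient methods practical here. The sketch below assumes the smoothing term is (µ/2)‖U‖_F^2, the choice consistent with a Lipschitz constant of 1/µ; under that assumption the maximizer U shares the singular vectors of X, with singular values min(σ/µ, 1), and the constant in the bound can be taken as C = min(F, N)/2. The function name is hypothetical.

```python
import numpy as np

def smoothed_nuclear_norm(X, mu):
    """Value and gradient of max{ Tr(U^T X) - (mu/2) ||U||_F^2 : sigma_1(U) <= 1 }."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    u = np.minimum(s / mu, 1.0)              # singular values of the maximizer U
    value = np.sum(s * u - 0.5 * mu * u ** 2)
    grad = (P * u) @ Qt                      # gradient = maximizer U = P diag(u) Q^T
    return value, grad
```

Singular values above µ contribute σ − µ/2 to the value and those below contribute σ²/(2µ), so a small µ stays close to the nuclear norm while a large µ flattens it, matching the trade-off stated above.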

Comparison with subgradient

Figure: Decrease of the objective function as a function of the allowed CPU time, for various algorithms (subG, smG 1e−1, smAG 1e−1). Two panels: objective value and SDR, both against CPU time.

Effect of µ

Figure: Decrease of the objective function as a function of the allowed CPU time, for various values of µ (1e+00, 1e−01, 1e−02, 1e−03). Two panels: objective value and SDR, both against CPU time.

We display the original objective function:

$$\frac{1}{2} \Big\| Y - \sum_{g=1}^G X_g \Big\|_F^2 + \lambda \sum_{g=1}^G \|X_g\|_*.$$

Conclusion

Our formulation contributes to the field of informed source separation methods, where knowledge is directly relevant to the query audio track, and involves interaction with the user.

These methods are the state of the art in single-channel source separation benchmarks.

Our convex formulation compares well with its NMF counterpart, even with a subgradient algorithm.

The smoothing technique makes it possible to retrieve more accurate solutions for a given CPU budget.

More complex constraints? E.g., source estimates should classify correctly: $\langle W, X_g \rangle + b \leq 0$.

Proximal operator :

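$$\operatorname{prox}(\bar{X}) = \arg\min_X \frac{1}{2} \|X - \bar{X}\|_F^2 + \lambda \|X\|_* \quad \text{s.t.} \quad M \odot X = M \odot T$$

Necessary and sufficient conditions:

$$0 \in X - \bar{X} + \lambda (P Q^\top + W) + M \odot E$$

$$W^\top X = 0, \quad W X^\top = 0, \quad M \odot X = M \odot T, \quad \|W\|_{op} \leq 1$$

where $E \in \mathbb{R}^{F \times N}$ are Lagrange multipliers associated with the constraint $M \odot X = M \odot T$. Note that here, $X = P \Sigma Q^\top$ is an economy-size SVD of $X$ and not $\bar{X}$, so $P$ and $Q$ depend on $X$.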
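For comparison, when the annotation constraint is dropped the proximal operator of the nuclear norm has a well-known closed form: soft-thresholding of the singular values of the input. The sketch below covers only that unconstrained case, with a hypothetical function name. It does not solve the constrained problem above, where the multipliers E couple the entries and P, Q depend on the unknown solution.

```python
import numpy as np

def prox_nuclear(X_bar, lam):
    """argmin_X 0.5 * ||X - X_bar||_F^2 + lam * ||X||_*  (no equality constraint)."""
    P, s, Qt = np.linalg.svd(X_bar, full_matrices=False)
    shrunk = np.maximum(s - lam, 0.0)    # soft-threshold each singular value
    return (P * shrunk) @ Qt
```

Singular values below λ are set exactly to zero, which is how the penalty lowers the rank of the estimate.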

N.J. Bryan and G.J. Mysore. Interactive Refinement of Supervised and Semi-supervised Sound Source Separation Estimates. In ICASSP, 2013.

A. Lefevre, F. Bach, and C. Fevotte. Semi-supervised NMF with time-frequency annotations for single-channel source separation. In International Conference on Music Information Retrieval (ISMIR), 2012.

Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer, 2003.
