Fast algorithms for informed source separation · 4 6 8 lownuc nmf (a) Overall 0 2 4 6 8 0 2 4 6...
Transcript of Fast algorithms for informed source separation · 4 6 8 lownuc nmf (a) Overall 0 2 4 6 8 0 2 4 6...
Fast algorithms for informed source separation
Augustin Lefevre [email protected]
September, 10th 2013
Source separation in 5 minutes
◮ Recover source estimates from a mixed signal
◮ We consider the single-channel setting :
xt = s(1)t + s
(2)t .
Ill-posed problem, need prior information.
Read mix waveform
0 5 10 15 20 25
−0.3
−0.2
−0.1
0
0.1
0.2
0.3
time (seconds)
x
Short time Fourier transform
Short time Fourier transform
Cfn =
F∑
t=1
xt+(n−1)Hwt exp
(
−i2(f − 1)π(t − 1)
F
)
Remove phase information
sec
Hz
5 10 15 20 250
1000
2000
3000
4000
5000
6000
7000
Output of source separation program
sec
Hz
5 10 15 20 250
1000
2000
3000
4000
5000
6000
7000
sec
Hz
5 10 15 20 250
1000
2000
3000
4000
5000
6000
7000
Time-frequency masking
Estimates of each source’s complex STFT are obtained by :
Sg ,fn =Xg ,fn
∑
l Xl,fn
Cfn
Estimate waveforms from STFT
0 5 10 15 20 25−0.25
−0.2
−0.15
−0.1
−0.05
0
0.05
0.1
0.15
0.2
0.25
time (seconds)
x
0 5 10 15 20 25
−0.3
−0.2
−0.1
0
0.1
0.2
0.3
time (seconds)
x
Annotation informed source separation
[Lefevre et al., 2012, Bryan and Mysore, 2013]: interaction betweenuser and source separation software.
[Lefevre et al., 2012]: detector trained on development database(random forest, SVM, nearest-neighbour, etc.).
Figure: Detections in the spectrogram
AISS nmf: non-convex
Annotation informed source separation.
Information is used as additional constraints : Mg ⊙ Xg = Mg ⊙ Tg .
[Lefevre et al., 2012] : nonnegative matrix factorization (nmf) withconstraints :
minD,A ‖Y −∑
g DgAg‖2F
s.t. D ∈ RF×K+ ,A ∈ RK×N
+
Mg ⊙ (DgAg ) = Mg ⊙ Tg
Y ∈ RF×N+ is the input spectrogram.
Need only D1A1 ≥ 0, but impose stronger constraint : D1 ≥ 0,A1 ≥ 0 (NMF).
nmf is a hard problem.
AISS lownuc : convex
Informed souce separation : X1, . . . ,XG ∈ RF×N .
minX12‖Y −
∑G
g=1 Xg‖2F + λ
∑G
g=1 ‖Xg‖∗s.t. Mg ⊙ Xg = Mg ⊙ Tg
Xg ≥ 0
The rank of a matrix is revealed in its SVD : X = PΣQ⊤.
σ1 ≥ σ2 ≥ · · · ≥ σF ≥ 0 singular values.
‖Xg‖∗ =∑F
f=1 σf .
Projecting on Xg ≥ 0 is straightforward.
Instead of one nmf, we will make repeated calls to svd to compute‖Xg‖∗ and additional information.
Algorithms for informed source separation
Convex but nonsmooth problem.
Related approaches if no noise and no inequality constraints (Rechtet al., 2010) :
min ‖X‖∗s.t. A(X ) = b
where A : RF×NG 7→ Rp, b ∈ Rp (p ≪ m× n) is linear.
Link with SDP optimization :
min ts.t. A(X ) = b
(
tI XX tI
)
� 0
Use interior-point solver, which has superlinear convergence rate.
BUT Hessian has size O(F 2N2), i.e. 1010 for a ten seconds audiotrack. This is too large !
Subgradient descent
Objective function f is convex so it admits derivatives in alldirections :
f ′(X ;D) = limt↓0
f (X + tD)− f (X )
t
Subgradients generalize the gradient :
Z ∈ ∂f (X ) ↔ f ′(X ;D) ≥ 〈Z ,D〉
〈Z ,D〉 =∑
g Tr Z⊤g Dg
Projected subgradient descent : X (t+1) = Π(X (t) − µtZ(t)).
Warning : f (X (t+1)) � f (X (t)).
Guarantee :µt = µ0(1 + t)−12 ⇒ ‖X (t) − X ∗‖ ց 0.
Controlled experiments
0 50 100 150−2
0
2
4
6
8
lownucnmf
(a) Overall
0 2 4 6 80
2
4
6
lownucnmf
(b) First few seconds
Figure: (Left) Evolution of SDR as a function of CPU time (in seconds), for(green) our method and (red) NMF started from several initial points.
SDR is a measure of how well we have separated sources (the higherthe better).
Shrinkage of singular values
0 50 100 150 200 250 300
1e−4
1e−3
1e−2
1e−1
1
1e+1
1e+2
1e+3
1e+4
trueλ 1.0e−04λ 1.0e−02λ 1.0e+00λ 1.0e+02λ 1.0e+04
Figure: Magnitude of singular values in decreasing order, for various values ofλ. Dotted line is the true singular value profile.
Smoothing technique [Nesterov, 2003]
minX12‖Y −
∑Gg=1 Xg‖
2F + λ
∑Gg=1 ‖Xg‖∗,µ
s.t. Mg ⊙ Xg = Mg ⊙ Tg
Xg ≥ 0
‖ · ‖∗,µ is C(1,1 with Lipschitz constant 1µand :
‖X‖∗,µ ≤ ‖X‖∗ ≤ ‖X‖∗,µ + µC ∀X ∈ RF×N
‖X‖∗ = max{Tr U⊤X , σ1(U) ≤ 1}
‖X‖∗,µ = max{Tr U⊤X − ‖U‖2F , σ1(U) ≤ 1}
Apply accelerated gradient descent to the smooth minimizationproblem.
µ = 0 : slow convergence but accurate solutions.
Large µ : fast but inaccurate solutions.
Comparison with subgradient
0 50 100 150 200180
200
220
240
260
280
300Obj function
CPU time
ob
j. va
lue
subGsmG 1e−1smAG 1e−1
0 50 100 150 2002
2.5
3
3.5
4
4.5
5
5.5
6
CPU time
SD
R
subG
smG 1e−1
smAG 1e−1
Figure: Decrease of the objective function as a function of the allowed CPUtime, for various algorithms
Effect of µ
0 50 100 150180
190
200
210
220
230
240Obj function
CPU time
obj.
valu
e
mu 1e+00mu 1e−01mu 1e−02mu 1e−03
0 50 100 1502
3
4
5
6
CPU time
SD
R
mu 1e+00mu 1e−01mu 1e−02mu 1e−03
Figure: Decrease of the objective function as a function of the allowed CPUtime, for various values of µ.
We display the original objective function :
1
2‖Y −
G∑
g=1
Xg‖2F + λ
G∑
g=1
‖Xg‖∗ .
Conclusion
Our formulation contributes to the field of informed sourceseparation methods, where knowledge is directly relevant to thequery audio track, and involves interaction with the user.
These methods are the state of the art in single-channel sourceseparation benchmarks.
Our convex formulation compares well with its NMF counterpart,even with a subgradient algorithm.
The smoothing technique allows to retrieve more accurate solutionsfor a given CPU budget.
More complex constraints ? E.g., source estimates should classifycorrectly : 〈W ,Xg 〉+ b ≤ 0.
Proximal operator :
prox(X ) =arg minX
12‖X − X‖2F + λ‖X‖∗ ,
s.t. Mg ⊙ Xg = Mg ⊙ Tg ,
Necessary and sufficient conditions :
0 ∈ X − X + λ(PQ⊤ +W ) +M ⊙ E
W⊤X = 0
WX⊤ = 0
M ⊙ X = M ⊙ T
‖W ‖op ≤ 1
where E ∈ RF×N are Lagrangian multiplicators associated with theconstraint M ⊙ X = 0. Note that here, X = PΣQ⊤ is an economy-sizeSVD of X and not X , so P and Q depend on X .
N.J. Bryan and G.J. Mysore. Interactive Refinement of Supervised andSemi-supervised Sound Source Separation Estimates. In ICASSP, 2013.
A. Lefevre, F. Bach, and C. Fevotte. Semi-supervised NMF withtime-frequency annotations for single-channel source separation. InInternational Conference on Music Information Retrieval (ISMIR),2012.
Y. Nesterov. Introductory lectures on convex optimization: A basiccourse, volume 87. Springer, 2003.