
Scalar shift strategy for accelerating forward iterations on $S^{m-1}$

Hao Shen and Knut Hüper

Canberra Research Laboratory, NICTA, and Department of Information Engineering, Research School of Information Sciences and Engineering, The Australian National University, Canberra, Australia.

A result on how to accelerate simple forward iteration algorithms on the unit sphere, in the framework of a scalar shift strategy, to achieve local quadratic convergence is developed in this note.

1 Introduction

In this note, we study a certain class of fixed point algorithms on the unit sphere, i.e. simple forward iteration algorithms. In the framework of a scalar shift strategy, we develop a result on how to accelerate this class of algorithms to achieve local quadratic convergence. Two examples, related to linear Independent Component Analysis (ICA) and to the real symmetric eigenvalue problem, are presented.

2 Main results

Define a smooth map on the unit sphere $S^{m-1} := \{x \in \mathbb{R}^m \mid \|x\| = 1\}$ as follows

$$\psi : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{B(x)}{\|B(x)\|}, \tag{1}$$

where $B : S^{m-1} \to \mathbb{R}^m \setminus \{0\}$ is smooth. Assume that (i) $x_* \in S^{m-1}$ is a fixed point of $\psi$, (ii) in a suitable neighborhood of $x_*$, the sequence produced by iterating $\psi$ converges to $x_*$, and (iii) the first derivative of $\psi$ does not vanish at $x_*$. Note that the latter implies that iterating $\psi$ does not converge locally quadratically fast to $x_*$. One possible strategy to increase the convergence rate of the algorithmic map $\psi$ is to introduce a scalar shift, i.e.,

$$\psi_s : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{B(x) - \rho(x)x}{\|B(x) - \rho(x)x\|}, \tag{2}$$

where $\rho : S^{m-1} \to \mathbb{R}$ should be a smooth function satisfying $\rho(x_*) \neq \|B(x_*)\|$. Obviously, $x_*$ cannot be a fixed point of $\psi_s$ if $\rho(x_*) > \|B(x_*)\|$, as in this case $\psi_s(x_*) = -x_*$. This difficulty can be resolved by introducing a proper sign correction term. Let us define a real valued function as follows

$$\tau : S^{m-1} \to \mathbb{R}, \quad \tau(x) := x^\top B(x) - \rho(x), \tag{3}$$

which, up to a positive factor, equals the cosine of the angle between two consecutive iterates generated by $\psi_s$. By the assumption that $\rho(x_*) \neq \|B(x_*)\|$, one easily gets $\tau(x_*) \neq 0$. We then construct the following map

$$\psi_g : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{\tau(x)\,(B(x) - \rho(x)x)}{\|\tau(x)\,(B(x) - \rho(x)x)\|}. \tag{4}$$

A small calculation shows that $x_*$ is a fixed point of $\psi_g$ as well. Let us denote the numerator of $\psi_g(x)$ by

$$B_g : S^{m-1} \to \mathbb{R}^m, \quad B_g(x) := \tau(x)\,(B(x) - \rho(x)x). \tag{5}$$

The algorithmic map $\psi_g$ is locally quadratically convergent to $x_*$ if and only if the first derivative of $\psi_g$ at $x_*$ is equal to zero. By the chain rule, we compute the directional derivative of $\psi_g$ in tangent direction $\xi \in T_{x_*}S^{m-1}$

$$\mathrm{D}\psi_g(x)\xi\big|_{x=x_*} = \frac{1}{\|B_g(x_*)\|}\,\Pi(x_*)\,\mathrm{D}B_g(x)\xi\big|_{x=x_*}, \quad \text{where } \Pi(x_*) := I - \frac{B_g(x_*)B_g(x_*)^\top}{\|B_g(x_*)\|^2}. \tag{6}$$

Here $T_xS^{m-1}$ represents the tangent space of $S^{m-1}$ at $x \in S^{m-1}$. As $x_*$ is a fixed point of $\psi_g$, the orthogonal projection operator $\Pi(x_*)$ projects onto the orthogonal complement of $\mathrm{span}(x_*)$. Hence the first derivative of $\psi_g$ vanishes at $x_*$ if and only if the first derivative of $B_g$ at $x_*$ is a scalar multiple of $x_*$. A direct computation shows

$$\mathrm{D}B_g(x)\xi\big|_{x=x_*} = \tau(x_*)\bigl(\mathrm{D}B(x)\xi\big|_{x=x_*} - \rho(x_*)\xi - \mathrm{D}\rho(x)\xi\big|_{x=x_*}\,x_*\bigr) + \mathrm{D}\tau(x)\xi\big|_{x=x_*}\,(B(x_*) - \rho(x_*)x_*). \tag{7}$$

Since the second summand of equation (7) is indeed a scalar multiple of $x_*$ and the expression $\mathrm{D}\rho(x)\xi\big|_{x=x_*}$ is a real number, we conclude that the algorithmic map $\psi_g$ is locally quadratically convergent to $x_*$ if and only if the following equality holds true

$$\mathrm{D}B(x)\xi\big|_{x=x_*} = \rho(x_*)\,\xi. \tag{8}$$

In other words, if the first derivative of $B(x)$ at $x_*$, i.e. the linear operator $\mathrm{D}B(x_*) : T_{x_*}S^{m-1} \to \mathbb{R}^m$, acts on a tangent vector $\xi \in T_{x_*}S^{m-1}$ simply by scalar multiplication, then there exists a smooth scalar shift strategy as in (4) to accelerate the algorithmic map $\psi$ to converge locally quadratically fast to $x_*$.
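As a concrete illustration (not part of the original note), the sign-corrected iteration (4) can be sketched numerically. The function name `psi_g_step` and the particular choice $B(x) = Mx$ with $M = \mathrm{diag}(3,1,1)$ and constant shift $\rho \equiv 1$ are illustrative assumptions; with this choice condition (8) holds at $x_* = e_1$.

```python
import numpy as np

def psi_g_step(x, B, rho):
    """One step of the sign-corrected shifted iteration (4).

    x   : current iterate on the unit sphere S^{m-1}
    B   : smooth map S^{m-1} -> R^m \\ {0}
    rho : smooth scalar shift with rho(x*) != ||B(x*)||
    """
    Bx = B(x)
    tau = x @ Bx - rho(x)            # sign-correction term, eq. (3)
    v = tau * (Bx - rho(x) * x)      # numerator B_g(x), eq. (5)
    return v / np.linalg.norm(v)

# Illustrative choice satisfying condition (8): B(x) = Mx with
# M = diag(3, 1, 1), fixed point x* = e1, and constant shift rho = 1
# (the common eigenvalue of the non-dominant block).
M = np.diag([3.0, 1.0, 1.0])
x = np.array([0.6, 0.8, 0.0])
x = psi_g_step(x, lambda x: M @ x, lambda x: 1.0)
# here M - rho*I has rank one, so a single step already lands on x* = e1
```

This degenerate example over-achieves (one step instead of quadratic convergence) precisely because the shifted numerator is a rank-one map; it is meant only to show the mechanics of (3)-(5).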

In what follows, we will present two examples, one from linear Independent Component Analysis (ICA), an important tool in Signal Processing, and the other from numerical linear algebra.

© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim


PAMM · Proc. Appl. Math. Mech. 7, 1062203–1062204 (2007) / DOI 10.1002/pamm.200700752


3 Linear Independent Component Analysis

The problem of one-unit whitened noiseless linear Independent Component Analysis (ICA) [1] can be modeled by the relation $y = x^\top w$, where $w \in \mathbb{R}^m$ is an $m$-dimensional random vector representing $m$ observed linear mixtures of $m$ unknown mutually statistically independent source signals. The task is to extract an estimation $y \in \mathbb{R}$ of one single source by computing $x \in S^{m-1}$. An algorithm for the one-unit linear ICA problem [2] can be derived by simply iterating

$$\psi_{\mathrm{ica}} : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{E[G'(x^\top w)w]}{\|E[G'(x^\top w)w]\|}, \tag{9}$$

where $E[\cdot]$ denotes the expectation over $w$ and $G'$ denotes the first derivative of a smooth nonlinear, even function $G : \mathbb{R} \to \mathbb{R}$, which is usually chosen according to the specific application. Let $x_* \in S^{m-1}$ be a correct separation point of one single source. It can be shown that $x_*$ is a fixed point of $\psi_{\mathrm{ica}}$ and that the first derivative of $\psi_{\mathrm{ica}}$ does not vanish at $x_*$. A direct computation shows

$$\mathrm{D}\bigl(E[G'(x^\top w)w]\bigr)\xi\big|_{x=x_*} = E[G''(x_*^\top w)]\,\xi, \tag{10}$$

i.e., the first derivative of the numerator of $\psi_{\mathrm{ica}}$ acts on a tangent vector $\xi \in T_{x_*}S^{m-1}$ by scalar multiplication, with the scalar being equal to $E[G''(x_*^\top w)]$. Thus, there exists a smooth scalar shift strategy $\rho_{\mathrm{ica}} : S^{m-1} \to \mathbb{R}$ satisfying the condition $\rho_{\mathrm{ica}}(x_*) = E[G''(x_*^\top w)]$ in the general form of $\psi_g$, see (4), to accelerate the map $\psi_{\mathrm{ica}}$ to converge locally quadratically fast to $x_*$. A simple choice of such a smooth scalar shift, given by

$$\rho_{\mathrm{ica}} : S^{m-1} \to \mathbb{R}, \quad \rho_{\mathrm{ica}}(x) := E[G''(x^\top w)], \tag{11}$$

leads to the following map for the one-unit linear ICA problem

$$\psi_g^{\mathrm{ica}} : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{\tau^{\mathrm{ica}}(x)\bigl(E[G'(x^\top w)w] - E[G''(x^\top w)]x\bigr)}{\bigl\|\tau^{\mathrm{ica}}(x)\bigl(E[G'(x^\top w)w] - E[G''(x^\top w)]x\bigr)\bigr\|}, \tag{12}$$

where $\tau^{\mathrm{ica}}(x) := E[G'(x^\top w)(x^\top w)] - E[G''(x^\top w)]$. It can be easily shown that $x_*$ is a fixed point of $\psi_g^{\mathrm{ica}}$ and that the algorithmic map $\psi_g^{\mathrm{ica}}$ is locally quadratically convergent to $x_*$. Moreover, by substituting $\rho_{\mathrm{ica}}$ into $\psi_s$, the resulting map

$$\psi_s^{\mathrm{ica}} : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{E[G'(x^\top w)w] - E[G''(x^\top w)]x}{\|E[G'(x^\top w)w] - E[G''(x^\top w)]x\|}, \tag{13}$$

is surprisingly identical to the so-called FastICA algorithm, a prominent algorithm for the one-unit linear ICA problem. It was originally developed as an approximate Newton method with a suitable approximation of the Hessian [1].
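To make the connection tangible, here is a small numerical sketch (not from the paper) of the shifted map (13). The choice $G(u) = u^4/4$, the use of sample averages in place of expectations, and the synthetic whitened uniform sources are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: two independent zero-mean, unit-variance uniform
# sources, taken as already-whitened mixtures (w = s), so the correct
# separation points are +-e1 and +-e2.
n = 100_000
w = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, n))

def psi_s_ica(x):
    """One step of the shifted map (13) with G(u) = u**4/4, so
    G'(u) = u**3 and G''(u) = 3*u**2; expectations are sample means."""
    u = x @ w
    v = (u**3 * w).mean(axis=1) - (3.0 * u**2).mean() * x
    return v / np.linalg.norm(v)

x = np.array([0.8, 0.6])
for _ in range(8):
    x = psi_s_ica(x)
# the direction of x aligns with a source axis; the sign may flip from
# step to step for sub-Gaussian sources, as discussed for FastICA below
```

The sign oscillation visible here is exactly the antipodal behavior of $\psi_s^{\mathrm{ica}}$ discussed next; the direction of the iterates nevertheless converges.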

It is well known that in certain situations FastICA might oscillate between neighborhoods of the two antipodes $\pm x_* \in S^{m-1}$, both of which extract the same single source up to a sign. Since both $\psi_g^{\mathrm{ica}}$ and $\psi_s^{\mathrm{ica}}$ induce the same smooth map on the real projective space $\mathbb{RP}^{m-1}$, the local convergence properties of FastICA, considered as a map on $\mathbb{RP}^{m-1}$ by projecting $\psi_s^{\mathrm{ica}}$ onto $\mathbb{RP}^{m-1}$, can be deduced directly from the local convergence results of $\psi_g^{\mathrm{ica}}$ [3].

4 Power Iterations

The power method is an important algorithm to compute the dominant eigenvector of a positive definite matrix $M = M^\top \in \mathbb{R}^{m \times m}$. For simplicity, we assume that the largest eigenvalue $\lambda_*$ of $M$ is simple, with a corresponding eigenvector $x_* \in S^{m-1}$. The power method iterates the following map

$$\psi_{\mathrm{eig}} : S^{m-1} \to S^{m-1}, \quad x \mapsto \frac{Mx}{\|Mx\|}. \tag{14}$$

Note that under the condition $x_0^\top x_* \neq 0$, the sequence $\{x_n\}$ generated by $\psi_{\mathrm{eig}}$ will converge to $x_*$ (up to sign). We now show that, in general, there exists no smooth scalar shift strategy to accelerate the power method $\psi_{\mathrm{eig}}$ to converge locally quadratically fast to $x_*$. A direct computation shows

$$\mathrm{D}(Mx)\xi\big|_{x=x_*} = M\xi, \quad \text{for all } \xi \in T_{x_*}S^{m-1}. \tag{15}$$

It is easily seen that, in general, $M\xi$ is not a scalar multiple of $\xi$, except in the following scenario. If $M = M^\top \in \mathbb{R}^{m \times m}$ with $m \geq 2$ has only two distinct eigenvalues $\{\lambda_*, \sigma\}$, with $\lambda_*$ being simple and dominant and $\sigma$ occurring with multiplicity $m-1$, then $\mathrm{D}(Mx)\xi\big|_{x=x_*} = \sigma\xi \in T_{x_*}S^{m-1}$. Thus we have just shown that, in general, with three or more distinct eigenvalues, there is no way to accelerate the power iteration (14) in the framework of a smooth scalar shift strategy as in (4) to converge locally quadratically fast.
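This dichotomy can be checked numerically. The following sketch (not from the paper) tests, for two illustrative diagonal matrices and one fixed tangent vector at $x_* = e_1$, whether the derivative (15) acts by scalar multiplication, as condition (8) requires; checking a single $\xi$ is of course only a necessary spot check.

```python
import numpy as np

# Negative result of Sec. 4, checked numerically: at x* = e1 the
# derivative D(Mx)xi = M xi (eq. (15)) acts by scalar multiplication
# on the tangent space only when the non-dominant eigenvalues coincide.
M2 = np.diag([5.0, 2.0, 2.0])   # two distinct eigenvalues: shift works
M3 = np.diag([5.0, 3.0, 2.0])   # three distinct eigenvalues: it cannot

xi = np.array([0.0, 0.6, 0.8])  # unit tangent vector at x* = e1

def is_scalar_action(M, xi):
    """True if M @ xi is a scalar multiple of xi, i.e. condition (8)
    can hold for this tangent vector (xi is assumed to be unit-norm)."""
    v = M @ xi
    return bool(np.allclose(v, (v @ xi) * xi))

# is_scalar_action(M2, xi) -> True,  is_scalar_action(M3, xi) -> False
```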

Acknowledgements NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

References

[1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis (Wiley, New York, 2001).
[2] P. Regalia and E. Kofidis, IEEE Transactions on Neural Networks 14(4), 943–949 (2003).
[3] H. Shen, M. Kleinsteuber, and K. Hüper, IEEE Transactions on Neural Networks 19(6), 1022–1032 (2008).


ICIAM07 Minisymposia – 06 Optimization 1062204