INdAM intensive period "Computational Methods for Inverse Problems in Imaging"
Conclusive Workshop, Como, Italy, August 16-18, 2018
(Numerical linear algebra, regularization in Banach spaces and) variable exponent Lebesgue spaces for adaptive regularization
Claudio Estatico, Dipartimento di Matematica, Università di Genova, Italy
Related joint works:
• Image restoration, acceleration and preconditioning: Paola Brianzi, Fabio Di Benedetto; Pietro Dell'Acqua; Marco Donatelli (DIMA, Univ. Genova; DISAT, Univ. dell'Aquila; Univ. Insubria, Como).
• Microwave inverse scattering: Alessandro Fedeli, Matteo Pastorino and Andrea Randazzo (Dip. di Ingegneria DITEN, Univ. Genova).
• Remote sensing: Flavia Lenti, Serge Gratton, David Titley-Peloquin; Maurizio Migliaccio, Ferdinando Nunziata; Federico Benvenuto, Alberto Sorrentino (CERFACS and ENSEEIHT, Université de Toulouse; Dep. of Bioresource Engineering, McGill University, Canada; Dip. per le Tecnologie, Univ. Napoli Parthenope; Dip. di Matematica (DIMA), Univ. Genova).
• Subsurface prospecting: Gian Piero Deidda; Patricia Diaz De Alba, Caterina Fenu, Giuseppe Rodriguez (Dip. di Ing. Civile, Dip. di Matematica e Informatica, Univ. Cagliari).
Introduction:
linear equations and inverse problems
in Hilbert vs Banach space settings
Inverse Problem
From the knowledge of some "observed" data y (i.e., the effect),
find an approximation of some model parameters x (i.e., the cause).
Given the (noisy) data y ∈ Y, find (an approximation of) the unknown x ∈ X such that

Ax = y

where A : X → Y is a known linear operator, and X, Y are two functional (here Hilbert or Banach) spaces.
[Figure: true image; blurred (noisy) image; restored image.]
Inverse problems are usually ill-posed; they require regularization techniques.
Numerical linear algebra lives in Hilbert spaces
A Hilbert space is a complete vector space endowed with a scalar product that allows (lengths and) angles to be defined and computed.
In any Hilbert space, what happens in our Euclidean world (that is, in our part of the universe ...) always holds:
• Pythagorean theorem: if u ⊥ v, then ‖u + v‖² = ‖u‖² + ‖v‖²
• Parallelogram identity: ‖u + v‖² + ‖u − v‖² = 2(‖u‖² + ‖v‖²)
• Gradient of (the square of) the norm: ∇( (1/2)‖u‖² ) = u
The scalar product also allows for a natural definition of:
• orthogonal projection (best approximation) of u onto v: P(u) = 〈u, v〉 v / ‖v‖²
• SVD (or eigenvalue/eigenvector) decomposition of linear operators (for separable Hilbert spaces, such as any finite-dimensional Hilbert space ...)
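The Hilbert-space identities above are easy to verify numerically; a minimal sketch in NumPy (the vectors and dimensions are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)

# Parallelogram identity: ||u+v||^2 + ||u-v||^2 = 2(||u||^2 + ||v||^2)
lhs = np.linalg.norm(u + v)**2 + np.linalg.norm(u - v)**2
rhs = 2 * (np.linalg.norm(u)**2 + np.linalg.norm(v)**2)
assert np.isclose(lhs, rhs)

# Orthogonal projection of u onto v: P(u) = <u, v> v / ||v||^2
P = (u @ v) * v / (v @ v)
# the residual u - P(u) is orthogonal to v (Pythagorean splitting)
assert np.isclose((u - P) @ v, 0.0)
```

Both assertions rely only on the scalar product, which is exactly what is missing in a general Banach space.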
Regularization: the “classical” framework in Hilbert spaces
In general, all the regularization methods for ill-posed functional equations have been deeply investigated in the context of Hilbert spaces.
Benefits: Any linear (or linearized) operator in Hilbert spaces can be decomposed into a set of eigenfunctions by using conventional spectral theory. This way, convergence and regularization properties of any solving method can be analyzed by considering the behavior of each single eigenfunction (i.e., we can use the SVD ...).
Drawback: Regularization methods in Hilbert spaces usually give rise to smooth (and over-smooth) solutions. In image deblurring, regularization methods in Hilbert spaces do not allow a good localization of the edges.
Regularization: the “recent” framework in Banach spaces
More recently, some regularization methods have been introduced and investigated in Banach spaces. In any Banach space, only distances between its elements can be defined and measured; no scalar product (thus no "angle") is defined.
Benefits: Due to the geometrical properties of Banach spaces, these regularization methods allow us to obtain solutions with lower over-smoothness, which results, for instance, in a better localization and restoration of the discontinuities in imaging applications. Another useful property of regularization in Banach spaces is that solutions are sparser, that is, in general they can be represented by very few components.
Drawback: The mathematics is much more involved (spectral theory cannot be used ...). Convex analysis is required.
Regularization: Hilbert VS Banach spaces
            Hilbert spaces                  Banach spaces
Spaces      L^2(R^n), H^s = W^{s,2}         L^p(R^n), W^{s,p}, p ≥ 1; BV
Benefits    Easier computation              Better restoration of the
            (spectral theory,               discontinuities;
            eigencomponents)                sparse solutions
Drawbacks   Over-smoothness                 Theoretically tricky
            (bad localization of edges)     (convex analysis required)
Recap:
In Banach spaces we have no scalar product (so, no orthogonal projection), no Pythagorean theorem, no SVD ...
The (sub-)gradient of (the square of) the norm is not linear, so that the least-squares solution (of a linear problem) is no longer a linear problem ...
Examples: L1 for sparse recovery or Lp, 1 < p < 2, for edge restoration.
[Figure: the functionals (1/2)‖x‖_2^2 in L^2(R^2), (1/1.2)‖x‖_{1.2}^{1.2} in L^{1.2}(R^2), and (1/5)‖x‖_5^5 in L^5(R^2), together with the corresponding gradient fields ∇((1/2)‖x‖_2^2), ∇((1/1.2)‖x‖_{1.2}^{1.2}), ∇((1/5)‖x‖_5^5) on [−5, 5]².]
Regularization in Hilbert and Banach spaces
I Iterative minimization algorithms
Gradient-like methods: Landweber, CG
II Iterative projection algorithms
SIRT: Cimmino, DROP, ...
III Preconditioning
Trigonometric matrix algebra preconditioners
IV Multi-parameter and Adaptive regularization
The Variable exponent Lebesgue spaces
Regularization in Hilbert and Banach spaces
I
Iterative minimization algorithms
Iterative minimization algorithms
In the variational approach, instead of solving directly the operator equation Ax = y, we minimize the residual functional H : X → [0, +∞)

H(x) = (1/r) ‖Ax − y‖_Y^r .

Basic iterative scheme:

x_{k+1} = x_k − λ_k Φ_A(x_k, y)

where the operator Φ_A : X × Y → X returns a value

Φ_A(x_k, y) ≈ ∇H(x_k) = ∇( (1/r) ‖Ax_k − y‖_Y^r ) ,

that is, (an approximation of) the "gradient" of the functional (1/r)‖Ax − y‖_Y^r at the point x_k, and λ_k > 0 is the step length. This way, the iterative schemes are all different generalizations of the basic gradient descent method.
The Landweber algorithm in Hilbert spaces
In the conventional case (i.e., both X and Y are Hilbert spaces) we consider the least-squares functional

H_2(x) = (1/2) ‖Ax − y‖_Y^2 .

Direct computation of ∇H_2 by the chain rule for derivatives (in R^n)

∇H_2(x) = ( ( (∇(1/2)‖u‖²)|_{u=Ax−y} )^* J(Ax − y) )^* = ((Ax − y)^* A)^* = A^*(Ax − y)
leads to the “simplest” iterative method: the Landweber algorithm
xk+1 = xk − λA∗(Axk − y)
Since H_2 is convex, there exists a non-empty set of stationary points, i.e., ∇H_2(x) = 0, which are all minimum points of H_2.
A constant step size λ ∈ (0, 2/‖A^*A‖) yields H_2(x_{k+1}) < H_2(x_k) and also guarantees the convergence of the iterations.
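A minimal NumPy sketch of the Landweber iteration above (the test matrix and step-size choice are illustrative assumptions; any λ in (0, 2/‖A^*A‖) works):

```python
import numpy as np

def landweber(A, y, iters=500, x0=None):
    """Landweber iteration x_{k+1} = x_k - lam * A^T (A x_k - y),
    with a constant step lam inside (0, 2/||A^T A||)."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    lam = 1.0 / np.linalg.norm(A, 2)**2     # 1/||A||_2^2 = 1/||A^T A|| is safely inside (0, 2/||A^T A||)
    for _ in range(iters):
        x = x - lam * A.T @ (A @ x - y)
    return x

# tiny consistent example: the iterates approach the least-squares solution
A = np.array([[2.0, 0.5], [0.5, 1.0], [0.0, 0.3]])
x_true = np.array([1.0, -2.0])
y = A @ x_true
x = landweber(A, y, iters=2000)
```

On noisy data one would stop the loop early (discrepancy principle) instead of iterating to convergence, which is exactly the regularizing role of the iteration index.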
Towards the Landweber algorithm in Banach spaces (I)
xk+1 = xk − λA∗(Axk − y)
Formally, A^* is the dual operator of A, that is, the operator

A^* : Y^* → X^* such that

y^*(Ax) = (A^* y^*)(x) , ∀x ∈ X and ∀y^* ∈ Y^* ,

where X^* and Y^* are the dual spaces of X and Y.
If X and Y are Hilbert spaces, then X is isometrically isomorphic to X^* and Y is isometrically isomorphic to Y^* (by virtue of the Riesz theorem), so that the operator A^* : Y^* → X^* can be identified with A^* : Y → X.
However, in general, Banach spaces are not isometrically isomorphic to their duals. This way, the Landweber iteration above is well defined in Hilbert spaces (... only!)
The key point: to generalize from Hilbert to Banach spaces we have to consider the so-called duality maps.
Towards the Landweber algorithm in Banach spaces (II)
The computation of the (sub-)gradient of the residual functional requires the (sub-)gradient of the norm of the Banach space involved. Here the key tool is the so-called "duality map", which maps a Banach space B into its dual B^*.

Theorem (Asplund, Acta Math., 1968). Let B be a Banach space and let r > 1. The duality map J_r^B is the (sub-)gradient of the convex functional f : B → R defined as f(u) = (1/r)‖u‖_B^r (with abuse of notation ...):

J_r^B = ∇f = ∇( (1/r) ‖ · ‖_B^r ) .

Again by the chain rule, the (sub-)gradient of the residual (1/r)‖Ax − y‖_Y^r, by means of the duality map J_r^Y, can be computed as follows:

∇( (1/r) ‖Ax − y‖_Y^r ) = A^* J_r^Y (Ax − y) .
The Landweber method in Banach spaces
Let r > 1 be a fixed weight value. Let x_0 ∈ X be an initial guess (the null vector x_0 = 0 ∈ X is often used in the applications), and set x*_0 = J_r^X(x_0) ∈ X^*.
For k = 0, 1, 2, ...

x*_{k+1} = x*_k − λ_k A^* J_r^Y (Ax_k − y) ,
x_{k+1} = J_{r^*}^{X^*} (x*_{k+1}) ,

where r^* is the Hölder conjugate of r, that is, 1/r + 1/r^* = 1, and the step sizes λ_k are suitably chosen.
Here the duality map J_r^X : X → 2^{X^*} acts on the iterates x_k ∈ X, and the duality map J_{r^*}^{X^*} : X^* → 2^{X^{**}} acts on the iterates x*_k ∈ X^*. In order to be well defined, it is only required that the space X is reflexive, that is, X^{**} is isometrically isomorphic to X, so that J_{r^*}^{X^*}(X^*) ⊆ X.
Landweber iterative method in Hilbert spaces
A : X → Y    A^* : Y → X    H_2(x) = (1/2)‖Ax − y‖_Y^2

x_{k+1} = x_k − λ A^*(Ax_k − y)

Landweber iterative method in Banach spaces
A : X → Y    A^* : Y^* → X^*    H_r(x) = (1/r)‖Ax − y‖_Y^r

x_{k+1} = J_{r^*}^{X^*} ( J_r^X x_k − λ_k A^* J_r^Y (Ax_k − y) )
Some remarks:
(i) Any duality map is in general nonlinear (and multi-valued ...), so that, unlike in the Hilbert space case, the Landweber method is not linear.
(ii) In the Banach space L^p, p ∈ (1, +∞), by direct computation we have

J_r^{L^p}(x) = ‖x‖_p^{r−p} |x|^{p−1} sgn(x)

J_r^{L^p} is a nonlinear, single-valued, diagonal operator, which costs O(n). J_r^{L^p} does not increase the global numerical complexity O(n log n) of image deblurring with (FFT-based) structured matrices. Note that J_2^{L^2} = I.
A convergence result for the Landweber method in Banach spaces
Proposition [Schöpfer, Louis, Schuster, Inv. Prob., 2006]. Let X be a reflexive Banach space, and Y an (arbitrary) Banach space. Let y ∈ R(A) and let x† be the minimum-norm pseudo-solution of Ax = y.
If λ_k > 0 is suitably (...) chosen for all k, then the sequence of iterates (x_k) converges strongly to x†, that is,

lim_{k→+∞} ‖x_k − x†‖_X = 0
If the data y is noisy, an early stop of the iterations (by the discrepancy principle) gives rise to an iterative regularization method.
In the Hilbert setting, the iterations are defined in the (primal) space X.
In the Banach setting, the iterations are defined in the dual space X^* and are linked to the ("wide" ...) Banach fixed-point theory.
Duality maps are the basic tool for generalizing classical iterative methods for linear systems to Banach spaces: Landweber method, CG, Mann iterations, Gauss-Newton gradient-type iterations (for nonlinear problems).
Basic hypotheses for the convergence (and regularization behavior): uniformly smooth and (uniformly) convex Banach spaces. See also: Scherzer, Kaltenbacher, Fornasier, Hofmann, Kazimierski, Pöschl, Hein, Q. Jin, Tautenhahn, Neubauer, Tomba, ...
To reduce over-smoothness, these methods have been implemented in the context of L^p Lebesgue spaces with 1 < p ≤ 2.
p close to 1: low "regularization"; good recovery of edges and discontinuities; improved sparsity.
p ≈ 2: high "regularization"; higher stability; over-smoothness.
Numerical evidence in the L^p Lebesgue space, 1 < p ≤ 2
The classical 2D image restoration, with the Landweber method (200 iterations).
[Figure: true image x; PSF (A); blurred and noisy image y; Hilbert restoration (p = 2); Banach restorations (p = 1.5 and p = 1.2).]
A classical image restoration example
[Figure: convergence history, relative restoration errors ‖x − x_k‖_2^2 / ‖x‖_2^2 versus iteration index k.]
The Conjugate Gradient in Lp Banach spaces
The Landweber algorithm in the Banach space setting gives better restorations but is still slow. Instead of −∇H(x_k), we consider a different anti-gradient descent direction, based on the same idea as the classical CG.
The CG(NR) method for the minimization of H(x) = (1/2)‖Ax − y‖_2^2 , in Hilbert spaces:

x_{k+1} = x_k + α_k p_k
p_{k+1} = −∇H(x_{k+1}) + β_k p_k = −A^*(Ax_{k+1} − y) + β_k p_k

where p_0 = −∇H(x_0) = −A^*(Ax_0 − y), and

α_k = (Ap_k)^T (y − Ax_k) / (Ap_k)^T (Ap_k) ,    β_k = ∇H(x_{k+1})^T ∇H(x_{k+1}) / ∇H(x_k)^T ∇H(x_k) .

The step size α_k is called optimal, since α := α_k solves the linear equation (d/dα) H(x_k + αp_k) = 0 .
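The CG(NR) recursion above translates directly into a few lines of NumPy; this is a minimal sketch (the early-exit tolerance and the toy system are assumptions added for robustness, not part of the slide):

```python
import numpy as np

def cgnr(A, y, iters=50, tol=1e-12):
    """CG on the normal equations: minimizes H(x) = 0.5 * ||A x - y||^2."""
    x = np.zeros(A.shape[1])
    g = A.T @ (A @ x - y)                    # gradient of H at x
    p = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:          # converged: avoid 0/0 in beta
            break
        Ap = A @ p
        alpha = (Ap @ (y - A @ x)) / (Ap @ Ap)   # optimal step along p
        x = x + alpha * p
        g_new = A.T @ (A @ x - y)
        beta = (g_new @ g_new) / (g @ g)         # Fletcher-Reeves coefficient
        p = -g_new + beta * p
        g = g_new
    return x

# with n = 2 unknowns, exact arithmetic gives convergence in 2 iterations
A = np.array([[2.0, 0.5], [0.5, 1.0], [0.0, 0.3]])
x_true = np.array([1.0, -2.0])
y = A @ x_true
x = cgnr(A, y)
```

The finite-step convergence visible here is precisely the property that is lost when the same scheme is moved to L^p Banach spaces, as discussed below.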
Some facts about the conventional CG method.
Definition: x_k is said to be optimal with respect to the direction v if

H(x_k) ≤ H(x_k + λv) ∀λ ∈ R.

By virtue of the optimal choice of the step length α_k, the new iterate x_{k+1} = x_k + α_k p_k is optimal with respect to the direction p_k.
The key idea of the CG method is that the new iterate x_{k+1} must be optimal w.r.t. the (previous) direction p_{k−1} too. This leads to the computation of the special coefficient β_k (i.e., the so-called Fletcher-Reeves formula), and it can be proven that x_{k+1} is automatically optimal w.r.t. all the previous directions p_0, p_1, ..., p_{k−2}.
The optimality of x_{k+1} w.r.t. the previous descent direction p_{k−1} is important to achieve (finite-step) convergence. Indeed, in the two-dimensional Hilbert (i.e., L² or Euclidean ...) case, all the points which are optimal with respect to a fixed direction lie on a line passing through the minimum point.
[Figure: loci of points optimal with respect to a fixed direction in L²(R^n), L^{1.2}(R^n), and L⁵(R^n); in the L² case they lie on a straight line through the minimum point.]
The Conjugate Gradient method in Lp Banach spaces
The CG(NR) method for the minimization of H(x) = (1/p)‖Ax − y‖_Y^p , in Y = L^p:

x*_{k+1} = x*_k + α_k p*_k
x_{k+1} = J_{p^*}^{X^*} (x*_{k+1})
p*_{k+1} = −∇H(x_{k+1}) + β_k p*_k = −A^* J_p^Y (Ax_{k+1} − y) + β_k p*_k

where p*_0 = −∇H(x_0) = −A^* J_p^Y (Ax_0 − y),
the step size α_k solves (approximately) the nonlinear equation

(d/dα) H( J_{p^*}^{X^*} (x*_k + α p*_k) ) = 0 ,

and the coefficient β_k satisfies a "Fletcher-Reeves"-like formula

β_k = γ ‖Ax_{k+1} − y‖_p^p / ‖Ax_k − y‖_p^p    with γ < 1/2
Convergence of CG in Lp Banach spaces
Theorem [E., Gratton, Lenti, Titley-Peloquin; Num. Math., 2018]. (...) the sequence (x_k) of the CG method in L^p converges strongly to the minimum-norm pseudo-solution x† of Ax = y, i.e.,

‖x_k − x†‖_p → 0 (k → +∞) .

If the data y is noisy, an early stop of the iterations (by the discrepancy principle) gives rise to an iterative regularization method.
Remark: Unlike the conventional CG in Hilbert spaces, there is NO finite (n-step) convergence in R^n. However, finite-step convergence in Banach spaces still holds in a simplified setting [Herzog R., Wollner W.; J. Inverse Ill-Posed Probl., 2017].
[Figure: L², L^{1.5}, L^{1.2} restorations; relative errors vs iterations. Red: CG; magenta: Landweber.]
Some comments about the results:
For p = 2 the CG method is too fast (i.e., not enough regularization), and does not provide the same good quality as the slower Landweber.
For the smaller p = 1.5, 1.2 the CG decelerates, and it gives the same quality of restoration given by the Landweber method, in far fewer iterations.
Regularization in Hilbert and Banach spaces
II
Iterative projection algorithms
Iterated projections (Row Action Methods)
The points x satisfying the i-th row equation 〈a_i, x〉 = b_i of the m × n linear system Ax = b define a hyperplane

Q_i = { x ∈ R^n : 〈a_i, x〉 = b_i } .

The solution x of the linear system belongs to all the hyperplanes Q_i, for i = 1, ..., m.
In Hilbert spaces, BECAUSE
• the orthogonal projection P_i(z) of a point z onto one hyperplane Q_i is easy to perform,

P_i(z) = z + ((b_i − 〈a_i, z〉) / ‖a_i‖_2^2) a_i^T ,

• and the projection P_i(z) is closer to the solution x than z itself,
THEN
iteratively projecting x_{k+1} = P_i(x_k) onto different hyperplanes Q_i gives rise to a "low-cost" sequence (x_k)_{k=1}^{+∞} converging to the solution x.
Some families of projection methods (in Hilbert spaces)
• ART (Algebraic Reconstruction Techniques), or Kaczmarz's methods

x_{k+1} = x_k + λ_k ((b_i − 〈a_i, x_k〉) / ‖a_i‖²) a_i^T

where the row index i depends on the iteration index k (in many ways, for instance i = k mod m, where m is the number of rows of A, or even randomly (!)) and λ_k ∈ R is a relaxation parameter.
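A minimal NumPy sketch of the cyclic Kaczmarz/ART sweep (the 2×2 system is the one used in the Cimmino illustration later in the slides; λ_k = 1 gives exact projections):

```python
import numpy as np

def kaczmarz(A, b, sweeps=100, lam=1.0):
    """ART / Kaczmarz: cyclically project onto the hyperplanes <a_i, x> = b_i,
    x_{k+1} = x_k + lam * (b_i - <a_i, x_k>) / ||a_i||^2 * a_i,  i = k mod m."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(sweeps):
        for i in range(m):                   # i = k mod m, cyclic control
            ai = A[i]
            x = x + lam * (b[i] - ai @ x) / (ai @ ai) * ai
    return x

# the two lines -x + 3y = 8 and -2x + (4/3)y = 8/3 intersect at the solution
A = np.array([[-1.0, 3.0], [-2.0, 4.0 / 3.0]])
b = np.array([8.0, 8.0 / 3.0])
x = kaczmarz(A, b)
```

For a consistent system, each sweep contracts the error by a factor depending on the angle between the hyperplanes, which is why nearly parallel rows make the method slow.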
• SIRT (Simultaneous Iterative Reconstruction Techniques)
The same "fixed" iterate x_k is used (i.e., simultaneously) for a "complete" set of m different projections, that is, for a full set of indexes i = 1, 2, ..., m.
In this case, the iteration can be written in matrix form:

x_{k+1} = x_k + λ_k S A^* M (b − Ax_k)

SIRT: Cimmino method, DROP (Diagonally Relaxed Orthogonal Projection), CAV (Component Averaging), BIP (Block-Iterative Projections), ...
Cimmino and DROP methods
In the SIRT family x_{k+1} = x_k + λ_k S A^* M (b − Ax_k) , we have:
• the Cimmino method (1938), where the new iterate is the average of a complete set of m projections

x_{k+1} = (1/m) Σ_{i=1}^m P_i(x_k) = x_k + (1/m) Σ_{i=1}^m ((b_i − 〈a_i, x_k〉) / ‖a_i‖²) a_i^T

so that

S = I ,    M = (1/m) D = (1/m) diag(1/‖a_1‖², 1/‖a_2‖², ..., 1/‖a_m‖²) .

• the DROP method (Diagonally Relaxed Orthogonal Projection), where

S = diag(1/r_1, 1/r_2, ..., 1/r_n) ,    M = D = diag(1/‖a_1‖², 1/‖a_2‖², ..., 1/‖a_m‖²) ,

r_j being the number of nonzero elements of the j-th column of A.
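Both members of the SIRT family differ only in the diagonal matrices S and M, so one sketch covers both; a minimal NumPy version with λ_k = 1 fixed (the toy system is an illustrative assumption):

```python
import numpy as np

def sirt(A, b, iters=2000, lam=1.0, method="cimmino"):
    """SIRT iteration x_{k+1} = x_k + lam * S A^T M (b - A x_k),
    with the diagonal choices of S and M for Cimmino or DROP."""
    m, n = A.shape
    row_nrm2 = np.sum(A**2, axis=1)                 # ||a_i||^2
    if method == "cimmino":
        S = np.ones(n)                              # S = I
        M = (1.0 / m) / row_nrm2                    # M = (1/m) diag(1/||a_i||^2)
    else:                                           # DROP
        S = 1.0 / np.count_nonzero(A, axis=0)       # S = diag(1/r_j), r_j nonzeros of column j
        M = 1.0 / row_nrm2                          # M = D = diag(1/||a_i||^2)
    x = np.zeros(n)
    for _ in range(iters):
        x = x + lam * S * (A.T @ (M * (b - A @ x)))
    return x

A = np.array([[-1.0, 3.0], [-2.0, 4.0 / 3.0]])
b = np.array([8.0, 8.0 / 3.0])
x_cim = sirt(A, b, method="cimmino")
x_drop = sirt(A, b, method="drop")
```

Since S and M are diagonal, each iteration costs one product with A and one with A^T plus O(n) work, which is the structural analogy with the duality maps exploited in the next slides.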
[Figure: one Cimmino step in R² for the two lines −x + 3y = 8 (a_1 = (−1, 3), b_1 = 8) and −2x + (4/3)y = 8/3 (a_2 = (−2, 4/3), b_2 = 8/3): the projections P_i(x) = x + ((b_i − 〈a_i, x〉) / ‖a_i‖²) a_i of x_0 onto the two lines, and the new iterate x_1 = [P_1(x_0) + P_2(x_0)] / 2.]
DROP method (in Hilbert spaces)
A : X → Y    A^* : Y → X    H_M(x) = (1/2)‖Ax − y‖_M^2

x_{k+1} = S (S^{−1} x_k − λ_k A^* M (Ax_k − y))
Landweber iterative method in Banach spaces
A : X → Y    A^* : Y^* → X^*    H_r(x) = (1/r)‖Ax − y‖_Y^r

x_{k+1} = J_{r^*}^{X^*} ( J_r^X x_k − λ_k A^* J_r^Y (Ax_k − y) )
Some remarks about J_r^Y, J_{r^*}^{X^*} (in L^p Banach spaces) VS M, S (in L² Hilbert spaces):
• J_r^Y is nonlinear and M is linear; J_{r^*}^{X^*} is nonlinear and S is linear;
• all are diagonal and positive operators;
• all cost O(n) operations;
• the action of the matrix M in DROP is "similar" to that of the duality map J_r^Y = J_2^{L^p} with 1 < p < 2 in Landweber-Banach.
In summary, the Landweber iterative method in Banach spaces can also be viewed as a nonlinear generalization of well-known projection algorithms for linear systems.
The oblique geometry of the M-induced norm ‖ · ‖_M in the Hilbert space setting is replaced by the L^p-norm ‖ · ‖_{L^p} in the L^p Banach space setting.
Regularization in Hilbert and Banach spaces
III
Preconditioning
Preconditioning in Banach Space
Preconditioned system in Hilbert space, with (invertible) preconditioner D

A^*Ax = A^*y ⟺ D A^*Ax = D A^*y

Preconditioned Landweber method in Hilbert spaces

x_{k+1} = x_k − λ D A^* (Ax_k − y)

where the preconditioner D is a regularized (polynomial, rational, circulant) "low-cost" approximation (a structured and/or sparse matrix) of the inverse of A^*A, i.e.,

D ≈ (A^*A)^{−1} .
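For structured problems the regularized approximation of (A^*A)^{−1} can be applied at FFT cost. A minimal 1D sketch in the spirit of the circulant preconditioners mentioned above (the kernel, the filter parameter α, and the step λ = 1 are illustrative assumptions): for a periodic convolution operator A with symbol h_hat, a Tikhonov-filtered circulant D has symbol 1/(|h_hat|² + α), so that D A^T A has symbol |h_hat|²/(|h_hat|² + α) ∈ (0, 1).

```python
import numpy as np

n = 64
h = np.zeros(n); h[:3] = 1.0 / 3.0          # periodic moving-average blur kernel (assumed)
h_hat = np.fft.fft(h)

def A(v):                                    # A v : circular convolution with h
    return np.real(np.fft.ifft(h_hat * np.fft.fft(v)))

def AT(v):                                   # A^T v : circular correlation with h
    return np.real(np.fft.ifft(np.conj(h_hat) * np.fft.fft(v)))

# Tikhonov-filtered circulant approximation of (A^T A)^{-1}
alpha = 1e-2
d_hat = 1.0 / (np.abs(h_hat)**2 + alpha)

def D(v):
    return np.real(np.fft.ifft(d_hat * np.fft.fft(v)))

x_true = np.zeros(n); x_true[20:40] = 1.0    # piecewise-constant test signal
y = A(x_true)

x = np.zeros(n)
for _ in range(200):                         # x_{k+1} = x_k - lam * D A^T (A x_k - y)
    x = x - 1.0 * D(AT(A(x) - y))
```

Well-conditioned frequencies converge almost immediately, while the filter α prevents the nearly-annihilated frequencies from amplifying noise, which is exactly the "regularized preconditioning" trade-off of the slides.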
Now, in Banach spaces, A^*A is not well defined (... it cannot even be written!)
We generalized preconditioning schemes to Banach spaces in two ways:
• Primal preconditioning
• Dual preconditioning
[Brianzi, Di Benedetto, E., Comp. Opt. Appl., 2013]
Primal Preconditioning in Banach Space
Remember, in Hilbert space:

x_{k+1} = x_k − λ D A^*(Ax_k − y)    with D ≈ (A^*A)^{−1}

Primal preconditioner: D : X → X
The primal preconditioner is a regularized approximation of ( J_{r^*}^{X^*} A^* J_r^Y A )^{−1}.
From Hilbert space:

x_{k+1} = D (D^{−1} x_k − λ A^*(Ax_k − y))

To Banach space:

x_{k+1} = D J_{r^*}^{X^*} ( J_r^X D^{−1} x_k − λ A^* J_r^Y (Ax_k − y) )

Recalling that A : X → Y, A^* : Y^* → X^*, J_r^X : X → X^*, J_{r^*}^{X^*} : X^* → X, J_r^Y : Y → Y^*, the iterations are well-defined.
Dual Preconditioning in Banach Space
Remember, in Hilbert space:

x_{k+1} = x_k − λ D A^*(Ax_k − y)    with D ≈ (A^*A)^{−1} .

Dual preconditioner: D : X^* → X^*
The dual preconditioner is a regularized approximation of ( A^* J_r^Y A J_{r^*}^{X^*} )^{−1}.
From Hilbert space:

x_{k+1} = x_k − λ D A^*(Ax_k − y)

To Banach space:

x_{k+1} = J_{r^*}^{X^*} ( J_r^X x_k − λ D A^* J_r^Y (Ax_k − y) )

Recalling that A : X → Y, A^* : Y^* → X^*, J_r^X : X → X^*, J_{r^*}^{X^*} : X^* → X, J_r^Y : Y → Y^*, the iterations are well-defined.
Convergence analysis of preconditioning in Lp Banach Space
These preconditioned algorithms can be read as fixed-point iterations in the dual space X^*, that is,

x*_{k+1} = g(x*_k) ,

where g : X^* → X^* in the dual preconditioning is

g(t) = t − λ D A^* J_r^Y ( A J_{r^*}^{X^*}(t) − y ) .

By simple direct computation, its Jacobian J_g in L^p Banach spaces is

J_g(x^*) = I − λ D A^* A + E ,

where E is a matrix of small norm for p close to (the Hilbert value) p = 2.
This way, by a Bauer-Fike argument, the eigenvalues of J_g(x^*) are "perturbations" of those of the iteration matrix I − λ D A^* A of the conventional Hilbert case. It follows that, for p ≈ 2, the spectral radius of J_g(x^*) is essentially the same as that of the Hilbert case.
In particular, choosing D as an optimal approximation of (A^*A)† leads to an acceleration of the convergence, at least for p ≈ 2.
(Regularized) preconditioning for image restoration. Preconditioner: Tikhonov-filtered T. Chan optimal circulant preconditioner.
[Figure: true image x; no preconditioning, 200 iterations (p = 1.5), rel. err. 0.4182; Tikhonov prec. (p = 1.8), 93 iterations, rel. err. 0.3482; Tikhonov prec. (p = 1.5), 33 iterations, rel. err. 0.3383.]
[Plot: relative restoration error vs iterations for p = 1.5 with Tikhonov filter α = 0.02, p = 1.5 with no preconditioner, and p = 2.0 with Tikhonov filter α = 0.02.]
Regularization in (Hilbert and) Banach spaces
IV
Multi-parameter and Adaptive regularization
A “new” framework: variable exponent Lebesgue spaces Lp(·)
In image restoration, different regions of the image often require different "amounts of regularization". Setting different levels of regularization is useful because background, low-intensity, and high-intensity values require different filtering levels [Nagy, Pauca, Plemmons, Torgersen, J. Opt. Soc. Am. A, 1997].
The idea: the ill-posed functional equation Af = g is solved in L^{p(·)} Banach spaces, namely, the variable exponent Lebesgue spaces, a special case of the so-called Musielak-Orlicz functional spaces (first proposed in two seminal papers in 1931 and 1959, but intensively studied only in the last 10 years).
In a variable exponent Lebesgue space, to measure a function f, instead of a constant exponent p over the whole domain, we have a pointwise variable exponent (i.e., a function) 1 ≤ p(·) ≤ +∞.
This way, different values of the exponent, i.e., different kinds of regularization, on different regions of the image to restore, can be automatically and adaptively assigned.
A sketch on variable exponent Lebesgue spaces Lp(·)
                Lp(Ω)                                   Lp(x)(Ω)
Exponent        1 ≤ p ≤ ∞, p constant                   p(x) : Ω → [1, ∞], measurable
Norm            ‖f‖_p = (∫_Ω |f(x)|^p dx)^{1/p}         ‖f‖_{p(·)} = (∫_Ω |f(x)|^{p(x)} dx)^{1/???}
                ‖f‖_∞ = ess sup |f(x)|                  ...
Membership      f ∈ Lp(Ω) ⟺ ∫_Ω |f(x)|^p dx < ∞         f ∈ Lp(·)(Ω) ⟺ ???
In the following, Ω_∞ = { x ∈ Ω : p(x) = ∞ } has zero measure.
Ω = [−5, 5] ,    p(x) = 2 if −5 ≤ x ≤ 0 ,    p(x) = 3 if 0 < x ≤ 5 .

f(x) = 1/|x + 1|^{1/3} ∈ L^{p(·)}([−5, 5])    but    f(x) = 1/|x − 1|^{1/3} ∉ L^{p(·)}([−5, 5]) :

the singularity at x = −1 falls where p(x) = 2 (and |x + 1|^{−2/3} is integrable), while the singularity at x = 1 falls where p(x) = 3 (and |x − 1|^{−1} is not).
The norm of variable exponent Lebesgue spaces
In the conventional case L^p, the norm is ‖f‖_{L^p} = ( ∫_Ω |f(x)|^p dx )^{1/p}.
In L^{p(·)} Lebesgue spaces, the definition and computation of the norm is not straightforward, since we do not have a constant value for computing the ("mandatory") radical:

‖f‖_{L^{p(·)}} = ( ∫_Ω |f(x)|^{p(x)} dx )^{1/???} .

The solution: first compute the modular (for 1 ≤ p(·) < +∞)

ρ(f) = ∫_Ω |f(x)|^{p(x)} dx ,

and then obtain the (so-called Luxemburg [1955]) norm by solving a 1D minimization problem

‖f‖_{L^{p(·)}} = inf { λ > 0 : ρ(f/λ) ≤ 1 } .
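The 1D minimization defining the Luxemburg norm is easy to carry out numerically, since λ ↦ ρ(f/λ) is non-increasing; a minimal sketch on a discretized domain (the grid, the quadrature weights, and the bisection tolerance are illustrative assumptions):

```python
import numpy as np

def modular(f, p, w):
    """rho(f) = integral of |f(x)|^p(x) dx, discretized with quadrature weights w."""
    return np.sum(np.abs(f)**p * w)

def luxemburg_norm(f, p, w, tol=1e-10):
    """||f||_{Lp(.)} = inf{ lam > 0 : rho(f/lam) <= 1 }, by bisection on lam."""
    lo, hi = tol, 1.0
    while modular(f / hi, p, w) > 1.0:      # enlarge hi until rho(f/hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if modular(f / mid, p, w) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

# sanity check with a constant exponent: for p(x) = 2 on [0, 1] the Luxemburg
# norm must coincide with the classical L^2 norm (midpoint rule on the grid)
N = 10000
x = (np.arange(N) + 0.5) / N
w = np.full(N, 1.0 / N)
f = x                                        # ||f||_2 = (int_0^1 x^2 dx)^{1/2} = 1/sqrt(3)
p = np.full(N, 2.0)
nrm = luxemburg_norm(f, p, w)
```

The constant-exponent check mirrors the computation on the next slide, where the infimum is shown to reduce to the classical p-norm.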
The elements of a variable exponent Lebesgue space
ρ(f) = ∫_Ω |f(x)|^{p(x)} dx ,    ‖f‖_{L^{p(·)}} = inf { λ > 0 : ρ(f/λ) ≤ 1 } .

The Lebesgue space

L^{p(·)}(Ω) = { f : Ω → R | ‖f‖_{L^{p(·)}} < ∞ }

is a Banach space.
In the case of a constant exponent p(x) = p, this norm is exactly the classical one ‖f‖_p; indeed

ρ(f/λ) = ∫_Ω ( |f(x)|/λ )^p dx = (1/λ^p) ∫_Ω |f(x)|^p dx = (1/λ^p) ‖f‖_p^p

and

inf { λ > 0 : (1/λ^p) ‖f‖_p^p ≤ 1 } = ‖f‖_p .
Modular VS norm
In (classical) L^p, norm and modular are "the same" apart from a p-th root:

‖f‖_p < ∞ ⟺ ∫_Ω |f(x)|^p dx < ∞

In L^{p(·)}, norm and modular are really different:

ρ(f) < ∞ ⟹ ‖f‖_{p(·)} < ∞ , but the reverse implication does not hold.

Indeed, the following holds:

‖f‖_{p(·)} < ∞ ⟺ there exists a λ > 0 s.t. ρ(f/λ) < ∞

(and notice that λ can be chosen large enough ...).
A simple example of a strange behavior
Ω = [1, ∞) ,    f(x) ≡ 1 ,    p(x) = x

ρ(f) = ∫_1^∞ 1^x dx = ∞    BUT    ‖f‖_{p(·)} ≈ 1.763

Indeed

ρ(f/λ) = ∫_1^∞ (1/λ)^x dx = 1/(λ log λ) < ∞ ,    for any λ > 1 .
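Using the closed form ρ(f/λ) = 1/(λ log λ), the Luxemburg norm of this example is the λ with ρ(f/λ) = 1, i.e. the root of λ log λ = 1, which a short bisection recovers (a sketch; g is strictly increasing on (1, ∞)):

```python
import math

def g(lam):
    # rho(f/lam) = 1/(lam * log lam) equals 1 exactly when lam * log(lam) = 1
    return lam * math.log(lam) - 1.0

lo, hi = 1.0 + 1e-12, 3.0
for _ in range(100):                 # bisection: g(lo) < 0 < g(hi)
    mid = 0.5 * (lo + hi)
    if g(mid) < 0.0:
        lo = mid
    else:
        hi = mid
norm = 0.5 * (lo + hi)               # ~ 1.763, as on the slide
```

So the function has infinite modular but a perfectly finite Luxemburg norm, which is the "strange behavior" announced above.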
The vector case: the Lebesgue spaces of sequences l^{p(·)} = (l^{p_n}_n)
[Figure: the unit circle of x = (x_1, x_2) in R² with variable exponents p = (p_1, p_2): ‖x‖_p vs ‖x‖_{p(·)}.]
For ‖x‖_p: inclusion of the unit balls if (p_1, p_2) ≥ (q_1, q_2) (as in the classical case); for ‖x‖_{p(·)}: no inclusion in general.
Properties of variable exponent Lebesgue spaces Lp(·)
Let p⁻ = ess inf_Ω |p(x)| and p⁺ = ess sup_Ω |p(x)|.
If p⁺ = ∞, then L^{p(·)}(Ω) is a "bad" (although very interesting) Banach space, with poor geometric properties (i.e., it is not useful for our methods).
If 1 < p⁻ ≤ p⁺ < ∞, then L^{p(·)}(Ω) is a "good" Banach space, since many properties of classical Lebesgue spaces L^p still hold: L^{p(·)} is uniformly smooth, uniformly convex, and reflexive; we know the duality map of L^{p(·)}.
Its dual space is well defined, (L^{p(·)})^* ≃ L^{q(·)}, where 1/p(x) + 1/q(x) = 1,

L_g(f) = ∫_Ω f(x) g(x) dx ∈ (L^{p(·)})^* ⟺ g ∈ L^{q(·)}(Ω) .

Unfortunately, (L^{p(·)})^* and L^{q(·)} are isomorphic but not isometric:

‖L_g‖_{(L^{p(·)})^*} ≠ ‖g‖_{L^{q(·)}} .

We know the duality map of L^{p(·)}, but we do not know the duality map of (L^{p(·)})^*.
The duality map of the variable exponent Lebesgue space
By extending the duality maps, we can define in L^{p(·)} all the iterative methods developed in L^p (Landweber, steepest descent, CG, Mann iterations).
For any constant 1 < r < +∞, we recall that the duality map, that is, the (sub-)differential of the functional (1/r)‖f‖_{L^p}^r, in the classical Banach space L^p with constant 1 < p < +∞, is defined as follows:

( J^{L^p}(f) )(x) = |f(x)|^{p−1} sgn(f(x)) / ‖f‖_p^{p−r} .

By generalizing a result of P. Matei [2012], the corresponding duality map in the variable exponent Lebesgue space is defined as follows:

( J^{L^{p(·)}}(f) )(x) = [ ∫_Ω ( p(x) |f(x)|^{p(x)} / ‖f‖_{p(·)}^{p(x)} ) dx ]^{−1} · p(x) |f(x)|^{p(x)−1} sgn(f(x)) / ‖f‖_{p(·)}^{p(x)−r} ,

where every product and every ratio has to be understood pointwise.
The adaptive algorithm in variable exponent Lebesgue spaces
Numerical evidence shows that, in L^p image deblurring,
• dealing with small p, 1 ≈ p << 2, improves sparsity and allows a better restoration of the edges of the images and of the zero background,
• dealing with p ≈ 2 (even p > 2) allows a better restoration of the regions of pixels with the highest intensities.
The idea: use a version of the (re-)blurred data y, scaled into [1, 2], as the exponent distribution p(·) of the variable exponent Lebesgue space L^{p(·)} where the solution is computed. Example (linear interpolation):

p(·) = 1 + [A^*y(·) − min(A^*y)] / [max(A^*y) − min(A^*y)]

The Landweber (i.e., fixed-point) iterative scheme in this L^{p(·)} Banach space can be modified into an adaptive iterative algorithm, by recomputing, after each fixed number of iterations, the exponent distribution p_k(·) by means of the k-th restored image x_k (instead of the first re-blurred data A^*y), that is,

p_k(·) = 1 + [x_k(·) − min(x_k)] / [max(x_k) − min(x_k)]
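The linear-interpolation rule for the exponent distribution is a one-liner; a sketch with an explicit [pmin, pmax] range (the defaults reproduce the slide's p(·) ∈ [1, 2]; in practice pmin is taken slightly above 1, e.g. 1.2 in the radiometer experiment later, since the theory requires p⁻ > 1):

```python
import numpy as np

def exponent_map(u, pmin=1.0, pmax=2.0):
    """Pointwise exponent p(.) = pmin + (pmax - pmin) * (u - min u) / (max u - min u),
    computed from the (re-)blurred data or from the current iterate u."""
    u = np.asarray(u, dtype=float)
    spread = u.max() - u.min()
    if spread == 0.0:                               # constant input: pick the midpoint exponent
        return np.full_like(u, 0.5 * (pmin + pmax))
    return pmin + (pmax - pmin) * (u - u.min()) / spread
```

In the adaptive scheme this map is simply re-evaluated on x_k every fixed number of iterations, so low-intensity background pixels get exponents near pmin (sparsity, sharp edges) and bright regions get exponents near pmax (stability).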
The conjugate gradient method in Lp(·) for image restoration
Let p(·) = 1 + [A^*y(·) − min(A^*y)] / [max(A^*y) − min(A^*y)]

φ*_0 = −A^* J_r^{L^{p(·)}} (Ax_0 − y)
For k = 0, 1, 2, ...

λ_k = arg min_α (1/r) ‖A(x_k + αφ_k) − y‖_{L^{p(·)}}^r

x*_{k+1} = x*_k + λ_k φ*_k ,    x_{k+1} = J_{s'}^{(L^{p(·)})^*} (x*_{k+1})

β_{k+1} = γ ‖Ax_{k+1} − y‖_{L^{p(·)}}^r / ‖Ax_k − y‖_{L^{p(·)}}^r    with γ < 1/2

φ*_{k+1} = −A^* J_r^{L^{p(·)}} (Ax_{k+1} − y) + β_{k+1} φ*_k

and recompute p(·) = 1 + [x_k(·) − min(x_k)] / [max(x_k) − min(x_k)] every m iterations, using the last iterate x_k.
[Proof of convergence of CG for L^p (constant) Lebesgue spaces in 2017.]
Numerical results
[Figure: true image; point spread function; blurred image (noise = 4.7%).]
Special thanks to Brigida Bonino (Univ. Genova).
[Figure: restorations with relative errors: p = 2 (0.2692); p = 1.3 (0.2681); variable p = 1.3-1.6 (0.2473); variable p = 1.3-1.6 with irregularization (0.2307).]
A remote sensing application: spatial resolution enhancement of microwave radiometer data
It is an under-determined problem (with a special structured matrix), since we want to reconstruct the high-frequency components reduced by the low-pass filter of the radiometer.
Courtesy: monde-geospatial.com
Joint work with: Matteo Alparone, Flavia Lenti, Maurizio Migliaccio, Nando Nunziata (Dep. of Engineering, Univ. Napoli Parthenope)
[Figures: resolution enhancement of radiometer data for RECT and DOUBLE RECT test signals; reconstructions with constant p = 2, constant p = 1.2, and variable p (pmin = 1.2, pmax = 2, lambda = 0.3, maxiter = 2000, eThr = 0.12, fKind = 1); residual energy for constant p = 1.2 (xconstTB37) and for variable p (xvarTB37).]
An inverse scattering application: microwave tomography
Microwave imaging is a noninvasive and nondestructive technique to inspect materials by using incident electromagnetic waves generated in the microwave range (300 MHz - 300 GHz). It is a nonlinear, implicit and ill-posed problem.
Joint work with: Alessandro Fedeli, Matteo Pastorino, Andrea Randazzo (Dep. of Engineering, Univ. Genoa)
[Figure: tomographic reconstructions in L², L^{1.2}, and L^{p(·)} (p ranging in 1.4-2), for targets with εr = 2, SNR = 15 dB and εr = 3, SNR = 30 dB.]
Thank you for your attention
References
[1] F. Schöpfer, A. K. Louis, T. Schuster, Nonlinear iterative methods for linear ill-posed problems in Banach spaces, Inverse Problems, vol. 22, pp. 311-329, 2006.
[2] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, F. Lenzen, Variational Methods in Imaging, Springer, Berlin, 2008.
[3] T. Schuster, B. Kaltenbacher, B. Hofmann, K. S. Kazimierski, Regularization Methods in Banach Spaces, Radon Series on Computational and Applied Mathematics, vol. 10, De Gruyter, 2012.
[4] L. Diening, P. Harjulehto, P. Hästö, M. Růžička, Lebesgue and Sobolev Spaces with Variable Exponents, Lecture Notes in Mathematics, vol. 2017, Springer, 2011.
[5] C. Estatico, M. Pastorino, A. Randazzo, A novel microwave imaging approach based on regularization in Lp Banach spaces, IEEE Transactions on Antennas and Propagation, vol. 60, pp. 3373-3381, 2012.
[6] P. Brianzi, F. Di Benedetto, C. Estatico, Preconditioned iterative regularization in Banach spaces, Computational Optimization and Applications, vol. 54, pp. 263-282, 2013.
[7] F. Lenti, F. Nunziata, C. Estatico, M. Migliaccio, On the resolution enhancement of radiometer data in Banach spaces, IEEE Trans. Geoscience and Remote Sensing, vol. 52, pp. 1834-1842, 2014.
[8] P. Dell'Acqua, C. Estatico, Acceleration of multiplicative iterative algorithms for image deblurring by duality maps in Banach spaces, Applied Numerical Mathematics, vol. 99, pp. 121-136, 2016.
[9] C. Estatico, S. Gratton, F. Lenti, D. Titley-Peloquin, A conjugate gradient like method for p-norm minimization in functional spaces, Numerische Mathematik, vol. 137, pp. 895-922, 2017.
[10] P. Brianzi, F. Di Benedetto, C. Estatico, L. Surace, Irregularization accelerates iterative regularization, Calcolo, vol. 2/18, 2018.