Randomized dimension reduction for full-waveform inversion

Felix J. Herrmann, Peyman Moghaddam, and Xiang Li

SLIM, University of British Columbia

Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0). Copyright (c) 2010 SLIM group @ The University of British Columbia.

Impediments

Full-waveform inversion (FWI) suffers from

• multimodality, i.e., a multitude of velocity models explain the data

• local minima, i.e., requirement of an accurate initial model

• over- and underdeterminacy

Curse of dimensionality for d>2

• requires implicit (Helmholtz) solvers to address bandwidth

• # RHS’s makes computation of gradients prohibitively expensive

[Symes, ‘08]

Wish list

An inversion technology that

• is based on a time-harmonic PDE solver, which is easily parallelizable, and scalable to 3D

• does not require multiple iterations with all data

• removes the linearly increasing costs of implicit solvers for increasing numbers of frequencies & RHS’s

• allows for a dimensionality reduction commensurate with the model's complexity

Key technologies

Numerical linear algebra [Erlangga & Nabben, '08; Erlangga & FJH, '08-'09]

• multi-level Krylov preconditioner for Helmholtz

Simultaneous sources

• supershots

Stochastic optimization & machine learning [Bertsekas, '96]

• stochastic gradient descent

Compressive sensing [Candès et al., Donoho, '06]

• sparse recovery & randomized subsampling

[Beasley, '98; Neelamani et al., '08]

[Krebs et al., '09; Operto et al., '09; FJH et al., '08-'10]

Full-waveform inversion

Single-source / single-frequency PDE-constrained optimization problem:

$$\min_{u \in \mathcal{U},\, m \in \mathcal{M}} \tfrac{1}{2}\|p - Du\|_2^2 \quad \text{subject to} \quad H[m]u = q$$

p = single-source and single-frequency data
D = detection operator
u = solution of the Helmholtz equation
H = discretized monochromatic Helmholtz system
q = unknown seismic source
m = unknown model, e.g. $c^{-2}(x)$

Unconstrained problem

For each separate source q, solve the unconstrained problem:

$$\min_{m \in \mathcal{M}} \tfrac{1}{2}\|p - F[m, q]\|_2^2 \quad \text{with} \quad F[m, q] = DH^{-1}[m]q$$

with q a single source function.

• Gradient updates involve 3 PDE solves

• Increased matrix bandwidth of the Helmholtz discretization makes scaling to 3D challenging for direct methods...
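The gradient of this unconstrained misfit comes from the adjoint-state method; the forward and adjoint solves shown below account for two of the PDE solves per update mentioned above. A minimal sketch, assuming a 1-D Helmholtz discretization with Dirichlet boundaries and $H[m] = L - \omega^2\,\mathrm{diag}(m)$ (the names `helmholtz` and `misfit_and_gradient` are illustrative, not the authors' code):

```python
import numpy as np

def helmholtz(m, omega, h):
    """1-D Helmholtz H[m] = -d^2/dx^2 - omega^2 m on a Dirichlet grid."""
    n = m.size
    L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    return L - omega**2 * np.diag(m)

def misfit_and_gradient(m, q, p, D, omega, h):
    """Adjoint-state gradient: one forward and one adjoint solve."""
    H = helmholtz(m, omega, h)
    u = np.linalg.solve(H, q)                        # forward solve
    r = p - D @ u                                    # data residual
    v = np.linalg.solve(H.conj().T, D.conj().T @ r)  # adjoint solve
    f = 0.5 * np.linalg.norm(r)**2
    g = -omega**2 * np.real(np.conj(v) * u)          # dH/dm_i = -omega^2 e_i e_i^T
    return f, g

# toy setup: "observed" data from a reference model, misfit at a perturbed model
rng = np.random.default_rng(0)
n, omega, h = 50, 4.0, 0.1
m = np.ones(n) + 0.1 * rng.random(n)
q = np.zeros(n); q[5] = 1.0
D = np.eye(n)[[10, 20, 30, 40]]                      # receivers at 4 grid points
p = D @ np.linalg.solve(helmholtz(np.ones(n), omega, h), q)

f, g = misfit_and_gradient(m, q, p, D, omega, h)

# directional finite-difference check of the adjoint-state gradient
dm = rng.standard_normal(n)
eps = 1e-6
fp, _ = misfit_and_gradient(m + eps * dm, q, p, D, omega, h)
fd = (fp - f) / eps                                  # finite-difference slope
gdm = g @ dm                                         # adjoint-state slope
```

The finite-difference check at the end is the standard way to validate such a gradient before plugging it into an optimizer.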

Preconditioner

[Erlangga & Nabben, '08; Erlangga and FJH, '08]

The preconditioners M and Q cluster the spectrum around one: $\sigma(H) \rightarrow \sigma(HM^{-1}) \rightarrow \sigma(HM^{-1}Q)$

• M shifts the eigenvalues to the positive half plane (solves indefiniteness)

• Q clusters the eigenvalues near one (solves ill-conditioning)
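The first effect can be seen on a toy problem. A minimal sketch, assuming the common complex-shifted-Laplacian choice $M = L - (1 - 0.5i)\,k^2 I$ from the shifted-Laplacian literature (not necessarily the exact M of these slides): the preconditioned spectrum lands on the circle $|z - \tfrac{1}{2}| = \tfrac{1}{2}$, entirely in the closed right half plane.

```python
import numpy as np

# 1-D Laplacian (Dirichlet) and an indefinite Helmholtz matrix H = L - k^2 I
n, h, k2 = 80, 1.0 / 80, 200.0
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
H = L - k2 * np.eye(n)

# complex-shifted Laplacian preconditioner with damping shift beta = 0.5
M = L - (1.0 - 0.5j) * k2 * np.eye(n)

eigs_H = np.linalg.eigvals(H)                    # indefinite: both signs occur
eigs_HMinv = np.linalg.eigvals(H @ np.linalg.inv(M))
# eigs_HMinv lies on the circle |z - 1/2| = 1/2 in the right half plane
```

Since H and M are both polynomials in L, they share eigenvectors, so each preconditioned eigenvalue is $(\mu - k^2)/(\mu - (1-0.5i)k^2)$ with $\mu$ an eigenvalue of L, which is exactly a point on that circle.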

Scaling

[Figure: number of iterations and number of matrix-vector multiplies vs. frequency (0-30 Hz) for the MKMG and MG preconditioners, and memory use vs. frequency for the direct (LU) and iterative methods on grids of 751 × 201, 1501 × 401, and 2001 × 534 points.]

Example: Marmousi

[Figure: hard model (1500-4000 m/s) and smooth model; x-axis 0-7000 m, depth 0-2500 m.]

Simulations

[Figure: real part of the wavefield u at 10 Hz, computed with 9 and with 18 grid points per wavelength; axes in grid points.]

Convergence

[Figure: number of iterations vs. frequency (0-30 Hz) for the hard and smooth models.]

Observations

An implicit solver that

• converges for high frequencies

• scales & is embarrassingly parallelizable

• has the same order of complexity as time-domain finite differences (TDFD)

• allows trivial implementation of imaging conditions

But its complexity grows linearly with the number of frequencies and sources...

Full-waveform inversion

Multiexperiment PDE-constrained optimization problem:

$$\min_{U \in \mathcal{U},\, m \in \mathcal{M}} \tfrac{1}{2}\|P - DU\|_{2,2}^2 \quad \text{subject to} \quad H[m]U = Q$$

P = total multi-source and multi-frequency data volume
U = solution of the Helmholtz equation
H = discretized multi-frequency Helmholtz system
Q = unknown seismic sources

Simultaneous sources

Randomized superposition of sequential source functions creates a supershot.

[Figure: randomized amplitudes along the shot line defining a simultaneous source; axes in grid points.]

[Beasley, '98; Neelamani et al., '08; Krebs et al., '09; Operto & Virieux, '09-'10; FJH et al., '08-'10]

Simultaneous shot at 5 Hz

[Figure: sequential-source wavefield vs. simultaneous-source wavefield; axes in grid points.]

Notice the increased wavenumber content.

Image

Increased wavenumber content leads to improved image/gradient updates...

[Figure: sequential-source image vs. simultaneous-source image; axes in grid points, amplitudes on the order of 1e-6.]

Supershots

$$RM = \underbrace{\begin{bmatrix} R_{\Sigma_1} \otimes I \\ \vdots \\ R_{\Sigma_{n_s'}} \otimes I \end{bmatrix}}_{\text{sub sampler}} \underbrace{\left(F_2^{*}\,\mathrm{diag}\!\left(e^{i\theta}\right) \otimes I\right) F_3}_{\text{random phase encoder}}, \qquad (3)$$

where $F_2, F_3$ are the 2- and 3-D Fourier transforms, and $\theta = \mathrm{Uniform}([0, 2\pi])$ is a random phase rotation. Notice that $F_2$ and the phase rotations act along the source/receiver coordinates. Application of this CS-sampling matrix, RM, to the original source wavefields in s turns these single shots into a subset ($n'_s \leq n_s$) of time-harmonic simultaneous sources that are randomly phase encoded and that have, for each simultaneous shot, a different set of angular frequencies missing, i.e., there are $n'_f \leq n_f$ frequencies non-zero (see Figure 2(a)). Because seismic data is bandwidth limited, we sample with a probability that is weighted by the power spectrum of the source wavelet. The advantage of this implementation is that it is matrix-free, fast, and it turns interferences into harmless noise (see Figure 2(b)).

The sparsifying transform: Aside from proper CS sampling, the recovery from simultaneous simulations depends on a sparsifying transform that compresses seismic data, is fast, and is reasonably incoherent with the CS sampling matrix. We accomplish this by defining the sparsity transform as the Kronecker product of the 2-D discrete curvelet transform (Candès et al., 2006) along the source-receiver coordinates and the discrete wavelet transform along the time coordinate, i.e., $S := C \otimes W$, with C, W the curvelet- and wavelet-transform matrices, respectively.

[Figure: DQ vs. DQRM, illustrating the reduction to $n'_s \leq n_s$ simultaneous shots.]
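The "random phase encoder" factor of (3) is matrix-free: FFT along the source coordinate, a random phase rotation, and an inverse FFT. A minimal sketch with hypothetical sizes (the frequency subsampling weighted by the source spectrum is omitted; only shot subsampling is shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_s = 32, 48                          # receivers, sequential sources

def phase_encode(S, rng):
    """Matrix-free random phase encoding along the source coordinate:
    ifft( diag(exp(i*theta)) * fft(S) ), theta ~ Uniform[0, 2*pi)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=S.shape[1])
    S_hat = np.fft.fft(S, axis=1)          # FFT over the source axis
    return np.fft.ifft(S_hat * np.exp(1j * theta), axis=1)

S = rng.standard_normal((n_r, n_s))        # stand-in source wavefield matrix
S_enc = phase_encode(S, rng)

# the encoder is unitary, so it preserves energy; subsampling then keeps
# only n_s' = 12 of the encoded simultaneous sources
keep = rng.choice(n_s, size=12, replace=False)
S_sub = S_enc[:, keep]
```

Unitarity is what makes the discarded interference behave like incoherent noise rather than a systematic loss.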

Dimensionality reduction

Full system (single shots):

$$H[m]U = Q \quad \text{with} \quad Q = D_s^\top s$$

Reduced system (simultaneous shots):

$$H[m]\underline{U} = \underline{Q} \quad \text{with} \quad \underline{Q} = D_s^\top RM\, s, \qquad \underline{P} := RM\,P = D\underline{U}$$

$$\min_{\underline{U} \in \mathcal{U},\, m \in \mathcal{M}} \tfrac{1}{2}\|\underline{P} - D\underline{U}\|_2^2 \quad \text{subject to} \quad H[m]\underline{U} = \underline{Q}$$

[Neelamani et al., '08; Krebs et al., '09; FJH et al., '08-'10]

Reduced gradient

Replace gradient updates over all N sequential sources:

$$m_{k+1} := m_k - \gamma_k \sum_{i=1}^{N} \nabla f(m_k, q_i) \quad \text{with} \quad q_i := i\text{th column of } Q$$

by

$$m_{k+1} := m_k - \gamma_k \sum_{i=1}^{N'} \nabla f(m_k, \underline{q}_i) \quad \text{with} \quad \underline{q}_i := (RM)_i\, Q$$

• N' << N number of multifrequency simultaneous experiments

• creates incoherent "Gaussian" simultaneous-source crosstalk

Stochastic optimization

Stochastic "batch" gradient descent [Robbins and Monro, 1951; Bertsekas, '96; Nemirovski, '09]:

$$m_{k+1} := m_k - \gamma_k \frac{1}{n} \sum_{i=1}^{n} \nabla f(m_k, \underline{q}_i) \quad \text{with} \quad \underline{q}_i := (RM)_i\, Q$$

• for n → ∞, the updates become deterministic

• prohibitively expensive
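The reason the updates become deterministic is that Gaussian encodings give unbiased estimates: for a residual matrix R and weights w ~ N(0, I), the encoded misfit $\tfrac{1}{2}\|Rw\|^2$ has expectation $\tfrac{1}{2}\|R\|_F^2$, the full-data misfit. A small Monte Carlo check, assuming iid standard-normal encoding weights and a stand-in residual matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((20, 30))       # stand-in residual matrix P - F[m, Q]

# one Gaussian supershot w gives the encoded misfit 0.5*||R w||^2;
# its expectation over w ~ N(0, I) is the full misfit 0.5*||R||_F^2
n_trials = 20000
W = rng.standard_normal((30, n_trials))
vals = 0.5 * np.sum((R @ W)**2, axis=0) # encoded misfit for each trial
mc = vals.mean()                        # ensemble average over trials
full = 0.5 * np.linalg.norm(R, 'fro')** 2
```

The gap between `mc` and `full` shrinks like the usual Monte Carlo rate, one over the square root of the number of trials.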

Stochastic Average Approximation (SAA)

Approximate the expectation with an ensemble average [Bertsekas, '96; Nemirovski, '09]:

$$\min_m \mathbb{E}\{f(m, \underline{q})\} \approx \min_m \frac{1}{N} \sum_{i=1}^{N} f(m, \underline{q}_i) \quad \text{with} \quad \underline{q}_i := (RM)_i\, Q$$

• for N → ∞ the approximation becomes an equality

• well studied and known as Monte Carlo sampling

• slow but embarrassingly parallel

Renewals

Use a different supershot for each (gradient) update [Krebs et al., '09]:

$$m_{k+1} := m_k - \gamma_k \nabla f(m_k, \underline{Q}_k) \quad \text{with} \quad \underline{Q}_k := (RM)_k\, Q$$

‣ uses a different random RM for each iteration

‣ cheap, but introduces more "noise" and does not converge...

Stochastic Approximation (SA)

Stochastic "online" gradient descent with mini batches [Bertsekas, '96]:

$$m_{k+1} = m_k - \gamma_k \nabla f(m_k, \underline{Q}_k) \quad \text{with} \quad \underline{Q}_k := (RM)_k\, Q$$

$$\bar{m}_{k+1} = \frac{1}{k+1}\left(\sum_{i=1}^{k} m_i + m_{k+1}\right)$$

‣ averages over the past iterations and converges

‣ reasonably well understood

‣ not understood for (quasi-)Newton methods

source: http://en.wikipedia.org/wiki/Stochastic_gradient_descent
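The SA recipe, fresh random encoding each update plus averaging of the later iterates, can be sketched on a toy linear problem. This is an illustration under assumed sizes, a fixed small step, and a stand-in linear modeling operator A, not the authors' FWI code:

```python
import numpy as np

rng = np.random.default_rng(7)
n_data, n_m = 40, 10
A = rng.standard_normal((n_data, n_m))   # stand-in linearized modeling operator
m_true = rng.standard_normal(n_m)
b = A @ m_true                           # noise-free "observed" data

m = np.zeros(n_m)
m_sum = np.zeros(n_m)
n_iter, gamma = 10000, 1e-4              # small constant step for stability
for k in range(n_iter):
    w = rng.standard_normal(n_data)      # fresh random encoding (renewal)
    r = A @ m - b                        # full residual (linear toy problem)
    g = (w @ r) * (A.T @ w)              # unbiased: E[g] = A.T @ (A m - b)
    m = m - gamma * g
    if k >= n_iter // 2:                 # average only the later iterates
        m_sum += m
m_bar = m_sum / (n_iter - n_iter // 2)
```

Each step uses only one encoded experiment, yet the averaged iterate `m_bar` lands close to `m_true`, which is the convergence behavior the slide attributes to averaging.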

Marmousi model

[Figure: original model and initial model; offset 0-4500 m, depth 0-2500 m, velocity roughly 1.5-5.5 km/s.]

Full-waveform inversion

[Figure: recovered models, l-BFGS vs. stochastic gradient; offset 0-4500 m, depth 0-2500 m.]

Speed up

Full scenario:

‣ 113 sequential shots with 50 frequencies

‣ 18 iterations of l-BFGS (90 = 5 × 18 Helmholtz solves)

Reduced scenario:

‣ 16 randomized simultaneous shots with 4 frequencies

‣ 40 iterations of SA (2.27 = 16 × 4 / (113 × 50) × 40 × 5 equivalent solves)

Speed up of 40×, or > 1 week vs. 8 h on 32 CPUs
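The quoted factor follows directly from counting equivalent Helmholtz solves, normalized by the full 113 shots × 50 frequencies. A quick check of the slide's arithmetic:

```python
# full scenario: 18 l-BFGS iterations at 5 solves each, over all data
full_passes = 5 * 18                             # 90 full-data passes

# reduced scenario: 16 supershots x 4 frequencies, 40 SA iterations, 5 solves,
# expressed as equivalent full-data passes
reduced_passes = 16 * 4 / (113 * 50) * 40 * 5    # about 2.27

speed_up = full_passes / reduced_passes          # about 40
```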


Observations

SAA can be applied to supercharge FWI

But it

• requires care with line searches

• does not extend to Newton methods

• is relatively poorly understood mathematically

What does Compressive Sensing have to offer?


Compressive recovery

Consider “Newton” updates of the reduced system as sparse recovery problems

• feasible because of the reduced system size

• imposes transform-domain sparsity on the updates

• corresponds to sparse linearized inversions

Sparse linearized inversion

Invert the adjoint of the Jacobian, i.e., the linearized Born approximation:

$$\underline{\delta d}_k = \underline{K}[m_k, \underline{Q}]\,\delta m_k \quad \text{with} \quad \underline{\delta d}_k = \mathrm{vec}(\underline{P} - \underline{F}[m_k, \underline{Q}])$$

with the model update $\delta m_k = S^H \tilde{x}$ obtained via sparse inversion:

$$\tilde{x} = \arg\min_x \|x\|_1 \quad \text{subject to} \quad b = Ax,$$

where $A := RM\,K S^H = \underline{K} S^H$ and $b = \underline{\delta d}_k := RM\,\delta d_k$.
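A minimal sketch of this sparse-recovery step on a compressed-sensing toy, using iterative soft thresholding (ISTA) on the Lagrangian form as a stand-in for the equality-constrained ℓ1 problem; the sizes, the Gaussian A, and the threshold heuristic are all assumptions for illustration:

```python
import numpy as np

def ista(A, b, lam, n_iter=3000):
    """Iterative soft thresholding for min_x 0.5*||b - A x||^2 + lam*||x||_1,
    a Lagrangian proxy for min ||x||_1 subject to b = A x."""
    step = 1.0 / np.linalg.norm(A, 2)**2            # 1/L, L = gradient Lipschitz
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))          # gradient step on the misfit
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    return x

# toy: recover a sparse vector from fewer random projections than unknowns
rng = np.random.default_rng(3)
n, k, m_rows = 200, 8, 80
A = rng.standard_normal((m_rows, n)) / np.sqrt(m_rows)
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
b = A @ x_true
x_hat = ista(A, b, lam=0.02 * np.max(np.abs(A.T @ b)))  # heuristic threshold
```

With 80 measurements of an 8-sparse, 200-dimensional vector, the ℓ1 solution recovers `x_true` up to a small shrinkage bias, which is the mechanism behind the "correct amplitudes" claim below.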

Initial model

[Figure: initial model; lateral and depth axes in units of 15 m, velocity 2000-4500 m/s.]

Linearized sparse inversion

30 simultaneous shots, 10 random frequencies

[Figure: true reflectivity vs. sparse recovery with wavelets; lateral and depth axes in units of 15 m.]

Linearized sparse inversion

20 simultaneous shots, 10 random frequencies

[Figure: true reflectivity vs. sparse recovery with wavelets; lateral and depth axes in units of 15 m.]

Linearized sparse inversion

10 simultaneous shots, 5 random frequencies

[Figure: true reflectivity vs. sparse recovery with wavelets; lateral and depth axes in units of 15 m.]

Linearized sparse inversion

Recovery error in dB (errors for "migration" in parentheses):

n'_f/n'_s    | subsample ratio 0.015 | 0.006        | 0.002
5            | 17.44 (1.32)          | 11.66 (0.78) | 6.83 (-0.14)
1            | 17.53 (1.59)          | 11.89 (1.05) | 7.19 (0.15)
0.2          | 18.22 (1.68)          | 12.11 (1.32) | 7.46 (0.27)
Speed up (×) | 66                    | 166          | 500


Observations

Reconstruct model updates

‣ from randomized subsamplings

‣ with correct amplitudes (like Newton updates)

Removed the “curse of dimensionality” by reducing the number of RHS’s

Recovery quality depends on degree of subsampling

What about stochastic optimization?

Reduced batch (10×)

[Figure: gradient update from a reduced batch; lateral and depth axes in units of 15 m.]

Stochastic mini batch (66×)

[Figure: gradient update from a stochastic mini batch; lateral and depth axes in units of 15 m.]

Stochastic average approximation (66×)

[Figure: gradient update from the stochastic average approximation; lateral and depth axes in units of 15 m.]

Stochastic mini batch

Algorithm 1: FWI without renewal of CS experiments.
Result: estimate for the model m̃

m ← m0                                          // initial model
j ← 0                                           // loop counter
Q ← (RM) Q                                      // draw random simultaneous shots
while ‖P − F[m, Q]‖²₂,₂ ≥ ε do
    j := j + 1                                  // increase counter
    A ← K[m, Q] Sᴴ                              // calculate Jacobian
    δP := P − F[m, Q]                           // calculate residual
    δx̃ := argmin_δx ‖δx‖₁ s.t. ‖δP − A δx‖₂ ≤ σ // sparse recovery
    m ← m + Sᴴ δx̃                               // compute model update
end

SA mini batch

Algorithm 2: FWI with renewal of CS experiments.
Result: estimate for the model m̃

m ← m0                                          // initial model
j ← 0                                           // loop counter
Q ← (RM)ⱼ Q                                     // draw random simultaneous shots
while ‖P − F[m, Q]‖²₂,₂ ≥ ε do
    j := j + 1                                  // increase counter
    A ← K[m, Q] Sᴴ                              // calculate Jacobian
    δP := P − F[m, Q]                           // calculate residual
    δx̃ := argmin_δx ‖δx‖₁ s.t. ‖δP − A δx‖₂ ≤ σ // sparse recovery
    m ← m + Sᴴ δx̃                               // compute model update
    Q ← (RM)ⱼ Q                                 // redraw random simultaneous shots
end
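The renewal variant can be sketched end to end on a toy linear problem, where a fixed Gaussian matrix stands in for the Jacobian K, S = I, and a basic iterative soft-thresholding solver stands in for the ℓ1 solver; all sizes and the threshold heuristic are assumptions for illustration:

```python
import numpy as np

def ista(A, b, lam, n_iter=1500):
    """Iterative soft thresholding for min_x 0.5*||b - A x||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2)**2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

rng = np.random.default_rng(5)
n_m, n_d, n_enc = 60, 120, 30          # model size, data size, encoded size
K = rng.standard_normal((n_d, n_m))    # stand-in (fixed, linear) Jacobian; S = I
m_true = np.zeros(n_m)
m_true[rng.choice(n_m, size=5, replace=False)] = 1.0
P = K @ m_true                         # toy linear "data", F[m] = K m

m = np.zeros(n_m)
for j in range(4):                     # outer loop with renewal
    RM = rng.standard_normal((n_enc, n_d)) / np.sqrt(n_enc)  # fresh encoding
    A = RM @ K                         # encoded Jacobian
    dP = RM @ (P - K @ m)              # encoded residual
    lam = 0.05 * np.max(np.abs(A.T @ dP))   # heuristic threshold level
    m = m + ista(A, dP, lam)           # sparse recovery of the model update
```

Each pass works with only 30 encoded rows instead of 120 data rows, yet the sparse updates drive the full residual down, mirroring the with-renewal behavior reported on the next slide.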

Update with sparse recoveries, 10 iterations

[Figure: recovered models with renewal (SNR = 20.8) and without renewal (SNR = 19.9); lateral and depth axes in units of 15 m, velocity 1500-5000 m/s.]

Updates

[Figure: 1st update vs. 9th update; lateral and depth axes in units of 15 m, amplitudes on the order of 1e-5.]


Conclusions

Reconstruct “Newton-like” updates from randomized subsamplings

Remove the “curse of dimensionality”

Algorithms have parallel pathways

Results are encouraging but rigorous theory still lacking

Acknowledgments

This work was in part financially supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (22R81254) and the Collaborative Research and Development Grant DNOISE (334810-05).

This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BP, Petrobras, and WesternGeco. 


Thank you

slim.eos.ubc.ca

Further reading

Compressive sensing:

– Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, by Candès et al., '06.
– Compressed sensing, by D. Donoho, '06.

Simultaneous acquisition:

– A new look at simultaneous sources, by Beasley et al., '98.
– Changing the mindset in seismic data acquisition, by Berkhout, '08.

Simultaneous simulations, imaging, and full-wave inversion:

– Faster shot-record depth migrations using phase encoding, by Morton & Ober, '98.
– Phase encoding of shot records in prestack migration, by Romero et al., '00.
– Efficient seismic forward modeling using simultaneous random sources and sparsity, by N. Neelamani et al., '08.
– Compressive simultaneous full-waveform simulation, by FJH et al., '09.
– Fast full-wavefield seismic inversion using encoded sources, by Krebs et al., '09.
– Randomized dimensionality reduction for full-waveform inversion, by FJH & X. Li, '10.

Stochastic optimization and machine learning:

– A stochastic approximation method, by Robbins and Monro, 1951.
– Neuro-Dynamic Programming, by Bertsekas, '96.
– Robust stochastic approximation approach to stochastic programming, by Nemirovski et al., '09.
– Stochastic Approximation and Recursive Algorithms and Applications, by Kushner and Yin.