Randomized dimension reduction for full-waveform inversion

Felix J. Herrmann, Peyman Moghaddam, and Xiang Li

SLIM, University of British Columbia

Released to public domain under Creative Commons license type BY (https://creativecommons.org/licenses/by/4.0). Copyright (c) 2010 SLIM group @ The University of British Columbia.

Impediments

Full-waveform inversion (FWI) suffers from

• multimodality, i.e., a multitude of velocity models explain the data

• local minima, i.e., requirement of an accurate initial model

• over- and underdeterminacy

Curse of dimensionality for d>2

• requires implicit (Helmholtz) solvers to address bandwidth

• # RHS’s makes computation of gradients prohibitively expensive

[Symes, ‘08]

Wish list

An inversion technology that

• is based on a time-harmonic PDE solver, which is easily parallelizable, and scalable to 3D

• does not require multiple iterations with all data

• removes the linearly increasing costs of implicit solvers for increasing numbers of frequencies & RHS’s

• allows for a dimensionality reduction commensurate with the model's complexity

Key technologies

Numerical linear algebra [Erlangga & Nabben, '08; Erlangga & FJH, '08-'09]

• multi-level Krylov preconditioner for Helmholtz

Simultaneous sources

• supershots

Stochastic optimization & machine learning [Bertsekas, '96]

• stochastic gradient descent

Compressive sensing [Candès et al., Donoho, '06]

• sparse recovery & randomized subsampling

[Beasley, '98; Neelamani et al., '08]

[Krebs et al., '09; Operto et al., '09; FJH et al., '08-'10]

Full-waveform inversion

Single-source / single-frequency PDE-constrained optimization problem:

$$\min_{u \in \mathcal{U},\, m \in \mathcal{M}} \tfrac{1}{2}\|p - Du\|_2^2 \quad \text{subject to} \quad H[m]u = q$$

p = single-source and single-frequency data
D = detection operator
u = solution of the Helmholtz equation
H = discretized monochromatic Helmholtz system
q = unknown seismic source
m = unknown model, e.g. $c^{-2}(x)$

Unconstrained problem

For each separate source q, solve the unconstrained problem:

$$\min_{m \in \mathcal{M}} \tfrac{1}{2}\|p - F[m, q]\|_2^2 \quad \text{with} \quad F[m, q] = DH^{-1}[m]q$$

with q a single source function.

• Gradient updates involve 3 PDE solves

• Increased matrix bandwidth of the Helmholtz discretization makes scaling to 3D challenging for direct methods...
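The gradient of this unconstrained misfit comes from the adjoint-state method; the forward and adjoint solves shown below account for two of the PDE solves per update mentioned above. A minimal sketch, assuming a 1-D Helmholtz discretization with Dirichlet boundaries and $H[m] = L - \omega^2\,\mathrm{diag}(m)$ (the names `helmholtz` and `misfit_and_gradient` are illustrative, not the authors' code):

```python
import numpy as np

def helmholtz(m, omega, h):
    """1-D Helmholtz H[m] = -d^2/dx^2 - omega^2 m on a Dirichlet grid."""
    n = m.size
    L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    return L - omega**2 * np.diag(m)

def misfit_and_gradient(m, q, p, D, omega, h):
    """Adjoint-state gradient: one forward and one adjoint solve."""
    H = helmholtz(m, omega, h)
    u = np.linalg.solve(H, q)                        # forward solve
    r = p - D @ u                                    # data residual
    v = np.linalg.solve(H.conj().T, D.conj().T @ r)  # adjoint solve
    f = 0.5 * np.linalg.norm(r)**2
    g = -omega**2 * np.real(np.conj(v) * u)          # dH/dm_i = -omega^2 e_i e_i^T
    return f, g

# toy setup: "observed" data from a reference model, misfit at a perturbed model
rng = np.random.default_rng(0)
n, omega, h = 50, 4.0, 0.1
m = np.ones(n) + 0.1 * rng.random(n)
q = np.zeros(n); q[5] = 1.0
D = np.eye(n)[[10, 20, 30, 40]]                      # receivers at 4 grid points
p = D @ np.linalg.solve(helmholtz(np.ones(n), omega, h), q)

f, g = misfit_and_gradient(m, q, p, D, omega, h)

# directional finite-difference check of the adjoint-state gradient
dm = rng.standard_normal(n)
eps = 1e-6
fp, _ = misfit_and_gradient(m + eps * dm, q, p, D, omega, h)
fd = (fp - f) / eps                                  # finite-difference slope
gdm = g @ dm                                         # adjoint-state slope
```

The finite-difference check at the end is the standard way to validate such a gradient before plugging it into an optimizer.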

Preconditioner

[Erlangga & Nabben, '08; Erlangga and FJH, '08]

The preconditioners M and Q cluster the spectrum around one: $\sigma(H) \rightarrow \sigma(HM^{-1}) \rightarrow \sigma(HM^{-1}Q)$

• M shifts the eigenvalues to the positive half plane (solves indefiniteness)

• Q clusters the eigenvalues near one (solves ill-conditioning)
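The first effect can be seen on a toy problem. A minimal sketch, assuming the common complex-shifted-Laplacian choice $M = L - (1 - 0.5i)\,k^2 I$ from the shifted-Laplacian literature (not necessarily the exact M of these slides): the preconditioned spectrum lands on the circle $|z - \tfrac{1}{2}| = \tfrac{1}{2}$, entirely in the closed right half plane.

```python
import numpy as np

# 1-D Laplacian (Dirichlet) and an indefinite Helmholtz matrix H = L - k^2 I
n, h, k2 = 80, 1.0 / 80, 200.0
L = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
H = L - k2 * np.eye(n)

# complex-shifted Laplacian preconditioner with damping shift beta = 0.5
M = L - (1.0 - 0.5j) * k2 * np.eye(n)

eigs_H = np.linalg.eigvals(H)                    # indefinite: both signs occur
eigs_HMinv = np.linalg.eigvals(H @ np.linalg.inv(M))
# eigs_HMinv lies on the circle |z - 1/2| = 1/2 in the right half plane
```

Since H and M are both polynomials in L, they share eigenvectors, so each preconditioned eigenvalue is $(\mu - k^2)/(\mu - (1-0.5i)k^2)$ with $\mu$ an eigenvalue of L, which is exactly a point on that circle.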

Scaling

[Figure: number of iterations and number of matrix-vector multiplies vs. frequency (0-30 Hz) for the MKMG and MG preconditioners, and memory use vs. frequency for the direct (LU) and iterative methods on grids of 751 × 201, 1501 × 401, and 2001 × 534 points.]

Example: Marmousi

[Figure: hard model (1500-4000 m/s) and smooth model; x-axis 0-7000 m, depth 0-2500 m.]

Simulations

[Figure: real part of the wavefield u at 10 Hz, computed with 9 and with 18 grid points per wavelength; axes in grid points.]

Convergence

[Figure: number of iterations vs. frequency (0-30 Hz) for the hard and smooth models.]

Observations

An implicit solver that

• converges for high frequencies

• scales & is embarrassingly parallelizable

• has the same order of complexity as time-domain finite differences (TDFD)

• allows trivial implementation of imaging conditions

But its complexity grows linearly with the number of frequencies and sources...

Full-waveform inversion

Multiexperiment PDE-constrained optimization problem:

$$\min_{U \in \mathcal{U},\, m \in \mathcal{M}} \tfrac{1}{2}\|P - DU\|_{2,2}^2 \quad \text{subject to} \quad H[m]U = Q$$

P = total multi-source and multi-frequency data volume
U = solution of the Helmholtz equation
H = discretized multi-frequency Helmholtz system
Q = unknown seismic sources

Simultaneous sources

Randomized superposition of sequential source functions creates a supershot.

[Figure: randomized amplitudes along the shot line defining a simultaneous source; axes in grid points.]

[Beasley, '98; Neelamani et al., '08; Krebs et al., '09; Operto & Virieux, '09-'10; FJH et al., '08-'10]

Simultaneous shot at 5 Hz

[Figure: sequential-source wavefield vs. simultaneous-source wavefield; axes in grid points.]

Notice the increased wavenumber content.

Image

Increased wavenumber content leads to improved image/gradient updates...

[Figure: sequential-source image vs. simultaneous-source image; axes in grid points, amplitudes on the order of 1e-6.]

Supershots

$$RM = \underbrace{\begin{bmatrix} R_{\Sigma_1} \otimes I \\ \vdots \\ R_{\Sigma_{n_s'}} \otimes I \end{bmatrix}}_{\text{sub sampler}} \underbrace{\left(F_2^{*}\,\mathrm{diag}\!\left(e^{i\theta}\right) \otimes I\right) F_3}_{\text{random phase encoder}}, \qquad (3)$$

where $F_2, F_3$ are the 2- and 3-D Fourier transforms, and $\theta = \mathrm{Uniform}([0, 2\pi])$ is a random phase rotation. Notice that $F_2$ and the phase rotations act along the source/receiver coordinates. Application of this CS-sampling matrix, RM, to the original source wavefields in s turns these single shots into a subset ($n'_s \leq n_s$) of time-harmonic simultaneous sources that are randomly phase encoded and that have, for each simultaneous shot, a different set of angular frequencies missing, i.e., there are $n'_f \leq n_f$ frequencies non-zero (see Figure 2(a)). Because seismic data is bandwidth limited, we sample with a probability that is weighted by the power spectrum of the source wavelet. The advantage of this implementation is that it is matrix-free, fast, and it turns interferences into harmless noise (see Figure 2(b)).

The sparsifying transform: Aside from proper CS sampling, the recovery from simultaneous simulations depends on a sparsifying transform that compresses seismic data, is fast, and is reasonably incoherent with the CS sampling matrix. We accomplish this by defining the sparsity transform as the Kronecker product of the 2-D discrete curvelet transform (Candès et al., 2006) along the source-receiver coordinates and the discrete wavelet transform along the time coordinate, i.e., $S := C \otimes W$, with C, W the curvelet- and wavelet-transform matrices, respectively.

[Figure: DQ vs. DQRM, illustrating the reduction to $n'_s \leq n_s$ simultaneous shots.]
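The "random phase encoder" factor of (3) is matrix-free: FFT along the source coordinate, a random phase rotation, and an inverse FFT. A minimal sketch with hypothetical sizes (the frequency subsampling weighted by the source spectrum is omitted; only shot subsampling is shown):

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_s = 32, 48                          # receivers, sequential sources

def phase_encode(S, rng):
    """Matrix-free random phase encoding along the source coordinate:
    ifft( diag(exp(i*theta)) * fft(S) ), theta ~ Uniform[0, 2*pi)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=S.shape[1])
    S_hat = np.fft.fft(S, axis=1)          # FFT over the source axis
    return np.fft.ifft(S_hat * np.exp(1j * theta), axis=1)

S = rng.standard_normal((n_r, n_s))        # stand-in source wavefield matrix
S_enc = phase_encode(S, rng)

# the encoder is unitary, so it preserves energy; subsampling then keeps
# only n_s' = 12 of the encoded simultaneous sources
keep = rng.choice(n_s, size=12, replace=False)
S_sub = S_enc[:, keep]
```

Unitarity is what makes the discarded interference behave like incoherent noise rather than a systematic loss.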

Dimensionality reduction

Full system (single shots):

$$H[m]U = Q \quad \text{with} \quad Q = D_s^\top s$$

Reduced system (simultaneous shots):

$$H[m]\underline{U} = \underline{Q} \quad \text{with} \quad \underline{Q} = D_s^\top RM\, s, \qquad \underline{P} := RM\,P = D\underline{U}$$

$$\min_{\underline{U} \in \mathcal{U},\, m \in \mathcal{M}} \tfrac{1}{2}\|\underline{P} - D\underline{U}\|_2^2 \quad \text{subject to} \quad H[m]\underline{U} = \underline{Q}$$

[Neelamani et al., '08; Krebs et al., '09; FJH et al., '08-'10]

Reduced gradient

Replace gradient updates over all N sequential sources:

$$m_{k+1} := m_k - \gamma_k \sum_{i=1}^{N} \nabla f(m_k, q_i) \quad \text{with} \quad q_i := i\text{th column of } Q$$

by

$$m_{k+1} := m_k - \gamma_k \sum_{i=1}^{N'} \nabla f(m_k, \underline{q}_i) \quad \text{with} \quad \underline{q}_i := (RM)_i\, Q$$

• N' << N number of multifrequency simultaneous experiments

• creates incoherent "Gaussian" simultaneous-source crosstalk

Stochastic optimization

Stochastic "batch" gradient descent [Robbins and Monro, 1951; Bertsekas, '96; Nemirovski, '09]:

$$m_{k+1} := m_k - \gamma_k \frac{1}{n} \sum_{i=1}^{n} \nabla f(m_k, \underline{q}_i) \quad \text{with} \quad \underline{q}_i := (RM)_i\, Q$$

• for n → ∞, the updates become deterministic

• prohibitively expensive
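The reason the updates become deterministic is that Gaussian encodings give unbiased estimates: for a residual matrix R and weights w ~ N(0, I), the encoded misfit $\tfrac{1}{2}\|Rw\|^2$ has expectation $\tfrac{1}{2}\|R\|_F^2$, the full-data misfit. A small Monte Carlo check, assuming iid standard-normal encoding weights and a stand-in residual matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((20, 30))       # stand-in residual matrix P - F[m, Q]

# one Gaussian supershot w gives the encoded misfit 0.5*||R w||^2;
# its expectation over w ~ N(0, I) is the full misfit 0.5*||R||_F^2
n_trials = 20000
W = rng.standard_normal((30, n_trials))
vals = 0.5 * np.sum((R @ W)**2, axis=0) # encoded misfit for each trial
mc = vals.mean()                        # ensemble average over trials
full = 0.5 * np.linalg.norm(R, 'fro')** 2
```

The gap between `mc` and `full` shrinks like the usual Monte Carlo rate, one over the square root of the number of trials.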

Stochastic Average Approximation (SAA)

Approximate the expectation with an ensemble average [Bertsekas, '96; Nemirovski, '09]:

$$\min_m \mathbb{E}\{f(m, \underline{q})\} \approx \min_m \frac{1}{N} \sum_{i=1}^{N} f(m, \underline{q}_i) \quad \text{with} \quad \underline{q}_i := (RM)_i\, Q$$

• for N → ∞ the approximation becomes an equality

• well studied and known as Monte Carlo sampling

• slow but embarrassingly parallel

Renewals

Use a different supershot for each (gradient) update [Krebs et al., '09]:

$$m_{k+1} := m_k - \gamma_k \nabla f(m_k, \underline{Q}_k) \quad \text{with} \quad \underline{Q}_k := (RM)_k\, Q$$

‣ uses a different random RM for each iteration

‣ cheap, but introduces more "noise" and does not converge...

Stochastic Approximation (SA)

Stochastic "online" gradient descent with mini batches [Bertsekas, '96]:

$$m_{k+1} = m_k - \gamma_k \nabla f(m_k, \underline{Q}_k) \quad \text{with} \quad \underline{Q}_k := (RM)_k\, Q$$

$$\bar{m}_{k+1} = \frac{1}{k+1}\left(\sum_{i=1}^{k} m_i + m_{k+1}\right)$$

‣ averages over the past iterations and converges

‣ reasonably well understood

‣ not understood for (quasi-)Newton methods

source: http://en.wikipedia.org/wiki/Stochastic_gradient_descent
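The SA recipe, fresh random encoding each update plus averaging of the later iterates, can be sketched on a toy linear problem. This is an illustration under assumed sizes, a fixed small step, and a stand-in linear modeling operator A, not the authors' FWI code:

```python
import numpy as np

rng = np.random.default_rng(7)
n_data, n_m = 40, 10
A = rng.standard_normal((n_data, n_m))   # stand-in linearized modeling operator
m_true = rng.standard_normal(n_m)
b = A @ m_true                           # noise-free "observed" data

m = np.zeros(n_m)
m_sum = np.zeros(n_m)
n_iter, gamma = 10000, 1e-4              # small constant step for stability
for k in range(n_iter):
    w = rng.standard_normal(n_data)      # fresh random encoding (renewal)
    r = A @ m - b                        # full residual (linear toy problem)
    g = (w @ r) * (A.T @ w)              # unbiased: E[g] = A.T @ (A m - b)
    m = m - gamma * g
    if k >= n_iter // 2:                 # average only the later iterates
        m_sum += m
m_bar = m_sum / (n_iter - n_iter // 2)
```

Each step uses only one encoded experiment, yet the averaged iterate `m_bar` lands close to `m_true`, which is the convergence behavior the slide attributes to averaging.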

Marmousi model

[Figure: original model and initial model; offset 0-4500 m, depth 0-2500 m, velocity roughly 1.5-5.5 km/s.]

Full-waveform inversion

[Figure: recovered models, l-BFGS vs. stochastic gradient; offset 0-4500 m, depth 0-2500 m.]

Speed up

Full scenario:

‣ 113 sequential shots with 50 frequencies

‣ 18 iterations of l-BFGS (90 = 5 × 18 Helmholtz solves)

Reduced scenario:

‣ 16 randomized simultaneous shots with 4 frequencies

‣ 40 iterations of SA (2.27 = 16 × 4 / (113 × 50) × 40 × 5 equivalent solves)

Speed up of 40×, or > 1 week vs. 8 h on 32 CPUs
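The quoted factor follows directly from counting equivalent Helmholtz solves, normalized by the full 113 shots × 50 frequencies. A quick check of the slide's arithmetic:

```python
# full scenario: 18 l-BFGS iterations at 5 solves each, over all data
full_passes = 5 * 18                             # 90 full-data passes

# reduced scenario: 16 supershots x 4 frequencies, 40 SA iterations, 5 solves,
# expressed as equivalent full-data passes
reduced_passes = 16 * 4 / (113 * 50) * 40 * 5    # about 2.27

speed_up = full_passes / reduced_passes          # about 40
```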


Observations

SAA can be applied to supercharge FWI

But it

• requires care with line searches

• does not extend to Newton methods

• is relatively poorly understood mathematically

What does Compressive Sensing have to offer?


Compressive recovery

Consider “Newton” updates of the reduced system as sparse recovery problems

• feasible because of the reduced system size

• imposes transform-domain sparsity on the updates

• corresponds to sparse linearized inversions

Sparse linearized inversion

Invert the adjoint of the Jacobian, i.e., the linearized Born approximation:

$$\underline{\delta d}_k = \underline{K}[m_k, \underline{Q}]\,\delta m_k \quad \text{with} \quad \underline{\delta d}_k = \mathrm{vec}(\underline{P} - \underline{F}[m_k, \underline{Q}])$$

with the model update $\delta m_k = S^H \tilde{x}$ obtained via sparse inversion:

$$\tilde{x} = \arg\min_x \|x\|_1 \quad \text{subject to} \quad b = Ax,$$

where $A := RM\,K S^H = \underline{K} S^H$ and $b = \underline{\delta d}_k := RM\,\delta d_k$.
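A minimal sketch of this sparse-recovery step on a compressed-sensing toy, using iterative soft thresholding (ISTA) on the Lagrangian form as a stand-in for the equality-constrained ℓ1 problem; the sizes, the Gaussian A, and the threshold heuristic are all assumptions for illustration:

```python
import numpy as np

def ista(A, b, lam, n_iter=3000):
    """Iterative soft thresholding for min_x 0.5*||b - A x||^2 + lam*||x||_1,
    a Lagrangian proxy for min ||x||_1 subject to b = A x."""
    step = 1.0 / np.linalg.norm(A, 2)**2            # 1/L, L = gradient Lipschitz
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))          # gradient step on the misfit
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    return x

# toy: recover a sparse vector from fewer random projections than unknowns
rng = np.random.default_rng(3)
n, k, m_rows = 200, 8, 80
A = rng.standard_normal((m_rows, n)) / np.sqrt(m_rows)
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
b = A @ x_true
x_hat = ista(A, b, lam=0.02 * np.max(np.abs(A.T @ b)))  # heuristic threshold
```

With 80 measurements of an 8-sparse, 200-dimensional vector, the ℓ1 solution recovers `x_true` up to a small shrinkage bias, which is the mechanism behind the "correct amplitudes" claim below.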

Initial model

[Figure: initial model; lateral and depth axes in units of 15 m, velocity 2000-4500 m/s.]

Linearized sparse inversion

30 simultaneous shots, 10 random frequencies

[Figure: true reflectivity vs. sparse recovery with wavelets; lateral and depth axes in units of 15 m.]

Linearized sparse inversion

20 simultaneous shots, 10 random frequencies

[Figure: true reflectivity vs. sparse recovery with wavelets; lateral and depth axes in units of 15 m.]

Linearized sparse inversion

10 simultaneous shots, 5 random frequencies

[Figure: true reflectivity vs. sparse recovery with wavelets; lateral and depth axes in units of 15 m.]

Linearized sparse inversion

Recovery error in dB (errors for "migration" in parentheses):

n'_f/n'_s    | subsample ratio 0.015 | 0.006        | 0.002
5            | 17.44 (1.32)          | 11.66 (0.78) | 6.83 (-0.14)
1            | 17.53 (1.59)          | 11.89 (1.05) | 7.19 (0.15)
0.2          | 18.22 (1.68)          | 12.11 (1.32) | 7.46 (0.27)
Speed up (×) | 66                    | 166          | 500


Observations

Reconstruct model updates

‣ from randomized subsamplings

‣ with correct amplitudes (like Newton updates)

Removed the “curse of dimensionality” by reducing the number of RHS’s

Recovery quality depends on degree of subsampling

What about stochastic optimization?

Reduced batch (10×)

[Figure: gradient update from a reduced batch; lateral and depth axes in units of 15 m.]

Stochastic mini batch (66×)

[Figure: gradient update from a stochastic mini batch; lateral and depth axes in units of 15 m.]

Stochastic average approximation (66×)

[Figure: gradient update from the stochastic average approximation; lateral and depth axes in units of 15 m.]

Stochastic mini batch

Algorithm 1: FWI without renewal of CS experiments.
Result: estimate for the model m̃

m ← m0                                          // initial model
j ← 0                                           // loop counter
Q ← (RM) Q                                      // draw random simultaneous shots
while ‖P − F[m, Q]‖²₂,₂ ≥ ε do
    j := j + 1                                  // increase counter
    A ← K[m, Q] Sᴴ                              // calculate Jacobian
    δP := P − F[m, Q]                           // calculate residual
    δx̃ := argmin_δx ‖δx‖₁ s.t. ‖δP − A δx‖₂ ≤ σ // sparse recovery
    m ← m + Sᴴ δx̃                               // compute model update
end

SA mini batch

Algorithm 2: FWI with renewal of CS experiments.
Result: estimate for the model m̃

m ← m0                                          // initial model
j ← 0                                           // loop counter
Q ← (RM)ⱼ Q                                     // draw random simultaneous shots
while ‖P − F[m, Q]‖²₂,₂ ≥ ε do
    j := j + 1                                  // increase counter
    A ← K[m, Q] Sᴴ                              // calculate Jacobian
    δP := P − F[m, Q]                           // calculate residual
    δx̃ := argmin_δx ‖δx‖₁ s.t. ‖δP − A δx‖₂ ≤ σ // sparse recovery
    m ← m + Sᴴ δx̃                               // compute model update
    Q ← (RM)ⱼ Q                                 // redraw random simultaneous shots
end
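The renewal variant can be sketched end to end on a toy linear problem, where a fixed Gaussian matrix stands in for the Jacobian K, S = I, and a basic iterative soft-thresholding solver stands in for the ℓ1 solver; all sizes and the threshold heuristic are assumptions for illustration:

```python
import numpy as np

def ista(A, b, lam, n_iter=1500):
    """Iterative soft thresholding for min_x 0.5*||b - A x||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2)**2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * (A.T @ (A @ x - b))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

rng = np.random.default_rng(5)
n_m, n_d, n_enc = 60, 120, 30          # model size, data size, encoded size
K = rng.standard_normal((n_d, n_m))    # stand-in (fixed, linear) Jacobian; S = I
m_true = np.zeros(n_m)
m_true[rng.choice(n_m, size=5, replace=False)] = 1.0
P = K @ m_true                         # toy linear "data", F[m] = K m

m = np.zeros(n_m)
for j in range(4):                     # outer loop with renewal
    RM = rng.standard_normal((n_enc, n_d)) / np.sqrt(n_enc)  # fresh encoding
    A = RM @ K                         # encoded Jacobian
    dP = RM @ (P - K @ m)              # encoded residual
    lam = 0.05 * np.max(np.abs(A.T @ dP))   # heuristic threshold level
    m = m + ista(A, dP, lam)           # sparse recovery of the model update
```

Each pass works with only 30 encoded rows instead of 120 data rows, yet the sparse updates drive the full residual down, mirroring the with-renewal behavior reported on the next slide.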

Update with sparse recoveries, 10 iterations

[Figure: recovered models with renewal (SNR = 20.8) and without renewal (SNR = 19.9); lateral and depth axes in units of 15 m, velocity 1500-5000 m/s.]

Updates

[Figure: 1st update vs. 9th update; lateral and depth axes in units of 15 m, amplitudes on the order of 1e-5.]


Conclusions

Reconstruct “Newton-like” updates from randomized subsamplings

Remove the “curse of dimensionality”

Algorithms have parallel pathways

Results are encouraging but rigorous theory still lacking

Acknowledgments

This work was in part financially supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (22R81254) and the Collaborative Research and Development Grant DNOISE (334810-05).

This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BP, Petrobras, and WesternGeco. 


Thank you

slim.eos.ubc.ca

Further reading

Compressive sensing:

– Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, by Candès et al., '06.
– Compressed sensing, by D. Donoho, '06.

Simultaneous acquisition:

– A new look at simultaneous sources, by Beasley et al., '98.
– Changing the mindset in seismic data acquisition, by Berkhout, '08.

Simultaneous simulations, imaging, and full-wave inversion:

– Faster shot-record depth migrations using phase encoding, by Morton & Ober, '98.
– Phase encoding of shot records in prestack migration, by Romero et al., '00.
– Efficient seismic forward modeling using simultaneous random sources and sparsity, by N. Neelamani et al., '08.
– Compressive simultaneous full-waveform simulation, by FJH et al., '09.
– Fast full-wavefield seismic inversion using encoded sources, by Krebs et al., '09.
– Randomized dimensionality reduction for full-waveform inversion, by FJH & X. Li, '10.

Stochastic optimization and machine learning:

– A stochastic approximation method, by Robbins and Monro, 1951.
– Neuro-Dynamic Programming, by Bertsekas, '96.
– Robust stochastic approximation approach to stochastic programming, by Nemirovski et al., '09.
– Stochastic Approximation and Recursive Algorithms and Applications, by Kushner and Yin.