The Application of GPU Systems in the
Field of Seismic Data Processing
Xiangyang Zhang, System Engineer, NWGI
2012/03
Reverse Time Migration (RTM) and similar methods can provide more
accurate seismic imaging results
Reverse Time Migration is not widely used
- Limited by its computation and storage requirements
- Can only be run in a supercomputing center
We propose a new parallel computing approach:
- A GPU cluster for rapid 3D RTM computation
- A distributed file system based on an InfiniBand network
Computing capability is tens of times higher than
that of the current PC cluster
GPU-accelerated seismic imaging
Current situation of seismic data processing
GPU is the best choice
Application of GPU in seismic data processing
Our Achievements
Overview
I. Situation of seismic data processing
Resource requirements of seismic data processing methods
Decoding
Preprocessing
Deconvolution
Static correction method
DMO
Data stack
Poststack migration
Prestack time migration
Kirchhoff PSDM
Reverse Time Migration
Further processing
Few large-scale industrial applications because of the computational cost
Current situation of seismic data processing methods
Decoding
Preprocessing
Deconvolution
Static correction method
DMO
Data stack
Poststack migration
Prestack time migration
Kirchhoff PSDM
Reverse Time Migration
Further processing
Using 80% of computing resources
Very few large-scale industrial applications
Computation-intensive methods in seismic imaging:
Asymmetric travel-time Pre-Stack Time Migration
One-way wave equation Pre-Stack Depth Migration
Reverse Time Migration
Full waveform inversion
Features of the above methods:
Intensive computation
High concurrency
Difficult to achieve industrial application
In most computing centers, existing PC clusters cannot meet the
needs of the seismic data processing methods above. We need a new
technique to increase computing power, and the GPU provides a good
solution.
II. GPU is the best choice
A. Powerful computing capability
[Figure: CPU micro architecture versus GPU micro architecture; block labels: Control, Cache, ALU, DRAM]
A large number of ALUs and powerful floating-point computing capability
Well suited to methods characterized by intensive computation, high
concurrency, and large volumes of data
B. Unified Virtual Addressing
Easier to program with a single address space
Reduces unnecessary data copying
Multiple memory spaces (S1070 + CUDA3)
Single address space (S2090 + CUDA4)
[Figure: two GPUs and one CPU, each with its own memory, shown for both addressing models]
C. GPU Direct 2.0
Two copies required (S1070 + CUDA3)
Only one copy required (S2090 + CUDA4)
[Figure: two GPUs and one CPU, each with its own memory]
Fewer memcpy() calls
Lower cost of copying
D. L1 and L2 cache
[Figure: memory hierarchy; each SM (SM-0, SM-1, ..., SM-N) has its own registers, L1 cache, and shared memory (SMEM); the L2 cache and global memory sit below the SMs]
Improves bandwidth
Reduces latency
Coherent data sharing
III. Application of GPU in seismic data processing
Flowchart of Kirchhoff Pre-stack Depth Migration (Kirchhoff PSDM) based on GPU
Receiver (detection point) branch:
Copy the receiver data from CPU memory to GPU memory
Filtering (parallel threads: nx*ny*nw)
Finite difference in X direction (parallel threads: ny*nw)
Finite difference in Y direction (parallel threads: nx*nw)
Compensation of phase error (parallel threads: nw)
Shot branch:
Copy the shot data from CPU memory to GPU memory
Filtering (parallel threads: nx*ny*nw)
Finite difference in X direction (parallel threads: ny*nw)
Finite difference in Y direction (parallel threads: nx*nw)
Compensation of phase error (parallel threads: nw)
Both branches:
Convolution (parallel threads: nw)
Copy imaging result from GPU memory to CPU memory
Execution labels in the flowchart: CPU, GPU
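The thread counts in the flowchart depend only on the grid dimensions. The following is a minimal Python sketch, not the authors' code. It assumes nx and ny are the lateral grid sizes and nw is the number of frequencies (the slide does not define these symbols), pairs each count with a step in the order the slide lists them, and uses a hypothetical grid size.

```python
def thread_counts(nx, ny, nw):
    """Parallel threads per step, paired in the order the slide lists them."""
    return {
        "filtering": nx * ny * nw,
        "finite difference in X": ny * nw,
        "finite difference in Y": nx * nw,
        "phase error compensation": nw,
        "convolution": nw,
    }


# Hypothetical grid: 400 x 300 lateral points and 128 frequencies.
counts = thread_counts(nx=400, ny=300, nw=128)
for step_name, n_threads in counts.items():
    print(f"{step_name}: {n_threads} threads")
```

Under this reading, the filtering step exposes the most parallelism, while the phase compensation and convolution steps expose only nw threads each.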
Flowchart of Reverse Time Migration based on GPU
Flowchart steps (one per box):
Load parameters
Load data of one shot
Regularization of one-shot data
Define the range of the migration aperture
Define the time interval of wavefield extrapolation
Load velocity model
Generate the seismic wavelet
Add random boundary
Load the seismic wavelet and random boundary into GPU memory
Load the regularized shot data and velocity model into GPU memory
Forward-model the seismic wavefield to the maximum time using the finite-difference method (GPU)
Back-propagate the last two wavefields at the same time (GPU)
Back-propagate the single-shot data into the subsurface (GPU)
Imaging condition (GPU)
Finished? (NO: continue the loop)
Copy imaging result from GPU memory to CPU memory
Denoising
Export imaging result
Merge the result data
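The GPU steps of the flowchart (forward modeling to the maximum time, stepping the last two wavefields backward while the shot data is back-propagated, then the imaging condition) can be illustrated with a small 1D example. This is a rough sketch under stated assumptions, not the authors' implementation: it is 1D and pure Python rather than 3D CUDA, it uses fixed zero boundaries in place of the random boundary, it builds synthetic data from a hypothetical two-layer model, and all names and parameters are invented for illustration.

```python
import math


def step(u_now, u_other, r2, idx, value):
    """One leapfrog step of the 1D acoustic wave equation.

    The scheme u_next + u_prev = 2*u_now + r2*laplacian(u_now) + source is
    symmetric in time, so the same function steps forward (u_other is the
    previous field) or backward (u_other is the next field).
    End points are held at zero.
    """
    n = len(u_now)
    out = [0.0] * n
    for i in range(1, n - 1):
        lap = u_now[i + 1] - 2.0 * u_now[i] + u_now[i - 1]
        out[i] = 2.0 * u_now[i] - u_other[i] + r2[i] * lap
    out[idx] += value
    return out


def forward(r2, wavelet, src, rec):
    """Forward-model to the maximum time, keeping only the last two fields."""
    prev = [0.0] * len(r2)
    now = [0.0] * len(r2)
    trace = []
    for s in wavelet:
        prev, now = now, step(now, prev, r2, src, s)
        trace.append(now[rec])
    return prev, now, trace


def rtm(r2, wavelet, data, src, rec):
    """Reverse time migration of one shot with a cross-correlation image."""
    n = len(r2)
    s_prev, s_now, _ = forward(r2, wavelet, src, rec)
    r_now = [0.0] * n
    r_next = [0.0] * n
    image = [0.0] * n
    for t in reversed(range(len(wavelet))):
        # Back-propagate the recorded data from the receiver position.
        r_now, r_next = step(r_now, r_next, r2, rec, data[t]), r_now
        # Imaging condition: zero-lag cross-correlation of the two fields.
        for i in range(n):
            image[i] += s_now[i] * r_now[i]
        # Step the source wavefield backward from its last two fields.
        s_prev, s_now = step(s_prev, s_now, r2, src, wavelet[t]), s_prev
    residual = max(abs(v) for v in s_prev + s_now)
    return image, residual


# Hypothetical test setup: 200 cells of 10 m, 2 ms time step, 15 Hz wavelet.
n, nt, src, rec = 200, 700, 10, 10
dt, dx, f0 = 0.002, 10.0, 15.0
wavelet = []
for k in range(nt):
    a = (math.pi * f0 * (k * dt - 1.0 / f0)) ** 2
    wavelet.append((1.0 - 2.0 * a) * math.exp(-a))
r2_smooth = [(2000.0 * dt / dx) ** 2] * n
r2_true = [((2000.0 if i < 120 else 3000.0) * dt / dx) ** 2 for i in range(n)]

# Synthetic data: response of the layered model minus the direct wave.
_, _, observed = forward(r2_true, wavelet, src, rec)
_, _, direct = forward(r2_smooth, wavelet, src, rec)
data = [o - d for o, d in zip(observed, direct)]

image, residual = rtm(r2_smooth, wavelet, data, src, rec)
print("source wavefield left after stepping back to time zero:", residual)
print("image peak at grid index:", max(range(n), key=lambda i: abs(image[i])))
```

Stepping the source wavefield backward from its last two snapshots avoids storing every time step, which is the reason a scheme of this kind needs so little memory per shot. The residual printed at the end measures how closely the backward steps return to the zero initial state.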
[Figure: the Reverse Time Migration flowchart repeated, with one region highlighted and labeled "The main part of the data exchange in RTM"]
Characteristics of the above methods:
High imaging accuracy
Enormous computation
High concurrency
Large amount of data exchange
Needs of the above methods:
Large computing capability
High-speed network
High-throughput storage devices
A. Large computing capability
PC cluster
Traditional development approach; Reverse Time Migration and similar
methods can only be run in a supercomputing center
GPU cluster
New development approach; delivers more computing power than a PC
cluster at lower cost
GPU computing capability is improving very quickly and meets the
needs of computation-intensive methods well. Our institute has used
both Tesla S1070 and Tesla S2090 GPU systems.
[Figure: double-precision GFLOPS per watt by GPU generation (T10, Fermi, Kepler, Maxwell); horizontal axis 2007, 2009, 2011, ……; vertical axis 2 to 16]
Running environment of S1070 and CUDA3:
Migration runs independently on a single GPU card
Available GPU memory is up to 4 GB
Imaging accuracy is restricted by the memory capacity
Memory usage is reduced by optimization
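A rough estimate shows how GPU memory capacity limits the size of a 3D grid. This is a sketch only: single-precision values and five resident arrays per grid point (for example wavefields, velocity, and image) are assumptions, not figures from the slides.

```python
def max_grid_points(memory_gb, arrays=5, bytes_per_value=4):
    """Largest number of grid points whose arrays fit in memory_gb (GiB)."""
    return (memory_gb * 1024**3) // (arrays * bytes_per_value)


def cube_edge(points):
    """Edge length of the largest cube holding at most `points` grid points."""
    edge = round(points ** (1.0 / 3.0))
    while edge**3 > points:
        edge -= 1
    while (edge + 1) ** 3 <= points:
        edge += 1
    return edge


# 4 GB is the S1070 limit and 6 GB the S2090 limit quoted in the slides.
for gb in (4, 6):
    pts = max_grid_points(gb)
    print(f"{gb} GB: {pts} grid points, a cube of about {cube_edge(pts)} per side")
```

With these assumptions a 4 GB card holds a cube of roughly 598 points per side, so a finer grid or a larger aperture quickly exceeds the card.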
Usage of the GPU memory
[Figure: CUDA memory hierarchy; the grid contains Block(0,0) with shared memory; Thread(0,0) and Thread(0,1) each have registers and local memory; global, constant, and texture memory serve the whole grid]
Registers: very few, so register usage must be controlled and utilization improved
Global memory: slow, but big enough to hold the imaging result
Constant memory: small, but cached and read-only; can be used to pass parameters
Texture memory: cached; can be used to hold the velocity file
Running environment of S2090 and CUDA4:
Unified Virtual Addressing
GPU Direct 2.0
Available GPU memory is up to 6GB
Accuracy of Imaging will not be restricted by
the capacity of memory
[Figure: GPU portion of the Reverse Time Migration flowchart: forward modeling of the seismic wavefield to the maximum time using the finite-difference method; back propagation of the last two wavefields at the same time; back propagation of the single-shot data into the subsurface; imaging condition; "Finished?" decision]
Configuration of the number of CPUs and GPUs
The number of CPU cores must be greater than the number of GPU cards
The number of CPUs must be sufficient to support enough memory
B. High-efficiency network
InfiniBand network provides higher bandwidth
InfiniBand network is more efficient
RDMA technology:
Remote Direct Memory Access (RDMA) can improve throughput and
performance because it frees up host CPU resources
C. High-throughput storage equipment
Denoising
Export imaging result
Merging the result data
The speed of merging the result data depends directly on the
read-write speed of the storage equipment; the amount of data is
usually about 10 TB.
Memory capacity
Memory capacity determines how many times the result data must be
scanned, and therefore the final amount of data read and written.
We want the memory to be big enough to hold all the result data,
which makes merging more efficient by reducing the number of data
consolidation passes.
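The relationship between memory capacity and the number of scans can be sketched with a simple model. The model is an assumption, not the authors' merge algorithm: each pass holds as much result data as fits in memory, so the number of passes is the data size divided by memory, rounded up. The 10 TB, 96 GB per node, and 2304 GB total figures come from the slides.

```python
import math


def merge_passes(result_tb, memory_gb):
    """Passes over the result data when each pass holds memory_gb of it."""
    result_gb = result_tb * 1000  # decimal terabytes to gigabytes
    return math.ceil(result_gb / memory_gb)


print("passes with 2304 GB of memory:", merge_passes(10, 2304))
print("passes with 96 GB of memory:", merge_passes(10, 96))
```

Under this model, 10 TB of result data needs 5 passes with 2304 GB of memory but 105 passes with a single 96 GB node, which is why larger memory reduces the volume of storage traffic.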
GPU cluster architecture of our institute
Hardware configuration:
CPU: Dual-CPU (24*2*6 = 288 cores)
GPU: S2090 GPU (24*4*512 = 49152 cores)
Network: InfiniBand (RDMA technology)
Storage: Lustre file system
Memory: 2304 GB (96 GB*24 = 2304 GB)
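The totals in the hardware configuration follow from the per-node figures. The sketch below assumes the factors mean 24 nodes, 2 CPUs per node with 6 cores each, and 4 GPUs per node with 512 cores each; the slide gives the factors but does not name them.

```python
nodes = 24
cpus_per_node, cores_per_cpu = 2, 6      # assumed meaning of 24*2*6
gpus_per_node, cores_per_gpu = 4, 512    # assumed meaning of 24*4*512
memory_per_node_gb = 96

cpu_cores = nodes * cpus_per_node * cores_per_cpu
gpu_cores = nodes * gpus_per_node * cores_per_gpu
memory_gb = nodes * memory_per_node_gb
print(f"CPU cores: {cpu_cores}, GPU cores: {gpu_cores}, memory: {memory_gb} GB")
```

The products are 288 CPU cores, 49152 GPU cores, and 2304 GB of memory.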
Software configuration:
CUDA 4.0
Unified Virtual Addressing
GPU Direct 2.0
IV. Our Achievements
A. Comparison of one-way wave equation pre-stack depth migration between PC cluster and GPU cluster
Item | S1070 GPU cluster (12 nodes) | PC cluster (60 nodes)
Software configuration | CUDA3; 4 threads per node | C++; 8 threads per node
GPU | S1070 | -
Memory | 24 GB | 16 GB
CPU | One CPU (Intel(R) Xeon(R) X5570, 2.93 GHz) | Dual-CPU (Intel(R) Xeon(R) E5440, 2.83 GHz)
Network | Gigabit | Gigabit
Storage | Scale-out NAS | Scale-out NAS
Number of shots | 29328 | 30678
Total time | 190.54 hours | 402 hours
Average time per shot | 0.39 minutes | 0.786 minutes
Test data (both systems): area 300 km2; amount of data 487 Gb; interval 2 ms; fold 120; length of trace gather 6000 ms; cell 25 m x 25 m
The computing power of a GPU node is about 10 times that of a PC node
B. Comparison of Reverse Time Migration between PC cluster and GPU cluster
Item | S1070 GPU cluster (12 nodes) | PC cluster (60 nodes)
Software configuration | CUDA3; 4 threads per node | C++; 8 threads per node
GPU | S1070 | -
Memory | 24 GB | 16 GB
CPU | One CPU (Intel(R) Xeon(R) X5570, 2.93 GHz) | Dual-CPU (Intel(R) Xeon(R) E5440, 2.83 GHz)
Network | Gigabit | Gigabit
Storage | Scale-out NAS | Scale-out NAS
Number of shots | 29328 | 30678
Total time | 205 hours | 1008 hours
Average time per shot | 0.42 minutes | 1.97 minutes
Test data (both systems): area 300 km2; amount of data 487 Gb; interval 2 ms; fold 120; length of trace gather 6000 ms; cell 25 m x 25 m
The computing power of a GPU node is about 23 times that of a PC node
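The per-node figures for comparisons A and B can be reproduced from the average time per shot and the node counts. The formula below is an assumed reading of how the figures were derived (node-minutes per shot on the PC cluster divided by node-minutes per shot on the GPU cluster); the slides do not state it.

```python
def per_node_speedup(gpu_min_per_shot, gpu_nodes, pc_min_per_shot, pc_nodes):
    """Ratio of node-minutes per shot: PC cluster over GPU cluster."""
    return (pc_min_per_shot * pc_nodes) / (gpu_min_per_shot * gpu_nodes)


one_way = per_node_speedup(0.39, 12, 0.786, 60)
reverse_time = per_node_speedup(0.42, 12, 1.97, 60)
print(f"one-way wave equation PSDM: {one_way:.1f}x per node")
print(f"Reverse Time Migration: {reverse_time:.1f}x per node")
```

This gives about 10.1 for the one-way method and about 23.5 for Reverse Time Migration, consistent with the quoted 10 and 23.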
C. Comparison of Reverse Time Migration between different GPU clusters
Item | S1070 GPU cluster (12 nodes) | S2090 GPU cluster (12 nodes)
Software configuration | CUDA3; 4 threads per node | CUDA 4; 4 threads per node
GPU | S1070 | S2090
Memory | 24 GB | 96 GB
CPU | One CPU (Intel(R) Xeon(R) X5570) | Dual-CPU (Intel(R) Xeon(R) CPU X5570)
Network | Gigabit | InfiniBand
Storage | Lustre file system | Lustre file system
Average time per shot | 0.159 minutes | 0.087 minutes
Test data (both systems): area 400 km2; amount of data 159 Gb; interval 2 ms; fold 70; length of trace gather 6000 ms; cell 25 m x 25 m; number of shots 11520; depth of extrapolation 5000; time of extrapolation 0.4 ms
The computing power of the S2090 cluster is about 1.83 times that of the S1070 cluster
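The 1.83 figure is the ratio of the two average times per shot. The sketch below also estimates the total run time for all 11520 shots. That estimate assumes total time equals the number of shots multiplied by the average time per shot; the slides do not report totals for this test.

```python
shots = 11520
minutes_per_shot = {"S1070": 0.159, "S2090": 0.087}

speedup = minutes_per_shot["S1070"] / minutes_per_shot["S2090"]
total_hours = {name: shots * m / 60.0 for name, m in minutes_per_shot.items()}

print(f"S2090 over S1070: {speedup:.2f}x")
for name, hours in total_hours.items():
    print(f"{name}: about {hours:.1f} hours for {shots} shots")
```

Under that assumption the full survey takes about 30.5 hours on the S1070 cluster and about 16.7 hours on the S2090 cluster.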
Comparison of Reverse Time Migration in a survey area
[Figure: imaging results of one-way wave equation pre-stack depth migration and Reverse Time Migration]
Conclusion
The RTM method can provide more accurate seismic imaging results
A GPU system makes it easier for the RTM method to reach
industrial application