Professional CAE Product Development | GTC...
By: Veenus A V, Associate GM & Lead, NeST-NVIDIA Center for GPU Computing, Trivandrum, India
Office: NeST/SFO Technologies, San Jose, CA, www.nestsoftware.com
veenusav @ gmail.com
“Do not simply believe in anything because you have heard it.
› Even if I have told it!
Believe only after you observe and analyze.”
Reference: Anguttara Nikaya, Vol 1, 188-193
Sri Buddha
Application Architecture policies
› Scientific Visualization
Software blends with the platform
Demands of modern users
Proof of Concept >> Product
We are showing a few technical experiments for your understanding.
Not a PRODUCT demonstration!
Pre Solver Post
The data structures are plain to process
› Maybe a few arrays. An undergraduate can understand all of these in plain form.
Graphics is not that vast
› Compared to a typical game, it is a simple deal. Right?!!
Slightly rough results? Users will adjust!
Let me explain our background before continuing.
It will reveal how we are proceeding!
We make specific software solutions for your scientific needs.
We are specialized in engineering software development.
NeST-NVIDIA center for GPU computing
› Lab specifically for GPU based technologies
› Inaugurated by Dr. Bill Dally, Chief Scientist, NVIDIA
How to architect software for future hardware and software platforms
Proof-of-concept to Product
Not giving emphasis on:
› Features of the applications
› Algorithms
Pre Solver Post
In focus: Scientific data visualization
[Figure: pre-processor, solver and post-processor pipeline. Blocks: shapes and geometry; analysis model (boundary and other parameters); outer surfaces; volume (v or thd); multi-physics solver; results volume (tensor); display frame (image).]
[Figure (second build): the same pipeline, adding a known model (expert system db, in some cases) and historical experiment results, e.g. for an inverse modeling process.]
[Figure (workstation PC build): the same pipeline with the volume as tetrahedra, annotated with data sizes: 48.8 KB, 591 MB, 5.6 GB, 154.6 GB x 10, and 2.5 TB and 3.2 GB around the results volume; PC display frame 1920 x 1080: 5.93 MB; tablet display frame 1080 x 720: 2.22 MB.]
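The two display-frame sizes in the figure (5.93 MB at 1920 x 1080, 2.22 MB at 1080 x 720) are consistent with uncompressed 24-bit RGB frames measured in mebibytes. A quick check, assuming 3 bytes per pixel and 1 MB = 2^20 bytes (our reading of the figure, not stated on the slide):

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame; 3 bytes/pixel assumes 24-bit RGB."""
    return width * height * bytes_per_pixel


def to_mib(n_bytes: int) -> float:
    """Bytes to mebibytes (2**20 bytes), the unit assumed to match the figure."""
    return n_bytes / 2**20


pc = frame_bytes(1920, 1080)     # 6,220,800 bytes
tablet = frame_bytes(1080, 720)  # 2,332,800 bytes
print(f"PC frame: {to_mib(pc):.2f} MB, tablet frame: {to_mib(tablet):.2f} MB")
```

Next to the gigabyte- and terabyte-scale volumes upstream, a frame is tiny, which is one reason shipping rendered frames to a thin client can be cheaper than shipping the data.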
[Figure: workstation PC data paths. Components: CPU cores; CPU RAM (DDR3); HDD and SSD (SATA); motherboard bus; PCI Express interface; Gigabit Ethernet interface to a fast local network (intranet user on a tablet) and the internet (remote user on a tablet or browser); GPU global memory (GDDR5); GPU texture memory and 2D cache; GPU cores with shared memory. Link bandwidths shown: 10 GB/s, 340 MB/s, 12 GB/s, 5.3 GB/s, 42 GB/s, 350~550 MB/s, 70~130 MB/s.]
Algorithms are good.
Mathematics has been doing fine for centuries…
› Newton’s laws and Maxwell's equations still hold good.
Proofs of concept might be the best in the world!
The data structures are plain to process
› Maybe a few arrays. An undergraduate can understand all of these in plain form.
Graphics is not that vast
› Compared to a typical game, it is a simple deal. Right?!!
Slightly rough results? Users will adjust!
A popular myth: PCI Express cannot feed data to the monitor fast enough!
PCI Express can give a good frame rate if your data is ready in CPU memory
› Many such points emerge when you closely watch the platform facts..
› GPU for FLOPS only…
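A back-of-envelope bound supports the PCI Express point: if a finished frame already sits in CPU memory, the link only has to carry a few megabytes per frame. The sketch below assumes an effective PCI Express throughput of 4 GB/s (a hypothetical figure, not one taken from the slides) and ignores latency, synchronization and driver overhead.

```python
def max_frame_rate(frame_size_bytes: int, link_bytes_per_s: float) -> float:
    """Upper bound on frames/s when each frame crosses the link exactly once."""
    return link_bytes_per_s / frame_size_bytes


FRAME = 1920 * 1080 * 3   # one uncompressed 24-bit RGB full-HD frame
ASSUMED_PCIE = 4e9        # assumption: hypothetical effective PCIe throughput, bytes/s

fps = max_frame_rate(FRAME, ASSUMED_PCIE)
print(f"transfer-bound frame rate: {fps:.0f} frames/s")
```

With these assumptions the bound is about 643 frames/s, an order of magnitude above a 60 Hz display, so the transfer alone is rarely the bottleneck; producing the frame usually is.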
GPU means more FLOPS/$ and more FLOPS per unit of real estate.
Use GLSL for graphics (Shader Model 5.0 gives you freedom in mesh quality too!)
CUDA syntax is simple; do data-flow analysis for maximum throughput.
But don’t forget to juice your CPU too!
Do offline processing before the graphics viewer › Even if it means letting your user have a coffee before starting the analysis!
› Extra data: mind HDD space and transfer rate
Spatially order the data › the viewer will seek it that way.
› Processor wait means DELAY! Exploit 2D locality of reference.
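One common way to spatially order 2D data so that nearby samples sit close together in memory is a Morton (Z-order) layout. This is a swapped-in illustration of the idea, not necessarily the ordering used in the talk: the index interleaves the bits of x and y, so each aligned 2x2, 4x4, ... block of cells is stored contiguously.

```python
def part1by1(n: int) -> int:
    """Spread the low 16 bits of n so a zero bit sits between each original bit."""
    n &= 0x0000FFFF  # assumption: coordinates fit in 16 bits
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton2d(x: int, y: int) -> int:
    """Z-order index: bits interleaved as ... y1 x1 y0 x0."""
    return part1by1(x) | (part1by1(y) << 1)


# Visit the cells of a 4x4 tile along the Z-curve instead of row by row.
cells = sorted(((x, y) for y in range(4) for x in range(4)), key=lambda c: morton2d(*c))
print(cells[:4])  # the first 2x2 block comes first
```

Storing a tile in this order means a viewer that reads a small 2D neighbourhood touches a short, contiguous range of memory instead of several distant rows.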
Make an LoD (level-of-detail) arrangement › Users want response, not always ‘details’!
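A level-of-detail arrangement can be as simple as a mipmap-style pyramid in which each level halves the resolution; the viewer then picks the coarsest level that still gives at least one sample per screen pixel. The pyramid layout and the selection rule below are assumptions for illustration:

```python
def lod_level(data_extent: int, screen_extent: int, max_level: int) -> int:
    """Return the coarsest pyramid level whose extent still covers the screen.

    Assumption: level 0 is full resolution and level k holds
    data_extent >> k samples along the axis (mipmap-style pyramid).
    """
    if screen_extent <= 0:
        raise ValueError("screen_extent must be positive")
    level = 0
    # Step down while the next level still has >= 1 sample per screen pixel.
    while level < max_level and (data_extent >> (level + 1)) >= screen_extent:
        level += 1
    return level


print(lod_level(8192, 1920, max_level=10))  # full-HD monitor
print(lod_level(8192, 1080, max_level=10))  # tablet
```

For an 8192-sample axis, both screens are served from level 2 (2048 samples per axis, a sixteenth of the full 2D data), while full resolution stays available when the user zooms in.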
Maximum parallelism: keep every warp full, threads > cores
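On the host side, keeping warps full mostly means choosing a block size that is a multiple of the warp size (32 threads on NVIDIA GPUs) and launching enough blocks to cover every element, which for large data yields far more threads than cores. A minimal sketch of that arithmetic; the default of 256 threads per block is an assumption, not a tuned value:

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def launch_config(n_items: int, threads_per_block: int = 256) -> tuple[int, int]:
    """Return (blocks, threads_per_block) for one thread per item.

    Blocks consist of whole warps; only the last block can hold idle
    threads, which the kernel must skip with an `index < n_items` check.
    threads_per_block=256 is an assumed default, not a tuned value.
    """
    if threads_per_block <= 0 or threads_per_block % WARP_SIZE != 0:
        raise ValueError("threads_per_block must be a positive multiple of the warp size")
    blocks = (n_items + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


blocks, tpb = launch_config(1920 * 1080)  # one thread per full-HD pixel
print(blocks, tpb, blocks * tpb)
```

In CUDA C these two numbers would become the `<<<blocks, threads>>>` launch parameters of the kernel.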
Compute only for the target device and screen › Higher resolution is not always needed.
› Users want responsive software
› The pixel shader is your time eater: mind the resolution of the RT
› When the GPU is also used for other compute, decide these based on real response metrics. Do 2D bicubic interpolation instead.
› Texelize… texelize…
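The 2D bicubic step can be sketched on the CPU with the cubic convolution kernel at a = -0.5 (Catmull-Rom). The kernel choice and the clamp-to-edge border are our assumptions; on the GPU the same weights would be evaluated in a shader over a lower-resolution texture.

```python
import math


def cubic_weight(t: float, a: float = -0.5) -> float:
    """Cubic convolution kernel; a = -0.5 (assumed) gives Catmull-Rom."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t**3 - (a + 3.0) * t**2 + 1.0
    if t < 2.0:
        return a * t**3 - 5.0 * a * t**2 + 8.0 * a * t - 4.0 * a
    return 0.0


def bicubic_sample(img: list[list[float]], x: float, y: float) -> float:
    """Sample img[row][col] at fractional (x, y) from its 4x4 neighbourhood.

    Assumption: indices outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for j in range(y0 - 1, y0 + 3):
        wy = cubic_weight(y - j)
        row = img[min(max(j, 0), h - 1)]
        for i in range(x0 - 1, x0 + 3):
            total += wy * cubic_weight(x - i) * row[min(max(i, 0), w - 1)]
    return total


# Illustrative 2x upscale of a tiny low-resolution render.
small = [[0.0, 1.0], [1.0, 0.0]]
big = [[bicubic_sample(small, x / 2, y / 2) for x in range(4)] for y in range(4)]
```

At integer positions the kernel weights collapse to 1 and 0, so original pixels are reproduced exactly; between them the result is smoother than bilinear filtering.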
Read-only data: knowing that data is read-only gives the GPU cache freedom…
› Use asynchronous execution to the maximum
The processor is not the only ‘active’ component on the board!
Use CUDA streams or switching of textures…
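Why streams pay off can be seen with a simple two-stage pipeline model: if the data is split into chunks, the copy of chunk k+1 can overlap the kernel working on chunk k. The model assumes one copy engine and one compute engine running concurrently and ignores launch overhead; the timings are hypothetical.

```python
def serial_time(n_chunks: int, t_copy: float, t_compute: float) -> float:
    """Copy a chunk, then compute on it, one chunk after another (no overlap)."""
    return n_chunks * (t_copy + t_compute)


def overlapped_time(n_chunks: int, t_copy: float, t_compute: float) -> float:
    """Two-stage pipeline: the copy of chunk k+1 runs while chunk k is computed.

    Assumes one copy engine and one compute engine working concurrently.
    After the first chunk fills the pipeline, the slower stage sets the pace.
    """
    if n_chunks <= 0:
        return 0.0
    return t_copy + t_compute + (n_chunks - 1) * max(t_copy, t_compute)


# Hypothetical timings in ms: 8 chunks, 2 ms to copy and 3 ms to process each.
print(serial_time(8, 2.0, 3.0), overlapped_time(8, 2.0, 3.0))  # 40.0 26.0
```

As the chunk count grows, the overlapped time approaches n x max(copy, compute). In CUDA this overlap requires asynchronous copies from pinned host memory issued into separate streams.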
It's time for BYOD (bring your own device)
Do watch software systems on specific platforms
› For googling: Kepler GRID, cloud gaming
Volume viewer – voxelized data
Geometric Editor – Mesh can be perfect!
Preparation for solver: inverse modeling with GPU (platform work only)
Remote visualization for post processor
[Video]
Volume resolution and dimensions
› Avoid empty spaces
› Bricking
› Compression
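Bricking and empty-space skipping can be illustrated together: split the volume into small cubic bricks and keep only the bricks that contain at least one voxel above a threshold, so empty bricks are never uploaded or ray-marched. A minimal sketch assuming a flat, x-fastest voxel array; the brick size and threshold are illustrative:

```python
from itertools import product


def nonempty_bricks(volume, dims, brick, threshold=0.0):
    """Return origins (x, y, z) of bricks holding any voxel above threshold.

    Assumption: volume is a flat sequence indexed as (z * ny + y) * nx + x
    (x fastest). Bricks at the upper edges may be smaller than `brick`.
    """
    nx, ny, nz = dims
    kept = []
    for bz, by, bx in product(range(0, nz, brick), range(0, ny, brick), range(0, nx, brick)):
        occupied = any(
            volume[(z * ny + y) * nx + x] > threshold
            for z in range(bz, min(bz + brick, nz))
            for y in range(by, min(by + brick, ny))
            for x in range(bx, min(bx + brick, nx))
        )
        if occupied:
            kept.append((bx, by, bz))
    return kept


dims = (8, 8, 8)
vol = [0.0] * (8 * 8 * 8)
vol[(6 * 8 + 1) * 8 + 5] = 1.0  # a single non-empty voxel at (x=5, y=1, z=6)
print(nonempty_bricks(vol, dims, brick=4))  # only 1 of the 8 bricks survives
```

Only the surviving bricks need to be stored and uploaded; each brick can then be compressed independently.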
Quality Graphics demanded
› Phong SM
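For reference, the Phong reflection model named on the slide sums ambient, diffuse and specular terms, evaluated per pixel. The sketch below is a scalar CPU version; the coefficients ka, kd, ks and the shininess exponent are illustrative values, and in practice this would live in a GLSL fragment shader.

```python
import math

Vec = tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def phong(normal: Vec, to_light: Vec, to_eye: Vec,
          ka: float = 0.1, kd: float = 0.7, ks: float = 0.2,
          shininess: float = 32.0) -> float:
    """Scalar Phong intensity: ambient + diffuse + specular.

    The default coefficients are illustrative assumptions, not tuned values.
    """
    n, l, v = normalize(normal), normalize(to_light), normalize(to_eye)
    n_dot_l = dot(n, l)
    diffuse = max(n_dot_l, 0.0)
    specular = 0.0
    if n_dot_l > 0.0:
        # Mirror the light direction about the normal: r = 2 (n . l) n - l
        r = (2.0 * n_dot_l * n[0] - l[0],
             2.0 * n_dot_l * n[1] - l[1],
             2.0 * n_dot_l * n[2] - l[2])
        specular = max(dot(r, v), 0.0) ** shininess
    return ka + kd * diffuse + ks * specular


print(phong((0.0, 0.0, 1.0), (0.0, 1.0, 1.0), (0.0, 0.0, 1.0)))
```

The specular term is skipped when the light is behind the surface, so back-facing light contributes only the ambient term.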
[Video]
Algorithm based on Laplacian
The operations involved are as follows: select a region of interest (ROI) in the mesh on the screen,
draw a sketch on the screen suggesting the edited region of the mesh,
and the model is reshaped to fit the curve while still retaining its shape.
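Laplacian-based editing typically encodes each vertex by its differential (Laplacian) coordinate, its offset from the average of its neighbours, and then solves for new positions that keep these offsets while meeting the user's sketch constraints. The sketch below computes only the uniform-weight coordinates (cotangent weights are a common alternative); the constrained solve is not shown, and the small mesh is a made-up example.

```python
Vec3 = tuple[float, float, float]


def laplacian_coordinates(vertices: list[Vec3], neighbours: dict[int, list[int]]) -> list[Vec3]:
    """Uniform ('umbrella') Laplacian: delta_i = v_i - mean of its 1-ring neighbours."""
    deltas = []
    for i, v in enumerate(vertices):
        ring = neighbours[i]
        mean = [sum(vertices[j][k] for j in ring) / len(ring) for k in range(3)]
        deltas.append((v[0] - mean[0], v[1] - mean[1], v[2] - mean[2]))
    return deltas


# Hypothetical example mesh: an apex above a square ring of four vertices.
verts = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
ring = {0: [1, 2, 3, 4], 1: [0, 2, 4], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 1, 3]}
print(laplacian_coordinates(verts, ring)[0])  # the apex sits 1 unit above its ring
```

Because the deltas do not change when the whole mesh is translated, they describe local shape; the editing step tries to preserve them (approximately) while the ROI vertices follow the sketched curve.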
Mapping 2D edge tracking to 3D was a challenge
Used a modified form of classic CPU algorithms.
› Doing it on the GPU was difficult
› Created regular triangles on the fly to give a neat result
Same area, so isosceles or equilateral
To make the model, real-world data is used
Huge data inputs › Point cloud, volumetric, high data rate
Inverse modeling techniques are used by the preparatory algorithm
SVD to discard non-significant information
Challenge: partial volume correlation
Volume division optimized for maximum threads on the GPU and for MPI
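One simple way to divide a volume across MPI ranks is to cut it into contiguous z-slabs whose thickness differs by at most one slice; each rank then launches one GPU thread per voxel of its slab. The even-split rule is an assumption for illustration, not the partitioning used in the talk:

```python
def slab_ranges(nz: int, n_ranks: int) -> list[tuple[int, int]]:
    """Split nz z-slices into n_ranks contiguous [start, stop) slabs.

    Assumption: an even split along z; slab thicknesses differ by at most
    one slice, so no rank waits long for the others.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(nz, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks take one more slice
        ranges.append((start, start + size))
        start += size
    return ranges


print(slab_ranges(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

In a real solver each slab would also need ghost (halo) slices from its neighbours for stencil operations; that exchange is omitted here.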
Model the control flow (limit) as per the locality heuristics (expert system with direction vectors)
Always handle the border separately (good for the processor)
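Handling the border separately keeps the hot interior loop free of bounds checks (on a GPU it also keeps threads in a warp from diverging on edge cases). A CPU sketch on a 5-point Laplacian stencil; the clamp-to-edge boundary rule is an assumption:

```python
def laplacian_5pt(grid: list[list[float]]) -> list[list[float]]:
    """5-point Laplacian with the interior and the one-cell border in separate passes."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]

    # Interior pass: all four neighbours exist, so no bounds checks are needed.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (grid[y - 1][x] + grid[y + 1][x]
                         + grid[y][x - 1] + grid[y][x + 1] - 4.0 * grid[y][x])

    def at(yy: int, xx: int) -> float:
        # Assumed boundary rule: clamp-to-edge, used only by the border pass.
        return grid[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    # Border pass: visit only the outer ring of cells.
    border = {(y, x) for y in (0, h - 1) for x in range(w)}
    border |= {(y, x) for y in range(h) for x in (0, w - 1)}
    for y, x in border:
        out[y][x] = at(y - 1, x) + at(y + 1, x) + at(y, x - 1) + at(y, x + 1) - 4.0 * grid[y][x]
    return out


ramp = [[float(x) for x in range(4)] for _ in range(4)]
print(laplacian_5pt(ramp)[1])
```

The interior of a linear ramp gives zero, as expected; non-zero values appear only on the left and right edges, where the clamp rule flattens the ramp.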
Each module may not be that fast! › Win the war, not every battle!
Users demand BYOD
Not all features, but a subset
KEPLER GRID: the most awaited hardware
Features › HTML5 client
› Stream-based server
› From an LoD-based ray-caster viewer to NVIDIA Iray
› Served on a GPU cluster
Challenges › Time-to-market: conversion of the existing engine
› Multi-user support and faster data speed
Proof-of-concept level complexities: algorithm-level research
Development process: how to manage projects that involve scientific stuff and new platform challenges
Test automation architecture
Deployment scenarios and hardware tune-up at the final stage (it is always a fact!)
Remember the Kalama Sutta…
› Your questions may transform my thinking…
Please ask, even after the session.
www.nestsoftware.com
Do write to us with technical and business queries.
› Speaker: veenusav @ gmail.com
› Website: www.nestsoftware.com
› Business queries: [email protected]