nci.org.au© National Computational Infrastructure 2014 HPC and HPD in E&P Perth, September 2014
nci.org.au@NCInews
High Performance Computing and High Performance Data: exploring the growing use of Supercomputers in Oil and Gas Exploration & Production
Lesley Wyborn1, Ben Evans1, David Lescinsky2 and Clinton Foster2
16 September 2014
1 National Computational Infrastructure (NCI), 2 Geoscience Australia (GA)
Outline
1. Current drivers for supercomputers in Oil and Gas Exploration and Production (E&P)
2. Overview of the concepts of:
– High Performance Computing (HPC)
– High Performance Data (HPD)
– Data-intensive Science
3. Present some new research directions in HP environments: are they applicable to Oil and Gas E&P?
4. Discuss the advantages of the Oil and Gas industries, Academia and Government collaboratively working together in Data-intensive Science, while still enabling competitive E&P analytics
5. Key take-home messages
Potential drivers: relevant facts on Oil and Gas E&P
• 'Easy' oil is running out: easily accessible fields are becoming scarcer
• People are no longer drilling wildcat wells and hoping for the best
• As exploration goes deeper and into harsher environments (e.g., Arctic, deeper water), the risk of miscalculating drill sites increases
• The cost of finding and then bringing discoveries into production is now substantially higher (e.g., offshore rigs can cost $1,000,000 per day)
• Exponentially growing volumes of E&P data are being collected
• In all parts of E&P the risks of getting it wrong are far greater than ever
Source: http://www.economist.com/node/16326356
Source: http://media.economist.com/images/images-magazine/2010/10/TQ/201010TQC941.gif
Background of Paper: Government working with Academia in partnership
• Work done in GA and its predecessors in the management of scientific digital data since 1977
• Collaborative work since 2010 by GA and NCI, in particular research into large-scale, High Performance Data (HPD), High Performance Computing (HPC) and multi-disciplinary Data-intensive Science
• NCI is a partnership between Academia and Government: ANU, Bureau of Meteorology, GA and CSIRO
• Funding of ~ $360M in eResearch Infrastructure by the Australian Government (former Department of Innovation, Industry, Science and Research) since 2007 (2 Petaflop computers, 24,000 node research cloud, ~30 PB of data storage at 8 nodes, data services, networks and 12 virtual laboratories)
Raijin: The NCI 57,000 core Petascale machine (currently No 38 on the Top 500 Supercomputer list)
We are entering the 4th Paradigm of Scientific Discovery
• First paradigm (thousands of years ago): Empirical Science, describing natural phenomena (~250 BC: Archimedes of Syracuse)
• Second paradigm (last few hundred years): Theoretical Science using models, generalizations (~1650 AD: Sir Isaac Newton)
• Third paradigm (last few decades): Computational Science, CPU-intensive or simulating complex phenomena (~1940 AD: Alan Turing)
Source: http://www.aps.org/publications/apsnews/200908/zerogravity.cfm
Source: http://couldhavebeenacoconuttree.wordpress.com/2011/05/07/volume-archimedes-and-the-golden-crown-2/
Source: http://www.rutherfordjournal.org/article040101.html
Source: http://www.turing.org.uk/turing/scrapbook/electronic.html
The 4th Paradigm of Data-intensive Science
http://research.microsoft.com/en-us/collaboration/fourthparadigm/4th_paradigm_book_complete_lr.pdf
• Concept developed in 2007 by Jim Gray
• Data-intensive Supercomputing is where large volume data stores and large capacity computation are co-located
• Such HP hybrid systems are designed, programmed and operated to enable users to interactively invoke different forms of computation in situ over large volume data collections
• High Performance Data (HPD) is data that is carefully prepared, standardised and structured to be used in Data-intensive Science on HPC
• Very different from the compute-intensive paradigm (cf. the iPhone 5 of 2012)
Australian HPC in Top 500: June 2014
• Tier 0 (Top 10): No 1: Tianhe-2, China (33.86 PFLOPS); No 10: 3.14 PFLOPS
• Tier 1 (Top 500), from Petascale (>100,000 cores) down to No 500 (134.2 TFLOPS):
– No 38: NCI (979 TFlops) (cf. No 11: ENI, No 16: TOTAL)
– No 57: LS Vic (715 TFlops)
– No 181: CSIRO (168 TFlops)
– No 266: Pawsey (192 TFlops)
– No 363: Defence (162 TFlops)
– No 364: Defence (162 TFlops)
• Tier 2: Institutional facilities, grid and cloud; Terascale (>10,000 cores)
• Tier 3: Local machines and clusters, local Condor pools; Gigascale (>1,000 cores), Megascale (>100 cores), desktop (2–8 cores)
(The original diagram also marks GA usage across internal and external facilities.)
Based on European Climate Computing Environments, Bryan Lawrence (http://home.badc.rl.ac.uk/lawrence/blog/2010/08/02) & Top 500 list June 2014 (http://www.top500.org)
Oil and Gas E&P in the global Top 500 Supercomputers list
http://www.top500.org/statistics/perfdevel/
• From the earliest days, Oil and Gas E & P has had a high demand for HPC1
• Developments in geophysical data processing software closely tracked (drove?) developments in HPC architecture1
• Oil and Gas E & P use cases appear on the Top 500 list, but not all users are recorded or identifiable
• June 2013, Pangea (TOTAL) was No 11 (2.09 Pflops)
• June 2014, ENI (Italy) was No 11 (3 Pflops)
• 2005-2012 marks significant shifts in HPC everywhere
• iPhone5 in 2012 was ~80 Gflops
1 Supercomputing and Energy in China – How Investment in HPC affects Oil Security: http://igcc.ucsd.edu/assets/001/505220.pdf
(Chart annotations: No 11 Pangea; No 38 Raijin; iPhone 5S)
The growth in HPC capacity is no longer driven by increasing the number of CPUs
• Moore's law: transistor density doubles every 2 years
• Limitations: power, heat dissipation; atomic limits
• Impacts:
– CPU clock speeds plateaued
– The power wall forced a shift to multi-core
– The number of cores increased
– Parallelisation became king
– New algorithms are required for parallelism
– Much commercial software does not scale and/or the business model is inappropriate
Sutter 2009, The Free Lunch Is Over: http://www.gotw.ca/publications/concurrency-ddj.htm
Slide courtesy of Brett Bryan, CSIRO
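The shift described above, from faster clocks to more cores, is why algorithms had to be rewritten for parallelism. A minimal sketch of the idea, using a hypothetical per-trace kernel and Python's standard multiprocessing module (the workload is invented for illustration):

```python
from multiprocessing import Pool

def simulate_trace(seed):
    # Hypothetical stand-in for one unit of work (e.g. processing a
    # single seismic trace): a deterministic serial kernel.
    acc = 0.0
    for i in range(10_000):
        acc += ((seed * 31 + i) % 97) * 0.001
    return acc

if __name__ == "__main__":
    seeds = range(64)
    # Serial: one core, limited by plateaued clock speeds.
    serial = [simulate_trace(s) for s in seeds]
    # Parallel: the same kernel spread across cores, so throughput
    # scales with core count rather than clock speed.
    with Pool(processes=4) as pool:
        parallel = pool.map(simulate_trace, seeds)
    assert serial == parallel  # same answers, computed concurrently
```

Only embarrassingly parallel kernels split this cleanly; tightly coupled codes (the "does not scale" bullet) need genuinely new algorithms, not just a process pool.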
New algorithms are being developed for Supercomputing by Oil and Gas E&P
Source: http://www.ewdn.com/2013/05/21/shell-and-russian-software-developers-team-up-to-focus-on-supercomputing-for-oil-exploration/
Supercomputers assisting Oil and Gas E&P
• Supercomputers assist Oil and Gas E&P in three primary ways:
– seismic data processing
– reservoir simulation
– computational visualisation at all stages of the process from Exploration to Production (e.g., four-dimensional visualisations that identify how oil, gas and water flow through the reservoir during production, which are hard to 'see' algorithmically)
Source: http://www.analytics-magazine.org/november-december-2011/695-how-big-data-is-changing-the-oil-a-gas-industry
Supercomputers "de-risking" Oil and Gas E&P
• They help "de-risk" the whole process from exploration to production by:
– enabling processing and combination of vast amounts of data from well logs and seismic, gravity and magnetic surveys to produce 3D models of the subsurface
– assisting in identifying drilling locations that maximise the chance of finding exploitable resources and minimise drilling of dry holes
– producing four-dimensional visualisations that identify how oil, gas and water flow through the reservoir during production
– enabling field engineers to plan the optimum layout for producing and injection wells, and to extract residual oil and gas remaining after primary production
– allowing ensemble runs to test multiple scenarios and to quantify uncertainties in all parts of the process from exploration through to production
– above all, enabling integration with non-Oil and Gas data sets to maximise extraction from the subsurface safely and with minimal risks and environmental impacts
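The ensemble-run bullet above can be illustrated with a toy Monte Carlo sketch: draw uncertain inputs many times, run the model for each draw, and read the uncertainty off the resulting distribution. The volume formula and parameter ranges below are invented for illustration, not a real reservoir model:

```python
import random

def recoverable_volume(porosity, area):
    # Toy stand-in for a reservoir model -- not a real E&P equation.
    return porosity * area * 0.35

random.seed(42)  # reproducible ensemble
# Each ensemble member samples the uncertain inputs and runs the model.
runs = sorted(
    recoverable_volume(random.uniform(0.1, 0.3),   # uncertain porosity
                       random.uniform(8.0, 12.0))  # uncertain area
    for _ in range(10_000)
)
# Percentiles summarise uncertainty across the whole ensemble.
p10, p50, p90 = (runs[int(len(runs) * q)] for q in (0.1, 0.5, 0.9))
print(f"P10={p10:.3f}  P50={p50:.3f}  P90={p90:.3f}")
```

Because every member is independent, ensemble runs parallelise almost perfectly across a supercomputer's cores, which is exactly why HPC makes scenario testing routine.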
Quotes on Supercomputers assisting Oil and Gas E&P…
• They save the industry time:
– "Projects that used to take two years now take six months"
– "Pangea helped analyze seismic data from TOTAL's Kaombo project in Angola in just 9 days, or 4 months quicker than it would have taken previously"
• They produce better products:
– "It is like having a bigger lens, so that you get a sharper picture"
• They allow for more interaction within teams:
– "faster processors allow those collecting the data and the geologists, who interpret the data, to exchange information and make needed adjustments"
• They open new possibilities:
– "BP's industry-leading development of 'digital rocks' … enable calculating petrophysical rock properties and modeling fluid flow directly from high-resolution 3D images – at a scale equivalent to 1/50th of the thickness of a human hair"
Impact on cost-benefit analysis
• Supercomputers can change an oil/gas company's cost-benefit calculations by:
– allowing it to process data more quickly
– creating a more accurate model with fewer assumptions that helps pinpoint the best drilling location, thus reducing the number of dry holes
– monitoring changes in a site/field over time
– "de-risking" the process to make drilling in complex environments more affordable and safer
http://subseaworldnews.com/wp-content/uploads/2013/05/Jasper-Explorer-Starts-Drilling-1st-Well-for-CNOOC-Congo-SA.jpg
In HPC, parallelising code is only one part of it
• The elephant in the room is data access
• There needs to be a balance between processing power and the ability to access data (data scaling)
• The focus is no longer on feeds and speeds
• The focus is now on on-demand direct access to large data sources: on content, and on enabling HPC analytics directly on that content
http://www.top500.org/statistics/perfdevel/
(Chart annotations: No 11 Pangea; No 38 Raijin; iPhone 5S)
Ways to better utilise HPC capacity and transition to petascale computing
• Speed up data access / increase data resolution: self-describing data cubes and data arrays; use higher-resolution data
• Increase model complexity: Monte Carlo simulations, multiple ensemble runs
• Increase model size and data types: single passes at larger scales, integrating more data types
• Extend the timescale: use longer duration runs; use more and shorter time intervals
(The original diagram plots these strategies from local scale through gigascale and terascale to petascale.)
Based on European Climate Computing Environments, Bryan Lawrence (http://home.badc.rl.ac.uk/lawrence/blog/2010/08/02)
The High Performance systems tetrahedron in balance
Vertices: Data Accessibility; Tools; Bandwidth; High Performance Computing Infrastructures
The High Performance systems tetrahedron in 2014: totally out of balance!
Vertices: Data Accessibility; Tools, Codes; Bandwidth; High Performance Computing Infrastructures
HPD is now an essential prelude to Data-intensive Science
– We have new opportunities to process large volumes of data at resolutions and scales never before possible
– But data volumes are growing exponentially: scalable data access is increasingly difficult
– Traditional data find/download technologies are well past their effective limit for Data-intensive Science
– 'Big data IS the new oil, but unrefined it is of little value: it must be refined, processed and analysed'
– We need to convert 'Big data' collections into High Performance Data (HPD) by:
• aggregating data into seamless 'pre-processed' data products
• creating hyper-cubes and self-describing data arrays
Source: http://www.tsunami.org/images/student/art/hokusai.jpg
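The contrast between a raw file collection and a 'self-describing' data array can be pictured with a minimal sketch, here using plain NumPy plus a metadata dictionary. Production HPD collections typically use self-describing formats such as NetCDF/HDF5; every file name, coordinate and attribute below is illustrative:

```python
import numpy as np

# A raw 'Big Data' pile: per-survey files whose conventions live in
# people's heads or in side documents.
raw_grids = {
    "survey_2010.dat": np.random.rand(4, 4),
    "survey_2012.dat": np.random.rand(4, 4),
}

# The HPD version: one aggregated cube whose axes, coordinates and
# attributes travel with the data, so code can interpret it unaided.
cube = {
    "data": np.stack([raw_grids["survey_2010.dat"],
                      raw_grids["survey_2012.dat"]]),  # (time, y, x)
    "dims": ("time", "y", "x"),
    "coords": {
        "time": [2010, 2012],
        "y": np.linspace(-35.0, -34.0, 4),  # illustrative latitudes
        "x": np.linspace(149.0, 150.0, 4),  # illustrative longitudes
    },
    "attrs": {"crs": "EPSG:4326", "product": "illustrative"},
}

# In-situ analysis selects by coordinate, not by file name:
t = cube["coords"]["time"].index(2012)
print(cube["data"][t].mean())
```

The point of the structure is the last two lines: analysis code addresses the collection by time and position, never by knowing which file holds what.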
Creating HPD collections: e.g. the Landsat Cube
• A research project with 15 years of Landsat data (1998-2012), funded by the Department of Innovation, Industry, Science and Research
• The Landsat cube arranges 636,000 Landsat source scenes spatially and temporally to allow flexible but efficient large-scale analysis
• The data is partitioned into spatially-regular, time-stamped, band-aggregated tiles which can be presented as temporal stacks
(Figure: spatially partitioned tiles; temporal stack)
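The partitioning described above can be sketched as follows; the tile size, key layout and array shapes are illustrative, not GA's actual implementation:

```python
import numpy as np

TILE = 100  # illustrative tile edge in pixels

def partition(scene, origin, timestamp):
    """Cut one scene into spatially-regular, time-stamped tiles,
    keyed by (tile_x, tile_y, timestamp)."""
    tiles = {}
    oy, ox = origin
    h, w = scene.shape
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            key = ((ox + x) // TILE, (oy + y) // TILE, timestamp)
            tiles[key] = scene[y:y + TILE, x:x + TILE]
    return tiles

# Two acquisitions of the same footprint land in one tile store.
store = {}
for ts in ("2010-01", "2012-01"):
    store.update(partition(np.random.rand(200, 200), (0, 0), ts))

def temporal_stack(tile_x, tile_y):
    """Present every timestamp of one spatial tile as a (time, y, x) stack."""
    keys = sorted(k for k in store if k[:2] == (tile_x, tile_y))
    return np.stack([store[k] for k in keys])

print(temporal_stack(0, 0).shape)  # (2, 100, 100)
```

Because tiles are spatially regular, any analysis can fan out one tile per core, and the temporal stack turns "compare this pixel over 15 years" into a single array operation.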
Current Landsat Holdings as HPD
636,000 Landsat source scenes (~52 × 10^12 pixels)
4M spatially-regular, time-stamped tiles (0.5 PB)
High-Resolution, Multi-Decade, Continental-Scale Analysis of HPD
• Sampled 1,312,087 tiles (25 m) => 21 × 10^12 pixels
• Water detection over 15 years, from 1998-2012
• High 25 m nominal pixel resolution
• Actual data can be sampled at national or local farm scale
Can we create an equivalent HPD array of seismic reflection data?
• What would a calibrated HPD array of all Australian Seismic data look like?
• That is, direct access to actual data content rather than to metadata on files of data which then need to be downloaded, integrated and processed locally
• Such an array could be sampled and processed directly at a national, basin or prospect scale
• And then integrated with HPD full resolution point clouds of magnetic, gravity and magneto-telluric survey data
Source: http://www.ga.gov.au/__data/assets/image/0003/15645/13-7749-8-allstates.jpg
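The idea of sampling such an array directly, rather than downloading and integrating files, can be sketched with a memory-mapped array standing in for a data store co-located with the HPC. The file name, grid size and window indices are all illustrative:

```python
import os
import tempfile
import numpy as np

# Stand-in for a national-scale calibrated array held next to the HPC.
path = os.path.join(tempfile.mkdtemp(), "national_seismic.dat")
shape = (1000, 1000)  # illustrative (inline, crossline) grid

writer = np.memmap(path, dtype="float32", mode="w+", shape=shape)
writer[:] = 1.0  # placeholder amplitudes
writer.flush()

# Read side: the array is mapped, not copied -- slices are read lazily,
# so 'download, integrate and process locally' becomes slicing in place.
national = np.memmap(path, dtype="float32", mode="r", shape=shape)
basin = national[200:400, 300:600]      # basin-scale window
prospect = national[250:260, 310:320]   # prospect-scale window
print(basin.shape, prospect.shape)
```

Only the sliced windows ever touch memory, which is the essential property of the "direct access to actual data content" the slide describes; a real HPD service would expose the same pattern over the network rather than a local file.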
Realities of HPD Data Collections
• HPD collections are just too large to move:
– Bandwidth limits the capacity to move them: data transfers are too slow
– Even if they can be moved, few can afford to store them locally: the energy costs are also substantial
• HPD is about moving processing to the data, moving users to the data, and having online applications to process the data
• HPD enables cross-domain integration
• Domain-neutral international standards for data collections and interoperability are critical for allowing complex interactions in HP environments, both within and between HPD collections
• HPD enables scalable data access, but also means rethinking the algorithms applied to the data (not again?)
The Oil and Gas Industry can take a bow
• They have been amongst the leaders in the development of global standardised formats (e.g., SEG-Y)
• Energistics is a global consortium that facilitates the development, management and adoption of data exchange standards for the upstream oil and gas industry (e.g., WITSML, PRODML and RESQML)
• Energistics has driven the next generation of the ISO 19115 metadata standard, THE global standard for discovery of geospatial data
• In 2012, the Oil and Gas industries formed the Standards Leadership Council, which links many oil and gas standards bodies as well as the OGC and the SEG
• But these standards may need to evolve to increase uptake of data in HP environments, particularly at exascale
Rethinking Hardware Architectures for Data-intensive Science
• Work at NCI has highlighted the need for balanced systems to enable Data-intensive Science, including:
– Interconnecting processes and high throughput to reduce inefficiencies
– The need to really care about placement of data resources
– Better communications between the nodes
– Large persistent storage (on spinning disk) in addition to the traditional 'scratch spaces'
– I/O capability to match the computational power (NCI's I/O speed to persistent storage is ~50 GBytes/sec and to scratch is ~120 GBytes/sec)
– Close coupling of cluster, cloud and storage
• Since starting on the NCI/GA Data-intensive journey in 2010, software is being progressively rewritten: data and hardware architectures also need to change to create balanced systems
NCI's Integrated High Performance Environment
Warning: Exascale is just around the corner
• Addressing the data access problem is the highest priority as Supercomputing heads towards exascale
• Climate/weather research is already there: can we learn from these academic communities?
• Looking backwards: the capacity of an iPhone 5 today is equivalent to Supercomputers of 1995
• Looking forwards: we are starting to slip…
http://www.top500.org/statistics/perfdevel/
(Chart annotations: No 38 Raijin / next NCI??; No 11 Pangea / next Pangea or ENI??; exascale, petascale, terascale, gigascale; iPhone 5S)
Energy: the main limitation on growth in HP environments
• Future HP systems will be energy-limited (both storage and HPC): are we reaching the tops in flops?
• Architecture will matter more: energy efficiency will be achieved through careful architecture
• Increased performance will be determined by new algorithms and far more efficient data access
http://www.top500.org/statistics/perfdevel/
Top 200: June 2014
Future HPC Challenges for Oil and Gas E&P
• Future HPC challenges for everyone are:
– Power, programmability and scalability: programming needs to be at an extreme scale, using massive parallelism
– Data movement is THE current bottleneck: the precise and efficient flow of data will take centre stage, and hardware that enables that level of control will become critical
– Balanced systems will be crucial (architecture, software, data access)
• Specific HPC challenges for Oil and Gas E&P are:
– There will probably need to be a transition to high-volume collaborative high performance data stores against which competitive algorithms can be deployed by individual companies
– The competitive advantage will no longer be in what data a company holds: the advantage will be in smarter proprietary algorithms applied to collaborative HPD collections sited close to HPC
Can we do this as a 3-way collaboration?
• Industry: driving developments in HPD/HPC
• Government Agencies: data rich
• Academia: cutting-edge HPC/HPD research, particularly scaling Data-intensive Science to exascale
The pairwise partnerships:
• Government and Academia in partnership in HPC and HPD: developing new approaches to in-situ Data-intensive Science
• Academia and Industry in partnerships on HPC: developing new systems
• Industry and Government in partnership for traditional data supply
(The original diagram marks the Industry links with question marks.)
Take home messages for Oil and Gas E&P
• HPC is now an integral part of Oil and Gas E&P: there IS capacity for far more growth
• Moving to HPD collections will be key to enabling data to be integrated and processed at high resolutions, giving more accurate models and predictions
• The Oil and Gas Industry will need to continue to drive standards globally and ensure they are compatible with the rapidly on-coming exascale HPC/HPD environments
• Three-way partnerships need to be investigated (Government, Industry, Academia)
• Then together we can continue to drive 'New Oil' from 'Raw Materials' via standard specifications to feed HPC environments, 'fuel' quality assessments, and provide burning new insights to support environmentally sustainable and safe development of our Oil and Gas resources
nci.org.au@NCInews
Any Questions?
Source: http://www.analytics-magazine.org/november-december-2011/695-how-big-data-is-changing-the-oil-a-gas-industry
Dr Lesley Wyborn [email protected]
Dr Ben Evans [email protected]
Dr David Lescinsky [email protected]
Dr Clinton Foster [email protected]