Computational Science at STFC and the Exploitation of Novel
HPC Architectures
Mike Ashworth
Scientific Computing Department and STFC Hartree Centre
STFC Daresbury Laboratory
mike.ashworth@stfc.ac.uk
•STFC’s Scientific Computing Department
•STFC’s Hartree Centre
•Exploitation of Novel HPC Architectures
Organisation
[Organisation chart showing HM Government (& HM Treasury) and the RCUK Executive Group]
STFC’s Sites
Joint Astronomy Centre, Hawaii
Isaac Newton Group of Telescopes, La Palma
UK Astronomy Technology Centre, Edinburgh, Scotland
Polaris House, Swindon, Wiltshire
Chilbolton Observatory, Stockbridge, Hampshire
Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire
Understanding our Universe
STFC’s Science Programme
Particle Physics: Large Hadron Collider (LHC), CERN - the structure and forces of nature
Ground-based Astronomy: European Southern Observatory (ESO), Chile - Very Large Telescope (VLT), Atacama Large Millimeter Array (ALMA), European Extremely Large Telescope (E-ELT), Square Kilometre Array (SKA)
Space-based Astronomy: European Space Agency (ESA) - Herschel, Planck, GAIA, James Webb Space Telescope (JWST); bi-laterals with NASA, JAXA, etc.; STFC Space Science Technology Department
Nuclear Physics: Facility for Antiproton and Ion Research (FAIR), Germany; nuclear skills for medicine (isotopes and radiation applications), energy (nuclear power plants) and environment (nuclear waste disposal)
STFC’s Facilities
Neutron Sources: ISIS - pulsed neutron and muon source - and the Institut Laue-Langevin (ILL), Grenoble, providing powerful insights into key areas of energy, biomedical research, climate, environment and security.
High Power Lasers: Central Laser Facility - providing applications in bioscience and nanotechnology; HiPER - demonstrating laser-driven fusion as a future source of sustainable, clean energy.
Light Sources: Diamond Light Source Limited (86%) - providing new breakthroughs in medicine, environmental and materials science, engineering, electronics and cultural heritage; European Synchrotron Radiation Facility (ESRF), Grenoble.
Major funded activities
• 190 staff supporting over 7500 users
• Applications development and support
• Compute and data facilities and services
• Research: over 100 publications per annum
• Deliver over 3500 training days per annum
• Systems administration, data services, high-performance computing, numerical analysis & software engineering
Major science themes and capabilities
• Expertise across the length and time scales, from processes occurring inside atoms to environmental modelling
Scientific Computing Department
Director: Adrian Wander, appointed 24th July 2012
The UK National Supercomputing facilities
The UK National Supercomputing Services are managed by EPSRC on behalf of the UK academic communities.
HPCx ran from 2002-2009 using IBM POWER4 and POWER5.
HECToR is the current service (2007-2014), located at Edinburgh and operated jointly by STFC and EPCC; HECToR Phase 3 is a 90,112-core Cray XE6 (660 Tflop/s Linpack).
ARCHER is the new service: early access now, service starts 16th Dec ’13, operated by STFC and EPCC (1.37 Pflop/s Linpack, #19 in the TOP500).
PRACE
PRACE launched 9th June 2010: 25 member countries, with its seat in Belgium.
France, Germany, Italy and Spain have each committed €100M over 5 years.
EC funding of €70M for infrastructure.
Tier-0 infrastructure providing a free-of-charge service for European scientific communities based on peer review.
Four projects to date: PP, 1IP, 2IP, 3IP, overlapping; 1IP finished, 2IP extended, 3IP about half way through.
EPSRC represents the UK on the PRACE Council; STFC and EPCC carry out technical work in the PRACE projects.
The STFC focus is on application optimization & benchmarking, technology evaluation, procurement procedures and training.
STFC contributes 2.5% of its BlueGene/Q into the PRACE DECI calls.
Scientific Highlights
Journal of Materials Chemistry 16 no. 20 (May 2006) - issue devoted to HPC in materials chemistry (esp. use of HPCx);
Phys. Stat. Sol.(b) 243 no. 11 (Sept 2006) - issue featuring scientific highlights of the Psi-k Network (the European network on the electronic structure of condensed matter coordinated by our Band Theory Group);
Molecular Simulation 32 no. 12-13 (Oct, Nov 2006) - special issue on applications of the DL_POLY MD program written & developed by Bill Smith (the 2nd special edition of Mol Sim on DL_POLY - the 1st was about 5 years ago);
Acta Crystallographica Section D 63 part 1 (Jan 2007) - proceedings of the CCP4 Study Weekend on protein crystallography.
The Aeronautical Journal, Volume 111, Number 1117 (March 2007), UK Applied Aerodynamics Consortium, Special Edition.
Proc Roy Soc A Volume 467, Number 2131 (July 2011), HPC in the Chemistry and Physics of Materials.
Last 5 years metrics:
– 67 grants of order £13M
– 422 refereed papers and 275 presentations
– Three senior staff have joint appointments with Universities
– Seven staff have visiting professorships
– Six members of staff awarded Senior Fellowships or Fellowships by Research Councils’ individual merit scheme
– Five staff are Fellows of senior learned societies
•STFC’s Scientific Computing Department
•STFC’s Hartree Centre
•Exploitation of Novel HPC Architectures
Opportunities
Political Opportunity
Business Opportunity
Scientific Opportunity
Technical Opportunity
• Demonstrate growth through economic and societal impact from investments in HPC
• Engage industry in HPC simulation for competitive advantage
• Exploit multi-core
• Exploit new Petascale and Exascale architectures
• Adapt to multi-core and hybrid architectures
• Build multi-scale, multi-physics coupled apps
• Tackle complex Grand Challenge problems
Tildesley Report
BIS commissioned a report on the strategic vision for a UK e-Infrastructure for Science and Business.
Prof Dominic Tildesley led the team including representatives from Universities, Research Councils, industry and JANET. The scope included compute, software, data, networks, training and security.
Mike Ashworth, Richard Blake and John Bancroft from STFC provided input.
Published in December 2011. Google the title to download from the BIS website
Government Investment in e-Infrastructure - 2011
17th Aug 2011: Prime Minister David Cameron confirmed £10M investment into STFC's Daresbury Laboratory. £7.5M for computing infrastructure
3rd Oct 2011: Chancellor George Osborne announced £145M for e-infrastructure at the Conservative Party Conference
4th Oct 2011: Science Minister David Willetts indicated £30M investment in Hartree Centre
30th Mar 2012: John Womersley CEO STFC and Simon Pendlebury IBM signed major collaboration at the Hartree Centre
Intel collaboration
STFC and Intel have signed an MOU to develop and test technology that will be required to power the supercomputers of tomorrow.
Karl Solchenbach, Director of European Exascale Computing at Intel said "We will use STFC's leading expertise in scalable applications to address the challenges of exascale computing in a co-design approach."
Collaboration with Unilever
1st Feb 2013: Also announced was a key partnership with Unilever in the development of Computer Aided Formulation (CAF). Months of laboratory bench work can be completed within minutes by a tool designed to run as an ‘App’ on a tablet or laptop which is connected remotely to the Blue Joule supercomputer at Daresbury.
John Womersley, CEO STFC, and Jim Crilly, Senior Vice President, Strategic Science Group at Unilever
This tool predicts the behaviour and structure of different concentrations of liquid compounds, both in the bottle and in-use, and helps researchers plan fewer and more focussed experiments.
The aggregation of surfactant molecules into micelles is an important process in product formulation.
Hartree Centre IBM BG/Q Blue Joule
TOP500: #13 in the Jun 2012 list, #23 in the Nov 2013 list, #8 in Europe, #2 system in the UK
6 racks:
• 98,304 cores
• 6144 nodes
• 16 cores & 16 GB per node
• 1.25 Pflop/s peak
1 rack to be configured as BGAS (Blue Gene Advanced Storage):
• 16,384 cores
• Up to 1 PB Flash memory
Hartree Centre IBM iDataPlex Blue Wonder
TOP500: #114 in the Jun 2012 list, #283 in the Nov 2013 list
8192 cores, 170 Tflop/s peak; each node has 16 cores in 2 sockets of Intel Sandy Bridge (AVX etc.)
• 252 nodes with 32 GB
• 4 nodes with 256 GB
• 12 nodes with X3090 GPUs
• 256 nodes with 128 GB, with ScaleMP virtualization software providing up to 4 TB of virtual shared memory
Hartree Centre Datastore
Storage: 5.76 PB usable disk storage and 15 PB tape store
Hartree Centre Visualization
Four major facilities:
• Hartree Vis-1: a large visualization “wall” supporting stereo
• Hartree Vis-2: a large surround and immersive visualization system
• Hartree ISIC: a large visualization “wall” supporting stereo at ISIC
• Hartree Atlas: a large visualization “wall” supporting stereo in the Atlas Building at RAL, part of the Harwell Imaging Partnership (HIP)
Virtalis is the hardware supplier.
Home of the 2nd most powerful supercomputer in the UK
Douglas Rayner Hartree: Father of Computational Science
• Hartree–Fock method
• Appleton–Hartree equation
• Differential Analyser
• Numerical Analysis
Douglas Hartree with Phyllis Nicolson at the Hartree Differential Analyser at Manchester University
“It may well be that the high-speed digital computer will have as great an influence on civilization as the advent of nuclear power” (1946)
Douglas Rayner Hartree PhD, FRS (1897–1958)
Hartree Centre Official Opening
1st Feb 2013: Chancellor George Osborne and Science Minister David Willetts opened the Hartree Centre and announced a further £185M of funding for e-Infrastructure:
• £19M for the Hartree Centre for power-efficient computing technologies
• £11M for the UK’s participation in the Square Kilometre Array
George Osborne opens the Hartree Centre, 1st February 2013
This investment forms part of the £600 million investment for science announced by the Chancellor at the Autumn Statement 2012.
“By putting our money into science we are supporting the economy of tomorrow.”
Work with us to:
• Sharpen your innovation
• Improve your global competitiveness
• Reduce research and development costs
• Reduce costs for certification
• Speed up your time to market
Power-efficient Technologies ‘Shopping List’
£19M investment in power-efficient technologies:
• System with latest NVIDIA Kepler GPUs
• System based on Intel Xeon Phi
• System based on ARM processors
• Active storage project using IBM BGAS
• Dataflow architecture based on FPGAs
• Instrumented machine room
Systems will be made available for development and evaluation projects with Hartree Centre partners from industry, government and academia.
•STFC’s Scientific Computing Department
•STFC’s Hartree Centre
•Exploitation of Novel HPC Architectures
Accelerators in the TOP500
Where are the accelerators?
Accelerator-based systems in the TOP500 top 10:
#1 Tianhe-2: 33.8 Pflop/s, Intel Xeon Phi, 3120k (2736k) cores
#2 Titan: 17.6 Pflop/s, Nvidia K20x, 560k (262k) cores
#6 Piz Daint: 6.27 Pflop/s, Nvidia K20x, 116k (74k) cores
#7 Stampede: 5.17 Pflop/s, Intel Xeon Phi, 462k (366k) cores
UK National resources:STFC Hartree Centre IBM iDataPlex has 48 Nvidia Fermi GPUs
UK Regional resources:e-Infrastructure South: EMERALD consists of 372 Fermi GPUs
Local resources:University and departmental clustersUnder your desk?
The Power Wall
• Transistor density is still increasing
• Clock frequency is not, due to power density constraints
• Cores per chip are increasing: multi-core CPUs (currently 8-16) and GPUs (~500)
• There is little further scope for instruction-level parallelism
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)
Processor Comparison
Processor             Parallelism    GB/s   Tflop/s   Gflop/s/W
Nvidia K40            15 x 64        288    1.43      6.1
AMD FirePro S10000    2 x 56 x 32*   480    1.48      3.9
Intel Xeon Phi        60 x 8         320    1.00      4.5
Intel E5-2667W        16 x 4         50     0.20      1.3

* single-precision cores; the double-precision rate is 1/4.
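The Gflop/s/W column is consistent with peak double-precision throughput divided by thermal design power: assuming, purely for illustration, a TDP of roughly 235 W for the K40 gives 1430 Gflop/s / 235 W ≈ 6.1 Gflop/s/W (the TDP value is an assumption, not taken from the slide).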
Software Challenges
• OpenMP for multi-core
• CUDA for Nvidia GPUs
• OpenCL
• OpenACC: directive-based, from Cray (& PGI)
• OpenMP 4 for accelerators
• Fortran/C + MPI
The fundamental challenge is extracting additional parallelism from your application; the directive-based and threaded styles are sketched below.
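As a rough illustration of these programming models (a minimal sketch, not code from any application discussed here), the same loop can be written for a multi-core CPU with OpenMP and offloaded to a GPU with OpenACC:

subroutine axpy_omp(n, a, x, y)
  ! Multi-core CPU version: OpenMP threads share the loop iterations
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: a, x(n)
  real(8), intent(inout) :: y(n)
  integer :: i
  !$omp parallel do
  do i = 1, n
     y(i) = y(i) + a*x(i)
  end do
  !$omp end parallel do
end subroutine axpy_omp

subroutine axpy_acc(n, a, x, y)
  ! Accelerator version: OpenACC offloads the loop and manages the data
  implicit none
  integer, intent(in)    :: n
  real(8), intent(in)    :: a, x(n)
  real(8), intent(inout) :: y(n)
  integer :: i
  !$acc parallel loop copyin(x) copy(y)
  do i = 1, n
     y(i) = y(i) + a*x(i)
  end do
  !$acc end parallel loop
end subroutine axpy_acc

The directives leave the underlying Fortran loop untouched, which is what makes the directive-based routes attractive for large existing codes.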
•Gung-Ho – a new atmospheric dynamical core
•NEMO on GPUs
•DL_POLY on GPUs
•LBM on GPUs
Current Unified Model
[Chart: performance of the UM (dark blue) versus a basket of models from the Met Office, ECMWF, USA, France, Germany, Japan, Canada and Australia over 2003-2011, measured by 3-day surface pressure errors]
From Nigel Wood, Met Office
Met Office Unified Model
‘Unified’ in the sense of using the same code for weather forecasting and for climate research.
Combines dynamics on a lat/long grid with physics (radiation, clouds, precipitation, convection etc.).
Also couples to other models (ocean, sea-ice, land surface, chemistry/aerosols etc.) for improved forecasting and earth system modelling.
“New Dynamics”: Davies et al (2005). “ENDGame”: to be operational in 2013.
[Scaling plot: UM (17 km) performance versus number of POWER7 nodes, compared with perfect scaling]
Limits to Scalability of the UM
The problem lies with the spacing of the lat/long grid at the poles
At 25km resolution, grid spacing near poles is 75m
At 10km this reduces to 12m!
The current version (New Dynamics) has limited scalability
The latest ENDGame code improves this, but a more radical solution is required for Petascale and beyond
From Nigel Wood, Met Office
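The underlying geometry: on a regular latitude-longitude grid the east-west spacing is Δx = R cos(φ) Δλ, so for a fixed angular resolution Δλ the physical spacing shrinks towards zero as the latitude φ approaches 90°, and the stable time step shrinks with it; the exact figures above depend on the particular grid configuration used by the UM.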
Challenging Solutions
Candidate grids: cube-sphere, triangles, yin-yang
GUNG-HO targets a brand new dynamical core
Scalability – choose a globally uniform grid which has no poles (see below)
Speed – maintain performance at high & low resolution and for high & low core counts
Accuracy – need to maintain standing of the model
Space weather implies a 600 km deep model. Five-year project, 2011-2015. Operational weather forecasts around 2020!
From Nigel Wood, Met Office
Globally Uniform Next Generation Highly Optimized
“Working together harmoniously”
Design considerations
Ford et al, “Gung Ho: A code design for weather and climate prediction on exascale machines”, EASC 2013, to appear in a special edition of Advances in Engineering Software
• Fortran 2003, MPI, OpenMP and OpenACC
• Other models, e.g. PGAS, CAF, are not excluded
• Indirect addressing in the horizontal to support a wide range of possible grids
• Direct addressing in the vertical
• Vertical index innermost is optimal for cache re-use in CPUs, and can also achieve coalesced memory access in GPUs (see the sketch below)
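A minimal sketch of what this addressing scheme implies for a loop nest (all names here are invented for illustration; this is not Gung-Ho source code):

subroutine apply_scaling(ncells, nlayers, ndofs, dofmap, scale, field)
  implicit none
  integer, intent(in)    :: ncells, nlayers, ndofs
  integer, intent(in)    :: dofmap(ncells)          ! indirect horizontal addressing
  real(8), intent(in)    :: scale
  real(8), intent(inout) :: field(nlayers, ndofs)   ! vertical index first in memory
  integer :: cell, k
  do cell = 1, ncells          ! horizontal loop over unstructured cells
     do k = 1, nlayers         ! direct vertical loop over contiguous data:
        field(k, dofmap(cell)) = scale * field(k, dofmap(cell))
     end do                    ! cache re-use on CPUs, coalesced access on GPUs
  end do
end subroutine apply_scaling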
Software architecture
The Gung-Ho software architecture is structured into layers communicating via a defined API
• the driver layer (control for one or more models)
• the algorithm layer (high-level specification)
• the parallelisation system (PSy) layer (inter-node and intra-node parallelism, parsing, transformations, hardware-specific code generation)
• the kernel layer (toolkit of algorithm building blocks with directives)
• the infrastructure layer (generic library to support parallelisation, communications, coupling, I/O etc.)
Gung-Ho Single Model architecture
The arrows represent the APIs connecting the layers; the direction shows the flow of control.
A code generator will parse the algorithm layer source code.
Structure
[Diagram: generator structure - the Parser, Algorithm Generator and PSy Generator operate on the Algorithm code, Kernel codes and PSy code]
...
# parse the algorithm-layer source file and extract the invoke information
ast, invokeInfo = parse(filename, invoke_name=invokeName)
# generate the algorithm-layer output code
alg = algGen(ast, invokeInfo, psyName=psyName, invokeName=invokeName)
# generate the PSy-layer output code
psy = psyGen(invokeInfo, psyName=psyName)
...
Invoking the generator
integrate_one_generate:
	python ../generator/generator.py -oalg integrate_one_alg.F90 -opsy integrate_one_psy.F90 integrate_one.F90

make integrate_one_generated

rupert@ubuntu:~/proj/GungHoSVN/LFRIC/src/generator$ python generator.py
usage: generator.py [-h] [-oalg OALG] [-opsy OPSY] filename
generator.py: error: too few arguments
Example (integrate_one)

The algorithm-layer source:

program main
  ...
  use integrate_one_module, only : integrate_one_kernel
  ...
  call invoke(integrate_one_kernel(x, integral))
  ...
end program main

The generated algorithm-layer code:

PROGRAM main
  ...
  USE psy, ONLY: invoke_integrate_one_kernel
  ...
  CALL invoke_integrate_one_kernel(x, integral)
  ...
END PROGRAM main
Example (integrate_one)

The kernel and its metadata:

module integrate_one_module
  use kernel_mod
  implicit none
  private
  public integrate_one_kernel
  public integrate_one_code

  type, extends(kernel_type) :: integrate_one_kernel
    type(arg) :: meta_args(2) = (/ &
      arg(READ, (CG(1)*CG(1))**3, FE), &
      arg(SUM, R, FE) /)
    integer :: ITERATES_OVER = CELLS
  contains
    procedure, nopass :: code => integrate_one_code
  end type integrate_one_kernel

contains

  subroutine integrate_one_code(layers, p1dofm, X, R)
  ...
Example (integrate_one)

The generated PSy-layer code:

MODULE psy
  USE integrate_one_module, ONLY: integrate_one_code
  USE lfric
  IMPLICIT NONE
CONTAINS
  SUBROUTINE invoke_integrate_one_kernel(x, integral)
    ...
    SELECT TYPE ( x_space => x%function_space )
    TYPE IS ( FunctionSpace_type )
      topology => x_space%topology
      nlayers = topology%layer_count()
      p1dofmap => x_space%dof_map(cells, fe)
    END SELECT
    DO column = 1, topology%entity_counts(cells)
      CALL integrate_one_code(nlayers, p1dofmap(:,column), x%data, integral%data(1))
    END DO
  END SUBROUTINE invoke_integrate_one_kernel
END MODULE psy
Timetable
• Further development and testing of the horizontal discretization [2013]
• Testing of proposals for code architecture [2013]
• Vertical discretization [2013]
• 3D prototype development [2014-2015]
• Operational around 2020...?
NEMO Acceleration on GPUs using OpenACC
Maxim Milakov, Peter Messmer, Thomas Bradley, NVIDIA
Flat profile, code converted using OpenACC directives
GYRE only – we are looking at more realistic test cases, with ice and land
Tesla M2090 GPUs, Westmere CPUs
Milakov, Messmer, Bradley, GPU Technology Conference, 18th-22nd March 2013
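Schematically, a directive-based conversion of a NEMO-style triple loop looks like the following (a generic sketch with invented routine and array names, not the actual loops ported by the NVIDIA team):

subroutine update_field(jpi, jpj, jpk, before, coeff, after)
  ! Generic OpenACC sketch of a NEMO-style (i,j,k) loop nest; the
  ! directive offloads the nest to the GPU and handles data movement.
  implicit none
  integer, intent(in)  :: jpi, jpj, jpk
  real(8), intent(in)  :: before(jpi,jpj,jpk), coeff(jpi,jpj,jpk)
  real(8), intent(out) :: after(jpi,jpj,jpk)
  integer :: ji, jj, jk
  !$acc parallel loop collapse(3) copyin(before, coeff) copyout(after)
  do jk = 1, jpk
     do jj = 1, jpj
        do ji = 1, jpi
           after(ji,jj,jk) = coeff(ji,jj,jk) * before(ji,jj,jk)
        end do
     end do
  end do
end subroutine update_field

In a real port the copyin/copyout clauses would typically be replaced by enclosing !$acc data regions, so that fields stay resident on the GPU between kernels rather than being transferred on every call.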
DL_POLY Acceleration on GPUs
Christos Kartsaklis and Ruairi Nestor, ICHEC; Ilian Todorov and Bill Smith, STFC
CUDA implementation of key DL_POLY features:
• Constraints (SHAKE)
• Link-cell pairs
• Two-body forces
• Ewald SPME forces
Benchmark: DMPC (dimyristoylphosphatidylcholine) in water, 413,896 atoms (test case 4)
DL_POLY Acceleration on GPUs
“Benchmarking and Analysis of DL_POLY 4 on GPU Clusters”, Lysaght et al, PRACE report
Significant 8x speed-up using cuFFT
The MPI code scales to 10^8 atoms on more than 10^5 cores
Pure MPI vs. 2 GPUs on the ICHEC Stokes GPU cluster; Sodium Chloride, 216,000 ions (test case 2)
3D LBM on Kepler GPUs (1)
Mark Mawson & Alistair Revell, Manchester
Lattice Boltzmann Method (LBM) for solving fluid flow
Focus on memory transfer issues
SKA
STFC is a partner in the Science Data Processor (SDP) work package
Stephen Pickles leads the software engineering task (SKA.TEL.SDP.ARCH.SWE), developing the software system-level prototype.
We are also contributing to the work to build, run and support the SDP software prototypes on existing production HPC facilities (SKA.TEL.SDP.PROT.ISP).
Work started on 1st November 2013
Summary
• New UK Government investment supports a wide range of e-Infrastructure projects (incl. data centres, networks, ARCHER, Hartree)
• The Hartree Centre is ‘open for business’
• We are driving forward application development on emerging architectures
For more information see http://www.stfc.ac.uk/scd
If you have been … … thank you for listening
Mike Ashworth mike.ashworth@stfc.ac.uk
http://www.stfc.ac.uk/scd