THE SQUARE KILOMETER ARRAY (SKA) Use Case for... · Discovery and Data Mining ETP4HPC workshop, 23...
THE SQUARE KILOMETER ARRAY (SKA)
ESD USE CASE
Ronald Nijboer
Head ASTRON R&D Computing Group
ETP4HPC workshop, 23 June 2016
With material from Chris Broekema (ASTRON), John Romein (ASTRON), Nick Rees (SKA Office), Miles Deegan (SKA Office), John Taylor (U. of Cambridge), Michael Wise (ASTRON)
ASTRON, Offices & Locations
Groningen: LOFAR CEP
Westerbork: WSRT
Borger-Odoorn, Exloo: LOFAR core
Dwingeloo: LOFAR / WSRT Operations; R&D, Science; JIVE, NOVA
Radio Astronomy
Figure: Doppler shift of the 21 cm line, galaxy M81
Square Kilometer Array (SKA)
SKA: one Observatory (HQ in UK), two sites (South Africa & Australia)
SKA: big scientific questions
Testing gravitation
Epoch of Reionisation
Cosmic Magnetism
Cradle of life
Large scale structures
Turbulent Universe
SKA Context Diagram
SDP is off-site! (Perth & Cape Town)
The Science Data Processor transforms the Signals into Science Data Products
Regional Science Centers
Regional Centers are proposed for 'doing Science' with the SKA Data Products
RSC Functionality
Data Discovery: Observation database; Associated metadata; Quick-look data products; Flexible catalog queries; Integration with VO tools; Publish data to VO
Data Processing: Reprocessing and calibration; High resolution imaging; Mosaicing; Source extraction; Catalog re-creation; DM searches
Data Mining: Multi-wavelength studies; Catalog cross-matching; Light-curve analysis; Transient classification; Feature detection; Visualization
RSC Requirements
Regional Science Centers are being discussed and planned
Requirements do not exist yet
H2020 project Aeneas submitted
Likely RSCs will be different in different locations
SKA SDP type processing will be needed, as well as Data Discovery and Data Mining
Aeneas objective: design and specification of a distributed, European Science Data Centre (ESDC) to support the pan-European astronomical community in achieving the scientific goals of the SKA
SDP Key Performance Requirements
SDP Local Monitoring & Control
External interfaces (diagram labels): CSP, Observatory
Data Processor
• High Performance: ~100 PetaFLOPS
• Data Intensive: ~100 PetaBytes/observation (job)
• Partially real-time: ~10 s response time
• Partially iterative: ~10 iterations/job (~6 hours)
Data Preservation
• High Volume & High Growth Rate: ~100 PetaBytes/year
• Infrequent Access: ~few times/year max
Delivery System
• Data Distribution: ~100 PetaBytes/year from Cape Town & Perth to the rest of the world
• Data Discovery: visualisation of 100k by 100k by 100k voxel cubes
Link data rates (diagram labels): ~1 TByte/s, ~10 GByte/s, ~200 GByte/s
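To give a feel for the Delivery System figures, the sketch below converts the ~100 PetaByte/year distribution volume into an average sustained rate and estimates the size of one 100k by 100k by 100k voxel cube. The bytes-per-voxel value is an assumption (one 32-bit float per voxel) and is not stated in the slides; the helper names are illustrative only.

```python
# Back-of-envelope figures; BYTES_PER_VOXEL is an assumption, not from the slides.
PETABYTE = 10**15                    # decimal petabyte, in bytes
SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_bytes_per_s(petabytes_per_year):
    """Average sustained rate needed to move a yearly data volume."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR


def cube_size_bytes(side_voxels, bytes_per_voxel):
    """Storage needed for a cubic voxel grid with the given side length."""
    return side_voxels**3 * bytes_per_voxel


BYTES_PER_VOXEL = 4                  # assumption: one 32-bit float per voxel

rate = average_rate_bytes_per_s(100)              # ~100 PetaByte/year
cube = cube_size_bytes(100_000, BYTES_PER_VOXEL)  # 100k x 100k x 100k voxels

print(f"Average distribution rate: {rate / 1e9:.2f} GByte/s ({rate * 8 / 1e9:.1f} Gbit/s)")
print(f"One 100k^3 cube: {cube / PETABYTE:.0f} PByte")
```

Under these assumptions the yearly volume averages out to roughly 3.2 GByte/s of continuous export, and a single full-resolution cube is about 4 PByte, which is why visualisation is listed as a challenge in its own right.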
SDP Functional Breakdown
Data Parallelism
Figure axes: Frequency; Time & baseline
o Data parallelism: dominated by frequency
o Provides dominant scaling
o Nothing more needed if each processing node can manage the complete processing of a frequency channel
Diagram labels: Visibility data; Processing nodes; Exploit frequency independence; Grid and de-grid; FFT; Buffered UV data
A lot of the processing is embarrassingly (data) parallel, but … there will be synchronisation points where data needs to be combined
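The pattern described on this slide (independent work per frequency channel, followed by a synchronisation point) can be sketched as below. This is a minimal illustration, not the SDP design: `process_channel` is a hypothetical stand-in for the real per-channel gridding and FFT, the visibilities are plain lists of numbers, and a thread pool stands in for the processing nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_channel(visibilities):
    """Hypothetical stand-in for per-channel gridding + FFT: mean of the samples."""
    return sum(visibilities) / len(visibilities)


def process_observation(channels, max_workers=4):
    """Process each frequency channel independently, then combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Data-parallel step: no channel needs data from any other channel.
        per_channel = list(pool.map(process_channel, channels))
    # Synchronisation point: every channel result is needed before combining.
    combined = sum(per_channel) / len(per_channel)
    return per_channel, combined


# Three toy frequency channels, each with a few visibility samples.
channels = [[1.0, 3.0], [2.0, 4.0], [10.0, 20.0]]
per_channel, combined = process_observation(channels)
print(per_channel, combined)
```

The per-channel step scales with the number of workers, while the combining step has to wait for all of them, which is where the synchronisation cost mentioned above comes from.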
SDP Compute Requirements
~50 PFLOPS total sustained, max
FFT and Gridding dominant
Mixed precision
Achieve 10-15% of peak now
Large fast working memory (~2 FLOP/byte)
Can exchange memory for FLOPs using facetting
Fast Storage
~3 Tb/s write, ~30 Tb/s read
~ 13000 FLOPS/byte read
~5MW per site
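The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes that "Tb/s" means terabits per second and that the 10-15% efficiency applies to the full ~50 PFLOPS sustained load; both are readings of the slide, not statements from it.

```python
SUSTAINED_FLOPS = 50e15      # ~50 PFLOPS total sustained (from the slide)
READ_BITS_PER_S = 30e12      # ~30 Tb/s read, assumed here to mean terabits/s


def required_peak(sustained_flops, efficiency):
    """Peak capability needed to sustain a load at a given fraction of peak."""
    return sustained_flops / efficiency


peak_low = required_peak(SUSTAINED_FLOPS, 0.15)    # best case: 15% of peak
peak_high = required_peak(SUSTAINED_FLOPS, 0.10)   # worst case: 10% of peak

# FLOPs performed per byte read from fast storage (8 bits per byte).
flops_per_byte_read = SUSTAINED_FLOPS / (READ_BITS_PER_S / 8)

print(f"Peak needed: {peak_low / 1e15:.0f} to {peak_high / 1e15:.0f} PFLOPS")
print(f"FLOPs per byte read: {flops_per_byte_read:.0f}")
```

With these readings, sustaining ~50 PFLOPS at today's efficiency would need roughly 333 to 500 PFLOPS of peak capability, and the ratio of compute to storage read rate comes out near 13000 FLOPS/byte, consistent with the figure quoted above.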
SDP Compute Characteristics
Few, well known applications -> co-design
Trivially parallel workloads, baseline architecture leverages this
Low arithmetic intensity, thus I/O bound
Pseudo real-time + fast storage + batch processing
Tight budgets (energy, capital and ops)
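Why low arithmetic intensity leads to a bandwidth-bound workload can be illustrated with a simple roofline estimate: attainable performance is the smaller of the compute peak and the product of bandwidth and intensity. The node figures below are hypothetical and chosen only for illustration; the intensity of 2 FLOP/byte assumes that the ~2 FLOP/byte quoted for working memory can be read as an arithmetic intensity.

```python
def attainable_flops(peak_flops, bandwidth_bytes_per_s, intensity_flop_per_byte):
    """Roofline model: performance is capped by compute or by data movement."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flop_per_byte)


# Hypothetical node, for illustration only (not SKA hardware figures).
NODE_PEAK = 10e12         # 10 TFLOPS peak
NODE_BANDWIDTH = 500e9    # 500 GByte/s memory bandwidth
INTENSITY = 2.0           # FLOP/byte, assumed reading of the slide figure

achieved = attainable_flops(NODE_PEAK, NODE_BANDWIDTH, INTENSITY)
fraction_of_peak = achieved / NODE_PEAK
bandwidth_bound = achieved < NODE_PEAK

print(f"{achieved / 1e12:.1f} TFLOPS attainable, {fraction_of_peak:.0%} of peak")
```

For this hypothetical node the bandwidth term is the limiting one, so adding more FLOPS without more bandwidth would not speed up the workload.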
Current Timeline
2013 – 2017 SKA Pre-Construction
2018 – 2022 SKA Construction
2020 Start Early Science
2023 Start Full Operations
Conclusions
SKA is a huge computational challenge
RSCs in the process of being defined
SDP: ~50 PFLOPS (sustained), ~5 MW
Power is also a major driver.
Software complexity is also beyond what has been achieved in astronomy previously.
Traditional HPC is not a good match because the problem is bandwidth dominated.
SKA would be a perfect use case, as a Big Data application, for the EsD projects
Questions?