SKA Science Data Processor Consortium SKA...

SKA Science Data Processor Update Dr. John Taylor (on behalf of) High Performance Computing and Research Computing Service University of Cambridge Email: [email protected] and SKA Science Data Processor Consortium ISC17 Frankfurt


Page 1: SKA Science Data Processor Consortium SKA …ska-sdp.org/sites/default/files/attachments/isc17-sdp...SKA Science Data Processor Update Dr. John Taylor (on behalf of) High Performance

SKA Science Data Processor Update

Dr. John Taylor (on behalf of)
High Performance Computing and Research Computing Service
University of Cambridge
Email: [email protected]
SKA Science Data Processor Consortium

ISC17 Frankfurt

Page 2

Overview

• Science Data Processor Context
• Overview of the Science Data Processor (SDP)
• Architectural Considerations
• ALASKA Prototyping Environment
• Next Steps

Page 3

One SDP Two Telescopes

Page 4

SDP Scope SKA Phase 1

Ref. SKA-TEL-SDP-0000001 SDP Preliminary Architecture Design P Alexander et al

Page 5

SDP Key Performance Requirements -- SKA Phase 1

SDP Local Monitoring & Control

High Performance
• ~100 PetaFLOPS

Data Intensive
• ~100 PetaBytes/observation (job)

Partially real-time
• ~10s response time

Partially iterative
• ~10 iterations/job (~6 hours)

Telescope Manager

CSP

Observatory

High Volume & High Growth Rate
• ~100 PetaByte/year

Infrequent Access

• ~few times/year max

Data Processor Data Preservation

Delivery System

Data Distribution
• ~100 PetaByte/year from Cape Town & Perth to rest of World

Data Discovery
• Visualisation of 100k by 100k by 100k voxel cubes

Science Data Processor

Data rates labelled on the diagram: ~1 TByte/s, ~10 GByte/s, ~20 GByte/s, ~1 GByte/s (TBC)
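To give a feel for the Data Discovery figure above (100k by 100k by 100k voxel cubes), the following is a rough sizing sketch. The slide does not state a voxel format, so the 4 bytes per voxel (single-precision float) used here is purely an assumption, as are the function and variable names.

```python
# Rough size of a 100k x 100k x 100k voxel cube.
# ASSUMPTION: 4 bytes per voxel (single-precision float); the slide gives no voxel format.
VOXELS_PER_AXIS = 100_000
BYTES_PER_VOXEL = 4  # assumed, not from the slide


def cube_size_bytes(n_per_axis: int, bytes_per_voxel: int) -> int:
    """Total size in bytes of a cubic voxel grid with n_per_axis voxels per side."""
    return n_per_axis ** 3 * bytes_per_voxel


size = cube_size_bytes(VOXELS_PER_AXIS, BYTES_PER_VOXEL)
size_pb = size / 10**15  # decimal petabytes

print(f"{VOXELS_PER_AXIS ** 3:.0e} voxels, about {size_pb:.0f} PB at {BYTES_PER_VOXEL} bytes/voxel")
```

Under that assumption a single cube is of petabyte scale, which suggests why visualisation is listed as a requirement in its own right rather than a simple download.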

Page 6

Illustrative Requirements

• HPC
  – ~250 PFLOP system (Peak)
  – ~200 PByte/s aggregate BW to fast working memory
  – ~80 PByte Storage
  – ~0.5-1 TeraByte/s sustained write to storage
  – ~5-10 TeraByte/s sustained read from storage
  – ~10000 FLOPS/byte read from storage
  – ~2 Bytes/Flop memory bandwidth

• Preservation and LMC
  – ~1-10 Gbytes/s QA information
  – ~20 Gbytes/s to HSM for preservation
  – ~10s latency to respond to alerts

• Ingest
  – ~Tbyte/sec
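As a back-of-envelope check on the storage figures above, the sketch below computes how long it would take to write or read the full ~80 PByte store at the quoted sustained rates. Decimal prefixes (1 PB = 1e15 bytes) and the helper name `hours_to_move` are assumptions made for illustration; the slide figures themselves are approximate.

```python
# Back-of-envelope arithmetic on the illustrative storage figures.
# ASSUMPTION: decimal prefixes (1 PB = 1e15 bytes, 1 TB = 1e12 bytes).
PB = 1e15
TB = 1e12

storage_bytes = 80 * PB               # ~80 PByte storage
write_rates = (0.5 * TB, 1.0 * TB)    # ~0.5-1 TeraByte/s sustained write
read_rates = (5.0 * TB, 10.0 * TB)    # ~5-10 TeraByte/s sustained read


def hours_to_move(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600.0


fill_hours = [hours_to_move(storage_bytes, r) for r in write_rates]
read_hours = [hours_to_move(storage_bytes, r) for r in read_rates]

for rate, hours in zip(write_rates, fill_hours):
    print(f"write at {rate / TB:.1f} TB/s: {hours:.1f} h to fill the store")
for rate, hours in zip(read_rates, read_hours):
    print(f"read at {rate / TB:.1f} TB/s: {hours:.1f} h to read the store once")
```

The read rate being roughly ten times the write rate is consistent with data being written once and then read repeatedly by iterative processing.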

Page 7

SDP Overview

• So the SDP is NOT ONLY another HPC system
  – Achieve high performance on key scientific algorithms in the Exascale regime
    • State-of-the-art HPC technologies are critical

• BUT it also needs to:
  – Collect, manage, store and deliver vast amounts of data into viable products
    • Big Data => Variety, velocity, volume, veracity => Value
  – Combine real-time and iterative execution environments and provide feedback at various cadences to other elements of the telescope
    • High Performance Data Analytics
  – Operate 365 days a year
    • Highly available and accommodate failure via software. Modern hyperscale environments
  – Extensible and Scalable
    • Provide a modern eco-system to accommodate new algorithm development and upgrades

Page 8

SDP Challenges

• Power efficiency
  – Current Exascale roadmap (US) indicates 20-25 MW(!) for an ExaFlop by 2023.
  – Typically US Gov. pays around 200-250 MUSD for such beasts
  – ECP now in flow with separate budget to develop the capability.

• Cost
  – Are our assumptions correct? How will growth rates pan out (processor, memory, networking and storage)?
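The power figure above can be turned into an efficiency number. The sketch below converts the roadmap's 20-25 MW per ExaFlop into FLOPS per watt and then, as a purely hypothetical exercise, asks what a ~250 PFLOP peak system (the figure from the Illustrative Requirements slide) would draw at that same efficiency. Whether the SDP would actually reach roadmap efficiency is an assumption, not something the slides state.

```python
# Convert the roadmap power figure into efficiency, then scale to the SDP peak.
# ASSUMPTION: the SDP achieves the same FLOPS/W as the exascale roadmap figure.
EXAFLOP = 1e18
roadmap_power_w = (20e6, 25e6)   # 20-25 MW for one ExaFlop
sdp_peak_flops = 250e15          # ~250 PFLOP peak (Illustrative Requirements slide)


def flops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in FLOPS per watt."""
    return flops / watts


def power_at_efficiency(flops: float, efficiency: float) -> float:
    """Power in watts needed to sustain `flops` at the given FLOPS/W."""
    return flops / efficiency


efficiencies = [flops_per_watt(EXAFLOP, p) for p in roadmap_power_w]
sdp_power_mw = [power_at_efficiency(sdp_peak_flops, e) / 1e6 for e in efficiencies]

for eff, mw in zip(efficiencies, sdp_power_mw):
    print(f"{eff / 1e9:.0f} GFLOPS/W -> {mw:.2f} MW for a 250 PFLOP system")
```

This covers compute only; storage, networking and cooling overheads are not included in the estimate.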

Page 9

SDP Challenges

• Complexity of Hardware and Software
  • Combining Real-Time (Streaming), Off-line (Batch) with feedback
  • Multiple Sub-Systems (Ingest, Buffer, Processing, Control, Preservation and Delivery)
• Scalability
  • Hardware roadmaps
  • Demonstrated software scaling is uncertain
• Extensibility, scalability, maintainability
  • SKA1 is the first “milestone” – expecting significant expansion in the 2020s with a 50yr observatory lifetime

Page 10

KEY CHARACTERISTICS OF RADIO INTERFEROMETRY IMAGE PROCESSING

Page 11

Key Characteristics of SKA Data Processing

• Noisy Data
  – Very large data volumes, all data are processed in each observation

• Sparse and Incomplete Sampling
  – Corrected for by deconvolution using iterative algorithms (~10 iterations)

• Corrupted Measurements
  – Corrected by jointly solving for the sky brightness distribution and for the slowly changing corruption effects using iterative algorithms

• Multiple dimensions of data parallelism
  – Loosely coupled tasks, large degree of parallelism is inherently available

Page 12

Data-parallelism schemes

Frequency

Time & baseline

o Data parallelism: Dominated by frequency

o Provides dominant scaling
o Nothing more needed if each processing node can manage the complete processing of a frequency channel

Processing nodes

Visibility data
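The frequency-dominated scheme above can be sketched as a simple partitioning of channel indices across processing nodes. This is only an illustration: the function name `assign_channels`, the contiguous-block layout and the small channel and node counts are assumptions, not the SDP's actual distribution logic.

```python
from typing import List


def assign_channels(n_channels: int, n_nodes: int) -> List[range]:
    """Split channel indices 0..n_channels-1 into contiguous, near-equal blocks.

    Each block is processed independently on one node, which is the sense in
    which frequency provides the dominant data parallelism.
    """
    base, extra = divmod(n_channels, n_nodes)
    blocks: List[range] = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional channel each.
        size = base + (1 if node < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


# Hypothetical sizes chosen only to keep the example readable.
blocks = assign_channels(n_channels=10, n_nodes=4)
for node, block in enumerate(blocks):
    print(f"node {node}: channels {list(block)}")
```

If a node can hold and fully process its block of channels, no communication between nodes is needed until the per-channel products are combined.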


Page 13

Frequency

Time & baseline

Visibility data

o Further data parallelism from locality in UVW-space
o Use to balance memory bandwidth per node
o Some overlap regions on target grids needed
o UV data buffered either on a locally shared store or locally on each node

Data-parallelism schemes

Page 14

Data-parallelism schemes

Frequency

Time & baseline

Visibility data

o To manage total I/O from buffer/bus, distribute Visibility data across nodes for the same target grid, which is duplicated

o Duplication of target provides fail-over protection
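The duplicated-target-grid scheme above can be illustrated with a toy example: visibilities for one grid are scattered across nodes, each node accumulates onto its own copy of the grid, and the copies are summed afterwards. The final summation step, the round-robin scatter, the nearest-cell gridding (a stand-in for a real kernel such as W-projection) and all names below are assumptions made for illustration.

```python
from typing import List, Tuple

# (u_cell, v_cell, value): visibility already mapped to integer grid cells (assumed).
Visibility = Tuple[int, int, complex]
Grid = List[List[complex]]


def empty_grid(n: int) -> Grid:
    return [[0j] * n for _ in range(n)]


def grid_on_node(vis: List[Visibility], n: int) -> Grid:
    """Accumulate one node's share of visibilities onto its own copy of the grid."""
    grid = empty_grid(n)
    for u, v, value in vis:
        grid[v][u] += value
    return grid


def sum_grids(grids: List[Grid]) -> Grid:
    """Combine the per-node grid copies into a single grid."""
    n = len(grids[0])
    total = empty_grid(n)
    for g in grids:
        for row in range(n):
            for col in range(n):
                total[row][col] += g[row][col]
    return total


def scatter(vis: List[Visibility], n_nodes: int) -> List[List[Visibility]]:
    """Round-robin distribution of visibilities across nodes."""
    return [vis[i::n_nodes] for i in range(n_nodes)]


visibilities = [(0, 0, 1 + 0j), (1, 0, 2 + 0j), (1, 1, 0.5j), (0, 0, 1 + 1j), (3, 2, -1 + 0j)]
per_node = scatter(visibilities, n_nodes=2)
combined = sum_grids([grid_on_node(part, 4) for part in per_node])
reference = grid_on_node(visibilities, 4)  # single-node result for comparison
print(combined == reference)
```

Because gridding is additive, the combined result matches what a single node would produce, while each node only reads its own share of the visibility data.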

Page 15

KEY ARCHITECTURAL CONSIDERATIONS AND MAPPING TO CURRENT HARDWARE

Page 16

SDP Functional Breakdown

Ref. SKA-TEL-SDP-00000013 SDP Preliminary Architecture Design P Alexander et al

Page 17

Imaging Component

Page 18

Imaging and Fast Imaging in more detail

Page 19

Image Processing Model

Major cycle (diagram labels):

• UV data store
• Subtract current sky model from visibilities using current calibration model
• Grid UV data to form e.g. W-projection
• Image gridded data
• Deconvolve imaged data (minor cycle)
• Update current sky model
• Solve for telescope and image-plane calibration model
• Update calibration model
• Astronomical quality data

Processor groups: UV processors, Imaging processors
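The "deconvolve (minor cycle)" step in the model above can be illustrated with a Högbom-style CLEAN loop on a 1-D toy image. The slides do not say which deconvolution algorithm the SDP uses, so this is a stand-in technique; the function name, loop gain, threshold and test data are all assumptions. The major cycle (re-predicting and subtracting the sky model from the visibilities) is not shown.

```python
from typing import List, Tuple


def minor_cycle(dirty: List[float], psf: List[float], gain: float = 0.1,
                threshold: float = 1e-3, max_iter: int = 1000
                ) -> Tuple[List[float], List[float]]:
    """Hogbom-style CLEAN on a 1-D image.

    `psf` must have odd length with its peak (1.0) at the centre.
    Returns (sky model, residual image).
    """
    residual = list(dirty)
    model = [0.0] * len(dirty)
    half = len(psf) // 2
    for _ in range(max_iter):
        # Find the brightest residual pixel.
        peak = max(range(len(residual)), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) < threshold:
            break
        # Move a fraction of the peak into the model ...
        flux = gain * residual[peak]
        model[peak] += flux
        # ... and subtract the matching scaled PSF from the residual.
        for k, p in enumerate(psf):
            j = peak + k - half
            if 0 <= j < len(residual):
                residual[j] -= flux * p
    return model, residual


# Toy data: one point source of flux 2.0 at pixel 5, seen through a 5-pixel PSF.
psf = [0.2, 0.5, 1.0, 0.5, 0.2]
dirty = [0.0] * 11
for k, p in enumerate(psf):
    dirty[5 + k - 2] += 2.0 * p

model, residual = minor_cycle(dirty, psf)
print(f"recovered flux at pixel 5: {model[5]:.3f}")
```

In a full pipeline the updated sky model would feed back into the next major cycle, matching the "~10 iterations" quoted earlier in the talk.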

Page 20

Compute Island/Node Concept

Ref. SKA-TEL-SDP-0000018 SDP Data Processor Platform Design C. Broekema

Page 21

Compute Island Concept

Current Hardware Costed Concept

Page 22

SDP Networking

Page 23

SDP Hardware Concept

Page 24

PROTOTYPING ACTIVITIES

Page 25

Prototyping for SDP – What we need to explore

• PRIORITY BASED ON RISK
• Provisioning, Management and Control
  – Multiple networks (single tenant), large HPC platform, integration within multiple sub-systems (LMC, Preservation, Delivery), logging and event handling
• Software Defined Networking
  – Multiplicity of networks required (some RDMA); can these all be subsumed by SDN (and over Ethernet)?
• Virtualization and Containerization
  – What is the overhead for parallel applications? A small % can have a dramatic impact on cost. Bare metal provisioning at scale
• Orchestration of Pipelines with Execution Framework
  – Data-driven Execution Framework, scheduling of pipelines, perhaps based on COTS Big Data Solution
• Orchestration of feedback mechanisms
  – “High Performance Middleware”, Distributed Database to maintain telescope state and sky model
• Management of Storage Hierarchies
  – RDMA to Object Storage, Parallel FS, API to support hierarchy.

Page 26

Prototyping Platform

• Create a flexible, but performance-driven prototyping environment (P3) to support and inform Architecture
  – Support migration of SIP
  – Create a software environment to define infrastructure
  – Provision a variety of storage technologies to provide experimental bench
  – Provision multiple networks to accommodate data flows
  – Provision some CPU acceleration
  – Enable PaaS to investigate Execution Frameworks

• Solution: OpenStack A la SKA
  – Sits on top of P3

Page 27

The Buffer

• Ephemeral Storage required to support
  – real-time (Hot - hours) and
  – batch (Cold - weeks) processing

• Buffer is localised to a Data Island which is a subset of a Compute Island

• Provide a single namespace across 1-n compute nodes

• Hot buffer
  – currently conceived as local to nodes; driven by performance to meet real-time needs

• Cold buffer
  – network attached; driven by capacity and resilience

Page 28

Execution Engine and Data Life Cycle

Approach: Build on Big Data concepts

"Data driven" → graph-based processing approach receiving a lot of attention

Inspired by Hadoop but for our complex data flow

Graph-based approach

Hadoop
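A minimal sketch of what "data driven, graph-based" execution means in practice: tasks are nodes in a dependency graph, and a task becomes runnable as soon as the data it depends on has been produced. The example uses Python's standard-library `graphlib`; the stage names are hypothetical and only loosely follow the sub-systems named in the talk, and nothing here represents the SDP's actual execution framework.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks whose output it needs.
graph: Dict[str, Set[str]] = {
    "ingest": set(),
    "calibrate": {"ingest"},
    "grid": {"calibrate"},
    "deconvolve": {"grid"},
    "quality_assessment": {"ingest"},
    "preserve": {"deconvolve", "quality_assessment"},
}


def run(task_graph: Dict[str, Set[str]], task_fn: Callable[[str], None]) -> List[str]:
    """Execute tasks as their inputs become available; return the execution order."""
    sorter = TopologicalSorter(task_graph)
    sorter.prepare()
    order: List[str] = []
    while sorter.is_active():
        # Everything returned here is independent and could run in parallel.
        for task in sorter.get_ready():
            task_fn(task)
            order.append(task)
            sorter.done(task)
    return order


executed = run(graph, lambda name: print(f"running {name}"))
```

The scheduler never needs a hard-coded stage order: adding a new product only requires declaring what it depends on, which is the property that makes the approach attractive for complex data flows.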

Page 29

Execution Engine: hierarchical

Processing Level 1

Cluster and data distribution controller

Relatively low-data rate and cadence of messaging

Staging: aggregation of data products

Processing level 2

Static data distribution exploiting inherent parallelism

Page 30

Execution Engine: hierarchical

Processing Level 2

Data Island

Shared file store across data island

Worker nodes

Task-level parallelism to achieve scaling

Process controller (duplicated for resilience)

Cluster manager e.g. Mesos-like

Page 31

What do we need

• Processing Level 1:
  – Custom framework to provide scale-out

• Processing Level 2:
  – Many similarities to Big-Data frameworks
  – Need to modify / develop for High Throughput

• High Throughput data analytics framework (HTDAF)
  – Possibly development of something like SPARK
    • New data models
    • Links to external processes
    • Memory management between framework and processes
  – Shared file system needs to be very intelligent about data placement / management or be tightly coupled with the HTDAF

Page 32

StackHPC

Performance Prototype Platform


▸ Specification complete early 2017

▸ Deployment started late March

▸ Early users started April

▸ Hosted and Managed by Cambridge University, UIS


Page 33

StackHPC

Performance Prototype Platform

ALASKA - HARDWARE

▸ 3x Control nodes

▸ 29x Compute nodes

▸ 2x High-memory nodes

▸ 2x GPU nodes

▸ 1x NVMe Storage node

▸ 2x SSD Storage nodes

▸ 5x ARM64 Ceph Storage Cluster


Page 34

Prototyping Platform

Prototyping Platform - Hardware

BASED ON CURRENT HARDWARE CONCEPT OF A COMPUTE ISLAND


Page 35

Performance Prototype Platform

Technology Evaluation Zoo
– GPU
– NVMe
– ARM64
– HPC network fabrics
– High memory

Software Evaluation Zoo
– Ironic
– Magnum
– Sahara
– Monasca
– SDN

Application Evaluation Zoo
– Spark et al
– MPI workloads
– Containerised workloads
– RDMA data flow models
– Programmable networks
– Stimulus generation

Page 36

Performance Prototype Platform

Developing Ironic
– Zero-touch registration
– Scalable provisioning
– Multi-network support
– Reconfigurable BIOS & RAID

Developing Monasca
– Logging via Monasca
– Postgres data store
– Grafana visualisation
– Multi-tenant logging service
– HPC monitoring services

Developing Kolla
– Monasca containers
– Kolla-on-Bifrost

Page 37

Prototyping Work So Far

• P3-System is isolated in WCDC
• Infrastructure is Software Defined
• SDN Enablement
• Support for SIP
  • Docker Swarm, Mesos, SPARK, HPCaaS
• CephFS Subsystem
• Next Steps (short-term)
  • Hot buffer - POSIX Cluster FileSystem
  • Cold buffer - Object or FileSystem
  • High Performance Monitoring and Logging
  • DBaaS

Page 38

SKA and OpenStack Science-WG

We are not alone…

Page 39

BACKUP SLIDES

Page 40

What is the SKA

• Next Generation radio telescope – compared to best current instruments it is ...
  • ~100 times sensitivity
  • ~10^6 times faster imaging the sky
  • More than 5 square km of collecting area on sizes of 3000 km
  • Two Phases (2023 and 2030)

• Will address some of the key problems of astrophysics and cosmology (and physics)

• Builds on techniques developed in Europe
• It is an interferometer

• Uses innovative technologies...
  • Major ICT project
  • Need performance at low unit cost

Page 41
Page 42

Pulsars as Natural Clocks: Testing gravity

• Pulsars are rotating neutron stars

• Pulse once per revolution → very accurate clocks

• The SKA will detect around 30,000 pulsars in the Galaxy

• Relativistic binaries to test gravity

• Timing net of pulsars to detect gravitational waves

Page 43

SKA Context Diagram

SDP - These are off-site! (In Perth & Cape Town)

Science Regional Centres