
SKA Science Data Processor Update

Dr. John Taylor
(on behalf of) High Performance Computing and Research Computing Service, University of Cambridge, and the SKA Science Data Processor Consortium
Email: jt585@cam.ac.uk

ISC17 Frankfurt

Overview

• Science Data Processor Context
• Overview of the Science Data Processor (SDP)
• Architectural Considerations
• ALASKA Prototyping Environment
• Next Steps

One SDP Two Telescopes

SDP Scope SKA Phase 1

Ref. SKA-TEL-SDP-0000001, SDP Preliminary Architecture Design, P. Alexander et al.

SDP Key Performance Requirements – SKA Phase 1

• High performance: ~100 PetaFLOPS
• Data intensive: ~100 PetaBytes/observation (job)
• Partially real-time: ~10 s response time
• Partially iterative: ~10 iterations/job (~6 hours)
• High volume and high growth rate: ~100 PetaBytes/year
• Infrequent access: ~a few times/year maximum
• Data distribution: ~100 PetaBytes/year from Cape Town and Perth to the rest of the world
• Data discovery: visualisation of 100k x 100k x 100k voxel cubes

(Diagram: the Science Data Processor (Data Processor, Data Preservation, Delivery System and SDP Local Monitoring & Control) interfaces with the Telescope Manager, the CSP and the Observatory; indicative link rates shown are ~1 TByte/s, ~10 GByte/s, ~20 GByte/s and ~1 GByte/s (TBC).)

Illustrative Requirements

• HPC
  – ~250 PFLOP system (peak)
  – ~200 PByte/s aggregate bandwidth to fast working memory
  – ~80 PByte storage
  – ~0.5-1 TeraByte/s sustained write to storage
  – ~5-10 TeraByte/s sustained read from storage
  – ~10000 FLOPS per byte read from storage
  – ~2 Bytes/FLOP memory bandwidth

• Preservation and LMC
  – ~1-10 GByte/s QA information
  – ~20 GByte/s to HSM for preservation
  – ~10 s latency to respond to alerts

• Ingest
  – ~1 TByte/s
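As a rough, hedged illustration of how these figures relate, the Python sketch below takes the approximate values quoted above and derives a few implied ratios. The assumption that the ~80 PByte buffer holds roughly one observation's data is mine, not a stated requirement, and the results are indicative only.

```python
# Back-of-envelope check of the illustrative SDP requirements above.
# All figures are the approximate values quoted on this slide; the results
# are indicative only.

PFLOP = 1e15   # FLOP/s
PBYTE = 1e15   # bytes
TBYTE = 1e12   # bytes

peak_flops      = 250 * PFLOP    # ~250 PFLOP peak
bytes_per_flop  = 2              # ~2 Bytes/FLOP memory-bandwidth target
buffer_capacity = 80 * PBYTE     # ~80 PByte storage
sustained_read  = 7.5 * TBYTE    # middle of the ~5-10 TByte/s range

# Implied aggregate memory bandwidth at peak (the quoted ~200 PByte/s is
# lower because sustained, not peak, FLOP rates are what matter).
print(f"implied memory bandwidth: {peak_flops * bytes_per_flop / PBYTE:.0f} PByte/s")

# Time to stream the buffer once at the sustained read rate
# (assumption: the buffer holds roughly one observation's data).
print(f"time to read the full buffer once: "
      f"{buffer_capacity / sustained_read / 3600:.1f} hours")

# FLOPs available per byte streamed from the buffer if the machine ran at
# peak (cf. the ~10000 FLOPS per byte read from storage quoted above; the
# gap reflects sustained vs. peak compute rates).
print(f"peak FLOPs per byte read from storage: {peak_flops / sustained_read:,.0f}")
```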

SDP Overview

• The SDP is NOT ONLY another HPC system
  – It must achieve high performance on key scientific algorithms in the Exascale regime
    • State-of-the-art HPC technologies are critical

• BUT it also needs to:
  – Collect, manage, store and deliver vast amounts of data as viable products
    • Big Data => variety, velocity, volume, veracity => value
  – Combine real-time and iterative execution environments and provide feedback at various cadences to other elements of the telescope
    • High Performance Data Analytics
  – Operate 365 days a year
    • Highly available, accommodating failure via software, as in modern hyperscale environments
  – Be extensible and scalable
    • Provide a modern eco-system to accommodate new algorithm development and upgrades

SDP Challenges

• Power efficiency
  – The current US Exascale roadmap indicates 20-25 MW(!) for an ExaFLOP by 2023
  – Typically the US Government pays around 200-250 MUSD for such machines
  – ECP is now under way, with a separate budget to develop the capability

• Cost
  – Are our assumptions correct? How will growth rates pan out (processor, memory, networking and storage)?

SDP Challenges

• Complexity of hardware and software
• Combining real-time (streaming) and off-line (batch) processing with feedback
• Multiple sub-systems (Ingest, Buffer, Processing, Control, Preservation and Delivery)
• Scalability
  – Hardware roadmaps
  – Demonstrated software scaling is uncertain
• Extensibility, scalability, maintainability
  – SKA1 is the first "milestone": significant expansion is expected in the 2020s, over a 50-year observatory lifetime

KEY CHARACTERISTICS OF RADIO INTERFEROMETRY IMAGE PROCESSING

Key Characteristics of SKA Data Processing

• Very large data volumes: all data are processed in each observation
• Noisy data and sparse, incomplete sampling: corrected for by deconvolution using iterative algorithms (~10 iterations)
• Corrupted measurements: corrected by jointly solving for the sky brightness distribution and for the slowly changing corruption effects using iterative algorithms
• Multiple dimensions of data parallelism: loosely coupled tasks; a large degree of parallelism is inherently available

Data-parallelism schemes

(Diagram: visibility data, split by frequency and by time & baseline, distributed over processing nodes.)

o Data parallelism is dominated by frequency
o This provides the dominant scaling
o Nothing more is needed if each processing node can manage the complete processing of a frequency channel
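A minimal sketch of the frequency-dominated data parallelism described above; the channel counts, node counts and function names are invented for illustration and are not taken from the SDP design. Each node owns a contiguous block of frequency channels and processes them independently end to end.

```python
# Minimal sketch of frequency-channel data parallelism: each processing
# node owns a slice of channels and runs the complete processing for them.
# Channel/node counts are illustrative only.

from concurrent.futures import ProcessPoolExecutor

N_CHANNELS = 65536      # illustrative number of frequency channels
N_NODES = 64            # illustrative number of processing nodes


def channels_for_node(node_id: int) -> range:
    """Static block distribution of channels over nodes."""
    per_node = N_CHANNELS // N_NODES
    start = node_id * per_node
    return range(start, start + per_node)


def process_channel(channel: int) -> float:
    """Stand-in for the full per-channel pipeline (grid, image, deconvolve)."""
    return float(channel)  # placeholder result


def process_node(node_id: int) -> float:
    """Each 'node' processes its channels independently: no cross-node
    communication is needed if a node can hold a channel's working set."""
    return sum(process_channel(c) for c in channels_for_node(node_id))


if __name__ == "__main__":
    # Emulate the nodes with local processes; in reality each would be a
    # separate compute-island node working on its own visibility data.
    with ProcessPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(process_node, range(N_NODES)))
    print(f"processed {N_CHANNELS} channels across {N_NODES} node slices")
```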

Data-parallelism schemes

(Diagram: visibility data split by frequency and by time & baseline.)

o Further data parallelism comes from locality in UVW-space
o Used to balance memory bandwidth per node
o Some overlap regions on the target grids are needed
o UV data are buffered either on a locally shared store or locally on each node

Data-parallelism schemes

(Diagram: visibility data split by frequency and by time & baseline.)

o To manage the total I/O from the buffer/bus, visibility data for the same target grid are distributed across nodes, and the target grid is duplicated
o Duplication of the target grid provides fail-over protection
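The distribution scheme just described can be sketched as follows, with invented group sizes and replica counts: visibilities for one target grid are spread across a node group to bound per-node I/O, and the target grid is held in duplicate so that losing one copy is not fatal.

```python
# Sketch of spreading visibility data for a single target grid across a
# node group, with the target grid duplicated for fail-over protection.
# Group sizes and replica counts are illustrative assumptions.

from collections import defaultdict

NODES_PER_GRID_GROUP = 8     # nodes sharing the I/O for one target grid
GRID_REPLICAS = 2            # duplicated target grid (fail-over)


def assign_visibility_chunk(chunk_id: int) -> int:
    """Round-robin visibility chunks over the nodes in the group so that
    buffer/bus I/O per node stays bounded."""
    return chunk_id % NODES_PER_GRID_GROUP


def grid_replica_nodes(group_nodes: list[int]) -> list[int]:
    """Pick which nodes hold the (duplicated) target grid."""
    return group_nodes[:GRID_REPLICAS]


if __name__ == "__main__":
    group_nodes = list(range(NODES_PER_GRID_GROUP))
    placement = defaultdict(list)
    for chunk in range(32):                       # 32 illustrative chunks
        placement[assign_visibility_chunk(chunk)].append(chunk)

    print("visibility chunks per node:",
          {node: len(chunks) for node, chunks in placement.items()})
    print("target grid replicas on nodes:", grid_replica_nodes(group_nodes))
```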

KEY ARCHITECTURAL CONSIDERATIONS AND MAPPING TO CURRENT HARDWARE

SDP Functional Breakdown

Ref. SKA-TEL-SDP-00000013, SDP Preliminary Architecture Design, P. Alexander et al.

Imaging Component

Imaging and Fast Imaging in more detail

Image Processing Model

(Major-cycle flow: from the UV data store, the UV processors subtract the current sky model from the visibilities using the current calibration model; the UV data are gridded, e.g. using W-projection, and imaged; the imaging processors deconvolve the imaged data (minor cycle) and update the current sky model; telescope and image-plane calibration are solved for and the calibration model is updated; the loop repeats until astronomical-quality data are produced.)
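The major/minor-cycle structure in the flow above can be summarised as a control loop. The sketch below is a plain-Python outline with placeholder functions; none of the names come from the SDP code base, and only the ordering of calibration solving, gridding, imaging and deconvolution over ~10 major cycles is meaningful.

```python
# Skeleton of the imaging major/minor cycle shown above. All functions are
# placeholders standing in for the real UV-processor and imaging-processor
# steps; only the control flow is meaningful.

N_MAJOR_CYCLES = 10   # ~10 iterations per job, as quoted earlier


def major_cycle(visibilities, sky_model, cal_model):
    # UV processors: subtract the current sky model from the visibilities,
    # applying the current calibration model.
    residual_vis = subtract_model(visibilities, sky_model, cal_model)

    # Grid the residual UV data (e.g. W-projection) and image it.
    dirty_image = image(grid(residual_vis))

    # Imaging processors: deconvolve (minor cycle) and update the sky model.
    sky_model = deconvolve(dirty_image, sky_model)

    # Solve for telescope and image-plane calibration against the new model.
    cal_model = solve_calibration(visibilities, sky_model, cal_model)
    return sky_model, cal_model


def run_pipeline(visibilities, sky_model, cal_model):
    for _ in range(N_MAJOR_CYCLES):
        sky_model, cal_model = major_cycle(visibilities, sky_model, cal_model)
    return sky_model   # astronomical-quality data after convergence


# Placeholder implementations so the outline runs as written.
def subtract_model(vis, sky, cal): return vis
def grid(vis): return vis
def image(gridded): return gridded
def deconvolve(img, sky): return sky
def solve_calibration(vis, sky, cal): return cal
```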

Compute Island/Node Concept

Ref. SKA-TEL-SDP-0000018, SDP Data Processor Platform Design, C. Broekema.

Compute Island Concept

Current Hardware Costed Concept

SDP Networking

SDP Hardware Concept

PROTOTYPING ACTIVITIES

Prototyping for SDP – What we need to explore

• PRIORITY BASED ON RISK
• Provisioning, management and control
  – Multiple networks (single tenant), a large HPC platform, integration with multiple sub-systems (LMC, Preservation, Delivery), logging and event handling
• Software-defined networking
  – A multiplicity of networks is required (some RDMA); can these all be subsumed by SDN (and over Ethernet)?
• Virtualization and containerization
  – What is the overhead for parallel applications? A small percentage can have a dramatic impact on cost (a back-of-envelope illustration follows this list). Bare-metal provisioning at scale
• Orchestration of pipelines with an execution framework
  – Data-driven execution framework, scheduling of pipelines, perhaps based on a COTS Big Data solution
• Orchestration of feedback mechanisms
  – "High-performance middleware", a distributed database to maintain telescope state and the sky model
• Management of storage hierarchies
  – RDMA to object storage, parallel file systems, an API to support the hierarchy
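The cost sensitivity to virtualization overhead flagged above can be made concrete with trivial arithmetic; the capital-cost figure in the sketch is invented purely for illustration and is not an SDP estimate.

```python
# Why "a small %" of virtualization overhead matters: at fixed delivered
# performance, overhead must be bought back as extra hardware.
# The cost figure is an invented illustration, not an SDP estimate.

system_cost_musd = 150.0          # hypothetical capital cost of the platform

for overhead in (0.01, 0.03, 0.05, 0.10):
    # Extra hardware needed to recover the same delivered performance.
    extra_fraction = overhead / (1.0 - overhead)
    print(f"{overhead:4.0%} overhead -> ~{extra_fraction * system_cost_musd:5.1f} "
          f"MUSD of additional hardware")
```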

Prototyping Platform

• Create a flexible but performance-driven prototyping environment (P3) to support and inform the architecture
  – Support migration of SIP
  – Create a software environment to define infrastructure
  – Provision a variety of storage technologies to provide an experimental bench
  – Provision multiple networks to accommodate the data flows
  – Provision some CPU acceleration
  – Enable PaaS to investigate execution frameworks

• Solution: OpenStack "A la SKA"
  – Sits on top of P3

The Buffer

• Ephemeral storage required to support
  – real-time (hot, hours) and
  – batch (cold, weeks) processing
• The buffer is localised to a Data Island, which is a subset of a Compute Island
• Provides a single namespace across 1-n compute nodes
• Hot buffer
  – currently conceived as local to the nodes; driven by performance to meet real-time needs
• Cold buffer
  – network attached; driven by capacity and resilience
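One minimal way to express the hot/cold split described above as configuration; the field names, placement strings and retention windows are assumptions used only to illustrate the two tiers.

```python
# Illustrative model of the two buffer tiers described above.
# Names, placements and retention times are assumptions, not SDP figures.

from dataclasses import dataclass


@dataclass
class BufferTier:
    name: str
    placement: str        # "node-local" or "network-attached"
    retention: str        # how long data typically live in this tier
    driven_by: str        # dominant design driver


HOT = BufferTier(
    name="hot",
    placement="node-local",          # local to nodes for performance
    retention="hours",               # real-time processing window
    driven_by="performance (real-time response)",
)

COLD = BufferTier(
    name="cold",
    placement="network-attached",    # shared across the data island
    retention="weeks",               # batch processing window
    driven_by="capacity and resilience",
)


def tier_for(workload: str) -> BufferTier:
    """Toy placement rule: real-time work hits the hot tier, batch the cold."""
    return HOT if workload == "real-time" else COLD


if __name__ == "__main__":
    for w in ("real-time", "batch"):
        t = tier_for(w)
        print(f"{w:9s} -> {t.name} buffer ({t.placement}, kept ~{t.retention})")
```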

Execution Engine and Data Life Cycle

Approach: build on Big Data concepts
  – A "data-driven", graph-based processing approach, which is receiving a lot of attention
  – Inspired by Hadoop, but adapted for our complex data flow

(Diagram contrasts the graph-based approach with Hadoop.)
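As a toy illustration of the data-driven, graph-based idea (this is a generic topological-sort executor with a hypothetical pipeline graph, not the SDP execution framework), each task runs as soon as all of its input data products exist:

```python
# Toy data-driven graph executor: each task runs as soon as all of its
# input data products exist. This only illustrates the "data driven ->
# graph-based processing" idea, not the actual SDP execution framework.

from collections import deque

# Hypothetical pipeline graph: task -> input data products it needs.
# Each task is assumed to produce a data product bearing its own name.
GRAPH = {
    "ingest":      [],
    "calibrate":   ["ingest"],
    "grid":        ["calibrate"],
    "image":       ["grid"],
    "deconvolve":  ["image"],
    "preserve":    ["deconvolve"],
}


def run(graph: dict) -> list:
    """Execute tasks in data-dependency order (Kahn-style topological sort)."""
    produced = set()
    pending = dict(graph)
    order = []
    ready = deque(t for t, deps in pending.items() if not deps)

    while ready:
        task = ready.popleft()
        order.append(task)
        produced.add(task)           # task's output data product now exists
        del pending[task]
        for t, deps in pending.items():
            if t not in ready and all(d in produced for d in deps):
                ready.append(t)
    return order


if __name__ == "__main__":
    print(" -> ".join(run(GRAPH)))
```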

Execution Engine: hierarchical

Processing Level 1
  – Cluster and data distribution controller
  – Relatively low data rate and cadence of messaging
  – Staging: aggregation of data products
  – Static data distribution to Processing Level 2, exploiting the inherent parallelism

Execution Engine: hierarchical

Processing Level 2: Data Island
  – Shared file store across the data island
  – Worker nodes provide task-level parallelism to achieve scaling
  – Process controller (duplicated for resilience)
  – Cluster manager, e.g. Mesos-like

What do we need

• Processing Level 1:
  – Custom framework to provide scale-out
• Processing Level 2:
  – Many similarities to Big Data frameworks
  – Needs modification / development for high throughput
• High-throughput data analytics framework (HTDAF)
  – Possibly a development of something like Spark
    • New data models
    • Links to external processes
    • Memory management between framework and processes
  – The shared file system needs to be very intelligent about data placement / management, or be tightly coupled with the HTDAF

StackHPC

Performance Prototype Platform

▸ Specification complete early 2017

▸ Deployment started late March

▸ Early users started April

▸ Hosted and Managed by Cambridge University, UIS


Performance Prototype Platform: ALASKA Hardware

▸ 3x Control nodes

▸ 29x Compute nodes

▸ 2x High-memory nodes

▸ 2x GPU nodes

▸ 1x NVMe Storage node

▸ 2x SSD Storage nodes

▸ 5x ARM64 Ceph Storage Cluster


Prototyping Platform - Hardware

BASED ON CURRENT HARDWARE CONCEPT OF A COMPUTE ISLAND


Performance Prototype Platform

Technology Evaluation Zoo
  – GPU
  – NVMe
  – ARM64
  – HPC network fabrics
  – High memory

Software Evaluation Zoo
  – Ironic
  – Magnum
  – Sahara
  – Monasca
  – SDN

Application Evaluation Zoo
  – Spark et al.
  – MPI workloads
  – Containerised workloads
  – RDMA data flow models
  – Programmable networks
  – Stimulus generation

Performance Prototype Platform

Developing Ironic
  – Zero-touch registration
  – Scalable provisioning
  – Multi-network support
  – Reconfigurable BIOS & RAID

Developing Monasca
  – Logging via Monasca
  – Postgres data store
  – Grafana visualisation
  – Multi-tenant logging service
  – HPC monitoring services

Developing Kolla
  – Monasca containers
  – Kolla-on-Bifrost

Prototyping Work So Far

• The P3 system is isolated in the WCDC
• Infrastructure is software defined
• SDN enablement
• Support for SIP
  • Docker Swarm, Mesos, Spark, HPCaaS
• CephFS subsystem

• Next steps (short term)
  • Hot buffer: POSIX cluster file system
  • Cold buffer: object store or file system
  • High-performance monitoring and logging
  • DBaaS

SKA and OpenStack Science-WG

We are not alone…

BACKUP SLIDES

What is the SKA?

• Next-generation radio telescope; compared to the best current instruments it offers...
  • ~100 times the sensitivity
  • ~10^6 times faster imaging of the sky
  • More than 5 square km of collecting area on scales up to 3000 km
  • Two phases (2023 and 2030)
• Will address some of the key problems of astrophysics and cosmology (and physics)
• Builds on techniques developed in Europe
• It is an interferometer
• Uses innovative technologies...
• A major ICT project
• Needs performance at low unit cost

Pulsars as Natural Clocks: Testing Gravity

• Pulsars are rotating neutron stars
• They pulse once per revolution → very accurate clocks
• The SKA will detect around 30,000 pulsars in the Galaxy
• Relativistic binaries to test gravity
• A timing network of pulsars to detect gravitational waves

SKA Context Diagram

SDP: these are off-site! (In Perth and Cape Town)

Science Regional Centres