Data Intensive Astronomy Group Talk II ICRAR Con 4 September 2015 Chen Wu.

Post on 21-Jan-2016



Data Intensive Astronomy Group Talk II

ICRAR Con

4 September 2015

Chen Wu

2

Agenda

– MWA data system

– MWA data usage

– MWA storage modelling

– GLEAM archive

– GLEAM VO usage

– In-archive processing

3

MWA Data System

4

MWA Archive

5

MWA Archive

6

MWA ingest and retrieval

24 Jan 2015

5 Aug 2014

7

Data access by region

14 million successful retrievals

8

MWA usage

9

Distribution of file freshness and staleness

Data storage @ MWA LTA Pawsey

10

MWA LTA @ Pawsey

[Architecture diagram. Labelled components: FE Nodes; Fast Disk Storage; Bulk Disk Storage; Tape Library x2 (CSIRO - library + cables + optics); Hierarchical Storage Management (DMF); CXFS; ScienceDB; M&CDB. Labelled interfaces and operations: API, POSIX, Archive, Stage, Release, DM-GET, DM-PUT. Labelled links: 10 Gbps, x32, x32.]

11

Staging time

AGE_WEIGHT = constant + multiplier*<file_age_in_day>
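The formula above can be illustrated with a minimal Python sketch. It assumes that file age is counted in days since the last access and that files are simply ranked by the resulting weight; the file names, dates, and the `constant` and `multiplier` values are hypothetical, since the slide does not give them.

```python
from datetime import date


def age_weight(file_age_in_days, constant, multiplier):
    """AGE_WEIGHT = constant + multiplier * <file_age_in_day>."""
    return constant + multiplier * file_age_in_days


def rank_by_age_weight(last_access, today, constant, multiplier):
    """Return (file_name, weight) pairs, highest weight first.

    Assumption: age is the number of days since the last access.
    """
    weights = {
        name: age_weight((today - accessed).days, constant, multiplier)
        for name, accessed in last_access.items()
    }
    return sorted(weights.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical file names, access dates, and parameter values.
last_access = {
    "obs_old.fits": date(2015, 6, 1),
    "obs_new.fits": date(2015, 9, 1),
}
ranking = rank_by_age_weight(
    last_access, today=date(2015, 9, 4), constant=1.0, multiplier=0.5
)
print(ranking)
```

With a positive multiplier the weight grows linearly with age, so the older file is ranked first.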

12

Storage performance modelling

“simulation” using the MWA data access stream (25 million successful requests)

(a, b, c, d, c, a) 3 is the reuse distance of the second access to a: three distinct files (b, c, d) are accessed in between

(a, b, c, d, c, a) a is staged from the tape
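The access stream above can be replayed in a few lines of Python. This is only a sketch: it computes the reuse distance of each access and counts tape stages under the assumption that the disk tier behaves like an LRU cache of fixed capacity. The slides do not state the replacement policy, and the function names and capacities here are hypothetical.

```python
from collections import OrderedDict


def reuse_distances(stream):
    """For each access, count the distinct items seen since the previous
    access to the same item (None for a first access)."""
    last_seen = {}
    distances = []
    for i, item in enumerate(stream):
        if item in last_seen:
            between = stream[last_seen[item] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[item] = i
    return distances


def count_tape_stages(stream, disk_capacity):
    """Replay the stream against an LRU disk cache (an assumption) that
    holds `disk_capacity` files; every miss counts as one stage from tape."""
    cache = OrderedDict()
    stages = 0
    for item in stream:
        if item in cache:
            cache.move_to_end(item)  # mark as most recently used
        else:
            stages += 1
            cache[item] = True
            if len(cache) > disk_capacity:
                cache.popitem(last=False)  # evict least recently used
    return stages


stream = ["a", "b", "c", "d", "c", "a"]
print(reuse_distances(stream))
print(count_tape_stages(stream, disk_capacity=3))
print(count_tape_stages(stream, disk_capacity=4))
```

Under LRU an access hits the disk only if its reuse distance is smaller than the cache capacity, so with room for three files the final access to a has to be staged from tape, while with room for four it is served from disk.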

13

GLEAM

IVOA Interface

GLEAM VO Server

WebInterface

GLEAM Archive Store 04

GLEAM Archive Store 06

NGAS Client

Over 800,000 images

20,000 MeasurementSets

220 TB

14

GLEAM usage on all-sky view

All sky view in Aladin Lite!

15

In-archive processing

Some “real” requirements from both MWA and GLEAM:

– Interactive processing

• Cutout and regridding, NGAS Tasks

– Batch (re-)processing - Process all files satisfying some conditions currently in the archive: e.g.

• Compress all visibility files that (1) belong to the EoR project and (2) were observed last Friday (MWA)

• Rescale flux of all snapshot images of GLEAM Phase 1 that are ingested in the past two weeks

• Make movies from images formed in DEC -26 strip scans

• Re-index all WCS headers of images ingested since last November

– Incremental processing - Asynchronously, continuously, and selectively processing "newly" ingested files

• After a snapshot image tar is ingested, decompress it, and for each FITS image, compute its sky coverage, and update VO database indexes accordingly

• As soon as a 32MHz image is ingested, if its Robustness is 0, send a copy to RRI in India before transferring it to RDSI

– NGAS Job Framework

• In the same spirit as MapReduce

• File Object Container or DROP
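The batch (re-)processing requirement above, "process all files satisfying some conditions", can be sketched as a select, map, and combine pipeline. This is an illustration in plain Python, not the NGAS Job Framework API: the `ArchivedFile` record, its fields, the file identifiers, and the placeholder compression task are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedFile:
    """Hypothetical archive record; field names are assumptions."""
    file_id: str
    project: str
    obs_date: date
    kind: str


def select(files, conditions):
    """Keep only the files that satisfy every condition."""
    return [f for f in files if all(cond(f) for cond in conditions)]


def run_batch(files, conditions, task, combine):
    """Map `task` over the matching files, then reduce with `combine`."""
    results = [task(f) for f in select(files, conditions)]
    return combine(results)


# Hypothetical archive content.
archive = [
    ArchivedFile("vis_001", "EoR", date(2015, 8, 28), "visibility"),
    ArchivedFile("vis_002", "GLEAM", date(2015, 8, 28), "visibility"),
    ArchivedFile("vis_003", "EoR", date(2015, 8, 27), "visibility"),
    ArchivedFile("img_004", "EoR", date(2015, 8, 28), "image"),
]

last_friday = date(2015, 8, 28)  # hypothetical reference date
compressed = run_batch(
    archive,
    conditions=[
        lambda f: f.kind == "visibility",
        lambda f: f.project == "EoR",
        lambda f: f.obs_date == last_friday,
    ],
    task=lambda f: f.file_id + ".compressed",  # placeholder for real compression
    combine=sorted,
)
print(compressed)
```

Keeping the conditions as independent predicates means the same pipeline could express the other examples in the list, such as rescaling recently ingested GLEAM snapshot images, by swapping the predicates and the task.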

16

Data re-processing Web UI

17

Conclusion

– MWA data system

– MWA data usage

– MWA storage modelling

– GLEAM archive

– GLEAM VO usage

– In-archive processing

18

Thank you!

Q & A