From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories
Transcript of From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories
eScience May 2007
R. Chris Smith, NOAO/CTIO, LSST
Challenges for the Operational VO

Providing Content
- Capturing and archiving data from diverse instruments, AND capturing metadata (system & science) to make that data useful

Providing Access
- Implementing the VO standards and services, plus network infrastructure, needed for wide access to the content
- Ensuring not only access, but long-term support and documentation of datasets & metadata (curation)

Providing User Interfaces and Tools
- Developing and operating user interfaces which enable effective scientific use of ALL of the distributed resources of the VO
A Case Study: NOAO Data Management

Management of data from all NOAO and some affiliated facilities = CONTENT
- 3 mountaintops (Cerro Tololo, Cerro Pachon, Kitt Peak)
- 11 telescopes
- More than 30 instruments

Virtual Observatory “back end” = ACCESS
- Provide effective access to large volumes (TBs to PBs) of archived ground-based optical & infrared data and data products through VO standard interfaces and networks

Virtual Observatory “front end” = UI and TOOLS
- Enable science by developing VO user interfaces, tools, and services to work with distributed data sources and large volumes of data
BIG Question: How does this model SCALE?
- Capturing, moving, & processing the data
- Making the data AVAILABLE through VO interfaces
- Making the data USEFUL for scientific analysis

Why do we worry about scaling?
Turning Photons into Petabytes Today
MOSAIC, WFI, IMACS: 64 Mpix cameras, ~10 to 20 GB/night

Builds up quickly! In only 3 years of two MOSAIC cameras: ~20 TB raw data, ~40-60 TB processed

[IMACS image, Las Campanas Observatory (Danny Steeghs, Jan '04)]
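A quick sanity check of the ~20 TB figure above. The per-night rate per camera and a 100% duty cycle are illustrative assumptions; the slide quotes ~10 to 20 GB/night across these instruments.

```python
# Back-of-envelope check: two MOSAIC-class cameras, each producing
# roughly 10 GB of raw data per night (an assumed rate), over 3 years.
cameras = 2
gb_per_camera_night = 10   # assumed average nightly raw-data rate
nights_per_year = 365      # assumes observing every night
years = 3

raw_tb = cameras * gb_per_camera_night * nights_per_year * years / 1000
print(f"~{raw_tb:.0f} TB raw")  # ~22 TB, consistent with the ~20 TB quoted
```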
Coming Soon: Dark Energy Camera

Focal Plane:
- 64 2K x 4K detectors
- Plus guiding and WFS
- 530 Mpix camera
The Data: Dark Energy Survey
- Each image = 1 GB; 350 GB of raw data / night
- Data must be moved to the supercomputer center (NCSA) before the next night begins (<24 hours): need >36 Mbps internationally
- Data must be processed within ~24 hours, to inform the next night’s observing
- Total raw data after 5 yrs ~0.2 PB; TOTAL dataset 1 to 5 PB
- Reprocessing planned using TeraGrid resources
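The bandwidth requirement follows directly from the nightly volume and the transfer window. A minimal sketch of that arithmetic; the raw arithmetic gives a slightly lower number than the >36 Mbps quoted, the difference being headroom for protocol overhead and retransmission.

```python
# DES requirement: 350 GB of raw data must reach NCSA within 24 hours.
raw_per_night_gb = 350
transfer_window_h = 24

bits = raw_per_night_gb * 1e9 * 8      # total bits to move per night
seconds = transfer_window_h * 3600     # transfer window in seconds
mbps = bits / seconds / 1e6            # required sustained rate in Mbps

print(f"Sustained rate needed: {mbps:.1f} Mbps")  # ~32.4 Mbps before overhead
```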
LSST: The Large Synoptic Survey Telescope

Survey the entire sky every 3 to 5 nights, to simultaneously detect and study:
- Dark Matter, via weak gravitational lensing
- Dark Energy, via thousands of SNe per year
- Potentially hazardous near-Earth asteroids
- Tracers of the formation of the solar system
- Fireworks in the heavens: GRBs, quasars...
- Periodic and transient phenomena
- ...the unknown

Massively PARALLEL Astronomy
LSST: The Instrument
- 8.4m telescope, optimized for WIDE field of view
- 3.5 degree FOV
- 3.5 GIGApixel camera
- Deep images in 15 s
- Able to scan the whole sky every 3 to 5 nights
LSST: Deep, Wide, Fast

[Figure: field of view (FOV) comparison: Keck Telescope (10 m), 0.2 degrees, vs. LSST, 3.5 degrees]
LSST Site: Cerro Pachon, Chile

[Figure: LSST site plan showing SOAR, Gemini (South), the ~1.5m cal telescope, support facilities, and the LSST site on El Penon]
LSST: Distributed Data Mgmt
- Long-Haul Communications: data transport & distribution
- Base Facility: real-time processing
- Mountain Site: data acquisition, temporary storage
- Archive/Data Access Centers: data processing, long-term storage, & public access
LSST: The Data Flow
- Each image roughly 6.5 GB; cadence: ~1 image every 15 s; 15 to 18 TB per night
- ALL must be transferred to a U.S. “data center”: mountain-to-base within the image timescale (15 s), ~10-20 Gbps; internationally within <24 hours, >2-10 Gbps
- REAL TIME reduction, analysis, & alerts: send out alerts of transient sources within minutes; provide automatic data quality evaluation and alert to problems
- Processed data grows to >100 TB per night!
- Just catalogs = Petabytes per year!
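The nightly totals above follow from the image size and cadence. A minimal sketch of the arithmetic, assuming a ~10-hour usable night; the sustained mountain-to-base rate comes out lower than the ~10-20 Gbps quoted, which allows for moving each image well within its 15 s window plus overhead.

```python
# LSST numbers quoted above: one ~6.5 GB image every 15 s.
image_gb = 6.5
cadence_s = 15
night_hours = 10   # assumed usable dark time per night

images_per_night = night_hours * 3600 / cadence_s       # 2400 images
raw_tb_per_night = images_per_night * image_gb / 1000   # ~15.6 TB, in the 15-18 TB range

# Minimum sustained rate to keep up with the cadence itself:
gbps_sustained = image_gb * 8 / cadence_s               # ~3.5 Gbps
print(f"{raw_tb_per_night:.1f} TB/night, {gbps_sustained:.1f} Gbps sustained")
```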
LSST Needs

[Chart: Computing Requirements by Year, 2014 to 2022; y-axis: TeraFloating Point Operations (TF), 0 to 300; series: Science/Operations, Spares, Transients, Red. Images, DQ Analysis, Queries, Deep Det., Routine, Nightly, Initial; broken out by site: Archive Center, Base, Data Access Center]
Turning Photons into Petabytes: Summary
- Today, ~10 to 20 GB/night: MOSAIC, WFI, IMACS (64 Mpix cameras)
- Soon, ~300 to 500 GB/night: VISTA (67 Mpix camera), VST (256 Mpix camera), DECam/DES (520 Mpix camera)
- On the horizon, ~15 TB/night: LSST Project (3 Gpix camera)

And these are just survey instruments in Chile!
DES, LSST, ... the REST of the Science?
- Ongoing (MOSAIC, WFI, IMACS) and future (DES, LSST, etc.) projects will provide PETABYTES of archived data
- Only a small fraction of the science potential will be realized by the planned investigations
- How do we maximize the investment in these datasets and provide for their future scientific use?
VO Challenges: Provider Perspective
- How do we effectively capture, transport, and manage Petabytes of data? Need advanced IT infrastructure
- How do we provide effective access to Petabytes of data? Need advanced data mining interfaces

Fundamentally IT challenges, in support of the astronomical community
VO Challenges: Scientific Perspective
- Data Discovery: from those Petabytes, what data exists that might be useful to help address my scientific query?
- Data Understanding: which data are best suited for my analysis?
- Data Movement: how do I get the data from where it is to where it is most useful?
- Data Analysis: how do I extract the information I need from the data?
NVO portal @ NOAO: Focus on the Scientific USER
- 4 keys: Data Discovery, Data Understanding, Data Access, Data Analysis
- First focus on supporting data DISCOVERY: discovery in spatial coordinates (NOAO Sky); discovery in temporal coordinates (Timeline)
- NOAO NVO portals: http://nvo.noao.edu
- And for South America: http://nvo.ctio.noao.edu, a foundation for exploring partnerships with South American communities
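Spatial data discovery in the VO is typically exposed through the IVOA Simple Cone Search standard: an HTTP GET carrying RA, DEC, and a search radius SR in decimal degrees, returning matching datasets as a VOTable. A minimal sketch of building such a request; the endpoint URL below is a placeholder, not a real NOAO service address.

```python
# Sketch of an IVOA Simple Cone Search (SCS) request URL.
# RA, DEC, SR are the standard parameters, all in decimal degrees.
from urllib.parse import urlencode

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search request URL for the given position."""
    params = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return f"{base_url}?{params}"

# Example: a 0.1-degree cone around a target position (hypothetical service).
url = cone_search_url("https://example.org/scs", 150.1, 2.2, 0.1)
print(url)  # https://example.org/scs?RA=150.1&DEC=2.2&SR=0.1
```

Fetching the URL with any HTTP client would return a VOTable document listing the datasets whose positions fall inside the cone.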
Summary: VO Challenges

In Infrastructure
- Collect and maintain petabytes of content
- Provide for effective access, including networks, hardware, and software

In User Interaction
- Provide effective user interfaces
- Support distributed analysis
- Support large queries across distributed DBs
- Support statistical analysis and processing across distributed resources (Grid processing & storage)

TOOLS & SERVICES to enable SCIENCE
How? Strategic Partnerships

In Local Systems
- Vendors: local storage, processing, servers

In Remote Systems
- Distributed computer centers to provide bulk storage and large-scale processing
- Linked together for Grid processing and Grid storage

In Connectivity
- High-speed national and international bandwidth

Scientific
- VO partners to develop standards and provide tools (IVOA)
- Developing tools and services optimized for scientific analysis over large datasets (e.g., statistical methods)