Data Management at the European Southern Observatory:
Strategies and Challenges
Michael Sterzik, ESO Data Management and Operations Division
European Southern Observatory
- builds and operates state-of-the-art ground-based astronomical facilities
- most productive Observatory world-wide
- inter-governmental organisation supported by 16 member states
- involvement of ~1/2 of the astronomical community world-wide
Expert Meeting on Research Data Management and Sharing, Brussels, February 23rd, 2015
Data Challenges in Astronomy
- diversity of data collections
  - multi-mission, multi-messenger
  - multi-wavelength
  - time-domain
- Astronomy becomes more and more data intensive
  - increasing volume and/or increasing complexity
- … depends on Infrastructure
  - to store, access, preserve and share
  - to process, analyse, synthesise
- … requires a (widely accepted) ecosystem
  - data standards, representations (“FITS”)
  - metadata definitions
  - interoperability protocols (“VO”)
  - associated SW tools
  - information services
Data Policy of ESO http://www.eso.org/sci/observing/policies/Cou996-rev.pdf
- Council doc. on VLT/VLTI Science Operations Policy (2004)
  - access to the facility (general observing time, Guaranteed Time)
  - data rights: proprietary period (in general one year)
  - data access: public through an electronic archive
  - data products: level 0 (quality control) and 1 (science grade) by ESO
  - data analysis: higher level by community
- … necessary for defining a data management plan
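The proprietary-period rule above can be illustrated with a short date calculation. This is only a sketch: it assumes the period starts on a single known date and lasts exactly one year, and the names `public_release_date` and `is_public` are hypothetical helpers, not part of any ESO tool. The policy document linked above remains the authoritative definition.

```python
from datetime import date

# "in general one year" per the policy slide; an assumption for this sketch
PROPRIETARY_YEARS = 1


def public_release_date(start: date, years: int = PROPRIETARY_YEARS) -> date:
    """Return the date on which data become public (hypothetical helper)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # start is 29 February and the target year is not a leap year
        return start.replace(year=start.year + years, day=28)


def is_public(start: date, today: date) -> bool:
    """True once the proprietary period that began on `start` has elapsed."""
    return today >= public_release_date(start)
```

A data management plan could use such a check to decide when a dataset may be exposed through the public archive interface.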
mapping research into a work/data flow
[Figure: work/data flow diagram. Labels: Phase I, Phase II, Phase III; Program Preparation; Long Term Schedule; Observing Block Preparation; Short Term Schedule; OB Execution; Quality Control; Data Enhancement; Archive I/O; Metadata; raw data; data products]
“Business model” of Astronomy
• periodic (6-monthly) review of observing proposals: oversubscription rate 3-5
• best ideas/proposals win observing time after peer review
• Principal Investigators enjoy proprietary data rights for one year
• afterwards: data go public, pressure to publish increases
Volume versus Value
- raw data are useless for direct scientific analysis
- data sets must be complete, calibratable, and pass quality control
- data processing (“reduction”) to create data products
  - quality control, master calibrations (Level 0)
  - transformation from instrumental to physical units (Level 1)
  - combination of observations: deep and wide maps or spectra (Level 2)
  - catalogs, high level data products after scientific analysis (Level 3)
[Figure labels: Volume; Effort, Value]
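The processing levels listed above can be written down as a small ordered type. This is a minimal sketch, not an ESO data model: the names `ProductLevel`, `is_science_grade` and `typical_producer` are hypothetical, and the producer split (levels 0 and 1 by ESO, higher levels by the community) is taken from the data policy slide.

```python
from enum import IntEnum


class ProductLevel(IntEnum):
    """Data product levels as named on the slides (hypothetical type)."""
    L0 = 0  # quality control, master calibrations
    L1 = 1  # instrumental units transformed to physical units
    L2 = 2  # combined observations: deep and wide maps or spectra
    L3 = 3  # catalogs and high level products after scientific analysis


def is_science_grade(level: ProductLevel) -> bool:
    """Level 1 is described as science grade; higher levels build on it."""
    return level >= ProductLevel.L1


def typical_producer(level: ProductLevel) -> str:
    """Levels 0 and 1 come from ESO, higher levels from the community."""
    return "ESO" if level <= ProductLevel.L1 else "community"
```

Using an ordered enum keeps comparisons such as "at least Level 1" explicit when filtering archive holdings.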
Strategies for the Generation of Content: Science Data Products @ ESO
- In-house generation of Data Products (IDPs):
  - quality ensured through a standardised process of data acquisition for science and calibration data
  - near-real time quality control process ensures certified master calibrations (L0)
  - unattended processing through certified pipelines
  - goal: science grade data for all popular instruments (L1)
- External Data Products (EDPs):
  - provided by public surveys and large programs (deliverables)
  - programs specifically selected for high legacy value
  - most use dedicated (non-ESO) user pipelines (e.g. CASU)
  - goal: advanced products (L2, L3)
  - perspective: user community at large contributes EDPs
    • quality assurance: published datasets only?
    • acknowledgement: specific DOI?
Phase 3 process for EDPs
1. Data preparation
2. User's data validation
3. Data release definition
4. Data transfer to ESO
5. Automatic release validation
6. Content validation
7. Archival storage
8. Data publication
[Figure labels: P.I. / Data provider; “Closing” the data release]
ESO defines the required data format, provides dedicated tools, user documentation and direct support for Phase 3 data providers.
The data provider, i.e. the survey P.I., is responsible for the quality of the reduced data products and the associated data release documentation.
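The eight Phase 3 steps can be treated as a simple ordered sequence. The sketch below only encodes the order shown on the slide; the names `PHASE3_STEPS` and `next_step` are hypothetical, and the sketch does not assert which party carries out each step.

```python
from typing import Optional

# Step names in slide order (1 to 8)
PHASE3_STEPS = (
    "data preparation",
    "user's data validation",
    "data release definition",
    "data transfer to ESO",
    "automatic release validation",
    "content validation",
    "archival storage",
    "data publication",
)


def next_step(current: str) -> Optional[str]:
    """Return the step that follows `current`, or None after the last step."""
    index = PHASE3_STEPS.index(current)  # raises ValueError for unknown steps
    if index + 1 < len(PHASE3_STEPS):
        return PHASE3_STEPS[index + 1]
    return None
```

Tracking a release as a position in this sequence makes it easy to report how far a submission has progressed.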
Data Collections: Phase 3 Releases http://www.eso.org/sci/observing/phase3/data_releases.html
ESO Data Portal: acknowledgements
… DOIs?
Linking Publications and Data
- Scientific Return is a KPI for a Research Infrastructure
  - data is the primary scientific return
  - bibliometrics widely used as a proxy for
    • quantity (number)
    • quality (citations)
    • merit (quality/cost)
- managerial tool
    • assess impact of science policies, implementation, operations, …
- Community Access to Publications and Data
  - enables active archive research
  - allows authorship to be recognised (in principle)
- the impact/value of data archive research is not always appreciated (few data metrics exist)
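The three bibliometric proxies named above (quantity, quality, merit) can be computed from a list of papers. This is a rough sketch under stated assumptions: the `Paper` record, the `bibliometric_proxies` function and the cost unit are hypothetical, and summing citations is only one possible reading of "quality".

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one refereed paper using a facility's data."""
    identifier: str
    citations: int


def bibliometric_proxies(papers: Sequence[Paper], cost: float) -> Dict[str, float]:
    """Return quantity (paper count), quality (total citations), merit (quality/cost)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    quantity = len(papers)
    quality = sum(paper.citations for paper in papers)
    return {
        "quantity": quantity,
        "quality": quality,
        "merit": quality / cost,  # cost in any consistent unit
    }
```

For example, `bibliometric_proxies([Paper("a", 10), Paper("b", 30)], cost=4.0)` uses made-up numbers purely to show the call; it does not represent any real facility.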
Comparative (Inter-facility) Bibliometrics
U. Grothkopf et al., http://www.eso.org/sci/libraries/edocs/ESO/ESOstats.pdf
ESO and other Observatories
In order to put ESO’s research output into context, we give an overview of the total numbers of publications of major observatories for the publication years 1996 to 2014 (if already available). Note that some facilities date back further than that; their early years are not included in this graph.
The most simplistic way of comparing facilities is to look at the numbers of publications. Obviously, this favors large institutions with many facilities over smaller ones. A more meaningful investigation should normalize the numbers in some way, for instance by number of observing hours, by actual share of data used in the papers (as many scientific articles use data from more than one observatory), or by budget (telescope construction costs and maintenance).
When comparing publication statistics among different observatories, it is essential to assess the selection criteria applied by each observatory. To the best of our knowledge, the observatories shown in this graph include only papers that actually use observational data from their facilities (as opposed to merely referencing them). All papers were published in refereed journals.
The statistics shown in Fig. 3 and Table 3 were obtained as follows:
- ESO total, VLT/VLTI, La Silla, ESO survey telescope, APEX (ESO time), ALMA (Europe time): ESO Telescope Bibliography (http://telbib.eso.org)
- Chandra: Chandra Bibliographic Statistics (http://cxc.harvard.edu/cda/bibstats/bibstats.html ‘Refereed Chandra Science Papers’ and http://cxc.harvard.edu/cda/bibstats/plots/Current/Papers_by-year.txt)
- Gemini: ADS (Filters / Select References In, http://esoads.eso.org/abstract_service.html#jousel)
- HST: HST Publication Statistics (http://archive.stsci.edu/hst/bibliography/pubstat.html)
- ING: Isaac Newton Group of Telescopes paper counts (http://catserver.ing.iac.es/service/biblio/tablecount.php) for WHT, INT, and JKT
- Keck: Keck Science Bibliography (http://www2.keck.hawaii.edu/library/keck_papers.html)
- NRAO: NRAO Publication Statistics (http://www.nrao.edu/library/pubstats.shtml)
- Spitzer: Spitzer Bibliographical Database (http://sohelp2.ipac.caltech.edu/bibsearch/)
- Subaru: ADS (Filters / Select References In, http://esoads.eso.org/abstract_service.html#jousel)
- Swift: statistics provided by Sandra Savaglio, Max-Planck-Institute for Extraterrestrial Physics, Garching, Germany
- XMM: XMM-Newton in the Journals (http://heasarc.gsfc.nasa.gov/docs/xmm/xmmbib.html). Number of publications per year provided by Norbert Schartel, ESA, Madrid, Spain
ESO Library, Karl-Schwarzschild-Strasse 2, 85748 Garching near Munich, Germany / http://telbib.eso.org
[Figure: Publications of major observatories by year. x-axis: publication year, 1996 to 2014. y-axis: No. of publications, 0 to 900. Legend: ESO total, HST, Spitzer, VLT/VLTI, NRAO, Chandra, XMM, Swift, La Silla, Keck, Gemini, ING (may contain duplicates), Subaru, ESO survey tel., APEX (ESO), ALMA (Eur.)]
Fig. 3: Refereed publications by ESO and other observatories (as of Feb. 2015). Thick lines: ESO facilities. Thin lines: other ground-based facilities. Dashed lines: space-based facilities.
Please note that selection criteria for inclusion or exclusion of papers vary among observatories.
• useful to estimate the global performance of a facility (Observatory)
• but biased, depends on how librarians count the “contributions”
Intra-facility bibliometrics
• useful to assess the relative performance of different components
Intra-facility bibliometrics
• useful to assess the science policy and its implementation
[Figure: UT Productivity. x-axis: Years between Program scheduled and Publication, 0 to 10. y-axis: Number of Publications per Program, 0.0 to 2.0. Legend: All, Large, GTO, DDT, TOO. Annotations: Average, Guaranteed Time, Discretionary Time, Target of Opportunity, Large Programs]
Links to data and publications telbib.eso.org
All original data archive.eso.org
Content aggregation and exploration cdsportal.u-strasbg.fr
Archival data enabled publication fraction
U. Grothkopf et al., http://www.eso.org/sci/libraries/edocs/ESO/ESOstats.pdf
[Figure annotations: start of facility operations; start archive population with DP; archive services; interoperability]
Archival data enabled publication fraction
U. Grothkopf et al., http://www.eso.org/sci/libraries/edocs/ESO/ESOstats.pdf
HST
[Figure annotations: start of facility operations; start archive population with DP; archive services; interoperability]
… and costs? as a fraction of total RI operations costs
- data archive operations
  - archive infrastructure TCO (1PB, 3 safe copies) 0.3-1%
  - content management (production, curation, assurance) ~10%
- data generation
  - facility time for calibrations ~4%
- lacking resources for data archive developments …
  - apply existing frameworks, standards as much as possible!
  - do not re-invent archive infrastructure!
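The cost fractions quoted above can be added up to see the overall share of operations costs that data management represents. This is a back-of-the-envelope sketch: it treats the approximate values (~10%, ~4%) as point values and assumes the three items do not overlap, neither of which the slide states explicitly. The names `COST_ITEMS_PERCENT` and `total_share` are hypothetical.

```python
from typing import Dict, Tuple

# (low, high) percentages of total RI operations costs, as quoted on the slide
COST_ITEMS_PERCENT: Dict[str, Tuple[float, float]] = {
    "archive infrastructure TCO": (0.3, 1.0),
    "content management": (10.0, 10.0),  # "~10%" treated as a point value
    "facility time for calibrations": (4.0, 4.0),  # "~4%" treated as a point value
}


def total_share(items: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the low and high bounds, assuming the items do not overlap."""
    low = sum(bounds[0] for bounds in items.values())
    high = sum(bounds[1] for bounds in items.values())
    return round(low, 1), round(high, 1)
```

Under these assumptions the total comes to roughly 14 to 15% of operations costs, dominated by content management rather than by storage infrastructure.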
Summary
- astronomy as data-intensive science
  - diversity of datasets with respect to quality, quantity, and origin
  - international data standards largely applied (IAU)
- mapping the research process into a data-flow process
  - allows a systemic approach to reach consistent data quality
  - adds value through the generation of data products
  - adds value through the sharing of data products
- astronomical data archives
  - are an essential science resource
  - benefit (= science return + …) > costs
- challenges ahead
  - recognise content, services, interoperability as priority
  - incentives for data providers