The data challenge in astronomy archives technology problems solution DCC conference Bath Andy...
The data challenge in astronomy
• archives
• technology
• problems
• solution
DCC conference Bath Andy Lawrence Sep 2005
the virtual observatory
astronomical archives
(1)
IT in astronomy : key areas
• (1) facility operations
• (2) facility output processing
• (3) shared supercomputers for theory
• (4) science archives
• (5) end-user tools
(1-3) : big bucks
(4-5) : smaller bucks, but
- produces the final science output
- sets requirements for (1-2)
astronomical archives
• major archives growing at TB/yr
• issue not storage but management (curation)
• improving quality of data access and presentation
• needs specialist data centres
[Figure: ESO Archive Volume (GB), logarithmic axis from 1 to 10000 GB]
end users
• increasing fraction of archive re-use
• increasing multi-archive use
• most download small files and analyse at home
• some users process whole databases
• reduction standardised; analysis home grown
needles in a haystack
Hambly et al 2001
- faint moving object is a cool white dwarf
- may be a solution to the dark matter problem
- but hard to find : one in a million
- even harder across multiple archives
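The slide's "faint moving object" suggests the kind of filter involved: compare positions at two epochs and keep objects that are both faint and fast. A minimal sketch, assuming sources are already matched between the two epochs; the record layout and the 1 arcsec/yr and magnitude 19 cuts are hypothetical values for illustration, not the selection used by Hambly et al 2001.

```python
import math

def proper_motion(ra1, dec1, ra2, dec2, years):
    """Total proper motion in arcsec/yr from positions (degrees) at two epochs.

    Small-angle approximation: the RA offset is scaled by cos(dec) so it
    measures a true angle on the sky. RA wrap-around at 0/360 is ignored.
    """
    mean_dec = math.radians((dec1 + dec2) / 2.0)
    d_ra = (ra2 - ra1) * math.cos(mean_dec) * 3600.0  # arcsec
    d_dec = (dec2 - dec1) * 3600.0                    # arcsec
    return math.hypot(d_ra, d_dec) / years

def faint_movers(sources, years, min_motion=1.0, faint_limit=19.0):
    """Keep sources fainter than faint_limit (larger magnitude) moving faster than min_motion."""
    picks = []
    for s in sources:
        mu = proper_motion(s["ra1"], s["dec1"], s["ra2"], s["dec2"], years)
        if s["mag"] > faint_limit and mu > min_motion:
            picks.append((s["id"], round(mu, 2)))
    return picks

# Hypothetical toy records: "a" is faint and moves 20 arcsec in 10 years,
# "b" is faint but static, "c" moves but is bright.
sources = [
    {"id": "a", "ra1": 10.0, "dec1": -30.0, "ra2": 10.0, "dec2": -30.0 + 20 / 3600, "mag": 20.1},
    {"id": "b", "ra1": 12.0, "dec1": -31.0, "ra2": 12.0, "dec2": -31.0, "mag": 20.5},
    {"id": "c", "ra1": 14.0, "dec1": -32.0, "ra2": 14.0, "dec2": -32.0 + 20 / 3600, "mag": 15.0},
]
candidates = faint_movers(sources, years=10.0)
```

Only "a" passes both cuts. At a one-in-a-million hit rate, the cost lies in running a cut like this over every row of a survey, and in matching sources across archives before it can be applied at all.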
failed stars
compare optical and infra-red
extra object is very cold
a "brown dwarf" or failed star
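Comparing optical and infra-red images amounts to a positional cross-match: the "extra object" is an IR source with no optical counterpart nearby. A minimal sketch, assuming both catalogues are plain lists of `(name, ra_deg, dec_deg)` and a hypothetical 2 arcsec match radius; the brute-force loop stands in for the spatial indexing a real archive would use.

```python
import math

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation of two sky positions (degrees in, arcsec out), haversine form."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0

def ir_only(ir_sources, optical_sources, radius_arcsec=2.0):
    """Return names of IR sources with no optical counterpart within radius_arcsec.

    Brute force, O(N*M): every IR source is compared with every optical source.
    """
    unmatched = []
    for name, ra, dec in ir_sources:
        if not any(ang_sep_arcsec(ra, dec, ora, odec) <= radius_arcsec
                   for _, ora, odec in optical_sources):
            unmatched.append(name)
    return unmatched

# Hypothetical toy catalogues: i1 sits 0.5" from o1; i2 has nothing nearby.
optical = [("o1", 150.0000, 2.0000), ("o2", 150.0100, 2.0100)]
ir = [("i1", 150.0001, 2.0001), ("i2", 150.0200, 2.0000)]
cold_candidates = ir_only(ir, optical)
```

Doing the same across archives means first agreeing on coordinates, units and access methods for both catalogues, which is the interoperability problem the later slides return to.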
multi-wavelength views of a Supernova Remnant
Shocks seen in the X-ray
Heavy elements seen in the optical
Dust seen in the IR
Relativistic electrons seen in the radio
solar-terrestrial links
Coronal mass ejection imaged by space-based solar observatory
Effect detected hours later by satellites and ground radar
background technology
(2)
dogs and fleas
• there is a very large dog
hardware trends
• ops, storage, bw : all 1000x/decade
– can get 1TB IDE = $5K
– backbones and LANs are Gbps
• but device bw 10x/decade
– real PC disks 10MB/s; fibre channel SCSI possibly 100MB/s
• and last mile problem remains
– end-to-end b/w typically 10Mbps
[Figure: ops per second/$ vs year, 1880 to 2000, logarithmic axis from 1E-06 to 1E+09; trend lines labelled "doubles every 7.5 years", "doubles every 2.3 years" and "doubles every 1.0 years"]
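Doubling times and "per decade" factors are two views of the same exponential, so the figures on this slide can be checked against each other. A small sketch converting between them; the function names are illustrative only.

```python
import math

def growth_per_decade(doubling_years):
    """Factor by which a quantity grows in 10 years, given its doubling time in years."""
    return 2.0 ** (10.0 / doubling_years)

def doubling_time(factor_per_decade):
    """Doubling time in years implied by a growth factor per decade."""
    return 10.0 / math.log2(factor_per_decade)

# Doubling times from the chart's trend lines
for t in (7.5, 2.3, 1.0):
    print(f"doubles every {t} yr -> x{growth_per_decade(t):.1f} per decade")

# Growth rates quoted in the bullets
print(f"1000x/decade -> doubles every {doubling_time(1000):.2f} yr")
print(f"10x/decade   -> doubles every {doubling_time(10):.2f} yr")
```

A doubling time of one year is about 1000x per decade (2^10 = 1024), while 10x per decade means doubling roughly every three years. If capacity grows about 1000x per decade and device bandwidth about 10x, the time needed to read a full disk grows by roughly 100x per decade.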
operations on a TB database
• searching at 10 MB/s takes a day
– solved by parallelism
– but development non-trivial ==> people
• transfer at 10 Mbps takes a week
– leave it where it is
• ==> data centres provide search and analysis services
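The "a day" and "a week" figures follow from simple rate arithmetic. A sketch, assuming a decimal terabyte (10^12 bytes), no protocol overhead, and perfect scaling across parallel disks, which real systems only approximate.

```python
TB = 1e12  # bytes; decimal terabyte (assumption)

def scan_hours(size_bytes, rate_bytes_per_s, n_parallel=1):
    """Hours to read a dataset once, split evenly over n_parallel disks or nodes."""
    return size_bytes / (rate_bytes_per_s * n_parallel) / 3600.0

def transfer_days(size_bytes, link_bits_per_s):
    """Days to move a dataset over a network link (bytes -> bits, no overhead)."""
    return size_bytes * 8 / link_bits_per_s / 86400.0

print(f"search 1 TB at 10 MB/s:   {scan_hours(TB, 10e6):.1f} h")
print(f"same over 100 disks:      {scan_hours(TB, 10e6, 100):.2f} h")
print(f"transfer 1 TB at 10 Mbps: {transfer_days(TB, 10e6):.1f} days")
```

The scan takes about 28 hours and the transfer about 9 days, consistent with the slide's "a day" and "a week" as order-of-magnitude figures. Parallelism shrinks the first number but does nothing for the second, which is why the data stays put.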
network development
• higher level protocols ==> transparency
• TCP/IP message exchange
• HTTP doc sharing (web)
• grid suite CPU sharing
• XML/SOAP data exchange
==> service paradigm
next up on the internet
• workflow definition
• dynamic semantics (ontology)
• software agents
the problems
(3)
data growth
• astronomical data is growing fast
• but so is computing power
• so what's the problem?
(1) Heterogeneity
(2) End user delivery
(3) End user demand
data rich future
• heritage
– Schmidt, IRAS, Hipparcos
• current hits
– VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP
• coming up :
– UKIDSS, VISTA, ALMA, JWST, Planck, Herschel
• cross fingers :
– LSST, ELT, LISA, Darwin, SKA, XEUS, etc.
• plus lots more
• issue is archive interoperability
– need standards and transparent infrastructure
archive data rates
• map the sky : 0.1" pixels x 16 bits = 100 TB
• process to find objects : billion row tables
• VISTA 100 TB/yr by 2007
• SKA datacubes 100PB/yr by 2020
• not a technical or financial problem
– LHC doing 100PB/yr by 2007
• issue is logistic : data management
• need professional data centres
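The 100 TB figure for a sky map can be checked directly: the whole sky is 4π steradians, about 5.3 × 10^11 square arcseconds. A sketch, assuming one image covering the whole sky in a single band at a single epoch with no overlaps or compression; the annual volumes are converted into the average sustained rate they imply.

```python
import math

def sky_map_bytes(pixel_arcsec, bits_per_pixel):
    """Bytes for one whole-sky image at a given pixel scale and pixel depth."""
    arcsec_per_rad = 180.0 / math.pi * 3600.0
    sky_arcsec2 = 4.0 * math.pi * arcsec_per_rad ** 2   # whole sky, ~5.3e11 arcsec^2
    n_pixels = sky_arcsec2 / pixel_arcsec ** 2
    return n_pixels * bits_per_pixel / 8.0

def mean_rate_bytes_per_s(bytes_per_year):
    """Average sustained rate implied by an annual data volume."""
    return bytes_per_year / (365.25 * 86400.0)

print(f"sky map at 0.1 arcsec, 16 bits: {sky_map_bytes(0.1, 16) / 1e12:.0f} TB")
print(f"100 TB/yr (VISTA): {mean_rate_bytes_per_s(100e12) / 1e6:.1f} MB/s")
print(f"100 PB/yr (SKA):   {mean_rate_bytes_per_s(100e15) / 1e9:.1f} GB/s")
```

The sky map comes out near 107 TB. The rates are averages only; real instruments deliver data in bursts, which is part of why the slides treat this as a data management problem rather than a raw capacity one.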
data rates : user delivery
• disk I/O and bandwidth
– end-user bottlenecks will get WORSE
– but links between data centres can be good
• move from download to service paradigm
– leave the data where it is
– operations on data (search, cluster analysis, etc) as services
– shift the results not the data
– networks of collaborating data centres (datagrid or VO)
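"Shift the results not the data" can be sketched as a search that runs where the catalogue lives, with only matching rows leaving the data centre. `DataCentre` and `cone_search` are hypothetical names for illustration, not an IVOA interface, and the toy catalogue is a line of sources along the celestial equator.

```python
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

class DataCentre:
    """Hypothetical data centre that keeps its catalogue and runs searches beside it."""

    def __init__(self, rows):
        self.rows = rows        # list of (name, ra_deg, dec_deg); never shipped whole
        self.rows_shipped = 0   # rows that have left the centre as results

    def cone_search(self, ra, dec, radius_deg):
        """Run the search locally and return only the names of matching rows."""
        hits = [name for name, r, d in self.rows
                if ang_sep_deg(ra, dec, r, d) <= radius_deg]
        self.rows_shipped += len(hits)
        return hits

# 36,000 toy sources along the equator, one every 0.01 degrees of RA
centre = DataCentre([(f"s{i}", i * 0.01, 0.0) for i in range(36000)])
hits = centre.cone_search(180.0, 0.0, 0.055)
```

Only 11 of the 36,000 rows cross the network. The design choice is that the operation travels to the data rather than the reverse, so end-user bandwidth limits the size of the answer, not the size of the archive.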
user demands
• bar constantly rising
– online ease
– multi-archive transparency
– easy data intensive science
• new requirements
– automated resource discovery (intelligent Google)
– cheap I/O and CPU cycles
– new standards and software infrastructure
the virtual observatory
(4)
the VO concept
• web : all docs in the world inside your PC
• VO : all databases in the world inside your PC
Generic science drivers
• data growth
• multi-archive science
• large database science
can do all this now, but needs to be fast and easy
• empowerment
Beijing as good as Berkeley
what it's not
• not a monolith
• not a warehouse
VO framework
• framework + standards
• inter-operable data
• inter-operable software modules
• no central VO-command
- it's not a thing
- it's a way of life
VO geometry
• not a warehouse
• not a hierarchy
• not a peer-to-peer system
• small set of service centres and a large population of end users
– note : latest hot database lives with creators / curators
yesterday
[Diagram: browser front end sends a CGI request to the DB engine, which queries the data with SQL; the result comes back as an html web page]
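The "yesterday" pattern can be sketched in a few lines: parse the request, run SQL, and return HTML for a browser. A minimal sketch, assuming a hypothetical `sources` table and `max_mag` parameter; the handler takes the query string as an argument rather than reading the CGI environment, so it can be exercised directly.

```python
import html
import sqlite3
from urllib.parse import parse_qs

def make_db():
    """Toy in-memory catalogue standing in for the archive database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sources (name TEXT, ra REAL, dec REAL, mag REAL)")
    db.executemany("INSERT INTO sources VALUES (?, ?, ?, ?)",
                   [("A", 10.0, -5.0, 18.2), ("B", 10.5, -5.1, 21.7), ("C", 11.0, -4.9, 19.9)])
    return db

def cgi_handler(db, query_string):
    """Old-style CGI flow: parse request parameters, run SQL, build an HTML page."""
    params = parse_qs(query_string)
    max_mag = float(params.get("max_mag", ["99"])[0])
    rows = db.execute(
        "SELECT name, ra, dec, mag FROM sources WHERE mag <= ? ORDER BY name",
        (max_mag,)).fetchall()
    cells = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows)
    return f"<html><body><table>{cells}</table></body></html>"

page = cgi_handler(make_db(), "max_mag=20")
```

The output is formatted for a person at a browser; another program wanting the same rows would have to scrape the HTML.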
today
[Diagram: an application (or anything) sends a SOAP/XML request to a web service; the DB engine queries the native data with SQL; SOAP/XML data comes back in standard formats]
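In the "today" picture the service returns XML in a standard format instead of an HTML page, so any program can consume it. VOTable, the IVOA's XML table format, is one such standard. A sketch that serialises SQL results into a VOTable-shaped document; it is deliberately simplified (no namespace, version or `datatype` attributes, and no SOAP envelope), so its output is not a valid VOTable as it stands.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_votable(names, rows):
    """Serialise query results as a minimal VOTable-shaped XML string (simplified)."""
    vot = ET.Element("VOTABLE")
    table = ET.SubElement(ET.SubElement(vot, "RESOURCE"), "TABLE")
    for n in names:
        ET.SubElement(table, "FIELD", name=n)   # real VOTable also needs datatype
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(vot, encoding="unicode")

# Hypothetical toy table; column names come from the cursor description
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sources (name TEXT, ra REAL, dec REAL)")
db.executemany("INSERT INTO sources VALUES (?, ?, ?)", [("A", 10.0, -5.0), ("B", 10.5, -5.1)])
cur = db.execute("SELECT name, ra, dec FROM sources ORDER BY name")
xml_text = rows_to_votable([d[0] for d in cur.description], cur.fetchall())
```

The same SQL now ends in a structure a client can parse field by field, which is what lets the client be "anything".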
tomorrow
[Diagram: an application (or anything) submits a job to a web service and gets results back; behind it, grid-connected web services publish WSDL and share standard semantics, supported by Registry, Workflow, GLUE, Certification and VO Space]
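The "tomorrow" picture adds a registry, so services can be found by what they do, and a workflow layer that chains them into a job. A toy sketch of that idea: capabilities are plain strings and services are local functions rather than remote web services described by WSDL, and all names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory registry: capability name -> service callable
Registry = Dict[str, Callable[[list], list]]

def register(registry: Registry, capability: str, service: Callable[[list], list]) -> None:
    """Publish a service under the capability it offers."""
    registry[capability] = service

def run_workflow(registry: Registry, steps: List[str], data: list) -> list:
    """Run a job as a chain of services, each looked up by capability in the registry."""
    for capability in steps:
        if capability not in registry:
            raise LookupError(f"no registered service offers {capability!r}")
        data = registry[capability](data)
    return data

registry: Registry = {}
register(registry, "select-faint", lambda rows: [r for r in rows if r[1] > 20.0])
register(registry, "sort-by-mag", lambda rows: sorted(rows, key=lambda r: r[1]))
results = run_workflow(registry, ["select-faint", "sort-by-mag"],
                       [("A", 21.5), ("B", 18.0), ("C", 20.5)])
```

The workflow names only capabilities, never specific services, so a different data centre could register an equivalent service without the job changing.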
publishing metaphor
• facilities are authors
• data centres are publishers
• VO portals are shops
• end-users are readers
• VO infrastructure is the distribution system.
International VO alliance (IVOA)
IVOA standards
• formal process modelled on W3C
• technical working groups and interop workshops
• agreed functionality roadmap
IVOA standards
• key standards so far
– table formats
– resource and service metadata definitions
– semantic dictionary
– protocols for image and spectrum access
• coming year
– grid and web service interfaces
– authentication
– storage sharing protocols
– application metadata and interfaces
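The image access protocol is an example of how thin these standards can be at the wire level: in Simple Image Access (SIA) version 1, a query is an HTTP GET with the position `POS=ra,dec` and a `SIZE`, both in decimal degrees, plus an optional `FORMAT`, and the service answers with a VOTable listing matching images. A sketch that builds such a query; the base URL is a placeholder, not a real service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def sia_query_url(base_url, ra_deg, dec_deg, size_deg, fmt="image/fits"):
    """Build an SIA v1-style query URL (POS and SIZE in decimal degrees)."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": f"{size_deg}", "FORMAT": fmt}
    return f"{base_url}?{urlencode(params)}"

# Placeholder endpoint; any SIA-compliant service would accept the same parameters
url = sia_query_url("https://example.org/sia", 10.684, 41.269, 0.2)
params = parse_qs(urlsplit(url).query)   # round-trip check of the encoded parameters
```

Because every compliant service takes the same parameters, a client written once can query many archives, which is the interoperability the earlier slides ask for.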
state of implementations
• key projects : AstroGrid, US-NVO, Euro-VO
• many compliant data services
• VO aware tools
• mutually harvesting registries
• workflow system
• simple shared storage
• AstroGrid has ~100 registered users
• first science results coming out
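"Mutually harvesting registries" means each registry copies the resource records the others hold, so a search at any one of them sees everything. IVOA registries do this over the OAI-PMH harvesting protocol, which this sketch does not implement; it only shows the merge step, assuming records keyed by an `ivo://` identifier with a hypothetical `updated` field, where the newer record wins.

```python
from datetime import date

def harvest(local, remote):
    """Merge records from a remote registry into a copy of a local one.

    Records are keyed by identifier; when both hold a record, the newer
    'updated' date wins.
    """
    merged = dict(local)
    for ident, record in remote.items():
        if ident not in merged or record["updated"] > merged[ident]["updated"]:
            merged[ident] = record
    return merged

# Hypothetical registries and identifiers
reg_uk = {"ivo://example.uk/survey": {"title": "Survey v1", "updated": date(2005, 3, 1)}}
reg_us = {"ivo://example.uk/survey": {"title": "Survey v2", "updated": date(2005, 6, 1)},
          "ivo://example.us/catalog": {"title": "Catalog", "updated": date(2005, 1, 1)}}

uk_after = harvest(reg_uk, reg_us)
us_after = harvest(reg_us, reg_uk)
```

After one round of harvesting in each direction, both registries hold the same two records, including the newer version of the survey.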
coming year
• single sign on
• internationally shared storage
• NGS link up
• many more tools
next steps
• intelligent glue
– ontology, agents
• analysis services
– cluster analysis, multi-D visualisation, etc
• theory services
– simulated data, models on demand
• embedding facilities
– VO ready facilities
– links to data creation
lessons
• drivers:
– end user bottleneck
– end user demand
– empowerment
• need network of healthy data centres
• need last mile investment
• need facilities to be VO ready
• need continuing technology development
• need continuing standards programme
FIN