E research overview gahegan bioinformatics workshop 2010
eResearch: the evolution of science
Mark Gahegan
Center for eResearch, The University of Auckland
Vannevar Bush, As We May Think (1945)
There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers - conclusions which he cannot find time to grasp, much less to remember, as they appear.
Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose…
…A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted. (Bush, 1945)
The data explosion (from Wired, ‘Big Data’, July 2008)
Terabytes          What it stores
1                  2,600 songs; a large hard disk ($200)
20                 Photos uploaded to Facebook every month
120                All the data collected by the Hubble telescope
330                Weekly data produced by the Large Hadron Collider (est.)
440                All the international climate / weather data compiled by the National Climatic Data Center in the USA
530                All the videos in YouTube
1000 (1 petabyte)  Data processed by Google servers every 72 minutes
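The figures above mix totals with volumes per week, per month, and per 72 minutes, so they are hard to compare directly. As a rough sketch, the rate-based rows can be converted to terabytes per day. The 30-day month and the function name tb_per_day are assumptions made for illustration; the terabyte figures are the ones quoted in the table.

```python
# Convert the rate-based rows of the table to a common unit (TB per day).
MINUTES_PER_DAY = 24 * 60

# (terabytes, length of the period in minutes), figures taken from the table.
rates = {
    "Facebook photo uploads": (20, 30 * MINUTES_PER_DAY),        # per month (30 days assumed)
    "Large Hadron Collider (est.)": (330, 7 * MINUTES_PER_DAY),  # per week
    "Google servers": (1000, 72),                                # per 72 minutes
}


def tb_per_day(terabytes, period_minutes):
    """Scale a volume observed over period_minutes to a full day."""
    return terabytes * MINUTES_PER_DAY / period_minutes


for name, (terabytes, minutes) in rates.items():
    print(f"{name}: {tb_per_day(terabytes, minutes):,.1f} TB/day")
```

On these numbers the Google row works out to 20,000 TB (20 petabytes) per day, against roughly 47 TB per day for the Large Hadron Collider estimate.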
Sarah E. Fratesi, 2008. Scientific Journals as Fossil Traces of Sweeping Change in the Structure and Practice of Modern Geology. Journal of Research Practice, Volume 4, Issue 1, Article M1.
Problems with Science
• The three pillars of Science
  – Communicable
  – Repeatable
  – Refutable
• Science efficiency
  – Share expensive facilities / equipment
  – Find, use, and understand relevant resources
  – Question assumptions and reasoning effectively
Connectivity resources
eResearch
Theories, concepts
Knowledge representation
Data: Observations, measurements, experiments
Instrumentation
Information: real-time, archives, analyses
Informatics resources
Models, simulations
Supercomputing
People
Collaboration, visualization, education resources
Awareness / Outreach
Education
Support / Enabling
Societal context
Science drivers
Global issues
Reproducible Science means context, quality, trust, easy access to the sources
Methods / workflows are scientific commodities
• Scripts, workflows, simulations, experimental plans, statistical models, ...
• Repeatable, reproducible, comparable and reusable research.
• Sharing propagates expertise and builds reputation.
http://myexperiment.org
Research components: Methods, Lab Books, Preprints, Data, Video, Blogs, Podcasts, Codes, Algorithms, Models, Presentations, Ontologies, Intermediate Results, Related Articles, Comments & Reviews, Plans
Reproducible, or rather “fully supported”, transparent science; composite research components
Carole Goble, UK eScience
Connections run both ways…
Carole Goble, UK eScience
Virtual Research Environments
Support for knowledge communities
Social networks of collaboration, use cases
Emergent trends and patterns
3D Earthquake Modeling
Earthquake scenarios
Some challenges and consequences
• Bigger infrastructures: some institutionally focussed, some nationally focussed, some community focussed
• Who ‘OWNS’ our research? Where is it physically housed? How is access managed?
• eResearch may also change the nature of the ‘Library’, the ‘Institution’ and even the ‘Academy’. Consider: Publish, Peer Review, Contribution, Tenure
What next for NZ? Aligning the research institutions around eResearch
Planning with MoRST for a long-term integrated landscape of HPC and eResearch, a National eResearch Infrastructure
What are the research needs, and which tools, applications, environments, and computing capabilities will we require over the next 10 years?
Please get in touch if you would like to include your ideas and needs:
We_are@the_end
Questions, comments
Graphic Correlation Database
PGAP
Example 1: Fossils and climate: Paleo-Integration
(Community and data integration)
PaleoIntegration Project
Allister Rees, University of Arizona
How? Architecture (simplified)
3-tier architecture:
Front - user interface (computer terminal, user-friendly search terms and tools)
Back - databases (schema, ontology coding - age, geography, content)
Middleware:
 - translates user-selected parameters for database searches
 - keeps track of user selections (workflow), so a modified search doesn’t mean “starting over”
 - routes user requests to different software components (e.g. data query, spatial data conversion), bringing results from multiple databases and tools together on one screen
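The middleware role described above can be illustrated with a minimal sketch. This is not the PaleoIntegration Project's code: the class name Middleware, the query helper, and the toy records are all hypothetical, chosen only to show how remembered selections let a refined search build on the previous one while results from several back-end databases come back together.

```python
from dataclasses import dataclass, field

# Back tier: two toy in-memory "databases" (hypothetical records, not real data).
FOSSIL_DB = [
    {"taxon": "example plant A", "age": "Late Jurassic", "region": "North America"},
    {"taxon": "example dinosaur B", "age": "Late Jurassic", "region": "Africa"},
    {"taxon": "example plant C", "age": "Early Jurassic", "region": "Africa"},
]
ROCK_DB = [
    {"rock": "coal", "age": "Late Jurassic", "region": "North America"},
    {"rock": "evaporite", "age": "Late Jurassic", "region": "Africa"},
]


def query(db, selections):
    """Return records that match every selection whose field the record has."""
    return [
        rec for rec in db
        if all(rec[k] == v for k, v in selections.items() if k in rec)
    ]


@dataclass
class Middleware:
    """Middle tier: remembers user selections and routes them to each backend."""
    backends: dict
    selections: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def select(self, **params):
        # Merge the new choices into the existing ones, so a modified
        # search refines the previous one instead of starting over.
        self.selections.update(params)
        self.history.append(dict(self.selections))
        return self.search()

    def search(self):
        # Send the same selections to every backend and combine the results.
        return {name: query(db, self.selections) for name, db in self.backends.items()}


# Front tier stand-in: two successive user actions.
mw = Middleware({"fossils": FOSSIL_DB, "rocks": ROCK_DB})
first = mw.select(age="Late Jurassic")
refined = mw.select(region="Africa")
print(refined)
```

The second call only supplies the region; the age chosen earlier is still applied because the middleware holds the accumulated selections, which is the "workflow" tracking the slide refers to.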
Integration of various data, datasets and databases
Download search results, analyze and interpret data
Fossil collection and publication
Publish new results and interpretations?
Early Jurassic Climates, Vegetation, and Dinosaur Distributions
Paleobiology Database (PBDB), Paleomap Project
LATE JURASSIC PLANT DIVERSITY
Paleogeographic Atlas Project (PGAP), Oil Source Rocks Dataset (OSR), Paleomap Project
LATE JURASSIC COALS AND EVAPORITES
Dinosauria Dataset (DINO), Paleomap Project
LATE JURASSIC DINOSAURS
TENDAGURU
MORRISON
Climate / biome reconstruction
Example 2: Earthquake simulation
(data integration & HPC)
GEON SYNSEIS Integration Platform
Dogan Seber, SDSC
Diagram labels: Internal and External Datasets; Subsurface Model; Seismic; Gravity; Magnetic; GEON portal and HPC Environment; Simulation, Analyses and Integration; Scientific Discoveries