Developing a Strategy for
e-Science
Indiana University
Malcolm Atkinson
Director, e-Science Institute
UK e-Science Envoy
www.nesc.ac.uk
5th February 2008
Outline
• What is e-Science
• What we gained from an e-Science initiative
• Why we need a strategy
• What should the strategy achieve
• What computing research do we need
  • Theory & pioneering steer each other
  • Realistic models
  • Sustainable farming for the e-Science Ecosphere
• The global challenge
Definition of e-Science
Computing has become a fundamental tool in all research disciplines, which often proceed by assembling and managing large data collections and exploiting computer models and simulations (a topic called e-Science)
e-Science is the invention and application of computer-enabled methods to achieve new, better, faster or more efficient research in any discipline. It draws on advances in computing science, computation and digital communications. As such it has been an important tool for researchers for many decades. The data deluge and the scale and complexity of today's research challenges have greatly increased its importance for researchers. As a consequence, in 2001 the UK led the world by initiating a coordinated e-Science research programme to stimulate the development of e-Science across all fields of research. That investment, £250 million, has developed assets on which the Strategy for Century-of-Information Research will build.
Strengths of e-Science
Research using e-Science
Research enabling e-Science
Communities and e-Infrastructure supporting research and innovation
e-Science Centres in the UK
• Oxford
• Edinburgh
• Belfast
• Cambridge
• STFC Daresbury
• Manchester
• LeSC
• Newcastle
• Southampton
• Cardiff
• STFC RAL
• Glasgow
• Leicester
• UCL
• Birmingham
• White Rose Grid
• Lancaster
• Reading
• Access Grid Support Centre
• Digital Curation Centre
• National Grid Service
• National Centre for e-Social Science
• National Centre for Text Mining
• National Institute for Environmental e-Science
• Open Middleware Infrastructure Institute
• Sheffield
• York
• Leeds
Coordinated by: Directors’ Forum & NeSC
Web: www.omii.ac.uk
OMII-UK: For all kinds of users
Taverna: effortless workflows for scientists
OGSA-DAI: data integration for service providers
PAG: AG video-conferencing for anyone
Campus Grid Toolkit: easy to install grid for job submission
SAGA: abstraction & code mobility
NGS & Partners, 2007
ESI Themes
Slide from Dr Anna Kenway
Theme 8: Trust and Security in Virtual Communities
Theme 4: Spatial Semantics for Automating Geographic Information Processes
Theme 5: Distributed Programming Abstractions
Theme 6: e-Science in the Arts and Humanities
Theme 7: Neuroinformatics and Grid Techniques to Build a Virtual Fly Brain
Theme 9: Provenance
Outline
• What is e-Science
• What we gained from an e-Science initiative
• Why we need a strategy
• What should the strategy achieve
• What computing research do we need
  • Theory & pioneering steer each other
  • Realistic models
  • Sustainable farming for the e-Science Ecosphere
• The global challenge
Official UK Research Goals
Tremendous global challenges
Scale, Urgency, Complexity, …
The 21st Century
This is the century of information
PM G. Brown, University of Westminster, 25 October 2007
• We can collect it
• We can generate it
• Can we move it?
• We can store it
• Can we use it?
• Dramatic increase in data from sensors
• Dramatic drop in cost of computation
• Web-scale effects
• Ubiquitous digital communications
• Community intelligence
• Global challenges
• Transforming research, design, diagnosis, social behaviour, …
Security
GRID / petascale computing
Ubiquitous
ITS
Not an umbrella for the information-technology fields
…And then there is now the Information Explosion
161 EB (2006, by IDC)
988 EB (2010) ≈ 1 ZB
Slide: Satoshi Matsuoka
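The two figures on this slide (161 EB in 2006 and 988 EB in 2010) imply a growth factor and an average yearly growth rate. The arithmetic can be sketched as below. This is an illustration only: the function names are made up for this example, and constant compound growth between the two years is an assumption, not something the slide states.

```python
# Illustrative arithmetic on the two IDC figures quoted on the slide.
# Assumption: growth compounds at a constant rate between 2006 and 2010.

def growth_factor(start_eb: float, end_eb: float) -> float:
    """Ratio of the later volume to the earlier one."""
    return end_eb / start_eb


def annual_growth_rate(start_eb: float, end_eb: float, years: int) -> float:
    """Average compound yearly growth rate, as a fraction (0.5 means 50%)."""
    return growth_factor(start_eb, end_eb) ** (1.0 / years) - 1.0


def exabytes_to_zettabytes(eb: float) -> float:
    """Convert using decimal SI prefixes: 1 ZB = 1000 EB."""
    return eb / 1000.0


if __name__ == "__main__":
    eb_2006, eb_2010 = 161.0, 988.0
    years = 2010 - 2006
    print(f"growth factor: {growth_factor(eb_2006, eb_2010):.2f}")
    print(f"yearly growth: {annual_growth_rate(eb_2006, eb_2010, years):.1%}")
    print(f"2010 volume:   {exabytes_to_zettabytes(eb_2010):.3f} ZB")
```

Under that assumption the volume grows a little more than sixfold in four years, which is about 57% a year, and 988 EB is just under 1 ZB.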
Outline
• What is e-Science
• What we gained from an e-Science initiative
• Why we need a strategy
• What should the strategy achieve
• What computing research do we need
  • Theory & pioneering steer each other
  • Realistic models
  • Sustainable farming for the e-Science Ecosphere
• The global challenge
High-Level Goals for CIR
• New world-leading research
  • New methods & new technology
• High impact (transformative)
  • Sustained rapid transfer from invention to wide use
  • Much wider engagement => More Research & Innovation
  • Cultural changes
  • Effective transfer between business & academia
• Cost effective
  • Shared e-Infrastructure (Cyberinfrastructure)
  • Shared support for developing advances in Tools, Services, Trust
Elements of CIR
• Establish an Office of Strategic Coordination of Century-of-Information Research
• Support the continuous innovation of research methods
• Provide easily used, pervasive and sustained e-Infrastructure for all research
• Enlarge the productive research community who exploit the new methods fluently
• Generate capacity, propagate knowledge and develop a culture via new curricula
Enable extreme e-Science
• Sustain support for interdisciplinary teams
  • Breakthroughs depend on talented research leaders
  • Plus strong supporting teams
• Provide an environment of composable components
  • Significant advances from familiar components
  • Composed in new ways
• Provide powerful tools and services
  • With licence to experiment
• Inject energy through challenges & long-term funds
CIR Sustain method invention
[Diagram. Actors: Computer Science, e-Scientist, Applied Scientist, Researcher communities using e-Science Methods. Infrastructure: e-Science e-Infrastructure, Infrastructure Development, Infrastructure Provision and Support. Arrow labels: Challenges, Ideas, Models, Tests, Uses, Deploys, Evaluates, Adapts, Supports, Adoption; Evidence, Methods, Models & challenges; Algorithms, Models, Notations, Methods, Technology; Challenges & supports; Operational data.]
Slide from John Darlington with modifications
Real invention has more complex interactions
CIR Enable fluent mass use
[Diagram. Actors: Computer Science, e-Scientist, Applied Scientist, Researcher communities using e-Science Methods. Infrastructure: e-Science e-Infrastructure, Infrastructure Development, Infrastructure Provision and Support. Arrow labels: Challenges, Ideas, Models, Tests, Uses, Deploys, Evaluates, Adapts, Supports, Adoption; Evidence, Methods, Models & challenges; Algorithms, Models, Notations, Methods, Technology; Challenges & supports; Operational data.]
Balancing Three Strands of CIR
• Pioneering
  • Invention of new data & computational methods
  • Advances in the ways they are used
  • Advances in the technology that supports them
• Provision
  • e-Infrastructure or Cyberinfrastructure
    support, consultancy, training, tools, services
    Curated digital data resources, Computation, Communication networks & CSCW
• Education & cultural change
  • Preparing graduates to flourish in the digital economy
  • Developing a culture & trust that enables data sharing
The social process of science
[Diagram. People: scientists, Graduate Students, Undergraduate Students. Activity: experimentation. Artefacts: Data, Metadata, Provenance, Workflows, Ontologies; Certified Experimental Results & Analyses; Preprints & Metadata; Technical Reports; Peer-Reviewed Journal & Conference Papers; Reprints. Places: Local Web, Repositories, Digital Libraries, Virtual Learning Environment.]
Slide: Dave De Roure
[Diagram: ecosystem. Interfaces: Web Services, RESTful APIs, cmd lines, ssh, http. Devices: Web Browser, Mobile phone, iPod, Car, Equipment, PDA. Layers: P2P, mashups, workflows, services, applications. People: Scientists, Subject ICT experts, Computer Scientists, Software Companies, open source Software Engineers. Other labels: Workflow tools, Ruby on Rails, nesc.]
Slide: Dave De Roure
The social process of science 2.0
[Diagram. People: scientists, Graduate Students, Undergraduate Students. Activity: experimentation. Artefacts: Data, Metadata, Provenance, Workflows, Ontologies; Certified Experimental Results & Analyses; Preprints & Metadata; Technical Reports; Peer-Reviewed Journal & Conference Papers; Reprints. Places: Local Web, Repositories, Digital Libraries, Virtual Learning Environment.]
Slide: Dave De Roure
Organised Sharing?
• Application researchers choose / lead
  • Leads to diversity & little fluent use
• Sustaining & improving community effects
  • Group & subject community cultures
• Sharing advantageous
  • Costs of development, deployment, operations
  • Costs of improvement, scaling & green computing
• What and when to share
  • Low-level services & libraries shared across disciplines
  • Curated digital resources across discipline groups
  • Tools may be discipline specific or widely used
Developing Trust
• Researchers share networks & computing
  • And trust them
• Will they trust a shared storage service?
  • How would you build such trust?
• System, model and data complexity increase
  • How can we build trust in the results they give?
• Much data is personal, medical or financial
  • Blunders happen
  • How do we get the public to trust research use?
Education and Training
• Training
  • Targeted
  • Immediate goals
  • Specific skills
  • Building a workforce
• Education
  • Pervasive
  • Long term and sustained
  • Generic conceptual models
  • Developing a culture
• Both are needed
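[Diagram: two cycles. Training cycle: Organisation, Training, Skilled Workers, Services & Applications, with arrow labels Invests, Prepares, Develop, Strengthens. Education cycle: Society, Education, Graduates, Innovation, with arrow labels Invests, Prepares, Create, Enriches.]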
Outline
• What is e-Science
• What we gained from an e-Science initiative
• Why we need a strategy
• What should the strategy achieve
• What computing research do we need
  • Theory & pioneering steer each other
  • Realistic models
  • Sustainable farming for the e-Science Ecosphere
• The global challenge
Matrix to analyse e-Science
Rows (activities): Observation, Modelling, Analysis, Action, Collaboration
Columns (disciplines): Anthropology, Archaeology, Astronomy, Biology, Biochemistry, Chemistry, Demography, Economics, Engineering, Geography
Other labels: Scholarship, Design, Diagnosis, Exploitation
Climate, Observation
• Satellite & ground based imaging, ocean buoys, atmospheric, ocean & coastal surveys, robotic mobile devices, distributed urban, rural & river sensors
• Past from trees, corals, ice, sediments, geology, …
• Long-term phenomena
• Observations decades to centuries
• Data used for centuries
• Large & sustained data flows
• Economic long-term data storage / management
• Complexity, variety of data: >40 ISO standards (OGC+)
• Stability & change, calibration & normalisation
• Sufficient coverage & resolution
• Speed for exceptional environmental events (E3)
• Dependable accuracy
• Data discovery, understanding metadata & ontologies
Source: Next Generation Science for Planet Earth: NERC strategy 2007-2012
Climate, Modelling
• Many interacting subsystems: solar, atmospheric physics & chemistry, oceans, air+water interface, cryosphere, air+ice interface, biosphere+ air+land interface, land surface, fires, volcanoes, human activity, …
• Interacting models
• Multiple versions
• Large (global) team efforts
• Dependent on many parameters (estimates)
• No one understands fully even one model
• Constructing trusted models - mathematics to hindcasting
• Composing models
• Combining data & observation
• Computational power
• Managing & using data produced
• Curation, cataloguing & metadata
• Managing & tracking model revision
• Rapid execution for E3
• Making models usable
Climate, Analysis
• Identify & bring together multiple data sets
• Transform them to align & expose information
• Statistical comparisons
• Visualisations
• Finding, accessing & transforming data
• Moving data reduction steps to data
• Necessary data movement
• Tools that cope with the scale: statistics, data handling & visualisation
• Curating, cataloguing results
• Agreeing trusted analysis methods
• Automating analysis
• Stability & change
Climate, Action
• Scholarship:
  • papers, contribution to national & international reports.
• Advice & policy:
  • planning for & response to E3
  • Planning agriculture, epidemiology & coastal retreat
• Public outreach
• Prediction services
• Traditional quality of results / arguments
  • With 10-year time to truth
• Cross-discipline for
  • Socio-economic impact data
• Privacy & ethics
• Recognition & responsibility
• Many model & data sources & contributors
• Rational debate about validity and significance of results
• Multi-disciplinary effects
Climate, Collaboration
• Already
  • International (UN, INSPIRE, scholarly) collaboration
• Economic, social & political drivers
• Usual CSCW
  • Skype, Blogs, tele/video conferencing, wiki, facebook, telepresence, OptIPort, …
• Shared data resources
• Quality metadata
• Shared code development & testing
• Ontologies & standards
• Multi-site computational steering & spatio-temporal visualisation
• Business case to support the research
Climate, pervasive
• How do you build & sustain the business case
  • Stern report helps
• How do you provide security without inhibiting collaboration, open inspection, alternative interpretations
• Cost reductions
  • Pooling data collection
  • Pooling storage
  • Sharing responsibilities
  • Pooling model development
• But diversity for safety
• Security
  • Prevent damage to data
  • Prevent misuse of resources
Please join me in the Matrix
• Populate columns you care about
  • Music, fine art, chemistry, linguistics, …
• Integrate & digest the list of requirements
  • Identify the current barriers
  • Think up strategies for overcoming them
• Start communities following those strategies
Outline
• What is e-Science
• What we gained from an e-Science initiative
• Why we need a strategy
• What should the strategy achieve
• What computing research do we need
  • Theory & pioneering steer each other
  • Realistic models
  • Sustainable farming for the e-Science Ecosphere
• The global challenge
Data is the Key
• e-Science is different
  • We are responsible for our data
  • We curate it / select it / throw it away
  • Our program executions build & reshape it
• We need a safe model for fluent mass use
  • Transactional & idempotent
  • Safety - avoiding accidental data loss / corruption
  • Realistic - nothing is perfect: S/W, H/W, People, Organisations
• We need eXtreme e-Science
  • Smart engineers working with extreme care
  • Ramp & flow between mass use and eXtreme e-Science
• Foundation requires careful engineering
How do we manage it?
How do we move it?
How do we protect it?
How do we trust it?
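As a concrete picture of "transactional & idempotent", here is a minimal sketch of a file write that either completes fully or leaves nothing behind, and that can be repeated without changing the result. It is an illustration under stated assumptions, not a method from the talk: the function name store_record is hypothetical, records are assumed to be JSON-serialisable, and the temporary file is assumed to be on the same filesystem as the target so that the final rename is atomic.

```python
import hashlib
import json
import os
import tempfile


def store_record(directory, record):
    """Store a record so that a repeated or interrupted call cannot corrupt data.

    Idempotent: the file name is derived from the content, so storing the same
    record again is a no-op. Transactional in a limited sense: the data is
    written to a temporary file first and only renamed into place when complete.
    """
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    target = os.path.join(directory, hashlib.sha256(payload).hexdigest() + ".json")
    if os.path.exists(target):
        return target  # already stored; repeating the call changes nothing

    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(payload)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, target)  # rename into place in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # leave no partial file behind
        raise
    return target
```

Calling store_record twice with the same record returns the same path and leaves a single file, so a job that crashes and is rerun cannot duplicate or half-write its output.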
Careful Engineering Requires
• Good quality models
  • Specifying realistic target behaviours
  • Stochastic Pi Calculus?
  • Computer scientists, mathematicians & statisticians wanted
• Benchmarks & Measurement
  • Long-term, multi-purpose & realistic scale
  • Agreed measurement against the models
  • Shaped by & shaping standards
  • Foundations for trust
• Engineering effort
  • Collaborative & Competitive worldwide
  • Expect incremental progress not magic
  • We’ve come a long way
• We have much further to go
Questions
Photographer: Kathy Humphry