Research in the Cloud
Uploaded by david-wallom
Our laboratory: the world’s largest climate modelling facility
11 years, 25 sub-projects, ~100,000 volunteers (40,000 active), 127M model-years
Unlimited ensemble size: exploring uncertainties in climate predictions
Results of the BBC Climate Change Experiment: Rowlands et al., Nature Geosci., 2012
What is the role of increased greenhouse gas levels in UK autumn/winter flood events?
South Oxford on January 5th, 2003
Photo: Dave Mitchell
Dawlish, Devon, February 5th 2014 (Met Office)
Midlands, November 2012 (BBC website)
The weather@home regional modelling project (with Microsoft Research, the Risk Prediction Initiative and Environment Guardian)
• High-impact weather events are typically rare and unpredictable:
– Flooding
– Heatwave
– Drought
• They also involve small scales.
• Resolution is provided by a nested regional model.
• Modify boundary conditions to mimic a counter-factual “world that might have been”.
UK Winter 2014 Floods
• 39,726 simulations
• The 2014 flooding has been described as a 1-in-100-year event in terms of rainfall volume
• The return-time plot shows this has become a 1-in-80-year event in terms of risk
• The risk of a very wet winter has increased by 25%
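The return-time and risk numbers above come from ranking a large ensemble. A minimal sketch of both calculations, using synthetic stand-in rainfall values rather than project data:

```python
# Empirical return-time estimate from a large ensemble, in the style of
# weather@home attribution studies. Input values are synthetic stand-ins.
def return_periods(values):
    """Rank ensemble members by magnitude; the k-th largest of N
    one-season members has an empirical return period of N / k years."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [(v, n / (k + 1)) for k, v in enumerate(ordered)]

def risk_ratio(actual, natural, threshold):
    """Change in risk of exceeding a threshold between the 'actual'
    ensemble and the counter-factual 'natural' ensemble."""
    p_act = sum(v >= threshold for v in actual) / len(actual)
    p_nat = sum(v >= threshold for v in natural) / len(natural)
    return p_act / p_nat

# Wettest of 4 members has an empirical return period of 4 'years'
print(return_periods([10, 40, 20, 30])[0])  # (40, 4.0)
```

With tens of thousands of members, exceedance probabilities for even 1-in-100-year thresholds are estimated directly from counts rather than by fitting extreme-value distributions.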
Californian Drought Experiment
• Small example of future requirements
• Investigate the effect of climate change on the current drought in California:
– 5k current-condition runs, including ‘the blob’
– 5k current-condition runs with averaged SST
– 12k natural runs
• Time-relevant results
http://www.climateprediction.net/weatherathome/western-us-drought/
Consequences for climateprediction.net
• Data set creation on a monthly basis
• Data consumers outside the project with delivery expectations
• This is a pilot for two regions (EU & PNW); there are 13 in the envisaged future deployment of WWA globally!
• Per-region workunit (WU) and data issues:
– Monthly release of >20k two-month models covering the current and future two months -> >60k workunits permanently deployed per month
– Each model generates about 200 - 350MB…
• WWA is one of 7 on-going projects
• Need more capacity (volunteers) if we are to continue with other parallel research projects!
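A back-of-the-envelope check of what the per-region figures above imply for monthly data volume:

```python
# >60k workunits deployed per region per month, each returning 200-350 MB.
workunits_per_month = 60_000
mb_low, mb_high = 200, 350

tb_low = workunits_per_month * mb_low / 1e6    # MB -> TB (decimal)
tb_high = workunits_per_month * mb_high / 1e6

print(f"{tb_low:.0f}-{tb_high:.0f} TB per region per month")  # 12-21 TB
```

Scaled to the 13 regions envisaged for global WWA, that is on the order of hundreds of terabytes of model output per month, which is why delivery expectations and capacity dominate the slide.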
Our laboratory: the world’s largest climate modelling facility
“The Virtual Volunteer”
Using the Cloud to improve our climate modelling facility
• Utilise the free tier to provide a virtual volunteer (taking low-priority runs initially)
• Cut down the OS to minimise footprint
• Configure to produce full scan runs of all resource types for benchmarking
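Benchmarking "full scan runs of all resource types" amounts to comparing instance types on cost per unit of science delivered. A sketch of that comparison; the prices and throughputs below are made-up placeholders, not measured results:

```python
# Compare cloud instance types for 'virtual volunteer' runs by cost
# per model-year. All numbers are hypothetical, for illustration only.
def cost_per_model_year(price_per_hour, model_years_per_hour):
    return price_per_hour / model_years_per_hour

instances = {  # hypothetical benchmark figures
    "small":  {"price": 0.05, "throughput": 0.010},  # USD/h, model-years/h
    "medium": {"price": 0.10, "throughput": 0.025},
}

best = min(
    instances,
    key=lambda k: cost_per_model_year(
        instances[k]["price"], instances[k]["throughput"]
    ),
)
print(best)  # 'medium' is cheaper per model-year despite the higher hourly price
```

The point of the benchmark is exactly this inversion: the cheapest instance per hour is not necessarily the cheapest per model-year.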
Using the Cloud to improve our climate modelling facility
Virtual Volunteers only supporting specific projects
• Move all computational resources into the cloud for specific projects (if we were to see widespread volunteer movement away from useful systems)
What Neuroscientists would like to see:
1. VRE – single point of contact
2. A consistent annotation method for data archiving
3. Web- and shared-filesystem-based repository for data; many file formats to be supported
4. A searchable database for images
5. A searchable database for video images
6. A document-sharing tool for ‘live’ manuscript editing
7. File space for literature sharing (PDFs)
8. Blog area
More specifically… help managing data across the research lifecycle:
Initial Experimental Idea → Experimental Design → Data Collection → Analysis → Publication
Challenges
• Interdisciplinary teams – different expectations, cultures, requirements
• Agreed standards
– Different data formats: microscopes (multi-photon or confocal), live-cell fluorescent imaging, electrophysiology recordings
– Metadata standards
• Complexity of tools used in the community
• Ability to share images, data, analysis
• Network connectivity not the best
Release 2.0
• Drupal – Frontend content management. Based on Drupal Commons,
• Alfresco – Backend data management. Modified Alfresco module,
• Apache Solr – Search engine,
• Apache Tika – Metadata extraction toolkit for documents,
• Google services – Docs and Calendar,
• Cloud-based computation using GPUs,
• NCBO ontology-based tagging,
• LDAP – Single sign-on,
• Digital Pens – Used for recording experiments,
• XML-RPC desktop client – uploading and generating content.
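The stack above ends with an XML-RPC desktop client for uploading and generating content. A minimal sketch of such an upload round-trip using Python's standard library; the method name `upload_content` and its payload are hypothetical, not Neurohub's actual API:

```python
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

store = {}

# Hypothetical endpoint standing in for the real content-upload method.
def upload_content(title, body):
    store[title] = body
    return f"stored '{title}' ({len(body)} characters)"

# Bind to an ephemeral port so the demo is self-contained.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(upload_content)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The 'desktop client' side of the call
client = ServerProxy(f"http://127.0.0.1:{port}")
print(client.upload_content("lab-notes.txt", "pipette calibration results"))
server.shutdown()
```

XML-RPC's appeal here is exactly this simplicity: a thin desktop client can push content into the backend with no bespoke wire protocol.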
Neurohub in the cloud
• Deployment of automated analysis services
[Diagram: researcher UPLOADs images; an RPC call dispatches them to Image Processing Engine (IPE) instances for ANALYSIS; RESULTS are returned for DOWNLOAD]
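The upload → RPC call → results → download flow fans work out to several Image Processing Engine instances. A sketch of that fan-out pattern; `process_image` is a stand-in for the real GPU analysis:

```python
# Fan a batch of uploaded images out to several Image Processing
# Engine (IPE) workers and collect the results for download.
from concurrent.futures import ThreadPoolExecutor

def process_image(name):
    # Placeholder for the real analysis performed by an IPE instance
    return (name, f"analysed:{name}")

uploads = ["cell_01.tif", "cell_02.tif", "cell_03.tif"]

with ThreadPoolExecutor(max_workers=3) as pool:  # three IPE instances
    results = dict(pool.map(process_image, uploads))

# The researcher then downloads the per-image results
print(results["cell_02.tif"])  # analysed:cell_02.tif
```

In the deployed system the workers are cloud instances reached over RPC rather than threads, but the dispatch-and-collect shape is the same.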
Neurohub in the cloud
• Neurohub System Deployment
– Current deployments
• Departmental physical server
– Pro: data locality
– Con: limited scalability and resilience; collaborator access difficult
• Private cloud
– Pro: increased resilience and scalability
– Con: system visibility outside researcher control; collaborator access difficult; difficult to grow adoption
– Public Cloud Deployment
• Pro: published item deployable on demand, anywhere, independent of researcher location
• Con: may have legal/ethical restrictions on data hosting (support load when your AMI becomes the next LIMS of choice in wet-lab science?!)
Conclusions
• We have used the AWS cloud within these and other projects successfully
• We intend to grow with further projects and utilisation of AWS, covering volunteer, urgent and service computing models:
– Bash the Bug
– Ocean Sampling Day
• AWS Cloud is infrastructure and requires knowledge and support to set up and configure
• Cloud is not a magic bullet that will immediately solve all issues; it may actually create new ones
• Ensure you are using the right tool for the right job