Emmett, B., Gurney, R., McDonald, A., Blair, G., Buytaert...

79
Emmett, B., Gurney, R., McDonald, A., Blair, G., Buytaert, W., Freer, J. E., ... P, Z. (2014). Environmental Virtual Observatory: Final Report. (NE/I002200/1 ed.) Natural Environment Research Council. Publisher's PDF, also known as Version of record License (if available): CC BY Link to publication record in Explore Bristol Research PDF-document This is the final published version of the article (version of record). It first appeared online via NERC at http://www.evo-uk.org/get-involved/. Please refer to any applicable terms of use of the publisher. University of Bristol - Explore Bristol Research General rights This document is made available in accordance with publisher policies. Please cite only the published version using the reference above. Full terms of use are available: http://www.bristol.ac.uk/pure/about/ebr-terms

Transcript of Emmett, B., Gurney, R., McDonald, A., Blair, G., Buytaert...

Emmett, B., Gurney, R., McDonald, A., Blair, G., Buytaert, W., Freer, J. E.,... P, Z. (2014). Environmental Virtual Observatory: Final Report.(NE/I002200/1 ed.) Natural Environment Research Council.

Publisher's PDF, also known as Version of record

License (if available):CC BY

Link to publication record in Explore Bristol ResearchPDF-document

This is the final published version of the article (version of record). It first appeared online via NERC athttp://www.evo-uk.org/get-involved/. Please refer to any applicable terms of use of the publisher.

University of Bristol - Explore Bristol ResearchGeneral rights

This document is made available in accordance with publisher policies. Please cite only the publishedversion using the reference above. Full terms of use are available:http://www.bristol.ac.uk/pure/about/ebr-terms

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

1

www.evo-uk.org

NERC EnvironmentalVirtual Observatory Pilot

Final report

May 2014

The pilot Environmental Virtual Observatory, EVOp, a proof of concept project to develop new cloud-based applications for accessing, interrogating, modelling andvisualising environmental data, by developing a series of exemplars, EVOp has demonstrated how cloud technologies can make environmental monitoring anddecision making more efficient, effective and transparent to the whole community.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

2

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

3

NERC

Environmental Virtual ObservatoryPilot

Final Report

May 2014

Principal Investigators

Emmett, B.A. 3, Gurney, R.J. 14, McDonald, A.T., 12

Work Package Leaders

Blair, G. 6, Buytaert, W. 5, Freer, J. 11, Haygarth, P. 6, Johnes, P.J. 11, Rees, G.H. 3, Tetzlaff, D. 11,

Team members and contributors

Afgan, E. 9, Ball, L. A. 3, Beven, K. 6, Bick, M. 3, Bloomfield, J. B. 2, Brewer, P. 1, Delve, J. 3, El-khatib, Y. 6,

Field, D. 3, Gemmell, A.L. 14, Greene, S. 14, Huntingford, C. 3, Mackay, E. 6, Macklin, M.V. 1, Macleod, K. 8,

Marshall, K. 8, Odoni, N. 11, Percy, B. J. 14, Quinn, P.F. 7, Reaney, S. 4, Stutter, M. 8, Surajbali, B. 6,

Thomas, N.R. 1, Vitolo,C. 5, Williams, B.L. 3, Wilkinson, M. 7 & 8, Zelazowski, P. 13

© Natural Environment Research Council 2014

Aberystwyth University 1

British Geological Survey 2

Centre for Ecology and Hydrology 3

Durham University 4

Imperial College London 5

Lancaster University 6

Newcastle University 7

The James Hutton Institute 8

Ruder Boskovic Institute 9

University of Aberdeen 10

University of Bristol 11

University of Leeds 12

University of Oxford 13

University of Reading 14

Affiliations

When citing this document, please use -

Emmett, B.A., Gurney, R.J., McDonald, A.T., Blair, G., Buytaert, W., Freer, J., Haygarth, P., Johnes, P.J.,Rees, G.H., Tetzlaff, D., Afgan, E., Ball, L. A., Beven, K., Bick, M., Bloomfield, J. B., Brewer, P., Delve, J.,El-khatib, Y., Field, D., Gemmell, A.L., Greene, S., Huntingford, C., Mackay, E., Macklin, M.V., Macleod,K., Marshall, K., Odoni, N., Percy, B. J., Quinn, P.F., Reaney, S., Stutter, M., Surajbali, B., Thomas, N.R.,Vitolo,C., Williams, B.L., Wilkinson, M., Zelazowski, P. (2014).

Natural Environment Research Council (UK). NE/I002200/1.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

4

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

5

The Annex to this document can be found online at http://www.evo-uk.org

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

6

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

7

The final deliverables were -● A tested web service using local, national and

global exemplars

● Future funding

● An informed and engaged community

This was achieved through establishing a projectteam which had a mix of computer specialists,environmental scientists (from across 13organisations) and an end-user stakeholdergroup covering a range of organisations. Thework was organised into a series of packageswhich cooperated closely to deliver the overallvision. The packages covered; leadership andmanagement; cyber infrastructure; modelling; andtested exemplars.

The exemplars were chosen to engage with end-users and explore barriers and opportunities atthree spatial scales; local, national and global,focused on such topics as flooding, diffusepollution and uncertainty in climate changeprojections. Thus the project combined a 'narrowand deep' testing using these exemplars withmore 'broad and shallow' explorations of issuessuch as vocabularies and semantics, datasecurity and legal issues. Many briefings andpresentations of the project were given viameetings and workshops throughout the lifetimeof the project to potential end-users ranging fromDefra and public agencies, to the water industry,academic audiences, and industry bodies such asthe Information Assurance Advisory Council. Amajor conference was organised in Oct 2012 toshowcase opportunities for national andinternational initiatives working in this area at theRoyal Geographical Society. The project’s

Stakeholder Group provided guidance andsupport throughout the project, ensuring that thiswas not another IT 'white elephant' but of realvalue to organisations that are challenged dailywith tackling complex environmental problems.

All the deliverables have been achieved with acommunity of postdocs, academics and end-users who are all now familiar and excited aboutthe opportunities of the approach. New funding isin place from a variety of sources including theGovernment's Big Data Initiative, the internationalBelmont Forum and the NERC-TSB jointEnvironmental Data call, all of which haveacknowledged the role of the EVOp in securingthe funding. A final report including experiencesof barriers and opportunities encountered duringthe lifetime of the project is available on the EVOpwebsite (www.evo-uk.org) providing a legacy tobe exploited by the whole community as theyexplore the potential of application of these newcloud technologies for environmental science.

The opportunities are many, and other initiativesthat are already in progress include: real-timeintegrated monitoring of the environment toproduce real-time alerts; modellers 'cloudifying'their models and creating user-friendly webinterfaces for increased accessibility and testing;work to establish international standards andvocabularies; software developments to enhanceinter-operability, and much, much more.

The EVO Pilot (EVOp) was an ambitious two year project to test the value of new cloud technologies forconnecting and integrating fragmented data, models, and tools to deliver new holistic approaches toenvironmental challenges. The need for such an approach has become increasingly clear as we seek toimprove food, water and energy security. These all require a new way of working which spansdisciplines and organisations, and that breaks down science-culture boundaries. If successful, theproject would demonstrate the vision and opportunities for further funding, attract academic, policy,industry, and global partners, and create a step-change in the way that NERC science is delivered andexploited.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

8

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

9

Two main challenges currently limit this ongoingmodel development; the spatial/temporal limitations ofour observational capacity and the lack of integrationof data, models and visualisation tools across the air-land-water domains. However, recent advances intechnical methods allow for detection of real-timechanges in biogeochemical, hydrological andecological functioning (e.g. molecular markers,isotopic and spectroscopic approaches, land andspace-based observational techniques). Given aplatform where these observations could be explored,accessed and integrated with models across domains,a fast and more informed analysis of system changewould emerge, leading to tools which identify optionsfor immediate, targeted and thus more cost-effectivemanagement interventions.

The EVOp project therefore needed to develop aplatform for both data and models to answer these twoquestions. The approach taken was to represent dataand models as services in a secure cyber-infrastructure. In the long term there would be a needto ensure it possessed a robust architecture,standards, and global access linking private andpublic clouds, GRID and HPC environments whereappropriate. Such a platform would directly addressthe two challenges, leading to improved exploitation ofNERC data and models, and a more integratedresponse to urgent environmental challenges. Thealignment of science supply and demand in thecontext of continuing scientific uncertainty will dependon seeking out new relationships, overcominglanguage and cultural barriers to enable collaboration,and merging models and data to evaluate scenarios.

The £2 million pilot Environment Virtual Observatorypilot (EVOp) project was commissioned by NERC inJanuary 2011 to test these two hypotheses and identifyopportunities and challenges that might lead to apotentially far greater investment by NERC inpartnership with stakeholders into the future.

1.3 Building the EVOp teamNERC recognized the project required a community-led cyber-infrastructure development and new

1.1 The hypothesisThe hypothesis to be explored was that novel cloudcomputing technologies could be exploited to increaseaccessibility in a data-intensive world to "filter" andintegrate this information to manageable levels as wellas provide visualization and presentation services tomake it easier to gain creative insights and buildcollaborations. This has been called the 4th paradigm(Gray, 2007; Hey et al, 2009). The ultimate aim was tomake NERC science more efficient, effective andtransparent. The transparency issue has beenhighlighted in a recent Royal Society report asrequiring urgent attention to increase publicconfidence in the process of translation of scientificevidence through to policy making (The Royal Society,2012).

1.2 Beyond dataA second hypothesis was to test if there is value going'beyond data' to include models which are effectively asynthesis of current understanding and one of themain tools NERC scientists use to integrate complexdata, upscale and make projections under futurescenarios. Within the terrestrial and freshwatercommunities many models address environmentalquestions concerning soil and water quality, flood anddrought risk, and ecosystem structure and function(e.g. Defra recently identified ca. 60 models currentlyused in diffuse pollution modelling alone). Thesemodels simulate complex physical, chemical andbiological process interactions. There is a challengehowever in gaining access to these models, linkingthem together to deliver more holistic outputs andobjectively testing to the level needed by end-userswho need to make policy or management decisionsbased on their outputs. This leads to the secondquestion to be tested:

"How can we create a culture of moreopen and rigorous testing and evaluationof the current models necessary toimprove process understanding, processrepresentation in models and thus modelforecast accuracy?"

There is an emerging and urgent need for new approaches to environmental challenges in the broad context ofsustainability. Scientists, businesses and policymakers are asking questions that are far more interdisciplinarythan in the past. Unfortunately, an unexpected outcome of the explosion of data and associated information is thegrowing disconnect between and within the supply of scientific knowledge, and the demand for that knowledgefrom the private and government sectors. NERC commissioned the Environmental Virtual Observatory pilot projectto explore the question:

"Is there a way of providing the 'wiring' to help people access the resources they need, be they ascientist, policy maker, industrial body, regulator or member of the public?"

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

10

approaches to scientific workflows that describe,compose, model and execute ensembles of data,models, tools and visualisations on distributedresources with global access.The EVOp team therefore required a mix of IT andcomputer specialists, a test research community ofscientists and a potential future end-user communitydrawn from government, regulators and industry.A sandpit was organized which brought together acommunity from which a single large projectconsortium representing 12 institutions emergedincorporating a mix of IT specialists and soil-waterscientists (see online Annex), later this would besupplemented by additional involvement fromindividuals from other organisations Soil-water sciencewas proposed by NERC as an ideal community to testthis approach as it is experienced in cross-disciplinaryworking , has a good range of well-tested models andis well-linked to a range of end-users tacklingsignificant environmental challenges such as flooding,diffuse pollution and climate change.The consortium elected a leadership team to representthe different communities needed to deliver the pilotproject spanning IT specialism (Robert Gurney), basicresearch (Bridget Emmett) and industry needs (AdrianMcDonald). A Project Advisory Group was establishedcovering a wide range of potential end-users includingrepresentatives from government, industry, regulators,policy makers and funders (see online Annex).

1.4 Project StructureA major objective of the consortium from the beginningwas to ensure that the work programme was scienceand end-user led underpinned by a robust explorationof the available IT technologies.

Five principal areas of activity were identified, somerequiring narrow and deep testing of the approachacross different operational scales (i.e. data and modelapplication from local to global scale) whilst otherareas needed a broad and shallow exploration acrossa range of challenges (i.e. IPR and data security). Thefive areas were:

i. legal and security issues associated with securityof data handling and consumption ensuring theEVOp can scale rapidly without compromising theintegrity of data;

ii. data licensing, platform hosting and licensing, anduse of the platform taking into account the range ofdata initiatives including data.gov.uk;

iii. options for cloud infrastructure with security-by-design inbuilt;

iv. development of standards and inter-operability;

v. case studies focused on soil-water processunderstanding and management at three scales(local, national and global).

Six work packages were developed involvingoverlapping teams to cover these areas of activity:

i. WP1 Leadership and management (covering legal,security, outreach issues, future funding andresponsibility for commissioning and delivery ofthe global exemplar in Year 2);

ii. WP2 Cloud infrastructure (data licensing, platformhosting, cloud infrastructure, inter-operability,standards);

iii. WP3 Modelling in the cloud;

iv. WP4 Local Exemplar;

v. WP5 National Exemplar (hydrological andbiogeochemical exemplars);

vi. WP6 International engagement.

1.5 DeliverablesThe high level deliverables reinforced the pilot statusand the expectations placed on the project. They wereselected to facilitate advances in understanding aboutthe issues and feasibility of delivering such a service tothe community. The development process wastherefore experimental and although some useful andinteresting science emerged and technology solutionsgenerated, the resulting service was not envisaged tobecome operational as an outcome of the pilot project.In recognition of this status, emphasis was placed onidentifying future ways to progress the EVO concept.Furthermore, throughout the project, the teamendeavoured to raise awareness about the EVObeyond those communities already familiar with thepotential capability provided by IT. The deliverableswere:

i. A tested web service providing web basedenvironmental models and data across a limitednumber of exemplars at a variety of scales. Thenominal scales used are referred to as local(catchments in the range 10 to 200 km2), nationalwith some sub-national division and global.

ii. An identification and analysis of barriers andopportunities revealed in the development ofdeliverable (i). The barriers considered range fromtechnical feasibility, ownership, governance andfinancial through to the to the legal and liabilityissues. Opportunities range from improved science(both in new questions and new solutions)because of the better data and modelling synthesisand the resulting communication throughimproved public awareness to financialopportunities and market leadership.

iii. A skilled and engaged community is, in part, adirect outcome of deliverable (ii). In a pilot project,engagement is more easily achieved than the'skilling' up of many groups of stakeholdersbecause the product is not created until aconsiderable way through a project. The skillsdevelopment is therefore not seen as a technicaltraining skill but as a higher-level conceptual andvision framework - the skill to recognise theopportunity and potential and to contribute to the

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

11

moulding of the future information framework.Within the team, translation across the science/ITinterface inevitably resulted in improved skill sets.

iv. Funding and partners in place. The finaldeliverable is the positioning of the EVOp forfurther development. To strengthen partnershipsthrough membership of important crossdisciplinary alliances such as LWEC, contributionsand organisation of national and internationalconferences and the partnership with industry andacross research agendas.

1.6 End-user engagement and the use ofstoryboards

A major innovation of the project as a whole was to usestoryboards to ensure the exemplars were grounded in

real questions/challenges by end-users. Potential end-users were considered to cross a wide range ofcommunities from governments to public, and industryto regulators. Initial scoping of likely questions from arange of stakeholders are indicated in Figure 1.1, a fulllist is provided in the online Annex.; these weredeveloped with, and approved by, our Project AdvisoryGroup. Single specific questions and storyboards werethen developed for the local, national and globalexemplars. These are summarised in Figure 1.2, withthe fully annotated storyboards provided in the onlineAnnex. A requirement of these storyboards andexemplars was to test particular aspects of the dataIPR, model operability in the cloud and criticallydifferent elements of the cyber infrastructure asindicated in Figure 1.3 .

What models work?

What policy works?

How do I do it and save on cost?

How can we reduce monitoring andcollect the same information?

Credible apportioning of pollutantload between industry, water,

agriculture and others.

How do we define which model isbetter to select from our model

ensemble?

What kind of language / tools to useto make models talk to each other?

Will London run out of water?

What is the state of the local river?

What are the options to protect usfrom future flooding?

How can we convert into saleable productfor UK plc?

What is the whole impact of future flooding?

How can water security be assured?

How can the industry carbon impact bereduced?

SearchVisualisation

Data

Coordination

Interpretation

Reflection and evaluation

Computation

Figure 1.1 Exemplar questions provided by a range of EVO stakeholders.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

12

Figure 1.2 Mapping of exemplars to users and issues.

Figure 1.3 Date security.

Local National hydrology NationalBiogeochemistry Global

User

Farmers and localstakeholders; insurers

Power companiesprimarily, but other alertedstakeholders associatedwith abstractions

Government departmentsand agencies

NERC scientists; DECC

Issue

Flood risk Drought Water Framework Directivecompliance and OSPARreporting

Uncertainty in GCMs andsoil C

End product

Local catchment flow andwater data and user-friendly interface formodelling tools to forecastflood risk

Selection tool for modellingwater resource; Alert toolfor industry as to whenweather conditions maythreaten energy productiondue to drought

Modelling tool to quantifynutrient fluxes for a rangeof catchments and marinewater bodies at differentscales

GCM analogue tool linkedto impact assessmentmodel to quantify globalregions where uncertaintyin change of soil C aregreatest

Unique data needs OS /NERC / EA / Met

Live and historical flowdata; Local web cams

Live EA data; Access toforecast products

Global databases anddriving variables

Science demonstrated

Linking sensors, data andvisual data together; Landmanagement decisionsupport tool; Preparednessassessment tool

Multiple model applicationfor hydrology; Uncertainty;Model selection tool; Anensemble of coupledpredictive capability

Multiple model applicationfor biogeochemistry;Uncertainty; Regionalmodelling framework;Model selection tool; Datarich to data poor catchment

Sensitivity to climatechange impact models touncertainy betweendifferent GCMs and speedin assessing differentemission ‘storylines’

EVO ‘added value’

Access to data andmodels; Integration offunctionality

National security; Alerts forend-users; Old and newdata view and modelforecasts for multiple sites;An exemplar for a real timesecurity management issuefor important nationalinfrastructure for multiplepressures

Online scenario testing;Ensemble macronutrientmodelling; Scaling andmultiple metrics

Access to climate changeimpacts assessment toolwith increased capacityand faster working

Future potential

Adaptive modelling’ forlocal conditions; Ask anexpert blog; Preparednesstool

Approach showsdata/model cloudresources for identifyingforecast alerts for anyenvironmental threshold

Reverse engineering’ toolsto find solutions; Dynamiccoupling with hydrologicalmodelling

Linking to other land-atmosphere models;Addition of valuation tool

Framework properties Local hydrology(Eden) National hydrology National

biogeochmeistry Global

Essential infrastructureproperties

Everything as a service O O O O

Sharing of everything O O O O

Openness andinteroperability O O O O

Transparency O O O O

Ease of use by differentcommunities O O O O

The cloud as a utility

Alternative business models o O O

Elasticity o O O

The managed cloud o o o

Enhancedmanagement

Tailored managementmodels O

Multi-cloud management • • •

Web 2.0 techniques

Supporting mashups • • •

Supporting workflows • • •

Service discovery o • •

Enhanced discovery o

Systems of systemsCombining with ubiquitous

computing o •

Combining with mobility • • •

O = major effort, o = some effort, • = left for full EVOO = completed, O = required

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

13

1.7 Data securityConcerns over data security were identified as a criticalissue for many potential providers. Within the EVOpproject a scoping exercise was undertaken through aworkshop approach led by the data security industry.A report was delivered to the EVOp leadership team tohelp inform NERC of future needs, should a fullyoperational EVO be commissioned. The EVOp CyberSecurity Advisory Board proposed six key areas thatwould require further consideration should the EVOconcept be commercialised. Details are contained inthe report provided in the online Annex, and can besummarized as:

i. Striking the right balance (confidentiality, integrityand availability): A clear understanding of the coreprinciples and their corresponding priorities isimportant in the 'Security by Design' approach.Additional core principles that should beconsidered include non-repudiation, authenticationand privacy. It is also clear that the importance ofeach core principal will vary based on securitystandards, classifications and the target audiencesfor each model, tool and data set.

ii. Cloud Security: Considerations associated withcloud security fall into two areas. These are relatedto cloud providers and customers (i.e. the EVO). Inthe continuously evolving domain of cloudsecurity, the Cyber Security Advisory Board willprovide important guidance to ensure the rightcloud providers are chosen and all cloud securityissues are considered.

iii. Data Protection Methods (encryption and other):Perimeter or layer protection methods (i.e. firewallsand IDS/IPS) are common focal points in theprotection of data. Whilst these methods servetheir place, an additional consideration for the EVOis security of the systems that will store andprocess the most sensitive data. Implementation ofappropriate encryption methods may be a suitablemethod for ensuring that data is protected withinthe data centre(s), not just at the perimeter.Understanding the ideal encryption methods andhow and when to implement them is vital inmaintaining the high performance of EVO virtualmodelling and the transmission of live data (suchas river levels, real-time temperatures, etc.).

iv. Application Security: Whenever an EVO applicationinteracts with suppliers and customers (end users),both the data and the application itself must beprotected. As the EVO is expected to allow end-users to create data workflows to modify how datais analysed, it will be important to keep an audittrail of user activities and alert on behaviour whichfalls outside the norm. Just as important is theneed to have an audit trail for all activitiesundertaken on any database which stores criticalEVO information.

v. EVOp Portal Security: The EVOp portal willaggregate content from all of the EVOp systemsproviding a means for users to explore a variety ofdata sources and execute simulations and models

in the cloud. It is vital that portal security ensuresthat only an authorised user can generate requeststo the applications server(s).

vi. EVOp Security Resource Considerations: A coreteam will need to be assigned with responsibilityfor developing a Security Policy Framework anddriving cyber security as a business as usualfunction.

While the six key areas cover the key cyber securityconsiderations such as legal implications and theinteraction between these topics; alignment to currentsecurity standards and their robustness within a rapidlyevolving sector, the legal implications on security, anddomain specific; security considerations are also inneed of further investigation.

1.8 Legal IssuesA preliminary report on legal issues was prepared byThe London Institute of Space Policy and Law (ISPL)and Edwards Wildman as an output of an EVOp-convened workshop. The white paper is split into threeparts:

i. Data Collection and Licensing Issues: There will bea number of issues to consider in respect of thecollection of the underlying data to be populated inthe EVO platform. These include issues arising outof the use of third party intellectual property rightsor underlying data that is otherwise in the publicdomain.

ii. The EVOp Platform: There will be a number ofissues associated with the hosting of the EVOplatform itself and of the development of new toolsand applications. This includes issues arising outof the use of cloud computing technology.

iii. The use of the EVOp Platform and LicensingIssues: There will be a number of issuesassociated with the use of the EVOp platform,including the onward licensing of the output dataand the EVOp platform's potential liability for anyreliance placed on such output data.

The legal implications for development of the EVOwere considered in parallel to the cyber securityconsiderations. The interdependency of these twoareas was acknowledged and further consideration onhow to remain joined-up would be required if EVOwere to become operational.

1.9 Outreach and community buildingOne of the deliverables of the EVOp project was tohelp develop an engaged and educated community.Members of the team were enthusiastic in thisendeavour and participated in a wide range of activitiesincluding briefings, workshops, summer schools andconferences.

1.9.1 Briefings and conferencesMembers of the team were active in promoting theEVOp vision and key outputs within different forums. Asummary of interaction with key stakeholders and

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

14

events where EVOp featured are provided in the onlineAnnex.

1.9.2 TrainingTraining was achieved in several ways throughout thecourse of the project:

i. Seminar on cloud computing: Representativesfrom WP2 organised a seminar for EVOp teammembers and staff at Lancaster University duringthe early stages of the project to provide afoundation level of knowledge and consequently, abaseline for communication.

ii. Involvement of early career scientists: FourteenPost Doctoral Research Assistants and ResearchAssistants were employed across science andtechnology work packages.

iii. Summer School: The leadership team initiated asummer school at Istituto Veneto in Venicesponsored by NERC. The focus of the 2011summer school was closely aligned to the EVOpand the six EVOp PDRAs that attended received anoverview of cloud computing and the type ofmodels that have been used already inenvironmental sciences in cloud services. Studentswith no previous advanced computing experiencewere able to design a working web service in threedays, or carry out new work in data assimilation.

These activities demonstrate the EVOp, or an alliedconcept, could be rolled out to the science community,and also the operational community, without a veryhigh training barrier. In particular, the summer schoolillustrated that it would be possible to develop atraining course, possibly on-line or through webinars,to deliver the necessary training quickly and withoutconsiderable expense for scientists to appreciate thepotential of the approach. It must be noted howeverthat professional trained computer scientists areessential to design and implement the actualcyberinfrastructure. Indeed, one of the main issuesduring the EVOp project was the over-reliance byscientists on too few computer specialists. Any futurerelated activities would need to correct this and provideand a more balanced distribution of skills within theteam.

1.9.3 Project Advisory GroupThe project benefited from the input from an AdvisoryGroup which met on four occasions (approximatelyevery six months). Membership of the Project AdvisoryGroup was diverse and incorporated representativesfrom the water and IT industry, regulatory bodies,government, and academia (see online Annex); theywere extremely supportive of the potential and need foran EVOp approach to gain greater efficiency,effectiveness and transparency of NERC science.

1.9.4 Alignment with other UK national initiativesAwareness of aligned initiatives was essential inidentifying how a future fully operational EVO couldco-deliver more efficient, accessible and transparentdata and models. We identified the following aligned

initiatives and held various briefing/meetings with keyparticipants including NERC Theme Leaders andPrincipal Investigators:

● Data: NERC has invested National Capabilityfunding in an array of Data Centres which ensuresecure long-term storage and access to dataresources. Any future EVO platform would need towork together with the Data Centres and develop aclear interface to these as well as other initiativessuch as data.gov.uk, the UK locator programmeand those ongoing in the Met Office and OrdnanceSurvey. This would increase uptake of currentinvestment and enhance ongoing work oninternational standards and data access. Testing oflinks to some Data Centres were explored withinthe Pilot and links and discussions with other dataproviders will inevitably be ongoing in future EVO-related initiatives.

● Science: A range of research investments are inplace which cross the air, land, water, geologicaland climate communities providing resources andresearchers actively pursuing the integration ofNERC science. Of particular relevance to the EVOactivities are the Programmes which seek to bringtogether the biogeochemical-hydrological andecological communities (e.g. Macronutrients,BESS, Changing Water Cycles and Network ofSensors). These programmes all seek to improvethe science which underpins the sustainableexploitation of our natural capital thereforeproviding a wealth of resources to underpin futureEVO activities.

● Modelling: NERC's Integrated EnvironmentalModelling Initiative which emerged from NERC'sproposed Modelling Strategy seeks to identifybenefits of improved model integration and dataexchange tools. Running concurrently to EVOp, aweb portal, or 'Experimental Zone' was initiatedunder the NERC PURE programme to facilitate thesharing and fusion of models and data from avariety of different sources information topractitioners in environmental risk management.

● Tools: The Environmental Science to ServicesPartnership (ESSP) is a joint initiative by NERC, theOrdnance Survey and the Met Office to developnew products from current data and knowledge toimprove uptake and impact of environmentalscience. Thus there is input by Defra, the EA andNERC capabilities through CEH and BGS.

1.9.5 Alignment with other international initiativesIn addition to the array of data, standards and climateinitiatives ongoing in the European and global arenathe following have made specific links to the EVOpteam and indicated their interest in joint futurecollaborations:

● NSF EarthCube: Developing a framework to createand manage knowledge in geosciences tounderstand and predict the Earth system. Thereare opportunities to move forward in the short-termwith joint legal workshops where EarthCube can

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

15

learn from our current knowledge and in turn, NSFcan fund future joint workshops together with jointSAVI grants.

● NSF Neon: A continental-scale research platformfor understanding and forecasting the impacts ofclimate change, land use change and invasivespecies, on ecological processes and oninteractions of the biosphere with the geosphere,hydrosphere and atmosphere. NEON will collectdata from 106 sites, over the next 30 years, froman investment of $434M.

● Knowledge Systems for Sustainable LandscapeManagement Initiative: An initiative to developmore sustainable integrated land managementsolutions building upon existing data collectionsand analysis tools, with investments for thedevelopment and deployment of new tools.Partners include USAID, Oak Ridge National Lab,NASA, CGIAR, World Bank, NASA, CSIRO.

● Belmont Forum: High level group of the world'smajor and emerging funders of globalenvironmental change research and internationalscience councils. NERC International teamfacilitated participation in activities to define futureresearch agenda.

1.9.6 EVOp International ConferenceAAn international conference to explore how newinformation technologies, and in particular cloudcomputing, could be used in the environmental sectorwas organised by the EVOp with support from fivelearned societies (British Ecological Society; BritishHydrological Society; Royal Geographical Society;Royal Meteorological Society; and The British Societyof Soil Science). The conference, "HarnessingEmerging IT Technology for Environmental Science - A2020 Vision" was held at the Royal GeographicalSociety in London on 16 May 2012 . The event broughttogether a diverse range of topics in the form ofpresentations, interactive demonstrations and posters(see online Annex).

From an early stage in the project, the need for anEVO-led event to provide a dedicated platform wherethe ambitions and outputs of the pilot could beshowcased was apparent. Whilst the team would beactive in communicating different aspects of the projectwithin the various academic forums relevant to theirexpertise, there would be no single academicconference that could adequately represent all areas ofinterest. It was envisaged that an EVO-led event wouldbe an opportunity where the future horizon forenvironment science, as influenced by technology,could be explored in addition to raising awarenessabout the EVOp project.

Although unique at that time in its ambition to combinedata, models and tools into a single platform, theEVOp has drawn upon knowledge and links with otherinternational science and technology programmes andinitiatives active in generating new standards and newways of working. Through the conference, the pilotEVO played a role in promoting the need for groups

from disparate disciplines to come together andaddress the following questions:

● How can advances in IT help to solve or amelioratemajor environmental issues?

● What are the practical barriers hindering and theopportunities to encourage integration between IT,research and user communities?

● What approaches, individuals and institutionsappear to constitute the cutting edge of this IT -environment integration?

Over 160 people attended the event which included 36posters/demonstrations on relevant topics and sixteenpresentations. Each session offered a differentperspective, aimed to inform and provoke discussionon the potential links between the environmentalscience and IT communities both in the UK andelsewhere. Lord Selborne provided the keynoteaddress in which he outlined government policy andthe programme landscape. Other speakers includedrepresentatives from the EU and UK government, ITindustry, academic and research sectors:

ARUP Australia, British Ecological Society, CEH, Defra,Dtex, European Commission, Geoscience, Google,Manchester University, Met Office, Microsoft, Universityof Cambridge, University of Colorado, and Willis GroupHoldings.

The full conference programme is available within theonline Annex. Recordings have been prepared by theEnvironmental Sustainability Knowledge TransferNetwork and can be viewed on the ESKTN TVYouTube channel. Website, EVO portals andpublications

1.9.7 EVOp websiteA website - www.evo-uk.org - was established early onto advertise the activities and ambitions of the projectand to bring in national and international partners.

Initial work to scope out a potential landing page andthe navigation routes end-users may select wasundertaken by the team with design direction provideby website consultants INTRO. A key issue identifiedwas the different ways in which end-users liked toaccess resources. An attempt to accommodate thesedifferent preferences is reflected in a landing pagewhich has options to explore the EVO resources bylocation, topic and data. For the more experienceduser, there is also provision of direct access to models,tools, apps and the workflow library (Figure 1.4).

The current design provides an adequate mechanismfor demonstrating the pilot web service to EVO end-users; it was however noted that further work would beneeded to develop different portals for differentcommunities should the EVO ever be operationalised.A recommendation from the EVO pilot is that any futurerelated initiative should focus more resources on thisaspect and include a separate activity for planning andend-user testing of the portal interface.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

16

Figure 1.4 Design of the EVO portal.

1.10 The EVOp legacyAs one of the key aims of the project was to provideinformation to the wider community on opportunitiesand barriers in the EVOp approach, a series of legacypublications are provided within the online Annex.Videos of the local, national and global teams talkingthrough the exemplars are also available on the EVOpwebsite along with two-page flyers covering each ofthe work packages. An international publicationsummarising the overall project is also available(International Innovation paper - online link). Thispublication is an open access publication designed tocommunicate worldwide environmental research anddevelopment. The publication is distributed to over30'000 stakeholder readers at all levels in thegovernment, policy, research and related healthstakeholder sectors and communicates the impact andrelevance of both fundamental and applied research inthe field.

Finally, EVOp has played a role in raising awarenessand contributed to establishing an engaged andeducated community that are becoming increasinglymore open and ready to exploit the advantages newtechnology holds and benefit from future initiatives.Thus, perhaps one of the most important legacies ofthe project are the funding opportunities which haveemerged from NERC and others.

1.11 Funding landscape and partners

1.11.1 NERCBy early 2012, it became evident that to make the EVOoperational would require mixed funding to underpinthe complex range of activities spanning: capital,research and private sector investment along withinternational and stakeholder collaboration. NERChave opted to continue a range of activities under theEnvironment Information Initiative.

● Environmental Big Data Investments: JASMIN,CEMS, Environmental Research Workbench,Environmental Big Data Capital Call.

● Innovation Activities: NERC Environmental Data:short projects to consider applications, productsand services as a precursor to the joint NERC/TSBcall, Solving Business Problems withEnvironmental Data".

● Belmont Forum: e-Infrastructure Knowledge Hubprogramme).

Several of these initiatives have adopted approachesthat utilise and build upon knowledge acquired duringthe EVOp explicitly and thus the legacy and value ofNERC's early investment was well founded.

1.11.2 Agencies, government and industry A forum was convened by NERC to discuss the needsand opportunities for demonstrating tangible economic

Navigable by locationor topic for the non-specialist or specialist

Quick signposting andnavigation to manydata portals

Workflow areas andresources for peoplewho know what theywant

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

17

and societal benefits from NERC's researchinvestments as a whole. This included representativesfrom Defra, the EA, the National River Trust, BritishWater, Thames Water, the Welsh Government, theScottish Government, Natural England, UKWIR, theWater Consultancy Community, SEPA, theEnvironmental Sustainability KTN, the NERC Water KEProgramme, NERC Science Management and theEVOp project. A range of activities were proposed tobetter integrate the UK modelling resource and providea forum to promote exchange between the developersand users of environmental models. Key resources torealise this ambition were to operationalise theoutcomes of research programmes such asMacronutrient Cycles, Biodiversity & EcosystemService Sustainability, Changing Water Cycles andEnvironmental Virtual Observatory. This would servenot only to inform public policy and regulation butenhance business performance and practice, withconsequent benefits for both UK plc and theenvironment.

Specific funding opportunities which have recentlyemerged that seek a cloud based, or EVOp-typeapproach, include:

● A call for by the EA for requiring cloud-based data,exploration and visualisation approaches fordelivering the Water Framework Directive.

● Commissioning of an integrated data andmodeling platform for diffuse pollutionmanagement and ecosystem services by Defra,the EA, NERC and Scottish Government from aconsortium led by CEH, one of the EVOpleadership team.

1.12 Lessons learned, and opportunities andbarriers identified

These are outlined in various parts of this final reportbut in summary:

● Consider the needs of end users and maintain thisfocus throughout development. Perhaps the mostsuccessful part of the EVO project was its strongfocus on end users. The advice from the ProjectAdvisory Group to use storyboards to articulate theneeds of the user informed both the overallappearance and development of the portal, as wellas informing the IT requirements. This approach isnow embedded at the heart of other recent fundingcalls. The AGILE approach was also influential inthe project design and encouraged exposure ofthe developing system to stakeholder input. Thisaspect is often the most used and communicatedpart of the project.

● The concept of deploying data, models and toolsas services in the cloud was demonstrated to bean effective way forward. A mix of commercial andprivate cloud providers is likely to be the optimumsolution. The cost of commercial providers is likelyto be prohibitive for large modeling applications.There is much to be learned from approachesadopted by other communities e.g. the astronomycommunity and bioinformatics community (e.g.

CloudBioLimux) which could accelerate theprogress of the environmental community in thisfield.

● There is a continuing need for internationalcollaboration in the development of standards andinter-operability. The EVOp project effectivelyborrowed from existing activities in this area. TheBelmont Forum activities should ensure progressis made in the area in the immediate future.

● It is critical to ensure an appropriate balance andmix of science/IT skills in any future initiatives. TheEVOp despite trying to ensure this from the outsethad insufficient IT / computer science postdoctoralresearchers. There was a pinch-point between theenthusiastic and engaged catchment scientistswith many ideas and the capacity to make theseoperational within the cloud.

● Incorporate a small number of carefully selectedexemplars in the early stages of systemdevelopment. This was a major success of theEVO project to both provide cohesion to thediverse and large team and facilitate outreach to abroad section of the potential end-user community.The pace of development is inevitably enhancedby utilising established science for the exemplars.

● The culture change. One of the areas of the EVOpproject of most interest to end-users was thepotential for a change in research culture. Couldthe community move towards a more open andtransparent way of working? Would they be willingto web-enable their models and encourageindependent testing to gain greater trust by theend-user community. The potential for the EVO-likeservice to effectively act as a 'market place' formodels; potentially only those made freelyavailable and web-enabled in the EVO or cloudwould gain traction in the end-user community wasappealing to some end-users as they sought waysto move forward to develop more tested andintegrated modelling approaches to tackle thecomplex environmental challenges they face.

1.13 ConclusionsAlthough not operational, the EVOp portal hasdemonstrated the value of integrating fragmented andwidely-distributed public and private sector data,expert knowledge, modelling tools and visualisationservices. It has illustrated how cloud computingimproves the efficiency, speed and effective use ofsuch resources. Valuable lessons have been learnedand key issues identified for future investigation. Out ofall the impacts, perhaps most important of all is thesupport and enthusiasm it engendered from itsstakeholder groups, the potential for expansion toother science areas, industry applications and benefitsarising from social and economic data were evident.From the outset, the EVOp was purposefully ambitious.It sought to demonstrate new capability and ways ofworking, to expand the knowledge base and sign postthe way for the future. In achieving these aims theEVOp has undoubtedly had a bearing on current

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

18

NSF EarthCube

http://www.earthcube.org

Earth System Science Partnership (ESSP)

http://www.essp.org

Environment Information Initiative

http://www.nerc.ac.uk/research/capability/

Environmental Sustainability Knowledge TransferNetwork

https://connect.innovateuk.org/web/sustainabilityktn

ESKTN TV YouTube channel

http://www.youtube.com/playlist?list=PLE862DA5853C19459

Natural Environment Research Council (NERC)

http://www.nerc.ac.uk

London Institute of Space Policy and Law (ISPL)

http://www.space-institute.org

funding and approaches being taken in this arena. Inthe context of research programmes, the outputs andimpact of this pilot are significant considering themodest investment and timeframe for implementation.

1.14 ReferencesGray J. 2007 Talk given by Jim Gray to the NRC-CSTBin Mountain View, CA, on January 11, 2007

Hey T, Tansley S, Tolle, K (Eds). 2009 The FourthParadigm Data-Intensive Scientific Discovery. MicrosoftResearch

The Royal Society, 2012, Science as an openenterprise. The Royal Society Policy Centre.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

19

A key objective of the pilot Environmental Virtual Observatory (EVOp) project was to provide a cyber-infrastructurecapable of demonstrating the potentials of a virtual observatory that utilises the powers of the Internet to supportuniform and open access to a series of scientific resources, and to therefore support and encourage onlineexperimentation. This aspect of the project aimed to show how the application of new Internet-basedtechnologies, specifically cloud computing, and web portals can facilitate this vision and in particular achieve theintegration of a wide variety of information sources (including disparate data sets, sensor data and modelsapplicable at different temporal and spatial scales), together with associated information tools and services, toprovide answers to big environmental science questions. From a technological perspective, the challenge was todefine the architecture and associated architectural principles underpinning the EVOp, supporting multi-scaleexperimentation through an open extensible infrastructure, and harnessing existing resources (data, models, etc.).Core to this task was the definition of an overall architectural approach as a refinement of Web 2.0 standards,incorporating mechanisms to deal with meta-data, and the population of the architecture with exemplar services tosupport the project's other work packages.

2.1 What is cloud computing?Cloud computing is core to the EVOp project,providing the underlying technology to implement therequired cyber-infrastructure. Cloud computing hasemerged as one of they key areas of digital innovationin recent years and the associated technology ishaving major impact in a variety of areas such aseCommerce, eGovernment and smart cities. The goalof EVOp was to investigate the potential impact ofcloud computing on the Environmental Sciences andindeed on science more generally.

In general terms, cloud computing is a shift fromresources being on individual computers towardshaving these services available in the greater Internet.The classic definition from the National Institute ofScience and Technology (NIST) defines cloudcomputing as follows (??):

"Cloud computing is a model for enablingubiquitous, convenient, on-demandnetwork access to a shared pool ofconfigurable computing resources (e.g.,networks, servers, storage, applications,and services) that can be rapidlyprovisioned and released with minimalmanagement effort or service providerinteraction."

Coulouris et al (Coulouris et al., 2011) provide analternative definition:

"A cloud is defined as a set of Internet-based application, storage and computingservices sufficient to support most users'needs, thus enabling them to largely ortotally dispense with local data storageand application software".

They then go on to refine this vision saying:

"[Cloud computing] also promotes theview of everything as a service, fromphysical or virtual infrastructure through tosoftware, often paid for on a per-usagebasis rather than purchased".

Cloud computing is better understood as not a singleapproach but actually a family of approaches and partof the EVOp story has been to work through thedifferent options and the implications for theEnvironmental Sciences community. In particular, akey defining characteristic is who provides andmanages a given cloud infrastructure:

● The first option is to have a private cloudimplemented within an organisation for thatorganisation. This option requires significant up-front investment and the responsibility formanaging the cloud rests with that organisation.The organisation though retains control andownership of the resultant infrastructure andassociated cloud services.

● The second option is to opt for a public cloudwhere the cloud is implemented and managed bya third party provider such as Amazon, Microsoft,or Google and offered through the Internet asservices to the greater public (includingorganisations or private individuals). Theadvantages here are that the upfront investmentand resultant management is delegated to a thirdparty (for a given cost), but with a loss of controlover the infrastructure (for example, for data it maynot be possible to specify or know where the datais stored).

are also common where organisationswill adopt a mix of public and private cloud provision.More generally, there is a tendency towards multi-cloud environments where organisations use multiple

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

20

highlights the fact that, in cloud computing, everythingis a service and this is the key unifying principleunderpinning cloud computing. This encompassesthough a variety of approaches, often distinguished as:

● Infrastructure as a Service (IaaS) - IaaS is thelowest level of abstraction and refers to access tobasic underlying computing resources as aservice. The most common example here is torequest and be given access to a computer as aservice, accessed across the Internet. Morespecifically, when coupled with virtualisation, youwill be given a virtual machine which you can thenuse as if it is a real, physical machine for yourpurposes (e.g. to act as a web server or as ahosting environment for models).

● Software as a Service (SaaS) - at the opposite endof the spectrum, SaaS refers to being given accessto application-level software. This is a very openended category and includes the availability of asuite of applications that offer classic functionalitysuch as word processing, e-mail, calendars, etc(e.g. Google Apps). This category also includesdomain specific software, e.g. an environmentalmodel that you would like to make available as acloud service.

● Platform as a Service (PaaS) - PaaS sits in themiddle and refers to services that reside aboveoperating systems and which are useful in theconstruction of application software. Key examplesinclude software frameworks to support webservers and databases.

A sub-goal of the EVOp project was to understand thefull range of approaches under the 'cloud computing'umbrella and the implications for the EnvironmentalSciences community.

2.1 Why cloud computing for EnvironmentalServices?

The general benefits of cloud computing are welldocument, for example see (Zhang et al., 2010). In thissection, we focus on the specific benefits in the contextof supporting an Environmental Virtual Observatory.

The most profound impact is in terms of supporting anew kind of science:

● An science - whereby open access isprovided to a range of environmental assetsincluding data sets, models and supporting assetssuch as visualisations;

● A science - wherebyassets are accessible and shareable fromanywhere via the cloud by different stakeholdergroups, encouraging collaboration and opening upnew avenues such as citizen science;

● An science - whereby problems can bestudied by bringing together data and models fromdifferent disciplinary perspectives, over differentgeographical regions and at different scales.

In addition to this, we gain a number of key benefitsemanating from the cloud computing approach:

● - All models and dataassets follow a common service model. This offersa level of transparency, by which details of whereand how the data are held are hidden from users.This allows for data to be used in models andsimulations without necessarily giving it to theusers, avoiding some of the delicate issues ofownership.

● - In cloud computing, itis possible to request resources as and when theyare needed and then return them to the cloudprovider when no longer needed. Through thisapproach, resources can grow and shrink to meetcurrent demand (a property known as elasticity).For example, if a user is running a complex climatechange model that requires extra virtual machines,these can be requested and returned when themodel completes execution.

● - Managing a cyber-infrastructure is complex, especially when it comesto ensuring key properties such as secure andreliable access to resources. For example, it isimportant to ensure that data is continuallyavailable in spite of the inevitable failure ofunderlying computing infrastructure, typicallyachieved through replication of data and the use ofprotocols to ensure consistency of replicas. Byadopting a cloud approach, such management isdelegated to the cloud provider meaning users donot need to worry about these key issues.

2.2 The EVOp approach

2.2.1 Overall approachThe first key decision was to adopt a multi-cloudapproach spanning private and public cloud provision.The private cloud is hosted in Lancaster University andis operated by us using OpenStack, an open sourcevirtual infrastructure management solution. The publiccloud resources are provided by Amazon WebServices (AWS). The pairing of OpenStack and AWS isa common one in the cloud computing world: AWS isarguably the most mature and feature rich public IaaSprovider, and OpenStack is backed by many (includinglarge organisations like NASA, IBM, and many others)as the de facto open source alternative to the coreAWS products, i.e. EC2 (utility computing) and S3(storage service). This makes it possible, at least intheory, to use the same virtual machine images to startinstances in either cloud. In order to promoteportability and to avoid being tied in to one provider(vendor lock-in), we adopted cross-cloud libraryjclouds. This open source software providesabstractions across many of the widely used cloudsolutions.

Where possible, we adopt standards to enableinteroperability within the architecture. Both AWS andOpenStack adopt web service standards and hence, inEVOp, all services (data, models and other supportiveservices) offer standard web service interfaces. Againwhere possible, we adopt a RESTful approach toservice APIs, an approach that promotes loose

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

21

coupling and consequently major improvements inscalability and manageability (Fielding, 2000). Theenvironmental models are implemented using the OGC(Open Geospatial Consortium) WPS (Web ProcessingService) standard that specifies how geospatial inputsand outputs should be.

2.2.2 The system architectureThe overall architecture for the system is shown inFigure 2.1.

The Model Library is populated by domain specialists(e.g. hydrologists) in liaison with data providers. Theprocess starts with online calibration and testing of amodel against a certain dataset (e.g. TOP-MODEL onthe rainfall data of the Eden catchment in the NorthWest of England). The outcome of this process is avirtual machine image optimised to run a fine-tuned setof models that are exposed as web services andequipped with all required data. This streamlinedexecution bundle is then stored in the library to beinstantiated upon demand.

Once a user navigates to one of the models, aconnection is created with the Resource Brokermodule of the Infrastructure Manager. This brokerresponds with an address of a cloud instance that issuitable for the type of computation required, alongwith some session information. This communication isdone using HTML5 WebSockets, which reducesnetwork overhead and browser memory usage.

The Load Balancer monitors the status of runninginstances with two objectives: minimise costs andmaintain instance responsiveness:

● For the former, user requests are served by defaultusing private instances. Upon saturation of private

cloud resources, the load Balancer initiatescloudbursting mode where public cloud instancesare used beside private ones. This is reversedupon detecting underuse, migrating users back touse private instances.

● For the latter objective, performance metrics arecollected and any notable degradation triggers theLoad Balancer to start a new instance, redirectingusers that were being served by the seeminglymalfunctioning instance to the newly created one.

A number of models were migrated to the EVOpinfrastructure. For each model, a bespoke visualisationwas developed to suit the particular factors in question.In general terms, models generate one of two types ofoutput: geospatial and time series. Geospatial data isvisualised using interactive layers superimposed overmaps. Google Maps is used due to its wealth in data,features, and the familiarity of the general public with it.The interactive nature of the geospatial layers allowsexpansion of the visualisation to include time seriesgraphs over specific map locations.

An emphasis in all models is placed on adjustabilityand flexibility in order to provide an interactive andconfigurable user experience. This is achieved viadynamic HTML and HTML5 web elements, AJAXasynchronous communications, and browser scriptingusing advanced open source JavaScript libraries suchas jQuery, Flot, qTip, and Google Maps.

2.2.3 The use of agile methodologyThe development process in EVOp relied heavily on anagile methodology based on a behaviour-drivendesign. Requirements were drawn from specificstoryboards that were outlined by the domain

EVOp Portal

Private Cloud

Public CloudLoad

Resource

Infrastructure

Policy Makers, Local

Data Providers, Modellers,

Active

Model

Workbench

jclouds BlobStore

Interaction Types

Housekeeping

HTML5 WebSockets

jclouds Compute

RESTful service calls

ssh

Figure 2.1 The overall architecture of the EVOp cyber-infrastructure.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

22

Figure 2.3 The resulting hydrograph output from a Taverna workflow run within the BioVel interface.

Figure 2.2 The Taverna workflow running the FUSE model as seen in the Taverna Workbench editor.

data_url parameters_ranges_data_url warmup_value behavioural_limit_value parameterSest_value mid_value deltim_value

data_and_parameters_loader

Workflow input ports

image

Workflow output ports

Claudia_Script_For_EVOp

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

23

hydrograph of stream discharge is saved so that userscan review and repeat previous workflow runs.

2.2.5 Exploring semantic links for EVOp dataThe team investigated the use of GeoNames ontologyand Web accessible database. The GeoNamesdatabase provides unique Web URIs for locations thatcan then be used to link to other web resources usingWeb APIs. The team entered 25 EVOp related sites into GeoNames using the EVOP tag. These locationscould then be linked through the GeoNames databaseto other resources such as Wikipedia entries for nearbylocations (linked through DBPedia) and to Webservices providing local weather readings (locatedthrough International Civil Aviation Organization (ICAO)sites in GeoNames). This facility to access relatedinformation to EVOp sites was developed as aJavascript tool but not included in the final EVOp portalpartly due to the reliability of the GeoNames API duringtesting.

2.3 ReferencesCoulouris, G., Dollimore, J., Kindberg, T., Blair, G.S.(2011). Distributed Systems: Concepts and Design.Pearson.Zhang Q., Cheng, L., Boutaba, R. (2010). Cloudcomputing: state-of-the-art and research challenges.Springer Journal of Internet Services and Applications(JISA). 1:1, 7-18.Fielding, R.T. (2000). Architectural Styles and theDesign of Network-based SoftwareArchitectures. Ph.D. thesis, University of California, Irvine.

The NIST definition of cloud computing

http://csrc.nist.gov/publications/PubsSPs.html#800-145

Apache jclouds

http://jclouds.apache.org/

The Taverna Workflow Management System

http://www.taverna.org.uk/

specialist within the consortium. These are referred toas the storyboard owners. Prototypes were developedbased on these requirements, and are iterativelyimproved and built following verification (within thedevelopment team) and validation (with the storyboardowners, the wider consortium, and stakeholders)processes.

2.2.4 The portalThe EVO portal provides several means of findinginformation. The portal front page hosts a selection ofpaths to content and functionality. Furthermore, thereare a myriad of info pages that introduce the modelsand data hosted on the portal along with associatedlinks, FAQs and tutorials. Examples of the portable canbe seen throughout this report (for example, seeFigure 1.4 for the entry point to the portal).

The project team also investigated the role ofworkflows to enhance the capabilities of the portal,allowing scientists to create bespoke experimentsconnecting data and models together in a desired andrepeatable pattern ready for execution in the cloud.

The experimentation focused on hydrological modelsto enable users to explore different managementscenarios that might affect water resources in theirlocation. A number of web services and a bespokeweb page to run them were created. Building on this, amore flexible interface was developed using theTaverna Workflow Management System.

Figure 2.2 illustrates an example Taverna workflow thatwas created in the project. This workflow features aWPS service running an R version of the FUSEhydrological model for stream discharge in acatchment (within the Claudia script). The BioVel EC2image was configured to run this workflow with theTaverna server to handle input parameters and thedata loading. The workflow was then connected to aseparate EVOp EC2 image to call the WPS running thehydrological modelling service.

Figure 2.3 illustrates the view for the user of theworkflow when loaded into the BioVel Web interface.This hides the details of the connection of differentcloud services to run the FUSE model. The resulting

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

24

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

25

The EVOp portal deploys interactive web applications to model local flooding and diffuse pollution. From a userperspective, web-based environmental models do not require any software installation as they are accessed usingstandard web protocols using an internet connection (e.g., http). As a result, they are platform independent andcan be easily integrated in workflows as part of the process of environmental data retrieval, manipulation,visualisation, and communication. To maximize efficiency and interoperability with storage, interaction andvisualisation tools, the EVOp team have adopted the Web Processing Services (WPS) standard of the OpenGeospatial Consoritum (OGC), which consists of a wide range of industry and academic actors. Additionally,several stable software implementations of the OGC standards exist, which greatly facilitated the development of

These models were written in different programminglanguages (C, Fortran, C++, and MS Excel). In orderto integrate them into a common web serviceinfrastructure, TOPMODEL and FUSE were integratedin the R data analysis platform, either by wrapping Rcode around the original implementation (forTOPMODEL), or by re-implementing them directly in R(for FUSE). TOPMODEL is already available as an Rpackage, and FUSE has also been made available. Amajor advantage of R is its convenient integration withPyWPS, a Python implementation of the OGC WPS.The communication between Python and R isfacilitated by an existing connector (RPy2). A localPostgreSQL database was used for data storagebecause a web-based implementation such as theOGC Sensor Observation Service was deemedunfeasible within the context of the pilot.

The ECM was originally implemented in Excel andhence a new implementation of the model was writtenfor this project in a combination of PHP (on the cloudside) and JavaScript (on the desktop side). Thislanguage and platform split in the implementationmeans that the cloud was able to perform the query ofthe spatial database for the area of interest and thebulk of the model calculations. The summary tablesare then passed to the final calculation in the browser.Keeping the final stage of the calculation on thedesktop means that the user can undertake scenariotesting, such as altering the livestock numbers in acatchment, with the results being very rapidlycalculated and displayed. The main spatial database isstored in a MySQL database and the outputs of themodelling can be rapidly mapped at any spatial scalefor visual interpretation of model predictions.

3.1 Selected models and web serviceimplementation

For the purpose of this project, three differentenvironmental models were implemented as webservices:

● TOPMODEL (Beven and Kirkby, 1979) was chosento demonstrate the local exemplar. Topmodel is awidely used and well documented semi-distributed, conceptual hydrological model. It wasoriginally developed for small, upland catchmentsin a humid environment, and is therefore verysuitable for the catchments selected for the localcase studies. The model produces a time series ofriver discharge and spatial patterns of soilmoisture.

● FUSE (Clark et al., 2008) is a state-of-the-artmodelling toolbox which includes well establishedalgorithms including VIC and the StanfordWatershed Model. As a toolbox, it allowscombination of many different algorithms, thuscreating over 1000 possible model structures. Assuch, it provides a very flexible solution for thelarge number of catchments to be modelled in thenational case study. The model produces a timeseries of river discharge.

● Export Coefficient Model (ECM) (Johnes et al.,2007) is an established approach to allowprediction of nutrient (nitrogen and phosphorus)flux from land to inland and coastal waters. Thepredictions are based on agricultural practices,such as stocking densities, fertiliser use and croptypes, human population density and atmosphericdeposition, hence making the model suitable forrapid mitigation scenario development and testing.The model produces predictions from field tonational scale and generates mapped outputs andsummary tables of nutrient export by source andpractice.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

26

3.2 Workflow integration and applicationsThe modelling web services were integrated in theworkflows of two EVOp applications to showcase theirpotential. These applications are accessible via themain EVOp portal and focus on simulating the impactof land-use changes on flood risk, and diffuse pollutionand water quality.

The communication between client (e.g. modellingweb portal) and server is defined by the OGC WPSstandard. The web client sends the server a HTTP GETrequest to execute the process. Once the executionterminates, the server sends back an XML responsewhich is parsed at the client side extracting thesimulated time series. Data are retrieved by queryingthe database in real time while the visualization of timeseries is based on the Google-charts API and similarlibraries.

The flood modelling tool is available at the followingfour locations:

● River Eden catchment:

q Dacre Beck at Dacre Bridge, Cumbria,England.

q Blind Beck, Cumbria, England.

● Dyfi catchment: Dyfi Bridge, Machynlleth, Wales.

● Tarland catchment: Coull Bridge, Aberdeenshire,Scotland.

It consists of the following user-interactions:

i. Selection of the location of interest: Based on theuser input, observed river flow and precipitationdata, as well as pre-calibrated model parametersare queried from the database.

ii. Modification of the land-use scenario: The userinput is then mapped onto modifications of theoptimal model parameters. At this stage of theEVOp, this conversion is based on an empiricalparameter translation table. However, theoreticalresearch on how to parameterise user scenarios ison-going.

iii. Subsequently, the model is run with the modifiedparameter values and the results are visualised inthe web interface. The model can be run manytimes with different scenarios. The simulateddischarge time series are kept while the usersession is active. Multiple simulations can beselected and visualised together to facilitate thecomparison of different scenarios. The plottingfunctionality also allows for the visualisation of anestimate of the streamflow threshold above whichthe flow spills over the banks causing possibleflooding.

The diffuse pollution modelling tool has beenimplemented at the full UK scale at a spatial resolutionof 4km2. To run the model, the user makes a series ofchoices:

● Select an area of interest: These are based oncountries, OSPAR zones, coastal drainage zones,River Basin Districts and river or water quality

catchments based on the locations of gaugingsites.

● For the area of interest, specify various optionsconnected with the model setup, although not alloptions are enabled in the demo:

q Selection of the model.

q Selection of the data set to run of the model,such as land cover data from different years.

q Selection of the hydrological model.

● For the selected model configuration, the user canchoose to run the model with the observed historicland management, or to test a set of mitigationoptions. The mitigation options are 'GoodAgricultural Practice', 'Catchment SensitiveFarming', 'Mitigation through on-farm measures','Farming for WFD compliance' and 'Urban Wastewater Treatment Directive Plus (UWwTD+)'. Eachof these mitigation scenarios makes suitableadjustments to the configuration of the landscapeto represent the change. These mitigation optionscan be spatially targeted at the level of thegeoclimatic region (targeting different measuresaccording to the key characteristics of eachregion).

● The results are then presented as tables, maps orcharts for both the observed historic, and thedeveloped scenario.

3.3 UncertaintiesUncertainties remain a major issue to tackle. Theproject reviewed the relevant technologies that areavailable to document uncertainties in a web servicescontext. One of the most promising initiatives isUncertWEB which aims to provide data models andmark-up tools for the propagation of uncertainties inmodelling workflows and communication ofuncertainties to the end-user. Experiments are on-going to integrate uncertWEB (Williams et al., 2010) inthe FUSE modelling toolset by means of addingUncertWEB data models to the object-oriented classdefinitions in the R toolbox.

At the same time, efforts were focused on integratingthe major uncertainty quantification methods in themodelling toolbox. The following methods are currentlyavailable:

● Generalized Likelihood Uncertainty Estimation(GLUE, Beven and Binley, 1992, Beven and Binley2013).

● Differential Adaptive DREAM (Vrugt et al., 2009).

● Bayesian Total Error Analysis (Thyer et al., 2009).

These implementations are a combination of effortsfrom the R development community and the EVOpproject.

Lastly, efforts have been aimed at developing anobjective model structure selection algorithm, basedon methods for data mining of model performances,and the extrapolation of model structures in space andtime.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

27

Many of the abovementioned activities need furtherattention. Even though functional implementations ofeach of the algorithms exist, major technological andconceptual challenges are still to be overcome. These,and others, should be explored further in eventualfollow-up projects to EVOp.

3.4 OpportunitiesThe future opportunities for linking the hydrologicaland biogeochemical models can be seen in bothresearch and decision support focused areas. Forexample, the tighter linkage of the model componentsor models could help to investigate the grand scientificenvironmental challenges on, for example, theinfluence of hydrology on processes controlling lossesof nutrients (N, P and other macronutrients) from landto surface waters, the location of critical source areasfor diffuse pollution in the landscape and how theseexport processes and locations may alter underprojections of climate change. This need for tightercoupling has been recognised in the EU SEAMLESSproject, which is aiming to overcome "fragmentation inresearch models and data in Europe for assessingagricultural systems". EVO has an opportunity tointegrate more closely with this, and other, projects.

The modelling community is moving from an approachwhere one model approach / structure was deemedsufficient to understand and predict system propertiesto approaches that embrace ensembles of approachesand model structures. This change in approachcreates an opportunity for EVO since cloud computingoffers a solution to the greater computationalresources required, and simplifies the testing ofdifferent models (approach and structure) against thesame datasets to provide new insights. EVOp couldhave a significant role in supporting the communityand advancing the science by providing a database(and metadata) on a wide range of catchment studiesthat could be used by the wider community. Throughbetter sharing of data and approaches (as has beenshown in molecular and marine sciences) EVO can actto accelerate progress in the environmental sciences.

Taking a high level view of the possible futureopportunities, there are many possible ways in whichEVO could add value to science and society:

● The range of predictions could be extendedthrough linking other catchment hydrological andbiogeochemical models into the framework andallowing the cross comparison between thedifferent predictions. As each new model isincluded, the potential number of combinationssignificantly increases, requiring the computationresource available in the cloud. Examples includeCAPRI DNDC, PSYCHIC, MITERRA, INITIATOR,INCA, RiverStrahler, CRUM3, SCIMAP and PIHM.Research is required to understand which modelcouplings yield the greatest information gain andhow to interface different type of model output(resolutions and types of predictions)

● Time and space scales. Finer spatial scale models,such as SCIMAP, at the 5x5 metre resolution to

compliment national models, such as the currentECM. Temporally dynamic models such as HBVlight, INCA or CRUM3. The temporally dynamicmodels could enable real time flow and or WQpredictions for water companies which could beused in abstraction or discharge planning.

● Testing old models with new data. DTC N and Ptime series are the required level of resolution totest detailed process catchment models. The DTCdatacentre could provide data series as webservices (mid 2014) and used to drive simulationmodels. Linking the data stream with the modellingtools could create a powerful platform to testapproaches and hence encourage developers tomake their tools compatible with the EVOapproach.

● There are opportunities to develop mobileapplications that make use of the cloud hostedmodels to deliver support services for CSFOs orfor teaching and learning at all levels of education(flood models, mitigation scenarios andaugmented reality).

● There is a growing trend towards open sourcemodels and publishable workflows (myExperiment,R and iPython notebook), which share bestpractice and help others to use techniques. Linkingthe EVO hydrological and biogeochemical modelsinto these tools increases their accessibility andvalue.

● Stakeholder engagement earlier in the modellingprocess. Through having a range of differenthydrological and biogeochemical modelsavailable, it is possible to work closely withstakeholders to understand their problems andbuild a tailored solution rather than focusing thecurrent limited set of tools that may be available tothat user.

3.5 BarriersThe barriers that have been identified are:

● IP on models and datasets. This cannot beunderstated.

● The required data may not exist, may be protectedby IP, not be publicly available (governmental orcommercial issues), or be lost. There is thereforethe need to make as much data available aspossible and to ensure that the datasets that arecollected in the future have maximum informationcontent and well documented metadata.

● Modellers and modelling groups may see theconversion of existing models to a web serviceapproach difficult or not worthwhile. If there couldbe a modelling portal that had licences for the keydatasets (e.g. NextMap 5m product, LCM 2007,geology, soils, rainfall, discharge and waterchemistry data), then this would be a significantpull for developers to move their tools to theplatform.

● There are significant challenges with linkingmodels together to create a solution. This relates

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

28

to the issue that models may have been written fora different reason, work at different space and timeresolutions, and the feedback between processesrepresented in different models may not becaptured.

● As models become more advanced, thecomputational cost increases significantly. Thecomputational resource is there via cloudcomputing technologies but there needs to be amethod to cover the costs of using the resource.Precedents for this exist in HPC (e.g. Hector)although a more dynamic 'economy' could becreated with 'rewards' for sharing and creatingtools and computational resources and 'costs' forusing tools and computational resources.

● There is the issue of standards for modeldescription, parameterisation and output. Thereare a few standards for model output (e.g.WaterML) but there is no currently definedstandard for the communication of theuncertainties in predictions and measurementsbetween models.

● The need for further development of methods topropagate uncertainties within modellingworkflows.

● The need for methods to communicateuncertainties to end-users.

3.6 ReferencesBeven, K., & Binley, A. (1992). The Future ofDistributed Models: Model Calibration and UncertaintyPrediction. Hydrological Processes, 6, 279-298.

Beven, K. J., and Binley, A. M., 2013, GLUE, 20 yearson. Hydrol. Process. DOI: 10.1002/hyp.10082.

Beven, K. J. and M. J. Kirkby, (1979), A PhysicallyBased Variable Contributing Area Model of BasinHydrology, Hydrological Sciences Bulletin, 24(1): 43-69.

Clark, M. P., A. G. Slater, D. E. Rupp, R. A. Woods, J.A. Vrugt, H. V. Gupta, T. Wagener, and L. E. Hay(2008), Framework for Understanding Structural Errors(FUSE): A modular framework to diagnose differencesbetween hydrological models, Water Resour. Res., 44,W00B02, doi:10.1029/2007WR006735.

Johnes, P.J., Foy, R., Butterfield, D. and Haygarth,P.M. (2007) Land use for England and Wales:evaluation of management options to support 'goodecological status' in surface freshwaters. Soil Use andManagement, 23 (Suppl 1). pp. 176-196. doi:10.1111/j.1475-2743.2007.00120.x

Thyer, M., Renard, B., Kavetski, D., Kuczera, G.,Franks, S. W., & Srikanthan, S. (2009). Criticalevaluation of parameter consistency and predictiveuncertainty in hydrological modeling: A case studyusing Bayesian total error analysis. Water ResourcesResearch, 45, W00B14.

Vitolo, C., Buytaert, W., McIntyre, N, Reusser D., andthe EVOp team. Data Mining of Hydrological ModelPerformances with the FUSE framework. Inpreparation.

Vrugt, J., Ter Braak, C. J. F., Gupta, H. V., & Robinson,B. A. (2008). Equifinality of formal ( DREAM ) andinformal ( GLUE ) Bayesian approaches in hydrologicmodeling ? Stoch Environ Res Risk Assess.doi:10.1007/s00477-008-0274-y.

Williams, M., Cornford, D., Bastin, L., & Pebesma, E.(2010). Uncertainty Markup Language, OGCDiscussion paper 08-122r2.

EU SEAMLESS project

http://www.seamlessassociation.org

FUSE

https://f-forge.r-project.org/projects/r-hydro

TOPMODEL R-package

http://cran.r-project.org/web/packages/topmodel/

Uncertainty Markup Language

http://www.uncertml.org/

Uncertweb discussion paper

http://www.uncertweb.org

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

29

The UK is richly endowed with environmental data that have been collected over many decades for a wide varietyof reasons (e.g. research, monitoring change, operational, public safety, legal reporting, and regulatoryobligations) by many different organisations. The long tradition of environmental monitoring provides the UK withan enormous reservoir of historical environmental information that also is of huge global importance. People areawakening to the potential that such data possesses not only for science but for stimulating economic growth andbringing positive impacts and benefits to society.

4.1 Time to unleash the potential ofenvironmental data

The vast majority of environmental data have beencollected over the years through public funding. In itsOpen Data White Paper (HMG, 2012), the UKGovernment clearly articulates its commitment toinspire the "innovation and enterprise that spurs socialand economic growth" through improved sharing ofpublic data. This is reflected in the new Open Datapolicies published by all Government departments.The BIS Open Data Strategy (BIS, 2012), for example,states the Department's commitment "both toincreasing the economic impact of existing publicsector information and also to releasing new publicsector information to expand the market". A furtherdriver for Government is transparency: helping peoplemake better choices about public services and holdinggovernment to account. In a scientific context, OpenData not only promotes innovation but also providesgreater transparency of the research process enablingthe provenance of data to be checked and resultingclaims and discoveries to be corroborated. The RoyalSociety, in its recent report, “Science as an openenterprise” (2012), called on scientists to"communicate the data they collect... to allow free andopen access, and in ways that are intelligible,assessable and usable."

4.2 Open data beyond the UKSuch ambitions on Open Data are not confined to theUK but are part of a global trend towards realising thevalue and impact of data. The US Government in itspaper, Digital Government - Building a 21st CenturyPlatform to Better Serve the American People (USG,2012), undertakes to "unlock the power of governmentdata to spur innovation". Similarly, on launching theOpen Data Strategy for Europe in December 2011,European Commission Vice President Neelie Kroes

stated, "taxpayers have already paid for thisinformation, the least we can do is give it back to thosewho want to use it in new ways that help people andcreate jobs and growth."

4.3 InitiativesA number of initiatives are leading the way inpromoting data sharing and derive better impact fromenvironmental data, including: the EC's INSPIREDirective which aims "to establish an infrastructure forspatial information in Europe to support policies oractivities which may have an impact on theenvironment", and the Defra-led UK LocationProgramme - a UK pan-government initiative toimprove the sharing and re-use of public sectorlocation information"; and the Public Data Group (MetOffice, Ordnance Service, Companies House, LandRegistry), which seeks "to maximise the long termeconomic and social benefit of data". TheEnvironmental Science to Service Partnership (ESSP)involving Defra, the EA, Met Office, Ordnance Surveyand the NERC, aims to "combine data, information,knowledge, and expertise to deliver services forsociety, private enterprises and government, to informand support decision making". NERC itself hasadopted an open data policy and, through theimplementation of its new Science InformationStrategy, is enabling improved access to data from theactivities it funds.

4.4 AdvancesAdvances in information and communicationtechnologies are further rapidly transforming the wayenvironmental data are collected, accessed, sharedand analysed. Improved capabilities offered by theinnovation in cloud computing, smart-phones andcollaboration tools all affect how people deliver,receive, and synthesise environmental data and arehelping to promote the Open Data agenda. Several

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

30

projects and initiatives at national-, European- andglobal-level have sought (or are seeking) todemonstrate the benefits and potentials of exploitingnew technology to derive better impact from data. Aswell as this project, examples include: the Met Office'sOpen Platform, the European Commission’s SharedEnvironmental Information System, the EEA's Eye onEarth, the NSF's EarthCube (USA), the GlobalMonitoring for Environment and Security (GMES), andthe Global Earth Observation System of Systems(GEOSS)).

A study undertaken on behalf of Research CouncilsUK, in partnership with JISC and the Royal Society,explored public views on Open Data in research. Thefollowing findings, taken from the study's Final Report(TNS, 2012), provide a series of pointers for EVO as itseeks to develop beyond the pilot:

● whilst openness was believed to promote scrutinywhich could help build trust... confusion may arisefrom multiple interpretations of the same data,which in turn could impact on the trustworthinessof research;

● regarding innovation... arguments around betterefficiencies in the research process, potential costsavings and to a lesser extent growth (by utilisingdatasets to develop new products and services)were accepted;

● the principal benefits of open data were seen toaccrue for researchers rather than the public;

● the concept of openness sat uneasily withresearchers... there was a strong view that those

who had put the effort into developing a datasetshould have a period of time to take exclusiveadvantage of this;

● the most important concern around open data wasthat it should be promoted when it serves thepublic interest... defined almost exclusively interms of data that can help improve human healthand, to a lesser extent, the environment;

● data should not be released too early or in a waythat would be likely to promote poor decisionmaking or do harm; and

● public funded and academic researchers weregenerally thought to be more open than thosefunded in the private sector,.... increasedcommercial funding ... was seen as having thepotential to negatively impact.

Open Data are defined as data that are:

● accessible, ideally via the internet, at no morethan the cost of reproduction, without limitationbased on user identity or intent

● in a digital, machine readable format forinteroperation with other data; and

● free of restriction on use or redistribution in itslicencing conditions.

Source: HMG Open Data White Paper, 2012

Figure 4.1 A glimpse of the environmental data landscape.

DataShare

What’s in your backyard?

Environment Agency Scotland’s Environment WebDefra

OTC

myEnvironment

Scotish National HeritageInformation Service

Data.gov.uk

UK Open Data InstituteMEDIN

Natural England

EOF NBNVO

CEDaR LifeWatch

EDC

BRCECNNRFA

NEBC

NEBC SIS / EDCs

DCC

JISC RDMBritish Library

FBA

HEIsCEDA Research Councils

Royal Society

EarthCubeOGC

CUAHSI-HIS

US EPA Envirofacts

UNEP EDEPANGAEA

DataCite ICSU CODATA

ICSU World Data System

WMO Global Runoff Data Centre

IPCC DDC

IPBES

GBIF GEOSS

US NOAA GPCC

OEDC Data Compendium

WMO

World Bank Data CatalogueUNEP - GRID

Australian National Plan forEnvironmental Information

Australian WRIS

JRC IESGMES

SEIS

INSPIRE

EIONETENDS

EC Open AccessEuroGEOSS

Government

Scottish Government

Projects / Networks UK record centres /local authorities

Environmental DataInitiatives andOrganisations

Projects / Networks

ResearchEurope

UK

Government

Global

NationalInternational partnerships

Government

GovernmentScientific Research

Commercial

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

31

4.5 The pros and consThe Royal Society (2012), whilst claiming open datacan increase a published paper's profile (one casestudy it refers to showed a 69% increase in citation),acknowledges that protecting IPR around data are stilla vital issue for many and concedes that "legitimatereasons for keeping data closed must be respected".However, it asserts "the small percentage incomeuniversities derive from IP" should "not work againstlonger term benefit to the national economy." Clearlyfurther work is needed to strike the correct balancebetween the two.

The reuse of data provided by individuals for purposesother than those for which they were originallycollected raises further issues of confidentiality andprivacy and the rights and responsibilities of differentdata stakeholders, which sometimes may conflict (e.g.the rights of individuals versus "the common good"). Itwould seem many of the moral and ethical aspects ofOpen Data remain to be explored.

4.6 ConclusionsDespite recent technological advances, a complex andfragmented environmental data landscape persistscomprising many different organisations and initiativesall seeking to provide environmental data andassociated information (see Figure 4.1). Scientists,students, consultants, environmentalists, politiciansand members of the general public alike regularly havedifficulty in obtaining relevant data, which, even whensuccessful in identifying, often are returned inunintelligible or unusable form. Such fragmentationand heterogeneity is inefficient and confusing andremains a huge barrier to science, innovation andeconomic growth.

From a technical viewpoint, implementation ofGovernments' Open Data policies requires data ofmany different types and formats to be madeaccessible in a standard way, through common,standards-based tools and services. EVOp, albeit in arelatively small way, has attempted to tame some ofthe heterogeneity and addressed several of thetechnical challenges of bringing environmental data,models and tools together as services in a cloudenvironment, demonstrating the potential benefits to arange of stakeholders. Progressing beyond the pilot,Open Data will provide the EVO with opportunities toexploit, and derive impact from, an increasingly widerange of data types. But in so doing, the EVO will itselfbe at the vanguard of Open Data implementation andwill be forced to address many of the technical andnon-technical issues that have been set-out in thissection.

4.7 ReferencesBIS, 2012. Open Data Data Strategy 2012-14.Department for Business Innovation & Skills. June2012, pp20.

HMG, 2012. Open Data White Paper - Unleashing thePotential. Her Majesty's Government, The StationeryOffice, June 2012, pp. 51.

The Royal Society, 2012. Science as an openenterprise. The Royal Society Science Policy CentreReport 02/12. June 2012, pp. 104.

TNS, 2012. Open Data Dialogue Final Report. TNSBRMB , London, 2012, pp.79.

USG, 2012. Digital Government - Building a 21stCentury Platform to Better Serve the American People.United States' Government, 23 May 2012, pp. 31.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

32

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

33

The development of the local exemplar has been undertaken in three largely rural catchments based in Scotland,England and Wales, with a team from three universities and one research institute. The project focused on soil andwater issues, and this has led to the development and trialling of visualisation of the linkages between hydrology,land use and flooding in the rivers Dyfi, Tarland and Eden. An important aspect of the work has been theconsultation with a wide range of potential users in each of these catchments.

5.1 The CatchmentsThree catchments were used. Their locations can beseen in Figure 5.1.

5.1.1 The DyfiThe River Dyfi is located north of Aberystwyth in mid-Wales and its catchment drains an area of 671 km2.The Dyfi and its tributaries form a dense, dendriticdrainage network with a total channel length of over1,500 km. Rainfall in the upland areas is on average2000 mm per annum, falling to ~1000 mm on thecoast. Land use in the catchment is dominated byagricultural activity, whereas slate quarrying and metalmining were historically prevalent. Virtually the wholeDyfi catchment has been given some form ofconservation status, including the Snowdonia NationalPark, 23 SSSIs and being designated as a UNESCOBIOSPHERE Reserve.

5.1.2 The DeeThe River Dee in northeast Scotland has multiple highlevel habitat designations (Natura 2000, SAC) forspecies such as Freshwater Pearl Mussel andeconomically-important Salmonid fish species. TarlandBurn, situated centrally in the Dee, is the first tributarywith intensive land use and first point of nutrient-impacted waters entering the oligotrophic main river.Rainfall is approximately 1000 mm with long periods ofwinter snow. The Tarland catchment (100 km2)includes the village of Tarland and Aboyne(populations 600 and several thousand). The TarlandBurn suffers diffuse pollution and morphology issueswith pressures from farming, urbanization and septictanks. It is currently described as Poor-Moderate underthe WFD and is a Priority Catchment for SEPA.Additionally, the community has suffered recentflooding and, in response to these many impacts, hasshown some excellent examples of community ledinitiatives in natural flood management and riparianhabitat improvement. A decade of research into boththe natural functioning and improvements in thecatchment has given a wealth of data and knowledge

that will enable testing of models for biophysical andsocio-economic aspects of catchment management.

5.1.3 The EdenThe Eden catchment in northwest England is a mixedgrassland area, with an area of 2398 km2 and a mainchannel length of 130 km. Agriculture in the catchmentis characterised by mixed dairy and livestock farming,and comprises both rough grazing and improvedgrazing with some arable land use towards the northand on the richer soils of the River Eden floodplain.Average rainfall across the catchment is ~1700 mmper year; however this masks rainfall gradientsassociated with much higher rainfall on the uplands ofthe Lake District and Pennine fells. The catchment hasseveral designations for SSSI and SAC status, for the

Figure 5.1 Catchments and their locations

Dyfi, West Wales

Tarland in the Dee

Morland in the Eden

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

34

range of habitats and species it supports, and the riverpasses through two National Parks, two AONBs and aWorld Heritage Site. The Morland sub-catchment of theEden is 10 km2 and is located in the southwest of themain catchment. Pressures in the Morland sub-catchment and Eden are predominantly flooding anddiffuse pollution.

5.2 Development of the exemplarThe development of the Local Exemplar for EVOp isoutlined below, through the storyboard development,steps taken in implementation, evaluation of the workwith stakeholders, and the potential for the future. Eachcatchment developed a storyboard that reflected theneeds of the different stakeholders based around thetheme of flooding and these are presented individually.

5.3 Storyboard description and stakeholders

5.3.1 TarlandThe community in Tarland - that includes farmers andthe dominant land owner) have worked withresearchers to understand how to adapt and mitigateagainst extreme hydrological events linked to otherissues such as the channelisation of streams, diffusepollution management, and habitat loss. Actionscoalesced following major flooding of the villagecentres and prime arable land several times in 2002(Figure 5.2). Co-constructional problem solving withstakeholders has led to trial catchment schemes ofstream remeandering, several ponds and wetlands,point source mitigation, and knowledge contributingstrongly to the sustainable approaches enshrined inthe Flood Risk Management (Scotland) Act 2009. Whatthe community have sought during this developmenthas been access to local data and knowledge toinform them and their decision making. The static toolsprovided by the agencies did not adequately providedynamic information such as live data feeds andinterpretation of flood risk management in theirlandscape. The storyboard for Tarland concentratedon this stakeholder need (see online Annex).

The project built on existing monitoring platforms at 1and 50 km2 catchment area stream stations to bringlive feeds to the catchment website. EVO hasbroadened the sensor network in Tarland and thisfeeds into live modelling within EVOp. The project hasalso brought hydrology and water cycles intoeducation by hosting weather stations at three localprimary schools. The children take manualmeasurements (rainguages, snow stick, temperature)and record and compare these to Davis Vantageautomated electronic met stations located on theschool sites.

The EVOp project has followed the timeline of an EUproject in Tarland Aquarius - Farmers as WaterManagers. As part of this considerable effort was madeto bring a flood mitigation measure upstream ofTarland village. The Aquarius project finished in 2011with a fascinating socio-economic-biophysicalknowledge of the barriers to natural and engineeredflood mitigation measures in the catchment, yet no

Figure 5.2 Looking upstream from the Coull bridge siteat extensive flooding across the prime arable land onalluvial soils in the valley base during 2002.

Figure 5.3 TUFLOW modeled output for a 1:100 yearflood event in the upper Tarland basin, which issubsequently used as the basis for visualizations of theflood extent used in engagement with local landmanagers, Council and in the web tool.

Your Catchment

http://yourcatchment.hutton.ac.uk/

Farmers as Water Managers

http://www.aquarius-nsr.eu/PilotAreas/Scotland/Scotland.htm

measure had been achieved. The proposedengineering structure was large and unacceptable tothe farmers involved and stalemate had been reachedbetween the local Council, regulator and landmanagers.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

35

As part of the EVOp development, AberdeenshireCouncil commissioned work to improve the livesensing network as part of the ongoing EVOpdevelopments and skills demonstrated in bringing liveinformation onto web-based services. This has beenimplemented in three further stations and now providesthe Council with early warning information for planningand emergency provision. One of the stations hasbeen intentionally located at the farm site of the failedflood storage measure. The provision of information ishelping this farmer understand the water volumesinvolved, likely storage depths, and risks to his cattleand infrastructure.

Further to this, flooding scenarios were applied to thisfarmland using the hydraulic model TUFLOW, andTOPMODEL. These were calibrated with the longer-term hydrological stations' data to bring an applicationto the web based services in EVOp (Figure 5.3). Thevisualisations developed as part of the EVOp web tool(Figure 5.4) have now been used to open dialoguebetween the farmer, the EVOp project staff, and thelocal Council. It is reasonable to believe that the use ofthe full range of live data informing and interpretivetools, brought together under the local EVOp banner,are enabling a process of realising a better informedcommunity (farmers, regulators, responsibleauthorities in flooding such as Aberdeenshire Counciland school children - filtering up to their largely farmerparents). These better-informed stakeholders areprepared now to act differently and more favourablywith regard to catchment based water quantity (andquality) solutions.

5.3.2 DyfiThe Dyfi Catchment team developed an integratedstoryboard to meet the needs of two key audiences,namely the insurance industry and local stakeholders.A semi-structured qualitative discussion was held withparticipants to meet the following storyboardobjectives to:

i. introduce EVO and assess reactions to its goals aswell as present and invite feedback on a draftstoryboard as developed by the Dyfi team.Community feedback resulted in a substantiallyrevised storyboard - available in the online Annex;

ii. critically evaluate the storyboard's Integrated FloodRisk Management Tool;

iii. gather feedback on EVOp data andcommunication needs.

The storyboard approach was to co-develop the EVOpinterface with potential end-users to reflect stakeholderneeds. This co-development approach identifiedseveral unique EVO offerings, including manybeneficial tools that could only be offered with a cloud-based platform:

● Mapping Tool: The Dyfi Storyboard mocked-up amapping tool that was received very well bystakeholders. They were particularly interested inthe geomorphological vulnerability point maps, aswell as the ability to integrate these data with otherlayers from an EVO database, including LiDAR,socio-economic data, and infrastructure.

Figure 5.4 Upstream and downstream visualisations of the flood extent of a 100-year event being embedded intothe EVOp developed tools and used for local engagement to enable catchment based natural flood managementoptions.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

36

● Timeline Tool: Potential timeline tool, using slidertechnology, was recommended by stakeholdersfor EVO's flood management interface, withrecommendations that it should be split into twodifferent sections, namely (i) flood timeline and (ii)analysis tools. This has been illustrated on thestoryboard with a SWITCH button. Stakeholderswant a simple timeline, e.g. 'Past, Now, Currentand Future' to replace original presented optionsthat included Holocene, historical, and future floodpredictions.

● Live Data Tool: A storyboard mock-up of how a livedata tool could operate within EVOp wasdeveloped. It uses Google Earth as the backdrop.This shows how georeferenced SMS text inputscould be streamed to the interface to communicatelive stakeholder flood updates as well asintegrating the EA flood gauge levels.

● Offer Choice: Stakeholder feedback demonstratesthat different users have different preferences tolaunch an EVO enquiry. Some users, particularlynon-technical audiences prefer to search vialocation and want to immediately know whichdatabases are available to them for their specificregion. Other, more technically minded users aremore issue- or tool-driven, e.g. a consultantwanted to immediately know whetherEnvironmental Impact Assessment tools would beavailable to him, or whether the appropriatedatasets were available to conduct the EIA.Therefore the four steps, illustrated in thestoryboard, should receive equal weighting whenlanding on EVO platform: (i) Location; (ii) Tools;(iii) Issues; and (iv) Help. The Help function woulduse an information filtering system to deliver thebest options to match the user's needs.

● Video Tour: A video tour of EVOp should beprominently displayed upon arrival. A three minuteduration was identified by stakeholders as theoptimal video length.

5.3.3 EdenThe basis for the storyboard in the Eden centres on thepotential conflict between increasing food productionand changing upstream land use to mitigate againstdownstream flood risk. The full storyboard can befound in the online Annex. EVOp would be used todemonstrate how different scenarios of future land useand land management would potentially impact on theflood risk posed by a particular stream by dynamicallyrunning hydrological and hydraulic cloud-basedmodels of that catchment and creating differentvisualisations in the form of hydrographs andinundation maps. This information could then becombined with other data sources on housing stockand farm land lost to provide an economic cost-benefitanalysis of the different scenarios. For example, thiscould be in terms of houses inundated, propertydamage costs, loss of farmland, costs of forgoneproduction. The information provided by the EVOpwould enable the community to debate what kind offuture they want to see and show justification for the

funding of flood mitigation measures on farmland,providing farmers with incentives to adopt natural floodmanagement measures. Further information oninterpreting model results, their uncertainties and howthe models are set up would also be provided to assistusers in their understanding of the issues. Theopportunity would also exist for users to providefeedback on their interpretation of the model results.This might raise issues relating to the socialacceptability of certain scenarios or the accuracy of themodel predictions versus local knowledge of how thelandscape works. The final stage in the storyboard isthe ability to save or share model outputs using acloud-based user account.

The situation that was the basis for the storyboard is areal problem of 'muddy flooding' that occurs in the sub-catchment of Morland Beck in the river Eden. Thehypothetical situation used came from a catchmentflooding workshop attended by local farmers andlandowners, in which the EVOp was beingdemonstrated as a device for explaining the linkagesbetween activities on the land and consequent floodrisk in the river downstream.

5.4 Steps taken in implementationThe implementation of the local EVOp online demo hasoccurred in iterative stages over the course of theproject. The Eden storyboard was selected to bedeveloped as the online demonstrator initially, withfunctionality added to the other catchments over thecourse of the project. There were two main strands tothe process, the development of the web interface andvisualisations and the development of the cloudmodelling across different catchments. These twostrands combined at a later stage in the process as themodelling work was integrated into the web interface.

5.5 Web interfaceThe initial development of the web interface focused onthe development of visualisation tools and providingthe user with the ability to explore their localcatchment. The basis for this work was the use ofGoogle Maps API as background mapping due to itsfamiliarity for users, free accessibility, and widespreaddevelopment of supporting applications. A mapinterface was chosen to provide a user interested in aparticular geographical area the opportunity to exploredata availability in that location (Figure 5.5).

The ability to view and visualise 'live' data washighlighted as an interest of local stakeholders whenconsidering the topic of local flooding in the Eden. Thesources of data on river level were taken from the EA’sriver and sea levels data (Figure 5.6). For the pilot, itwas agreed simply to use the image take directly fromtheir website but, with data access agreements, thisinformation could be brought into the EVO cloud asraw data thus greatly expanding the ability of the userto manipulate and visualise it as required. Other livedata was provided by the Eden Demonstration TestCatchment project including live rainfall, webcam andstream water quality information. This data was used todevelop a second visualisation type to assist users with

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

37

interpreting the data they were looking at. Line graphplots of stream temperature and turbidity (an indicatorof stream flow) were combined with webcam imagesover the period, allowing the user to scroll over thegraph and see how the stream change related to whatwas being measured in the stream (Figure 5.7).

5.6 Cloud modellingInitially, hydrological modelling of a sub-catchment ofthe Eden - Blind Beck - was developed using data fromthe CHASM project. TOPMODEL, a very wellestablished hydrological model was used, as it hasbeen implemented as a web service. Model setup wascarried out offline to ensure that the input data andparameters were in the correct format and the modelcould reproduce observed discharge at the outlet ofthe catchment adequately. The model output is ahydrograph of stream discharge for a storm event. Thisoutput can be modified by running the model underthe different scenarios to compare the stream'sresponse to changes to land use and landmanagement.

Once the model was successfully running offline, stepswere taken to 'cloudify' the model and its outputs bylinking a database containing the input data, the webprocessing service, and a web-based model interfacelinked to the map page (Figure 5.8). This control panelenables the user to select scenarios to run, changeparameter values, and run the model in real time.

Figure 5.5 Data on local flooding in the Eden catchment.

Figure 5.6 Live river level data provided by theEnvironment Agency.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

38

Following the success of modelling Blind Beck, threefurther sites (Dacre Beck [Eden], Coull Bridge[Tarland] and Dyfi Bridge [Dyfi]) have been added tothe database to enable the modelling tool to beexploited in all three catchments.

Scenarios of land use and land management changewere developed with the help of the local community inMorland (see Section 5.9). Four scenarios were usedto illustrate how changes to land use and landmanagement practices are likely to impact on floodrisk at the catchment outlet. Uncertainty in the modeloutputs was also included in the control panel throughthe use of sliders for the most sensitive parametervalues in the model. These sliders default to thesettings for each scenario to allow a user to comparehow changes to these values alter the model outputs.

Figure 5.7 Visualising stream sonde and webcam data.

Figure 5.8 Model interface.

In addition to the functionality provided by themodelling control panel, help texts and diagrams havebeen prepared to provide background information andexplanation for the main features of the page (seeSection 5.8).

5.7 Unique parts of the designThe unique parts of the design are:

● Two part interface - map and modelling pages takethe user from a catchment overview to morefocused and complicated modelling processes.

● Viewing live data from different sources - access todata can be a significant barrier for a whole rangeof user groups, the demo utilises data from fivedifferent sources (Environment Agency, Eden DTC,James Hutton, CHASM, BADC) to producevisualisation and run models.

● Combining different types of data together -understanding data outputs for non-scientist userscan be challenging. By providing the means tocombine different data sources together, such aswater quality sondes and webcams, the task ofinterpreting data can be made more user friendly.

● Development of scenarios for land use and floodrisk - the use of scenarios encapsulates some ofthe complexities of model and parameter setupinto descriptions that are more easily understoodby people with no, or limited modelling experience.They can be used to demonstrate the linkagesbetween processes in the catchment and effectson water in the stream.

● Dynamic and elastic cloud modelling across twosites in the Eden catchment - rapid, on-demandmodel runs using a cloud-based model.

● Learning and explanatory materials based on EVOwebsite and as dynamic pop-ups to illustrate howthe model represents the landscape - materialsdeveloped to facilitate the use of the cloud-basedmodelling tool for non-specialists.

5.8 Help systemsThe help systems in place are:

● Help text and a dynamic cartoon relating a stormevent on a hillslope to the discharge seen in thestream are used to illustrate how TOPMODELconceptually represents hillslope runoff processes(Figure 5.9). The user can mouse over the graph insequence and view how the rainwater movesthrough the surface and underground to reach theriver. Text is added to explain some of the differentconcepts incorporated in the model.

● The descriptions of scenarios are provided alongwith the parameter values used for each scenarioand how the parameters represent the changes inprocess. For example, under the increasedwoodland scenario, the m parameter - whichreflects the how quickly water moves through thecatchment - is increased to reflect the greaterwater storage capacity that woodland land userepresents.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

39

Figure 5.9 Dynamic cartoon illustrating TOPMODELand rainfall-runoff processes.

5.9 EvaluationThe evaluation process has been an importantcomponent of the activities carried out by the workpackage. Several opportunities were created to raiseEVO awareness, gather stakeholder feedback andgenerate research questions across the threecatchments including:

● Three Evaluation Workshops, held in Machynlleth(22 attendees) where several potential EVO end-users attended, including: State government(Countryside Council for Wales), representativesfrom local environmental groups, e.g. UNESCOBiosphere, Dyfi Woodlands, Centre for AlternativeTechnology, and the Centre for Public PolicyResearch, as well as local consultant scientists,environmental artists, residents and a local schoolteacher. Participants ranged in age from mid-twenties to retirement age.

● Online media, e.g. a Flickr site, twitter account andproject website, www.dyfivo.org.uk have beendeveloped, the latter receiving over 1,000 uniquevisitors.

● Stakeholder Community events, namely the launchof the Dyfi Catchment and Woodland ResearchPlatform (a collaborative research venture betweenAberystwyth University and Forest Research)which attracted over 100 scientific and policymaking end-users, including the WelshGovernment Minister for Environment, and a "Nighton the Dyfi" community event attended by over 100local people, which was sponsored and hosted bythe Dyfi Virtual Observatory.

● A short questionnaire survey was carried out withthe local residents of Morland village in the Eden togauge the level of interest in the environment, the

types of issues of interest, what questions peoplehad about the environment and, their use of theinternet. In total, 37 questionnaires werecompleted. The survey was not intended to be arepresentative sample for the village but a snapshot of people's views. It was also used to recruitpeople to attend future workshops for thedevelopment and evaluation of the local scaleEVOp demo.

● Development meetings - two meetings in theMorland area during the development phase of thelocal EVOp demo; the first with farmers in June2011 (Figure 5.10) and the second with localresidents and farmers in October 2011.

● Evaluation meetings - four meetings with a rangeof different stakeholders were carried out from Julyto October 2012 in the Eden and Tarland.Feedback on the demo development was gatheredthrough a discussion session and questionnairesurvey. A copy of the questionnaire used isviewable in the online Annex.

5.10 Issues identified through evaluation

5.10.1 Research IssuesListening to stakeholder feedback indicates there is awide range of issues that should be tackled by theEVO. These included:

● biodiversity (including invasive species) andhabitat conservation

● river flooding and sea level rise

● climate change and energy security

● food security and natural resource scarcity,including drought

● waste minimisation

● diffuse pollution

● marine health and fisheries

The storyboard developed for the Dyfi reflects this widerange of identified issues. Unprompted, a participantalso raised the need to have access to cultural data.This led to several participants widely airing the needfor socio-economic data in general to be incorporated

Figure 5.10 Discussing data with farmers in Morland.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

40

into the EVO platform to allow a 'whole systems'approach.

Furthermore, stakeholders stated the need for thenecessary tools to investigate these issues in anintegrated fashion.

As the EVO demonstration portal currently stands,there is no capacity to integrate flood modelling withdiffuse pollution; instead, these modules currently sitside by side. For the EVO platform to genuinely assistthe breakdown of 'silo' management of environmentalissues, the stakeholder feedback should be heeded.

Results from the questionnaire survey show that the'environmental' issues of interest to the local residentsof Morland varied widely. However there were anumber of issues raised that many respondents had incommon: recycling, wildlife or biodiversity, energy,weather, and climate. A number of respondents alsomentioned flooding and water quality, when askeddirectly about these issues, but levels of concernvaried from those who thought it was not a problem forthem or their local area, to people who had beendirectly affected by flooding or felt that there wereproblems with pollutants getting into the local Beck. Anumber of environmental questions were raised by thequestionnaire respondents, relating to both local,national and international issues, these aresummarised in Section 5.10.7. Of those surveyed, 89%had access to the internet either at home, work orthrough the library, indicating that delivery ofenvironmental information using a web based platformwould be able to reach the majority of these people.

5.10.2 Scale of EnquiryIt is clear from stakeholder feedback in Machynlleththat the EVO portal needs to adopt a far more flexibleapproach to allow inter-scalar enquiries. For instance,one participant asked: "What is a river catchment?"

Currently the EVO platform is set up to run enquiries atthe river catchment or basin scale, yet the researchindicates in this study and elsewhere that rivercatchment awareness is typically low among citizenaudiences. Indeed, the question outlined above, onceraised, was echoed across the workshop room. Whilea simple descriptor could be hot-linked to a "What is ariver catchment?" pop up to aid understanding,research indicates that enquiries would be betterfacilitated if they could be run at any scale chosen bythe end-user, e.g. by using a lasso tool to capture theirregion of interest.

Different enquiry scales identified from stakeholdersincluded:

i. neighbourhood, e.g. an individual concerned withtheir local river's potential to flood and use live data

ii. habitat wide, e.g. Cors Fochno by CCW

iii. river catchment scale, e.g. for users such as theUNESCO recognised Dyfi Biosphere Partnership

iv. at a UK wide scale, as communicated by CAT andPIRC, the Public Interest Research Centre based inMachynlleth

v. a scientist stated the need to take an integratedecosystem approach, including the whole Dyficatchment out to the Irish Sea at the Dyfi Platformevent.

To emphasize this, the EVO should not assume thatlocal people are solely interested in local issues.Everyone is a local somewhere, and research indicatesthat potential end-users are interested in all scales,from the local to the global, and the EVO shouldaccommodate this feedback.

Further insights from the workshops in Machynllethinclude:

● Universal agreement with the EVO aim, i.e. to solveenvironmental issues;

● No sign-up procedure wanted. This was identifiedas an unnecessary obstacle. EVO should promptsign-up only when a user needs to save EVOoutput to an account;

● Definitely requires functionality for users toimmediately search tools, data holdings, etc., uponarrival so that end-users can quickly discernwhether EVO meets their user needs;

● Near universal appreciation for jigsaw pieces as acommunication tool to convey the EVO'sintegrative approach;

● Universal agreement that tool options should bemade fully available and not dictated by assumedneeds, i.e. non-technical users can judge forthemselves whether they can or cannot use a tool;

● Mixed feelings were recorded for whether or not ascale should be used to communicate level ofdifficulty for each tool. To avoid offence,alternatives should be investigated;

● “Learn more” tabs should be universally availableand lead to a page that uses both text, visuals andcase studies as well as external links to provideample information to make an informed choice;

● When using a tool, a progress bar should beshown to communicate how many steps areinvolved to reach the end of their particularenquiry;

● Many communicated the need for a 3Dvisualisation tool that would use GIS functionalityto drape data layers over. This need was identifiedby both experts and non-experts; and

● Participants appreciated the use of a familiarmapping interface, e.g. Google Earth.

5.10.3 Development Meetings in the EdenThe first meeting focused on the challenges of farmingand water quality management in the Eden catchment.Interest was expressed by the farmers in the access tolive data being collected as part of the EdenDemonstration Test Catchments project, particularly inrelation to water quality and rainfall data.

A number of farmers also expressed an interest inhosting monitoring of other environmental variablessuch as soil moisture and soil temperature that would

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

41

also assist them with their farming practices. Issuesraised about water-related more to water resourcesand its cost, as flooding was not perceived to be anissue affecting them directly. Technological ways toimprove the efficiency of farming and reduce costswere of interest, such as optimising fertilizer applicationtimes. Measures suggested to alter diffuse pollutionand water runoff should be cost neutral and allow thefarmers to continue to produce food.

The second meeting focused more on the explorationof data, understanding of water processes in thelandscape, and the initial development of modelscenarios. A discussion of flooding in Morland villageusing the EA's flood inundation predictions highlightedthat the local knowledge of the residents could beused to 'ground truth' these model predictions andhelp to improve the way the model is set up to run.

A lot of interest was expressed in the ability to look at'live' data from the local stream, with a number ofquestions raised as a result (see Section 5.10.8). Inaddition, it was suggested that measurements of soilmoisture and rainfall could be helpful and used toprovide flood predictions. Interest in alternatives to aweb based platform such as the use of Apps wassuggested for when people are away from theircomputers. Among the attendees, the understandingof rainfall and runoff processes and when to expectflooding in the local area was relatively good.Discussion of scenarios for model developmentcentred around the need for 'realistic' representation ofcurrent farming practices, ensuring that runoffmanagement measures were proportionate to the areaand that low intensity farming or woodland conversionwere unpopular due to the adverse economic andsocial impacts on the rural economy. Increasingwoodland in certain areas was a more acceptablealternative. Some scepticism was expressed by someattendees of the usefulness of the availability of thedata for them personally:

● What was the use of water quality information forresidents?

● If I want to know what the weather is doing I justlook out of the window."

Further environmental questions raised during thesemeetings are included in Section 5.10.8.

5.10.4 Evaluation MeetingsThe results from four evaluation sessions of the localEVOp demo are reported. A wide range of differentstakeholders attended the sessions includingscientists, representatives from government agenciesand charities, land managers, local residents andfarmers. After being demonstrated or using the localEVOp demo, attendees took part in a discussion andwere asked to fill in a brief questionnaire to gain theirfeedback. The feedback from all sessions is reportedcollectively, firstly in terms of the more quantitativequestionnaire responses, followed by the main themesand specific issues emerging from the free textresponses and broader discussion.

5.10.5 Quantitative FeedbackFigure 5.11 summarises responses to the mainquestions on the likely usage, ease of use andappearance of the local EVOp demo.

The results from the questionnaire data suggest thatthe respondents have a mixed perception of the localEVOp demo. Although the total number of responsesdoes not represent a large sample size, there are agreater number of positive than negative responses toquestions about interest in using the demo and itsappearance. The large number of neutral responsesperhaps reflects that the demonstrator, as shown, doesnot currently meet the needs of those users, which asa pilot might be expected. The potential use of thelocal EVOp demo was seen more for work as opposedto home purposes, indicating how stakeholderscurrently view its likely utility.

5.10.6 Qualitative FeedbackA number of very interesting and helpful discussionswere held during the evaluation sessions, yielding a

Figure 5.11 Main quantitative questionnaire responsesfor the local EVOp demo.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

42

range of information from participants on specificissues relating to the functionality of the currentdemonstrator and what further developments theywould like to see made to enhance what is beingprovided.

A clear message from the evaluation with stakeholderswas that the concept of the cloud and its use was notimportant to them; there was an expectation that thisinformation should be available via the internet. Thereal interest was in how it could help their particularproblem or the way information was presented.

A general comment on the local flooding tool was thatpeople were confused on the detail and who wasbeing targeted to use it. In general, stakeholderswanted to know the purpose, what models were tellingus, and what it means for day to day managementdecisions. There was need for a much greater contextto what was being presented. It was suggested theEVOp local scale tool could be developed into a farmerengagement tool. Further development of the localtools would probably need to focus on the needs of afew regular end-users, such as Catchment SensitiveFarming Officers and the River Trusts, with theprospect for widening this audience over time.

More specific functionality issues raised during theevaluation events related closely to the general themeof context, including the need for more assistance withusing the demo in terms of help functions, tutorials andvideo guides to explore specific features. This wasparticularly the case with the more technical aspects ofthe demo, such as the modelling pages. One groupalso felt there was a need to differentiate what waspresented in terms of users - hiding parametermanipulation within the modelling page as an optionaladvanced step. There was positive feedback on theuse of the mapping tool and the way it presented data.However, further advances could be made by linkingmodelling to the specific catchments where the data islocated. Design issues that were highlighted relate tothe cluttered appearance of the demo interface, withredundant or dead links, overlapping logos, and poorplacement of features which require the user to scrollup and down to view information. Integration andvisualisation of different data types, such as theexample of combining stream sonde and webcamimages was seen as very positive and a powerful toolfor promoting understanding of stream functioning.

A whole range of suggestions were made for broaderenhancements of the local EVOp demo. In particular, itwas felt the tools needed a clearer focus andapplicability to land management scale decisionmaking, be that a farm scale tool or clearerimplications of the impacts of decisions made at thefarm level in terms of economic cost or practicalchanges needed e.g. what is the cost of the topsoil lostper year? How much woodland and where?

The availability of tools specific to land managementactions were also suggested, using automated datacollection such as sensors for soil moisture to inform afarming activity such as when traversing the land invehicles would not cause damage.

The concept of bringing together different data sourceswas widely welcomed, although extension of this wasneeded to provide for the interdisciplinary data needsof different users and linkages between existingprojects to provide evidence for the modelled changesin land management practice. For flooding, examplesof case studies suggested are the National TrustHolnicote flood management project, and the EdenRivers Trust's ALFA project. The ability to utilisehistorical data to understand the context of currentpredictions was also suggested as a way to extend thecurrent range of tools. It was indicated that a betterunderstanding of the underlying risks posed by theenvironment is required in farm business planning. Anexample was given of being able to demonstrate anincrease in frequency of high-intensity rainfall events,relative to the last 50 years. The risks associated withthis were not only the 'end of pipe' problems offlooding or diffuse pollution risk but also those inmedium- to long-term farm planning, such as decisionson crop grown and infrastructure investment, whichcan ultimately affect environmental outcomes. There isclearly an enthusiasm for the potential offered by thelocal EVOp demo; however this requires furtherrefinement to translate it into a true decision supporttool.

5.10.7 Questions raised by residentsQuestions raised by residents include:

● What species were in this area historically?

● Why can't the rivers be dug out like they used tobe?

● Why are there restrictions on managing the river?

● Why is there more flooding now?

● Can I get home if the river floods?

● What was the local drainage project in the villagefor?

● What is being done in the local area?

● What are people in this area concerned about?

● What lives in the river?

● What animals live around here?

● What plant/ animals are in trouble?

● Why aren't the rivers cleared?

● What wildlife is in the Beck?

● Why are there algae in the stream?

● Is there an economic case for wind farms?

● Is our waste really getting recycled? Is landfill asin?

● Why are they cutting the trees down next to theBeck?

● Why have the starlings been disappearing?

● Will I get flooded?

● Does recycling go to landfill?

● What's the point of recycling?

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

43

● Does the salt from the roads affect the fish in therivers - is it killing them?

● Why are onshore windfarms used?

● Environmental projects - what is going on in mylocal area?

5.10.8 Questions raised during workshops withresidents and farmers

Questions raised by residents and farmers include:

● How high are the nutrient values in Morland Beck?

● How do they compare to other locations?

● Are what we are seeing in the stream legacyeffects from farming practices years ago?

● How much N, P, K are we losing?

● What are the monetary values of those losses?

● Can we do something about the manure beingwashed off the land?

● Do these results depend on what season it is?

● How long does it take for the water to get into thestream after a storm?

● Are there historical water quality data available forany sites in the local area?

● How has the water quality changed since 2000?

● Are we actually in a much improved environmentcompared to years ago?

● Do we actually need to improve water quality?

● What are the options for controlling the flow ofwater off farmland?

● What affect does vegetation on the river banks andbed have on the flow of water?

5.10.9 Potential for the futureOverall, there was universal agreement that EVOp hasthe potential to provide a tool that holds botheducational and scientific value. Results clearlydemonstrate demand for the EVO at a local scale. Theworkshop attendees in particular identified the hugebenefits from using a 'cloud' platform, e.g. accessingdata that is not normally available to them, as well asusing tools that they do not usually have on their owndesktops. Above all, stakeholders saw EVOp as aneffective means to taking a whole systems approach tosolving environmental issues and would like to see itmade available as soon as feasibly possible.

There is a great deal of potential to further develop thelocal EVOp demo and incorporate some of theadditional features illustrated by the storyboard. Inparticular, to make the outputs from modelling moretangible and potentially useful, the results from thehydrological model need to be fed into a hydraulicmodel set up for a particular community. This wouldenable the effects of different land use and landmanagement scenarios to be tested in terms ofpredicted inundation maps. By linking these maps tosocio-economic data, assessments of the costs andbenefits of options could be made, providing thecommunity with information with which to make

decisions. It is essential that additional functionalitywithin EVOp is matched by the careful development ofsupporting material and help features to empower andeducate users in how to carry out analysis in aconsidered way.

At a simpler level, the addition of more data andsensors from within the study catchments or acrossmore locations would expand the geographical rangeof EVOp, while expansion into other issues such asdiffuse pollution would provide more information onspecific localities, highlighted by the local communityworkshops.

In terms of tools, the ability to import and manipulatedata, rather than stream an image, would allow moreoptions for how the user can view data at sites andcompare data between sites. Creating a greater senseof ownership of EVOp by the wider community isimportant for its continuation and future success; thismay partly be achieved by the development of crowdsource tools to enable a wide range of people tocontribute to tackling science problems.

5.10.10Specific opportunity to link sensors and modelsEnvironmental models are essential for understandingprocesses and creating predictions in the naturalenvironment. However, models usually require a widerange of data in order to generate results. Recenttechnological advances which allow the potential tocapture, transfer and visualise field measurements, aregrowing rapidly, but, there are many fundamentalhydrological and environmental issues to resolve; e.g.scale issues, resolution, data formats and processfunction.

Models are often built, calibrated and validated withsynthetic, incomplete or simplified datasets. Theestablishment of environmental and virtualobservatories is helping to inform us of the past,present and possible future state of watersheds andwill supply the tools to quantify and assess the impactsof past and future management through data sensingand visualisation.

There are many different types of environmental datawhich can be used in models. For example, these datacan be quantitative or qualitative, they can be onepoint measurement or a gridded map; instantaneousor time series based; and data can be a realmeasurement or a spatially interpreted synthetic map.

A vast array of historic environmental data exists in arange of different formats. Traditionally, the mostcommon way to collect data was to physically go outand measure it or have some sort of remote recordingdevice which needs human intervention to download it.In the last century, telemetering environmental sensorsstarted to evolve. For example, in the 1920s the firstenvironmental telemetered data was collected fromweather balloons for weather forecasting. The conceptof remotely connecting to a sensor and acquiring realtime data was starting to grow. However, early forms oftelemetry used telephone lines and basic radiocommunication. With the development of GSM mobilenetworks, it is now possible to telemeter more

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

44

Figure 5.13 Schematic of the EVOp local work package exemplar.

environmental sensors. And, the speed of thesenetworks is also growing, GSM has been supersededby GPRS and GPRS has been superseded by 3G. Thisis not a trend that will slow down. 4G technology isstarting to be rolled out further and our mobile networksignal footprint is growing fast globally. Where wecannot access a mobile network, new communicationmethods and new wireless protocols can be used.

Being able to now access this superfast mobilenetwork allows us to connect to more sensors,download large files faster and access sensors in realtime. With the development of newer fastercommunication methods it will soon be possible tocollect larger files at faster speeds from further away.

It is not just communications which are advancing, sotoo are sensors. For example, over the past decade,the technology behind water measurements hasbecome more robust and the data is more accurate. Itis now possible to gather real time water quality datafrom local catchments and advise on the status of thechannel. However, the advancement of this technologydoes not necessarily lead to fully automated systems.Sensors will still always need to be calibrated byhumans.

New sophisticated sensor networks, which gathermore data and transmit it more quickly means that thevolume of data collected will be greater. Thus, theamount of data collected is growing and it needs to bearchived in meaningful ways. This was highlighted byGoogle CEO Eric Schmidt when he stated in a speechin 2010 that every two days the amount of informationgenerated is equal to everything that was gahteredfrom the dawn of time until 2003.

It is information that is needed rather than simply data.Therefore the role of the EVO is to inform on what datawe have, what it can be used for and how, when linkedto EVO tools, it can be used for science or to informdecision making.

There are many applications where environmentalsensors are currently linked to forecasting models. Forexample weather radars are linked to weatherforecasting models, and gauging station and raingauge networks are linked to flood forecasting models.Sensors can be used in real time to help reduce theuncertainty of short term predictions. Obtaining reliabledata is vital, but reliable data is expensive to collect.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

45

For example, not every river in the UK is monitored forflow or level. Therefore some settlements close tounmonitored rivers are not connected to a floodwarning system. This is linked to the cost of monitoringthese rivers and to the accuracy of modelapproximating to the local area.

Flow and level measurements are vital forunderstanding flood characteristics of a river, but alsothe severity of a flood. However, it is just as importantto convey the accuracy of the data and the implicationof uncertainty on the results. Equally as important arequalitative forms of data from floods. The concept of'crowd sourcing' using mobile devices is starting tochange the way environmental data is collected.Whether it is collecting georeferenced photos of aflood on a mobile device or recording the location ofpollution breaches on a blog, this new data sourcefrom mobile technology gives a new level of data thatcan be used. For example, pictures of flood extentscan help to calibrate a hydraulic model and thereforereduce uncertainties associated with predictions.

Figure 5.13 highlights a schematic of how the localEVO exemplar functions by bringing together cloudbased databases and models along with catchmentknowledge (Figure 5.13 - layer 1). Layer 1 exploitsexisting cloud based resources, tools and models. Inthe EVO pilot it was shown, using Google Maps, howlocal communities can gain access to liveenvironmental sensors in their catchment (Figure 5.13 -layer 2). Data from these catchments has been used in

cloud modelling allowing stakeholders to betterunderstand future scenarios in their catchment (Figure5.13 - layer 3). The bulk of the effort is driven by theneed to integrate datasets within a commonreferencing system in forms that make the use of thedata clear to the end-user. Computing on the cloudalso creates its own range of technological challenges.However, each step forward seems to open up a vastarray of potential to link spatial and temporal mapswith novel sensor and qualitative data.

The technology behind environmental sensors is stilladvancing. More sophisticated sensors are beingdeveloped that will allow measurement of newenvironmental parameters (e.g. Phosphatemeasurement directly being measured in the fieldmore quickly and accurately). The cost of producingsensors is rapidly decreasing as new technogiesadvance (e.g. a raingauge disdrometer can now beproduced for ¤10) and new low cost telemetry optionsare becoming available (e.g. 4G network and zigbeewifi communication). With these new developments itwill be possible to create low cost dense networks ofsensors that produce more accurate environmentaldata. With this, a new array of database technologieswill arise allowing the user to directly connect to theright data source more quickly. Connecting these newdata sources directly to models using cloudtechnology will reduce uncertainty aroundenvironmental predictions.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

46

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

47

The EU Water Framework Directive has enacted a legislative imperative to ensure all European waters bodiesreach 'good ecological status' by 2015. Nationally this requires improvements to modelling hydrological quantityand quality in all water bodies and water resources in a more integrated manner and at the same time assess theuncertainties in these predictions. Assessment of uncertainty will be especially important when conductingscenario testing for quantifying likely environmental change (Cloke et al. 2012). Examples of current models thatcan be run at national scales include G2G (e.g. Bell et al., 2007a and 2007b), which is being used as the basis fornational flood forecasting (e.g. Price et al., 2012) and mass balance mixing models of flow and water qualitybased on monitoring evidence such as SimCAT (Crabtree et al., 2009). However what these models do not allowfor is understanding and testing where competing conceptual structures of processes best-predict streamdischarge and quality and if this can be related to catchment characteristics.The EVO project produced national exemplars for both hydrology and biogeochemistry to explore the ways inwhich cloud-computing could support the development of an integrated modelling framework to deliver ensemblepredictions of hydrological and biogeochemical behaviour in catchments for the whole of the UK, and estimate theuncertainties associated with these predictions.

6.1 Hydrological National ExemplarThe national hydrological modelling conducted for theEVOp project is the first of its kind to test competingmodels of rainfall-runoff structures nationally, and in acomprehensive uncertainty analysis procedure toquantify predictive uncertainties. With regard to themodelling methodology, the model structuralexploratory scheme (FUSE) developed by Clark (e.g.Clark et al., 2008) was used. FUSE is to date the mostcomprehensive tool available to researchers in thehydrological sciences for the exploration andassessment of both model structural and parameteruncertainties in the manner attempted here.

With regard to the geographical scope of the work,FUSE has been applied at catchment scale with almostcomplete national coverage; that is to say, nearly all ofthe regularly monitored catchments brought within theNRFA stream gauging network have been included.This extends the modelling assessment far beyondwhat has been attempted in previous studies for theU.K (e.g. Bell et al., 2007a and b; Arnell, 2011) andreflect the first national scale benchmarking ofpredictive capability. Taking these points into account,the research is therefore able to demonstrate resultsnot only for individual catchments but also the patternsof model structural performance and parameteruncertainty that emerge across the whole landmass ofmainland Great Britain, with the exception of thosenear-coastal catchments and adjacent areas for whichno stream data were made available.

Key questions of interest are:

● What happens when we join up data nationally?

● What happens when we join up modellingnationally?

● What can this deliver in terms of nationalcapability?

The modelling has therefore been aimed at achieving anational picture of the ability to predict streamflow, withthe broadest feasible range - so far as this is possiblewithin the National River Flow Archive (NRFA) gaugingnetwork of catchment sizes, locations andcharacteristics. The catchments exhibit differences inflow control and management, such that some arealmost wholly natural, whereas others may be highlymodified for regulation (for example, by flow controlsduring floods, or by abstractions for irrigation). In theUK, many catchments will lie somewhere betweenthese two extremes; they are neither wholly natural norwholly managed. The list drawn from NRFA recordscomprises 1,454 stream gauge stations in the UK(Figure 6.1), of which 1,403 are in Great Britain andanother 51 in Northern Ireland. Catchment sizes rangefrom ~1 km2 to 104 km2, and some catchments arelocated on offshore islands (the Hebrides, Orkneysand Shetlands).

By including such a wide range of geologies, there isalso clear inclusion of the influence of differentgroundwater and base flow conditions. Thus, manycatchments in areas overlying the chalk and otherpotentially aquiferous rocks will exhibit high base flow

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

48

indices (BFI) which define the potential contributions ofgroundwater; similarly, some catchments in theselocations may also exhibit major losses togroundwater, such that runoff is much lower thanwould otherwise be expected from consideration aloneof mean annual rainfall and evaporation from surfacetopographic features.

In addition to achieving the national coverage outlinedabove, the modelling seeks to achieve the finest spatialand temporal resolutions - with respect to inputs andoutputs - commensurate with the time and resourcesavailable. A daily time step has thus been applied in allthe calibration work, such that rainfall and evaporationinputs, the stream flow data and the modelhydrographic outputs, are all expressed in dailyintervals. With respect to the spatial resolution of theinput data, rainfall and evaporation are sourced from

Figure 6.1: location of the 1454 stream gauging stations in the UK for which NRFA stream flow data wereprovided. The geographical boundary of Northern Ireland is extended south beyond the political border so as toinclude all of the contributing areas of the cross-border catchments.

daily values at 1 km2 resolution and then aggregatedfor each catchment according to the area cut.

Two further aims in the modelling are related to dealingwith calibration and prediction uncertainty. Inparticular, the research seeks to demonstrate theinfluence and effect of both by specifically includingmultiple model structures in the calibrations. In thisway, not only are the best (and worst) performingmodel calibrations derived for each catchmentaccording to particular metrics or combinations thereofand through the application of dense sampling in theparameter space, but these calibrations are achievedusing a selection of different hydrological modelstructures. In this way, the best performing structurescan also be identified for each catchment and metric,or metric combination. The research thereforehighlights how model structural differences affect

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

49

model calibration; it also shows how model structuralperformance varies across space and by catchment.

6.1.1 Chosen modelling methodology: conceptionFUSE has been developed as a tool to aididentification of the most appropriate model structureto use for a particular problem (Clark et al., 2008;Kavetski and Clark, 2010). The scheme as presentlycoded applies to a set of lumped catchment models.

An important research question in hydrology is to beable to understand what level of complexity is neededin a conceptual model of rainfall runoff processes toprovide useful predictions, given imperfect data andknowledge of catchment behaviour. There is no doubtthis will vary depending on the questions being askedof a particular modelling application, and so differenttypes and complexities of models may be needed fordifferent purposes. The work focused on identifying thebroader patterns of model prediction capability, andhence benchmarking national capability, whilstrecognising that model structures are not perfect andwhere the catchments themselves may be highlymodified and lack critical details that might inform whymodel deficiencies do or do not occur. Hence, themethod embraces the potential for model structuralerror; and this in turn is likely to vary from place toplace (Beven, 2000, 2002, 2007). In this respect, apoor calibration may be caused by major incongruitiesbetween the physical properties of the system - thecatchment being modelled - and the model beingapplied to simulate it. There also may be situationswhere a calibration appears to be good even thoughthe underlying structure is a poor representation of thereality. This may provide "the right answer for thewrong reasons". In addition, there may be many waysin which an acceptable answer may be obtained asdefined by good predictive capability, so that themodel demonstrates a significant degree of modelequifinality (Beven, 2006; Beven and Freer, 2001). Theaim of the method, within a comprehensive uncertaintyanalysis procedure is to aid diagnosis of thesepossible modelling outcomes, and thereby contributeto understanding of the most appropriate structure orstructures to use for a particular application. Moreover,a preliminary study using FUSE to evaluate modelstructures in an uncertainty analysis framework revealsthat individual model performance is different indifferent regions and no single structure is likely to be'best' across all catchments (Coxon, 2011). The needto assess for multiple model structures is thus clearlywarranted from both theoretical and evidentialstandpoints.

6.1.2 Chosen modelling methodologyThe FUSE scheme uses four primary source modelstructures, all of which have been applied widely andare well respected, these include TOPMODEL, VIC,PRMS and SAC (Clark et al., 2008) (see Figure 6.2).These are all broadly similar in that they incorporatestate variables for soil water - in one or more stores -with fluxes which allow the movement of water throughthe system according to particular process laws. The

equations of state and the parameters which are usedin them govern the flux rates at any time-step in eachmodel. Solution of the fluxes and updating of statevariable quantities is carried out using an implicit Eulerscheme, which is considered and demonstrated to bemore conceptually and mathematically correct thanother schemes typically employed in hydrologicalmodelling (Kavetski and Clark, 2010).

An important feature of FUSE is that the sophisticationof the code allows new model structures, here called'variants', to be formed out of any component of any ofthe four source models. Thus one variant maycomprise components of PRMS and TOPMODEL only,whereas another may use components of TOPMODEL,VIC and SAC, and so on. The mixing of the models'features in this way is not entirely without limit andthere are conceptually feasible variants which are notpractical to include due to their complexity and theawkward structure of the code needed to run them.Nevertheless, FUSE permits over 1,000 variants to betested and explored, each with a parameter andequation set controlling fluxes and states in themanner described. One of the most importantconsiderations here is to be able to simulate more than1,100 catchments for a range of model types andwithin an uncertainty analysis procedure that allowsunderstanding of the limits of the predictive capability,and to express the uncertainties in the predictions.

6.1.3 Acquisition and preparation of dataIn order to conduct the calibrations, the main inputs ofrainfall and evaporation need to be prepared for eachcatchment and calibration time period of interest.Likewise the stream flow data from the NRFA recordsneed preparation, and in particular to be examined forgaps or errors in any record which might make itunsuitable for use. A general rule applied initially in thework conducted - the "80-10 rule" - was that in anycalibration period, the stream flow record should be atleast 80% complete, and the length of any single,contiguous gap in the record no longer than 10% ofthe total calibration period, excluding the initial warmup. The application of this rule meant that ~150 of thestream flow records were unusable.

Rainfall and evaporation input data were provided inmm per day. With respect to the rainfall data provided,this is the daily 1k m2 resolution record prepared bythe EA and the Met Office. The methodology for thepreparation of which is reported in Keller et al., 2006.To use in FUSE, the daily rainfall for each squarekilometre cell is first compiled into a time series for thatcell, covering the period from 1st January, 1961 to 31stDecember, 2008, which is the entire data record periodprovided. The total for each day for a particularcatchment is then summed by selecting the sourcecells (or part thereof where appropriate) relating to thecatchment cut area and aggregating all of each day'stotal for those cells, and then repeating this for the nextday, and then the next, and so on, thus forming anequivalent daily time series for the catchment as awhole.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

50

Figure 6.2 Simplified model structure diagrams, adapted from Clark et al., 2008, of each of the four source modelsused in FUSE.

A similar method is applied to the evaporation product,sourced from the UK MORECS records (Hough andJones, 1997). However, the MORECS data are firstprovided at 40 km resolution (thus 1600 km2 per cell),so the daily time series are first divided into singlesquare kilometres, and these are then aggregated foreach cut catchment area in the same way as for therainfall. It should be noted that throughout the potentialevaporation product ("PET") was used from theMORECS data rather than actual evaporation product.The reason being that within each model structure inthe FUSE scheme a calculation is made for actualevaporation losses.

Regarding the cut catchments, it was not possible toobtain a product to use (based on the NRFAcatchment outlines dataset), so the areas and outlineswere calculated from a 50m resolution DEM of GreatBritain, resampled from the 5m NextMap data availablefrom NEODC data sources; a similar DEM wasobtained latterly for use to cut out the catchments inNorthern Ireland. Of 1,403 catchments aimed to be cutin Great Britain, over 1,250 were cut so as to produceareas within +/- 5% of those on the NRFA database,and a further 100 or so to within +/- 10%. Generally,the areas cut and the shapes obtained were judged asacceptable, and the overall extent of catchmentcoverage - even after removal of those not usable inthe first calibrations because of gaps in the streamflowrecord - was considered more than adequate to satisfythe project’s aim of achieving national hydrologicalmodelling (Figure 6.3).

Regarding the streamflow data, these were all providedin units of mean flow discharge per day, expressed incubic metres per second. These data were convertedto equivalent mm per day, using the areas cut for eachcatchment, before being used in the FUSEcalibrations.

6.1.4 Sampling, model evaluation and modelperformance metrics

The approach to each set of model simulations followsthe same path, namely to begin with a review of thestreamflow data to establish the most suitablecalibration period. A further restriction made on thecalibrations conducted so far is that the usablecalibration period should be 10 years long, plus a oneyear start up period, and should end on or after 31stDecember, 1998. Thus, the most recent possiblecalibration period would cover from 1st January, 1999,to 31st December, 2008, with a warm up period from1st January, 1998; similarly, the oldest acceptableperiod, based on the same selection method, wouldcover from 1st January 1989, running to 31stDecember 1998, with 1997 as the warm up year. Theselatter requirements were applied so as to ensure theinitial set of calibrations conducted extend overcomparatively recent periods which may still beconsidered broadly equivalent with the present interms of climate, land use, hydrographical responseand catchment flow management

Once the streamflow period had been chosen, therainfall and evaporation data for the same period arealso selected from the catchment aggregated datasets

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

51

in order to begin running the FUSE calibration. Furtherchoices required are which model structural variants toapply in the calibration, and how many points tosample in the model parameter space.

For every catchment's results reported, the decisionwas made after some preliminary testing andexploration to conduct the first set of calibrations using2,000 sample points per catchment per model. Thetotal number of sample points per catchment is thensimply a multiple of the number for each modelstructure. Also, the sampling scheme is a space-fillingSOBOL scheme, similar to a Latin-Hypercube method,and incorporated in FUSE's general program structure;the use of SOBOL is explained in more detail by Clarket al., 2008.

With respect to the model structures of interest, a coreaim within EVOp was to demonstrate the effects ofevaluating multiple model structural simulations anduncertainty. This has never been demonstrated beforein the UK and is only really possible with morepowerful cloud computing type resources.

(a) Catchment areas cut (b) Aggregated area of catchments

Figure 6.3 (a) 1,103 catchment areas cut in mainland Great Britain, based on the 50m DEM, all to within +/-5% ofthe NRFA listed areas. Although all 1,403 catchments were cut, only those within the +/-5% band have been used,and the potential number is reduced after removing those with incomplete streamflow records. In (a), the smallercatchments are overlaid on the larger so as to preserve detail of the smaller areas cut. In (b), the aggregated areaof the catchments cut in (a) is shown, demonstrating that the coverage is national, although there are some areasmissing in the lowlands and near the coastal margins. The catchment areas in Northern Ireland are not shown.

During the model evaluation of each catchment andeach of the four model structures, the predicteddischarge was compared with the observed. A set ofcalibration performance metrics were then calculatedand were related to each sampled point in the multi-dimensional parameter space.

With respect to calibration there is a wide range ofpossible metrics that might be used to quantify thecalibration adequacy, and these metrics may in turn besplit up in different ways, for example by season(Figure 6.7), to provide more discrimination betweenone performance metric and another, or onecatchment and another. The only metrics commentedon within this report are the Nash Sutcliffe index ("NS"),the sum of absolute errors ("SAE"), and the Nash-Sutcliffe index of the logarithm of flows ("NSlog"). Themethod of calculation of the NS and NSlog indices isshown in the online Annex.

These particular metrics have been chosen becausethey can be used to indicate how well a model hasbeen calibrated, not only to the overall flow record, but

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

52

also with an emphasis on certain flow magnitudes.Thus, the NS index is often found to be particularlygood at calibrating for the higher flood flow peaks,whereas the NSlog score is better for evaluating modelstructures that perform well for low flow periods; theSAE score appears to be a useful measure forassessing general model simulations for the wholeperiod.

With respect to the running of the FUSE scheme, thishas been conducted on a high performancecomputing (HPC) cluster, which is, for the purposesreported here, a closed cloud computing resource.Jobs submitted permit many different catchments tobe run at once. In the first main calibration exercisereported here, the run comprised some 8.8 millionsimulations, and took eight working days to completeon the HPC cluster.

The abstracted output requires further analysis andprocessing before the data can be presented asusable results for inclusion in scientific literature.Dealing with the SAE metric, the top 5% of results foreach model and catchment are considered usable.Although this appears on the face of it to be anarbitrary cut off, it can be considered in an equivalentway to the standard of accepting as significant aprobability of 0.05 (i.e. 5%) or lower in a statistical testsuch as a Student's 't' test. In the same way, the best5% of NS and NSlog scores are also treated as usable,

with the proviso that all such scores must be greaterthan zero. A feature of NS and NSlog is that themaximum possible score is 1, denoting perfectagreement between the modelled and observedoutput. Similarly, a score of zero denotes a calibrationresult that is no better than using the mean flow (ormean of the log flow) for the whole series, which wouldbe of no value in this work. It follows that only scoresabove zero are used, and these in turn must be withinthe top 5% of the results (Figure 6.4).

6.1.5 Initial results: examples of spatial presentationand analysis

The procedure of catchment selection and calibrationwas conducted for all of the catchments satisfying therequirements of the most recent 10 year periodselection and after application of the 80-10 rule. Theabstracted results were analysed, and can then beplotted to begin to benchmark the national picture ofpredictive capability for all catchments analysed.

Figure 6.4 provides a useful overview of the sorts ofresults the national hydrological modelling hasgenerated. The results plotted here are from 1,103catchments, this being the number that satisfied thefirst calibration period, and the 80-10 rule.

Figure 6.5 shows the best NS score for eachcatchment and model structure from the Monte Carlosimulations, as explained above. The results clearly

Figure 4, (a) and (b) Example of FUSE output, the calibration data here being the Nash-Sutcliffe index ("NS")achieved by each of the four main source models - TOPMODEL, VIC, PRMS and SAC - for the catchment of theRiver Thames at Kingston, stream gauge no. 39001. To aid comparison between the models, the sample pointsare plotted cumulatively on the x-axis, 2000 points sampled per model structure. See figure text by (a) and (b) fordetail.

(a) all NS output data across the four models. Most ofthese calibrations must be discarded because any NSscore <0 represents a calibration worse than simplyusing the mean flow.

(b) the same data as in (a) but only showing theacceptable NS score (>0). The calibrations actuallyused for EVOp purposes are then a subset of these.Note also how performance differences betweenmodels are evident.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

53

indicate differences between the best modelsimulations achieved with each model. It is strikinghow in closer analysis, the poor performance of someof the models can be related almost immediately togeology or land use characteristics. For example, bothTOPMODEL and PRMS do poorly on areas dominatedby chalk. More detailed analysis also shows that inmany areas a NS score of 0.8 or higher has beencalculated. Given that the inputs of rainfall andevaporation have been lumped, sometimes over quitelarge areas (up to 104 km2 in the case of the Thamesat Kingston), and the inputs and outputs are alsoaggregated to daily periods, such high calibrationsscores are considered a very promising start; it will beof great interest to see to what extent these results canbe improved upon with further work, for example bytrialling other model variants.

The NS score is particularly good for calibrating for thehigher flows, so it is of interest to compare resultswhere the calibration metric better-matches the lowerflows, using the NSlog score. Rather than showing thisfor each model individually, the results can also beviewed across the whole model ensemble, showingthe best result for each catchment regardless of themodel structure used (Figure 6.6).

Likewise, inter-seasonal comparisons are also madepossible on the same basis, using the ensemble bestresult for reach catchment (Figure 6.7).

6.1.6 Single objective versus multiple objectivecalibration performance criteria

In the example results, only one metric at a time hasbeen considered. However, a broader calibrationparameter set for each catchment may be provided byconsidering multiple performance criteria i.e. usingmetrics in combination to calculate a more generallyapplicable set of calibrations across the differentmetrics or models. For example, it is evident from thecontrast between high and low flow performance, inFigure 6.6, and the seasonal differences, in Figure 6.7,

Figure 6.5 Best NS scores for each catchment and model structure; results shown for 1,103 catchments, 2,000sample points per model structure. A score of 1 is a perfect simulation, scores below 0.6 would not be normallyclassified as good simulations of high flood flow behaviour.

that trying to obtain an overall best fit for all seasonsand flow conditions will require a degree ofcompromise. Similarly, when including additionalmetrics (for example NS, NSlog and SAE incombination) the best parameter values generating thebest results for each metric individually may not bethose that generate an equivalent best value for theother two metrics.

6.1.7 Other considerations and calibration issuesOne aspect of the work conducted to date is that noallowance has been made for catchments departingfundamentally from the conceptual structures shown inthe FUSE scheme. In particular, although the four mainmodel structures used in FUSE are immediatelyapplicable to a wide range of soils, geologies andclimates, there are difficulties when these are appliedto catchments where the flows are strongly affected bygroundwater, in particular high base flow indices,losses to groundwater or from abstractions to extra-catchment areas. Abstractions and irrigation are likelyto affect overall water balance. If severe this is difficult ifnot impossible to compensate for by parameter valueadjustments alone. Similarly, a catchment may gainwater from sources beyond its topographic watershed.For example for industrial discharges or domesticoutflows sourced from reservoirs well outside thecatchment area. In future work, it would be of greatinterest to see whether model calibration performancecan be improved in the catchments most subject tothese factors, for example by trying to take intoaccount recorded abstraction and discharge datawhere these are available. This of course would begreatly simplified in a full EVOp where such data andmodels were more directly coupled in a moresophisticated framework than can be achieved in thisdemo pilot.

Another aspect to consider is the reliability of the flowgauge data itself, irrespective of whether there aregroundwater or abstraction factors to account for. Themodelling here demonstrates clearly that for many

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

54

catchments, there is no uniquely 'best' model to use fornational hydrological modelling; rather the modelstructure most appropriate for one catchment appearsto differ from that most appropriate for another.However, the veracity of this finding may need to betempered by assessing the value of the calibrationdatasets. For many of the stream gauges, the data arelikely to be at their most reliable when the flows arewithin bank. However, once the stream or river is inflood, the discharge becomes more speculative, andduring large flood events, where the water is well overbankfull, the stream discharges may be seriously inerror (either over- or under-estimated). Work is in handto undertake a review of stage-discharge relationshipsat various gauges in the UK, to see how considerationof the uncertainties in the stream gauging may affectoverall calibration and model structural uncertainties.This work is beyond the scope of the EVOp, butpresents an important and potentially valuableopportunity for further scientific research, and one thatwould also benefit the usefulness of the EVOp.

6.1.8 Northern IrelandAlthough stream flow data were provided for 51catchments in NI, there is no rainfall or evaporationproduct, equivalent to those for GB that can be usedas inputs for the FUSE modelling. This point aside, thecatchment cuts have been prepared for NorthernIreland and these can assist with the biogeochemistry

Figure 6: best NS scores contrasted with best NSlog scores, for the entire combined multi-model ensemble andeach catchment. These metrics indicate extra detail in the calibration performance, NS being most indicative of agood match of the modelled output with high flows and NSlog of a good match of the model with low flows.

national exemplar. Also, if suitable rainfall andevaporation products become available, calibrationsusing FUSE could be conducted quickly to augmentthe national hydrological modelling already completedfor Great Britain.

6.1.9 Unique science and demonstrations of thehydrological national exemplar

● The first ever full exploration of the nationalhydrological modelling capability for greater than1,100 catchments.

● The first ever national assessment of multiplemodel structures in a closed cloud computingresource.

● The first ever national comprehensive assessmentof model uncertainty analysis to understandparameter and model structure uncertainty andpredictive capability.

● The first ever national multi-criteria assessment ofmodel simulation performance that explicitlyassesses if models are fit for purposes for highflows, low flows and seasonal responses.

● Improvements to the hydrological modellingpredictive capability by using a grand ensemble ofmodel structures.

● Ability to extract model structures and parametersets that are 'behavioural' for simulating 'tailored'

High flow performance Low flow performance

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

55

Figure 6.7 Best NS scores for the entire model ensemble for each catchment,, contrasting summer with winterperformance. The poorer calibrations in the summer in the south, particularly in areas dominated by the chalk, willbe noted; similarly the poorer calibrations in winter in Scotland, over the northern highlands, are striking, and arepossibly due to snow melt.

performance metrics for different types ofcatchment flow responses.

● Identify linkages between model structures andparameters to catchment characteristics than canaid regionalisation and predictions in 'ungauged'catchments.

6.1.10 Potential 'near possible' opportunities due tothis research

● More explicit coupled linkages between dischargefluxes and the biogeochemistry components tounderstand the monthly and seasonal dynamics ofwater quantify and quality fluxes.

● Improve the diagnostics (i.e. understanding modeldeficiencies to build better models) of modelstreamflow predictions by explicitly taking intoaccount the uncertainties in the observed dataproducts.

● Develop the first national flood inundationsimulation framework that explicitly included theuncertainties in the predicted upstream boundaryconditions. This will allow the inundationuncertainties in given flood return periods to beexplicitly quantified

● Further increase the spatial complexity of modelstructures used in analysis to understand the limitsof the predictive capability

● Enhance the understanding of highly modified riversystems by including the national EA abstractionslicence and time series information for the UK(some 22,000 licence agreements) and improvethe predictive capability in these areas and leadtowards an improved national water resourcesmodel of the UK to assess water security issues

● The ability to improve the assessments ofenvironmental change scenarios on water qualityand through the biogeochemistry modellingcomponent the associated water quality.

6.1.11 Barriers● The project has identified the current difficulty in

the UK to bring together all the requiredobservational and catchment characteristics tomake this modelling possible. This was aconsiderable time sync and we now have aframework to make any additional simulationsrelatively easily to achieve

● Although we have been able to demonstrate amulti-model ensemble that includes acomprehensive assessment of uncertainty for>1,100 catchments without a fully implementedcloud computing resource we are still limited in thedemonstration we have conducted.

Summer performance Winter performance

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

56

Within a fully functioning cloud computing resource itwould have been possible to -

i. Analyse a range of model spatial complexitiesrather than just 'lumped' conceptual structures;

ii. Run models for sub-daily input-output simulationsand therefore better able to capture moreconvective storm responses;

iii. Run simulations for greater than 10 years tounderstand multi-decadal model behaviour andtrends;

iv. Simulate multiple scenarios of input rainfalluncertainties and thus improve model diagnostics.

6.1.12 ReferencesArnell NW. 2011. Uncertainty in the relationshipbetween climate forcing and hydrological response inUK catchments. Hydrology and Earth SystemSciences, 15, 897-912.

Bell VA, Kay AL, Jones RG, and Moore RJ. 2007a.Development of a high resolution grid-based river flowmodel for use with regional climate model output.Hydrology and Earth System Sciences, 11(1), 532-549.

Bell VA, Kay AL, Jones RG, and Moore RJ. 2007b. Useof a grid-based hydrological model and regionalclimate model outputs to assess changing flood risk.International Journal of Climatology, 27, 1657-1671.

Beven K. 2000. Uniqueness of palce and processrepresentations in hydrological modelling. Hydrologyand Earth System Sciences, 4(2), 203-213.

Beven K. 2002. Towards a coherent philosophy formodelling the environment. Proceedings of The RoyalSociety, series A, 458, 2465-2484.

Beven K. 2006. A manifesto for the equifinality thesis.Journal of Hydrology, 320, 18-36.

Beven K. 2007. Towards integrated environmentalmodels of everywhere: uncertainty, data and modellingas a learning process. Hydrology and Earth SystemSciences, 11(1), 460-467.

Beven K and Freer J. 2001. Equifinality, dataassimilation, and uncertainty estimation in mechanisticmodelling of complex environmental systems using theGLUE methodology. Journal of Hydrology, 249, 11-29.

Beven K and Westerberg I. 2011. On red herrings andreal herrings: disinformation and information inhydrological inference. Hydrological Processes, 25,1676-1680.

Clark MP, Slater AG, Rupp DE, Woods RA, Vrugt JA,Gupta HV, Wagener T, and Hay LE. 2008. Frameworkfor Understandign Structural Errors (FUSE): A modularframework to diagnose differences betweenhydrological models. Water Resources Research, 44,article number W00B02, DOI: 10.1029/2007WR006735.

Cloke, H.L., Wetterhall, F., He, Y., Freer, J., andPappenberger, F. 2012. Modelling climate impact onfloods with ensemble climate projections. Quarterly

Journal of the Royal Meteorological Society, early view13 AUG 2012, DOI: 10.1002/qj.1998.

Coxon G. 2011. An Evaluation of Multiple HydrologicalModel Hypotheses in the UK using a Framework forUnderstanding Structural Errors. Unpublished MScthesis, School of Geographical Sciences, University ofBristol, University Road, Bristol, UK 62 pages.

Crabtree, B., Kelly, S., Green, H., Squibbs, G., Mitchell,G., 2009. Water Framework Directive catchmentplanning: a case study apportioning loads andassessing environmental benefits of programme ofmeasures. Water Science and Technology, 59(3): 407-416.

Hough MN and Jones RJA. 1997. The United KingdomMeteorological Office rainfall and evaporationcalculation system: MORECS version 2.0 - anoverview. Hydrology and Earth System Sciences, 1(2),227-239.

Kavetski D and Clark MP. 2010. Title: Ancientnumerical daemons of conceptual hydrologicalmodeling: 2. Impact of time stepping schemes onmodel analysis and prediction. Water ResourcesResearch, 46, article number W10511, DOI:10.1029/2009WR008896

Keller V, Young AR, Morris D, and Davies H. 2006.Technical Report: Task 1.1: Estimatioin of PrecipitationInputs. Environment Agency R & D Project W6-101 -Continuous Estimation of River Flows (CERF). Mainreport and appendix, 36 pages.

Price D, Hudson K, Boyce G, Schellekens J, Moore RJ,Clark P, Harrison T, Connolly E, and Pilling C. 2012.Operational use of a grid-based model for floodforecasting. Proceedings of the ICE: WaterManagement, 165 (2), 65-77.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

57

6.2 Biogeochemistry National ExemplarNutrient enrichment of water bodies is the singlebiggest source of water pollution in the UK, generatingadverse impacts on the chemical quality andecological status of headwater streams, lowland rivers,lakes, estuaries and the coastal zone. In order toaddress the challenge, integrated management isneeded of all of the sources which contribute to thisflux, and the pathways linking sources within thelandscape to the receiving water body. However,development of integrated catchment managementstrategies in practical terms is limited by the availabilityof science understanding and knowledge which istransferable between scales and catchments, fromdata-rich to data-poor areas of the UK.

This exemplar was developed to determine the extentto which cloud computing infrastructure could supportthe development of national capability in integratedcatchment management.

6.2.1 The challengeObservational data informing process understanding isoften generated at relatively small experimental scales,and is not necessarily directly scalable for applicationat the whole catchment scale without the developmentof a modelling solution, bespoke for the system underinvestigation and the data available to drive that model.Policy makers and environmental managers seeking todevelop management and mitigation options forimpacted systems have had to either

i. fund the development of bespoke modelling andmonitoring programmes for each catchment orinterest,

ii. use knowledge acquired from inadequate, lowresolution catchment monitoring in the catchmentof interest or neighbouring catchment, or

iii. use knowledge acquired from high resolutionstudies in systems which might not be directlycomparable with the catchment of interest.

In order to deliver effective understanding of catchmentfunction under environmental change for all UKcatchments there is, therefore, an urgent need todevelop better mechanisms for the transfer ofknowledge and science understanding between data-rich and data-poor systems. The NationalBiogeochemical Modelling Framework developedunder EVO, and supported by cloud computinginfrastructure provides an opportunity to address thisneed.

6.2.2 Limitations of existing modelling approachesDynamic modelling approaches, such as the INCAmodelling suite for N, P, C and sediment (Whitehead etal. 1998; Wade et al., 2005; Lazar et al., 2010), providethe opportunity to capture the science understandinggenerated by high resolution research on catchmentbehaviours at a range of scales, but expert knowledgeand high concomitant costs are associated with thecalibration of the model(s) to local conditions in eachapplication.

Simpler correlative statistical modelling approaches,such as the Global News model (Seitzinger et al.,2010), can generate visually attractive simulations ofcatchment behaviour at regional to global scale, butlack a physical basis (or representation of the specificphysical conditions controlling functional behaviour indiffering environments) and are frequently highlyinaccurate and necessarily uncertain, generating ahigh risk when used to support operationalmanagement and policy development (Greene et al., inreview).

There is, therefore, a need to develop physically-basedmodelling approaches that can be run without theneed to expert involvement for all areas of the UK,including both data-rich and data-poor or unmonitoredcatchments that can accurately simulate the likely fluxof nutrients to waters within complex landscapes ofdiffering environmental character, and their likelyresponse to mitigation and management underrealistic policy scenarios.

6.2.3 The barriersThe national scale biogeochemistry exemplardeveloped under EVOp directly addressed thischallenge, exploring the use of cloud computing toprovide a physically-based modelling framework tosupport ensemble biogeochemical modelling acrossthe whole of the UK to provide predictive capability toquantify nutrient (nitrogen, N, and phosphorus, P)fluxes to inland and coastal waters in the UK, atgeographic scales ranging from 4 km2 grid tocatchment, river basin and UK national scale.

The objective for this exemplar was to demonstrate thebenefits of modelling major nutrient cycles, togetherwith uncertainty and scenario analysis for predictingimpacts of nutrient mitigation strategies (e.g. landfertilisation intensity and stocking densities), onnutrient flux behaviours to UK waters, within a cloudcomputing environment. This has delivered cloud-enabled national biogeochemical modelling capability,demonstrating potential to capitalise on extensivenational investment in the development of scienceunderstanding, data sets and modelling tools tosupport integrated catchment management, allowingknowledge transfer from data rich to data poor regions.The approach involved (1) the development of aphysically-based biogeochemical modelling frameworkto represent the major geoclimatic regions of the UK,and (2) bringing together, for the first time, nationaldata sets, models and uncertainty analysis into a cloudcomputing environment, allowing exploration andbenchmarking of current predictive capability fornational scale biogeochemical modelling.

The sections that follow provide a description of thenational biogeochemistry storyboard, proposed userprofile, and the methods and analysis undertaken toenable the storyboard in a cloud computingenvironment, making it accessible to users via theEVOp portal. Visualisation of the outputs and uniquefeatures of the work carried out are highlighted, in

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

58

addition to an account of the user help systemsembedded in the portal. Finally, the potential for futurecloud-enabled biogeochemical modelling within afuture EVO framework, together with the opportunitiesand barriers to further development, are outlined.

6.2.4 Storyboard and stakeholdersThe outline of the storyboard for the nationalbiogeochemistry exemplar is provided in Annex 1. Thestoryboard was designed to illustrate an example ofhow and why users may interact with this componentof the EVOp. Users identified include Governmentdepartments and agencies such as Defra, EA, SEPA,WAG and NIEA who need to know the percentagereductions in N and P flux to waters that are possibleusing existing policy instruments to address issuessuch as WFD (EU Water Framework Directive)compliance and OSPAR (Oslo and Paris Conventionfor the Protection of the Marine Environment of theNorth-East Atlantic) reporting. In such cases, the userin search of evidence would enter the EVOp portal andchose to explore the 'Diffuse Pollution' topic.

A map interface directs the users to select the scale atwhich they require N and P outputs (e.g. RiverCatchment, WFD River Basin District, Country, CoastalDrainage Unit, OSPAR or UK). Once the scale hasbeen selected the user moves on to help-enabled webpages that provide options and advice for model anddata selection, scenario generation opportunities, andoutput format until the desired result is achieved (e.g. amap showing N and P flux at a 4 km2 resolution acrossthe UK, pie charts showing the percentagecomposition of sources of N and P flux in a rivercatchment, or bar charts showing the load of N and Pexported to UK waters from each landscape type).

6.2.5 Steps taken in implementation ofbiogeochmeistry national exemplar storyboard

The biogeochemistry storyboard is currently availableas a web-based cloud-enabled exemplar in the EVOpportal (reference WP2). Implementation of thestoryboard in this form comprised a series of key stepsin the pilot: (i) development of a cloud-enabled andinterrogable database for the nutrient flux modelselected for the exemplar, (ii) implementation of themodelling structure in the cloud and (iii) visualisation ofthe storyboard as a web interface in the EVOp.

6.2.6 Development of database for macronutrientmodel application

6.2.6.1 Outline of nutrient model chosen for exemplarin the pilot

The Export Coefficient Model (ECM) (Johnes, 1996)was used in this exemplar to provide initialdemonstration of the national capabilities of N and Pflux modelling within the EVOp portal (Greene et al., inreview). The ECM takes a semi-distributed approach,calculating mean annual total N and total P loadsdelivered to a water body (freshwater or marine) as thesum of the nutrient loads exported from each nutrientsource in its surface topographic catchment. Sourcesof N and P include diffuse export from differing land

cover classes, fertiliser use by crop type, livestockwastes by livestock type and management system,human waste discharges via point sources andatmospheric deposition to land and water within eachcatchment. The model equation is as follows:

L= Ei ( Ai ( Ii ) ) + p

where:

L = Load of nutrients (N or P)

E = Export coefficient for nutrient source i

A = Area of river basin occupied by land use type i, or number of livestock type i, or people

I = Input of nutrients to source i

p = Input of nutrients from precipitation/deposition

The ECM approach has been adopted and applied formodelling nutrient pollution worldwide (EuropeanCommission, 2002; Matias and Johnes, 2012; Tian etal., 2012; Nasr and Bruen, 2013) owing to its relativesimplicity and limited data requirements, which areeasily accessible from readily available datasets.Although the ECM does not directly incorporate thecomplex processes involved in simulating nutrientpollution flux (including groundwater pathways fornutrient transport) and requires significantly less datainput than process based models, it has goodprediction accuracy (Johnes & Butterfield, 2002;Johnes et al., 2007) and is especially suitable for areaswhere few observed data are available. Therefore, themodel tends to meet the needs for long-term nutrientassessment in catchments and nutrient managementplans, by providing accurate source apportionmentwhich is scalable from grid to national scale, andtransferable from data-rich to data-poor environments(see Section 3a (i)). For these reasons, plus modelownership, the ECM was considered applicable for theEVOp.

The unique parameter values (input and exportcoefficients) for the ECM were originally developedusing field-scale data on land use and management,together with detailed information on sewage wastemanagement and discharges, and rigorously testedagainst water quality data in two UK river catchments(Johnes, 1996; Johnes and Heathwaite, 1997). Thatmodel generated predictions within ± 5% of observedN and P loadings. However, for modelling nutrient fluxin multiple catchment types across the UK, and tosupport national scale policy development, a nationalbiogeochemical modelling framework (geoclimaticregion) for the ECM was later developed (Johnes et al.,1996; Johnes and Butterfield, 2002; Johnes et al.,2007). Coefficients for the input and export of sourcesof N and P were generated for six main geoclimaticregions across England and Wales (Figure 6.8a).

The sub-models represent N and P flux from majorgeoclimatic units in England and Wales that comprisebroadly similar climate, geology, soil type, topography

n

∑i=1

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

59

and natural vegetation cover. These landscape units,therefore, display broadly similar ranges of N and Pexport/retention potential as a function of hydrologicflow volume, velocity, timing and routing from land towater. The coefficient sets of both N and P have beentested by rigorous multi-catchment application andhistorical curve fitting to long term water qualityrecords for at least three catchments within each of the

six geoclimatic regions, each with a minimum of 10years of observed N and P data, and a total of 75catchments across England and Wales (Figure 6.8b).Predictions have given an r2 of 0.98 when correlatedwith observed N and P data. An example of modelvalidation for one site with long-term observationaldata for TP is given in Figure 6.9 (after Bennion et al.2005).

Figure 6.8 Calibration and validation of the Geoclimatic Regions framework for England and Wales (after Johneset al., 1997; 2007; Johnes and Butterfield, 2002).

a) b)

Intensive arable regionsMixed arable/dairying regions; permeableLowland dairying regionsMixed arable/dairying regions; impermeableExtensive livestock/upland regionsUrban/non-agricultural regions

Figure 6.9 Example of model validation. Parameter sets calibrated and validated for Windermere South Basin (a)were applied to input data for Esthwaite Water (b). Model estimates of mean annual TP concentration in inflowingstreams, converted to mean annual lake TP concentrations using the OECD (1982) equations linking lake andinflow TP concentrations are compared with observed TP concentrations in each lake (after Bennion et al., 2005).

Observed TP

EC-TP (OECD standard)

EC-TP (OECD shallow lakes)

Observed TP

EC-TP (OECD standard)

EC-TP (OECD shallow lakes)

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

60

6.2.6.2 Developing National Biogeochemical Modellingcapability through delivery of the revisedgeoclimatic regions modelling framework,updated and extended to include Scotland andNorthern Ireland

The first UK-wide biogeochemical modellingframework (based on geoclimatic regions) wasdeveloped for sub-model implementation of the ECMin the EVOp portal, enabling demonstration of nationalbiogeochemical modelling capability within a cloudcomputing environment. The original geoclimaticregion structure, comprising six landscape units inEngland and Wales (Figure 6.8), was refined during thepilot and extended to comprise nine distinct landscapeunits across Scotland, England, Wales and NorthernIreland; each geoclimatic region was assigned inputand export coefficients for N and P flux from source towater, based on the earlier work of Johnes andcolleagues on the source-transfer-delivery pathway foreach source type, by geoclimatic region in Englandand Wales. The distribution and description of the ninerevised regions across the UK are provided in Figure6.10.

6.2.6.3 Methods for delineating the revised geoclimaticregions modelling framework for the UK

Delineation of the nine revised regions involved the firstcomprehensive use of the British Geological Survey(BGS) Parent Material Model (PMM) data, which wasprovided by the BGS at a spatial resolution of 50m(Lawley, 2011). These data gave digitised informationon the basic foundations of soils, their structure,drainage and geochemistry across the UK. Metrics ofslope gradients extracted from a high resolution DEMand gridded runoff data generated from the hydrologyNational Exemplar (Section 2d (ii)), both at 50 mspatial resolution, were also used for classification. Arule-based methodology was developed to separateout nine distinct geoclimatic units (Table 1 in Annex 1).These rules were based on the characteristics of thethree sets of data used, in terms of capacity for N andP export/retention, building on and proving greaterclarity to the classification of the original six regions.First, the characteristics of the original six regions(pink, green, blue, brown, yellow and red) were re-examined and explicit rules for separation betterdefined (e.g. unambiguous geology, slope and runoffthresholds). A further three regions (olive, orange andpurple) were identified as key landscape units, andthese were similarly delineated using distinctivefeatures of their landscape and biogeochemicalfunction.

A presumption in the ECM is that the area outlined inthe yellow geoclimatic region (Figure 6.10) representsa geoclimatic region in the UK that has the highestpotential rates of N and P flux from land to water due tosteep slopes, high rainfall and low base flow index.They have the lowest actual (realised) rates of export,however, as these upland moorland peat areas (> 300m amsl) mainly support low intensity sheepproduction, and to a lesser extent cattle grazing andare therefore used relatively extensively. This means

that despite the high degree of slope, abundance ofrainfall generating runoff averaging 1200-2000 mm peryear, together with the relatively high proportion ofoverland flow and near-surface lateral quickflow as afunction of thin soils and scarce vegetation that overlieimpermeable pre-Cambrian bedrock, the high nutrientexport potential of these landscapes is not translatedinto high nutrient flux rates.

The regions assigned as purple also have high N andP export potential, similar to the coefficients of theyellow regions. However, these landscape units aretypically hilly lowland (< 300 m amsl) peat systemsoverlying impermeable Palaeogenic rock that havedeveloped in areas such as estuaries, valleys andother topographic depressions. Poor drainage leads tothe area becoming water-logged. Owing to theirrelative accessibility compared to upland peat areas,the purple regions are subjected to increasedagricultural intensity. However, relatively low nutrientexport rates occur owing to nutrient retention within thedeveloping peat layers and the tight cycling of anyavailable nutrients within the system. Likewise, thecoefficients for the orange regions represent highexport potential, but low actual nutrient export rates asa result of peat soils overlying Eocene bedrock in thenon-intensive agricultural regions.

Low to moderate N and P flux potential is representedin the coefficients for the flat dry regions of the UK,classified here as brown, despite intensive arableproduction with associated high rates of fertiliser N andP applications to crops and grass. This reflects the factthat despite a high rate of N and P input to thislandscape, the flatness of the topography, the lowrates of runoff (< 150 mm per year) and permeablesoils generate a low actual rate of nutrient export. Incomparison, the model presumes the sensitivity to Nand P export is greater in the lowland hilly blue region,which is underlain by permeable bedrock (Chalk,Jurassic Limestone), because of the high rate of N andP input, combined with the higher rate of nutrientexport potential generated by steeper slopes, andhigher rainfall intensity.

The geoclimatic regions coloured pink have asomewhat similar geographic distribution to the yellowregions, but occur on moderately elevated (< 300 mamsl) acidic soils underlain by impermeablemetamorphic bedrock (e.g. Old Red Sandstone).Differences between both regions also include greaterexport rates for N and P due to the greater intensity ofcrop and grassland production and higher livestockdensities commonly supported in these regions. Theaccumulation of N and P inputs here are likely togenerate a substantial pool of N and P which haspotential for export to adjacent waters in wetconditions. Finally, the highest rates of N and P exportare predicted for the green and olive regions,representing flat lowland areas with heavy clay soils(green), river alluvium and flat coastal plains underlainby marine or estuarine sediments (olive). These flat,wet, low permeability landscapes are subjected to thehighest rates of grassland fertiliser application in theUK, together with the highest stocking densities for

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

61

dairy and beef cattle production, which collectivelylead to a combination of high N and P input rates andexport potential. These regions are most sensitive tothe loss of N and P than other regions because of theerodibility and hydrological flow routing traits of heavyclay soils.

6.2.6.4 Preliminary testing of the revised geoclimaticregions modelling framework with external data

The nine revised geoclimatic regions were fully testedusing base flow index (BFI) values and catchmentdescriptions of hydrological catchments monitored inthe national river flow archive (NRFA). Catchmentboundaries, representative of all nine regions, wereoverlaid on the geoclimatic regions and the assignedBFI values (ranging from 0-1) were examined forconsistency with the newly classified geoclimaticregions. For example, the blue regions coincided witha BFI of > 0.8 and the green regions had a BFI of <0.3 (where 1.0 is a theoretical 100% baseflow

contribution). In addition, BFI values and hydrologicclassification data at a 1 km2 spatial resolution (data:hydrology of soil types (HOST)) were also used in asimilar way for intra-catchment testing of the regionswhen catchment-based data was not suitable.

However, further testing of the model extension toScotland and Northern Ireland, through modelcalibration and validation against long-term andmultiple catchment observed N and P data in eachgeoclimatic region type is still needed, to confirm thesuitability of the coefficient ranges for these regions.The ECM also doesn't take account of groundwaterpathways for nutrient transport; these may beimportant for long-term nutrient assessments, andconsequently may need to be accounted for in futureiterations of the model. Modelling output from this newframework, as presented for Scotland and NorthernIreland should therefore be viewed as preliminary untilvalidation has been undertaken.

Upland areas underlain by Pre-Cambrian rocks with thin acid soils.

Lowland areas with deep, moderately acidic soils underlain bymetamorphic rock

Flat lowland areas with heavy clay soils

Coastal plains with fine silt to clay soils underlain by marine or estuarinesediments

Hilly lowland areas with peaty soils overlying Paleogenic rock

Hilly lowland areas with calcareous soils overlying permeable sedimentary

Flat lowland - areas with deep soils overlying Quarternary deposits

Urban areas

Thin acid soils overlying Ecocene rock

Figure 6.10 Revised geoclimatic region framework, extended to include Scotland and Northern Ireland; the firstbiogeochemical modelling framework for the UK.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

62

6.2.7 Data for ECM applicationThe year 2000 was selected for demonstration of theECM in the EVOp portal. The digital data required formodel application were acquired from source,manipulated off-line in ArcGIS v 10 and transferred to ageospatial database for subsequent integration into theEVOp portal infrastructure (see section 3.1.4).

Data for land use and livestock (total number of cattle,pigs, sheep and poultry) in Britain (England, Wales andScotland) were taken from the Annual AgriculturalCensus Returns for 2000 at a spatial resolution of 4km2 grid. The data comprised the area of land used forcereal crops, other arable crops, bare fallow land,permanent grassland, temporary (ley or rotational)grassland, the area of rough grazing land (unfertilised),and the area of woodland/orchards. Correspondingdata for Northern Ireland were sourced from theNorthern Ireland Agricultural Census 2000. The totalnumber of humans in Britain and Northern Ireland wereobtained from respective 2001 census of humanpopulation. Atmospheric N deposition across the UKdata was sourced from Defra at a spatial resolution of 5km2. Owing to a deficiency of distributed P depositiondata, a predefined value (Johnes, 1996) was adopted.

Additional information (input coefficients) toaccompany the input data included values for theaverage amount of N and P produced perhuman/livestock unit annually, the nature and extent ofsewage treatment facilities/livestock manure handling,fertiliser applications rates to crops and grassland andN fixation (taken from the Survey of Fertiliser Practice),using coefficient ranges previously reported by Johnes

et al (2007) and Johnes and Butterfield (2002). In afully realised application of the model, there would bescope for refinement of these values using spatiallydistributed data from public utility and Governmentdepartment data resources, incorporated within thecloud-enabled geospatial database.

6.2.8 Development of a cloud-enabled integratedand interrogable database

The ECM model database was developed at a grid-based spatial scale of 4 km2 across the UK, comprising63,241 independent grids. Each 4 km2 grid was given aunique identification number and converted to anArcGIS shapefile. Input parameters (e.g. permanentgrass, urban area) were given a unique codename andparameter information was extracted per grid byintersection of the data shapefiles with the 4 km2 gridshapefile. After data population, a shapefile describingthe model database and 4 km2 grid boundaries wasthen overlaid on a shapefile containing the revisedgeoclimatic region framework (example shown inFigure 6.11). The proportion of each geoclimatic regionin each 4 km2 grid was extracted and a percentagecomposition assigned per grid. The shapefile (modeldatabase plus 4 km2 grid boundaries) was thenintersected with shapefiles representing thegeographic boundaries of various reporting unit (rivercatchment, WFD RBD, CDU, OSPAR, country). Furthercolumns were created in the model database toindicate the spatial location of each 4 km2 grid withrespect to the various reporting units.

Figure 6.11 Example of the 4 km2 gridded dataset overlaying the revised geoclimatic region model framework.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

63

6.2.9 Model implementation in the cloudModel implementation within the cloud is described inSection (WP2).

incorporate the new geoclimatic regions typology andthe extension of the framework to include bothScotland and Northern Ireland. Scenarios testedrepresent conditions appropriate under (1) GoodAgricultural Practice policy guidance, (2) mitigationmeasures appropriate to the delivery of reduceddiffuse N and P fluxes to support WFD compliance, inaddition to those tested under (1), and (3) addition ofmeasures to ensure compliance with the standardsrequired under the EU Urban Wastewaters TreatmentDirective (UWwTD) to all sewage sources in the UK. Atpresent UWwTD compliance is only required for largerwastewater treatment works (WwTW) serving apopulation equivalent greater than or equal to 10,000persons. The scenario testing suggests that even withall measures in place, a maximum of 58% reduction inP export and 30% reduction in N export would bepossible, with the greatest rates of reduction in Pexport occurring in major urban centres, while thegreatest rates of N export reduction occur in ruralareas.

Figure 6.12 Examples of the EVO interface - a) Map page - scale selection, b) Model selection, c) Scenarioselection, and d) Output selection.

6.2.10 Examples of outputsExamples of the model output at all scales for N andfor P are presented in Figures 6.13 - 6.20. Figures 6.13,6.14, and 6.15 show model outputs for TP export, TPexport from diffuse sources, and and TP export frompoint sources to inland and coastal waters (kg / ha)across the UK using data for the year 2000. Outputsare shown in order of increasing spatial area, rangingfrom 4 km2 grids up to OSPAR zones. Figures 6.16,6.17 and 6.18 show comparable model outputs for TN,demonstrating the dominance of diffuse sources in theN flux signal nationally. This contrasts with the signalfor TP which is more evenly split between diffuse andpoint sources nationally, but shows significanthotspots associated with urban centres when viewedat 4 km2 grid scale where point sources clearlydominate the TP signal.

Figures 6.19 and 6.20 show the nationalbiogeochemical modelling capability delivered by thecloud-enabled modelling structure for P, and N. Thisuses the model to run simultaneous, multiplegeoclimatic region scale, distributed scenario testing.Scenarios tested are based on those previouslyreported by Johnes et al. (2007) but extended to

a) b)

c) d)

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

64

2.17 TP kg

2.10 TP kg

1.67 TP kg

2.16 TP kg

1.67 TP kg

Figure 6.13 Modelling outputs for TP loads (kg / ha) across the UK using data for the year 2000. Outputs areshown in order of increasing spatial area, ranging from 4 km2 grids up to OSPAR zones.

4km2 River Catchment WFD RBD Coastal Drainage Unit OSPAR Zone

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

Figure 6.15 Modelling outputs for point source TP loads (kg / ha) across the UK using data for the year 2000.Outputs are shown in order of increasing spatial area, ranging from 4 km2 grids up to OSPAR zones.

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

4km2 River Catchment WFD RBD Coastal Drainage Unit OSPAR Zone

Figure 6.14 Modelling outputs for diffuse source TP loads (kg / ha) across the UK using data for the year 2000.Outputs are shown in order of increasing spatial area, ranging from 4 km2 grids up to OSPAR zones.

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

4km2 River Catchment WFD RBD Coastal Drainage Unit OSPAR Zone

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

65

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

4km2 River Catchment WFD RBD Coastal Drainage Unit OSPAR Zone

Figure 6.16 Modelling outputs for TN loads (kg / ha) across the UK using data for the year 2000. Outputs areshown in order of increasing spatial area, ranging from 4 km2 grids up to OSPAR zones .

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

4km2 River Catchment WFD RBD Coastal Drainage Unit OSPAR Zone

Figure 6.17 Modelling outputs for diffuse-sourced TN loads (kg / ha) across the UK using data for the year 2000.Outputs are shown in order of increasing spatial area, ranging from 4 km2 grids up to OSPAR zones.

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

4km2 River Catchment WFD RBD Coastal Drainage Unit OSPAR Zone

Figure 6.18 Modelling outputs for point-sourced TN loads (kg / ha) across the UK using data for the year 2000.Outputs are shown in order of increasing spatial area, ranging from 4 km2 grids up to OSPAR zones.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

66

Total TP kg / ha 0.0 - 0.1 0.1 - 0.2 0.2 - 0.4 0.4 - 0.8 0.8 - 1.6 1.6 - 3.2 3.2 - 6.4 6.4 - 12.8 > 12.8

Current conditions + Good agricultural practice + Farming for WFD compliance + UWwTD compliance

5.73% decrease 14.65 decrease 57.90% decrease

Figure 6.19 Modelling outputs for TP loads (kg / ha) based on nutrient management scenarios across the UKusing data for the year 2000.

Scenarios

Total TP kg / ha 0 - 2 3 - 4 5 - 8 9 - 16 17 - 32 33 - 64 65 - 128 > 128

Current conditions + Catchment sensitive farming + Mitigation through on-farm measures + WFD compliance

10.25% decrease 24.71 decrease 29.10% decrease

Figure 6.20 Modelling outputs for TN loads (kg / ha) based on nutrient management scenarios across the UKusing data for the year 2000.

Scenarios

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

67

6.2.11 Unique design features of the biogeochemistrynational exemplar

The unique features of the biogeochemistry nationalexemplar are -

● Revised and tested national biogeochemicalmodelling framework - novel, developed usinghigh resolution data and for full UK (extended toinclude Scotland and Northern Ireland)

● Gridded model runs - using % of each geoclimaticregion

● Accumulation of gridded outputs to generateoutput at any scale

● Including Northern Ireland in model runs

● Scenario option - extra feature

● Scenario per geoclimatic region - useful formanagement purposes

● Combining N and P load with runoff generatedfrom hydrological component of the NationalExemplar to generate concentrations

● Pie charts offer source apportionment of loadestimates at any scale from 4 km2 grid scale toOSPAR reporting unit scale

● Cloud-based computing allows generation ofoutputs very efficiently, with multiple-scale,multiple-source manipulation available at the touchof a button

● A range of visualisation options are possible, andthe structure is designed to accept multiple modeltypes and structures, facilitating the developmentof national biogeochemical ensemble modellingcapability (see below), and fully-coupledhydrological-biogeochemical modelling capabilityfor dynamic modelling of N and P cyclingdynamics at sub-annual scale.

6.2.12 OpportunitiesA criticism of current ways of working in catchmentscience and management is that model calibrations arefixed in time, and new data is not used to refine modelcalibrations, as it is collected. By using thecomputational power and data-source interconnectionsthat the cloud computing model provides, there is theopportunity to rapidly and continuously re-assess modelparameter performance and associated predictiveuncertainty. As data-poor areas become richer withtargeted monitoring, this new information could be putto rapid use in understanding the system, improvingparameterisation of models operating within theframework, and detecting when 'sufficient' data hasbeen collected to improve model fit to observedbehaviours and reduce the associated uncertainties.

By enabling ensemble modelling of biogeochemicalbehaviours within the cloud computing environmentoffered within EVO it will be possible to operate themodels within an uncertainty framework, to provideguidance on those priority areas, particularly in data-poor regions, where collection of additionalobservational data is needed to reduce predictive (andmanagement) uncertainty. This ensures that the

measurements that are made have the maximumpotential information content for minimal cost.

The large computational resources provided to usersthrough cloud computing create further opportunities formodels to be used in innovative ways. For example, thetype of modelling approach explored and developed inEVO is typically used as a management decisionsupport tool to predict the likely effectiveness ofmitigation measures, and then to develop a catchmentrestoration plan based on the optimal predictedcombination of measures for that catchment. Currently,the number of possible options that can be explored bythe user is limited by both model computational timeand the user's perception of which individual orcombinations of options will be likely to work for a givensite. Through the adoption of Monte Carlo modellingmethods, coupled with the computational poweravailable in the cloud, it would be possible to calculate abroad ensemble of complex mitigation scenarios usingthis and other similarly structured modellingapproaches. This ensemble approach has a number ofadvantages:

● Each mitigation scenario can be ranked against arange of metrics including costs and effectiveness;

● Combinations of actions that may not have beenconsidered can be included in the analysis;

● It is possible to limit the amount of environmentalimprovement that is feasible without significantalterations to the landscape and system of foodproduction, allowing the user to make choiceswhich improve water quality while also protectingfood security; and

● When working with stakeholders, it is possible toshow the range of options that have beenconsidered and set stakeholder-suggested optionsin context with all other potential mitigationapproaches, weighing the benefits of eachproposed scheme against the costs in terms ofwater quality and food security.

In the biogeochemistry exemplar the export coefficientmodel is one possible model suitable for this scale ofapplication, which can capitalise on the enhancedfunctionality provided by the cloud infrastructureunderpinning EVO. It provides the capacity to simulateannual total N and total P export to any catchmentoutlet, or at 4 km2 grid scale for any area of the UK,regardless of relative observational N or P flux dataavailability. It also provides the capability to apportionsources according to sectoral contributions (pointsource discharges from sewage treatment works, septictanks; diffuse source loading from dairy vs beef cattleproduction, cereal vs other arable crop production,permanent vs temporary grassland). However, it is notthe only model which has this capacity. Anotheropportunity for the development of the EVO NationalBiogeochemical Modelling Framework is to take theinformation from the national scale modelling andgeoclimatic region modelling framework and use it toinform the parameterisation of spatially detailed models,such as SCIMAP (Reaney et al. 2011, Milledge et al.2012). SCIMAP is normally applied on a 25 m2 grid tocatchments and represents the hydrological

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

68

connectivity and potential nutrient or sediment export.The approach has been successfully tested againstboth ecological (Reaney et al 2011) and water chemistrydata (Milledge et al. 2012) and is used by both the EAand Rivers Trusts to support the development ofintegrated catchment management.

The model is normally applied by fitting the pattern ofland cover coefficients to an observed spatial pattern ofthe determinand of interest. Milledge et al. (2012)showed that there are significant differences in theeffective land cover parameter in different parts of theUK. These differences could be linked to the nationalscale models and hence enable a transfer of knowledgeto the finer spatial scale. The incorporation of SCIMAPinto the portal would create a tool focused on a differentset of questions, at a different spatial scale / resolutionfor a different set of users, and providing an alternativerange of functionality and model output capability to theexport coefficient model explored within EVO.

6.2.13 International opportunitiesOther models developed by teams from Alterra, TheNetherlands (de Vries and colleagues), and from the EUJRC (Leip and colleagues) in Ispra, have developedsimilar types of approaches, with slightly different foci orfunctionality, for C, N and in one case for P. An exampleof one of these pre-existing model outputs for N ispresented in Figure 6.21, alongside the exportcoefficient model output for N generated in thisprogramme. While these have been developed forapplication across the EU-27, and not calibrated toobserved data, at least for the UK, they provide analternative conceptualisation of the rates, andapportionment of C, N and P flux across complexlandscapes, typically running at 5 km2 grid scaleresolution. They also provide enhanced and/oralternative capability options, including simultaneoussimulation of both gaseous and aqueous C and Nfluxes, together with C and N sequestration undercurrent and future scenarios of land use, landmanagement and climate change.

There is, therefore, a clear opportunity to bring theseand other physically-based models into the cloud-enabled National Biogeochemical Modelling Frameworkto provide increased functionality and ensemblemodelling capability. Given the enhanced capacityoffered by this cloud-enabled platform, a statisticalcomparison, and uncertainty analysis function couldalso be developed, providing the user with theopportunity to derive a comprehensive assessment of:

i. the potential sources of nutrient flux,

ii. the process controls on nutrient flux behaviours incomplex landscapes,

iii. the likely range of future behaviours underchanging environmental conditions,

iv. a statistical assessment of the areas of agreementand disagreement within the model ensemble, and

v. an estimate of the uncertainties associated witheach simulation and the areas (geographically, orin terms of process controls) of greatestdisagreement and therefore uncertainty in current

conceptual models of C, N and P flux at grid tolandscape scale. This opportunity would provide amajor step forward in knowledge transfer to theend-user community, and in coupled phase(gaseous, terrestrial, aqueous) understanding ofthe major biogeochemical cycles to the science,policy and science research communities.

All of these approaches work at an annual time step,averaging 'typical' behaviours either in a physical rules-based system to take explicit account of variation in thenatural biogeochemical function of differing landscapetypologies, or transferring observational data on specificfluxes from each environmental compartment from data-rich to data-poor environments using a rules-basedapproach.

Another type of approach which, with some adjustment,could be brought into EVOp is physically-baseddynamic nutrient flux modelling, operating for individualnutrient fractions, and at a sub-annual, typically dailytime step. Examples of this type of modelling includethe INCA modelling suite developed by Whitehead andWade (see, for example Whitehead et al., 1998; Wade etal., 2005; Lazar et al., 2010). At present, while thisapproach has been developed and tested across a widerange of European catchments, each applicationrequires expert knowledge and high resolutionobservational water quality data to calibrate the modelto local conditions. As such, this type of approach hasbeen restricted to application in data-rich systems.However, recent advances in statistical modelling underthe USGS SPARROW Modelling programme (SPAtiallyReferenced Regressions On Watershed attributes)provide an opportunity to bring dynamicbiogeochemical modelling capability into EVO.

SPARROW is a mass-balance catchment modellingapproach which relates observed water quality tocatchment nutrient inputs and other catchmentattributes. SPARROW tracks the transport of nutrientsfrom inland waters to the coastal zone, explainingspatial patterns in stream water-quality conditions inrelation to human activities and natural processes. Fromthis approach, the model generates parameter valuesfor a range of processes, and produces predictions oflong-term average loads, concentrations, and flux ratesfrom each parameter with associated error estimates forall stream reaches within the modelled catchments.Examples of typical model output currently generatedfor the conterminous USA (a, b) and a major globalwatershed (the Mississippi; c, d) are presented in Figure6.22.

If the model were to be constrained to run atgeoclimatic region scale in the UK, it could be used togenerate optimal parameter ranges from data-rich areaswhich could be used within the NationalBiogeochemical Modelling Framework to run dynamic,daily time-step models in data-poor areas, enhancingbiogeochemical modelling capability to provide dailytime-step ensemble modelling capability for a range ofnutrient fractions, within an uncertainty framework. If themodel were to be constrained to run at geoclimaticregion scale in the UK, it could be used to generateoptimal parameter ranges from data-rich areas which

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

69

could be used within the National BiogeochemicalModelling Framework to run dynamic, daily time-stepmodels in data-poor areas, enhancing biogeochemicalmodelling capability to provide daily time-step ensemblemodelling capability for a range of nutrient fractions,within an uncertainty framework.

Work to adapt the EVOp National BiogeochemicalModelling Framework approach for application acrossthe conterminous USA would also provide a significantstep forward in developing an international exemplar ofthe extension of physically-based biogeochemicalmodelling capability internationally. This would allow awider range of geoclimatic regional sub-models to bedeveloped and tested in another data-rich region, andprovide the platform from which to upscale this regionalmodelling framework approach to provide physically-based nutrient flux modelling in both data-rich and data-poor regions globally. The models are ready to move tothis next level, but are currently running on differentplatforms which are not presently aligned. There is alsothe clear potential to extend this initially to modellingnutrient fluxes across the EU-27, and to develop a muchwider range of functionality within physically-basednutrient flux modelling by comparison and alignmentwith the pre-existing suite of European nutrient fluxmodels developed by the Alterra and JRC groups. Thecloud-enabled modelling framework developed in EVOpprovides the necessary infrastructure to support thisdevelopment.

6.2.14 The barriersThe only significant barrier to the realisation of theseopportunities is time and funding to supportdevelopment of the ensemble modelling capability and,ultimately, for data collection where modelparameterisation is limited in data-poor regions. Thevarious groups who own the IP for these models arekeen to progress this aspect of the biogeochemicalmodelling function and have worked on a suite of priorprogrammes including the European N Assessment(Sutton et al., 2011) and UN SCOPE Global N Cyclingprogrammes (see for example (Alexander et al., 2002;Boyer et al, 2002; Howarth et al., 2012; Johnes andButterfield; 2002). The European and USA N cyclingcommunities, in particular, are ready to move to the nextstage to generate physically-based modelling of N fluxfrom land to inland and coastal waters, from grid tomajor catchment scale, and within the development of acloud-enabled platform to enhance the data handlingand storage capacity. The only barriers to realising thispotential is sufficient funding.

There is the potential to extend this capability to globalscale, and the barriers there, in addition to funding, aredata availability and access, issues associated with datasecurity, and the engagement of a wider range ofinternational groups. The latter would be developedunder the umbrella of the International NitrogenInitiative. The greater barrier would be associated withdata access and security. However, a number of globaldatasets already exist and are readily available.Progress to overcome this barrier could be made by theadjustment of existing modelling approaches toaccommodate specific data availability at Global scale.

Figure 6.21 Simulation of N flux to aquaticsystems using physically-based regionalisedmodelling approaches (a) UK national exportcoefficient model for the year 2000 (thisproject), and (b) IDEA and INTEGRATORcombined models for EU27, for the year 2002(Leip et al., 2011).

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

70

6.2.15 ReferencesEuropean Commission (2002) Common ImplementationStrategy for the Water Framework Directive. GuidenceDocument No. 3, Analysis of Pressures and Impacts.The Directorate General Environment of the EuropeanCommission, Brussels.

Johnes, P. (1996) Evaluation and management of theimpact of land use change on the nitrogen andphosphorus load delivered to surface waters: the exportcoefficient modelling approach. Journal of Hydrology,183, 323-349.

Johnes, P.J. & Butterfield, D. (2002) Landscape,regional and global estimates of nitrogen flux from landto sea: Errors and uncertainties. Biogeochemistry, 57,429-476.

Johnes, P.J., Foy, R., Butterfield, D. & Haygarth, P.M.(2007) Land use scenarios for England and Wales:evaluation of management options to support 'goodecological status' in surface freshwaters. Soil Use andManagement, 23, 176-194.

Lawley, R. (2011) The Soil-Parent Material Database: AUser Guide. British Geological Survey Open Report,OR/08/034. 53pp.

Matias, N.G. & Johnes, P.J. (2012) CatchmentPhosphorous Losses: An Export Coefficient ModellingApproach with Scenario Analysis for WaterManagement. Water Resources Management, 26, 1041-1064.

Nasr, A. & Bruen, M. (2013) Derivation of a fuzzynational phosphorus export model using 84 Irishcatchments. Science of The Total Environment, 443,539-548.

Sims, J.T., Bergström, L., Bowman, B.T. & Oenema, O.(2005) Nutrient management for intensive animalagriculture: Policies and practices for sustainability. SoilUse and Management, 21, 141-151.

Tian, P., Zhao, G., Li, J., Gao, J. & Zhang, Z. (2012)Integration of monthly water balance modeling andnutrient load estimation in an agricultural catchment.International Journal of Environmental Science andTechnology, 9, 163-172.

Vitousek, P.M., Naylor, R., Crews, T., David, M.B.,Drinkwater, L.E., Holland, E., Johnes, P.J.,Katzenberger, J., Martinelli, L.A., Matson, P.A.,Nziguheba, G., Ojima, D., Palm, C.A., Robertson, G.P.,Sanchez, P.A., Townsend, A.R. & Zhang, F.S. (2009)Nutrient Imbalances in Agricultural Development.Science, 324, 1519-1520.

Figure 6.22 (a) Data flows and model output at continental scale using the SPARROW modelling approach, (b)incremental TOC flux to streams, (c) incremental TN flux to streams, (d) TN flux to catchment outlet.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

71

An exemplar which demonstrated the potential value of the EVO approach to a global-scale science and policyquestion was developed. It addressed the sensitivity of carbon, water and energy fluxes to current uncertainty inclimate projections from the 22 GCMs used in the 4th IPCC Assessment. The IT problems being tackled related to:● increased accessibility to a large climate change impact model by the terrestrial climate change impacts

community;

● increased speed of computing;

● elasticity and auto-scalability.

This global exemplar was intended to develop a third approach to that being developed for the local and nationalexemplars thus providing an independent exploration of possible futures for EVO. It demonstrates the potentialbenefits of exploiting developments in other scientific communities (i.e. bioinformatics) and addresses particularchallenges in cloud-enabling large, complex models and managing the associated costs.

7.1 Science backgroundGeneral Circulation Models (GCMs) have been themain tool to address issues of climate change. Thecomplexity of these come with two penalties. First,even with the fastest computers, it remains onlypossible to make a few very selected simulations toinvestigate a given problem, thus restricting thenumber of global emissions scenarios that can betested, and accessibility by the impacts community.Second, there remains a great deal of uncertaintysurrounding climate projections. This is in part due totheir complexity which needs to be explored andcommunicated by the impacts community. This is aparticular concern for modelling land surfaceresponse, given its dual role as a fundamentalcomponent of the climate-carbon cycle system, and itsdirect link to food and water security. For this reason,an intermediate methodology was developed which

i. emulates climate models, and

ii. captures the full complexity of land surfacemodels.

This system provides an ability to assess terrestrialecosystem changes that might be expected in awarming world and of great value to the climatechange impacts community. More specifically the"pattern-scaling" methodology is utilised, with theconcept being that changes in surface climate, foreach month and at each geographical position, arelinearly proportional to the amount of average warmingover land. In addition, a global thermal energy balancemodel was built which maps from different pathways inatmospheric greenhouse concentrations to meanglobal temperature increase over land, which allows

scaling of the patterns. This structure is called the"GCM analogue model", (Huntingford and Cox, 2000)where "analogue" means replication (of the GCM). Theanalogue model system was developed for Version 3of the Met Office Hadley Centre GCM (HadCM3;Gordon et al., 2000) linked to the Met Office SurfaceExchange Scheme (MOSES; Cox et al., 1998) and theDynamic Global Vegetation Model (DGVM), calledTRIFFID (e.g. Clark et al., 2011; Huntingford et al.,2000). The full scheme was called IMOGEN("Integrated Model Of Global Effects of climaticaNomalies") (Huntingford et al., 2010). This systemoperates relatively quickly, and a full transientsimulation between modelled years of 1860 (for pre-industrial times) and year 2100, and for all "landpoints", can be undertaken in around 24 hours on thelatest processors. However, to sample full climateuncertainty, this work needed to be repeated forclimate patterns derived from the full 22 GCMs thatcontributed to the 4th IPCC assessment and using theenhanced land-atmosphere model which willcontribute to the next Unified Model of the UK, namelyJULES.

The science aims of the EVOp global exemplar were:

● To deliver a version of IMOGEN-JULES frameworkwith inputs/outputs that are in a format that couldbe easily operated from a web interface, anddriving its operation in a compute "cloud"environment.

● To define a set of scalable climate patternsrepresenting the 22 GCM simulations thatcontributed to the 4th IPCC assessment, all on acommon grid of 2.5°x3.75° spatial resolutions.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

72

● To develop a working prototype applying thisIMOGEN-JULES framework, within the cloud, toexplore the uncertainty bounds of current GCMpredictions on change in soil carbon stocks in soil(but with outputs for a wide range of other carbonand water fluxes).

7.2 Steps taken in implementation

7.2.1 Step 1The first step as in all exemplars concerned theidentification of potential end-users to ensure a clearfocus to the work carried out within the EVO Pilot. Astoryboard was created (see online Annex) to ensure alogical structure to the problem would be completedwhich addressed a specific need. Two end-users werechosen;

i. A NERC climate change impact scientist interestedin the impact of climate projects on carbon fluxesto and from the atmosphere and the implications ofuncertainty in climate projections spatially at aglobal scale

ii. A government department (e.g. DECC) interestedin exploring methods of communicatinguncertainty in climate projections.

The navigation route they take to explore thesequestions were identified as being essentially thesame.

7.2.2 Step 2Translation of the storyboard navigation path intoEVOp is currently accessed though a "Global" buttonwithin "Explore by Location". Eventually, otherpathways from the portal home page would beactivated (a “Climate Change” option within “Exploreby Topic”, “Show me the Data” and “Show me theResources”).

This takes the user to a welcome page which providessome information on the model (Figure 7.2):

7.2.3 Step 3The end-users define an emission scenario through aweb portal (Figure 7.3). Options include:

● Ecosystem Variables

q Carbon: Soil Carbon, Vegetation Carbon, NetPrimary Productivity

q Water: Evaporation, Soil moisture, Runoff

q Vegetation: Fractional cover of plant functionaltypes (PFTs)

● Dates for model run

q Absolute 2020, 2050 and/or 2100

q Relative to pre-industrial 1860

● Output type

q Data

q Maps

If the emission scenario has already been run before,the user would be able to access maps held in themodel library at no charge. Due to the large amount ofdata, data is not stored (Figure 7.4).

Currently, years are restricted to 2020, 2050 and 2100due to resource limitations. A full implementation of themodel would enable any year to be requested. After3-5 days, an email is sent with the informationrequested demonstrating ~85-90% reduction incomputing time which would be required without anEVO cloud solution (each one of the 22 GCM analoguemodels normally requires 3 days computing time = 66days).

Figure 7.1 Schematic of accessing IMOGEN runs library or cloud resources using global exemplar. (See folder fororiginal flowchart).

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

73

Figure 7.2 Portal welcome page.

Figure 7.3 Select options.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

74

7.2.4 Step 4Unlike the other EVO exemplars, for this exemplar theemission scenarios to be selected by the users sit onan Amazon EC2 compute node initialised via the'Biocloud Central' web server. The benefits of this arethat the amount of continual logging steps andauthentication steps are significantly reduced relativeto stepping through the Amazon EC2 console.

This significant reduction in computing time was madepossible through the implementation of Steps 5 -8 .

7.2.5 Step 5Developing a set of scalable climate patternsrepresenting the 22 GCM simulations that contributedto the 4th IPCC assessment, all on a common grid of2.5°x3.75° spatial resolutions. This was a major pieceof work as all GCMs do not use a common grid andthere are many complexities e.g. resolving issuesrelating to coastal grid squares.

7.2.6 Step 6These patterns together with JULES and IMOGENmodelling systems were ported to a custom virtualimage, CBL-Imogen, based on the CloudBioLinuxvirtual image.

The Amazon EC2 platform was selected as the biggestsupporter of open source software. There are costsassociated with the use of the Amazon cloud but thecomputer usage would have exceeded those providedfor academic use and there is a cost verse time trade-off. Benefits relative to the use of HPC is greateraccessibility and flexibility.

CloudBioLinux was used to enable the efficientutilisation of the Amazon cloud resources and exploit

the extensive experience from the bioinformaticscommunity using this cloud-managing interface.CloudBioLinux incorporates CloudMan, allowing it tobe deployed as clusters of compute nodes. Thebenefits are an accessible and reproducible cloudenvironment that enables decentralisation of servicesand realises a scalable model. A fully functionalcomputer cluster is created on demand that can bescaled depending on the requirements defined by theuser (fast but expensive or slow and cheaper).Computers are run in parallel thus increasing speed.Once the model runs are complete, the cluster createdterminates and the instances are stoppedautomatically.

All code is open-course and available in github :

● Imogen Infrastructure Portal:https://github.com/afgane/imogen

● GCM analogue portal: https://github.com/afgane/ghem

● CloudBioLinux: https://github.com/chapmanb/cloudbiolinux

7.2.7 Step 7Unlike the other EVO exemplars, a free and opensource post-processing tool called GrADS to createmaps from model output was used in the cloud. This isimportant for cloud computing as licensing may restrictthis kind of deployment. This is also a major step forthe climate change modelling scientists who havetraditionally used expensive proprietary softwareapproaches. Unfortunately, resources were notavailable to fully implement this in a dynamic way butthe principal has been demonstrated.

Figure 7.4 Access data (for demonstration purposes data is currently provided from the library).

Figure 7.5 Mean soil carbon estimate (kgC/m2), the standard deviation from the ensemble of the 22 GCManalogue models, the minimum and the maximum. Maps indicate the impact of uncertainty in climate projectionsfor soil carbon across the Amazon, Central and West Africa and SE Asia.

Figure 7.6 Mean net primary productivity (kgC/m2/year), the standard deviation from the ensemble of the 22 GCManalogue models, the minimum and the maximum. Again, maps indicate the impact of uncertainty in climateprojections lie primarily in the Amazon, Central and West Africa and SE Asia.

Env

ironm

enta

l Virt

ual O

bser

vato

ryw

ww

.evo

-uk.

org

76

7.2.8 Step 8A series of output maps for emailing to users weredesigned. This is a cost-effective way of deliveringresults as maps and parameters are saved, but not thedata. This is because storing large datasets in cloudand direct download off the cloud incur an ongoingcost but dispatching over e-mail is free (Figure 7.5).

7.3 Unique selling points of the globalexemplar

The following aspects have been tested / explored inthe EVOp Global exemplar:

● Cloud-enabling of a large, complicated model toincrease accessibility and flexibility in use ofcomputer resources with major reductions in runtime from 66 days to 3 days.

● Demonstration of value of use of more commercialcloud providers. Usage would have exceededpublic/academic resources and greater flexibilityand accessibility than HPC solution.

● Cloud-based web portal using a solution whichminimised authentication and verification needs,being the closest to our ideal user scenario.

● Cloud-based post-processing, non-proprietarymap visualisation tool.

● Autoscable use of computing resources for costefficient use of computing time.

● The benefits of using Open source solutions withall code developed for the project available on-line..

7.4 Future funding / next steps.Funding to cloud-enable the JULES model under theNERC Big Data initiative has been secured. This buildson the interest raised through the success of this EVOglobal exemplar. Once completed, this new project willprovide greater accessibility to this important modelbeyond its primary land-atmosphere community.

7.5 ReferencesClark, D.B. et al., 2011. The Joint UK LandEnvironment Simulator (JULES), model description -Part 2: Carbon fluxes and vegetation dynamics.Geoscientific Model Development, 4(3): 701-722.

Cox, P.M., Huntingford, C. and Harding, R.J., 1998. Acanopy conductance and photosynthesis model foruse in a GCM land surface scheme. Journal ofHydrology, 212(1-4): 79-94.

Gordon, C. et al., 2000. The simulation of SST, sea iceextents and ocean heat transports in a version of theHadley Centre coupled model without fluxadjustments. Climate Dynamics, 16(2-3): 147-168.

Huntingford, C. et al., 2010. IMOGEN: an intermediatecomplexity model to evaluate terrestrial impacts of achanging climate. Geoscientific Model Development,3(2): 679-687.

Huntingford, C. and Cox, P.M., 2000. An analoguemodel to derive additional climate change scenariosfrom existing GCM simulations. Climate Dynamics,16(8): 575-586.

Huntingford, C., Cox, P.M. and Lenton, T.M., 2000.Contrasting responses of a simple terrestrialecosystem model to global change. EcologicalModelling, 134(1): 41-58.

Environm

ental Virtual O

bservatoryw

ww

.evo-uk.org

77

The Pilot Environmental Virtual Observatory, EVOp, was a proof of concept project to develop new cloud-based applications for accessing, interrogating, modellingand visualising environmental data, by developing local and national scale exemplars. EVOp has demonstrated how cloud technologies can make environmentalmonitoring and decision making more efficient, effective and transparent to the whole community.

www.evo-uk.org www.evo-uk.org

This work was supported by the Natural Environment Research Council pilot project,Environmental Virtual Observatory (NE/I002200/1).