the UM with Rose/cylc NCAS Introduction to...

81
University of Reading, 5 May 2017 NCAS Introduction to Running the UM with Rose/cylc Part 1: Overview

Transcript of the UM with Rose/cylc NCAS Introduction to...

Page 1: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 1: Overview

Page 2: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Contents

• Repositories

• The UMUI versus Rose & Cylc

2

Page 3: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

UM code (pre version 9.0)

• Master repositories inside Met Office.

• Local repositories on PUMA and other sites with:

– A copy of code releases (vn8.1, vn8.2 etc).

– Local user code developments.

– Some user developments from other repositories.

• Manual intervention required to share code releases &

developments.

3

Page 4: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Repository (pre version 9.0)

4

Make code changes

Extract code intoUMUI job

PUMA

Local UM repository

Make code changes

Extract code intoUMUI job

Met Office

Local UM repository

Local UM repository

Make code changes

Extract code intoUMUI job

BoM

Local UM repository

Make code changes

Extract code intoUMUI job

KMA

Page 5: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

MOSRS

Met Office Science Repository Service

• Common FCM repository for all UM users from vn10.0.

• Used for other Met Office codes (e.g. JULES, MONC) and to

document collaborative projects (e.g. UKESM, GMED):

https://code.metoffice.gov.uk/trac/home/wiki/ProjectList

Working with MOSRS:

• Users access Trac system (wiki and tickets) via web.

• Code changes are made by remotely checking out and committing

to MOSRS.

• Sites have local read-only mirrors, used by suites when extracting

code (faster than accessing MOSRS).

5

Page 6: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

MOSRS

6

Met Office Shared Repositories

(UM, JULES, etc)

PUMA

Local Mirror of Shared

Repositories

Make code changes

Extract code intoRose suite

Updated every 5 mins

Make code changes

Extract code intoRose suite

Met Office

Local Mirror of Shared

Repositories

Page 7: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Using MOSRS

Register for an account via NCAS-CMS or group sponsor (MOAP, JWCRP etc).

Need to set up password caching to read & write to repos.

To check out code use .x: fcm co fcm:um.x-br/dev/...

Suite extracts from the mirror (.xm): fcm:um.xm-tr

7

Page 8: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Code developments

Non-Met Office users can now directly submit changes intended for the trunk.

Need to follow very specific procedure, which includes:

• Liaising with code owner• Documenting changes & testing in a code ticket on MOSRS • Running rose-stem developer tests (on MONSooN or

colleague may run inside Met Office).

Further details:

https://code.metoffice.gov.uk/trac/um/wiki/working_practices

Submission schedule and release deadlines:

https://code.metoffice.gov.uk/trac/um#ReleaseSchedule

8

Page 9: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Contents

• Repositories

• The UMUI versus Rose & Cylc

9

Page 10: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

The UMUI

• Each site has its own UMUI database (e.g. PUMA, Met Office)

• UMUI release for each UM version.

• Contains logic to create namelists and submission scripts.

• Issues & limitations of the UMUI:

– Jobs not under version control.

– Manual copying required to share jobs between sites.

– No direct mapping between UMUI windows and namelist

entries.

– Users cannot make changes to UMUI windows or logic

(“hand-edit” scripts are run on generated files).

10

Page 11: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rose and cylc

• From UM vn9.0 the UMUI is replaced by Rose.

• Rose is a set of tools for running & managing scientific applications (not specific to the UM).

• “Jobs” are replaced by “suites”.

• Tightly integrated with Cylc & FCM.

– Cylc is a powerful workflow management and scheduling system.

– FCM is the code management system (version control & build).

11

Page 12: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

UM software (pre vn9.0)

12

UMUIDatabase of user jobsGraphical job editor

ReconfigurationPrepares initial model state

Output file toolsData processingAnalysis and visualisation

PUMA

HPC

Data analysis platform

Input file toolsPrepare ancillary data

FCMCode manager

Compilation and build

ArchivingMoves data

OASISCoupler

NEMO/CICEOcean & sea-ice

Atmosphere modelDynamical corePhysics Diagnostics (STASH)

JULES, SOCRATES, UKCA

IO server

Page 13: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

UM software (vn10.0 onwards)

13

UMUI RoseExperiment databaseApplication editor

ReconfigurationPrepares initial model state

Output file toolsData processingAnalysis / visualisation

PUMA

HPC

Data analysis platform

Input file toolsPrepare ancillary data

MOSRS

FCMCode/suite manager

Compilation and build

Cylc Workflow manager & job submission

OASISCoupler

NEMO/CICEOcean & sea-ice

Atmosphere modelDynamical corePhysics Diagnostics (STASH)

JULES, SOCRATES, UKCA

IO serverXIOSIO server

PostprocData processing & archiving

Page 14: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rose suites

• Suites consist of a set of text files defining: namelists, metadata, application configuration & auxiliary data.

• Suite editor GUI displays only what is defined in files (based on metadata):

– Can add your own variables, metadata etc.

• Rose generates namelists & submission scripts only based on suite files (unlike the UMUI).

• Cylc allows you to define complex workflows.

• However… suites are somewhat stand-alone (individual suites need to contain ARCHER settings), and runtime settings can look quite different between suites.

14

Page 15: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rose/cylc tools

• Suite repository browser (rosie): – Suites are under version control– Global suite repository on MOSRS

• Suite editor (rose edit): – Searchable, undo button etc.

• Progress monitor/manager (gcylc): – Can pause, stop, restart and edit running suites.

• Output log viewer (rose-bush): – Web-based for PUMA-ARCHER suites

• Can use GUIs or command-line tools.

15

Page 16: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 2: Running the UM

Page 17: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Suite repository

Rose suites are held in a repository either locally (e.g. puma) or

on the MOSRS (‘u’).

• Suites are under version control.

• Suite-ids are a combination of letters and numbers e.g:

– u-aa003, u-al624 etc

(No concept of experiment groupings as with UMUI).

• To launch the graphical suite repository viewer:

rosie go &

• You can copy suites via the GUI or command-line tools:

rosie copy <suite-id>

17

Page 18: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rosie go

18

Double-click to check-out and edit or run a suite

Click to copy a suite

Add search terms

Page 19: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

A basic UM suite

A basic UM suite consists of the following tasks:

• Tasks may have different names in different suites. • More complex suites may have additional tasks, for e.g.

post-processing, archiving logs, testing etc.

19

fcm_make Code extract and mirror

fcm_make2 Code pre-processing and build

recon Reconfiguration

atmos Model run

Page 20: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Editing suites

To launch the Rose suite editor GUI, double-click a suite in the Rosie GUI.

Suites consist of a set of text files:

• Checked-out suites live in a directory: $HOME/roses/<suite-id>/

To launch the Rose editor GUI from the command-line: • Go to the suite file directory

• Run: rose edit &

Scientific settings look similar to the UMUI.

20

Page 21: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rose edit

21

Top-level suite settings

UM model settings

Code & build settings

Page 22: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Using the GUI

Features:

• Search facility (or can grep suite files).

• Undo/redo button.

• Highlighting of unsaved changes.

• Basic description of each field plus additional information

(only as useful as metadata provided!)

• Immediate type checking (where specified in metadata).

• Further checking with “Check fail-if/warn-if” macro.

22

Page 23: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Editing a UM suite

23

Undo / redo

Search

Error details

Total errors

Page 24: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Running suites

• ARCHER suites are submitted from PUMA.

• Monsoon suites are submitted from the Rose VM.

• Run from editor GUI or command line:rose suite-run

• Launches a GUI which displays progress of suite.

• Running suite is controlled through a daemon running on PUMA (or equivalent).

• Users can: – stop, pause, and restart suites– edit running suites– re-run parts that have failed.

24

Page 25: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

25

Rose suite-run

Page 26: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Output log files

Log files are copied to PUMA: ~/cylc-run/<suite-id>/log

Can view through rose-bush:

PUMA: http://puma.nerc.ac.uk/rose-bush/

MONSooN: Launch firefox on exvmsrose then go to:

http://localhost/rose-bush/

Note: Rose bush only shows files from last run (files from older runs are tarred up by default).

Log messages from the model are sent to job.out and job.err (equivalent of .leave files).

Other errors, e.g. submission failures and suite timeout messages are reported in: job-activity.log, suite/out, suite/err

26

Page 27: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

27

Rose bush

Page 28: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Output data

All Rose/cylc suite files are held in the directory:

~/cylc-run/<suite-id>/

• Note: This is a symlink to your $DATADIR (i.e. /work)

Model output and restart files are usually written to: share/data/History_data/

Other files may live under the task work directory, e.g: work/atmos/

19810901T0000Z/19811001T0000Z/

Build files go to:

share/fcm_make/

28

Page 29: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Terminology recap

29

Suite: Equivalent of a job

Rose: Application management system

Rosie: Suite repository manager

Rose edit: Application editor

Rose bush: Output log viewer

Cylc: Scheduling & workflow system

FCM: Code management & build system

Page 30: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 3: Version control and suite discovery

Page 31: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Suites and version control

• Suites are held in an FCM (subversion) repository:

– u : main repository on MOSRS

– puma : local repository for testing only

– mi: Met Office internal

• Suite-ids are prefixed with the repository id:

– u-aa003, puma-aa003, mi-aa003 are all different suites

• You use a mix of rosie and FCM commands to interact with the repository…

31

Page 32: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rosie checkout & copy

• You need to check out a suite to edit and run it.

– Double-click on the suite in rosie go

– On the command line: rosie checkout <suite-id>

• All checked out suites go to your ~/roses directory.

• Usually you want to copy to your own suite (so that you can save changes back to the repository):

– In rosie go right click and select “Copy Suite”

– On the command line: rosie copy <suite-id>

– This checks out the suite as well.

32

Page 33: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

FCM commits

• Any changes you make in the Rose editor are saved in your checked out working copy only.

• To save changes to the repository:

– cd ~/roses/<suite-id>

– fcm commit

• Useful commands:

– fcm status

– fcm diff

• There is a Trac system (with wiki & ticket system) for the suite repository so you can document changes (in the same way as for code development).

33

Page 34: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rosie suite browser

rosie go: Graphical suite browser

• By default lists your checked out suites only.

• Provides suite status information such as repository modifications, local changes etc.

• Search facility to browse all suites in the puma and u repos.

• Can copy, checkout, edit and run suites from here.

rosie ls : Lists checked out out suites on command-line

34

M u-aj789/trunk@32775 annetteosprey HighResMIP [Dev ARCHER] GC3.1 N512 ORCA025M u-ai656/trunk@28813 annetteosprey um_idl_LAM [Development] ENDGame RCE (ARCHER)

Page 35: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

35

Status of checked out suites

Ordered by suite owner

Page 36: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 4: Editing a UM suite

Page 37: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Basic UM suite tasks

A basic UM suite consists of the following tasks:

Task settings can be found under the app headings (fcm_make and um) in the GUI.

37

fcm_make Code extract and mirrorfcm_make app

fcm_make2 Code pre-processing and build

recon Reconfiguration um app

atmos Model run (with cycling)

Page 38: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Basic UM suite

38

Apps

Top-level suite settings

UM model settings

Page 39: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

39

Code changes

Add branch

Page 40: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Model changes

Similar to UMUI

Page 41: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Runtime settings

Model settings i.e. namelist entries are largely the same between suites as they use standard metadata.

However the higher level runtime settings can appear in different places depending on the suite. These include:

• Model run length, cycling, start date• Start file• Job time-limits and number of processors• Machine username and project code

There is no standard format for what appears at the top-level suite-conf in the GUI - this is suite dependent. You may need to edit the suite.rc file directly.

41

Page 42: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

GA7 suite

42

More apps

More options in top-level control

Page 43: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Date and time format

Rose and cylc use the ISO 8601 date format to specify the start date, run length and cycling periods (CRuns).

Format:

• Specific date & time: CCYYMMDDThhmmZ– E.g. 20160915T1430Z

• Time period: PnYnMnDTnHnMnS– E.g. P10Y6M, P3M, P1D, PT6H

In some suites (e.g. GC3) it may not be necessary to write ISO dates directly - they may be translated from a [Y, M, D, H, M, S] array.

43

Page 44: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Dates in a GA7 suite

44

Specific date

Time period

Page 45: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Exploring STASH

• STASH can look daunting:

– Each STASH request & profile is listed by a hash-index

(because of the way Rose processes namelists).

– Select the top-level menu item (e.g. “STASH Requests” or

“Domain Profiles”) for a summary table.

• Underneath the model is doing the same thing as before:

– STASH requests need a domain, time and usage profile.

– Need to consider output streams, file headers and

re-initialisation.

45

Page 46: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Adding STASH

To add a new STASH request:

• Select “New” from the summary page, then edit entry in table. • Just because a diagnostic is available doesn’t mean it works.

To add a new profile:

• Copy (“Clone”) an existing profile, give it a new name, and make at least one change to it.

Important:

• Run the STASH macro (TidyStashTransfrom) to generate indices for any new requests or profiles.

• Run the checker (Validate) macros to verify STASH items are available and are set up correctly.

46

Page 47: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

STASH requests table

47

Page 48: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

STASH macros

48

stash_testmask.STASHTstmskValidate Check STASH output is valid given science configurations

stash_indices.TidyStashValidate Checks if STASH and item namelists have the correct index.

stash_indices.TidyStashTransform Correct any namelist indexes. Run this when adding new STASH requests or profiles.

stash_indices.TidyStashTransformPruneDuplicated

As above, plus removes duplicate entries.

stash_requests.StashProfilesValidate Checks for problems with referencing between stash requests and profiles.

stash_requests.StashProfilesRemoveUnused

Removes any profile namelists which aren't being referenced by any stash requests.

Page 49: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

STASHmaster file

The STASHmaster file lives in the UM code trunk.

To make changes (for example if you are editing/adding a

diagnostic):

• Either edit in a branch or set your suite to use a local

STASHmaster file.

• See the UM Rose training on MOSRS for instructions:

https://code.metoffice.gov.uk/doc/um/latest/um-training/stashmaster.html

49

Page 50: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 5: Running and controlling suites

Page 51: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Running suites

To run a suite:

• Click the play button from the rosie or rose edit GUIs. • From the command-line:

– Navigate to the suite directory, e.g: cd ~/roses/u-aa774– Then: rose suite-run

There are many options to this command - some useful ones are listed in the following slides.

Rose/cylc provide powerful options for pausing, stopping & restarting suites.

51

Page 52: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rose suite-run

Useful options to rose suite-run:

52

--new-N

Delete any files from previous runs.

--restart Restart suite from where it stopped.

--reload Update suite definition for an already running suite

-v -v --debug Debug/get more info.

--no-gcontrol Do not launch cylc GUI (progress monitor).

--no-log-archive Don’t tar up (archive) log files from previous runs

--name=NAME-n=NAME

Give the suite a different name to the base directory.

Page 53: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rose suite-run

Options that allow you to see what the suite does without

actually running - useful for complex suites:

53

-- --hold Hold (pause) suite on startup.

-- --mode=simulation Perform a dummy run (executes all tasks as 'sleep 10').

--install-only-i

Construct app files & copy over to remote hosts but don’t run

--local-install-only -l

Construct app files but don’t copy over to remote host.

( cylc options )

Page 54: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

When a suite is launched

When you launch a suite:

• The run scripts and application files (e.g. namelists) are

copied to the cylc-run directories on the required machines.

• A suite manager process is launched on the suite host (PUMA

or the Rose VM).

• Cylc keeps track of the progress of running tasks.

• Cylc submits tasks according to dependencies defined in the

suite (e.g. build, then reconfiguration, then atmosphere

model).

• On PUMA cylc monitors remote (e.g. ARCHER) tasks by

polling every 5 minutes.

54

Page 55: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Cylc GUI

The Cylc GUI (gcylc) is launched automatically when you submit a suite.

• You can safely shutdown the GUI and log out of PUMA without affecting the suite.

• To relaunch the GUI, go to roses directory and run:

rose sgc or rose suite-gcontrol

The Cylc GUI allows you to:

• View the task dependencies in your suite • Monitor suite progress and view log files• Kill, re-run and change the status of tasks• Pause, release, and shutdown the suite

55

Page 56: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Graph view

Ungroup tasks Succeeded

Submitted (queuing)

Running

Waiting (for other tasks to complete)

Page 57: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Right click on task row

Page 58: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Reloading a suite

• You can make changes to a suite whilst it is running by

reloading the suite definition.

rose suite-run --reload

• Warning: if you try to reload through the Cylc GUI - it will

only pick up changes to the suite.rc file & not anything else.

• You can then re-trigger failed tasks.

• Note: the reload won’t affect any tasks that have already

been submitted. You will have to kill & re-trigger to run with

the new changes.

58

Page 59: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Right click

Page 60: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Suite lifecycle

• Normally a suite will shutdown if it completes successfully.

• Usually if the suite fails, it will remain active until it times out (the

timeout period is defined in the suite.rc file).

• To see what suites you have running:

– rose suite-scan (command line listing)

– cylc gscan (graphical view)

• Whilst a suite is active you can view it through the cylc GUI:

– rose sgc

• Once it has stopped, you will have to look at rose-bush or the

cylc-run directory to see whether it succeeded.

• Stopped suites can be restarted from where they left off.

60

Page 61: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Restarting a suite

Restarting a suite loads the state of a previous run, and allows you to continue on from that point, in order to:

• Fix a failed suite (without starting again from the beginning).

• Continue the run for longer.

Warning: note the difference between:

rose suite-run --restart : re-install & restart suite

• Use to update suite with any changes e.g. run length. • Usually what you want to do.

rose suite-restart : restart without re-installing suite

• Will not pick up any suite changes. 61

Page 62: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

When a suite restarts

• Rose/cylc checks the task states are all correct:

– Because the suite can be shutdown or crash whilst the

tasks are still running.

• If the previous run succeeded:

– It submits from where it left off (continuing a long run).

• If the suite had failed tasks:

– It leaves it as it is, allowing you to make any necessary

fixes.

– Then manually re-trigger failed tasks and the suite will

continue on as normal.

62

Page 63: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Stopping a suite

To stop a running suite:

• Click the stop button in the GUI • rose suite-shutdown or rose suite-stop

Different modes of shutdown:

From the GUI select “Control -> Stop suite” to access options.

63

(default) stop the suite after current active tasks have finished

-- --now stop immediately, orphaning current active tasks

-- --kill stop after killing current active tasks

Page 64: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Further information

• For more information on the command line tools (for fcm, rose and cylc) use ‘help’:

rose help <command>rose <command> --helprose <command> -h

• There is a lot more that can be done with Rose and cylc, e.g. setting up complex workflows; adding new tasks at run-time; configuring how the suite is visualised; writing your own macros etc. See the documentation for details:

http://cylc.github.io/cylc/html/single/cug-html.htmlhttp://metomi.github.io/rose/doc/rose.html

64

Page 65: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 6: Rose/cylc suite files

Page 66: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Understanding Rose suites

A Rose suite consists of:

• Suite definition: – Describes tasks to be run and in what order – Describes where and how to run tasks

• App definition: – Application settings such as input files, namelists etc

• Metadata: – Defines how settings are displayed in the GUI. – Can also provide help, logic, macros.– The UM picks up central metadata.

66

A task can be an application (app) such as the UM which is defined in the suite.

Page 67: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

UM suite directory

A basic UM suite directory structure:

roses/u-aa774/

rose-suite.conf

rose-suite.info

suite.rc

app/

fcm_make/

rose-app.conf

file/

fcm-make.cfg

um/

rose-app.conf

meta/

rose-meta.conf

67

Suite metadata

UM application settings: - namelists, environment

variables, file locations

Build settings:- code branches,

platform configuration

Suite definition & settings

Page 68: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Suite files

Top-level suite files:

Application files (under app/.../):

68

suite.rc Workflow & task definitions (cylc)

rose-suite.info Rosie suite information (Rose)

rose-suite.conf Suite settings (Rose)

meta/rose-meta.conf Suite metadata (Rose)

rose-app.conf Application settings (Rose)

file/fcm-make.cfg Build settings (FCM)

Page 69: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Suite level:

Application level:

Additional suite files

69

site/<machine>.rc Machine specific settings to be included in suite.rc file (Cylc).

opt/rose-app-<name>.conf Optional settings to be applied to application (Rose).

meta/lib/ etc/

Custom macros.

bin/ file/

Other (non Rose/cylc format) files required by suite.

Page 70: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Optional configurations

70

Click to view optional configuration overrides

Page 71: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Editing Rose suites

• The rose suite files are plain text so can be edited directly.

– But it is usually easier to use the GUI because of the extra

information from the metadata.

• The GUI just displays settings defined in these files according

to the metadata.

• However: The you cannot edit the suite.rc file in the GUI.

This defines the workflow (list of tasks & dependencies), and

job submission details (hostname, project-code etc).

• Warning: There is no standard form for the suite.rc files. So they can look different between suites.

71

Page 72: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

suite.rc

Cylc suite configuration file (extended INI format), containing:

• Suite “workflow” in terms of: – Tasks (e.g. compilation, reconfiguration, and atmosphere

run).– Dependencies between tasks (the order in which they

should run) and cycling.

• Task definitions: – Each task runs a command or an app. – Need to specify where and how tasks are run (e.g.

ARCHER/Monsoon, job time limit, project code etc).

• Suites can use inheritance and templating (though Jinja) to generate highly complex workflows.

72

Page 73: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Simple workflow

73

# suite.rc[ scheduling ] initial cycle point = 18010101T0000

final cycle point = 18020101T0000

[[ dependencies ]] [[[ R1 ]]]

graph = “fcm_make => \ fcm_make2 => \ recon => atmos” [[[ P3M ]]] graph = “atmos[-P3M] => atmos”

(Courtesy of Scott Wates)

Describes when tasks are run.

Cycling for long model runs.

Run once - start up tasks

Page 74: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

74

• Complex workflow from NEMO data assimilation suite.

• Tasks are grouped.

Page 75: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

75

Task definitions

[runtime] ...

# Atmosphere Model Run [[atmos]] inherit = XC30 [[[directives]]] -l select=1 -l walltime=00:20:00{% if HPC_QUEUE is defined %} -q = {{ HPC_QUEUE }} {% endif %} [[[environment]]] UM_INSTALL_DIR = /work/y07/y07/umshared ROSE_TASK_APP = um ASTART=../recon/atmos.astart FLUME_IOS_NPROC = 0 OMP_NUM_THREADS = 1 MPI_TASKS_PER_NODE = 24 TASKS_PER_NUMA = 12 TOTAL_MPI_TASKS = 24 ROSE_LAUNCHER_PREOPTS="""-ss -n$TOTAL_MPI_TASKS -N $MPI_TASKS_PER_NODE -S $TASKS_PER_NUMA -d $OMP_NUM_THREADS"""

Definition for each task

Inherit settings

Jinja code

Variable set in rose-suite.conf

PBS header

App to run

aprun settings

Page 76: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

rose-suite.conf

Generally used to set top-level variables used in suite.rc file.

76

[jinja2:suite.rc]BUILD=trueHPC_ACCOUNT='n02-cms'HPC_HOST='login.archer.ac.uk'HPC_QUEUE='short'HPC_USER='annette'RECON=trueVN='10.5'

[[XC30]]...

[[[directives]]] -W umask = 0022 -A = {{ HPC_ACCOUNT }} [[[job submission]]] method = pbs [[[remote]]] host = {{ HPC_HOST }}{% if HPC_USER is defined %} owner = {{ HPC_USER }}{% endif %}

rose-suite.conf suite.rcApply these variables to suite.rc file

Page 77: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

University of Reading, 5 May 2017

NCAS Introduction to Running the UM with Rose/cylc

Part 7: Summary

Page 78: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Rosie checkout v copy

78

Suite repository PUMA directory

rosie copy● Creates a new

identical item in the repository.

● And pulls files to local directory.

u-ab123~/roses/

u-ab123/

u-ab123~/roses/

u-ab456 u-ab456/

rosie checkout

● Pulls files into local directory.

● Does not alter repository

Double click in rosie go

Page 79: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

roses v cylc-run directories

79

PUMA ARCHER

Checkout/copy suite →● Creates ~/roses/<suite-id>/ ● Make changes to these files

Submit suite →● Creates ~/cylc-run/<suite-id>/ ● Creates ~/cylc-run/<suite-id>/

● This is a symlink to /work

● This includes suite files (from roses) plus○ Output from tasks that run

on PUMA (fcm_make)

● This includes suite files plus○ Output from ARCHER

tasks (fcm_make2, recon, atmos) in share/ work/

○ Logs from all tasks (copies logs from ARCHER)

○ Suite logs

○ Logs (job.out/job.err) from ARCHER tasks under log/

Page 80: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

Common issues

• Be careful about editing suite files with the rose editor GUI open:

– If you run the suite through the GUI or save the suite, any

changes made outside the rose editor will be lost.

• Be aware of optional configurations:

– You can’t make changes to these through the editor, but you

can view them by clicking the icon.

• The cylc_runs/ directory can fill up fast on PUMA:

– These can usually be deleted as the logs are also on the HPC.

• Mail event notifications on PUMA and rose host-select for

ARCHER require some setup.

– See: http://cms.ncas.ac.uk/wiki/RoseCylc

80

Page 81: the UM with Rose/cylc NCAS Introduction to Runningcms.ncas.ac.uk/documents/training/RoseMay2017/UM... · 2017. 5. 5. · UM code (pre version 9.0) • Master repositories inside Met

CMS Rose/cylc setup and useful information:

http://cms.ncas.ac.uk/wiki/RoseCylc

UM code and documentation papers:

https://code.metoffice.gov.uk/trac/um

Documentation:

• Rose: http://metomi.github.io/rose/doc/• Cylc: http://cylc.github.io/cylc/html/single/cug-html.html• FCM: http://cylc.github.io/cylc/html/single/cug-html.html

Further information

81