SLIDE 1 Application Scheduling · Richard Lagerstrom · CUG 2003 / Columbus, Ohio, USA · 6/2/03
Application Scheduling
Richard Lagerstrom
15 MAY 2003
SLIDE 2
Scheduling Hierarchy and Scope
| Name | Scope | Example |
|---|---|---|
| Grid | Global | Globus |
| Batch | Organizational, Departmental, or Cluster | PBS Pro |
| Placement | Single System, Multinode | PScheD |
| Process | Single Node, Multiprocessor | UNICOS/mp |
SLIDE 3
Functional Organization
[Diagram: PBS Pro components pbs_server, pbs_sched, and pbs_mom; UNICOS/mp components PScheD and Kernel]
SLIDE 4
History
• Psched was ported from Cray T3E
• Enhanced to do initial placement
• Modified to support multi-CPU nodes
• User and admin. displays through psview
• More displays with apstat
• Cray X1 kernel cannot initiate applications without the assistance of psched
SLIDE 5
Introduction
• Placement strategies
• Placement requirements
• Starting an application
• Gang scheduling
• Migration
• The PBSpro interface
SLIDE 6
Placement Strategies
Many configurable options
• Equalize node workload
• Minimize node fragmentation
• Maximize processor utilization
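The first two strategies pull in opposite directions, which a small sketch can make concrete. The function names, the list of free MSPs per node, and the selection rules below are illustrative assumptions; they are not psched's actual configuration options or algorithm.

```python
from typing import List, Optional


def pick_least_loaded(free_msps: List[int], need: int) -> Optional[int]:
    """Equalize workload: choose the node with the most free MSPs."""
    candidates = [i for i, free in enumerate(free_msps) if free >= need]
    return max(candidates, key=lambda i: free_msps[i]) if candidates else None


def pick_best_fit(free_msps: List[int], need: int) -> Optional[int]:
    """Minimize fragmentation: choose the node that leaves the fewest MSPs idle."""
    candidates = [i for i, free in enumerate(free_msps) if free >= need]
    return min(candidates, key=lambda i: free_msps[i] - need) if candidates else None


# Hypothetical free-MSP counts for five application nodes.
free = [4, 1, 2, 3, 0]
print(pick_least_loaded(free, 2))  # 0: the emptiest node
print(pick_best_fit(free, 2))      # 2: the node the request fills exactly
```

The same request lands on different nodes depending on which goal is configured, which is why the choice is left to the site.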
SLIDE 7
Placement Requirements
• Power-of-2 MSP/SSP per node
• Memory loaded when executing
Accelerated applications need:
• Global address space ID (GASID) for off-node accelerated memory references
• Node contiguity
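Two of these requirements, the power-of-2 processor count per node and node contiguity, are easy to express as checks. The sketch below is only an illustration: the function names and the boolean free-node list are assumptions, not part of psched.

```python
from typing import List, Optional


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... (a single set bit)."""
    return n > 0 and (n & (n - 1)) == 0


def first_contiguous_run(node_free: List[bool], count: int) -> Optional[int]:
    """Return the index of the first run of `count` adjacent free nodes, or None."""
    if count < 1:
        raise ValueError("count must be at least 1")
    run_start, run_len = 0, 0
    for i, free in enumerate(node_free):
        if free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == count:
                return run_start
        else:
            run_len = 0
    return None


# Hypothetical node map: node 1 is busy, so three adjacent nodes start at index 2.
print(is_power_of_two(4), is_power_of_two(3))                         # True False
print(first_contiguous_run([True, False, True, True, True], 3))       # 2
```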
SLIDE 8
Six-node Example
• Each node has 4 MSPs and 16GB memory
• Five with application flavor
• One with operating system and support flavors
[Figure: six nodes, five labeled App and one labeled OS/SUP]
SLIDE 9
Application Mapping
How many PEs are allocated to a node?
• User option to choose PEs/node
• Psched will pick a mapping by default
• Memory usage per PE is the major reason for user-specified mapping
SLIDE 10
Mapping Examples
-n 10 -N 4
-n 10 -N 2
-n 5 -N 1
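The node counts behind these three examples can be worked out with a few lines of arithmetic. This sketch assumes that -n is the total PE count, -N is the PEs per node, and that nodes are filled in order with the remainder on the last node. The fill order is an assumption for illustration; the slide figures that showed the actual layouts did not survive.

```python
from typing import List


def map_pes(total_pes: int, pes_per_node: int) -> List[int]:
    """Return the number of PEs placed on each node, filling nodes in order."""
    if total_pes < 1 or pes_per_node < 1:
        raise ValueError("PE counts must be positive")
    full_nodes, remainder = divmod(total_pes, pes_per_node)
    return [pes_per_node] * full_nodes + ([remainder] if remainder else [])


print(map_pes(10, 4))  # [4, 4, 2]        -> 3 nodes
print(map_pes(10, 2))  # [2, 2, 2, 2, 2]  -> 5 nodes
print(map_pes(5, 1))   # [1, 1, 1, 1, 1]  -> 5 nodes
```

Under these assumptions all three requests fit within the five application nodes of the six-node example, but a lower -N spreads the PEs out and leaves each one a larger share of the node's 16 GB.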
SLIDE 11
Posting an Application
Support node phase
• Run aprun -n x -N y a.out
• aprun checks for option errors
• aprun sends an RPC request to psched to post the app for placement
• aprun waits for a signal to continue
• Psched gets PBSpro queue limits
• Psched creates an apteam and joins aprun
SLIDE 12
Placing an Application
• Psched generates placement information
• Psched sends placement information to the kernel
• On the next time slice psched sends a start signal to aprun to exec() PE 0 of the app
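The post, wait, and start-signal handshake between aprun and psched can be mimicked with two threads. This is only a toy model: a queue stands in for the RPC request, a threading.Event stands in for the start signal, and the function names and log strings are invented for the illustration.

```python
import queue
import threading
from typing import List


def psched(requests: queue.Queue, log: List[str]) -> None:
    """Stand-in for psched: take one posted app, place it, signal aprun."""
    app, start_signal = requests.get()      # the posted request
    log.append(f"psched: placed {app}")     # placement info would go to the kernel here
    start_signal.set()                      # the start signal to aprun


def aprun(app: str, requests: queue.Queue, log: List[str]) -> None:
    """Stand-in for aprun: post the app, wait for the signal, then exec PE 0."""
    start_signal = threading.Event()
    log.append(f"aprun: posted {app}")
    requests.put((app, start_signal))
    start_signal.wait()                     # aprun waits for a signal to continue
    log.append(f"aprun: exec PE 0 of {app}")


def run_demo() -> List[str]:
    requests: queue.Queue = queue.Queue()
    log: List[str] = []
    scheduler = threading.Thread(target=psched, args=(requests, log))
    launcher = threading.Thread(target=aprun, args=("a.out", requests, log))
    scheduler.start()
    launcher.start()
    launcher.join()
    scheduler.join()
    return log


for line in run_demo():
    print(line)
```

The point of the ordering is that aprun never runs the application until placement has been decided and handed to the kernel.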
SLIDE 13
Application Startup
Application node(s) phase
• Execution begins in startup() which sets up the shared memory environment
• All PEs of the app are created with a placed fork()
• App execution begins in main()
SLIDE 14
Running an Application
User types aprun -n20 a.out
[Flow diagram labels: RPC request; Wait request; psched (steps 1, 2, 3); kernel - apteamctl; Exec PE 0; Move to app node[0]; Startup → app runs]
SLIDE 15
Application Execution
• Apps are time sliced by psched
• Memory of inactive apps may page out
• Memory of active apps is locked in
SLIDE 16
Gang Scheduling
Five gangs – three parties
Three time slice example
SLIDE 17
Gang Scheduling 1
Five gangs – three parties
First time slice
SLIDE 18
Gang Scheduling 2
Five gangs – three parties
Second time slice
SLIDE 19
Gang Scheduling 3
Five gangs – three parties
Third time slice
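Since the figures for the three time slices are lost, a small sketch may help show how five gangs can end up in three parties. It assumes that a party is the set of gangs that run together in one time slice, that parties are formed first-fit by processor count alone, and that parties rotate round-robin. The gang names and sizes are hypothetical, and real placement also has to respect requirements such as node contiguity.

```python
from typing import Dict, List


def form_parties(gang_sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Pack gangs first-fit into parties whose total size fits the machine."""
    parties: List[List[str]] = []
    loads: List[int] = []
    for name, size in gang_sizes.items():
        if size > capacity:
            raise ValueError(f"gang {name} does not fit on the machine")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                parties[i].append(name)
                loads[i] += size
                break
        else:  # no existing party had room
            parties.append([name])
            loads.append(size)
    return parties


def active_party(parties: List[List[str]], time_slice: int) -> List[str]:
    """Round-robin: each time slice runs the next party in turn."""
    return parties[time_slice % len(parties)]


# Hypothetical gang sizes on 5 application nodes x 4 MSPs = 20 MSPs.
gangs = {"A": 12, "B": 8, "C": 16, "D": 4, "E": 20}
parties = form_parties(gangs, capacity=20)
print(parties)                   # [['A', 'B'], ['C', 'D'], ['E']]
print(active_party(parties, 3))  # ['A', 'B'] (the rotation wraps around)
```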
SLIDE 20
Application Migration
• A target place list is generated
• The app is disconnected to stop execution and unlock its memory pages
• The target place list is given to the kernel
• The app is connected
• Memory pages are moved from the origin nodes or disk to the target nodes
• Execution begins on the target nodes
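The order of these steps matters: the app must be stopped and its pages unlocked before the kernel receives the new place list. The sketch below walks through the same sequence on a toy record. The App class, its fields, and the history strings are all invented for the illustration and do not reflect psched's or the kernel's data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class App:
    nodes: List[int]
    connected: bool = True
    pages_locked: bool = True
    history: List[str] = field(default_factory=list)


def migrate(app: App, target_nodes: List[int]) -> None:
    """Apply the migration steps in order to a toy application record."""
    target_place_list = list(target_nodes)   # a target place list is generated
    app.connected = False                    # disconnect: execution stops
    app.pages_locked = False                 # ... and memory pages are unlocked
    app.history.append("disconnected")
    kernel_place_list = target_place_list    # the list is given to the kernel
    app.history.append("place list sent")
    app.connected = True                     # the app is connected again
    app.history.append("connected")
    app.nodes = kernel_place_list            # pages move to the target nodes
    app.pages_locked = True                  # active apps have memory locked in
    app.history.append("pages moved")
    app.history.append("running")            # execution begins on the targets


app = App(nodes=[0, 1])
migrate(app, [2, 3])
print(app.nodes)    # [2, 3]
print(app.history)
```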
SLIDE 21
Application Exit
• Each PE exits
• PE 0 waits for all other PEs to exit
• When PE 0 exits the kernel detects the PE count is zero
• The kernel sends psched the app exit signal
• Psched deletes the kernel’s apteam entry and its internal information about the app
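The exit rule, where PE 0 leaves last and the exit signal is raised once the PE count reaches zero, can be modeled with a simple counter. The ApTeam class below is a hypothetical stand-in named after the apteam mentioned above; it is not the kernel's structure.

```python
class ApTeam:
    """Toy model of an application's PEs as tracked for exit."""

    def __init__(self, pe_count: int) -> None:
        self.live = set(range(pe_count))
        self.exit_signalled = False

    def pe_exit(self, pe: int) -> None:
        # PE 0 waits for all other PEs to exit before it exits itself.
        if pe == 0 and len(self.live) > 1:
            raise RuntimeError("PE 0 must wait for all other PEs to exit")
        self.live.remove(pe)
        if not self.live:
            # PE count is zero: this is where psched would get the exit signal.
            self.exit_signalled = True


team = ApTeam(pe_count=3)
team.pe_exit(2)
team.pe_exit(1)
print(team.exit_signalled)  # False: PE 0 is still alive
team.pe_exit(0)
print(team.exit_signalled)  # True
```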
SLIDE 22
PBSpro/Psched Interface
[Diagram labels: PBS; psched; Resource and Usage information; Application queue limits; Limits database]
SLIDE 23
End