Software and Services Group
Execution Frontiers: CnC support for highly adaptive execution
Kath Knobe, Intel
12/07/12
Warning
• This is all high-level conceptual thinking
• Many details to be determined
• Today: just the basic idea without any concern for efficiency
• Lots of room for optimizing
Suggestions/comments more than welcome!
Motivation: Highly adaptive computing for exascale
Critical exascale issues (inspired by work on UHPC and X-Stack) require the ability to move currently executing parts of the app to another place in the platform or to a later time.
• Resilience
  − Fragile components
  − Lots of them
• Power management
  − Power components off
  − Power components down
• Self-aware computing
  − Modify mapping based on feedback
• Change of goals
  − Between power and time to solution, for example
Thesis: management of the execution frontiers in CnC is a mechanism supporting highly adaptive computing for exascale.
Checkpoint/restart
Hierarchical CnC
Hierarchical checkpoint/restart
Hierarchical checkpoint/restart for adaptive execution
2 passes:
− Abstract: unlimited resources
− Actual: with resource constraints
For faults
Outline
• Abstract (platform has infinite memory and processors)
  − Semantic state
  − Checkpoint/restart
  − Hierarchical CnC
  − Hierarchical checkpoint/restart
• Actual (with resource constraints)
• Beyond faults
Semantics / execution model
[Diagram: attribute propagation among item avail, tag avail, step controlReady, step dataReady, step ready, and step executed]
The primitive attributes come from below: available, executed.
The derived attributes propagate at this level: control_ready, data_ready, ready.
2 levels:
• Graph level (above)
• User serial code level (below)
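The derived attributes can be sketched in a few lines of Python. This is a minimal illustration, not the CnC runtime: the `StepInstance` class, its fields, and the rule that a step is data-ready once every item it reads is available are assumptions made for the example. The slide itself only names the attributes.

```python
from dataclasses import dataclass


@dataclass
class StepInstance:
    """Hypothetical record for one step instance (names are illustrative)."""
    tag: tuple               # control tag that prescribes this instance
    inputs: frozenset        # keys of the items the step will get
    executed: bool = False   # primitive: reported by the serial code below


def control_ready(step, avail_tags):
    # Derived: the prescribing tag is available.
    return step.tag in avail_tags


def data_ready(step, avail_items):
    # Derived: every input item is available (assumed rule).
    return step.inputs <= avail_items


def ready(step, avail_tags, avail_items):
    # Derived: both control-ready and data-ready.
    return control_ready(step, avail_tags) and data_ready(step, avail_items)


step = StepInstance(tag=("trisolve", 1, 0), inputs=frozenset({("A", 0, 1, 0)}))
tags = {("trisolve", 1, 0)}
items = set()
before = ready(step, tags, items)   # control-ready only
items.add(("A", 0, 1, 0))
after = ready(step, tags, items)    # control-ready and data-ready
print(before, after)
```

Only `available` and `executed` are set from outside; everything else is recomputed from them, which is the split between primitive and derived attributes.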
Execution frontier
• An execution frontier is a CnC program state:
  − The set of attributes of instances of steps, tags and items
  − The contents of available items
• CnC execution can proceed from an execution frontier
• Some examples of execution frontiers:
  − Normal program input (set of available items and tags)
  − Normal program output (set of available items and tags)
  − Any state during execution (more general)
• Perspective
  − Traditional focus:
    > Data structure is items; computation is step.
    > A step instance consumes and produces items.
  − Alternate view:
    > Data structure is execution frontier; computation is step, subgraph or full program.
    > Applying a computation to an execution frontier yields another execution frontier.
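The alternate view, where a computation maps one execution frontier to another, might look like the following sketch. The `Frontier` type and the `apply_step` helper are hypothetical names introduced here for illustration; a real frontier would carry more attributes than these three fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frontier:
    """Hypothetical execution frontier: attributes plus contents of available items."""
    avail_tags: frozenset
    contents: tuple       # sorted (key, value) pairs of the available items
    executed: frozenset   # step instances that have executed


def apply_step(frontier, step_id, reads, compute):
    """Apply one step to a frontier and return the resulting frontier.

    `compute` takes the values read and returns a dict of newly produced items.
    """
    values = dict(frontier.contents)
    produced = compute(*(values[k] for k in reads))
    values.update(produced)
    return Frontier(
        avail_tags=frontier.avail_tags,
        contents=tuple(sorted(values.items())),
        executed=frontier.executed | {step_id},
    )


# Program input is itself a frontier: available tags and items, nothing executed.
start = Frontier(frozenset({"t0"}), (("x", 3),), frozenset())
end = apply_step(start, "double@t0", ["x"], lambda x: {"y": 2 * x})
print(dict(end.contents), sorted(end.executed))
```

The input frontier is left unchanged, so the same function shape would work for a step, a subgraph, or a whole program.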
Checkpoint/restart summary (abstract)
• Changes to the execution frontier are saved continuously as they occur
• Changes are saved in a less volatile “place”
• Asynchronous, no barriers
• No programmer involvement
• Saved state may not correspond to an actual state
• Can restart from any saved state
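As a rough sketch of the summary above, changes can be appended to a log as they occur and a frontier rebuilt from whatever was saved. The `ChangeLog` class is a hypothetical stand-in for the less volatile place; a plain list plays that role here, and real saves would happen asynchronously.

```python
class ChangeLog:
    """Hypothetical stand-in for the less volatile place that holds saved changes."""

    def __init__(self):
        self.records = []

    def save(self, kind, key, value=None):
        # Each change is saved as it occurs; no barrier, no global snapshot.
        self.records.append((kind, key, value))

    def restore(self):
        """Rebuild a frontier from whatever changes were saved."""
        frontier = {"tags": set(), "items": {}, "executed": set()}
        for kind, key, value in self.records:
            if kind == "item":
                frontier["items"][key] = value
            elif kind == "tag":
                frontier["tags"].add(key)
            elif kind == "executed":
                frontier["executed"].add(key)
        return frontier


log = ChangeLog()
log.save("tag", ("iter", 0))
log.save("item", ("A", 0), 4.0)
log.save("executed", ("cholesky", 0))
log.save("item", ("A", 1), 2.0)

restored = log.restore()
print(restored)
```

Because CnC items are single-assignment, replaying the saved changes in any order gives the same frontier, which is one reason a saved state need not match any state the program actually passed through.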
Cholesky domain spec
• CONTROL TAG
  − CholeskyTag: iter
  − TrisolveTag: row, iter
  − UpdateTag: col, row, iter
• COMPUTE STEP
  − Cholesky: iter
  − Trisolve: row, iter
  − Update: col, row, iter
• DATA ITEM
  − Array: col, row, iter
Looks like a CnC spec at each level
• Top level (iterations): CONTROL TAG <iterTag: iter>, COMPUTE STEP (C: iter)
• Within (C: iter): COMPUTE STEP (cholesky:), COMPUTE STEP (TU:)
• Within (TU:): CONTROL TAG <rowTag: row>, COMPUTE STEP (trisolve), COMPUTE STEP (U:)
Executed semantics: leaf
COMPUTE STEP (trisolve: row)
Serial code of the step (schematic): get … get … … = .. + … * … / … = … if … put
Executed is a primitive attribute. It comes from below.
− Leaf: termination of the serial code below
Executed semantics: non-leaf
[Diagram: COMPUTE STEP (TU:) expanded into CONTROL TAG <rowTag: row>, COMPUTE STEP (trisolve), and COMPUTE STEP (U:)]
Executed is a primitive attribute. It comes from below.
− Leaf: termination of the serial code below
− Non-leaf: termination of the subgraph below
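The two cases can be sketched with a small tree. The `Node` class is hypothetical, and modelling "termination of the subgraph" as "all current children have executed" is a simplifying assumption, since a real CnC subgraph creates its step instances dynamically.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of a hierarchical CnC application tree."""
    name: str
    children: list = field(default_factory=list)
    serial_done: bool = False   # set when a leaf's serial code terminates

    def executed(self):
        if not self.children:
            # Leaf: executed when the serial code below has terminated.
            return self.serial_done
        # Non-leaf: executed when the subgraph below has terminated.
        return all(child.executed() for child in self.children)


trisolve = Node("trisolve")
update = Node("U")
tu = Node("TU", children=[trisolve, update])

trisolve.serial_done = True
partial = tu.executed()    # U has not finished yet
update.serial_done = True
complete = tu.executed()   # whole subgraph has terminated
print(partial, complete)
```

In both cases the value arrives from the level below, so the graph level treats `executed` as primitive regardless of what is underneath.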
Hierarchical CnC application: execution is at the leaves only
Cholesky
trisolve
update
Hierarchical CnC application: intermediate nodes maintain state
State of each iteration
State of each row
Hierarchical view of the abstract platform tree
A node looks like a full machine at each level: a subtree of the memory hierarchy + the associated set of cores
Hierarchical platform node
Abstract platform: the depth and extent of the platform hierarchy correspond exactly to the depth and extent of the dynamic application
The mapping is direct
Hierarchical checkpoint/restart (abstract)
Hierarchical application node
Checkpoint for that application node resides at the parent place.
Distinct checkpoints residing at a single place remain separate. We will see why later.
Abstract failure model
• The system knows if/when a node fails
  − We’re not talking about soft errors
• Abstract platform node fails temporarily then returns
Hierarchical checkpoint/restart (abstract)
1-level checkpoint
• Fault
• Fullstop
• Restart
Hierarchical checkpoint/restart (abstract)
Checkpoint in hierarchy
• Fault
• Fullstop
• Restart
From above: step simply looks like it took longer than expected.
Checkpoint/fullstop at one node looks like checkpoint/continue for the whole program.
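The fault, fullstop, restart sequence at one node can be sketched as follows. `Place` and `run_child` are hypothetical names, the work units stand in for changes to the child's execution frontier, and the fault is injected by hand, so this only illustrates the control flow.

```python
class Place:
    """Hypothetical platform place that stores one checkpoint per child node."""

    def __init__(self):
        self.checkpoints = {}   # child name -> list of completed work units

    def save(self, child, unit):
        self.checkpoints.setdefault(child, []).append(unit)


def run_child(name, units, parent_place, fail_at=None):
    """Run a child's work units, checkpointing each completed unit at the parent.

    Returns True if the child finished, False if it hit a fault and stopped.
    """
    done = set(parent_place.checkpoints.get(name, []))
    for unit in units:
        if unit in done:
            continue        # already in the checkpoint; skipped on restart
        if unit == fail_at:
            return False    # fault, followed by a full stop of this node
        parent_place.save(name, unit)
    return True


parent = Place()
units = ["row0", "row1", "row2"]
first = run_child("TU", units, parent, fail_at="row1")   # fault after row0
second = run_child("TU", units, parent)                  # restart from checkpoint
print(first, second, parent.checkpoints["TU"])
```

The parent only ever observes that `TU` eventually completed, which matches the view from above that the step just took longer than expected.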
Hierarchical checkpoint/restart: Summary
• Each node in a hierarchy has all the characteristics of a whole program checkpoint.
• Checkpoint/fullstop/restart at nodes in the hierarchy enables the application as a whole to adapt and continue through faults.
Outline
• Abstract
• Actual: with resources and resource constraints
  − Semantic state
  − Checkpoint/restart
  − Hierarchical CnC
  − Hierarchical checkpoint/restart
• Beyond faults
Semantic state for execution (limited memory)
• Checkpointed information leaves the trailing edge of the execution frontier
  − Dead tags
  − Dead items
  − Dead steps
  This is the motivation for the term “execution frontier” as opposed to “execution state”. It’s only the relevant frontier of the state.
• Dead is a derived attribute. It doesn’t propagate up from the children. It is derived independently within each (sub)program.
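One way to picture the trailing edge is a trimming pass over the in-memory frontier. The slides do not say how `dead` is derived, so the rule used here (an item is dead once every step instance that reads it has executed) is an assumption, and `dead_items` and `trim` are hypothetical helpers.

```python
def dead_items(readers, executed):
    """Items whose every reader has executed (assumed rule for this sketch)."""
    return {item for item, steps in readers.items() if steps <= executed}


def trim(frontier_items, readers, executed):
    """Drop dead items from the in-memory frontier; they remain in the checkpoint."""
    dead = dead_items(readers, executed)
    return {k: v for k, v in frontier_items.items() if k not in dead}


readers = {"a": {"s1", "s2"}, "b": {"s2", "s3"}}   # item -> steps that read it
items = {"a": 1.0, "b": 2.0}
live = trim(items, readers, executed={"s1", "s2"})
print(live)
```

The derivation uses only information local to one (sub)program, consistent with `dead` not propagating up from the children.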
Hierarchical CnC map to actual platform
Platform: limited depth / limited extent at each level
[Diagrams: platform hierarchy and application hierarchy]
• Flatten the depth
• Fold extent
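One possible reading of "flatten the depth" and "fold extent" is sketched below. The slides give no algorithm, so truncating the path and taking child indices modulo the fan-out are assumptions chosen for simplicity, and `map_to_platform` is a hypothetical helper.

```python
def map_to_platform(app_path, depth_limit, fanout):
    """Map an application-tree path to a platform-tree path.

    app_path: tuple of child indices from the application root.
    depth_limit: number of levels the platform tree has below its root.
    fanout: number of children per platform node.
    """
    flattened = app_path[:depth_limit]            # flatten the depth
    return tuple(i % fanout for i in flattened)   # fold the extent


shallow = map_to_platform((5,), depth_limit=2, fanout=4)
deep = map_to_platform((5, 2, 7), depth_limit=2, fanout=4)
print(shallow, deep)
```

Application nodes deeper than the platform share the platform node of their ancestor at the depth limit, and siblings beyond the fan-out wrap around onto existing platform children.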
Actual failure model
• Platform node fails and may not return
  − or don’t want to wait until it returns
• Restart is at some other platform node
Remapping
[Diagram: application nodes A and B; platform places X and Y]
Map:
• Original checkpoint of B is at X
• New checkpoint of B is at Y
• Follows the new platform location
This is why we don’t want to merge checkpoints of the application children at the platform parent. We may want to relocate each child independently.
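Keeping per-child checkpoints separate makes relocation a small operation, as in this sketch. `PlatformPlace` and `relocate` are hypothetical names, and the starting state with both A and B checkpointed at X is an assumption made for the example.

```python
class PlatformPlace:
    """Hypothetical platform place holding checkpoints keyed by application child."""

    def __init__(self, name):
        self.name = name
        self.checkpoints = {}


def relocate(child, src, dst, mapping):
    """Move one child's checkpoint from src to dst and update the map."""
    dst.checkpoints[child] = src.checkpoints.pop(child)
    mapping[child] = dst.name


x, y = PlatformPlace("X"), PlatformPlace("Y")
x.checkpoints = {"A": ["a-state"], "B": ["b-state"]}
mapping = {"A": "X", "B": "X"}

relocate("B", x, y, mapping)   # B moves; A is untouched
print(mapping, x.checkpoints, y.checkpoints)
```

Had the checkpoints of A and B been merged at X, moving B alone would first require separating its state from A's.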
What do we have?
• A way of maintaining the execution frontier of
  − A running application
  − A running subgraph of an application
• A mechanism for taking an execution frontier and moving it
  − To another place
  − To a later time
• Use of this to cope with faults
Outline
• Abstract
• Actual: with resources and resource constraints
• Beyond faults
Adaptive execution
• If we can checkpoint and continue elsewhere on a fault, we can checkpoint and continue elsewhere for our own reasons. Big relevant exascale issues:
  − Resilience
    > Actual/predicted failures
  − Power management
  − Self-aware computing
  − Changes in goals
• Mechanism, not policy!
• Status:
  − No staffing or funding yet.
Other uses of execution frontiers
• Mechanism for connecting reusable components
• Low priority app
  − Execute/checkpoint/restart one step at a time
  − Stop mid-step when high priority work arrives
• Long-lived app with very slowly arriving input
  − e.g., phylogenetic tree for SARS virus
• Debugging
  − View state
  − Reverse time (undo)
• Soft errors
  − Compute more than once. Compare
• Something like out-of-core computation but not baked into application
Potential: Forms & operations
Forms
• As executing
  − general, arrays, trees…
• Serialized
• Streaming
• Encrypted
• Compressed
• Database
• Excel
• Human readable
Operations
• Save/restore
• Partition/specialize
  − At fork into distinct large subgraphs
• Merge
  − At join of distinct large subgraphs
• Send
• Compare (e.g., for fault tolerance)
• Explicitly modify (e.g., debug)
• Rename collections (e.g., for composition)
Relook at motivation: Highly adaptive computing for exascale
Critical exascale issues require the ability to move currently executing parts of the app to another place in the platform or to a later time.
• Resilience
  − Fragile components
  − Lots of them
• Power management
  − Power components off
  − Power components down
• Self-aware computing
  − Modify mapping based on feedback
• Change of goals
  − Between power and time to solution, for example
Looking forward to:
• Lowering the design
• Implementation
• Experimenting
Looking for feedback and collaborators