CnC for Tuning Hints on OCR
Nick Vrvilo, Rice University
The 7th Annual CnC Workshop
September 8, 2015
Acknowledgements
This work was done as part of my internship with the OCR team, part of Intel Federal, LLC at Jones Farm (Hillsboro, OR).
Mentors (Intel): Josh Fryman and Romain Cledat
Habanero Team (Rice): Vivek Sarkar, Kath Knobe, Zoran Budimlić, and Sanjay Chatterjee
Objective
Demonstrate the effectiveness of OCR tuning hints by way of code generation from a higher-level programming model (CnC).
[Figure: block diagram with components labeled OCR, Tunings, CnC-OCR Scaffolding, CnC App Code, CnC Graph, and hints handler]
Open Community Runtime (OCR)*
OCR project goals:
    Provide effective abstraction for diverse hardware
    Typify future task-based execution models
    Handle large-scale parallelism efficiently
    Maintain a separation of concerns (application/scheduling/resources)
    Open source (encourage collaboration)
* OCR is the runtime implementation of the X-Stack Traleika Glacier project
Outline
Introduction
OCR Hints API
CnC on OCR
Tuning Hints Implementation and Analysis
CnC / OCR Concept Mapping
Concept                      OCR construct          CnC construct
Task classes (code)          EDT template           Step collection
Task instance                EDT                    Step instance
Data classes                 (none; see note)       Item collection
Data instance                Datablock              Item instance
Unique instance identifier   GUID                   Tag (step tag / item key)
Dependence registration      Event add dependence   Item get
Dependence satisfaction      Event satisfy          Item put
Note: All DBs have type void*; keeping track of individual DBs' types is the app programmer's responsibility.
OCR Hints API: Example
// Assume we have a template and a datablock
ocrGuid_t edt;
// Create the EDT with no parameters and one dependence slot
ocrEdtCreate(&edt, template, 0, NULL, 1, NULL,
             EDT_PROP_NONE, NULL_GUID, NULL);
{ // Set an OCR hint
    ocrHint_t stepHints;
    ocrHintInit(&stepHints, OCR_HINT_EDT_T);
    ocrGetHint(edt, &stepHints);  // load the EDT's current hints
    ocrSetHintValue(&stepHints, OCR_HINT_EDT_PRIORITY, 100);
    ocrSetHint(edt, &stepHints);  // write the updated hints back
}
// Register the datablock as the EDT's dependence in slot 0
ocrAddDependence(datablock, edt, 0, DB_DEFAULT_MODE);
OCR Hints API: Pros and Cons
Pros:
    Generic
    Conceptually decoupled
    Light-weight
Cons:
    Verbose
    Placed in app source code
    Limited expressiveness
CnC-OCR Developer Workflow
1. Write graph spec
2. Run translator tool (produces skeleton project)
3. Flesh out skeleton code
4. Run program (functionality check); debug and repeat as needed
5. Write tuning spec(s)
6. Re-run translator tool (updates scaffolding code)
7. Re-run program (performance check); fine-tune and repeat as needed
CnC-OCR + Tuning
[Figure: block diagram with components labeled OCR, Tunings, CnC-OCR Scaffolding, CnC App Code, CnC Graph, and hints handler]
Separation of Concerns in CnC
Graph specification can be written without implementation details.
Step function implementations are written without knowledge of the external graph (each step knows only its own inputs and outputs).
Tuning specification is given in a separate file.
    Easy to mix in different tunings for performance testing.
    Try combinations of tunings until you find the ideal configuration.
Tuning Hints Overview
1. Step / item distribution
2. Step affinity with input
3. Step priority
4. Scheduler throttling
5. Partial item requests
Hint #1: Step / Item Distribution Functions
What? Declare a function for mapping individual step / item instances from a collection onto the set of OCR policy domains.
Why?
Distributed OCR currently lacks advanced scheduling/placement heuristics.
Explicit control of distribution is needed to establish a reasonable baseline.
Smith-Waterman Sequence Alignment
Each input sequence length ~200k
Dynamic programming optimization on ~40-billion cell matrix
Tiles of 177x153 cells
Total of 1138x1322 tiles
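As a rough consistency check on the figures above, the arithmetic can be worked out as follows. This assumes the 177-cell tile dimension pairs with the 1138-tile dimension, which is the pairing that puts both sequence lengths near 200k; the slide does not state the pairing explicitly.

```latex
\begin{align*}
1138 \times 177 &= 201{,}426 \\
1322 \times 153 &= 202{,}266 \\
201{,}426 \times 202{,}266 &= 40{,}741{,}631{,}316 \approx 4.07 \times 10^{10} \text{ cells}
\end{align*}
```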
Smith-Waterman Specification
Graph Specification
[ int above[] : i, j ];
[ int left[] : i, j ];
[ SeqData *data : () ];
( swStep: i, j )
    <- [ above: i, j ] $when(i > 0),
       [ left: i, j ] $when(j > 0)
    -> [ below @ above: i+1, j ],
       [ right @ left: i, j+1 ],
       ( swStep: i+1, j ) $when(i+1 < #nth);
Tuning Specification
[ above ]: {
    distfn: (i / 16) % $RANKS
};
[ left ]: {
    distfn: (i / 16) % $RANKS
};
( swStep ): {
    distfn: (i / 16) % $RANKS
};
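To make the row-block mapping concrete, here is a minimal C sketch of what the `distfn` expression computes. The function name `row_block_rank` is hypothetical, and it assumes `$RANKS` stands for the number of ranks (policy domains) available; this is not the code the translator generates.

```c
#include <assert.h>

/* Hypothetical helper mirroring the tuning expression (i / 16) % $RANKS.
 * i     : tile-row tag of the step or item instance
 * ranks : assumed meaning of $RANKS (number of ranks / policy domains)
 * Rows are grouped into blocks of 16, and the blocks are dealt out to
 * ranks round-robin. */
int row_block_rank(int i, int ranks) {
    assert(i >= 0 && ranks > 0);
    return (i / 16) % ranks;
}
```

With 4 ranks, tile rows 0 through 15 map to rank 0, rows 16 through 31 to rank 1, and row 64 wraps back to rank 0. Because the expression ignores `j`, every tile in a row lands on the same rank.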
Smith-Waterman Sequence Alignment
Default: CnC default distribution
Row-block: Rows in blocks of 16
10 runs per configuration
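[Figure: bar chart of average execution time (seconds) versus node count (1, 2, 4, 8) for three configurations: CnC-OCR Default, CnC-OCR Row-Block, and iCnC Row-Block. The vertical axis runs from 0 to 50; the values 115.40 and 141.49 are printed on the chart.]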
Hint #2: Step Affinity with Input Item
What? Declare that a step instance should be placed with (affinitized to) one of its input items.
Why?
OCR can use this affinity to improve scheduling heuristics.
It is a more expressive way to specify tunings like hint #1.
Smith-Waterman Specification
Graph Specification
[ int above[] : i, j ];
[ int left[] : i, j ];
[ SeqData *data : () ];
( swStep: i, j )
    <- [ above: i, j ] $when(i > 0),
       [ left: i, j ] $when(j > 0)
    -> [ below @ above: i+1, j ],
       [ right @ left: i, j+1 ],
       ( swStep: i+1, j ) $when(i+1 < #nth);
Tuning Specification
[ above ]: {
    distfn: (i / 16) % $RANKS
};
[ left ]: {
    distfn: (i / 16) % $RANKS
};
( swStep ): {
    placeWith: above
};
Hint #3: Step Priority Weights
What? Express a priority weight for a given CnC step, such that steps with heavier weights should execute earlier.
Why?
Search problems: prioritize paths likely to find the answer sooner.
Enable concurrency: prefer tasks whose output is in high demand (many consumers).
N-Queens Puzzle
Board size: 13x13
Solutions possible: 73,712
N-Queens Specification
Graph:
[ u64 solutions[4]: i ];
( placeQueen: row, board )
    -> ( placeQueen: row+1, board_prime ),
       [ solutions: ? ];
Tuning:
( placeQueen /* row, board */ ): {
    priority: row
};
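The graph above expresses a row-by-row recursion: each placeQueen step extends a partial board by one row and prescribes further steps for the next row. A sequential C sketch of that same recursion is shown here for orientation only. The bitmask board encoding and the names `place_queen` and `count_solutions` are assumptions made for illustration; the actual CnC step code and its board representation are not shown in the slides.

```c
#include <stdint.h>

/* Count the ways to complete a partial board, one row at a time.
 * cols, diag1, diag2 are bitmasks of the columns attacked in the
 * current row by queens already placed (hypothetical encoding). */
static uint64_t place_queen(int n, int row,
                            uint32_t cols, uint32_t diag1, uint32_t diag2) {
    if (row == n) {
        return 1; /* all rows filled: one complete solution */
    }
    uint32_t full = (1u << n) - 1u;
    uint32_t free_cols = full & ~(cols | diag1 | diag2);
    uint64_t count = 0;
    while (free_cols != 0) {
        uint32_t bit = free_cols & (~free_cols + 1u); /* lowest free column */
        free_cols &= free_cols - 1u;                  /* clear that column */
        /* Analogous to prescribing ( placeQueen: row+1, board_prime ) */
        count += place_queen(n, row + 1,
                             cols | bit,
                             ((diag1 | bit) << 1) & full,
                             (diag2 | bit) >> 1);
    }
    return count;
}

/* Count all solutions for an n x n board (n between 1 and 31). */
uint64_t count_solutions(int n) {
    return place_queen(n, 0, 0u, 0u, 0u);
}
```

In the CnC version each recursive call would instead be a separate step instance, which is what makes the `priority: row` tuning meaningful: the scheduler, not the call stack, decides which partial board is extended next.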
Implementation of Step Priority Weights
Description                         Default Scheduler   Priority Scheduler   Location
Base data structure                 deque               bin-heap             utils/
Scheduler interface wrapper         deque               bin-heap             scheduler-object/
Scheduler (aggregate) root object   wst                 pr-wsh               scheduler-object/
Scheduler heuristic behavior        hc                  priority             scheduler-heuristic/
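The table lists a binary heap as the base data structure for the priority scheduler. As a rough illustration of how such a structure orders ready tasks by weight, here is a minimal array-backed max-heap in C. All names (`Task`, `TaskHeap`, `heap_push`, `heap_pop`, `dfs_priority`, `bfs_priority`) and the fixed capacity are hypothetical; this is a sketch of the general technique, not OCR's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ready-task record; not an OCR type. */
typedef struct {
    int row;       /* tag of the step (e.g. N-Queens row) */
    long priority; /* heavier weights should run earlier */
} Task;

#define HEAP_CAPACITY 1024 /* arbitrary size chosen for this sketch */

typedef struct {
    Task items[HEAP_CAPACITY];
    size_t size;
} TaskHeap;

static void swap_tasks(Task *a, Task *b) {
    Task tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Insert a task, sifting it up until its parent is at least as heavy. */
void heap_push(TaskHeap *h, Task t) {
    assert(h->size < HEAP_CAPACITY);
    size_t i = h->size++;
    h->items[i] = t;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].priority >= h->items[i].priority) {
            break;
        }
        swap_tasks(&h->items[parent], &h->items[i]);
        i = parent;
    }
}

/* Remove and return the heaviest task, then sift the new root down. */
Task heap_pop(TaskHeap *h) {
    assert(h->size > 0);
    Task top = h->items[0];
    h->items[0] = h->items[--h->size];
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, largest = i;
        if (left < h->size &&
            h->items[left].priority > h->items[largest].priority) {
            largest = left;
        }
        if (right < h->size &&
            h->items[right].priority > h->items[largest].priority) {
            largest = right;
        }
        if (largest == i) {
            break;
        }
        swap_tasks(&h->items[i], &h->items[largest]);
        i = largest;
    }
    return top;
}

/* Depth-first flavor: deeper rows are heavier (as in "priority: row"). */
long dfs_priority(int row) { return (long)row; }

/* Breadth-first flavor: shallower rows are heavier. */
long bfs_priority(int row) { return -(long)row; }
```

Pushing tasks with `dfs_priority` makes the deepest row pop first, while `bfs_priority` makes the shallowest row pop first; only the weight function changes, not the heap.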
N-Queens Puzzle
Solutions sought: 5,000
DEQ: Default work-stealing deque
DFS: Prioritize deep rows
BFS: Prioritize shallow rows
50 runs per configuration
[Figure: bar chart of average execution time (seconds) for the DEQ, DFS, and BFS configurations; the vertical axis runs from 0 to 4.]
Hint #4: Stoker Step (Scheduler Throttling)
What? Annotate the work-creating steps (which we call stokers) so that the runtime can differentiate them from non-work-creating steps (which we call quenchers).
Why?
If the scheduler has plenty of work to do, we can throttle by not running any more stoker steps for the time being.
For work stealing, we can prioritize stoker steps for stealing, which mitigates the need for more stealing in the near term.
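The two policies described above can be sketched as a pair of selection functions in C. Everything here is an assumption made for illustration: the names `ReadyCounts`, `pick_local`, and `pick_steal`, the idea of tracking ready stokers and quenchers as simple counts, and the `plenty` threshold parameter. The slides do not show how the OCR scheduler actually implements throttling.

```c
#include <stddef.h>

typedef enum { PICK_NONE, PICK_QUENCHER, PICK_STOKER } Pick;

/* Hypothetical per-worker view of ready work. */
typedef struct {
    size_t ready_stokers;   /* ready steps that create more work */
    size_t ready_quenchers; /* ready steps that create no new work */
} ReadyCounts;

/* Local choice: when at least `plenty` quenchers are ready, hold the
 * stokers back (throttle); otherwise run a stoker to refill the queue. */
Pick pick_local(const ReadyCounts *q, size_t plenty) {
    if (q->ready_quenchers > 0 && q->ready_quenchers >= plenty) {
        return PICK_QUENCHER;
    }
    if (q->ready_stokers > 0) {
        return PICK_STOKER;
    }
    if (q->ready_quenchers > 0) {
        return PICK_QUENCHER;
    }
    return PICK_NONE;
}

/* Steal choice: a thief prefers a stoker, since running it produces
 * local work and reduces the need to steal again soon. */
Pick pick_steal(const ReadyCounts *victim) {
    if (victim->ready_stokers > 0) {
        return PICK_STOKER;
    }
    if (victim->ready_quenchers > 0) {
        return PICK_QUENCHER;
    }
    return PICK_NONE;
}
```

For example, a worker with 2 ready stokers and 100 ready quenchers and a threshold of 64 would run a quencher locally, while a thief looking at the same worker would take a stoker.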
Task-Bomb (Synthetic Example)
Root step creates Z=32 stoker steps
Each stoker creates:
    Y=100 quencher tasks
    One stoker task
Recursion creates X=200 levels
Since the stoker is always created last, we would expect all of the stokers to run in a depth-first manner when using the standard work-stealing deque scheduler.
[Figure: task tree]
$initialize
    stoker(0,0)
        quencher(0,0,0) ... quencher(0,0,Y)
        stoker(0,1)
            quencher(0,1,0) ... quencher(0,1,Y)
            stoker(0,2)
    ...
    stoker(Z,0)
        quencher(Z,0,0) ... quencher(Z,0,Y)
        stoker(Z,1)
Task-Bomb CnC Graph Spec
[ void *done: () ];
( stoker: i, j )
    -> ( quencher: i, j, $rangeTo(Y) ),
       ( stoker: i, j+1 ) $when(j < X);
( quencher: i, j, k )
    -> [ done: () ] $when(i==0 && j==X && k==Y);
( $initialize: () ) -> ( stoker: $range(Z),