Static Translation of Stream Programs


S. M. Farhad

School of Information Technology

The University of Sydney

Parallel Programming Gap

Uniprocessor hits the physical limit
Multicore, many-core systems
Programming challenges now
  Multiple flows of control and memories
  Finding algorithms that can run in parallel
  Correctness of the program
  Synchronization issues
  Debugging the program


Stream Programming Paradigm

General purpose programming paradigm
Suitable for high performance applications
Inherent parallelism
Two types of elements in a program:
  Actors: computation elements
  Streams: channels to ship data
Stream graph: composition of actors and streams
Input and output: stream

[Figure: an actor with its input channels and output channels]

StreamIt Language

StreamIt: stream programming language
Architecture independent
Modular

[Figure: StreamIt constructs: filter, pipeline, splitjoin (splitter, parallel computation, joiner) and feedback loop (joiner, splitter); each component may be any StreamIt language construct]


A Simple Filter in StreamIt

int->int filter A {
    init { /* empty */ }
    work pop 1 peek 2 push 1 {
        push(peek(0) + peek(1));
        pop();
    }
}
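To make the peek/pop/push rates of filter A concrete, the following is a minimal Python sketch of its behaviour on a finite input list. It is not StreamIt and not part of the original slides; the function name `run_filter_a` is made up for illustration.

```python
def run_filter_a(stream):
    """Simulate filter A: each firing pushes peek(0) + peek(1), then pops one item."""
    out = []
    pos = 0  # index of the item currently at the head of the input channel
    # A firing needs two items visible on the input channel (peek 2).
    while pos + 1 < len(stream):
        out.append(stream[pos] + stream[pos + 1])  # push(peek(0) + peek(1))
        pos += 1                                   # pop()
    return out


print(run_filter_a([1, 2, 3, 4]))  # [3, 5, 7]
```

Each firing consumes one item and produces one item, so the filter forwards data at the same rate it receives it, but it cannot fire until two items are available.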

StreamIt Example

float->float pipeline Example() {
    add A();
    add B();
    add splitjoin {
        split duplicate;
        add D();
        add E();
        join roundrobin;
    }
    add G();
    add H();
}
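The data flow through the splitjoin can be sketched in Python as follows. The slides do not give the bodies of actors D and E, so the two lambdas below are hypothetical stand-ins chosen only to make the interleaving visible; the helper names are likewise illustrative.

```python
def split_duplicate(items, branches):
    """Duplicate splitter: every branch receives a copy of every input item."""
    return [list(items) for _ in range(branches)]


def join_roundrobin(branch_outputs):
    """Round-robin joiner: take one item from each branch in turn."""
    joined = []
    for group in zip(*branch_outputs):
        joined.extend(group)
    return joined


def example_splitjoin(items, d, e):
    """Run the D/E splitjoin of the example on a finite list of items."""
    to_d, to_e = split_duplicate(items, 2)
    return join_roundrobin([[d(x) for x in to_d], [e(x) for x in to_e]])


# Hypothetical actor bodies for D and E (not taken from the slides).
result = example_splitjoin([1, 2, 3], d=lambda x: x * 10, e=lambda x: -x)
print(result)  # [10, -1, 20, -2, 30, -3]
```

With a duplicate splitter and two one-in, one-out branches, the joiner emits two items for every item entering the splitjoin, which is the kind of rate change the bandwidth analysis has to account for.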

[Figure: stream graph fed by input z, with nodes labelled A, B, C, D, E, F, split and join; every channel weight is 1]

Research Question?

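[Figure: the stream graph of the example (input z; actors A, B, D, E, G and H; split and join; every channel weight is 1) next to a parallel system with processors Proc-1, Proc-2, ..., Proc-N; the open question is how to map the former onto the latter]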

Static Translation of Stream Programs

We propose
  A novel steady state analysis describing the output bandwidth of an actor as a function of the input of the stream program
  A quantitative analysis to resolve bottlenecks in stream programs
  An algorithm for mapping actors to a parallel system
  Scheduling of the mapped actors on each processor
Our goal
  To statically optimize the throughput of a stream program on a parallel system


Overview of the Proposed System

Finding a Closed Form for Steady State

Resolve Bottleneck

Find Mapping by ILP or Approximation

Schedule Actors for Processors

Bandwidth Transfer Model

Synchronous model
Fundamental assumption:
  Actors deliver a constant output bandwidth given a constant input bandwidth
Output bandwidth of an actor is normalized (scaled between 0 and 1)
Bandwidth of channel (i,j): output bandwidth of actor i multiplied by the weight $w_{ij}$

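[Figure: actor i connected to actor j by a channel of bandwidth $w_{ij} x_i$]

$$x_1 = f_1(z)$$

$$x_i = f_i\left(\sum_{k=1}^{n} w_{ki}\, x_k\right)$$

Here $x_i$ is the output bandwidth of actor $i$, $z$ is the input bandwidth of the stream program, and $f_i$ stands for the bandwidth transfer function of actor $i$. The slide also constrains the sum of the outgoing weights of each actor, $\sum_{j=1}^{n} w_{ij}$.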

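Simultaneous Linear Functional Equations System

Bandwidth transfer functions: each actor $i$ has a transfer function $f_i$ that maps its input bandwidth to its output bandwidth $x_i$.

Resulting in a simultaneous functional equation system:

$$x_1 = f_1(z), \qquad x_i = f_i\left(\sum_{k=1}^{n} w_{ki}\, x_k\right)$$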

Solving Equation System

Solve by Gaussian-style equation solver
Bandwidth variables are successively eliminated
  Substitution
  Loop-breaking
Solution of the system is a linear function for each actor, $x_i(z) = c_i\, z$ with a constant $c_i$ per actor, which does not depend on the predecessors and depends only on the input bandwidth $z$
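As an illustration of the elimination step, here is a minimal Python sketch. It assumes that every transfer function is a plain gain, i.e. actor i multiplies its input bandwidth by a constant `gains[i]`; the slides only state that the solution is linear in z, so this form, the function name and the three-actor graph with a feedback channel are all assumptions made for the example. Under that assumption the system becomes an ordinary linear system whose solution gives, for each actor, the constant that multiplies z.

```python
from fractions import Fraction


def steady_state_coefficients(gains, weights):
    """Return c such that the output bandwidth of actor i is c[i] * z.

    gains[i]      -- assumed gain of actor i (output = gain * input bandwidth)
    weights[k][i] -- weight of channel (k, i), 0 if there is no such channel
    Actor 0 is the entry actor and reads the program input z.
    """
    n = len(gains)
    # Augmented matrix of (I - diag(gains) * W^T) c = gains[0] * e_0.
    rows = []
    for i in range(n):
        row = [Fraction(-gains[i]) * Fraction(weights[k][i]) for k in range(n)]
        row[i] += 1
        row.append(Fraction(gains[i]) if i == 0 else Fraction(0))
        rows.append(row)
    # Gauss-Jordan elimination with exact rational arithmetic.
    # next() raises StopIteration if the system is singular.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


# Hypothetical graph: 0 -> 1 (weight 1), 1 -> 2 (weight 1/2), feedback 2 -> 1 (weight 1/2).
half = Fraction(1, 2)
weights = [[0, 1, 0], [0, 0, half], [0, half, 0]]
coeffs = steady_state_coefficients([1, 1, 1], weights)
print([str(c) for c in coeffs])  # ['1', '4/3', '2/3']
```

The feedback channel is handled by the elimination itself: the cyclic dependency between actors 1 and 2 disappears once one variable is substituted into the other, which is one way to read the "loop-breaking" step.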


Mapping Actors to Processors

An NP-hard problem
  Finding the optimal mapping takes too long
Approximation algorithm
  Much faster: O(n log n)
  Non-optimal solution
We formulate an Integer Linear Programming problem considering
  Input, output and processing bandwidths of each actor
  Processing capacity of each processor
  Inter-processor communication bandwidths
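The slides do not spell out the approximation algorithm. As a point of reference only, the sketch below shows a common greedy heuristic that runs in O(n log n) for a fixed number of processors: sort the actors by processing demand and always place the next one on the least-loaded processor. It is not claimed to be the author's algorithm, it ignores communication bandwidths, and the utilization values are invented for the example.

```python
import heapq


def greedy_mapping(utilization, processors):
    """Place each actor on the currently least-loaded processor, heaviest first."""
    heap = [(0.0, j) for j in range(processors)]  # (load, processor index)
    heapq.heapify(heap)
    mapping = {}
    for actor in sorted(utilization, key=utilization.get, reverse=True):
        load, j = heapq.heappop(heap)
        mapping[actor] = j
        heapq.heappush(heap, (load + utilization[actor], j))
    # Recompute the final load of every processor from the mapping.
    loads = [0.0] * processors
    for actor, j in mapping.items():
        loads[j] += utilization[actor]
    return mapping, loads


# Hypothetical processing demands for the actors of the example program.
demand = {"A": 0.5, "B": 0.25, "D": 0.25, "E": 0.5, "G": 0.25, "H": 0.25}
mapping, loads = greedy_mapping(demand, processors=2)
print(loads)  # [1.0, 1.0]
```

A heuristic of this kind gives no optimality guarantee for the full problem, which is consistent with the "non-optimal solution" caveat on the slide.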

[Figure: the example stream graph (input z; actors A, B, D, E, G and H; split and join) mapped onto processors P1, P2 and P3; channel bandwidths are 1.0 except for two channels of 0.5]

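Find Optimal Solution by Binary Search

Developing a test for a given input bandwidth $z$ of the stream program
Binary search
  If the solution is not feasible at the mid point, mid becomes the new upper bound for $z$
  If the solution is feasible at the mid point, mid becomes the new lower bound for $z$

[Figure: the solution space for $z$ on the interval from 0 to 1, with the search bounds left and right, their mid point mid, and the optimum zsys]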
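The search itself can be sketched in a few lines of Python. The sketch assumes that feasibility is monotone (if some z is feasible, every smaller z is too) and that z = 0 is always feasible; `is_feasible` is a placeholder for the per-z test, and the tolerance is an arbitrary choice.

```python
def max_feasible_input_bandwidth(is_feasible, tolerance=1e-6):
    """Binary search for the largest z in [0, 1] accepted by is_feasible."""
    left, right = 0.0, 1.0
    if is_feasible(right):
        return right
    while right - left > tolerance:
        mid = (left + right) / 2
        if is_feasible(mid):
            left = mid   # feasible: mid becomes the new lower bound
        else:
            right = mid  # infeasible: mid becomes the new upper bound
    return left


# Toy test: a single actor whose demand is 3 * z and whose capacity is 1.
z_best = max_feasible_input_bandwidth(lambda z: 3 * z <= 1)
print(abs(z_best - 1 / 3) < 1e-5)  # True
```

The number of feasibility tests grows only with the logarithm of the requested precision, so an expensive test (for instance an ILP solve) is called a few dozen times at most.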

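Test for a given $z$: Simple Model

$y_{ij}$ is a 0-1 binary variable, $1 \le i \le n$, $1 \le j \le p$

$$\sum_{j=1}^{p} y_{ij} = 1 \quad \text{for all } i,\ 1 \le i \le n \qquad (1)$$

$$\sum_{i=1}^{n} U_i\, y_{ij} \le 1 \quad \text{for all } j,\ 1 \le j \le p \qquad (2)$$

where

$$U_i = u_i^{*}\left(\sum_{j=1}^{n} w_{ji}\, x_j(z)\right)$$

Constraint (1) places every actor on exactly one processor; constraint (2) keeps the load of every processor within its capacity. Here $x_j(z)$ is the steady-state output bandwidth of actor $j$, and $u_i^{*}$ stands for the function that gives the processing bandwidth of actor $i$ from its input bandwidth.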
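For very small instances the test can be checked by brute force, which makes the meaning of the two constraints easy to see: every actor is placed on exactly one processor, and no processor is loaded beyond its normalized capacity of 1. The Python sketch below is only an illustration with invented utilization values; it enumerates all assignments and is therefore exponential in the number of actors, which is why a real implementation would hand the model to an ILP solver instead.

```python
from itertools import product


def feasible_mapping(utilization, processors):
    """Return one assignment satisfying constraints (1) and (2), or None.

    utilization[i] is the processing demand of actor i for the z under test.
    Choosing exactly one processor per actor enforces (1); the load check
    enforces (2).
    """
    n = len(utilization)
    for assignment in product(range(processors), repeat=n):
        loads = [0.0] * processors
        for i, j in enumerate(assignment):
            loads[j] += utilization[i]
        if all(load <= 1.0 for load in loads):
            return assignment
    return None


print(feasible_mapping([0.5, 0.5, 0.75], 2))    # (0, 0, 1)
print(feasible_mapping([0.75, 0.75, 0.75], 2))  # None
```

A return value of None means the tested z is infeasible, so a search over z would lower its upper bound.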


Bottleneck Analysis

Constraints limiting the input bandwidth of the whole program
  Input, output and processing bandwidth constraints
Bottleneck-free if SystemBW < ActorBW
If SystemBW > ActorBW then there is a bottleneck in the system

[Figure: SystemBW and ActorBW on the interval from 0 to 1, with SystemBW < ActorBW]

Region Duplication


Example Program

Mapping to Processors

Execution Time of CPLEX

Execution Time of an Approximation Algorithm

Optimal vs. Approximation

Summary

Synchronous model for stream programs
Novel steady state model
Statically optimize the throughput of stream programs
Resolving bottlenecks by simple quantitative analysis
Finding an approximation for the mapping problem

Related Works

[1] Static Scheduling of SDF Programs for DSP [Lee '87]
[2] StreamIt: A language for streaming applications [Thies '02]
[3] Phased Scheduling of Stream Programs [Thies '03]
[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in Stream Programs [Thies '06]
[5] Orchestrating the Execution of Stream Programs on Cell [Scott '08]
[6] Software Pipelined Execution of Stream Programs on GPUs [Udupa '09]
[7] Synergistic Execution of Stream Programs on Multicores with Accelerators [Udupa '09]

Future Plan to Complete

Year  Duration  Milestones
2009  2 months  Prepare for publication: solving the SLFE system, ILP model, approximation algorithm, bottleneck resolving technique
2010  3 months  Design approximation/heuristic algorithm for actor placement with communication constraints between processing elements
      4 months  Experiment on different benchmarks and compare the results
      3 months  Investigate new scheduling techniques for stream programs
      2 months  Further publications covering both topics (at least 2)
2011  4 months  Design a small stream programming language that uses our proposed technique
      6 months  Implement the small stream programming language that uses our proposed technique
      2 months  Further publications
2012  3 months  Write up PhD thesis and further publication

Questions?
