Static Translation of Stream Programs S. M. Farhad School of Information Technology The University...

27
Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney

Transcript of Static Translation of Stream Programs S. M. Farhad School of Information Technology The University...

Page 1: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Static Translation of Stream Programs

S. M. Farhad

School of Information Technology

The University of Sydney

Page 2: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Parallel Programming Gap

Uniprocessor hits the physical limit Multicore, many core systems

Programming challenges now Multiple flows of control and memories Finding algorithm that can run in parallel Correctness of the program Synchronization Issues Debugging the program

2

Page 3: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

3

Stream Programming Paradigm General purpose programming

paradigm Suitable for high performance

applications Inherent parallelism Two types of elements in a program:

Actors: computation elements Streams: channels to ship data

Stream graph: composition of actors and streams

Input and output: stream

Actor

Input channels

Output channels

Page 4: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

StreamIt Language

StreamIt: stream programming language

Architecture independent

Modular

parallel computation

may be any StreamIt language construct

joinersplitter

pipeline

feedback loop

joiner splitter

splitjoin

filter

4

Page 5: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

5

A Simple Filter in StreamIt

int->int filter A {

init { // empty }

work pop 1 peek 2 push 1 {

push(peek(0) + peek(1));

pop();

}

}

A

Page 6: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

StreamIt Example

float->float pipeline Example() {add A();add B();add splitjoin {

split duplicate;add D();add E();join roundrobin;

}add G();add H();

}

6

E

F

A

B

C

split

join

D

z

1

1 1

1

1 1

1

1

Page 7: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Research Question?

7

G

H

A

B

D

split

join

E

z

1

1 1

1

1 1

1

1

?

Parallel System

Proc-1

Proc-2

Proc-N

Page 8: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

88

Static Translation of Stream Programs We propose

Novel steady state analysis describing the output bandwidth of an actor depending on the input of the stream program

Quantitative analysis to resolve bottlenecks in stream programs

Algorithm for mapping actors to parallel system Scheduling mapped actors to processor

Our goal To statically optimize the throughput of a stream

program on a parallel system

Page 9: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

99

Overview of the Proposed System

Finding a Closed Form for Steady State

Resolve Bottleneck

Find Mapping by ILP or Approximation

Schedule Actors for Processors

Page 10: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Bandwidth Transfer Model

Synchronous model Fundamental assumption:

Actors deliver constant output bandwidth given a constant input bandwidth

Output bandwidth of actor is normalized (scaled btw. 0..1)

Bandwidth of channel (i,j): output bandwidth of actor i multiplied with weight wij

10

i

j

wijxi

n

kkkiii xwx

1

k

11

n

jijw

)(11 zx

Finding a Closed Form for Steady State

Page 11: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Simultaneous Linear Functional Equations System Bandwidth transfer functions:

Resulting in a simultaneous functional equation system:

11

i

n

kkkiii xwx

1

111 zx

iii xx :

Finding a Closed Form for Steady State

Page 12: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Solving Equation System

Solve by Gaussian-Style equation solver Bandwidth variables are successively eliminated

Substitution Loop-breaking

Solution of the system is a linear function for each actor

, which does not depend on the predecessors depends only on the input bandwidth “z”

ii zz

Finding a Closed Form for Steady State 12

Page 13: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

13

Mapping Actors to Processors An NP-hard problem

Takes too long time Approximation algorithm

Much faster O(n log n) Non-optimal solution

We formulate Integer Linear Programming problem considering Input, output and processing

bandwidths of each actor Processing capacity of each

processor Inter processor communication

bandwidths

P1

P2

P3G

H

A

B

D

split

join

E

z

1.0

0.5 0.5

1.0

1.0 1.0

1.0

1.0

Find Mapping by ILP or Approximation

Page 14: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Find Optimal Solution by Binary Search Developing a test for a

given input bandwidth “z” of stream program

Binary Search If solution is not feasible at

the mid point, new upper bound for z

If solution is feasible at the mid point, new lower bound for z

0

zsys

1

Solution space

mid

left

right

14Find Mapping by ILP or

Approximation

Page 15: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Test for a given z: Simple Model is a 0-1 binary variable 1 ≤ i ≤ n, 1 ≤ j ≤ p

for all i, 1 ≤ i ≤ n …… (1)

for all j, 1 ≤ j ≤ p …… (2)

where

15

p

jijy

1

1

ijy

n

iiji yU

1

1

n

jjjiii zwU

1

* )(

Find Mapping by ILP or Approximation

Page 16: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Bottleneck Analysis

Constraints limiting input bandwidth of the whole program Input, output and processing

bandwidth constraints

Bottleneck-free if SystemBW < ActorBW

If SystemBW > ActorBW then there is a bottleneck in the system

16Resolve Bottleneck

0

SystemBW

1

< ActorBW

Page 17: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Region Duplication

17Resolve Bottleneck

Page 18: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

1818

Overview of the Proposed System

Finding a Closed Form for Steady State

Resolve Bottleneck

Find Mapping by ILP or Approximation

Schedule Actors for Processors

Page 19: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

19

Example Program

Page 20: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Mapping to Processors

20

Page 21: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

21

Execution Time of CPLEX

Page 22: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Execution Time of an Approx. Algorithm

22

Page 23: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

2323

Optimal vs. Approximation

Page 24: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Summary

Synchronous model for stream programs Novel steady state model Statically optimize the throughput of stream

programs Resolving bottleneck by simple quantitative

analysis Finding an approximation for the mapping

problem

24

Page 25: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Related Works

[1] Static Scheduling of SDF Programs for DSP [Lee ‘87]

[2] StreamIt: A language for streaming applications [Thies ‘02]

[3] Phased Scheduling of Stream Programs [Thies ’03]

[4] Exploiting Coarse Grained Task, Data, and Pipeline Parallelism in

Stream Programs [Thies ‘06]

[5] Orchestrating the Execution of Stream Programs on Cell [Scott ’08]

[6] Software Pipelined Execution of Stream Programs on GPUs

[Udupa‘09]

[7] Synergistic Execution of Stream Programs on Multicores with

Accelerators [Udupa ‘09]

25

Page 26: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Future Plan to Complete

Year Duration Milestones

2009 2 months Prepare for publication: Solving SLFE system, ILP model, Approx. Algorithm, Bottleneck resolving technique

2010

3 monthsDesign approximation/heuristic algorithm for actor placement with communication constraints between processing elements

4 months Experiment on different benchmarks and compare the results

3 months Investigate new scheduling techniques for stream programs

2 months Further publications covering both topics (at least 2)

2011

4 months Design a small stream programming language that uses our proposed technique

6 months Implement the small stream programming language that uses our proposed technique

2 months Further publications

2012 3 months Write up PhD thesis and further publication

26

Page 27: Static Translation of Stream Programs S. M. Farhad School of Information Technology The University of Sydney.

Questions?

27