Turning Control Flow Graphs into Callgraphs · 12.04.2012 · ...

Turning Control Flow Graphs into Callgraphs
Transformation of partitioned codes for execution in heterogeneous architectures
PABLO BARRIO, TOBIAS KENTER, CARLOS CARRERAS, CHRISTIAN PLESSL, ROBERTO SIERRA


Outline

1. Heterogeneous High Performance Computing
2. Compilation toolchain
3. Code refactoring for execution in heterogeneous platforms

1. Heterogeneous High Performance Computing

High Performance Computing & Embedded Systems

                      Embedded         HPC
Type of processors    Heterogeneous    Homogeneous
Size                  Small            Massive
Memory                Shared           Distributed

…but getting closer every day.

Objectives

• A code partitioner for heterogeneous architectures.
• Easy to add models for new devices and architectures.
• Partitioning based on software and hardware characteristics.
• Communications generated for distributed memory systems.
• Automatic parallelization, both functional and data parallel.

The solution under research

[Figure: compiler flow. C, C++, Fortran… → FRONT END → LLVM IR → optimization passes → LLVM IR → BACK END → asm, VHDL… The passes shown are Profiling, Estimation, Partitioning & Mapping, and Code refactoring.]

 

2. Compilation toolchain

LLVM-based compilation toolchain

[Figure: toolchain diagram. Recoverable stages, in order: front ends (Module 1 … Module N) → opt & link (linked module) → optional profiling with lli ("Profiling?" yes/no; input "Data in", output "Profile info") → estimation → partitioning & mapping ("RSD file") → code refactoring (Module 1 … Module M) → Back end 1 … Back end M → Exe1 … ExeM.]

Partitioning & Mapping

[PartitioningPass] PARTITIONING OVERVIEW:
Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00

[PartitionWriterPass] Generating partitioned code
PartitionWriterPass::runOnModule() -- Original functions:
    odd with BBs:
        entry --> CPU
    main with BBs:
        entry --> CPU
        3 --> CPU
        4 --> CPU
        beforeHeader --> CPU
        5 --> CPU
        6 --> CPU
        7 --> CPU
        8 --> CPU_SIMD
        9 --> CPU_SIMD
        11 --> CPU_SIMD
        12 --> CPU_SIMD
        13 --> CPU
        14 --> CPU
        afterHeader --> CPU

...

3. Code refactoring for execution in heterogeneous platforms

Function-based control flow

Branch-based control flow (original):

BB_A:
    ...
    jmp BB_B
BB_B:
    ...
    jne A, C
BB_C:
    ...

Function-based control flow (one function per block):

A()
    BB_A:
        ...
        call B()
        ret

B()
    BB_B:
        ...
        jne callA, callC
    callA:
        ...
        call A()
        ret
    callC:
        ...
        call C()
        ret

C()
    BB_C:
        ...
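The transformation on this slide can be mimicked in a few lines of Python. This is only an illustrative sketch, assuming that each basic block becomes one function and each branch becomes "call the successor, then return". The block names follow the slide. The counter `n` that stands in for the branch condition is hypothetical.

```python
# Each basic block is a function; a branch becomes "call successor, then return".
trace = []

def block_a(n):
    trace.append("A")
    return block_b(n)          # was: jmp BB_B

def block_b(n):
    trace.append("B")
    if n > 0:                  # was: jne A, C (the condition is hypothetical)
        return block_a(n - 1)  # callA: call A(); ret
    return block_c()           # callC: call C(); ret

def block_c():
    trace.append("C")
    return "done"

result = block_a(2)
print(trace)
```

Every trip around the A/B loop nests one more pair of calls, which is why loops need special treatment later in the talk.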

Refactoring methodology

duplicate constants
distribute globals
for every original function f
    initiatorList ← find initiators(f)
    create new functions(f, initiatorList)
    fix branches(initiatorList)
    fix phi nodes(initiatorList)


Initiator list ← find initiators(f)

[Figure: the same control flow graph drawn three times, under the headings "Partitioning result", "Initiators" and "Resulting functions". Basic blocks: entry, if.end, for.cond.preheader, for.cond18.loopexit, for.body, if.then, for.body20, for.end26, return. The blocks entry, if.end, for.body and for.body20 end in conditional (T/F) branches.]
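One way to read this step is sketched below in Python. The rules are assumptions inferred from the tool output shown elsewhere in the deck, which lists an entry block initiator, trivial initiators and nontrivial initiators. Here a block is treated as a trivial initiator when one of its predecessors is mapped to a different device, and as a nontrivial initiator when it is reachable from two different initiators. The function name, the CFG and the device mapping are hypothetical.

```python
def find_initiators(entry, succs, device):
    """Return {block: initiator} for a CFG given as successor lists."""
    preds = {b: [] for b in succs}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    # Entry block plus "trivial" initiators: blocks entered from another device.
    initiators = {entry}
    for b in succs:
        if any(device[p] != device[b] for p in preds[b]):
            initiators.add(b)

    # Assumed rule for "nontrivial" initiators: a block reachable from two
    # different initiators must start a function of its own.
    while True:
        owner = {i: i for i in initiators}
        work = list(initiators)
        promoted = None
        while work and promoted is None:
            b = work.pop()
            for s in succs[b]:
                if s in initiators:
                    continue
                if s not in owner:
                    owner[s] = owner[b]
                    work.append(s)
                elif owner[s] != owner[b]:
                    promoted = s
                    break
        if promoted is None:
            return owner
        initiators.add(promoted)

# Hypothetical CFG: s1 runs on the SIMD device, everything else on the CPU.
succs = {"entry": ["s1", "m"], "s1": ["k"], "k": ["m"], "m": ["ret"], "ret": []}
device = {"entry": "CPU", "s1": "CPU_SIMD", "k": "CPU", "m": "CPU", "ret": "CPU"}
owner = find_initiators("entry", succs, device)
print(owner)
```

In this example `s1` and `k` are trivial initiators (their predecessors sit on another device), while `m` is promoted because it can be reached from both `entry` and `k`.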


create new functions (f, initiatorList)

Declare used functions in the destination module.

MODULE 1

i32 f(i8* %num)
    BB_A:                            (DEV 1)
        %3 = add i32 %2, 6
        jmp BB_B
    BB_B:                            (DEV 2)
        %4 = mul i32 %3, %3
        jne BB_A, BB_C
    BB_C:                            (DEV 2)
        ret call i32 @puts(%num)

MODULE 2

declare i32 @puts(i8*)

Splitting functions

Create new function prototype.

MODULE 1

i32 f(i8* %num)
    BB_A:                            (DEV 1)
        %3 = add i32 %2, 6
        jmp BB_B
    BB_B:                            (DEV 2)
        %4 = mul i32 %3, %3
        jne BB_A, BB_C
    BB_C:                            (DEV 2)
        ret call i32 @puts(%num)

MODULE 2

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)

create new functions (f, initiatorList)

Move Basic Blocks.

MODULE 1

i32 f(i8* %num)
    BB_A:
        %3 = add i32 %2, 6
        jmp BB_B

MODULE 2

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)
    BB_B:
        %4 = mul i32 %3, %3
        jne BB_A, BB_C
    BB_C:
        ret call i32 @puts(%num)

create new functions (f, initiatorList)

Fix argument uses.

MODULE 1

i32 f(i8* %num)
    BB_A:
        %3 = add i32 %2, 6
        jmp BB_B

MODULE 2

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)
    BB_B:
        %4 = mul i32 %arg2, %arg2
        jne BB_A, BB_C
    BB_C:
        ret call i32 @puts(%arg1)
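The parameters of the new function can be derived from the moved blocks: every value they use but do not define has to be passed in. The Python sketch below illustrates this on the slide's example. The data layout, the helper name `new_function_args` and the ordering rule (original parameters first) are assumptions chosen so that the result matches the `f2(i8* %arg1, i32 %arg2)` prototype on the slide.

```python
def new_function_args(blocks, moved, params):
    """Values used inside the moved blocks but defined outside of them."""
    defined = {res for b in moved for res, _ in blocks[b] if res is not None}
    used = []
    for b in moved:
        for _, operands in blocks[b]:
            for v in operands:
                if v.startswith("%") and v not in defined and v not in used:
                    used.append(v)
    # Assumed ordering: original parameters first, then values computed in f.
    return [v for v in params if v in used] + [v for v in used if v not in params]

# Each instruction is (result, operands); None marks an instruction without result.
blocks = {
    "BB_A": [("%3", ["%2", "6"])],
    "BB_B": [("%4", ["%3", "%3"])],
    "BB_C": [(None, ["%num"])],
}
args = new_function_args(blocks, ["BB_B", "BB_C"], params=["%num"])
renamed = {v: f"%arg{i}" for i, v in enumerate(args, start=1)}
print(renamed)
```

The `renamed` mapping is what the "fix argument uses" step applies inside the moved blocks: `%num` becomes `%arg1` and `%3` becomes `%arg2`.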


fix branches (initiatorList)

Replace old branches by function calls.

MODULE 1

i32 f(i8* %num)
    BB_A:
        %3 = add i32 %2, 6
        %r = call i32 f2(%num, %3)
        ret %r

MODULE 2

declare i32 @puts(i8*)

i32 f2(i8* %arg1, i32 %arg2)
    BB_B:
        %4 = mul i32 %arg2, %arg2
        jne fcaller, BB_C
    BB_C:
        ret call i32 @puts(%arg1)
    fcaller:
        %r = call i32 f(%num, %3)
        ret %r
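The branch rewriting can be sketched in Python as follows. This is a simplified illustration, not the tool's implementation: it assumes every branch that leaves its function is redirected to a small stub block that calls the target function and returns. On the slide the unconditional branch in BB_A is replaced in place rather than through a stub. The names `fix_branches` and `call_<function>` are hypothetical.

```python
def fix_branches(terminators, func_of):
    """Redirect branches that leave their function to new 'call' blocks."""
    fixed, call_blocks = {}, {}
    for block, targets in terminators.items():
        new_targets = []
        for t in targets:
            if func_of[t] == func_of[block]:
                new_targets.append(t)          # branch stays inside the function
            else:
                stub = f"call_{func_of[t]}"    # like "fcaller" on the slide
                call_blocks[(func_of[block], stub)] = f"call {func_of[t]}(); ret"
                new_targets.append(stub)
        fixed[block] = new_targets
    return fixed, call_blocks

terminators = {"BB_A": ["BB_B"], "BB_B": ["BB_A", "BB_C"], "BB_C": []}
func_of = {"BB_A": "f", "BB_B": "f2", "BB_C": "f2"}
fixed, call_blocks = fix_branches(terminators, func_of)
print(fixed)
print(call_blocks)
```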


Loops generate recursive calls

[Figure: the branch-based and function-based code from "Function-based control flow" (blocks BB_A, BB_B, BB_C and functions A(), B(), C()), next to two call stacks labelled STACK A and STACK B. Both stacks fill with repeated frames (vars, ret) up to a line marked "stack limit".]
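The effect can be reproduced with a short Python experiment, using the interpreter's own call stack as a stand-in for the device stacks in the figure (an analogy, not a model of the real hardware). The function names and the depth bookkeeping are hypothetical.

```python
import sys

def run_as_calls(iterations):
    """Loop A -> B -> A ... expressed as calls; returns the deepest nesting."""
    deepest = 0

    def block_a(i, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        return block_b(i, depth + 1)          # was: jmp BB_B

    def block_b(i, depth):
        nonlocal deepest
        deepest = max(deepest, depth)
        if i + 1 < iterations:                # loop back edge
            return block_a(i + 1, depth + 1)  # callA: call A(); ret
        return depth

    block_a(0, 1)
    return deepest

depth_for_10 = run_as_calls(10)   # two frames per iteration

try:
    run_as_calls(sys.getrecursionlimit())
    overflowed = False
except RecursionError:
    overflowed = True

print(depth_for_10, overflowed)
```

The nesting depth grows linearly with the iteration count, so a long-running loop eventually exceeds any fixed stack limit.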

Fixing loop recursion: a loop pass

The loop condition is stored in %cmpRes so that the header, not the latch, decides whether to iterate again.

Before the pass:

header:
    %3 = add i32 %2, 6
    br label %latch
latch:
    %4 = mul i32 %3, %3
    %cond = icmp ne i32 %4, 0
    br i1 %cond, label %header, label %exit
exit:
    ret call i32 @puts(%num)

After the pass:

preheader:
    %cmpRes = alloca i1
    store i1 true, i1* %cmpRes
    br label %header
header:
    %cond = load i1* %cmpRes
    br i1 %cond, label %postheader, label %exit
postheader:
    %3 = add i32 %2, 6
    br label %latch
latch:
    %4 = mul i32 %3, %3
    %cond = icmp ne i32 %4, 0
    store i1 %cond, i1* %cmpRes
    br label %header
exit:
    ret call i32 @puts(%num)

Fixing loop recursion: final code refactoring

The loop-pass output is split into the functions f(), latch() and exit() across DEV 1 and DEV 2. The latch now returns to its caller instead of branching back to the header:

f()
    preheader:
        %cmpRes = alloca i1
        store i1 true, i1* %cmpRes
        br label %header
    header:
        %cond = load i1* %cmpRes
        br i1 %cond, label %postheader, label %cal
    postheader:
        %3 = add i32 %2, 6
        call latch()
        br label %header
    cal:
        call exit()

latch()
    latch:
        %4 = mul i32 %3, %3
        %cond = icmp ne i32 %4, 0
        store i1 %cond, i1* %cmpRes
        ret

exit()
    exit:
        ret call i32 @puts(%num)
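The same structure can be sketched in Python. The loop stays in the caller and the latch becomes an ordinary function that returns before the next iteration starts, so the call depth stays constant. This is a simplified illustration: the slide passes the condition through the `cmpRes` memory slot, whereas here it is the return value, and the input sequence standing in for `%2` is hypothetical.

```python
def latch(x):
    """Stands in for latch(): computes the loop condition and returns."""
    y = x * x                 # %4 = mul i32 %3, %3
    return y != 0             # the slide stores this into cmpRes instead

def f(values):
    cmp_res = True            # preheader: store i1 true, cmpRes
    calls = 0
    it = iter(values)
    while cmp_res:            # header: load cmpRes and branch
        x = next(it) + 6      # postheader: %3 = add i32 %2, 6
        cmp_res = latch(x)    # call latch(); it returns before the next iteration
        calls += 1
    return calls              # cal: call exit()

short_run = f([1, 2, -6, 5])           # stops at the third value (x == 0)
long_run = f([1] * 100000 + [-6])      # many iterations, constant call depth
print(short_run, long_run)
```

Unlike the recursive version, the number of iterations is no longer bounded by the stack size.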

Output from the tool

Time profiling hello.ir
[HPCmap] Parsing module hello.ir...
[ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch...
[EstimationPass] Estimating from profiling information...
[PartitioningPass] PARTITIONING OVERVIEW:
[PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00
[LoopRecursionBreakPass] Analyzing loop 5 <-> 12
[PartitionWriterPass] Generating partitioned code
PartitionWriterPass::runOnModule() -- Original module's functions:
    odd with BBs:
        entry --> CPU
    main with BBs:
        entry --> CPU
        3 --> CPU
        ...
PartitionWriterPass::find_initiators() -- Inspecting function main()
    Trivial initiators: 5 8
    Entry block initiator: entry
    Nontrivial initiators: 14
    ...
PartitionWriterPass::create_new_Fs() -- Splitting up function main
    Function main1_CPU inserted in module CPU.part
    Moving BB 14 from function main to function main1_CPU
    ...
PartitionWriterPass::branches_to_fcalls() -- Fixing branches:
    to BB entry, moved to function main
    to BB 14, moved to function main1_CPU
PartitionWriterPass::fix_initiator_phis() -- Initiators:
    main2_CPU::5 2 phis updated
[PartitionWriterPass] Module CPU.part generated
[PartitionWriterPass] Module CPU_SIMD.part generated
Partitioned hello.ir

Preliminary results

[Figure: bar chart of execution time (s), axis from 0 to 5, for the benchmarks ctrl, arrayops, adi and floyd-warshall. Series: original, partitioned, vectorized, part + full vect, part + semi vect.]

Conclusions

• Compilation toolchain for heterogeneous architectures.
• Code refactoring based on splitting functions into smaller ones.
• Removed recursion generated by loops being transformed into functions.
• The function call approach does not introduce a significant overhead so far.

Work in progress…

IN THE REFACTORING PASS
• Execute in a real architecture (one executable per device)
• Distributed memory
• Automatic communications

IN THE COMPLETE TOOLCHAIN
• Identification of parallelism
• Data partitioning
• Improve estimation, partitioning heuristics, profiling…

Time profiling hello.ir
[HPCmap] Parsing module hello.ir...
[ReadArchPass] Parsing architecture ../architectures/CPU_SIMD.arch...
[EstimationPass] Estimating from profiling information...
[PartitioningPass] Partitioning...
[PartitioningPass] PARTITIONING OVERVIEW:
[PartitioningPass] Initial exec time was 1.81e-07 s, new is 1.06e-07 -- Speedup = 1.71e+00
[LoopRecursionBreakPass] Analyzing loop 9 <-> 9
[LoopRecursionBreakPass] DONE
[LoopRecursionBreakPass] Analyzing loop 6 <-> 6
[LoopRecursionBreakPass] DONE
[PartitionWriterPass] Generating partitioned code
PartitionWriterPass::runOnModule() -- Original module's functions:
    odd with BBs:
        entry --> CPU
    main with BBs:
        entry --> CPU
        3 --> CPU
        beforeHeader --> CPU
        8 --> CPU_SIMD
        9 --> CPU_SIMD
        13 --> CPU
        afterHeader --> CPU
    puts with BBs:
PartitionWriterPass::find_initiators() -- Inspecting function main()
    Trivial initiators: 5 11
    Entry block initiator: entry
    Nontrivial initiators: 14
    Results:
        entry has initiator entry
        beforeHeader has initiator entry
        5 has initiator 5
        11 has initiator 11
        12 has initiator 11
[PartitionWriterPass] Module CPU.part generated
[PartitionWriterPass] Module CPU_SIMD.part generated
Partitioned hello.ir
