Problem(Based(Benchmarks:(and( their(role(in(parallel...

57
Problem Based Benchmarks: and their role in parallel algorithms Guy Blelloch Carnegie Mellon University Alenex 2012 1 Also: Jeremy Fineman, Phil Gibbons (Intel), Julian Shun, Harsha Vardham Simhadri, …

Transcript of Problem(Based(Benchmarks:(and( their(role(in(parallel...

Page 1: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Problem  Based  Benchmarks:  and  their  role  in  parallel  algorithms  

Guy  Blelloch  Carnegie  Mellon  University  

 

Alenex 2012 1

Also: Jeremy Fineman, Phil Gibbons (Intel), Julian Shun, Harsha Vardham Simhadri, …

Page 2: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Outline  

•  The  challenge  with  parallel  algorithms    •  The  problem  based  benchmark  suite  •  How  they  do  on  modern  mulAprocessors  

Alenex 2012 2

Page 3: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

16  core  processor  

Page 3 Alenex 2012

Page 4: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

64  core  blade  servers  ($6K)  (shared  memory)  

Page 4

x 4 =

Alenex 2012

Page 5: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

1024  “cuda”  cores  

Alenex 2012 5

Page 6: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

6 Alenex 2012

Up to 300K servers

Page 7: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Alenex 2012 7

Page 8: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Different  Architectures  

•  MulAcore  (shared  memory)  •  GPUs  •  Distributed  memory  •  FPGAs  

Alenex 2012 8

Page 9: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Different  Programming  Approaches  •  transacAons  •  futures    •  nested  parallelism  •  map-­‐reduce  •  CUDA/GPU  programming  •  data  parallelism  •  PRAM    •  bulk  synchronizaAon  Alenex 2012 9

Page 10: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

•  threads  •  message  passing  •  parallel  I/O  models  •  parAAoned  global  address  space  •  coordinaAon  languages  •  concurrent  data  structures  •  events    •  …  Alenex 2012 10

Different  Programming  Approaches  

Page 11: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

But….  •  How  well  do  these  work  on  standard  problems?  

•  How  do  they  compare?  •  What  kind  of  algorithms  work  best?  •  How  easy  are  they  to  program?  

Alenex 2012 11

Page 12: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Outline  

•  The  challenge  with  parallel  algorithms    •  The  problem  based  benchmark  suite  •  How  they  do  on  modern  mulAprocessors  

Alenex 2012 12

Page 13: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Problem  Based  Benchmarks  

•  Define  a  set  of  benchmarks  in  terms  of  Input/Output  behavior  on  specific  inputs,  and  use  them  to  compare  soluAons.  

Alenex 2012 13 Input Output

Page 14: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Problem  Based  Benchmarks  •  Judge  based  on:  

– Performance  and  scalability  – Ability  to  reason  about  performance  – Quality  of  code  – Generality  over  inputs  – Pla_orm  independence  

Some  aspects  can  be  judged  qualitaAvely,  others  aspects  will  be  at  the  eye  of  the  beholder.      

Therefore  making  code  public  is  very  important.  

Alenex 2012 14

Page 15: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

The  PBBS  effort  

Benchmarks  with  following  characterisAcs  – Well  known  and  understood  – Concisely  described  –  Implementable  in  under  1000  lines  of  code  – Broad  representaAon  of  domains  – Correctness  or  quality  of  output  easily  measured  –  Independent  of  machine  type  

Alenex 2012 15

Page 16: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Many  ExisAng  Benchmarks  

But  none  we  know  of  match  the  spec  •  Code  Based  :  SPEC,  Da  Capo,  PassMark,  Splash-­‐2,  PARSEC,  fluidMark  

•  Applica4on  Specific:  Linpack,  BioBench,  BioParallel,  MediaBench,  SATLIB,  CineBench,  MineBench,  TCP,  ALPBench,  Graph  500,  DIMACS  challenges  

•  Method  Based:  Lonestar  •  Machine  analysis:  HPC  challenge,  Java  Grande,  NAS,  Green  500,  Graph  500,  P-­‐Ray,  fluidMark    

 Alenex 2012 16

Page 17: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Status  

•  About  15  benchmarks  defined  with  supporAng  code    

•  SequenAal  implementaAons  •  MulAcore  implementaAons  •  Will  make  public  in  February  

Alenex 2012 17

Page 18: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Preliminary  Benchmarks  I  

Alenex 2012 18

Sequences   *  Comparison  SorAng  

*  Removing  Duplicates  

*  DicAonary  

Graphs   *  Breadth  First  Search  

Graph  Separators  

*  Minimum  Spanning  Tree  

*  Maximal  Independent  Set  

Geometry/Graphics  

*  Delaunay  TriangulaAon  and  Refinement  

*  Convex  Hulls  

*  Ray  Triangle  IntersecAon  (Ray  CasAng)  

Micropolygon  Rendering  

Page 19: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Preliminary  Benchmarks  II  

Alenex 2012 19

Machine  Learning  

*  All  Nearest  Neighbors  

Support  Vector  Machines  

K-­‐Means  

Text  Processing  

*  Suffix  Arrays  

Edit  Distance  

String  Search  

Science   *  Nbody  force  calculaAons  

PhylogeneAc  tree  

Numerical   *  Sparse  Matrix  Vector  MulAply  

Sparse  Linear  Solve  

Page 20: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Each  Benchmark  Consists  of:  

•  A  precise  specificaAon  of  the  problem  •  SpecificaAon  of  Input/Output  file  formats  •  A  set  of  input  generators.  •  A  weighAng  on  the  inputs  •  Code  for  tesAng  the  results  •  Baseline  sequenAal  code  •  Baseline  parallel  code(s)  

Alenex 2012 20

Page 21: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Example  Input  

SorAng:  – Random  floats  (uniform)  – Random  floats  (exponenAal  bias)  – Almost  sorted  – Strings  generated  from  trigram  probability  and  randomly  permuted  

– Structures  with  float  key  and  3  addiAonal  fields  

Alenex 2012 21

Page 22: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Outline  

•  The  challenge  with  parallel  algorithms    •  The  problem  based  benchmark  suite  •  How  they  do  on  modern  mulAprocessors  

– Using  32-­‐core  Intel  Nehalem  – What  parallel  algorithms  work  

Alenex 2012 22

Page 23: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Algorithmic  Models  

•  PRAM  •  BSP  •  Nested  Parallelism  with  Work  and  Span  

– Compose  work  by  summing  – Compose  span  by  taking  the  max  

•  Parallel  Cache  Oblivious  Model  – Count  SequenAal  Cache  misses  – Can  be  used  to  bound  parallel  cache  misses  

Alenex 2012 23

Page 24: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

How  do  the  problems  do  on    a  modern  mulAcore  

Alenex 2012 24

Tseq/T3231.621.611.2109

14.5151711.7171815

0"

4"

8"

12"

16"

20"

24"

28"

32"

Sort"

Duplicate"Removal"

Min"Spanning"Tree"

Max"Independ."Set"

Spanning"Forest"

Breadth"First"Search"

Delaunay"Triang."

Triangle"Ray"Inter."

Nearest"Neighbors"

Sparse"M

xV"

Nbody"

Suffix"Array"

T1/T32"Tseq/T32"

Page 25: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Divide  and  Conquer  

•  SorAng  :  Sample  sort  •  Nearest  neighbors  :  building  quad-­‐oct  trees  •  Triangle-­‐ray  intersect  :  k-­‐d  trees  •  N-­‐body  simulaAon:  Callahan-­‐Kosaraju  

Alenex 2012 25

Page 26: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

SorAng  :  Sample  Sort  

Alenex 2012 26

A1 A2 A3  

Am  

m

m

Sort(A1)  Sort(A2)  Sort(A3)  

Sort(Am)  m

m S

S1  S2  S3  

Sm  

Divide using P

Si  P   MERGE

Sample   P  sort

Page 27: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

SorAng  :  Sample  Sort  

Alenex 2012 27

Send to buckets

}  Finally, sort buckets. }  Depth(n) = O(log2(n)) }  Work(n) = O(n log n) }  Q1(n; M,B) = O((n/B)(log(M/B)(n/B))

Page 28: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Alenex 2012 28

Tseq/T3231.621.611.2109

14.5151711.7171815

0"

4"

8"

12"

16"

20"

24"

28"

32"

Sort"

Duplicate"Removal"

Min"Spanning"Tree"

Max"Independ."Set"

Spanning"Forest"

Breadth"First"Search"

Delaunay"Triang."

Triangle"Ray"Inter."

Nearest"Neighbors"

Sparse"M

xV"

Nbody"

Suffix"Array"

T1/T32"Tseq/T32"

Page 29: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Sort  Performance,  More  Detail  weight   STL  Sort   Sanders  Sort   Quicksort   SampleSort   SampleSort  

     Cores   1   32   32   32   1  

Uniform   .1   15.8   1.06   4.22   .82   20.2  

ExponenAal   .1   10.8   .79   2.49   .53   13.8  

Almost  Sorted   .1   3.28   1.11   1.76   .27   5.67  

Trigram  Strings   .2   58.2   4.63   8.6   1.05   30.8  

Strings  Permuted  

.2   82.5   7.08   28.4   1.76   49.3  

Structure   .3   17.6   2.03   6.73   1.18   26.7  

   Average   36.4   3.24   10.3   .97   28.0  

Alenex 2012 29

All inputs are 100,000,000 long. All code written run on Cilk++ (also tested in Cilk+) All experiments on 32 core Nehalem (4 X x7560)

Page 30: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

SpeculaAve  ExecuAon  

Several  efficient  sequenAal  algorithms  are  greedy  loops  that  insert/process  items  one  at  a  Ame,  but  with  dependences:  §  Maximal  independent  Set  (over  verAces)  §  Maximal  Matching  (over  edges)  §  Spanning  Tree  (over  edges)  §  Delaunay  TriangulaAon  (over  points)      Alenex 2012 30

Page 31: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

SequenAal  algorithm:  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 31

X  

1 2

3

9

5

4

7

8 6

10

x

x

x x

x

x

Page 32: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

SequenAal  algorithm:  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

Very  efficient:  most  edges  not  even  visited,  simple  loops  About  7x  faster  than  sorAng  m  edges  

   Alenex 2012 32

Page 33: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

Same  algorithm:  with  parallel  speculaAon  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 33

X  

1 2

3

9

5

4

7

8 6

10

x

x

x x

x

x

Page 34: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

same  algorithm:  with  speculaAon  on  prefix  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 34

X  

1 2

3

9

5

4

7

8 6

10

?

?

?

1   2   3   4   5   6   7   8   9   10  

Page 35: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

same  algorithm:  with  speculaAon  on  prefix  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 35

X  

1 2

3

9

5

4

7

8 6

10

1   2   3   4   5   6   7   8   9   10  

Page 36: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

same  algorithm:  with  speculaAon  on  prefix  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 36

X  

1 2

3

9

5

4

7

8 6

10

x

x

?

? 1   2   3   4   5   6   7   8   9   10  

Page 37: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

same  algorithm:  with  speculaAon  on  prefix  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 37

X  

1 2

3

9

5

4

7

8 6

10

x

x

?

? 1   2   3   4   5   6   7   8   9   10  

Page 38: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

same  algorithm:  with  speculaAon  on  prefix  for each u in V : S[u] = Remainfor each u in V if for all v in N(u), v < u, S[v] = Out then S[u] = In else S[u] = Out

   

Alenex 2012 38

X  

1 2

3

9

5

4

7

8 6

10

x

x

x

x

x

x 1   2   3   4   5   6   7   8   9   10  

Page 39: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

MIS  Parallel  Code  struct MISStep { bool reserve(int i) { int d = V[i].degree; flag = IN; for (int j = 0; j < d; j++) { int ngh = V[i].Neighbors[j]; if (ngh < i) { if (Fl[ngh] == IN) { flag = OUT; return 1;} else if (Fl[ngh] == LIVE) flag = LIVE; } } return 1; }

bool commit(int i) { return (Fl[i] = flag) != LIVE;}};

void MIS(FlType* Fl, vertex* V, int n, int psize) speculative_for(MISStep(Fl, V), 0, n, psize);}Alenex 2012 39

Page 40: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Maximal  Independent  Set  

Costs:  –  Span  =  O(log3  n)    

Expected  case  over  all  iniAal  permutaAons  –  Work  =  O(m)    

if  prefix  size  =  O(n/dmax)  

DetermininisAc  :    –  result  only  depends  on  iniAal  permutaAon  of  

verAces  

Alenex 2012 40

Page 41: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Alenex 2012 41

Tseq/T3231.621.611.2109

14.5151711.7171815

0"

4"

8"

12"

16"

20"

24"

28"

32"

Sort"

Duplicate"Removal"

Min"Spanning"Tree"

Max"Independ."Set"

Spanning"Forest"

Breadth"First"Search"

Delaunay"Triang."

Triangle"Ray"Inter."

Nearest"Neighbors"

Sparse"M

xV"

Nbody"

Suffix"Array"

T1/T32"Tseq/T32"

Page 42: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Spanning  Tree  

SequenAal  algorithm:for each (u,v) in E u’ = find(u) v’ = find(v) if (u’ != v’) union(u’,v’)

   

Alenex 2012 42

Page 43: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Spanning  Tree    struct STStep { bool reserve(int i) { u = F.find(E[i].u); v = F.find(E[i].v); if (u == v) return 0; if (u > v) swap(u,v); R[v].reserve(i); return 1;}

bool commit(int i) { if (R[v].check(i)) { F.link(v, u); return 1;} else return 0; }};

void ST(res* R, edge* E, int m, int n, int psize) { disjointSet F(n); speculative_for(STStep(E, F, R), 0, m, psize);}

Alenex 2012 43

Page 44: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Delaunay  TriangulaAon/Refinement  

•  Add  points  in  parallel  but  detect  conflicts  

Alenex 2012 44

15

16 16

15

15

15

16

15

15

Page 45: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

DicAonary  Using  hashing:  

– Based  on  generic  hash  and  comparison  – Problem:  representaAon  can  depend  on  ordering.  Also  on  which  redundant  element  is  kept.  

– SoluAon:  Use  history  independent  hash  table  based  on  linear  probing…representaAon  is  independent  of  order  of  inserAon  

– Use  write-­‐min  on  collision  

Alenex 2012 45

6   7   3   11   9   5   8  

7, 11 3 9 8, 5 6

Page 46: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Breadth  First  Search  (BFS)  

Goal:  generate  the  same  BFS  (spanning)  tree  as  the  sequenAal  Q  based  algorithm.  

Alenex 2012 46

Page 47: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Breadth  First  Search  (BFS)  

SequenAal  algorithm:  

Alenex 2012 47

Page 48: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Breadth  First  Search  (BFS)  

Another  possible  tree:  

Alenex 2012 48

Page 49: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Breadth  First  Search  (BFS)  

SoluAon:  – Maintain  FronAer  and  priority  order  it  – Use  writeMin  to  choose  winner.  

Alenex 2012 49

1

1

2

1

2

3

1

2

3

4

Page 50: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Delaunay  TriangulaAon/Refinement  

•  Incremental  algorithm  adds  one  point  at  a  Ame,  but  points  can  be  added  in  parallel  if  they  don’t  interact.  

•  The  problem  is  that  the  output  will  depend  on  the  order  they  are  added.  

Alenex 2012 50

Page 51: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Delaunay  TriangulaAon/Refinement  

•  Adding  points  determinisAcally  

Alenex 2012 51

Page 52: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Delaunay  TriangulaAon/Refinement  

•  Adding  points  determinisAcally  

Alenex 2012 52

Page 53: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Delaunay  TriangulaAon/Refinement  

•  Adding  points  determinisAcally  

Alenex 2012 53

Page 54: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Performance  on  32  Core  Intel  Nehalem  

Alenex 2012 54

Tseq/T3231.621.611.2109

14.5151711.7171815

0"

4"

8"

12"

16"

20"

24"

28"

32"

Sort"

Duplicate"Removal"

Min"Spanning"Tree"

Max"Independ."Set"

Spanning"Forest"

Breadth"First"Search"

Delaunay"Triang."

Triangle"Ray"Inter."

Nearest"Neighbors"

Sparse"M

xV"

Nbody"

Suffix"Array"

T1/T32"Tseq/T32"

Page 55: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Some  Conclusions  from  Experiments  

•  MulAcores  work  quite  well…but  there  are  some  issues  with  memory  bandwidth  

•  Most  problems  parallelize  well.  •  Cost  models  are  reasonably  accurate  •  Parallel  code  does  not  need  to  be  complicated  •  Need  a  mix  of  parallelizaAon  techniques    

Alenex 2012 55

Page 56: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Open  QuesAons  

•  How  do  the  benchmarks  do  on  other  machines….other  models?  

•  Are  there  beser  sequenAal  implementaAons  •  Are  there  beser  parallel  implementaAons  •  More  benchmarks  –  perhaps  ones  that  don’t  parallelize  well  (e.g.  max  flow?).  

Alenex 2012 56

Page 57: Problem(Based(Benchmarks:(and( their(role(in(parallel ...guyb/papers/alenex12.pdfProblem(Based(Benchmarks:(and(their(role(in(parallel(algorithms(GuyBlelloch(Carnegie(Mellon(University(

Back  to  the  benchmarks  

•  Need  for  standardized  “problem  based”  benchmarks  for  comparing  approaches.  

•  ParAcularly  important  for  parallel  algorithms,  but  also  useful  for  sequenAal  algorithms.  

•  With  adequate  framework,  should  be  possible  for  anyone  to  submit  new  benchmarks  and  soluAons.  

Alenex 2012 57