HPC Factory
PASCAL: A Parallel Algorithmic SCALable Framework for N-body Problems
Laleh Aghababaie Beni, Aparna Chandramowlishwaran
Euro-Par 2017
Outline
• Introduction
• PASCAL Framework
  • Space Partitioning Trees
  • Tree Traversal
  • Prune/Approximate Generators
• Optimizations & Parallelization
• Experiments & Results
• Conclusions & Future Work
N-body calculations
Force computation: $\forall q \in Q : F(q) = \sum_{r \in Q \setminus \{q\}} \frac{C(r - q)}{\|r - q\|^3}$
Nearest neighbors: $\forall q \in Q : \mathrm{AllNN}(q) = \arg\min_{r \in R} d(q, r)$
Kernel density estimation: $\forall q \in Q : \mathrm{KDE}(q) = \frac{1}{|R|} \sum_{r \in R} K(q, r)$
Range count: $\forall q \in Q : \mathrm{Range}(q) = \sum_{r \in R} I(\mathrm{dist}(q, r) \le h)$
What do these have in common?
Consider pairs of points – naïvely $O(N^2)$
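All four kernels above share the same naive structure: every query point is paired with every reference point, giving $O(N^2)$ kernel evaluations. A minimal brute-force sketch of two of them (illustrative Python, not the authors' code):

```python
import math

def dist(q, r):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))

def allnn_bruteforce(Q, R):
    """All-nearest-neighbors: for each query, scan every reference point."""
    return [min(R, key=lambda r: dist(q, r)) for q in Q]

def range_count_bruteforce(Q, R, h):
    """Range count: sum the indicator I(dist(q, r) <= h) over all pairs."""
    return [sum(1 for r in R if dist(q, r) <= h) for q in Q]
```

Both loops touch every $(q, r)$ pair; the rest of the deck is about avoiding most of those pairs with trees.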
Commonality: Optimal approximation algorithms
• Hierarchical tree-based approximation algorithms for force computations, e.g., Barnes-Hut or FMM
• Store aggregate data at nodes, e.g., bounding box, mass
• Evaluate interactions → tree traversals
N-body problems in other domains
Problems: All Nearest Neighbors, All Range Search, All Range Count, Naive Bayes Classifier, Mixture Model E-step, K-means E-step, Mixture Model Log-likelihood, Kernel Density Estimation, Kernel Density Bayes Classifier, 2-point (cross-)correlation, Nadaraya-Watson Regression, Thermodynamic Average, Largest-span set, Closest Pair, Minimum Spanning Tree, Coulombic Interaction, Average Density, Wave function, Hausdorff Distance, Intrinsic (fractal) Dimension.
(Table mapping each problem to its operators (drawn from $\forall$, $\Sigma$, $\arg\min$, $\arg\max$, $\max$, $\min$) and its kernel function, e.g., $\|x_q - x_r\|$, $I(h_{min} < \|x_q - x_r\| < h_{max})$, Gaussian kernels.)
Each problem has a set of operators and a kernel function
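The takeaway can be made concrete: in this view, a problem is fully specified by its operators plus a kernel function. A hypothetical sketch of such a spec table and a naive evaluator (the names `SPECS` and `evaluate` are invented for illustration; they are not the framework's API):

```python
import math

def euclidean(xq, xr):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xq, xr)))

# Each entry: (query-side operator, reference-side operator, kernel).
# The range-count kernel hardcodes h = 1.0 for brevity.
SPECS = {
    "all_nearest_neighbors": ("forall", "argmin", euclidean),
    "range_count": ("forall", "sum", lambda xq, xr: euclidean(xq, xr) <= 1.0),
}

def evaluate(problem, Q, R):
    """Naive evaluation: for every query point ('forall' on the query side),
    combine kernel values over all reference points with the ref operator."""
    _, ref_op, kernel = SPECS[problem]
    if ref_op == "argmin":
        return [min(R, key=lambda r: kernel(q, r)) for q in Q]
    if ref_op == "sum":
        return [sum(kernel(q, r) for r in R) for q in Q]
```

Swapping the (operator, kernel) pair changes the problem without touching the evaluation skeleton, which is the point of the abstraction.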
Why N-body methods?
• One of the original seven dwarfs (motifs) of parallel computation
• FMM is listed among the top 10 algorithms with the greatest influence in the 20th century
• EM is one of the top 10 algorithms with the highest impact in data mining
• Applications: machine learning, computer vision, computational geometry, scientific computing, …
Key Ideas and Findings
• An algorithmic framework for N-body problems
  • Automatically generates prune & approximation conditions
  • Results in O(N log N) and O(N) algorithms
  • Domain-specific optimizations and parallelization
• 10–230x speedup on 6 different algorithms compared to state-of-the-art libraries and software
• Out-of-the-box new optimal algorithms
  • O(N log N) EM algorithm for GMMs
  • O(N) algorithm for Hausdorff distance
PASCAL Framework
(Framework diagram.) Inputs: datasets and an N-body spec (operators & kernel function). Components:
• Space-partitioning trees (kd-tree)
• Tree traversal schemes (multi-tree traversals) with BaseCase, Prune/Approximate, and ComputeApproximate steps
• Prune/Approximate condition generator
• Domain-specific optimizations and parallelization, producing optimized code
Tree Construction
http://www.cs.cmu.edu/~dpelleg/kmeans.html
Recursively divide space until each box has at most q points.
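The construction rule can be sketched as a plain recursive kd-tree build (an illustrative sketch assuming a median split on cycling axes and dict-based nodes; the framework's actual tree code is not shown in the deck):

```python
def build_kdtree(points, leaf_size=2, depth=0):
    """Recursively divide space until each box holds at most `leaf_size`
    points (the slide's q). Leaves are dicts with a `points` list."""
    if len(points) <= leaf_size:
        return {"points": points}
    axis = depth % len(points[0])              # cycle through dimensions
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                        # median split on this axis
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build_kdtree(pts[:mid], leaf_size, depth + 1),
        "right": build_kdtree(pts[mid:], leaf_size, depth + 1),
    }
```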
Tree Traversal
(Figure: query tree Q traversed against reference tree R.)
• If Prune/Approx() is true, discard the entire subtree for pruning problems
• If Prune/Approx() is true, replace the subtree with its centroid for approximation problems
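The traversal logic can be sketched generically (a hypothetical `dual_tree` with `base_case` and `prune` callbacks; nodes are dicts where leaves hold a `points` list, internal nodes hold `left`/`right` children):

```python
def dual_tree(q_node, r_node, base_case, prune):
    """Generic dual-tree traversal: prune whole (Q-node, R-node) pairs,
    run the exact base case only on surviving leaf pairs."""
    if prune(q_node, r_node):
        return                              # discard the entire subtree pair
    q_leaf = "points" in q_node
    r_leaf = "points" in r_node
    if q_leaf and r_leaf:
        for q in q_node["points"]:
            for r in r_node["points"]:
                base_case(q, r)             # exact per-pair work
    elif q_leaf:
        dual_tree(q_node, r_node["left"], base_case, prune)
        dual_tree(q_node, r_node["right"], base_case, prune)
    else:
        dual_tree(q_node["left"], r_node, base_case, prune)
        dual_tree(q_node["right"], r_node, base_case, prune)
```

With `prune` always false this degenerates to the brute-force $O(N^2)$ loop; a tight prune condition is what buys the $O(N \log N)$ or $O(N)$ behavior.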
Prune/Approximate Condition Generator
• Approximation, e.g., Expectation Maximization (EM): E-step, M-step, log-likelihood
• Prune, e.g., Hausdorff Distance
Approximate Condition for EM
(Figure: query node Q and reference node R, each with a center; the kernel is evaluated at the centers, giving $K_{center}$.)
$K_{max} - K_{min} < \tau K_{center}$, where $\tau$ is the user-controlled accuracy.
Log-likelihood and E-step: $(r_{i,max} - r_{i,min}) < \tau\, r_{i,mean}$ for $i = 1, \ldots, K$.
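As a sanity check, both forms of the test reduce to simple comparisons: a reference node can be summarized by its centroid when the kernel (or per-component responsibility) varies little across it. Illustrative helpers, with `tau` the user-controlled accuracy:

```python
def can_approximate(k_min, k_max, k_center, tau):
    """Node-level test: K_max - K_min < tau * K_center."""
    return (k_max - k_min) < tau * k_center

def can_approximate_em(r_min, r_max, r_mean, tau):
    """Per-component E-step form: (r_i,max - r_i,min) < tau * r_i,mean
    must hold for every mixture component i = 1..K."""
    return all((hi - lo) < tau * m for lo, hi, m in zip(r_min, r_max, r_mean))
```

Smaller `tau` means tighter bounds, fewer approximated subtrees, and higher accuracy.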
Prune Condition for Hausdorff Distance
(Figure: query node Q, reference node R, and a border point.)
$\mathrm{op}_{\Delta_1}(\tau_1, K(x_q, x_r)) \mid \mathrm{op}_{\Delta_2}(\tau_2, K(x_q, x_r))$ s.t. $\forall x_q \in N_q^{border},\ \forall x_r \in N_r^{border}$
Optimizations
• Incremental bounding box calculation: only update the dimension that is split at each node
• Optimal metric calculation
  • Reduced distance, e.g., squared Euclidean distance: eliminates the expensive sqrt instruction and its long latency
  • Partial distance: big payoff for high-dimensional datasets
• Incremental distance calculation: node-to-node distance computed incrementally from the parent's distance in constant time
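The first two metric tricks fit in a single loop: compare squared distances (so sqrt is never taken) and abandon a candidate as soon as its partially accumulated sum already exceeds the current best. An illustrative sketch:

```python
def nearest_reduced(q, R):
    """Nearest neighbor using reduced distance (squared, no sqrt) and
    partial distance (early exit once the running sum beats nothing)."""
    best, best_d2 = None, float("inf")
    for r in R:
        d2 = 0.0
        for a, b in zip(q, r):
            d2 += (a - b) ** 2
            if d2 >= best_d2:       # partial distance: abandon this candidate
                break
        if d2 < best_d2:
            best, best_d2 = r, d2
    return best
```

The early exit pays off most in high dimensions, where a losing candidate is usually disqualified after a few coordinates.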
Parallelization
• Task parallelism: traverse the query tree Q with cilk_spawn (tasks t0–t3); stop spawning when #cilk threads = #physical threads
• Data parallelism
• Pruning/approximation causes load imbalance
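The spawn-cutoff idea translates to any task runtime. A rough Python analogue using threads (the deck uses Cilk in C++; this sketch caps spawning by recursion depth rather than by querying live thread counts, and the tree/aggregation are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(node, depth=0, max_depth=2):
    """Sum leaf values of a dict-based tree, spawning tasks only down to
    `max_depth` so the task count roughly matches the core count."""
    if "points" in node:                       # leaf
        return sum(node["points"])
    if depth >= max_depth:                     # stop spawning: go serial
        return (parallel_sum(node["left"], depth + 1, max_depth)
                + parallel_sum(node["right"], depth + 1, max_depth))
    with ThreadPoolExecutor(max_workers=2) as ex:
        left = ex.submit(parallel_sum, node["left"], depth + 1, max_depth)
        right = ex.submit(parallel_sum, node["right"], depth + 1, max_depth)
        return left.result() + right.result()
```

Note the load-imbalance caveat from the slide still applies: pruned subtrees finish early, so static depth cutoffs work best with a work-stealing runtime underneath.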
Experimental Setup
• Architecture
  • Dual-socket Intel Xeon E5-2630 v3 (Haswell-EP), 8 cores per socket
  • Theoretical peak performance of 614.4 GFlops
• Software
  • Intel C++ compiler (icpc v15.0.2)
  • Python v2.7.6 (Scikit-learn)
  • Java v1.8.0 (Weka)
Case Studies (Direct)
• Kernel Density Estimation
• Nearest Neighbors
• Range Search: $I(\|x_q - x_r\| \le h)$
• Hausdorff Distance
Case Studies (Iterative)
• Euclidean Minimum Spanning Tree
• Expectation Maximization (EM): E-step, M-step, log-likelihood
Library Comparison
• Weka: 6,677,053 downloads, written in Java
• Scikit-learn: 121,841 downloads, written in Python
• MATLAB: over 1,000,000 licensed users, uses C in the backend
• MLPACK: exploits C++ language features to provide maximum performance
7 Results and Discussion
The combined benefits of asymptotically optimal algorithms, optimizations, and parallelization are substantial. In this section, we first compare our performance against state-of-the-art ML libraries and software. Then, we break down the performance gain step by step and, finally, evaluate the scalability of our algorithms.
Performance Summary. Figure 2 presents the performance of k-NN and EM. We choose these two algorithms because they are the only ones supported by all competing libraries and therefore make good candidates for a comprehensive comparison. Moreover, given space constraints, they represent two ends of the spectrum: k-NN is a direct pruning algorithm while EM is an iterative approximation algorithm.
(Fig. 2: grouped bar charts over the Yahoo!, HIGGS, Census, KDD, and IHEPC datasets, comparing MATLAB, WEKA, MLPACK, Scikit, and PASCAL. PASCAL's bars reach 63x, 143x, 231x, 98x, and 160x for EM (top) and 201x, 142x, 104x, 123x, and 98x for k-NN (bottom).)
Fig. 2: Speedup summary of single-tree EM (top) and dual-tree k-NN for k = 3 (bottom). The slowest library is used as the baseline for comparison.
Across the board, our implementation shows significantly better performance compared to Scikit-learn, MLPACK, MATLAB, and Weka.
Performance Breakdown. To gain a better understanding of the factors contributing to the performance improvement, we break down the speedups in Table 2. Specifically, the breakdown distinguishes improvements that are purely algorithmic (tree algorithm) from improvements via optimization and parallelization. For example, for the Yahoo! dataset, we observe a 3.1× speedup from an asymptotically faster algorithm, 12.1× due to optimizations on top of the tree algorithm, and 173.1× with parallelization for k-NN. The breakdowns for EM are 1.6×, 3.2×, and 53.7× respectively for the same dataset.
KNN EM KDE HD RS EMST
Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par Alg +Opt +Par
Yahoo! 3.1 12.1 173.1 1.6 3.2 53.7 2.1 9.1 92.1 2.5 11.5 161.1 2.2 9.1 126.8 2.9 11.9 166.7
HIGGS 2.1 7.3 108.1 1.5 6.8 117.6 1.7 4.7 50.1 1.9 6.1 89.6 1.9 6.3 86.5 2.0 6.9 102.8
Census 1.4 6.5 90.8 1.3 11.2 190.0 1.4 8.1 75.6 1.3 10.2 141.8 1.3 10.4 144.9 1.4 10.9 151.6
KDD 1.6 6.8 100.7 1.4 4.1 70.9 1.5 3.1 33.5 1.4 3.8 54.4 1.4 5.1 70.5 1.5 3.8 55.5
IHEPC 3.0 4.3 61.5 1.5 7.6 127.6 2.0 5.4 53.6 2.5 6.8 101.3 2.1 6.3 94.1 2.9 7.1 107.1
Table 2: Speedup breakdown. Alg stands for algorithmic improvement, +Opt refers to optimization on top of Alg, and +Par is parallelization on top of Opt.
Summary and Status
• First generalized algorithmic framework for N-body problems
• Out-of-the-box new optimal algorithms
  • O(N log N) EM algorithm
  • O(N) Hausdorff distance algorithm
• Generalizes to more than two operators
• 10–230x speedup from optimal tree algorithms, domain-specific optimizations, and parallelization
• Short-term: DSL + code generator for the base case, optimizations, and parallelization
• Long-term: extend to GPUs and distributed-memory systems