Posted on 22-Dec-2015
Solving Irregular Problems Through Parallel Irregular Trees
Fabrizio Baiardi
Paolo Mori
Laura Ricci
Dipartimento di Informatica
Università di Pisa
Istituto di Informatica e Telematica
CNR - Pisa
PDCN 2005
Outline
• Main features of irregular problems
• Hierarchical representation of the domain
• Parallel Irregular Tree library
• Experimental results
• Future work
Irregular Problems
• the domain includes a set of elements characterised by
  – their position in the domain
  – other problem-specific properties
• the distribution of the elements is
  – non-homogeneous
  – dynamic and unpredictable
• the evolution of an element
  – depends upon that of other elements (locality)
  – updates the element's properties
• Examples
  – Barnes-Hut
  – Adaptive Multigrid Methods
  – Radiosity methods
Hierarchical Representation
• the domain is recursively partitioned into a set of spaces by applying a problem-dependent condition
• the Hierarchical Tree (Htree) represents the decomposition: each Hnode represents either a space or an element
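To make the recursive partitioning concrete, here is a minimal C sketch for a 2-D domain. It assumes a quadtree decomposition and, as the problem-dependent condition, a cap on the number of elements per space plus a maximum depth. All names (Element, Hnode, htree_build, max_elems, max_depth) are hypothetical and are not part of the PIT library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element: a 2-D position plus one problem-specific property. */
typedef struct { double x, y, value; } Element;

/* Hypothetical Hnode: a square space; a leaf lists the elements inside it. */
typedef struct Hnode {
    double x0, y0, size;     /* lower-left corner and side length */
    struct Hnode *child[4];  /* all NULL for a leaf */
    int count;               /* number of elements in this space */
    int *elems;              /* element indices (leaves only) */
} Hnode;

/* Assumed problem-dependent condition: split a space holding more than
   max_elems elements, as long as max_depth has not been reached. */
static int must_split(int count, int depth, int max_elems, int max_depth) {
    return count > max_elems && depth < max_depth;
}

/* Quadrant 0..3 of the space (x0, y0, 2*half) that contains element e. */
static int quadrant(const Element *e, double x0, double y0, double half) {
    return (e->x >= x0 + half) + 2 * (e->y >= y0 + half);
}

/* Recursively partition the elements idx[0..n-1] of the space (x0, y0, size). */
Hnode *htree_build(const Element *el, const int *idx, int n, double x0,
                   double y0, double size, int depth, int max_elems,
                   int max_depth) {
    Hnode *h = calloc(1, sizeof *h);
    assert(h != NULL);
    h->x0 = x0; h->y0 = y0; h->size = size; h->count = n;
    int *buf = malloc((size_t)(n > 0 ? n : 1) * sizeof *buf);
    assert(buf != NULL);
    if (!must_split(n, depth, max_elems, max_depth)) {
        for (int i = 0; i < n; i++) buf[i] = idx[i];
        h->elems = buf;               /* a leaf keeps its own index copy */
        return h;
    }
    double half = size / 2.0;
    for (int q = 0; q < 4; q++) {
        int m = 0;
        for (int i = 0; i < n; i++)
            if (quadrant(&el[idx[i]], x0, y0, half) == q) buf[m++] = idx[i];
        h->child[q] = htree_build(el, buf, m, x0 + (q % 2) * half,
                                  y0 + (q / 2) * half, half, depth + 1,
                                  max_elems, max_depth);
    }
    free(buf);                        /* internal Hnodes store no elements */
    return h;
}

/* Number of leaves, i.e. of spaces that were not decomposed further. */
int htree_leaves(const Hnode *h) {
    if (h->child[0] == NULL) return 1;
    int s = 0;
    for (int q = 0; q < 4; q++) s += htree_leaves(h->child[q]);
    return s;
}

void htree_free(Hnode *h) {
    if (h == NULL) return;
    for (int q = 0; q < 4; q++) htree_free(h->child[q]);
    free(h->elems);
    free(h);
}
```

With max_elems = 2, a quadrant that receives three elements is split once more, while sparse quadrants stay as single leaves: this is how the Htree adapts to a non-homogeneous distribution.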
Distributed Hierarchical Tree
• the Htree representation is distributed among the p-nodes:
  pt = <{h0, .., hn-1}, mHt>
  – private Htree (pHt): the subtree assigned to a p-node
  – mapping Htree (mHt): represents the hierarchical relations among the private Htrees
[Figure: private Htrees h0, h1, h2, h3 linked by the mapping Htree]
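One way a replicated mapping can answer "which p-node owns this space?" without any communication is a space-filling-curve numbering. The sketch below assumes a Morton (Z-order) numbering of the 4^level spaces at a fixed level, split into p contiguous blocks. The slides do not state the mapping strategy PIT actually uses, and mht_owner is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Interleave the bits of ix and iy: x in the even bits, y in the odd ones. */
static uint64_t morton_key(uint32_t ix, uint32_t iy, int level) {
    uint64_t key = 0;
    for (int b = 0; b < level; b++) {
        key |= (uint64_t)((ix >> b) & 1u) << (2 * b);
        key |= (uint64_t)((iy >> b) & 1u) << (2 * b + 1);
    }
    return key;
}

/* Owner p-node of the level-`level` space containing (x, y) in [0,1]^2,
   assuming the 4^level spaces are split into p contiguous Morton blocks. */
int mht_owner(double x, double y, int level, int p) {
    assert(level >= 0 && level <= 16 && p > 0);
    assert(x >= 0.0 && y >= 0.0);
    uint32_t side = 1u << level;
    uint32_t ix = (uint32_t)(x * side), iy = (uint32_t)(y * side);
    if (ix >= side) ix = side - 1;   /* points on the upper boundary */
    if (iy >= side) iy = side - 1;
    uint64_t nspaces = (uint64_t)side * side;
    return (int)(morton_key(ix, iy, level) * (uint64_t)p / nspaces);
}
```

Because every p-node can evaluate such a function locally, consulting the mapping costs no messages; contiguous Morton blocks also tend to keep neighbouring spaces on the same p-node.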
PIT Library
• defines:
  – PITree
  – PIT operations
• key point: both the sequential and the parallel versions of the application are structured in terms of operations on Htrees
• aims:
  – be a simple, complete and effective parallelization tool
  – hide the details of parallel programming from the user
  – preserve most of the sequential code
PIT API
• main operations
  – PITree creation
  – PITree completion
  – PITree update
• alternative APIs
  – standard
  – advanced
• composition of the adopted API
  – standard structure
  – customised for the specific problem
PITree Creation
• it creates the PITree starting from the domain elements
  – one (or more) pHt for each p-node
  – one mHt replicated in each p-node
• it implements a distributed strategy to make the best use of memory
• it needs some user-defined functions to manage the elements of the target problem
PITree Completion (I)
• standard API:
  – fault prevention and informed fault prevention
  – one function only implements the strategy
  – invoked before each operator
PITree_completion(pht_root, stencil_0)
tp_op_0(pht_root)   ← this comes from the sequential code
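Fault prevention means fetching, before an operator runs, the remote elements its stencil will touch. A minimal sketch in a simplified 1-D setting, assuming a block distribution of n elements among p p-nodes and a stencil that reaches r neighbours on each side. halo_indices is a hypothetical helper, not the implementation of PITree_completion.

```c
#include <assert.h>

/* First index owned by p-node `rank` when n elements are split into p blocks. */
static int block_lo(int n, int p, int rank) { return (int)((long)n * rank / p); }

/* Collect, in increasing order, the indices of the remote elements that
   p-node `rank` needs before an operator whose stencil reaches r neighbours
   on each side.  `out` must have room for 2*r entries; returns the count. */
int halo_indices(int n, int p, int rank, int r, int *out) {
    assert(n > 0 && p > 0 && rank >= 0 && rank < p && r >= 0);
    int lo = block_lo(n, p, rank), hi = block_lo(n, p, rank + 1); /* [lo, hi) */
    int m = 0;
    for (int i = lo - r; i < lo; i++)   /* left halo */
        if (i >= 0) out[m++] = i;
    for (int i = hi; i < hi + r; i++)   /* right halo */
        if (i < n) out[m++] = i;
    return m;
}
```

The set depends only on the stencil and on the current mapping, which is why it can be computed before the operator instead of being discovered through faults during the visit.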
PITree Completion (II)
• advanced API:
  – informed fault prevention only
  – two distinct functions
    • PITree_det_neighbors: invoked each time the neighbourhood relations among the elements change
    • PITree_exch_neighbors: invoked before each operator
PITree_det_neighbors(pht_root, stencil_0)
PITree_exch_neighbors(pht_root, stencil_0)
tp_op_0(pht_root)   ← this comes from the sequential code
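Splitting the two functions lets the costly neighbour set be computed only when the neighbourhood relations change, while the cheap refresh of values runs before every operator. A single-process sketch of that split, assuming a 1-D local block [lo, hi) and a stencil radius r. GhostCache, ghosts_determine and ghosts_exchange are hypothetical names, and a global array stands in for the remote p-nodes.

```c
#include <assert.h>

#define MAX_GHOSTS 64

/* Ghost cache of one p-node: which remote elements it needs (determined
   rarely) and their latest values (refreshed before every operator). */
typedef struct {
    int n;                    /* number of needed remote elements */
    int index[MAX_GHOSTS];    /* global indices, fixed until relations change */
    double value[MAX_GHOSTS]; /* copies refreshed by each exchange */
} GhostCache;

/* "det" step: record the neighbours of the local block [lo, hi) within
   radius r.  Only needed when the neighbourhood relations change. */
void ghosts_determine(GhostCache *g, int n_global, int lo, int hi, int r) {
    g->n = 0;
    for (int i = lo - r; i < hi + r; i++) {
        if (i < 0 || i >= n_global || (i >= lo && i < hi)) continue;
        assert(g->n < MAX_GHOSTS);
        g->index[g->n++] = i;
    }
}

/* "exch" step: refresh the cached copies from the owners' current values.
   A distributed run would use message passing; here the global array
   plays the role of the remote p-nodes. */
void ghosts_exchange(GhostCache *g, const double *global_values) {
    for (int k = 0; k < g->n; k++)
        g->value[k] = global_values[g->index[k]];
}
```

The index list survives across exchanges, so an application that runs several operators between two changes of the neighbourhood pays the determination cost only once.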
PITree Update (I)
• advanced API: two distinct functions
  – PITree correction:
    • updates the mapping of the elements that violate the mapping strategy
    • it is invoked after each operator that updates the distribution
    tp_op_0(pht_root)
    PITree_correction(pht_root)
  – PITree balance:
    • updates the mapping to redistribute the workload among the p-nodes
    • it is invoked after each operator that modifies the workload
    tp_op_0(pht_root)
    PITree_balance(pht_root, Tresh)
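The slides do not define how Tresh is interpreted. Assuming it is a percentage (the load-balancing plot later in the talk uses a threshold in %) compared with the relative excess of the most loaded p-node over the average load, the balance trigger could look like this hypothetical sketch (imbalance_percent and needs_balance are not PIT functions).

```c
#include <assert.h>

/* Imbalance of a workload distribution, in percent: how much the most
   loaded p-node exceeds the average load. */
double imbalance_percent(const double *load, int p) {
    assert(p > 0);
    double sum = 0.0, max = load[0];
    for (int i = 0; i < p; i++) {
        sum += load[i];
        if (load[i] > max) max = load[i];
    }
    double avg = sum / p;
    return avg > 0.0 ? 100.0 * (max - avg) / avg : 0.0;
}

/* Assumed trigger: redistribute only when the imbalance exceeds the threshold. */
int needs_balance(const double *load, int p, double thresh_percent) {
    return imbalance_percent(load, p) > thresh_percent;
}
```

A low threshold rebalances often (more migration traffic), a high one tolerates idle p-nodes; the completion-time-versus-threshold experiment explores exactly this trade-off.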
PITree Update (II)
• standard API:
  – one function only, PITree update, implements both PITree correction and balancing
  – PITree update is invoked after each operator
tp_op_0(pht_root)
PITree_update(pht_root, Tresh)
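A hypothetical sketch of the order of work inside such an update step: first a correction that re-maps the elements violating an assumed 1-D mapping strategy (p-node k owns the positions in [k/p, (k+1)/p)), then a balance check on the resulting element counts against a percentage threshold. The actual redistribution is omitted, and update_step is not the PIT API.

```c
#include <assert.h>

#define MAX_P 64

/* Assumed mapping strategy: p-node k owns the positions in [k/p, (k+1)/p). */
static int owner_of(double pos, int p) {
    int k = (int)(pos * p);
    return k < 0 ? 0 : (k >= p ? p - 1 : k);
}

/* Correction (re-map every element that violates the mapping strategy),
   then a balance check on the element counts.  Stores the number of
   migrated elements in *migrated; returns 1 when rebalancing should follow. */
int update_step(const double *pos, int *node, int n, int p,
                double thresh_percent, int *migrated) {
    assert(p > 0 && p <= MAX_P && n > 0);
    int count[MAX_P] = {0};
    *migrated = 0;
    for (int i = 0; i < n; i++) {          /* correction */
        int o = owner_of(pos[i], p);
        if (o != node[i]) { node[i] = o; (*migrated)++; }
        count[o]++;
    }
    int max = 0;                            /* balance check */
    for (int k = 0; k < p; k++)
        if (count[k] > max) max = count[k];
    double avg = (double)n / p;
    return 100.0 * (max - avg) / avg > thresh_percent;
}
```

Running the correction first matters: the balance decision is then taken on the distribution the operator actually produced, not on a stale one.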
Parallelization
• Standard:
  – the functions of the sequential version are inserted into the standard structure
  – the development is straightforward
  – a deep knowledge of the target problem is not required
• Customised:
  – the PIT operations are inserted into the sequential code according to the semantics of the target problem
  – a deep knowledge of the target problem is required
  – both the standard and the advanced API can be adopted
  – it achieves better efficiency
Sequential Code
irregular_problem(tElementList *dom) {
    ...
    root = Htree_creation(dom)
    ...
    while (not solution_computed) {
        tp_op_0(root)
        …
        tp_op_n(root)
    }
}
a problem operator mainly consists of a visit of the Htree
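As an illustration of an operator that is mainly a visit of the Htree, the C sketch below performs a post-order visit that stores in each internal Hnode the sum of its children's values, in the spirit of the mass aggregation step of Barnes-Hut. The Node type and op_aggregate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NCHILD 4

/* Hypothetical Hnode: a leaf carries an element property, an internal
   node carries the aggregate computed by the operator. */
typedef struct Node {
    double value;
    struct Node *child[NCHILD];   /* NULL entries are absent children */
} Node;

static int is_leaf(const Node *h) {
    for (int c = 0; c < NCHILD; c++)
        if (h->child[c] != NULL) return 0;
    return 1;
}

/* Post-order visit: children first, then the node itself, so every
   internal Hnode ends up holding the sum of the leaves below it. */
double op_aggregate(Node *h) {
    if (h == NULL) return 0.0;
    if (is_leaf(h)) return h->value;
    double s = 0.0;
    for (int c = 0; c < NCHILD; c++) s += op_aggregate(h->child[c]);
    h->value = s;
    return s;
}
```

In the parallel version the same visit runs on a pHt, and any subtree owned by another p-node must have been made available beforehand, which is the job of PITree completion.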
Standard Structure
irregular_problem(tElementList *dom) {
    ...
    pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
    ...
    while (not solution_computed) {
        PITree_completion(pht_root, stencil_0)
        tp_op_0(pht_root)
        pht_root = PITree_update(pht_root, T)
        …
        PITree_completion(pht_root, stencil_n)
        tp_op_n(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}
Customised Structure
irregular_problem(tElementList *dom) {
    …
    pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
    ...
    while (not solution_computed) {
        PITree_det_neighbors(pht_root, stencil_0+..+stencil_i)
        PITree_exch_neighbors(pht_root, stencil_0)
        tp_op_0(pht_root)
        …
        PITree_exch_neighbors(pht_root, stencil_i)
        tp_op_i(pht_root)
        PITree_correction(pht_root)
        PITree_det_neighbors(pht_root, stencil_i+1+..+stencil_n)
        …
        PITree_exch_neighbors(pht_root, stencil_n)
        tp_op_n(pht_root)
        PITree_update(pht_root)
    }
}
Validation
• Applications
  – Adaptive Multigrid Methods
  – Hierarchical Radiosity
• Parallel architectures
  – PC cluster
    • Intel Pentium II 266 MHz
    • 128 MB RAM
    • 100 Mb/s Fast Ethernet
  – IBM Beowulf (x330)
    • Intel Pentium III 1.133 GHz
    • 1 GB per p-node (2 processors)
    • Myricom LAN (264MB)
Adaptive Multigrid Methods
• fast iterative methods to solve partial differential equations
• discretized, multi-level domain representation through a grid hierarchy
• adaptive problem:
  – the discretization is finer where the equation is irregular
  – new grids are added during the computation
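• Poisson problem:
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad \text{in } [0,1]\times[0,1]$$
  with a closed-form reference solution u(x, y) built from cos and sinh terms (its exact expression is not legible in this copy)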
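For a Poisson equation ∂²u/∂x² + ∂²u/∂y² = f discretised on a uniform grid of spacing h, the standard 5-point stencil and the pointwise update used by a Gauss-Seidel smoother are shown below. The slides do not say which discretisation or smoother the authors adopt, so this is only a worked reference:

$$\frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{h^2} = f_{i,j}
\quad\Longrightarrow\quad
u_{i,j} \leftarrow \frac{1}{4}\left(u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - h^2 f_{i,j}\right)$$

With f = 0 (Laplace's equation) the update simply replaces each interior value by the average of its four neighbours.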
Sequential Code
amm(tElementList *initial_grid) {
    root = Htree_creation(initial_grid)
    while (not end) {
        smoothing(root, v, f, all_levels)
        for level from Lmax downto Lg {
            rest(root, level)
            restriction(root, level-1)
            smoothing(root, e, r, level-1)
        }
        for level from Lg+1 to Lmax {
            prolongation(root, level)
            correction(root, e, level)
            smoothing(root, e, r, level)
        }
        correction(root, v, all_levels)
        end = norm(root)
        if (not end) Lmax = refinement(root)
    }
}
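A minimal C sketch of one smoothing sweep, assuming lexicographic Gauss-Seidel for the 5-point discretisation of u_xx + u_yy = f on a single uniform n × n grid. The real smoothing(root, v, f, all_levels) works on the adaptive grid hierarchy stored in the Htree, and gs_sweep is a hypothetical name.

```c
#include <assert.h>

/* One lexicographic Gauss-Seidel sweep for the 5-point discretisation of
   u_xx + u_yy = f on an n x n grid (row-major, spacing h).  Boundary
   values stay fixed; interior values are overwritten in place, so each
   update already uses the freshly updated west and south neighbours. */
void gs_sweep(double *u, const double *f, int n, double h) {
    assert(n >= 3);
    for (int i = 1; i < n - 1; i++)
        for (int j = 1; j < n - 1; j++) {
            int k = i * n + j;
            u[k] = 0.25 * (u[k - n] + u[k + n] + u[k - 1] + u[k + 1]
                           - h * h * f[k]);
        }
}
```

On a grid whose boundary holds a linear function such as u = x + y, repeated sweeps drive the interior to that same function, because the 5-point stencil reproduces linear functions exactly. Multigrid applies only a few sweeps per level and relies on the coarser grids to remove the smooth error components that Gauss-Seidel damps slowly.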
Parallel Code (I)
amm(tElementList *initial_grid) {
    pht_root = PITree_creation(initial_grid, dec_el, incl_el, rem_el)
    while (not end) {
        PITree_det_neighbors(pht_root, stencil_union)
        PITree_exch_neighbors(pht_root, smooth-rest_stencil, all_levels)
        smoothing(pht_root, v, f, all_levels)
        for level from Lmax downto Lg {
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            rest(pht_root, level)
            PITree_exch_neighbors(pht_root, restriction_stencil, level)
            restriction(pht_root, level-1)
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            smoothing(pht_root, e, r, level-1)
        }
Parallel code (II)
        for level from Lg+1 to Lmax {
            PITree_exch_neighbors(pht_root, prolongation_stencil, level)
            prolongation(pht_root, level)
            correction(pht_root, e, level)
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            smoothing(pht_root, e, r, level)
        }
        correction(pht_root, v, all_levels)
        PITree_exch_neighbors(pht_root, norm_stencil, level)
        end = norm(pht_root)
        if (not end) Lmax = refinement(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}
[Figure: domain hierarchical decomposition after 10 iterations]
Load Balancing
[Figure: completion time (sec) versus load-balancing threshold (%), for thresholds of 2, 5, 10, 20, 30, 50 and 100%]
Efficiency
[Figure: efficiency (%) versus number of p-nodes (2, 4, 8, 10, 16, 32) on the IBM Beowulf and the PC Beowulf]
Hierarchical Radiosity
• a model of the light exchanges used to compute the illumination of a scene
• representation of the scene
  – discretized and hierarchical
  – adaptive
• locality: interactions among objects at distinct abstraction levels
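The light-exchange model rests on the classical radiosity equation B_i = E_i + ρ_i Σ_j F_ij B_j, where B is radiosity, E emission, ρ reflectance and F the form factors. The sketch below solves it by flat Jacobi gathering; it deliberately ignores the hierarchy and the links between abstraction levels that Gather_H works on, and radiosity_gather is a hypothetical name.

```c
#include <assert.h>

/* Jacobi "gathering" iteration for B_i = E_i + rho_i * sum_j F_ij B_j,
   with a dense n x n form-factor matrix F (row-major).  B holds the
   current estimate and is updated in place; new_b is scratch space of
   length n so that every patch gathers from the same previous estimate. */
void radiosity_gather(const double *E, const double *rho, const double *F,
                      double *B, double *new_b, int n, int iters) {
    assert(n > 0 && iters >= 0);
    for (int t = 0; t < iters; t++) {
        for (int i = 0; i < n; i++) {
            double gathered = 0.0;
            for (int j = 0; j < n; j++) gathered += F[i * n + j] * B[j];
            new_b[i] = E[i] + rho[i] * gathered;
        }
        for (int i = 0; i < n; i++) B[i] = new_b[i];
    }
}
```

For two facing patches with F_01 = F_10 = 1, ρ = 0.5 and only patch 0 emitting (E_0 = 1), the iteration converges to B_0 = 4/3 and B_1 = 2/3. The dense F makes the flat method quadratic in the number of patches, which is what the hierarchical formulation avoids.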
Sequential Code
hierarchical_rad(segment_list *scene) {
    root = Htree_creation(scene)
    visib_list_det(root)
    while (not end) {
        Gather_H(root)
        for level from L_min to L_max
            Push_H(root, level)
        for level from L_max downto L_min
            Pull_H(root, level)
        end = RefineLink_H(root)
    }
}
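Push_H and Pull_H together resemble the push-pull pass of hierarchical radiosity: radiosity gathered at a coarse level is pushed down to the leaves, and leaf values are pulled back up as area-weighted averages. The recursive C sketch below combines both directions in a single traversal (the slide code instead sweeps level by level), assumes a binary subdivision of 2-D segments weighted by length, and uses hypothetical names (Seg, push_pull).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node of a hierarchy of 2-D scene segments. */
typedef struct Seg {
    double length;      /* plays the role of the area in 2-D */
    double emission;    /* E */
    double gathered;    /* B_g: radiosity gathered over this node's links */
    double radiosity;   /* B: result of the push-pull pass */
    struct Seg *child[2];
} Seg;

/* Push: each node passes its gathered radiosity, plus what it inherited
   (`down`), to its children.  Pull: a leaf's radiosity is E + B_g + down,
   an internal node's is the length-weighted average of its children. */
double push_pull(Seg *s, double down) {
    double up;
    if (s->child[0] == NULL && s->child[1] == NULL) {
        up = s->emission + s->gathered + down;
    } else {
        up = 0.0;
        for (int c = 0; c < 2; c++)
            if (s->child[c] != NULL)
                up += push_pull(s->child[c], s->gathered + down)
                      * (s->child[c]->length / s->length);
    }
    s->radiosity = up;
    return up;
}
```

After the pass every level holds a consistent radiosity, so the next Gather_H can use links at any abstraction level.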
Parallel Code (I)
hierarchical_rad(segment_list *scene) {
    pht_root = PITree_creation(scene, dec_el, incl_el, rem_el)
    PITree_exch_neighbors(pht_root, vis_stencil, all_levels)
    visib_list_det(pht_root)
    while (not end) {
        PITree_exch_neighbors(pht_root, int_list, all_levels)
        Gather_H(pht_root)
        for level from L_min to L_max {
            PITree_exch_neighbors(pht_root, push_stencil, level)
            Push_H(pht_root, level)
        }
Parallel Code (II)
        for level from L_max downto L_min {
            PITree_exch_neighbors(pht_root, pull_stencil, level)
            Pull_H(pht_root, level)
        }
        end = RefineLink_H(pht_root)
        pht_root = PITree_balance(pht_root)
    }
}
Test Scene
• 192 polygons
• 896 segments
Efficiency
[Figure: efficiency (%) versus number of p-nodes (2, 4, 8, 10, 16, 32) on the IBM Beowulf and the PC Beowulf]
Future Work
• the definition of the set of problems that cannot be solved using our methodology
• the definition of programming constructs for the considered class of problems