Solving Irregular Problems Through Parallel Irregular Trees

Fabrizio Baiardi

Paolo Mori

Laura Ricci

Dipartimento di Informatica

Università di Pisa

Istituto di Informatica e Telematica

CNR - Pisa

PDCN 2005

Outline

• Main features of irregular problems
• Hierarchical representation of the domain
• Parallel Irregular Tree library
• Experimental results
• Future work

Irregular Problems

• the domain includes a set of elements characterised by
– their position in the domain
– other problem-specific properties
• the distribution of the elements is
– non-homogeneous
– dynamic and unpredictable
• the evolution of an element
– depends upon that of other elements (locality)
– updates the element properties
• Examples
– Barnes-Hut
– Adaptive Multigrid Methods
– Radiosity methods

Hierarchical Representation

• the domain is recursively partitioned into a set of spaces by applying a problem-dependent condition

• the Hierarchical Tree (Htree) represents the decomposition, and each Hnode represents either a space or an element
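The recursive decomposition can be sketched as a quadtree build. This is only an illustration, not code from the PIT library: the 2D point elements, the `HNode` class, and the splitting condition "a space holds more than `max_elems` elements" are assumptions made for the example, since the slide only states that the condition is problem dependent.

```python
from dataclasses import dataclass, field

@dataclass
class HNode:
    """A square space of the domain (hypothetical layout)."""
    x: float
    y: float
    size: float
    elements: list = field(default_factory=list)   # filled only in leaves
    children: list = field(default_factory=list)   # four sub-spaces, or empty

def build_htree(elements, x=0.0, y=0.0, size=1.0, max_elems=1, max_depth=16):
    """Split a square recursively while it holds more than max_elems elements.

    Spaces are half-open, so points must satisfy x <= px < x + size.
    """
    node = HNode(x, y, size)
    if len(elements) <= max_elems or max_depth == 0:
        node.elements = list(elements)
        return node
    half = size / 2
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = x + dx * half, y + dy * half
            inside = [(px, py) for (px, py) in elements
                      if cx <= px < cx + half and cy <= py < cy + half]
            node.children.append(
                build_htree(inside, cx, cy, half, max_elems, max_depth - 1))
    return node

def depth(node):
    """Number of levels below and including this node."""
    return 1 + max((depth(c) for c in node.children), default=0)

def count_elements(node):
    """Total number of elements stored in the leaves of the subtree."""
    return len(node.elements) + sum(count_elements(c) for c in node.children)
```

With a non-homogeneous distribution, such as two close points and one isolated point, the tree becomes deep only where the elements cluster, which is the irregularity the Htree is meant to capture.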

Distributed Hierarchical Tree

• the Htree representation is distributed among the p-nodes: pt = <{h0, ..., hn-1}, mHt>
– private Htree (pHt): subtree assigned to a p-node
– mapping Htree (mHt): represents the hierarchical relations among the private Htrees

[Figure: private Htrees h0, h1, h2, h3]
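One possible use of a replicated mapping Htree is to let any p-node find, without communication, which p-node owns the private Htree covering a given position. The sketch below assumes that reading. The `MNode` layout, the `owner_of` helper and the four-quadrant example are hypothetical and are not taken from the library.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MNode:
    """Node of a mapping Htree over a square space (hypothetical layout)."""
    x: float
    y: float
    size: float
    owner: Optional[int] = None                    # p-node holding the private Htree rooted here
    children: list = field(default_factory=list)   # four quadrants, or empty

def owner_of(mht, px, py):
    """Descend the mapping Htree to the p-node whose private Htree covers (px, py)."""
    node = mht
    while node.owner is None:
        half = node.size / 2
        ix = 1 if px >= node.x + half else 0
        iy = 1 if py >= node.y + half else 0
        node = node.children[2 * ix + iy]
    return node.owner

def quadrants(x, y, size, owners):
    """Four child spaces, ordered so that the child index is 2*dx + dy."""
    half = size / 2
    return [MNode(x + dx * half, y + dy * half, half, owner=owners[2 * dx + dy])
            for dx in (0, 1) for dy in (0, 1)]

# Example mHt: four private Htrees h0..h3, one per quadrant, on p-nodes 0..3.
mht = MNode(0.0, 0.0, 1.0, children=quadrants(0.0, 0.0, 1.0, [0, 1, 2, 3]))
```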

PIT Library

• defines:
– PITree
– PIT operations
• key point: both the sequential and the parallel versions of the application are structured in terms of operations on Htrees
• aims
– be a simple, complete and effective parallelization tool
– hide the details of parallel programming from the user
– preserve most of the sequential code

PIT API

• main operations
– PITree creation
– PITree completion
– PITree update
• alternative APIs
– standard
– advanced
• composition of the adopted API
– standard structure
– customised for the specific problem

PITree Creation

• it creates the PITree starting from the domain elements
– one (or more) pHt for each p-node
– one mHt replicated in each p-node
• it implements a distributed strategy to make the best use of memory
• it needs some user-defined functions to manage the elements of the target problem

PITree Completion (I)

• standard API:
– fault prevention and informed fault prevention
– a single function implements the strategy
– it is invoked before each operator

    PITree_completion(pht_root, stencil_0)
    tp_op_0(pht_root)        // this comes from the sequential code

PITree Completion (II)

• advanced API:
– informed fault prevention only
– two distinct functions
  • PITree_det_neighbors: invoked each time the neighbourhood relations among the elements change
  • PITree_exch_neighbors: invoked before each operator

    PITree_det_neighbors(pht_root, stencil_0)
    PITree_exch_neighbors(pht_root, stencil_0)
    tp_op_0(pht_root)        // this comes from the sequential code
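The split between determining and exchanging neighbours can be illustrated on a toy 1D domain. This sketch assumes one reading of informed fault prevention: the owner of a remote element sends it before the operator runs, according to the operator's stencil. The function names mirror the slide, but the signatures and the in-process dictionaries standing in for message passing are hypothetical.

```python
def det_neighbors(owner, stencil):
    """Plan the exchanges: which cells each p-node must send to each other p-node.

    owner: dict cell index -> p-node id; stencil: iterable of index offsets.
    Returns a dict (src_pnode, dst_pnode) -> set of cell indices to send.
    """
    send = {}
    for cell, dst in owner.items():
        for off in stencil:
            nb = cell + off
            src = owner.get(nb)          # None when nb falls outside the domain
            if src is not None and src != dst:
                send.setdefault((src, dst), set()).add(nb)
    return send

def exch_neighbors(send, values, ghosts):
    """Apply the plan: copy current values into each receiver's ghost area."""
    for (src, dst), cells in send.items():
        for c in cells:
            ghosts.setdefault(dst, {})[c] = values[c]
    return ghosts

# Six cells on two p-nodes, nearest-neighbour stencil.
owner = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
plan = det_neighbors(owner, stencil=(-1, 1))
ghosts = exch_neighbors(plan, [10, 11, 12, 13, 14, 15], {})
```

The plan is recomputed only when ownership or neighbourhood relations change, while the cheaper exchange step is repeated before every operator because the values change at each step.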

PITree Update (I)

• advanced API: two distinct functions
– PITree correction:
  • updates the mapping of the elements that violate the mapping strategy
  • it is invoked after each operator that updates the distribution

    tp_op_0(pht_root)
    PITree_correction(pht_root)

– PITree balance:
  • updates the mapping to redistribute the workload among the p-nodes
  • it is invoked after each operator that modifies the workload

    tp_op_0(pht_root)
    PITree_balance(pht_root, Tresh)
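The slides do not define how the `Tresh` argument is interpreted. A common choice, assumed here purely for illustration, is to trigger a redistribution only when the most loaded p-node exceeds the average load by more than a given percentage. The `needs_balancing` helper is hypothetical.

```python
def needs_balancing(loads, thresh_percent):
    """Return True when the load imbalance exceeds the threshold.

    loads: per-p-node workload estimates (e.g. number of elements).
    Imbalance is measured as (max - average) / average, in percent;
    this metric is an assumption, not the library's documented rule.
    """
    avg = sum(loads) / len(loads)
    if avg == 0:
        return False
    imbalance = (max(loads) - avg) / avg * 100.0
    return imbalance > thresh_percent
```

A low threshold rebalances often and pays the redistribution cost more frequently, while a high threshold tolerates more idle time, so the best value is a trade-off that has to be measured.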

PITree Update (II)

• standard API:
– a single function, PITree update, implements both PITree correction and balancing
– PITree update is invoked after each operator

    tp_op_0(pht_root)
    PITree_update(pht_root, Tresh)

Parallelization

• Standard
– the functions of the sequential version are inserted into the standard structure
– the development is straightforward
– a deep knowledge of the target problem is not required
• Customised
– the PIT operations are inserted into the sequential code according to the semantics of the target problem
– a deep knowledge of the target problem is required
– both the standard and the advanced API can be adopted
– it achieves better efficiency

Sequential Code

irregular_problem(tElementList *dom) {
    ...
    root = Htree_creation(dom)
    ...
    while (not solution_computed) {
        tp_op_0(root)
        ...
        tp_op_n(root)
    }
}

• problem operator (tp_op_i): mainly consists of a visit of the Htree

Standard Structure

irregular_problem(tElementList *dom) {
    ...
    pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
    ...
    while (not solution_computed) {
        PITree_completion(pht_root, stencil_0)
        tp_op_0(pht_root)
        pht_root = PITree_update(pht_root, T)
        ...
        PITree_completion(pht_root, stencil_n)
        tp_op_n(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}
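Because the standard structure wraps every operator in the same way, it can be expressed as a generic driver. The sketch below only illustrates the call order: `run_standard` and its callback parameters are hypothetical names, and the completion and update callbacks stand in for the PIT operations rather than reproducing their real signatures.

```python
def run_standard(pht_root, operators, stencils, completion, update, thresh, solved):
    """Run each sequential operator between a completion and an update call.

    operators: the tp_op_i functions taken unchanged from the sequential code.
    stencils:  one stencil per operator, passed to the completion step.
    solved:    predicate that tells when the solution has been computed.
    """
    while not solved(pht_root):
        for op, stencil in zip(operators, stencils):
            completion(pht_root, stencil)         # fetch the remote data the operator needs
            op(pht_root)                          # unchanged sequential operator
            pht_root = update(pht_root, thresh)   # correction and balancing
    return pht_root
```

Only the list of operators and their stencils depends on the application, which is why this style of parallelization does not require deep knowledge of the target problem.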

Customised Structure

irregular_problem(tElementList *dom) {
    ...
    pht_root = PITree_creation(dom, dec_el, incl_el, rem_el)
    ...
    while (not solution_computed) {
        PITree_det_neighbors(pht_root, stencil_0+..+stencil_i)
        PITree_exch_neighbors(pht_root, stencil_0)
        tp_op_0(pht_root)
        ...
        PITree_exch_neighbors(pht_root, stencil_i)
        tp_op_i(pht_root)
        PITree_correction(pht_root)
        PITree_det_neighbors(pht_root, stencil_i+1+..+stencil_n)
        ...
        PITree_exch_neighbors(pht_root, stencil_n)
        tp_op_n(pht_root)
        PITree_update(pht_root)
    }
}

Validation

• Applications
– Adaptive Multigrid Methods
– Hierarchical Radiosity
• Parallel architectures
– PC cluster
  • Intel Pentium II 266 MHz
  • 128 MB of memory
  • 100 Mb/s Fast Ethernet
– IBM Beowulf (x330)
  • Intel Pentium III 1.133 GHz
  • 1 GB per p-node (2 procs)
  • Myricom LAN (264MB)

Adaptive Multigrid Methods

• fast iterative methods to solve partial differential equations
• discretized and multi-level domain representation through a grid hierarchy
• adaptive problem:
– the discretization is finer where the equation is irregular
– new grids are added during the computation
• Poisson Problem:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad \text{in } \Omega = [0,1] \times [0,1]$$

$$u(x,y) = 10\,\cos\bigl(2\pi(x-y)\bigr)\,\frac{\sinh\bigl(2\pi(x+y+2)\bigr)}{\sinh(8\pi)} \quad \text{on } \partial\Omega$$
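Assuming the boundary data of the test problem reads u(x, y) = 10 cos(2π(x − y)) sinh(2π(x + y + 2)) / sinh(8π), that function is itself harmonic, so it is also the exact solution inside the unit square. It changes rapidly only near the corner (1, 1), which is where an adaptive method would refine. The short check below verifies the harmonic property numerically with finite differences. The helper names are hypothetical.

```python
import math

A = 2 * math.pi

def u(x, y):
    """Boundary data as read from the slide (the signs inside cos/sinh are assumed)."""
    return 10 * math.cos(A * (x - y)) * math.sinh(A * (x + y + 2)) / math.sinh(4 * A)

def d2x(f, x, y, h=1e-3):
    """Central finite-difference approximation of the second x-derivative."""
    return (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / (h * h)

def laplacian(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of f_xx + f_yy."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4 * f(x, y)) / (h * h)
```

At a point such as (0.9, 0.8) the second x-derivative is large in magnitude while the five-point Laplacian is close to zero, as expected for a harmonic function.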

Sequential Code

amm(tElementList *initial_grid) {
    root = Htree_creation(initial_grid)
    while (not end) {
        smoothing(root, v, f, all_levels)
        for level from Lmax downto Lg {
            rest(root, level)
            restriction(root, level-1)
            smoothing(root, e, r, level-1)
        }
        for level from Lg+1 to Lmax {
            prolongation(root, level)
            correction(root, e, level)
            smoothing(root, e, r, level)
        }
        correction(root, v, all_levels)
        end = norm(root)
        if (not end) Lmax = refinement(root)
    }
}
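The operators named in the pseudocode (smoothing, residual, restriction, prolongation, correction) are the building blocks of a multigrid V-cycle. The sketch below shows them on a much simpler setting than the slides: a sequential, non-adaptive, uniform 1D grid for −u'' = f with zero boundary values. It is a textbook V-cycle under those assumptions, not the authors' code, and all function names are chosen for the example.

```python
def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f with zero Dirichlet boundaries."""
    u = list(u)
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0
            right = old[i + 1] if i < n - 1 else 0.0
            u[i] = (1 - omega) * old[i] + omega * 0.5 * (left + right + h * h * f[i])
    return u

def residual(u, f, h):
    """r = f - A u for the three-point discretization of -u''."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2 * u[i] - left - right) / (h * h))
    return r

def restrict(r):
    """Full weighting: coarse point j sits on fine point 2j + 1."""
    nc = (len(r) - 1) // 2
    return [0.25 * (r[2 * j] + 2 * r[2 * j + 1] + r[2 * j + 2]) for j in range(nc)]

def prolong(ec, n):
    """Linear interpolation of the coarse error onto n fine points."""
    e = [0.0] * n
    for j, v in enumerate(ec):
        e[2 * j + 1] += v
        e[2 * j] += 0.5 * v
        e[2 * j + 2] += 0.5 * v
    return e

def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2**k - 1."""
    n = len(u)
    if n == 1:
        return [0.5 * h * h * f[0]]               # exact solve on the coarsest grid
    u = smooth(u, f, h)                           # pre-smoothing
    rc = restrict(residual(u, f, h))              # residual moved to the coarse grid
    ec = v_cycle([0.0] * len(rc), rc, 2 * h)      # coarse-grid error equation
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]   # correction
    return smooth(u, f, h)                        # post-smoothing
```

For example, with 63 interior points and f = 1, repeating `u = v_cycle(u, f, h)` a few times starting from zero drives the residual down quickly and approaches the exact solution x(1 − x)/2.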

Parallel Code (I)

amm(tElementList *initial_grid) {
    pht_root = PITree_creation(initial_grid, dec_el, incl_el, rem_el)
    while (not end) {
        PITree_det_neighbors(pht_root, stencil_union)
        PITree_exch_neighbors(pht_root, smooth-rest_stencil, all_levels)
        smoothing(pht_root, v, f, all_levels)
        for level from Lmax downto Lg {
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            rest(pht_root, level)
            PITree_exch_neighbors(pht_root, restriction_stencil, level)
            restriction(pht_root, level-1)
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            smoothing(pht_root, e, r, level-1)
        }

Parallel Code (II)

        for level from Lg+1 to Lmax {
            PITree_exch_neighbors(pht_root, prolongation_stencil, level)
            prolongation(pht_root, level)
            correction(pht_root, e, level)
            PITree_exch_neighbors(pht_root, smooth-rest_stencil, level)
            smoothing(pht_root, e, r, level)
        }
        correction(pht_root, v, all_levels)
        PITree_exch_neighbors(pht_root, norm_stencil, level)
        end = norm(pht_root)
        if (not end) Lmax = refinement(pht_root)
        pht_root = PITree_update(pht_root, T)
    }
}

[Figure: domain hierarchical decomposition after 10 iterations]

Load Balancing

[Plot: completion time (sec) versus balancing threshold (%), for thresholds of 2, 5, 10, 20, 30, 50 and 100]

Efficiency

[Plot: efficiency (%) versus number of p-nodes (2, 4, 8, 10, 16, 32) for the IBM Beowulf and the PC Beowulf]
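The slides do not state how efficiency is computed. The usual definition, assumed here, is the sequential time divided by the number of p-nodes times the parallel time. The helper below is a hypothetical illustration and uses no measurements from the presentation.

```python
def efficiency(t_seq, t_par, p):
    """Parallel efficiency in percent: 100 * T_seq / (p * T_par)."""
    return 100.0 * t_seq / (p * t_par)
```

Under this definition a run that takes 1000 s sequentially and 78.125 s on 16 p-nodes has an efficiency of 80%.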

Hierarchical Radiosity

• a model of the light exchanges to compute the illumination of a scene
• representation of the scene
– discretized and hierarchical
– adaptive
• locality: interactions among objects at distinct abstraction levels

Sequential Code

hierarchical_rad(segment_list *scene) {
    root = Htree_creation(scene)
    visib_list_det(root)
    while (not end) {
        Gather_H(root)
        for level from L_min to L_max
            Push_H(root, level)
        for level from L_max downto L_min
            Pull_H(root, level)
        end = RefineLink_H(root)
    }
}
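In hierarchical radiosity, the push and pull phases make the radiosity values consistent across the levels of the hierarchy: energy gathered at a coarse element is pushed down to its descendants, and the resulting leaf values are pulled back up as area-weighted averages. The sketch below is a recursive version of the classic push-pull step. The slides perform it level by level through `Push_H` and `Pull_H`, whose internals are not shown, so the `Patch` class and the `push_pull` function are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Patch:
    """Element of the scene hierarchy (hypothetical layout)."""
    area: float
    emission: float = 0.0
    gathered: float = 0.0      # radiosity collected over this patch's own links
    radiosity: float = 0.0
    children: list = field(default_factory=list)

def push_pull(patch, pushed_down=0.0):
    """Push gathered radiosity to the leaves, then pull area-weighted averages up."""
    total = patch.gathered + pushed_down
    if not patch.children:
        patch.radiosity = patch.emission + total
    else:
        patch.radiosity = sum(
            push_pull(child, total) * child.area for child in patch.children
        ) / patch.area
    return patch.radiosity
```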

Parallel Code (I)

hierarchical_rad(segment_list *scene) {
    pht_root = PITree_creation(scene, dec_el, incl_el, rem_el)
    PITree_exch_neighbors(pht_root, vis_stencil, all_levels)
    visib_list_det(pht_root)
    while (not end) {
        PITree_exch_neighbors(pht_root, int_list, all_levels)
        Gather_H(pht_root)
        for level from L_min to L_max {
            PITree_exch_neighbors(pht_root, push_stencil, level)
            Push_H(pht_root, level)
        }

Parallel Code (II)

        for level from L_max downto L_min {
            PITree_exch_neighbors(pht_root, pull_stencil, level)
            Pull_H(pht_root, level)
        }
        end = RefineLink_H(pht_root)
        pht_root = PITree_balance(pht_root)
    }
}

Test Scene

• 192 polygons
• 896 segments

Efficiency

[Plot: efficiency (%) versus number of p-nodes (2, 4, 8, 10, 16, 32) for the IBM Beowulf and the PC Beowulf]

Future Work

• the definition of the set of problems that cannot be solved by adopting our methodology
• the definition of programming constructs for the considered class of problems