Dynamic Load Balancing Tree and Structured Computations (CS433, Laxmikant Kale, Spring 2001)


Page 1:

Dynamic Load Balancing Tree and Structured Computations

CS433

Laxmikant Kale

Spring 2001

Page 2:

When to send work away:

• Consider a processor with k units of work, and P other processors

– Assume that a message takes 100 microseconds to reach another processor:

• 20 microseconds: send-side processor overhead

• 60 microseconds: network latency

• 20 microseconds: receive-side processor overhead

– If each task takes t units of time to complete, under what conditions should the processor send tasks out to others (vs. doing them itself)?

– E.g. if t=100 microseconds? 50? 1000?

Key observation: the “master” spends only 40 microseconds of its own time on coordination per task, even though the round-trip latency is 200 microseconds (see the worked numbers below)
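
A quick worked check of the t = 100/50/1000 question, under the assumption that the 40 microseconds of coordination (20 to send the task, 20 to receive the result) is time the master cannot spend computing, while doing a task itself costs the master t microseconds:

– t = 50: farming a task out saves only 10 microseconds of master time, and the master can keep at most about 50/40 ≈ 1.25 workers busy, so sending work away barely helps.

– t = 100: the master can keep roughly 100/40 = 2.5 workers busy; offloading helps, but the speedup saturates quickly as P grows.

– t = 1000: the master can feed about 1000/40 = 25 workers before becoming the bottleneck, so offloading clearly pays.

The 200-microsecond round trip does not limit this throughput, but it does suggest giving each worker a couple of tasks in advance so it is not idle while results and new work are in flight.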

Page 3:

Tree structured computations

• Examples:

– Divide-and-conquer

– State-space search

– Game-tree search

– Bidirectional search

– Branch-and-bound

• Issues:

– Grainsize control

– Dynamic Load Balancing

– Prioritization

Page 4:

Divide and Conquer

• Simplest situation among the above

– Given a problem, a recursive algorithm divides it into one, two, or more subproblems, and the solutions to the subproblems are composed to create a solution

– Example: adaptive quadrature

– Consider a simpler setting:

• fib(n) = fib(n-1) + fib(n-2)

• Note: the Fibonacci algorithm itself is not important here

– Issues:

• Subtrees are of unequal size, so work cannot be assigned a priori

• Firing every recursive call as a parallel task is too fine-grained (see the sketch below)
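
As a concrete illustration (not from the slides), here is a minimal C++ sketch of the fib example with a simple grainsize cutoff: below an illustratively chosen threshold the computation runs sequentially instead of spawning new tasks, which is the essence of the cutoff strategy on the next slides.

#include <future>

// Sequential evaluation below the cutoff: avoids creating tasks whose
// work is smaller than the cost of creating and scheduling them.
long fib_seq(int n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Parallel divide-and-conquer; 'cutoff' controls the grainsize.
long fib_par(int n, int cutoff) {
    if (n < cutoff) return fib_seq(n);
    auto left = std::async(std::launch::async, fib_par, n - 1, cutoff);
    long right = fib_par(n - 2, cutoff);
    return left.get() + right;
}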

Page 5:

Dynamic load balancing formulation:

• Each PE is creating work randomly

• How to redistribute work?

• Initial allocation

• Rebalancing

• Centralized vs. distributed

Page 6:

Reading Assignment

• Adaptive grainsize control:

– http://charm.cs.uiuc.edu, go to publications, 95-05

• Prioritization and first-solution search:

– http://charm.cs.uiuc.edu, go to publications, 93-06

• Dynamic load balancing for tree-structured computations:

– Vipin Kumar’s papers (link to be added shortly)

– http://charm.cs.uiuc.edu, go to publications, 93-13

• A few more papers will be posted soon.

Page 7:

Adaptive grainsize control:

• Strategy 1: cutoff depth

– (but this requires an estimate of the size of the subtree)

• Strategy 2: stack splitting

– Each PE maintains a stack of nodes of the tree

– If my stack is empty, “steal” half the stack of someone else

– Which part of the stack? Top? Bottom? (see the sketch below)
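
A minimal sketch of the stack-splitting idea, assuming a shared-memory setting with one lock-protected work stack per PE; the data structures and the choice to steal from the bottom are illustrative, not a prescribed implementation.

#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct Node { /* a tree node / unexplored subproblem */ };

struct WorkStack {
    std::deque<Node> nodes;   // the owner pushes and pops at the back
    std::mutex lock;
};

// Steal roughly half of a victim's stack. Taking nodes from the front
// (the bottom of the stack) hands over subtrees closer to the root,
// which tend to be larger, so the thief stays busy longer.
std::vector<Node> stealHalf(WorkStack& victim) {
    std::lock_guard<std::mutex> guard(victim.lock);
    std::size_t half = victim.nodes.size() / 2;
    std::vector<Node> stolen(victim.nodes.begin(), victim.nodes.begin() + half);
    victim.nodes.erase(victim.nodes.begin(), victim.nodes.begin() + half);
    return stolen;
}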

Page 8:

Adaptive grainsize control:

• Strategy 3:

– Objects (tree nodes) decide whether to make their children available to other processors by calling a function in the runtime

– The runtime monitors the size of its queue (stack), and possibly the sizes of other processors’ queues

Page 9:

Adaptive grainsize control:

• Strategy 3: objects decide how big they want to grow

– Monitor execution time (number of tree nodes evaluated)

– If the number is above a threshold:

• Fire some of my nodes as independent objects to be mapped somewhere else

– Problem: you sometimes get a “mother” object that just keeps firing off lots of smaller objects

• Solution: once above the threshold, split the rest of the work into two objects and fire them off (see the sketch below)
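
A rough sketch of this strategy; the threshold value is illustrative, and fireAsNewObject() is a hypothetical stand-in for whatever call the underlying runtime provides for creating a new migratable object, not an actual API.

#include <cstddef>
#include <vector>

struct Node { /* an unexplored tree node */ };

// Hypothetical runtime call: packages the given nodes as a new object
// that the load balancer may place on any processor.
void fireAsNewObject(std::vector<Node> work);

struct SearchObject {
    std::vector<Node> pending;            // unexplored children held by this object
    int nodesEvaluated = 0;
    static constexpr int THRESHOLD = 1000;   // illustrative value

    void step() {
        // ... evaluate one node, pushing its children onto 'pending' ...
        ++nodesEvaluated;
        if (nodesEvaluated > THRESHOLD && pending.size() > 1) {
            // Avoid a long-lived "mother" object that keeps firing tiny objects:
            // split all remaining work into two new objects and stop growing.
            std::size_t half = pending.size() / 2;
            fireAsNewObject(std::vector<Node>(pending.begin(), pending.begin() + half));
            fireAsNewObject(std::vector<Node>(pending.begin() + half, pending.end()));
            pending.clear();
        }
    }
};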

Page 10:

Dynamic load balancing

• Centralized:

– maintain the top levels of the tree on one processor

– serve requests for work on demand

• Variation:

– hierarchical

Page 11:

Fully Distributed strategies

• Keep track of neighbors

• Diffusion/Gradient model

• Neighborhood averaging

• What topology to use?

– The machine’s own topology

– Hypercube

– Something denser?

Page 12:

Gradient model

• Misnomer: the name is too broad

• Actual strategy:

– Processors are arranged in a topology

• (may be virtual, though the original purpose was to use the real one)

– Each processor tries to maintain an estimate of how far it is from an idle processor

– Idle processors have a distance of 0

– Other processors periodically send their numbers to their neighbors

• My distance = 1 + min(neighbors’ distances)

– If my distance is greater than a neighbor’s, send some work to it (see the sketch below)

• Work will “flow” towards idle processor
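
A minimal sketch of the distance update and the forwarding test on one processor, assuming distance estimates are exchanged with neighbors periodically (the messaging itself is omitted; names are illustrative):

#include <algorithm>
#include <limits>
#include <vector>

struct GradientState {
    bool idle = false;
    int distanceToIdle = std::numeric_limits<int>::max() / 2;  // "very far" initially
    std::vector<int> neighborDistances;   // latest estimates received from neighbors

    // Called periodically; the updated value is then sent to all neighbors.
    void updateDistance() {
        if (idle) { distanceToIdle = 0; return; }
        if (neighborDistances.empty()) return;
        int nearest = *std::min_element(neighborDistances.begin(),
                                        neighborDistances.end());
        distanceToIdle = 1 + nearest;      // my distance = 1 + min(neighbors' distances)
    }

    // If a neighbor is closer to an idle processor than I am, work
    // sent to it will "flow" toward that idle processor.
    bool shouldSendWorkTo(int neighborDistance) const {
        return distanceToIdle > neighborDistance;
    }
};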

Page 13:

Neighborhood averaging

• Assume a virtual topology

• Periodically send my own load (queue size) to my neighbors

• Each processor:

– Calculate the average load of its neighborhood

– If I am above average, send pieces of work to underloaded neighbors so as to equalize them (see the sketch below)

• Estimate of work:

– Assume the same cost for each unit

– Use better estimate if known
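
A rough sketch of the rebalancing step on one processor, assuming (as the slide does) that every unit of work has the same estimated cost, so load is simply queue length; the function name and bookkeeping are illustrative.

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// myLoad: this PE's queue length; neighborLoads: latest loads heard from
// neighbors. Returns how many units to send to each neighbor.
std::vector<int> planTransfers(int myLoad, const std::vector<int>& neighborLoads) {
    double avg = (myLoad + std::accumulate(neighborLoads.begin(),
                                           neighborLoads.end(), 0))
                 / double(neighborLoads.size() + 1);
    std::vector<int> sendTo(neighborLoads.size(), 0);
    if (myLoad <= avg) return sendTo;               // only overloaded PEs push work
    int excess = int(myLoad - avg);
    for (std::size_t i = 0; i < neighborLoads.size() && excess > 0; ++i) {
        int deficit = int(avg) - neighborLoads[i];  // how far below average this neighbor is
        if (deficit > 0) {
            int give = std::min(deficit, excess);
            sendTo[i] = give;
            excess -= give;
        }
    }
    return sendTo;
}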

Page 14:

Randomized strategies

• Random initial assignment:– As work is created, assign it to a PE

– Problems: no way to correct errors

– Each message goes across processor: communication overhead

• Random demand:– If I am idle, ask a randomly selected processor for work

– If I get a demand, send half of my nodes to the requestor

– Good theoretical properties

– In practice: somewhat high overhead
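
A small sketch of the idle-side loop for random demand; sendRequest and receiveWork are hypothetical placeholders for the messaging layer, not a real API.

#include <random>

void sendRequest(int pe);   // hypothetical: ask processor 'pe' for work
bool receiveWork();         // hypothetical: true if the victim sent back half its nodes

// When idle, keep asking randomly selected processors until some work arrives.
void stealUntilBusy(int myPe, int numPes, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, numPes - 1);
    bool gotWork = false;
    while (!gotWork) {
        int victim = pick(rng);
        if (victim == myPe) continue;   // don't ask myself
        sendRequest(victim);
        gotWork = receiveWork();
    }
}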

Page 15:

Using Global Average

• Carry out a periodic global averaging to determine the average load over all processors

• If I am above average:

– send work “away”

– Alternatively, get a vector of overload via global averaging, and figure out whom to send what work

Page 16:

Using Global Loads

• Idea:

– For even a moderately large number of processors, collecting a vector of the load on each PE is not much more expensive than collecting just the total (the per-message cost dominates)

– How can we use this vector without creating a serial bottleneck?

– Each processor knows whether it is overloaded compared with the average

• It also knows which PEs are underloaded

• But we need an algorithm that allows each processor to decide whom to send work to without global coordination, beyond obtaining the vector

– Insight: everyone has the same vector

– Also, an assumption: there are sufficiently many fine-grained pieces of work

Page 17:

Global vector scheme (contd.)

• Global algorithm: if we were able to make the decision centrally:

Receiver = nextUnderLoaded(0);

for (I = 0; I < P; I++) {

    if (load[I] > average) {

        assign excess work to Receiver, advancing Receiver to the next underloaded PE as needed;

    }

}

To make this a distributed algorithm, run the same algorithm on every processor, but ignore any reassignment that doesn’t involve me (a sketch follows).
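
A sketch of the distributed version, assuming every PE already holds the identical load vector (e.g. from an all-gather); the names and the integer bookkeeping are illustrative. Because the pass below is deterministic and every PE sees the same input, all PEs compute the same set of transfers, and each one simply acts on the transfers that name it as sender or receiver.

#include <algorithm>
#include <vector>

struct Transfer { int from, to, amount; };

std::vector<Transfer> myTransfers(const std::vector<int>& load, int myPe) {
    int P = int(load.size());
    long total = 0;
    for (int l : load) total += l;
    int average = int(total / P);

    std::vector<int> deficit(P);
    for (int i = 0; i < P; ++i)
        deficit[i] = load[i] < average ? average - load[i] : 0;

    std::vector<Transfer> mine;
    int receiver = 0;                         // next underloaded PE to fill
    for (int i = 0; i < P; ++i) {
        int excess = load[i] - average;
        while (excess > 0 && receiver < P) {
            if (deficit[receiver] == 0) { ++receiver; continue; }
            int amount = std::min(excess, deficit[receiver]);
            excess -= amount;
            deficit[receiver] -= amount;
            if (i == myPe || receiver == myPe)    // ignore transfers that don't involve me
                mine.push_back({i, receiver, amount});
        }
    }
    return mine;
}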

Page 18:

Tree structured computations

• Examples:

– Divide-and-conquer

– State-space search

– Game-tree search

– Bidirectional search

– Branch-and-bound

• Issues:

– Grainsize control

– Dynamic Load Balancing

– Prioritization

Page 19:

State Space Search

• Definition:

– start state, operators, goal state (implicit/explicit)

– Either search for a goal state or for a path leading to one

• If we are looking for all solutions:

– same as divide-and-conquer, except there is no backward communication

• Searching for any one solution:

– Use the same algorithm as above?

– Problem: speedups are inconsistent and not monotonically increasing

Page 20:

State Space Search

• Using priorities: bitvector priorities

– Let the root have priority 0

– Priority of a child: the parent’s priority followed by my rank (see the sketch below)

[Figure: a tree node with priority p whose children receive priorities p01, p02, p03]
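
A small sketch of one way to represent bitvector priorities: a priority is the sequence of child ranks along the path from the root, and lexicographic comparison makes nodes nearer the root and further to the left higher priority. The representation is illustrative.

#include <vector>

// The root has the empty sequence; a child's priority is its parent's
// priority with the child's rank appended.
struct Priority {
    std::vector<unsigned char> ranks;

    Priority child(unsigned char rank) const {
        Priority p = *this;
        p.ranks.push_back(rank);
        return p;
    }
};

// Lexicographically smaller means higher priority, which directs the
// search toward the leftmost parts of the tree (next slide).
bool higherPriority(const Priority& a, const Priority& b) {
    return a.ranks < b.ranks;   // std::vector compares lexicographically
}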

Page 21:

Effect of Prioritization

• Let us consider shared-memory machines for simplicity:

– Search is directed to the left part of the tree

– Memory usage: let B be the branching factor of the tree and D its depth (concrete numbers below):

• O(D*B + P) nodes in the queue at a time

• With per-processor stacks: O(D*P*B)

– Consistent and monotonic speedups
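
To make the two bounds concrete (numbers chosen purely for illustration): with B = 10, D = 20, and P = 100, the prioritized scheme keeps on the order of D*B + P = 300 nodes queued at a time, while per-processor stacks keep on the order of D*P*B = 20,000.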

[Figure: snapshots of the search tree under Ideal, Stack-stealing, and Prioritized schemes, with nodes marked done, active, or unexplored]

Page 22:

Need prioritized load balancing

• What about non-shared-memory machines?

• Centralized solution:

– Memory bottleneck too!

• Fully distributed solutions:

• Hierarchical solution:

– Token idea

Page 23:

Bidirectional Search

• Goal state is explicitly known and operators can be inverted

– Sequential:

– Parallel?

Page 24:

Game tree search

• Tricky problem:

• alpha-beta, negamax