Parallel Patterns Reduce & Scan
Transcript of Parallel Patterns Reduce & Scan
Parallel Patterns - Reduce & Scan 1
PARALLEL PATTERNS: REDUCE & SCAN
6/16/2010
Programming Patterns For Parallelism
• Some patterns repeat in many different contexts
  • e.g. searching for an element in an array
• Identifying such patterns is important
  • Solve a problem once and reuse the solution
  • Split a hard problem into individual problems
  • Helps define interfaces
We Have Already Seen Some Patterns
• Divide and Conquer
  • Split a problem into n subproblems
  • Recursively solve the subproblems
  • Merge the solutions
• Data Parallelism
  • Apply the same function to all elements in a collection or array
  • Parallel.For, Parallel.ForEach
  • Also called “map” in functional programming
Map
• Given a function f: (A) => B
• A collection a: A[]
• Generates a collection b: B[], where b[i] = f( a[i] )
• Parallel.For, Parallel.ForEach
  • Where each loop iteration is independent
[Figure: f applied independently to each element of A, producing the corresponding element of B]
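As a rough illustration of the Map pattern, the sketch below uses Python's `concurrent.futures.ThreadPoolExecutor` rather than the Parallel.For API named on the slide. The names `parallel_map` and `square` are hypothetical and chosen only for this example. In CPython, threads do not speed up CPU-bound work, so the point here is the structure, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_map(f, a, workers=4):
    """Return b with b[i] = f(a[i]); the iterations are independent of each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, whatever the completion order.
        return list(pool.map(f, a))

def square(x):
    return x * x

b = parallel_map(square, [1, 2, 3, 4])
```

Because no iteration reads anything another iteration writes, the calls to `f` can be scheduled in any order without changing `b`.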
Reduce And Scan
• In practice, parallel loops have to work together to generate an answer
• Reduce and Scan patterns capture common cases of processing results of Map
• Note: Map and Reduce are similar to, but not the same as, MapReduce
  • MapReduce is a framework for distributed computing
Reduce
• Given a function f: (A, B) => B
• A collection a: A[]
• An initial value b0: B
• Generate a final value b: B
  • Where b = f(a[n-1], … f(a[1], f(a[0], b0)) )
• We only consider the case where A and B are the same type
[Figure: a chain of f applications along A, starting from b0 and ending in b]
Reduce
    B acc = b_0;
    for( i = 0; i < n; i++ ) {
        acc = f( a[i], acc );
    }
    b = acc;
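The sequential loop translates directly to Python. The sketch below is one possible rendering; `reduce_seq` is a name invented here, and the accumulator is passed as the second argument to match the slide's f(a[i], acc).

```python
def reduce_seq(f, a, b0):
    """Sequential reduce: b = f(a[n-1], ... f(a[1], f(a[0], b0)))."""
    acc = b0
    for x in a:
        acc = f(x, acc)  # element first, accumulator second, as on the slide
    return acc

total = reduce_seq(lambda x, acc: x + acc, [1, 2, 3, 4], 0)

# The argument order is visible when f is not commutative,
# e.g. string concatenation puts later elements in front.
text = reduce_seq(lambda x, acc: x + acc, ["a", "b", "c"], "")
```

With these inputs, `total` is 10 and `text` is `"cba"`.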
Associativity of the Reduce function
• Reduce is parallelizable if f is associative
    f(a, f(b, c)) = f(f(a, b), c)
• E.g. addition: (a + b) + c = a + (b + c)
  • Holds when + is integer addition (with modulo arithmetic)
  • But not when + is floating-point addition, because of rounding
• Other associative functions: max, min, multiply, …
  • Set union, intersection, …
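The difference between integer and floating-point addition can be checked directly. The snippet below is a small illustration, assuming IEEE 754 double precision (which Python's `float` uses on common platforms); the variable names are chosen only for this example.

```python
# Floating-point addition: regrouping changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
floats_differ = left != right

# Integer addition modulo 2**32: regrouping never changes the result.
M = 2**32
x, y, z = 4_000_000_000, 3_000_000_000, 2_000_000_000
ints_agree = ((x + y) % M + z) % M == (x + (y + z) % M) % M
```

Here `floats_differ` and `ints_agree` are both `True`. A parallel floating-point sum may therefore differ in its last bits from the sequential sum, and from run to run if the grouping changes.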
We can use Divide and Conquer
• Reduce(f, A[1..n], b_0) = f( Reduce(f, A[1..n/2], b_0), Reduce(f, A[n/2+1..n], I) )
  where I is the identity element of f
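[Figure: the two halves of A are reduced separately, the first starting from b0 and the second from I, and a final f combines the two results into b]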
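The divide-and-conquer formulation can be sketched as follows. This is a sequential illustration only: the two recursive calls are independent and could be run as separate tasks. The names `reduce_dc` and `reduce_seq` are invented for this example. One detail is an interpretation rather than something the slide states: since the sequential definition applies later elements on the outside, f(a[n-1], …), the sketch passes the result of the later half as the first argument when merging, so that it agrees with the sequential loop even when f is not commutative.

```python
def reduce_seq(f, a, b0):
    acc = b0
    for x in a:
        acc = f(x, acc)
    return acc

def reduce_dc(f, a, b0, identity):
    """Divide-and-conquer reduce; requires f associative with the given identity."""
    if len(a) <= 1:
        return reduce_seq(f, a, b0)
    mid = len(a) // 2
    left = reduce_dc(f, a[:mid], b0, identity)         # first half starts from b0
    right = reduce_dc(f, a[mid:], identity, identity)  # second half starts from I
    # Later elements wrap earlier ones in the sequential definition,
    # so the later half is the first argument of the merging call.
    return f(right, left)

concat = lambda x, acc: x + acc
dc_result = reduce_dc(concat, list("abcd"), "!", "")
seq_result = reduce_seq(concat, list("abcd"), "!")
```

Both calls produce `"dcba!"`, which shows the split preserving the sequential order for a non-commutative f.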
Implementation Optimizations
• Switch to sequential Reduce once a subproblem is down to k elements
• Do k-way splits instead of two-way splits
• Maintain a thread-local accumulated value
  • A task updates the value of the thread it executes in
  • Requires that the reduce function is also commutative
      f(a, b) = f(b, a)
  • Thread-local values are then merged in a separate pass
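A simplified version of these optimizations is sketched below: a k-way split, a sequential reduce per chunk, and a separate merge pass. The function name `reduce_chunked` and its parameters are assumptions made for this example. Unlike the thread-local scheme on the slide, this sketch keeps one partial result per chunk and merges them in chunk order, so it needs only associativity; a thread that accumulates whichever tasks it happens to run sees elements out of order, which is why that scheme also needs commutativity.

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_chunked(f, a, b0, identity, k=4):
    """Split a into about k chunks, reduce each sequentially, then merge the partials."""
    def reduce_chunk(chunk):
        acc = identity
        for x in chunk:
            acc = f(x, acc)
        return acc

    size = max(1, -(-len(a) // k))  # ceiling division gives the chunk length
    chunks = [a[i:i + size] for i in range(0, len(a), size)]
    with ThreadPoolExecutor(max_workers=k) as pool:
        partials = list(pool.map(reduce_chunk, chunks))  # one partial per chunk, in order
    # Separate merge pass over the partial results.
    acc = b0
    for p in partials:
        acc = f(p, acc)
    return acc

total = reduce_chunked(lambda x, acc: x + acc, list(range(100)), 0, 0, k=4)
word = reduce_chunked(lambda x, acc: x + acc, list("abcdefg"), "", "", k=3)
```

With these inputs, `total` is 4950 and `word` is `"gfedcba"`, the same results the sequential loop gives.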
Scan
• Given a function f: (A, B) => B
• A collection a: A[]
• An initial value b0: B
• Generate a collection b: B[]
  • Where b[i] = f(a[i-1], … f(a[1], f(a[0], b0)) ), so b[0] = b0
[Figure: a chain of f applications along A, starting from b0]
Scan
    B acc = b_0;
    for( i = 0; i < n; i++ ) {
        b[i] = acc;
        acc = f( a[i], acc );
    }
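A Python rendering of the sequential Scan is sketched below, following the definition in which b[i] combines a[0] through a[i-1] and therefore b[0] = b0 (an exclusive scan). The name `scan_seq` is invented for this example.

```python
def scan_seq(f, a, b0):
    """Exclusive scan: b[0] = b0 and b[i] = f(a[i-1], b[i-1])."""
    b = []
    acc = b0
    for x in a:
        b.append(acc)    # value accumulated before a[i] is folded in
        acc = f(x, acc)
    return b

prefix_sums = scan_seq(lambda x, acc: x + acc, [3, 1, 4, 1, 5], 0)
```

With + as the function, `prefix_sums` is `[0, 3, 4, 8, 9]`: each entry is the sum of all elements to its left.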
Scan is Efficiently Parallelizable
• When f is associative
• Scan(f, A[1..n], b_0) = Scan(f, A[1..n/2], b_0), Scan(f, A[n/2+1..n], Reduce(f, A[1..n/2], b_0))
[Figure: the chain of f applications along A split into two halves; the starting value of the second half is marked “?”]
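The decomposition can be checked with a small sketch. It runs sequentially and only demonstrates that the two half-scans, joined end to end, reproduce the full scan when the second half is seeded with the Reduce of the first half. The names `scan_split`, `scan_seq` and `reduce_seq` are assumptions for this example, not part of the slides.

```python
def reduce_seq(f, a, b0):
    acc = b0
    for x in a:
        acc = f(x, acc)
    return acc

def scan_seq(f, a, b0):
    b, acc = [], b0
    for x in a:
        b.append(acc)
        acc = f(x, acc)
    return b

def scan_split(f, a, b0):
    """Exclusive scan computed as two half-scans joined end to end."""
    mid = len(a) // 2
    seed = reduce_seq(f, a[:mid], b0)  # starting value for the second half
    return scan_seq(f, a[:mid], b0) + scan_seq(f, a[mid:], seed)

add = lambda x, acc: x + acc
split_result = scan_split(add, [3, 1, 4, 1, 5], 0)
full_result = scan_seq(add, [3, 1, 4, 1, 5], 0)
```

Both results are `[0, 3, 4, 8, 9]`. In this naive form the second half-scan has to wait for `seed`, while the first half-scan and the Reduce are independent of each other.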
Scan is useful in many places
• Radix Sort
• Ray Tracing
• …
Computing Line of Sight
• Given x0, … xn with altitudes alt[0], … alt[n]
• Which of the points are visible from x0?
• angle[i] = arctan( (alt[i] - alt[0]) / i )
• xi is visible from x0 if all points between them have a smaller angle than angle[i]
Solution
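The transcript does not preserve the content of this slide. One well-known approach, given here as an assumption rather than as the slide's actual content, is a Map to compute the angles, an exclusive Scan with max to find the largest angle before each point, and a second Map to compare. The function name `line_of_sight` and the sample altitudes are invented for this sketch, which runs sequentially and assumes unit spacing between points, as the angle formula does.

```python
import math

def line_of_sight(alt):
    """Return the indices i >= 1 of the points visible from x0."""
    n = len(alt)
    # Map: angle of each point as seen from x0.
    angle = [math.atan((alt[i] - alt[0]) / i) for i in range(1, n)]
    # Exclusive max-scan: max_before[j] is the largest angle among earlier points.
    max_before = []
    acc = -math.inf
    for ang in angle:
        max_before.append(acc)
        acc = max(ang, acc)
    # Map: a point is visible if its angle exceeds every earlier angle.
    return [j + 1 for j in range(len(angle)) if angle[j] > max_before[j]]

visible = line_of_sight([0, 1, 1, 4, 2, 10])
```

For these altitudes `visible` is `[1, 3, 5]`. Since max is associative, the scan step is the part that the parallel Scan pattern would replace.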
Radix Sort
Input:                 5 = 101   7 = 111   2 = 010   4 = 100   5 = 101   3 = 011   1 = 001
After pass on bit 0:   2 = 010   4 = 100   5 = 101   7 = 111   5 = 101   3 = 011   1 = 001
After pass on bit 1:   4 = 100   5 = 101   5 = 101   1 = 001   2 = 010   7 = 111   3 = 011
After pass on bit 2:   1 = 001   2 = 010   3 = 011   4 = 100   5 = 101   5 = 101   7 = 111
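The passes in this example can be reproduced with a short sequential sketch: one stable split per bit, starting from the least significant bit. The names `split_by_bit` and `radix_sort` are invented here, and the split is written with plain list comprehensions rather than a parallel primitive.

```python
def split_by_bit(a, bit):
    """Stable partition: elements whose given bit is 0 come before those with 1."""
    zeros = [x for x in a if not (x >> bit) & 1]
    ones = [x for x in a if (x >> bit) & 1]
    return zeros + ones

def radix_sort(a, bits):
    """Sort non-negative integers below 2**bits; also return the array after each pass."""
    passes = []
    for bit in range(bits):  # least significant bit first
        a = split_by_bit(a, bit)
        passes.append(a)
    return a, passes

result, passes = radix_sort([5, 7, 2, 4, 5, 3, 1], bits=3)
```

The three entries of `passes` match the three intermediate rows of the slide example, and `result` is `[1, 2, 3, 4, 5, 5, 7]`. Stability of each split is what lets later passes keep the order established by earlier ones.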
Basic Primitive: Pack
• Given an array A and an array F of flags
  • A = [5 7 2 4 5 3 1]
  • F = [1 1 0 0 1 1 1]
• Pack all elements with flag = 0 before elements with flag = 1
  • A’ = [2 4 5 7 5 3 1]
Solution
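The content of this slide is not preserved in the transcript. A common way to build Pack from Scan, offered here as an assumption and not as the slide's actual solution, is to compute each element's destination with exclusive prefix sums over the flags. The names `exclusive_sum_scan` and `pack` are invented for this sketch, which runs sequentially.

```python
def exclusive_sum_scan(v):
    out, acc = [], 0
    for x in v:
        out.append(acc)
        acc += x
    return out

def pack(a, flags):
    """Stable pack: elements with flag 0 first, then elements with flag 1."""
    is_zero = [1 - f for f in flags]
    zero_pos = exclusive_sum_scan(is_zero)  # destination among the flag-0 elements
    one_pos = exclusive_sum_scan(flags)     # destination among the flag-1 elements
    num_zeros = sum(is_zero)                # a Reduce with +
    out = [None] * len(a)
    for i, x in enumerate(a):               # every iteration writes a distinct slot
        dest = zero_pos[i] if flags[i] == 0 else num_zeros + one_pos[i]
        out[dest] = x
    return out

packed = pack([5, 7, 2, 4, 5, 3, 1], [1, 1, 0, 0, 1, 1, 1])
```

For the slide's A and F, `packed` is `[2, 4, 5, 7, 5, 3, 1]`. Once the destinations are known, the final loop is a Map with independent iterations, so the only step with cross-element dependences is the scan.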
Other Applications of Scan
• Radix Sort
• Computing Line of Sight
• Adding multi-precision numbers
• Quick Sort
• To search for regular expressions
  • Parallel grep
• …
High Level Points
• Minimize dependences between parallel loops
  • Unintended dependences = data races
  • Next lecture
• Carefully analyze remaining dependences
• Use Reduce and Scan patterns where applicable