Problem-solving on large-scale clusters: theory and applications

Post on 19-Mar-2016


Lecture 1: Introduction and Theoretical Background

Today’s Outline
• Introductions
• Quiz
• Course Objective & Administrative Info
• fold and map: Theory

Introductions
• Name + trivia

Quiz Time!
• Not graded; helps us calibrate how difficult to make this seminar
• Okay (and encouraged!) to leave questions blank

Course Outline
• Introduction to parallel programming and distributed system design
– successfully decompose problems into map and reduce stages
– decide whether a problem can be solved with a parallel algorithm, and evaluate its strengths and weaknesses
– understand the basic tradeoffs and major issues in distributed system design
– know the common pitfalls of distributed system design
• This seminar is light on “facts” and “recipes”, heavy on “tradeoffs”

Course Information (1 of 2)
• Lecturers:
– Albert J. Wong
– Hannah Tang
• Lab consultant:
– Alden King
• Liaisons:
– John Zahorjan
– Christophe Bisciglia

Course Information (2 of 2)
• Textbook:
– None; see online course readings
• Webpage:
– http://www.cs.washington.edu/cse490h
• Mailing lists:
– Course discussion: cse490h@...

Warning: Theory Ahead!
• Before we can talk about MapReduce, we need to talk about the concepts on which it is founded:
– Programming languages: fold and map
– Distributed systems: data dependencies

Digression: Function Objects (1 of 3)

• A function object is a function that can be manipulated as an object
– Sometimes referred to as a “functor”

• In Java, this is usually implemented with a class that has an execute() (or similarly named) method

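Digression: Function Objects (2 of 3)
• Example: implementing the Comparator interface to use Collections.sort()

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Function object: wraps the comparison so it can be handed to sort()
class ReverseAlphaOrder implements Comparator<String> {
  @Override
  public int compare(String s1, String s2) {
    return s2.compareTo(s1);  // swapped operands reverse the alphabetical order
  }
}

List<String> myStrings = Arrays.asList("fold", "map", "reduce");
ReverseAlphaOrder rao = new ReverseAlphaOrder();
Collections.sort(myStrings, rao);  // myStrings is now [reduce, map, fold]

The underlying idea is to pass the “greater than” operation to sort()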
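For comparison, here is a minimal sketch of the same idea in a language where functions can be passed directly, so no wrapper class is needed. It assumes Haskell's standard Data.List.sortBy; the sample strings are made up for illustration.

```haskell
import Data.List (sortBy)

main :: IO ()
main = do
  -- Hypothetical sample data, not taken from the slides.
  let myStrings = ["fold", "map", "reduce"]
  -- flip compare reverses the usual ordering, playing the role of
  -- ReverseAlphaOrder: the comparison itself is the argument to the sort.
  print (sortBy (flip compare) myStrings)  -- ["reduce","map","fold"]
```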

Digression: Function Objects (3 of 3)
• In Java, methods that take function objects are “higher-order functions”
– Collections.sort() is a higher-order function
• Mathematically, a “higher-order function” is a function which does at least one of the following:
– Takes one or more functions as input
– Outputs a function
• Examples:
– The derivative (from calculus): d/dx (x^3 + 2x) = 3x^2 + 2
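The derivative example can be sketched in code as a function that both takes a function and returns one. This is only a numerical approximation (a central difference), and the names derivative and p and the step size h are assumptions made for this sketch.

```haskell
-- Numerical derivative: takes a function and returns a new function.
-- The step size h is an arbitrary choice for this sketch.
derivative :: (Double -> Double) -> (Double -> Double)
derivative f = \x -> (f (x + h) - f (x - h)) / (2 * h)
  where h = 1e-5

-- The polynomial from the slide: x^3 + 2x
p :: Double -> Double
p x = x ^ 3 + 2 * x

main :: IO ()
main = do
  let p' = derivative p
  -- The exact derivative 3x^2 + 2 is 14 at x = 2; the printed value
  -- should be close to that, up to floating-point error.
  print (p' 2)
```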

fold - Introduction
• fold is a family of higher-order functions that process a data structure and return a single value
– Commonly, fold takes a function f and a list l, and recursively applies f to “combine” the elements of l
– The return value may be “complex”, e.g. a list
• Example:
– fold (+) [1,2,4,8] -> ???
– fold (/) [64,8,4,2] -> ???

fold - Directionality
• Remember how we said fold was “a family of functions”?
– foldr (/) [64,8,4,2] -> 64 / (8 / (4/2)) -> 16
– foldl (/) [64,8,4,2] -> ((64/8) / 4) / 2 -> 1

• “fold right” – recursively applies f over the right side of the list

• “fold left” – recursively applies f over the left side of the list

[Figure: expression trees for the right fold and the left fold of (/) over [64,8,4,2]]
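The two directions can be checked with a short sketch. It assumes the Prelude functions foldr1 and foldl1, which take no initial value and so match the two-argument form written on the slide.

```haskell
main :: IO ()
main = do
  -- Division is not associative, so the direction changes the result.
  print (foldr1 (/) [64, 8, 4, 2])  -- 64 / (8 / (4 / 2)) = 16.0
  print (foldl1 (/) [64, 8, 4, 2])  -- ((64 / 8) / 4) / 2 = 1.0
  -- Addition is associative, so both directions agree.
  print (foldr1 (+) [1, 2, 4, 8])   -- 15
  print (foldl1 (+) [1, 2, 4, 8])   -- 15
```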

fold - Questions
• Discussion questions:
– What should the base case return?
  • foldr (+) [] -> ???
  • foldr (/) [] -> ???
– Can a right fold be implemented as a loop (using tail recursion)? What about left fold?
• Enrichment questions:
– What happens to a right fold when given an infinite list? What about left fold?

fold - Formal Definition
• fold takes a function and a list as its inputs, but it can also take more values.
– In particular, fold maintains context / state across each invocation of f

-- If the list is empty, return the initial value ‘z’
foldr f z [] = z
-- If the list is not empty, calculate the result of folding the
-- rest, and apply f to the first element and to that result.
-- The context from previous invocations of f is implicitly
-- passed to the current invocation of f via foldr
foldr f z (x:xs) = f x (foldr f z xs)

What is the formal definition of foldl?
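The foldr definition above can be run as written. In this sketch it is renamed myFoldr (an assumed name) so it does not clash with the Prelude's foldr, and the initial values 0 and 1 are choices made here for the two example operators. The foldl question is left open.

```haskell
-- The slide's definition, renamed to avoid clashing with the Prelude.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = f x (myFoldr f z xs)

main :: IO ()
main = do
  print (myFoldr (+) 0 [1, 2, 4, 8])   -- 1 + (2 + (4 + (8 + 0))) = 15
  print (myFoldr (/) 1 [64, 8, 4, 2])  -- 64 / (8 / (4 / (2 / 1))) = 16.0
```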

fold – An Intuition
• fold “iterates” over a data structure, and maintains one unit of state
– At each iteration, f is invoked with the current element and the current state
– fold’s return value is the result of f’s final invocation
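One way to see the “one unit of state” is to print it after every step. The sketch below assumes the Prelude's scanl, which behaves like foldl but keeps each intermediate state instead of only the last.

```haskell
main :: IO ()
main = do
  -- Every intermediate state of a left fold with (+), starting from 0.
  print (scanl (+) 0 [1, 2, 4, 8])  -- [0,1,3,7,15]
  -- The fold's return value is the final state.
  print (foldl (+) 0 [1, 2, 4, 8])  -- 15
```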

map - Introduction
• map is a higher-order function that “transforms” each element in a sequence of elements
– Commonly, map takes a function f and a sequence s, and applies f to each element of s
• Example:
– map square_root [1,4,9,16] -> ???

map’s Return Value
• map returns a sequence
– The new sequence s’ is not necessarily the same size as s
– The elements of s’ do not necessarily have the same type as the elements of s
• Recall that the sum of N vectors is equal to the sum of their components
• Let components() decompose a vector into its X and Y components

map’s Return Value – Example
[Figure: map components applied to a list of vectors a and b; the slide asks whether the result is a list of (X, Y) component pairs or a flat list of components]
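Both candidate results can be produced in code. This is a sketch under stated assumptions: the Vector type and the componentList helper are hypothetical, since the slides do not define them. With the list definition of map, the output has one element per input and only the element type changes. A flattening variant such as the Prelude's concatMap is one way the output size can differ from the input size.

```haskell
-- Hypothetical 2-D vector type; the slides do not define one.
data Vector = Vector Double Double

-- Decompose a vector into its (X, Y) components.
components :: Vector -> (Double, Double)
components (Vector x y) = (x, y)

-- Variant that returns the components as a list, for use with concatMap.
componentList :: Vector -> [Double]
componentList (Vector x y) = [x, y]

main :: IO ()
main = do
  let vs = [Vector 1 2, Vector 3 4]
  -- Plain map: one output element per input element, but of a new type.
  print (map components vs)           -- [(1.0,2.0),(3.0,4.0)]
  -- concatMap maps and then flattens, so the length changes from 2 to 4.
  print (concatMap componentList vs)  -- [1.0,2.0,3.0,4.0]
```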

map - Questions
• Enrichment questions:
– For what values of f and z will fold f z l = l? How can you modify f such that fold f z l = map f l?
– Bonus question: can you implement map in terms of fold?
– Visit foldl.com and foldr.com :)

map – Formal definition
• map takes a function and a data structure as its inputs

-- If the list is empty, there’s nothing to do
map f [] = []
-- If the list is not empty, apply f to the first element and
-- add the result to the mapping of f on all other elements
map f (x:xs) = f x : map f xs

What is the complexity of map? What is its runtime?
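The definition above can be tried out directly. In this sketch it is renamed myMap (an assumed name) to avoid clashing with the Prelude's map, and the Prelude's sqrt stands in for the slide's square_root.

```haskell
-- The slide's definition, renamed to avoid clashing with the Prelude.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

main :: IO ()
main = do
  -- Same element type in and out.
  print (myMap sqrt [1, 4, 9, 16])     -- [1.0,2.0,3.0,4.0]
  -- The element type can change: Int in, String out.
  print (myMap show [1, 2, 3 :: Int])  -- ["1","2","3"]
```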

Exercise (1 of 2)
• Individually:
– Determine how these operations can be solved with a fold, a map, or some combination of fold and map:
  • Given a list of vectors, add them to determine the resultant vector.
  • Ray tracing a single ray
    – Ray tracing takes a list of rays that intersect the camera, and traces their paths back to their respective light sources, even across their reflections off several surfaces
  • Assuming you had access to a company’s monthly paystubs for all employees for an entire year, calculate how much annual income tax is owed per person.
  • Run-length encoding
    – Run-length encoding takes a possibly repetitive string and rewrites it as a sequence of (value, frequency) pairs, e.g. “aaa b ccccc dd” -> “a3 b c5 d2”.
  • Find the smallest element in an array
– Come up with some challenging problems yourself!

Exercise (2 of 2)
• In small groups, compare your answers to the above, and stump your team with the problems you came up with!