Invent Episode 3: Tech Talk on Parallel Future

1. Invent Show Tech Talk Series: Parallel Future

2. Parallelism is Here.
12-Aug-10
Invent Show
In the words of Sun Microsystems researcher Guy Steele:
"The bag of programming tricks that has served us so well for the last 50 years is the wrong way to think going forward and must be thrown out."
In the words of Berkeley professor Dave Patterson:
"We desperately need a new approach to hardware and software based on parallelism, since industry has bet its future that parallelism works."
3. The Paradigm Shift: What Caused It?
Moore's Law:
The density of transistors on a chip doubles every 18 months, for the same cost.
It no longer translates into faster single processors:
Power Wall: we have reached a limit in shrinking transistors and raising clock speeds without excessive power and heat
Memory Wall: memory bandwidth is now a bottleneck
ILP Wall: instruction-level parallelism has hit diminishing returns, so the set of problems we can solve with a single processor is not going to get any larger
Solution:
Parallel computing: multicores
Distributed computing: data centers (Google, Facebook, Yahoo)
4. So What is the Difference?
Good sequential code:
Minimizes total number of operations.
Minimizes space usage.
Stresses linear problem decomposition.
Good parallel code:
Performs redundant operations.
Requires extra space.
Requires multiway problem decomposition.
5. Basics
Not all code can be parallelized
Fibonacci recurrence: F_{k+2} = F_k + F_{k+1} (each term depends on the two before it)
But most computations can be parallelized
Large amounts of consistent data to be processed, with no dependencies between items
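The contrast between the two cases can be sketched in a few lines of Python. This is an illustrative sketch only: `fibonacci` and `square_all` are hypothetical names, and squaring simply stands in for any per-item computation with no dependencies.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0, 1."""
    seq = [0, 1]
    for k in range(n - 2):
        # F_{k+2} needs F_k and F_{k+1}, so step k must finish before step k+1.
        seq.append(seq[k] + seq[k + 1])
    return seq[:n]


def square_all(values):
    # Each output depends only on its own input, so the iterations are
    # independent and could be handed to different workers in any order.
    return [v * v for v in values]


print(fibonacci(8))
print(square_all([1, 2, 3, 4]))
```

The loop in `fibonacci`, as written, has to run in order, while the work in `square_all` can be split across as many workers as there are items.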
6. Basic Model: Master/Worker Model (1/2)
Consider a huge array that can be broken into sub-arrays
7. Basic Model: Master/Worker Model (2/2)
MASTER
Initializes the array and splits it up according to the number of WORKERS
Sends each WORKER its subarray
Receives the results from each WORKER
WORKER
Receives the subarray from the MASTER
Performs processing on the subarray
Returns results to MASTER
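The MASTER and WORKER steps above can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch under assumptions: the function names `split`, `worker`, and `master` are hypothetical, the "processing" is an arbitrary sum of squares, and a thread pool stands in for real worker machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split(array, num_workers):
    """MASTER: cut the array into num_workers nearly equal subarrays."""
    size, extra = divmod(len(array), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(array[start:end])
        start = end
    return chunks


def worker(subarray):
    """WORKER: process one subarray (here, a hypothetical sum of squares)."""
    return sum(x * x for x in subarray)


def master(array, num_workers=4):
    chunks = split(array, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Send each worker its subarray and receive the results in order.
        partials = list(pool.map(worker, chunks))
    # Combine the partial results returned by the workers.
    return sum(partials)


print(master(list(range(10))))  # prints 285
```

Threads keep the sketch portable. For CPU-bound work in CPython, `ProcessPoolExecutor` from the same module offers the same `map` interface and runs the workers in separate processes.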
8. MapReduce
Simple data-parallel programming model designed for scalability and fault-tolerance
Pioneered by Google
Google uses it to process about 20 petabytes of data per day
Popularized by the open-source Hadoop project
Used at Yahoo!, Facebook, Amazon, ...
9. What is MapReduce used for? (1/2)
At Google:
Index construction for Google Search
Article clustering for Google News
Statistical machine translation
At Yahoo!:
Web map powering Yahoo! Search
Spam detection for Yahoo! Mail
At Facebook:
Data mining
Ad optimization
Spam detection
10. What is MapReduce used for? (2/2)
In research:
Astronomical image analysis (Washington)
Bioinformatics (Maryland)
Analyzing Wikipedia conflicts (PARC)
Natural language processing (CMU)
Particle physics (Nebraska)
Ocean climate simulation (Washington)
VisionerBOT: our custom Web crawler
11. MapReduce Programming Model
Data type: key-value records
Map function:
(K_in, V_in) → list(K_inter, V_inter)
Reduce function:
(K_inter, list(V_inter)) → list(K_out, V_out)
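The two signatures can be written down as Python type aliases to make the shapes concrete. This is only a sketch: the alias names `MapFn` and `ReduceFn` and the word-length example are hypothetical and do not belong to any MapReduce library.

```python
from typing import Callable, List, Tuple, TypeVar

K_in = TypeVar("K_in")
V_in = TypeVar("V_in")
K_inter = TypeVar("K_inter")
V_inter = TypeVar("V_inter")
K_out = TypeVar("K_out")
V_out = TypeVar("V_out")

# Map:    (K_in, V_in)             -> list of (K_inter, V_inter)
MapFn = Callable[[K_in, V_in], List[Tuple[K_inter, V_inter]]]
# Reduce: (K_inter, list(V_inter)) -> list of (K_out, V_out)
ReduceFn = Callable[[K_inter, List[V_inter]], List[Tuple[K_out, V_out]]]


def map_fn(doc_id: int, text: str) -> List[Tuple[int, str]]:
    # Hypothetical example: key every word by its length.
    return [(len(word), word) for word in text.split()]


def reduce_fn(length: int, words: List[str]) -> List[Tuple[int, int]]:
    # Count how many words of this length were collected.
    return [(length, len(words))]


print(map_fn(0, "how now brown cow"))
print(reduce_fn(3, ["how", "now", "cow"]))
```

Note that the intermediate key type (`int` here) is the only thing the two functions have to agree on. The framework groups map output by that key before calling reduce.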
12. Example: Word Count
def mapper(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def reducer(key, values):
    # Sum all the counts collected for one word.
    yield (key, sum(values))
13. Word Count Execution
Input (three splits):
the quick brown fox
the fox ate the mouse
how now brown cow
Map output:
Map 1: the, 1; quick, 1; brown, 1; fox, 1
Map 2: the, 1; fox, 1; ate, 1; the, 1; mouse, 1
Map 3: how, 1; now, 1; brown, 1; cow, 1
Shuffle & Sort, then Reduce output:
Reduce 1: brown, 2; fox, 2; how, 1; now, 1; the, 3
Reduce 2: ate, 1; cow, 1; mouse, 1; quick, 1
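The Map, Shuffle & Sort, and Reduce stages of this word count can be traced in a single Python process. This is a simplified sketch, not how Hadoop or Google's implementation runs a job: `run_word_count` is a hypothetical driver, and the real frameworks spread these stages across many machines.

```python
from collections import defaultdict


def mapper(line):
    # Emit (word, 1) for every word in one input split.
    for word in line.split():
        yield (word, 1)


def reducer(key, values):
    # Sum all the counts collected for one word.
    yield (key, sum(values))


def run_word_count(lines):
    # Map: run the mapper over every input split.
    intermediate = [pair for line in lines for pair in mapper(line)]
    # Shuffle & sort: group the values by key, then visit keys in order.
    groups = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)
    # Reduce: one reducer call per distinct key.
    result = {}
    for word in sorted(groups):
        for key, total in reducer(word, groups[word]):
            result[key] = total
    return result


splits = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]
print(run_word_count(splits))
```

With the three input splits from the slide, the result gives `the` a count of 3, `brown` and `fox` a count of 2, and every other word a count of 1.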
14. Example: VisionerBot Web Crawler
Database and Multimedia Lab.
15. MapReduce Execution Details
A single master controls job execution on multiple slaves
There can be a hierarchy of masters under the control of a top-level master
Mappers are preferably placed close to their input data and to each other to minimize network delay
Checkpoints are needed so that the job can recover if an operation crashes
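The recovery idea can be illustrated with a small simulation in which the master records each finished task and reassigns a task whose worker crashed. This is a sketch under assumptions: `run_with_recovery`, `healthy_worker`, and `crashing_worker` are hypothetical names, the crash is simulated with an exception, and real systems detect failures through timeouts and heartbeats rather than a `try` block.

```python
def run_with_recovery(tasks, workers, max_attempts=3):
    """Master loop: assign each task to a worker, retry elsewhere on a crash."""
    results = {}
    for task_id, payload in tasks.items():
        for attempt in range(max_attempts):
            # Rotate through the workers so a retry lands on a different one.
            worker = workers[(task_id + attempt) % len(workers)]
            try:
                results[task_id] = worker(payload)
                break  # checkpoint: this task's result is now recorded
            except RuntimeError:
                continue  # the worker crashed; reassign the task
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results


def healthy_worker(chunk):
    return sum(chunk)


def crashing_worker(chunk):
    raise RuntimeError("simulated worker crash")


tasks = {0: [1, 2, 3], 1: [4, 5], 2: [6]}
print(run_with_recovery(tasks, [crashing_worker, healthy_worker]))
```

Because each completed task is stored in `results` before the next one starts, a crash only costs the work of the task that was running, not the whole job.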
16. Questions and Feedback