Apache Hadoop MapReduce: What is it? Why use it? How does it work? Some examples. Big users.


Apache Hadoop MapReduce

- What is it?
- Why use it?
- How does it work?
- Some examples
- Big users


MapReduce – What is it?

- Processing engine of Hadoop
- Developers create Map and Reduce jobs
- Used for big data batch processing
- Parallel processing of huge data volumes
- Fault tolerant
- Scalable


MapReduce – Why use it?

- Your data is in the terabyte / petabyte range
- You have huge I/O
- The Hadoop framework takes care of:
  - Job and task management
  - Failures
  - Storage
  - Replication
- You just write Map and Reduce jobs


MapReduce – How does it work?

Take word counting as an example, something that Google does all of the time.


MapReduce – How does it work?

- Input data is split into shards
- Each split is mapped to key,value pairs, e.g. Bear,1
- Mapped data is shuffled/sorted by key, e.g. Bear
- Sorted data is reduced, e.g. Bear,2
- Final data is stored on HDFS
- There might be an extra map layer before the shuffle
- JobTracker controls all tasks in a job
- TaskTracker controls map and reduce tasks
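The split, map, shuffle and reduce steps can be sketched in plain Python for the word-counting case. This is only an in-memory illustration of the data flow, not Hadoop code: it uses no Hadoop API, runs in a single process, and leaves out HDFS, the JobTracker and the TaskTrackers. The function names and the sample input are assumptions made up for the sketch.

```python
from collections import defaultdict


def split_input(text, num_shards):
    """Split: divide the input lines into shards, one per (simulated) map task."""
    lines = text.splitlines()
    return [lines[i::num_shards] for i in range(num_shards)]


def map_shard(lines):
    """Map: emit a (word, 1) pair for every word in the shard, e.g. ("Bear", 1)."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_shards):
    """Shuffle/sort: group the values from all shards by key, keys in sorted order."""
    groups = defaultdict(list)
    for pairs in mapped_shards:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_groups(groups):
    """Reduce: sum the values for each key, e.g. ("Bear", [1, 1]) becomes ("Bear", 2)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text, num_shards=3):
    """Run the whole cycle: Split -> Map -> Shuffle -> Reduce."""
    shards = split_input(text, num_shards)
    mapped = [map_shard(shard) for shard in shards]  # Hadoop would run these in parallel
    grouped = shuffle(mapped)
    return reduce_groups(grouped)


if __name__ == "__main__":
    # Hypothetical sample input, three lines so each shard gets one line.
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))
    # prints {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

In a real job the developer would write only the equivalents of map_shard and reduce_groups; the framework would handle splitting, shuffling, scheduling and storage.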


MapReduce – Some examples

A visual example with colours to show you the cycle: Split -> Map -> Shuffle -> Reduce


MapReduce – Some examples

A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.


Hadoop MapReduce – Big users

Users:

- Facebook
- Yahoo
- Amazon
- eBay