Apache Hadoop MapReduce: What is it? Why use it? How does it work? Some examples. Big users.
![Page 1: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/1.jpg)
Apache Hadoop MapReduce
- What is it?
- Why use it?
- How does it work?
- Some examples
- Big users
![Page 2: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/2.jpg)
MapReduce – What is it?
- Processing engine of Hadoop
- Developers create Map and Reduce jobs
- Used for big data batch processing
- Parallel processing of huge data volumes
- Fault tolerant
- Scalable
![Page 3: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/3.jpg)
MapReduce – Why use it?
- Your data is in the terabyte / petabyte range
- You have huge I/O
- The Hadoop framework takes care of
  - Job and task management
  - Failures
  - Storage
  - Replication
- You just write Map and Reduce jobs
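The division of labour described above can be sketched in plain Python. This is only a toy stand-in, not Hadoop itself: `run_job`, `max_attempts`, the max-temperature job and its input records are all hypothetical names and data chosen for illustration. The point is that scheduling, retrying and grouping live in the "framework" function, while the job author supplies only a map function and a reduce function.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_job(splits, mapper, reducer, max_attempts=3):
    """Toy 'framework' (hypothetical): runs map tasks in parallel,
    retries failed tasks, groups values by key, then reduces each key."""

    def run_map_task(split):
        # Re-run a failed map task up to max_attempts times.
        for attempt in range(1, max_attempts + 1):
            try:
                return list(mapper(split))
            except Exception:
                if attempt == max_attempts:
                    raise

    # One map task per input split, executed by a thread pool.
    with ThreadPoolExecutor() as pool:
        map_outputs = list(pool.map(run_map_task, splits))

    # Shuffle: collect every value emitted for the same key.
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)

    # Reduce each key independently, in sorted key order.
    return {key: reducer(key, grouped[key]) for key in sorted(grouped)}


# --- Job-specific code: only these two functions are written by the user ---
def max_temp_mapper(split):
    """Emit a (year, temperature) pair for each 'year,temperature' record."""
    for line in split:
        year, temp = line.split(",")
        yield year, int(temp)


def max_temp_reducer(year, temps):
    """Keep the highest temperature seen for the year."""
    return max(temps)


# Made-up input records, already divided into two splits.
splits = [["1949,111", "1950,22"], ["1949,78", "1950,0"]]
result = run_job(splits, max_temp_mapper, max_temp_reducer)
print(result)  # {'1949': 111, '1950': 22}
```

Storage and replication are left out of this sketch; in Hadoop those are handled by HDFS rather than by the job code.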
![Page 4: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/4.jpg)
MapReduce – How does it work?
Take word counting as an example, something that Google does all of the time.
![Page 5: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/5.jpg)
MapReduce – How does it work?
- Input data is split into shards
- Split data is mapped to key,value pairs, e.g. Bear,1
- Mapped data is shuffled/sorted by key, e.g. Bear
- Sorted data is reduced, e.g. Bear,2
- Final data is stored on HDFS
- There might be an extra map layer before the shuffle
- JobTracker controls all tasks in a job
- TaskTracker controls the map and reduce tasks
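The word-count steps above can be traced with a small in-memory sketch. It is a simulation in plain Python, not the Hadoop API: the three input shards are assumed for illustration (the slide's actual input is not in the transcript), and all function names are hypothetical. The `combine` step is one possible reading of the "extra map layer before the shuffle", namely a local pre-aggregation of each mapper's output; that interpretation is an assumption. Writing the result to HDFS is omitted.

```python
from collections import Counter, defaultdict

# Illustrative input (an assumption): three shards of one line each.
shards = ["Deer Bear River", "Car Car River", "Deer Car Bear"]


def map_shard(shard):
    """Map: emit a (word, 1) pair for every word in the shard."""
    return [(word, 1) for word in shard.split()]


def combine(pairs):
    """Optional pre-aggregation of one mapper's output before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def shuffle(mapped):
    """Shuffle/sort: group all counts by word, with keys in sorted order."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_counts(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


mapped = [combine(map_shard(shard)) for shard in shards]
grouped = shuffle(mapped)
totals = reduce_counts(grouped)

print(mapped[1])        # [('Car', 2), ('River', 1)]
print(grouped["Bear"])  # [1, 1]
print(totals)           # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```

Dropping the `combine` call changes only the intermediate pairs (for example `grouped["Car"]` becomes `[1, 1, 1]` instead of `[2, 1]`); the final totals are the same, which is why such a step is safe for sums.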
![Page 6: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/6.jpg)
MapReduce - Some examples
A visual example with colours to show you the cycle: Split -> Map -> Shuffle -> Reduce
![Page 7: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/7.jpg)
MapReduce - Some examples
A visual example of MapReduce with job and task trackers added to individual map and reduce jobs.
![Page 8: Apache Hadoop MapReduce What is it ? Why use it ? How does it work Some examples Big users.](https://reader035.fdocuments.us/reader035/viewer/2022081809/56649e8f5503460f94b94051/html5/thumbnails/8.jpg)
Hadoop MapReduce – Big users
Users
- Facebook
- Yahoo
- Amazon
- eBay