Hadoop and MapReduce

HADOOP Framework and Applications

Prepared by: TEAM HADOOP slide1/22

CONTENTS

WHY HADOOP?

INTRODUCTION TO MapReduce

WHAT?

“... to create building blocks for programmers who just happen to have lots of data to store, lots of data to analyze, or lots of machines to coordinate, and who don’t have the time, the skill, or the inclination to become distributed systems experts to build the infrastructure to handle it.” - Tom White

Source: Hadoop: The Definitive Guide

WHAT?

Hadoop contains many subprojects:

Hadoop Common

Chukwa

HBase

ZooKeeper

Pig

Hive

MapReduce

We will focus on MapReduce

WHO & WHEN?

Pre-2004: Cutting and Cafarella develop open source projects for web-scale indexing, crawling and search.

WHO & WHEN?

2004: Jeffrey Dean and Sanjay Ghemawat introduce the MapReduce model used internally at Google.

WHO & WHEN?

2006: Hadoop becomes an official Apache project. Cutting joins Yahoo!, and Yahoo! adopts Hadoop.

TRENDS

WHO USES IT?

Roughly how long does it take to read 1 TB from a commodity hard disk?

Around 4 hours.

WITH HADOOP: 62 seconds.
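The gap between those two figures can be checked with back-of-the-envelope arithmetic. The sketch below assumes a sustained sequential throughput of about 70 MB/s per commodity disk; that number is an illustrative assumption, not a figure from the slides. It also assumes ideal striping across disks with no coordination overhead, so it shows only the order of magnitude, not how any particular 62-second result was obtained.

```python
# Back-of-the-envelope read times for 1 TB.
# ASSUMPTION: ~70 MB/s sustained sequential throughput per commodity disk
# (illustrative value; the slides do not state a throughput).
TERABYTE_BYTES = 10**12
DISK_BYTES_PER_SEC = 70 * 10**6


def read_seconds(total_bytes, disks=1, bytes_per_sec=DISK_BYTES_PER_SEC):
    """Ideal time to read total_bytes striped evenly across `disks` drives."""
    return total_bytes / (disks * bytes_per_sec)


# One disk reading the whole terabyte sequentially, in hours.
single_disk_hours = read_seconds(TERABYTE_BYTES) / 3600

# Number of disks that would have to read in parallel to finish in 62 s.
disks_for_62s = TERABYTE_BYTES / (62 * DISK_BYTES_PER_SEC)

print(f"one disk: {single_disk_hours:.2f} hours")
print(f"disks needed for 62 s: {disks_for_62s:.0f}")
```

Under these assumptions a single disk needs just under 4 hours, while a couple of hundred disks reading in parallel bring the ideal time down to about a minute.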

INTRODUCTION TO MapReduce

"Break large problem into smaller parts, solve in parallel, combine results."

Typical scenario

How many times does the word ‘IT’ appear? In a short text you would probably just count, but in a 30,000-page document, could you?
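The "break into parts, solve in parallel, combine" idea can be sketched for this scenario as follows. This is a minimal single-machine illustration, not Hadoop code: the names `count_word` and `count_in_parallel` and the three sample pages are hypothetical, and a thread pool stands in for a cluster of machines (in CPython, threads show the structure rather than a real CPU speedup).

```python
import re
from concurrent.futures import ThreadPoolExecutor


def count_word(page, word="IT"):
    """Count whole-word, case-sensitive occurrences of `word` in one page."""
    return len(re.findall(rf"\b{re.escape(word)}\b", page))


def count_in_parallel(pages, word="IT", workers=4):
    """Count each page independently, then combine the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(lambda page: count_word(page, word), pages)
        return sum(partial_counts)


# Hypothetical three-page stand-in for the 30,000-page document.
pages = ["IT is everywhere", "We work in IT and IT pays", "It is not IT"]
total = count_in_parallel(pages)
print(total)
```

Each page is counted without looking at any other page, which is what lets the work be spread across many workers; only the final sum needs all the partial results.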

MapReduce Typical Illustration

MapReduce paradigm [diagram; stages shown include Shuffle/Sort, Reduce, and Output]

MapReduce paradigm

Map: transforms each input record into intermediate (key, value) pairs.

MapReduce paradigm

Reduce: transforms all records for a given key into the final output.
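The Map, Shuffle/Sort, and Reduce steps can be sketched in a few lines for the classic word-count job. This is an in-memory illustration of the paradigm, not the Hadoop API: the function names `map_record`, `shuffle_sort`, `reduce_key`, and `run_job`, and the sample records, are hypothetical.

```python
from collections import defaultdict


def map_record(line):
    """Map: turn one input record into intermediate (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_sort(pairs):
    """Shuffle/Sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_key(key, values):
    """Reduce: collapse all values for one key into the final output."""
    return key, sum(values)


def run_job(records):
    """Run map over every record, group by key, then reduce each group."""
    intermediate = [pair for record in records for pair in map_record(record)]
    return [reduce_key(key, values) for key, values in shuffle_sort(intermediate)]


records = ["the quick fox", "the lazy dog", "the fox"]
result = run_job(records)
print(result)
```

In a real cluster the map calls run on different machines and the shuffle moves data over the network; here everything happens in one process so the data flow is easy to follow.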

MapReduce principles

Move code to data (local computation)

Allow programs to scale transparently w.r.t. size of input

Abstract away fault tolerance, synchronization,

Implementation: Hardware

MapReduce: strengths

Batch, offline jobs

Write-once, read-many across full data set

Usually, though not always, simple computations

I/O bound by disk/network bandwidth

What it’s not:

High-performance parallel computing, e.g. MPI

Low-latency random access relational database

Always the right solution

THANK YOU!

QUESTIONS?
