Webinar - Big Data: Introduction to Hadoop and MapReduce

Hadoop & Map-Reduce

Speakers

Dr. Kathrin Spreyer, Big Data Engineer

Patrick Thoma, Head of Solution Development

Inevitable Hadoop

2004: Google MapReduce paper

2006: Hadoop team around Doug Cutting at Yahoo!

2010/11: IBM’s Watson

2011/12: Hadoop connectors for Oracle products

Oct 2012: Microsoft (connectors for Azure, HDInsight)

Oct 2012: SAP (cooperation with support companies)

Motivation

1. sample use case: logfile analytics @ 1&1

2. 80 TB/month to be processed

3. too slow on existing hardware

4. further scaling not possible -- or extremely expensive

Amazing performance improvement

Overview

1. Map-Reduce

2. HDFS

3. APIs

4. Cluster sizing

1. framework for distributed data processing

2. highly scalable: TBs and PBs

3. originated at Google

4. open-source implementation: Apache Hadoop
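The programming model behind these points can be sketched in a few lines. The following is a minimal in-memory illustration of the map, shuffle, and reduce steps using word counting; it is not Hadoop code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen here for illustration only. In a real cluster the framework runs many map and reduce tasks in parallel on different machines and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce one after another on a list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be spread over many machines without coordination, which is where the scalability comes from.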

The big picture

1. too much data for one machine

2. processing speed

3. scaling out vs. scaling up

Photo by Flo P.

HDFS (Hadoop Distributed File System)

Basic Map-Reduce APIs

1. Java

2. C++ (Pipes)

3. Python (Dumbo)

4. streaming (any language)
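As a rough sketch of the streaming option: Hadoop Streaming feeds input lines to a mapper program on standard input, expects tab-separated key/value lines on standard output, and hands the reducer its input sorted by key. The example below counts log lines per first field. The log layout (whitespace-separated, with the key in the first column) and the names `mapper`, `reducer`, and `run` are assumptions made for illustration, not part of the webinar.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'key<TAB>1' for each non-empty line, keyed by its first field."""
    for line in lines:
        fields = line.split()
        if fields:
            yield f"{fields[0]}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share the same key.

    Relies on the input being sorted by key, as it is between the
    map and reduce stages of a streaming job.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run(stage, stdin=sys.stdin, stdout=sys.stdout):
    """Hypothetical entry point: pass mapper or reducer as the stage."""
    for out in stage(stdin):
        stdout.write(out + "\n")
```

Outside a cluster, the same flow can be tried by sorting the mapper output before passing it to the reducer, which stands in for the shuffle between the two stages.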

Higher-level APIs

1. Apache Pig (data flow language)

2. Apache Hive (SQL dialect)

alternative: graphical ETL tools, e.g., Pentaho Data Integration

Cluster sizing

Network topology

1. single data center

2. rack topology

3. bandwidth

Questions?

Contact: bigdata@inovex.de