Hcj 2013-01-21

description

What is the future of Hadoop? What is the new future of Hadoop? How is that different from the old one? Here is how I answered these questions at the winter Hadoop Conference of Japan 2013.

Transcript of Hcj 2013-01-21

Page 1: Hcj 2013-01-21

1©MapR Technologies - Confidential

The Power of Hadoop to Transform Business

Page 2: Hcj 2013-01-21

My Background

University, Startups
– Aptex, MusicMatch, ID Analytics, Veoh
– big data since before it was big

Open source
– even before the internet
– Apache Hadoop, Mahout, Zookeeper, Drill
– bought the beer at first HUG

MapR

Founding member of Apache Drill

Page 3: Hcj 2013-01-21

MapR Technologies

Silicon Valley Startup
– Top investors
– Top technical and management team
  • Google, Microsoft, EMC, NetApp, Oracle

Enterprise quality distribution for Hadoop

Many extensions to basic Hadoop function

Strong supporter of Apache Drill

Page 4: Hcj 2013-01-21

Philosophy First

What is History?

Page 5: Hcj 2013-01-21

The study of the past

(what came before now)

Page 6: Hcj 2013-01-21

What is the future?

(it comes after now)

Page 7: Hcj 2013-01-21

Page 8: Hcj 2013-01-21

Page 9: Hcj 2013-01-21

Page 10: Hcj 2013-01-21

But the future also has a past!

Page 11: Hcj 2013-01-21

Do you remember the future?

Page 12: Hcj 2013-01-21

Page 13: Hcj 2013-01-21

Page 14: Hcj 2013-01-21

Page 15: Hcj 2013-01-21

Page 16: Hcj 2013-01-21

Page 17: Hcj 2013-01-21

Some things turned out as expected

Page 18: Hcj 2013-01-21

Many things are different!

Page 19: Hcj 2013-01-21

Hadoop has a history

Page 20: Hcj 2013-01-21

Hadoop also has a future

Page 21: Hcj 2013-01-21

The Old Future of Hadoop

Map-reduce and HDFS
– more and more, but not really different

Eco-system additions
– Simpler programming (Hive and Pig)
– Key-value store
– Ad hoc query

Stands apart from other computing
– Required by HDFS and other limitations

Page 22: Hcj 2013-01-21

The New Future of Hadoop

Real-time processing
– Combines real-time and long-time

Integration with traditional IT
– No need to stand apart

Integration with new technologies
– Solr, Node.js, Twisted all should interface directly

Fast and flexible computation
– Drill logical plan language

Page 23: Hcj 2013-01-21

Example #1: Search Abuse

Page 24: Hcj 2013-01-21

History matrix

One row per user

One column per thing

Page 25: Hcj 2013-01-21

Recommendation based on cooccurrence

Cooccurrence gives item-item mapping

One row and column per thing
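As a minimal sketch of the idea on this slide, the following Python turns a user-by-thing history into a thing-by-thing cooccurrence table and uses it for a simple recommendation. It assumes raw cooccurrence counts as the score; the function names `cooccurrence` and `recommend` and the sample data are hypothetical, and a production system such as Mahout would typically apply a statistical test rather than raw counts.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(history):
    """Count, for each ordered pair of items, how many users have both.

    history maps a user id to the set of items that user interacted with
    (one row per user, one column per thing).
    """
    counts = defaultdict(int)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # the item-item matrix is symmetric
    return dict(counts)


def recommend(user_items, counts, top_n=3):
    """Score unseen items by summed cooccurrence with the user's items."""
    scores = defaultdict(int)
    for (a, b), c in counts.items():
        if a in user_items and b not in user_items:
            scores[b] += c
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical sample data: three users, three items.
history = {
    "u1": {"apple", "banana"},
    "u2": {"apple", "banana", "cherry"},
    "u3": {"banana", "cherry"},
}
counts = cooccurrence(history)
suggestions = recommend({"apple"}, counts)
```

With this sample history, a user who has only seen `apple` is offered `banana` ahead of `cherry`, because `banana` cooccurs with `apple` for two users and `cherry` for one.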

Page 26: Hcj 2013-01-21

Cooccurrence matrix can also be implemented as a search index
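One way to picture this, sketched under stated assumptions: each item becomes a "document" whose indexed terms are the items it cooccurs with, and a user's history becomes the query. The code below is a plain-Python stand-in for a search engine, not Solr's API; `build_index`, `search`, and the sample indicator data are hypothetical, and scoring is a simple count of matching terms.

```python
def build_index(indicators):
    """Build an inverted index: term -> set of item documents containing it.

    indicators maps each item to the set of items that cooccur with it,
    which is one row of the cooccurrence matrix stored as a text field.
    """
    index = {}
    for doc, terms in indicators.items():
        for term in terms:
            index.setdefault(term, set()).add(doc)
    return index


def search(index, query_terms, top_n=3):
    """Use the user's history as query terms; rank items by matched terms."""
    scores = {}
    for term in query_terms:
        for doc in index.get(term, ()):
            if doc not in query_terms:  # do not recommend what was already seen
                scores[doc] = scores.get(doc, 0) + 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical indicator fields, one per item.
indicators = {
    "apple": {"banana", "cherry"},
    "banana": {"apple", "cherry"},
    "cherry": {"apple", "banana"},
    "durian": {"cherry"},
}
index = build_index(indicators)
results = search(index, {"apple", "banana"})
```

The appeal of this layout is that recommendation becomes an ordinary search request, so the serving side needs nothing beyond the search engine and the user's recent history.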

Page 27: Hcj 2013-01-21

[Diagram labels: Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexer (x2), Solr indexing, Index shards]

Page 28: Hcj 2013-01-21

[Diagram labels: User history, Web tier, Item meta-data, Solr indexer (x2), Solr search, Index shards]

Page 29: Hcj 2013-01-21

Objective Results

At a very large credit card company

History is all transactions, all web interaction

Processing time cut from 20 hours per day to 3

Recommendation engine load time decreased from 8 hours to 3 minutes

Page 30: Hcj 2013-01-21

Example #2: Web Technology

Page 31: Hcj 2013-01-21

[Diagram labels: Real-time data, Fast analysis (Storm), Analytic output, Raw logs]

Page 32: Hcj 2013-01-21

[Diagram labels: Raw logs, Large analysis (map-reduce), Analytic output]

Page 33: Hcj 2013-01-21

[Diagram labels: Browser query, Presentation tier (d3 + node.js), Analytic output, Raw logs]

Page 34: Hcj 2013-01-21

Objective Results

Real-time + long-time analysis is seamless

Web tier can be rooted directly on Hadoop cluster

No need to move data
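A rough illustration of what "real-time + long-time" can mean in practice: a batch job summarizes older raw logs, a streaming job keeps counts for events that arrived since the last batch run, and the presentation tier adds the two. This is only a sketch under assumptions; the page-view counting task, the function names, and the sample events are hypothetical, and plain Python counters stand in for map-reduce and Storm.

```python
from collections import Counter


def batch_counts(raw_logs):
    """Long-time analysis: count page views over the stored raw logs."""
    return Counter(page for _, page in raw_logs)


def merged_view(batch, realtime):
    """Presentation tier: combine batch output with real-time increments."""
    return batch + realtime


# Hypothetical (timestamp, page) events already covered by the batch job.
old_logs = [(1, "/home"), (2, "/docs"), (3, "/home")]
# Hypothetical events that arrived after the last batch run.
recent_events = [(4, "/home"), (5, "/blog")]

batch = batch_counts(old_logs)

# Fast analysis: update counts one event at a time as data streams in.
realtime = Counter()
for _, page in recent_events:
    realtime[page] += 1

view = merged_view(batch, realtime)
```

Because both outputs live in the same place, the merge is a simple addition and no data has to be copied between systems before the browser query is answered.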

Page 35: Hcj 2013-01-21

Example #3: Apache Drill

Page 36: Hcj 2013-01-21

Big Data Processing – Hadoop

                     Batch processing
Query runtime        Minutes to hours
Data volume          TBs to PBs
Programming model    MapReduce
Users                Developers
Google project       MapReduce
Open source project  Hadoop MapReduce

Page 37: Hcj 2013-01-21

Big Data Processing – Hadoop and Storm

                     Batch processing   Stream processing
Query runtime        Minutes to hours   Never-ending
Data volume          TBs to PBs         Continuous stream
Programming model    MapReduce          DAG (pre-programmed)
Users                Developers         Developers
Google project       MapReduce
Open source project  Hadoop MapReduce   Storm or Apache S4

Page 38: Hcj 2013-01-21

Big Data Processing – The missing part

                     Batch processing   Interactive analysis     Stream processing
Query runtime        Minutes to hours                            Never-ending
Data volume          TBs to PBs                                  Continuous stream
Programming model    MapReduce                                   DAG (pre-programmed)
Users                Developers                                  Developers
Google project       MapReduce
Open source project  Hadoop MapReduce                            Storm and S4

Page 39: Hcj 2013-01-21

Big Data Processing – The missing part

                     Batch processing   Interactive analysis     Stream processing
Query runtime        Minutes to hours   Milliseconds to minutes  Never-ending
Data volume          TBs to PBs         GBs to PBs               Continuous stream
Programming model    MapReduce          Queries (ad hoc)         DAG (pre-programmed)
Users                Developers         Analysts and developers  Developers
Google project       MapReduce
Open source project  Hadoop MapReduce                            Storm and S4

Page 40: Hcj 2013-01-21

Big Data Processing

                     Batch processing   Interactive analysis     Stream processing
Query runtime        Minutes to hours   Milliseconds to minutes  Never-ending
Data volume          TBs to PBs         GBs to PBs               Continuous stream
Programming model    MapReduce          Queries                  DAG
Users                Developers         Analysts and developers  Developers
Google project       MapReduce          Dremel
Open source project  Hadoop MapReduce                            Storm and S4

Page 41: Hcj 2013-01-21

Big Data Processing

                     Batch processing   Interactive analysis     Stream processing
Query runtime        Minutes to hours   Milliseconds to minutes  Never-ending
Data volume          TBs to PBs         GBs to PBs               Continuous stream
Programming model    MapReduce          Queries                  DAG
Users                Developers         Analysts and developers  Developers
Google project       MapReduce          Dremel
Open source project  Hadoop MapReduce   Apache Drill             Storm and S4

Page 42: Hcj 2013-01-21

Design Principles

Flexible
• Pluggable query languages
• Extensible execution engine
• Pluggable data formats
  • Column-based and row-based
  • Schema and schema-less
• Pluggable data sources

Easy
• Unzip and run
• Zero configuration
• Reverse DNS not needed
• IP addresses can change
• Clear and concise log messages

Dependable
• No SPOF
• Instant recovery from crashes

Fast
• C/C++ core with Java support
  • Google C++ style guide
• Min latency and max throughput (limited only by hardware)

Page 43: Hcj 2013-01-21

Simple Architecture

Page 44: Hcj 2013-01-21

Standard Interfaces

Page 45: Hcj 2013-01-21

Logical Plan Syntax:

query:[
  {
    op:"sequence",
    do:[
      {
        op: "scan",
        memo: "initial_scan",
        ref: "donuts",
        source: "local-logs",
        selection: {data: "activity"}
      },
      {
        op: "transform",
        transforms: [
          { ref: "donuts.quanity", expr: "donuts.sales"}
        ]
      },
      {
        op: "filter",
        expr: "donuts.ppu < 1.00"
      },
      …

Page 46: Hcj 2013-01-21

Logical Streaming Example

{
  @id: <refnum>,
  op: "window-frame",
  input: <input>,
  keys: [ <name>,... ],
  ref: <name>,
  before: 2,
  after: here
}

Input:   0 1 2 3 4

Frames:  [0] [0 1] [0 1 2] [1 2 3] [2 3 4]
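The window-frame operator can be sketched in a few lines of Python. This is an illustration only, not Drill code: it assumes that `before: 2` means "include up to two preceding records" and that `after: here` means "stop at the current record", and it ignores the `keys` partitioning. The function name `window_frames` is hypothetical.

```python
def window_frames(values, before=2, after=0):
    """Return one frame per record: up to `before` earlier records,
    the current record, and up to `after` later records."""
    frames = []
    for i in range(len(values)):
        start = max(0, i - before)              # clip at the start of the stream
        end = min(len(values), i + after + 1)   # clip at the end of the stream
        frames.append(values[start:end])
    return frames


frames = window_frames([0, 1, 2, 3, 4], before=2, after=0)
```

For the input 0 1 2 3 4 this yields five frames that grow to three records and then slide forward one record at a time, matching the example on the slide.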

Page 47: Hcj 2013-01-21

Logical Plan

Page 48: Hcj 2013-01-21

Execution Plan

Page 49: Hcj 2013-01-21

Representing a DAG

{
  @id: 19,
  op: "aggregate",
  input: 18,
  type: <simple|running|repeat>,
  keys: [<name>,...],
  aggregations: [
    {ref: <name>, expr: <aggexpr> },...
  ]
}
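Because each operator names its upstream operator through `input`, a flat list of operators is enough to describe a DAG. The sketch below, in Python, shows one way a reader might recover an execution order from such a list. Only the `@id`, `op`, and `input` fields come from the slide; the `scan`, `filter`, and `join` entries, the list-valued `input`, and the function name `execution_order` are assumptions made for illustration, and cycles are not detected.

```python
def execution_order(ops):
    """Order operator ids so every operator comes after its inputs."""
    by_id = {op["@id"]: op for op in ops}
    order, seen = [], set()

    def visit(op_id):
        if op_id in seen:
            return
        seen.add(op_id)
        inputs = by_id[op_id].get("input", [])
        if not isinstance(inputs, list):
            inputs = [inputs]      # a single upstream id, as on the slide
        for upstream in inputs:
            visit(upstream)        # upstream operators are emitted first
        order.append(op_id)

    for op_id in sorted(by_id):
        visit(op_id)
    return order


# Hypothetical plan: operator 19 is from the slide, 17 and 18 are made up.
plan = [
    {"@id": 19, "op": "aggregate", "input": 18},
    {"@id": 18, "op": "filter", "input": 17},
    {"@id": 17, "op": "scan"},
]
order = execution_order(plan)
```

Referencing operators by id rather than nesting them keeps the plan flat and lets one operator feed several consumers, which a strictly nested tree could not express.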

Page 50: Hcj 2013-01-21

Non-SQL queries

Page 51: Hcj 2013-01-21

Design Principles

Flexible
• Pluggable query languages
• Extensible execution engine
• Pluggable data formats
  • Column-based and row-based
  • Schema and schema-less
• Pluggable data sources

Easy
• Unzip and run
• Zero configuration
• Reverse DNS not needed
• IP addresses can change
• Clear and concise log messages

Dependable
• No SPOF
• Instant recovery from crashes

Fast
• C/C++ core with Java support
  • Google C++ style guide
• Min latency and max throughput (limited only by hardware)

Page 52: Hcj 2013-01-21

The future is not what we thought it would be

Page 53: Hcj 2013-01-21

It is better!

Page 54: Hcj 2013-01-21

Get Involved!

Tweet: #hcj13w #mapr

@ted_dunning

Page 55: Hcj 2013-01-21

Get Involved!

Download these slides
– http://www.mapr.com/company/events/hcj-01-21-2013

Join the Drill project
– [email protected]
– #apachedrill

Contact me:
– [email protected]
– [email protected]
– @ted_dunning

Join MapR (in Japan!)
– [email protected]