IoT and Big Data - Iot Asia 2014

© 2014 MapR Technologies 1 © 2014 MapR Technologies The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd , 2014

description

Presented at IoT Asia 2014 Workshop

Transcript of IoT and Big Data - Iot Asia 2014

Page 1: IoT and Big Data - Iot Asia 2014

The Internet of Things and Big Data: Intro
John Berns, Solutions Architect, APAC - MapR Technologies
April 22nd, 2014

Page 2: IoT and Big Data - Iot Asia 2014

What This Is; What This Is Not
• It’s not specific to IoT
  – It’s not about any specific type of data or protocol
  – It’s not specific to any particular industry
• It’s about processing big data
  – IoT data can be big data
  – IoT might be the biggest data of the coming decade
  – But it’s just big data
  – Same strategies & technologies apply

Page 3: IoT and Big Data - Iot Asia 2014


Page 4: IoT and Big Data - Iot Asia 2014


Page 5: IoT and Big Data - Iot Asia 2014

When Does Data Become “Big?”
• When the size of the data, itself, becomes a problem
• When the “old way” of processing data just doesn’t work effectively
• It’s “big” when we have to rethink:
  – How we store that much data
  – How we move that much data
  – How we extract, load & transform that much data
  – How we explore and analyze that much data
  – How we process and get meaningful insights from that much data

Page 6: IoT and Big Data - Iot Asia 2014

C’mon! What does that mean in size?
• Not gigabytes
• Most likely not a few terabytes
• Possibly not tens of terabytes
• Probably hundreds of terabytes
• Definitely petabytes

Page 7: IoT and Big Data - Iot Asia 2014

So How Do We Handle Big Data?
• Distribute & parallelize!
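As a rough illustration of "distribute & parallelize", the sketch below splits a list of readings into chunks, summarizes each chunk independently, and merges the partial results. It is a single-machine toy: the worker threads only stand in for cluster nodes, and the function names (`partition`, `summarize`, `combine`, `parallel_mean`) and the sample data are assumptions made for this example, not anything from the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(readings, n_parts):
    """Distribute: split the data into n_parts roughly equal chunks."""
    return [readings[i::n_parts] for i in range(n_parts)]


def summarize(chunk):
    """Work done independently on one chunk: a partial (count, total)."""
    return (len(chunk), sum(chunk))


def combine(a, b):
    """Merge two partial (count, total) results into one."""
    return (a[0] + b[0], a[1] + b[1])


def parallel_mean(readings, n_workers=4):
    """Compute the mean by summarizing chunks in parallel, then combining."""
    chunks = partition(readings, n_workers)
    # Threads stand in for cluster nodes in this toy example.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(summarize, chunks))
    count, total = reduce(combine, partials, (0, 0))
    return total / count if count else None


# Made-up sensor values: 20.0 .. 29.0 repeated 100 times.
readings = [20.0 + (i % 10) for i in range(1000)]
print(parallel_mean(readings))
```

The point of the design is that `summarize` never needs to see more than its own chunk, so adding more workers (or machines) does not change the logic.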

Page 8: IoT and Big Data - Iot Asia 2014


MPP Analytic Databases or Hadoop

Page 9: IoT and Big Data - Iot Asia 2014

Big Data Analytics: Bridging classic & big data worlds

Classic Method: Structured & Repeatable Analysis
• “Capture only what’s needed”
• Business determines what questions to ask
• IT structures the data to answer those questions
• SQL performance and structure

Big Data Method: Multi-structured & iterative analysis
• “Capture in case it’s needed”
• IT delivers a platform for storing, refining, and analyzing all data sources
• Business explores data for questions worth answering
• Hadoop scale and flexibility

Page 10: IoT and Big Data - Iot Asia 2014

Philosophical Differences

Traditional Methods
• More power
• Summarize data
• Transform and store
• Pre-defined schema
• Move data -> compute
• Less data / more complex algorithms

Big Data
• More machines
• Keep all data
• Transform on demand
• Flexible / no schema
• Move compute -> data
• More data / simple algorithms

Page 11: IoT and Big Data - Iot Asia 2014

answer = f(all data)
• Save all raw data
• Data immutability
• Transform as needed
• Result is based on the raw data
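One way to read answer = f(all data) is as an append-only log of raw readings plus functions that recompute results from that log on demand. The sketch below is a minimal in-memory version of that idea; `Reading`, `RawLog`, `answer` and `max_per_sensor` are hypothetical names invented for this example, and the readings are made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)  # frozen: a stored reading cannot be modified
class Reading:
    sensor_id: str
    timestamp: int
    celsius: float


class RawLog:
    """Append-only store: readings are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Reading] = []

    def append(self, reading: Reading) -> None:
        self._events.append(reading)

    def all(self) -> Tuple[Reading, ...]:
        # Return an immutable snapshot of everything recorded so far.
        return tuple(self._events)


def answer(log: RawLog, f: Callable):
    """answer = f(all data): every result is derived from the raw log."""
    return f(log.all())


def max_per_sensor(events) -> Dict[str, float]:
    """One possible f: highest temperature seen per sensor."""
    result: Dict[str, float] = {}
    for e in events:
        result[e.sensor_id] = max(result.get(e.sensor_id, e.celsius), e.celsius)
    return result


log = RawLog()
log.append(Reading("a", 1, 21.0))
log.append(Reading("b", 2, 19.0))
log.append(Reading("a", 3, 22.5))
print(answer(log, max_per_sensor))
```

Because the raw readings are kept unchanged, a new question only needs a new `f`; nothing already stored has to be reshaped.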

Page 12: IoT and Big Data - Iot Asia 2014

Q & A

@mapr maprtech

[email protected]

Engage with us!

MapR

maprtech

mapr-technologies

Page 13: IoT and Big Data - Iot Asia 2014

IoT and Big Data: Hadoop as a Data Platform
John Berns, Solutions Architect, APAC - MapR Technologies
April 22nd, 2014

Page 14: IoT and Big Data - Iot Asia 2014


Hadoop: The Disruptive Technology at the Core of Big Data

Page 15: IoT and Big Data - Iot Asia 2014

Forces of Adoption
Hadoop TAM comes from disrupting enterprise data warehouse and storage spending

[Chart: Data vs. IT Budgets, 2013-2017. Data growing at 40%; IT budgets growing at 2.5%]

[Chart: $ per terabyte]
• Enterprise storage: $9,000
• Database warehouse: $40,000
• Hadoop: <$1,000

Sources:
• Gartner, “Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update.”
• Wall Street Journal, “Financial Services Companies Firms See Results from Big Data Push”, Jan. 27, 2014
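The gap between the two growth figures on this slide is easier to see with a little compounding arithmetic. The sketch below assumes both rates (40% for data, 2.5% for IT budgets) are annual, compound year over year, and start from a 2013 baseline of 1.0; those assumptions are not stated on the slide.

```python
DATA_GROWTH = 0.40     # data growing at 40% (assumed to be per year)
BUDGET_GROWTH = 0.025  # IT budgets growing at 2.5% (assumed to be per year)
BASE_YEAR = 2013       # assumed baseline year, taken from the chart axis


def compound(rate, years):
    """Growth multiple after compounding `rate` for `years` years."""
    return (1 + rate) ** years


for year in range(BASE_YEAR, 2018):
    n = year - BASE_YEAR
    data = compound(DATA_GROWTH, n)
    budget = compound(BUDGET_GROWTH, n)
    print(f"{year}: data x{data:.2f}, budget x{budget:.2f}")
```

Under these assumptions, four years of growth leaves roughly 3.8 times the data to handle on a budget only about 10% larger.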

Page 16: IoT and Big Data - Iot Asia 2014


Hadoop 101 (External Presentation)

http://www.slideshare.net/jfxberns/hadoop-101-v2

Page 17: IoT and Big Data - Iot Asia 2014


Hadoop Hardware

Page 18: IoT and Big Data - Iot Asia 2014

Typical Compute Node

• Two CPUs, each with 4-8 cores
• 32-128 GB memory
• 6-24 hard disks
• 2-4 10 GbE network cards

Page 19: IoT and Big Data - Iot Asia 2014


Hadoop Ecosystem

Page 20: IoT and Big Data - Iot Asia 2014

Ecosystem of Projects Built on Hadoop

Page 21: IoT and Big Data - Iot Asia 2014


SQL On Hadoop

Page 22: IoT and Big Data - Iot Asia 2014

SQL on Hadoop
• Generally data has no inherent “schema”
• Schema is defined by user / interpreted from structure
• Schema is applied during processing
• One file can have many schemas applied
• Works for many kinds of data, but not all
  – Temperature sensor data? Sure
  – Video feeds? Not really
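The idea that a schema is applied during processing, and that one file can have many schemas applied, can be sketched with a small raw text file read two different ways. This is plain Python rather than any SQL-on-Hadoop engine; the sample data, both schemas and `read_with_schema` are assumptions made up for illustration.

```python
import csv
import io

# Raw data as stored: no column names, no types, just delimited text.
RAW = """2014-04-22T10:00:00,sensor-1,21.5
2014-04-22T10:01:00,sensor-2,19.0
2014-04-22T10:02:00,sensor-1,22.0
"""

# Two hypothetical schemas over the same file: (column name, type) per field.
# A name of None means "ignore this field".
FULL_SCHEMA = [("ts", str), ("sensor", str), ("celsius", float)]
TEMP_ONLY_SCHEMA = [(None, None), (None, None), ("celsius", float)]


def read_with_schema(raw_text, schema):
    """Apply a schema while reading; the stored data itself is unchanged."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for value, (name, cast) in zip(fields, schema):
            if name is not None:
                row[name] = cast(value)
        rows.append(row)
    return rows


print(read_with_schema(RAW, FULL_SCHEMA)[0])
print([r["celsius"] for r in read_with_schema(RAW, TEMP_ONLY_SCHEMA)])
```

Neither schema is stored with the file, so a third interpretation can be added later without rewriting the data.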

Page 23: IoT and Big Data - Iot Asia 2014

Key Use Cases

Big Data Exploration
• Exploratory analysis on large-scale raw data
  – Unknown value
  – No defined schema
  – Variety of data types

Big Data Analysis
• Large-scale SQL queries on long history
  – Well-defined schema
  – Known value, but high cost in existing systems

Page 24: IoT and Big Data - Iot Asia 2014

What is Driving the Need for SQL-on-Hadoop?
Organizations are looking for:
• Reuse of existing tools and skills to unlock Hadoop data for a broader audience
• Analysis on new types of data
• More complete data analysis
• More up-to-date and real-time data analysis (not just “after the fact”)

Page 25: IoT and Big Data - Iot Asia 2014

SQL on Hadoop: Many Options
Flexibility to choose when to use which based on use case

| | Drill 1.0 | Hive 0.13 with Tez | Impala 1.x | Presto 0.56 | Shark 0.8 | Vertica |
| Latency | Low | Medium | Low | Low | Medium | Low |
| Files | Yes (all Hive file formats) | Yes (all Hive file formats) | Yes (Parquet, Sequence, …) | Yes (RC, Sequence, Text) | Yes (all Hive file formats) | Yes (all Hive file formats) |
| HBase/M7 | Yes | Yes | Various issues | No | Yes | No |
| Schema | Hive or schema-less | Hive | Hive | Hive | Hive | Proprietary or Hive |
| SQL support | ANSI SQL | HiveQL | HiveQL (subset) | ANSI SQL | HiveQL | ANSI SQL + advanced analytics |
| Client support | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC | ODBC/JDBC, ADO.NET, … |
| Large joins | Yes | Yes | No | No | No | Yes |
| Nested data | Yes | Limited | No | Limited | Limited | Limited |
| Hive UDFs | Yes | Yes | Limited | No | Yes | No |
| Transactions | No | No | No | No | No | Yes |
| Optimizer | Limited | Limited | Limited | Limited | Limited | Yes |
| Concurrency | Limited | Limited | Limited | Limited | Limited | Yes |

Page 26: IoT and Big Data - Iot Asia 2014

Proven Hadoop Production Success

ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration

MARKETING ANALYTICS
• Recommendation engines & targeting
• Ad optimization
• Pricing analysis
• Lead scoring

RISK ANALYTICS
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis

OPERATIONS INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Sensor analysis

Page 27: IoT and Big Data - Iot Asia 2014


Other Tools & Frameworks of Note

Page 28: IoT and Big Data - Iot Asia 2014

Pig

• Procedural Language
• Loops, if-then statements

Page 29: IoT and Big Data - Iot Asia 2014

Cascading

• MapReduce Framework
• Lingual: SQL-like operations
• Pattern: Machine Learning Applications
• Scalding: Cascading for Scala
• Cascalog: Cascading for Clojure

Page 30: IoT and Big Data - Iot Asia 2014

Spark

• Python, Scala and Java
• Spark powers a stack of high-level tools including
  – Shark for SQL,
  – MLlib for machine learning,
  – GraphX, and
  – Spark Streaming.
• You can combine these frameworks seamlessly in the same application.

Page 31: IoT and Big Data - Iot Asia 2014

• Machine Learning / Predictive Analytics
  – Collaborative Filtering
  – Linear / Logistic Regression
  – Naïve Bayes
  – Random Forests
  – K-Means Clustering
  – Canopy Clustering
  – Principal Component Analysis

Page 32: IoT and Big Data - Iot Asia 2014

• Database on Hadoop
• Highly scalable
• Columnar, flexible schema
• Data source for MapReduce and Spark jobs

Page 33: IoT and Big Data - Iot Asia 2014

Q & A

Page 34: IoT and Big Data - Iot Asia 2014

IoT and Big Data: Architectures & Use Cases
John Berns, Solutions Architect, APAC - MapR Technologies
April 22nd, 2014

Page 35: IoT and Big Data - Iot Asia 2014


NoSQL

Page 36: IoT and Big Data - Iot Asia 2014

NoSQL Databases
• No-SQL or “Not only” SQL
• Give up some of the functionality of traditional relational databases for speed and scalability
• Types
  – Key-Value
  – Columnar
  – Document
  – Graph
• NoSQL databases favor flexible schemas
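To make the four types a little more concrete, the sketch below models one made-up temperature reading in each shape using plain Python structures. These are illustrative shapes only, not the API or storage layout of any real database, and every key and name in it is an assumption.

```python
# One made-up temperature reading, modeled four ways.

# Key-value: an opaque value looked up by a single key.
key_value = {"sensor-1:2014-04-22T10:00": "21.5"}

# Columnar (wide-column): row key -> column family -> column -> value.
columnar = {"sensor-1": {"temp": {"2014-04-22T10:00": 21.5}}}

# Document: a self-describing nested record.
document = {
    "sensor": "sensor-1",
    "ts": "2014-04-22T10:00",
    "reading": {"celsius": 21.5},
}

# Graph: nodes plus labeled edges between them.
nodes = {"sensor-1", "room-A"}
edges = [("sensor-1", "LOCATED_IN", "room-A")]

# The same value is reachable in the first three shapes, by different paths.
print(float(key_value["sensor-1:2014-04-22T10:00"]))
print(columnar["sensor-1"]["temp"]["2014-04-22T10:00"])
print(document["reading"]["celsius"])
```

None of these shapes needed a table definition up front, which is what the flexible-schema bullet refers to.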

Page 37: IoT and Big Data - Iot Asia 2014


HBase

Page 38: IoT and Big Data - Iot Asia 2014


Queues

Page 39: IoT and Big Data - Iot Asia 2014

Queues
• Just like a queue at an amusement park
• First in, first out
• Queues messages or events
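The first-in-first-out behavior can be shown with a tiny in-memory queue. This is only a sketch: `MessageQueue`, `publish` and `consume` are hypothetical names for this example, and a real message queue would add persistence, networking and multiple consumers.

```python
from collections import deque


class MessageQueue:
    """Minimal in-memory FIFO queue of messages or events."""

    def __init__(self):
        self._items = deque()

    def publish(self, message):
        # New messages join the back of the line.
        self._items.append(message)

    def consume(self):
        # The oldest message leaves first; None if the queue is empty.
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)


queue = MessageQueue()
for event in ["door opened", "temp 21.5", "door closed"]:
    queue.publish(event)

while len(queue):
    print(queue.consume())
```

Messages come out in exactly the order they went in, which is the amusement-park behavior described above.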

Page 40: IoT and Big Data - Iot Asia 2014


Message Queue

Page 41: IoT and Big Data - Iot Asia 2014


Stream Processing

Page 42: IoT and Big Data - Iot Asia 2014

Stream Processing
• Handles data at high velocity
• If Hadoop is the ocean, streams are the firehose
• Processing in near real-time
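A minimal way to picture near-real-time processing is a function that emits an updated result for each value as it arrives, instead of waiting for the whole data set. The sketch below keeps a rolling mean over the most recent values; `rolling_mean` and the window size are assumptions for this example, and it is not how any particular stream processor is implemented.

```python
from collections import deque


def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` values after each new arrival."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


# Any iterable works as the stream, including one that never ends.
for mean in rolling_mean([1, 2, 3, 4, 5], window=3):
    print(mean)
```

Only the current window is held in memory, so the same function works whether the stream has five values or never stops.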

Page 43: IoT and Big Data - Iot Asia 2014


Storm

Page 44: IoT and Big Data - Iot Asia 2014


Batch Processing

Page 45: IoT and Big Data - Iot Asia 2014


Combination Architectures

Page 46: IoT and Big Data - Iot Asia 2014


Lambda Architecture

Page 47: IoT and Big Data - Iot Asia 2014


Complex Architectures Using Many Big Data Technologies

Page 48: IoT and Big Data - Iot Asia 2014


Wanna Play?

• http://www.mapr.com/products/mapr-sandbox-hadoop

Page 49: IoT and Big Data - Iot Asia 2014

Q & A