Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect...

74
1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved 1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Henning Kropp Sr. Systems Architect EMEA

Transcript of Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect...

Page 1: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Henning KroppSr. Systems Architect EMEA

Page 2: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

About Me Henning Kropp

Sr. Systems Architect EMEA –Hortonworks

University Leipzig Graduate

5+ Years Hadoop Experience

2+ Years with Hortonworks

Werder Bremen & RB Leipzig Fan

Page 3: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

3 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Agenda

Hortonworks Overview

Data Trends

Open Enterprise Hadoop– (HDF)

– HDP

The Hadoop Ecosystem

Popular Use Cases

SQL on Hadoop

Page 4: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

4 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Our Mission:Power the Future of Datawith HDF and Enterprise Apache Hadoop

Who we are

June 2011: Original 24 architects, developers, operators of Hadoop from Yahoo!

June 2014: An enterprise software company with 420+ Employees

Oct 2015: Fastest software company to hit $100 M in revenue

Nov 2016: 1000+ customers with 1050+ Employees

Key Partners

Our model

Innovate and deliver Apache Hadoop as a complete enterprise data platform

completely in the open, backed by a world class support organization

Page 5: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

5 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Fastest growing Fortune 1000 customer base

Customer Momentum

• 300+ customers in seven quarters, growing at 75+/quarter

• Two thirds of customers come from F1000

Largest Cluster in North America

32,000 NodesLargest Cluster in Europe

1,000 Nodes

Some notable migrations include many of the early adopters of Hadoop:

Largest Known Cluster in APAC

400 Nodes

60+ customers migrated from other distributions

Page 6: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

6 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Key Highlights

Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4

Subscription Customers

2015 2016

1000+

Customer Growth in 2016

800+

Page 7: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

7 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

EMBRACE AN OPEN APPROACH

MASTER THE VALUE OF DATA

EVERY BUSINESS IS A DATA BUSINESS

Page 8: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

8 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

The DATA Ages

I N C R E A S I N G D A T A V A R I E T Y A N D C O M P L E X I T Y

USER GENERATED CONTENT

MOBILE WEB

SMS/MMS

SENTIMENT

EXTERNAL DEMOGRAPHICS

HD VIDEO

SPEECH TO TEXT

PRODUCT/SERVICE LOGS

SOCIAL NETWORK

BUSINESS DATA FEEDS

USER CLICK STREAM

WEB LOGS

OFFER HISTORY DYNAMIC PRICING

A/B TESTING

AFFILIATE NETWORKS

SEARCH MARKETING

BEHAVIORAL TARGETING

DYNAMIC FUNNELSPAYMENTRECORD

SUPPORT CONTACTS

CUSTOMER TOUCHESPURCHASE DETAIL

PURCHASERECORD

SEGMENTATIONOFFER DETAILS

P E TA B Y T E S

T E R A B Y T E S

G I G A B Y T E S

E X A B Y T E S

E R P

I O T / B I G D ATA

W E B

C R M

Page 9: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

9 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Threats

Existing data architectures make data inaccessible, incomplete, irrelevant, and expensive.

Page 10: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

10 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Opportunities

Apache™ Hadoop® transforms your business, making Big Data easily accessible for advanced analytic applications.

Page 11: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

11 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Determining how to get value from

big data

Obtaining skills and capablities

needed

Risk and governance

issues

Funding for big data-related

initiatives

Defining our strategy

Integrated multiple data sources

Integrating with existing infrastructure

Infrastructure and/or

architecture

Leadership or organizational

issues

Understanding what is "Big

Data"

Greater than 50%

30% to 49%

20% to 29%

Less than 19%

Challenges adopting big data according to Gartner

Page 12: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

12 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

A Connected Data Strategy Solves for All Data

HDF*DATA IN MOTION

HDP**DATA AT REST

* Hortoworks Data Flow

** Hortonworks Data Platform

Page 13: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

13 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

The Future of DataActionable Intelligence

STO

RA

GE

STO

RA

GE

GROUP 2GROUP 1

GROUP 4GROUP 3

D ATA AT R E S T

INTERNETOF

ANYTHING

D ATA I N

M O T I O N

Any and all data from sensors, machines, geolocation, clicks,

files, social

HDFDATA I N MOTI ON

HDPDATA AT REST

Page 14: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

14 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Open

Enterprise

Hadoop

Open

Interoperable

Central

Ready

Page 15: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

15 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Proprietary IPThis is unique intellectual property that is important to keep proprietary and protected for business reasons.

Open source, closed/controlled communityThis is about intellectual property that you are OK sharing with others but still want to retain strict controls over. Ex. public Github repos where your engineers are the only ones who can checkin code / approve pull requests. Others may “Github fork” the code, however, to use for their own purposes.

Open source, open communityThis is about intellectual property that you are OK sharing with others and want to participate in and/or drive forward as part of a broader community that has a defined and consistent governance model. Ex. Apache Software Foundation projects are an example of this.

How To Think About “Open”

Page 16: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

16 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

NEW SOFTWARE DEVELOPER GRADS

• GitHub houses their public “portfolio”

• They want companies that embrace open source

Open Source Is The Norm: The Github Generation

Page 17: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

17 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hortonworks Data Platform Is Genuinely Open

Eliminates Risk– of vendor lock-in by delivering 100%

Apache open source technology

Maximizes Community Innovation– with hundreds of developers across

hundreds of companies

– Integrates Seamlessly

– through committed co-engineering partnerships with other leading technologies

M A X I M U M C O M M U N I T Y I N N O VAT I O N

T H E

I N N O VAT I O N

A D VA N TA G E

P R O P R I E T A R Y

H A D O O P

T I M E

IN

NO

VA

TI

ON

O P E N

C O M M U N I T Y

Page 18: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

18 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Open

Enterprise

Hadoop

Open

Interoperable

Central

Ready

Page 19: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

19 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Centralized Platform with YARN-Based Architecture

Centralized Platform

for operations, governance and security

Diverse Applications

run simultaneously on a single cluster

Maximum Data Ingest

including existing and new sources, regardless of raw format

Shared Big Data Assets

across business groups, functions and users

YARND A T A O P E R A T I N G S Y S T E M

OPERATIONS SECURITY

GOVERNANCE

ST

OR

AG

E

ST

OR

AG

E

Machine

LearningBatch

StreamingInteractive

Search

Page 20: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

20 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Open

Enterprise

Hadoop

Open

Interoperable

Central

Ready

Page 21: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

21 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Offering You the Most Flexibility

A N Y D ATA

Existing and new datasets

A N Y A P P L I C AT I O N

Multiple engines for data analysis

A N Y W H E R E

Complete range of deployment options

Batch

Interactive

Search

Streaming

Machine Learning

Click-

streamSensor

Social Mobile

Geo-

Location

Server

Log Linux Windows

CloudOn-Premise

Page 22: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

22 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Open

Enterprise

Hadoop

Open

Interoperable

Central

Ready

Page 23: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

23 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Provides Consistent Operations

Centralized

management and monitoring of Hadoop clusters

Automated Provisioning

either on-premises or in the cloud with the Cloudbreak API for clusters in minutes

Managed Services

for high availability and consistent lifecycle controls, with dashboards and alerts

YARND A T A O P E R A T I N G S Y S T E M

OPERATIONS SECURITY

GOVERNANCE

ST

OR

AG

E

ST

OR

AG

E

Machine

LearningBatch

StreamingInteractive

Search

OPERATIONS

Page 24: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

24 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Enables Trusted Governance

Data Management

along the entire data lifecycle

Modeling with Metadata

enables comprehensive data lineage through a hybrid approach

Interoperable Solutions

across the Hadoop ecosystem, through a common metadata store

YARND A T A O P E R A T I N G S Y S T E M

OPERATIONS SECURITY

GOVERNANCE

ST

OR

AG

E

ST

OR

AG

E

Machine

LearningBatch

StreamingInteractive

Search

GOVERNANCE

Page 25: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

25 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Ensures Comprehensive Security

Comprehensive Security

through a platform approach

Encrypted Data

at rest and in motion

Centralized Administration

of security policies and user authentication

Fine-Grain Authorization

for data access control

YARND A T A O P E R A T I N G S Y S T E M

OPERATIONS SECURITY

GOVERNANCE

ST

OR

AG

E

ST

OR

AG

E

Machine

LearningBatch

StreamingInteractive

Search

SECURITY

Page 26: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

26 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Agile Analytics with Enterprise Spark at Scale

Powering Agile Analytics

via data science notebooks and automation for most common analytics (including geospatial and entity resolution)

Seamless Data Access

across as many data types as possible

Unmatched Economics

Combining in-memory processing speed with HDP’s cost efficiencies at scale

Ready for the Enterprise

with robust security, governance and operations coordinated centrally by Apache Hadoop and YARN

OPERATIONS SECURITY

GOVERNANCE

STOR

AG

E

STO

RA

GE

S PA R K O N YA R N

Page 27: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

27 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Fast SQL with Apache Hive

Pluggable Architecture

supports Apache Hive, Pivotal HAWQ and other leading SQL engines

Familiar SQL Query Semantics

enable transactions and SQL:2011 Analytics for rich reporting

Unprecedented Speed at Extreme Scale

returns query results in interactive time, even as data sets grow to petabytes

OPERATIONS SECURITY

GOVERNANCE

STOR

AG

E

STO

RA

GE

H I V E O N YA R N

Page 28: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

28 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

The Hadoop Ecosystem

Page 29: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

29 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

What is Apache Hadoop?

Allows for the distributed processing of large data sets across clusters of computers using simple programming models

Is designed to scale up from single servers to thousands of machines, each offering local computation and storage

Does not rely on hardware to deliver high-availability, but rather the library itself is designed to detect and handle failures at the application layer

Delivers a highly-available service on top of a cluster of computers, each of which may be prone to failures

The Apache Hadoop project describes the technology as a software framework that:

Source: http://hadoop.apache.org

Page 30: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

30 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hadoop Core = Storage + Compute

storage storage

storage storage

CPU RAM

Yet Another Resource Negotiator (YARN)

Hadoop Distributed File System (HDFS)

Page 31: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

31 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

YARN Enables Multiple Workloads

HADOOP 2.0

Multi Use Data Platform

Batch, Interactive, Online, Streaming, …

Interact with all data in multiple ways simultaneously

Redundant, Reliable StorageHDFS 2

Cluster Resource ManagementYARN

Standard SQL Processing

Hive

BatchMapReduce

InteractiveTez

Online Data Processing

HBase, Accumulo

Real Time Stream Processing

Storm

others…

HADOOP 1.0

HDFS 1(redundant, reliable storage)

MapReduce(distributed data processing

& cluster resource management)

Single Use System

Batch Apps

Data Processing Frameworks

(Hive, Pig, Cascading, …)

Page 32: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

32 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Architectural components

Hive provides a data ware house where data can be served to general consumers who do not require real time access

ORC and Parquet provide file formats for sharing data within the Hadoop ecosystem

Spark provides libraries for data manipulation, data access and machine learning. It can serve as a one stop language

HBase provides low latency storage for time series as they are generated and accessed most frequency.

Nifi ensures that data can be reliably acquired and transmitted regardless of source or destination

Zeppelin provide a rich UI to manage time series as they flow the HaaH architecture. The notebook style environment also is a natural home for data scientist who leverage the historical data for insights

Hadoop under pins everything, ensuring data is always available and that resources are effectively managed

Page 33: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

33 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hortonworks Data Platform Architecture – Data Management

Page 34: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

34 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

HDP 2.5

HORTONWORKS DATA PLATFORM

Had

oo

p&

YA

RN

Flu

me

Oo

zie

Pig

Hiv

e

Tez

Sqo

op

Clo

ud

bre

ak

Am

bar

i

Slid

er

Kaf

ka

Kn

ox

Solr

Zoo

keep

er

Spar

k

Falc

on

Ran

ger

HB

ase

Atl

as

Acc

um

ulo

Sto

rm

Ph

oe

nix

4.10.2

DATA MGMT DATA ACCESS GOVERNANCE & INTEGRATION OPERATIONS SECURITY

HDP 2.2

Dec 2014

HDP 2.1

April 2014

HDP 2.0

Oct 2013

HDP 2.2

Dec 2014

HDP 2.1

April 2014

HDP 2.0

Oct 20130.12.0 0.12.0

0.12.1 0.13.0 0.4.0

1.4.4 1.4.4 3.3.23.4.5

0.4.00.5.0

0.14.0 0.14.0 3.4.6 0.5.0 0.4.00.9.30.5.2

4.0.04.7.2

1.2.1 0.60.0 0.98.4 4.2.0 1.6.1 0.6.0 1.5.21.4.5 4.1.02.0.0

1.4.0 1.5.1 4.0.0

1.3.1

1.5.1 1.4.4 3.4.5

2.2.0

2.4.0

2.6.0

2.7.1 1.4.6 1.0.0 0.6.0 0.5.02.1.00.8.2 3.4.61.5.25.2.1 0.80.0 0.5.01.7.04.4.0 0.10.0 0.6.10.7.01.2.10.15.0HDP 2.3

Oct 20154.2.0

0.96.1

0.98.0 0.9.1

0.8.1

1.4.1 1.1.2

2.7.3 1.4.6 1.3.0 0.9.0 0.6.02.4.00.10.0 3.4.61.5.25.5.1 0.91.0 0.7.01.7.04.7.0 1.0.1 0.10.00.7.01.2.1+2.1***

0.16.0HDP 2.5*

2H20164.2.0

1.6.2+2.0**

1.1.2

2.7.1 1.4.6 1.2.0 0.6.0 0.5.02.2.10.9.0 3.4.61.5.25.2.1 0.80.0 0.5.01.7.04.4.0 0.10.0 0.6.10.7.01.2.10.15.0HDP 2.4

Mar 20164.2.01.6.0 1.1.2

Zep

pe

lin

Ongoing Innovation in Apache

0.6.0

Page 35: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

35 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Connected Data Platforms

DA

TA

S

YS

TE

MA

PP

LIC

ATI

ON

S

EnterpriseApplications

BI / Reporting,Ad Hoc Analysis

Interactive Weband Mobile Applications

StreamingApplication

Search StatisticalAnalysis

MachineLearning

Repositories

NoSQL

SOLUTIONS

System Integrators and Consultants

HDP

Govern

ance

&

Inte

gra

tion Data Access

Opera

tions

Security

Data Management

YARN

EDW

DEV AND DATA TOOLS

Build, Devand Test

OPERATIONAL TOOLS

Provision, ManageandMonitor

INFRASTRUCTURE

On-Premises,Cloud and ApplicationsS

OU

RC

ES

ExistingSystems

Clickstream Web andSocial

Geolocation Sensor and Machine

Server Logs Documents& Emails

Page 36: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

36 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hadoop as a +1 Architecture

OPERATIONAL TOOLS

DEV & DATA TOOLS

INFRASTRUCTURE

SO

UR

CE

S

EXISTING Systems

Clickstream Web &Social Geolocation Sensor & Machine

Server Logs Unstructured

DA

TA

S

YS

TE

MA

PP

LIC

ATI

ON

S

HDP

Go

ve

rna

nc

e

& I

nte

gra

tio

n

Se

cu

rity

Op

era

tio

nsData Access

Data Management

YARN

Page 37: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

Popular Use Cases

Page 38: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

38 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hortonworks Delivers Open Enterprise Hadoop

H O R T O N W O R K S D AT A P L AT F O R M

YARN: Data Operating System

CLICKSTREAM SENSOR SOCIAL MOBILE GEOLOCATION SERVER LOG

Batch Interactive Search Streaming Machine Learning

EXISTING

Page 39: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

39 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Modern Data Architecture

OPERATIONS TOOLS

Provision, Manage &Monitor

DEV & DATA TOOLS

Build & Test

DA

TA S

YSTE

M

REPOSITORIES

SOU

RC

ES

RDBMS EDW MPP

OLTP, ERP,CRM

Systems

Documents, Emails

Web Logs,Click Streams

Social Networks

Machine Generated

SensorData

GeolocationData

Go

vern

an

ce

& In

teg

rati

on

Secu

rity

Op

era

tio

nsData Access

Data Management

AP

PLI

CA

TIO

NS

Business Analytics

Custom Applications

PackagedApplications

OLTP, ERP, CRM Systems

Unstructured documents, emails

Clickstream

Server logs

Sentiment, Web Data

Sensor. Machine Data

Geolocation

Page 40: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

40 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

New Analytic Applications from New Types of Data

INDUSTRY USE CASESentiment

& WebClickstream & Behavior

Machine & Sensor

Geographic Server LogsStructured &Unstructured

Financial Services

New Account Risk Screens ✔ ✔

Trading Risk ✔

Insurance Underwriting ✔ ✔ ✔

Telecom

Call Detail Records (CDR) ✔ ✔

Infrastructure Investment ✔ ✔

Real-time Bandwidth Allocation ✔ ✔ ✔

Retail

360° View of the Customer ✔ ✔ ✔

Localized, Personalized Promotions ✔

Website Optimization ✔

Manufacturing

Supply Chain and Logistics ✔

Assembly Line Quality Assurance ✔

Crowd-sourced Quality Assurance ✔

HealthcareUse Genomic Data in Medial Trials ✔ ✔ ✔

Monitor Patient Vitals in Real-Time

PharmaceuticalsRecruit and Retain Patients for Drug Trials ✔ ✔

Improve Prescription Adherence ✔ ✔ ✔ ✔

Oil & GasUnify Exploration & Production Data ✔ ✔ ✔ ✔

Monitor Rig Safety in Real-Time ✔ ✔ ✔

GovernmentETL Offload/Federal Budgetary Pressures ✔ ✔

Sentiment Analysis for Government Programs ✔

Page 41: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

41 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Payment Tracking

Call Analysis Machine Data Product Design

Social Mapping

Factory YieldsDefect

Detection

Due Diligence

M & AProactive

RepairDisaster

MitigationInvestment

Planning

Next Product Recs

Store Design

Risk Modeling Ad PlacementInventory

Predictions

Sentiment Analysis

Ad Placement Basket Analysis Segments

Customer Support

Supply ChainCross-

SellCustomer Retention

Vendor Scorecards

Optimize Inventories

Business executives are driving transformational outcomes with next-generation applications that empower new uses of Big Data including: data discovery, a single view of the customer and predictive analytics.

Page 42: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

42 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Historical Records

OPEXReduction

Mainframe Offloads

Fraud Prevention

Dataas a Service

Public Data Capture

IT executives are delivering substantial reductions in operating costs by modernizing their data architectures with Open Enterprise Hadoop. These cost saving innovations include active archive of cold data, offloading ETL processes and enriching existing data.

Digital Protection

Device DataIngest

Rapid Reporting

Page 43: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

43 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hortonworks® customers leverage our technology to transform their businesses, either by achieving new business objectives or by reducing costs. The journey typically involves both of those goals in combination, across many use cases.

Social Mapping

Payment Tracking

FactoryYields

Defect Detection

Call Analysis Machine Data Product Design M & A

Due Diligence

Next Product Recs

Store Design

Risk Modeling Ad Placement

Proactive Repair

Disaster Mitigation

Investment Planning

Inventory Predictions

Customer Support

Sentiment Analysis

Supply Chain

Ad Placement Basket Analysis Segments

Cross-Sell

Customer Retention

Vendor Scorecards

Optimize Inventories

OPEX Reduction

Mainframe Offloads

Historical Records

Dataas a Service

PublicData Capture

Fraud Prevention

Device DataIngest

Rapid Reporting

Digital Protection

Page 44: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

44 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

New Analytic Applications for New Types of Data

$

• Supplier Consolidation

• Supply Chain and Logistics

• Assembly Line Quality Assurance

• Proactive Maintenance

• Crowdsourced Quality Assurance

• New Account Risk Screens

• Fraud Prevention

• Trading Risk

• Maximize Deposit Spread

• Insurance Underwriting

• Accelerate Loan Processing

• Call Detail Records (CDRs)

• Infrastructure Investment

• Next Product to Buy (NPTB)

• Real-time Bandwidth Allocation

• New Product Development

• 360° View of the Customer

• Analyze Brand Sentiment

• Localized, Personalized Promotions

• Website Optimization

• Optimal Store Layout

Financial Services

Retail Telecom Manufacturing

HealthcareUtilities, Oil & Gas

Public Sector

• Genomic data for medical trials

• Monitor patient vitals

• Reduce re-admittance rates

• Store medical research data

• Recruit cohorts for pharmaceutical trials

• Smart meter stream analysis

• Slow oil well decline curves

• Optimize lease bidding

• Compliance reporting

• Proactive equipment repair

• Seismic image processing

• Analyze public sentiment

• Protect critical networks

• Prevent fraud and waste

• Crowdsource reporting for repairs to infrastructure

• Fulfill open records requests

Page 45: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

The Data Journey to Safe Roads

Page 46: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

46 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Actionable Intelligence Is Shaping the Modern Insurance Industry

Catastrophic Event Data

Customer Onboarding Data

Seismic Data

Biometrics Data

Usage-Based Driver Data

Cyber Threat Metadata

RISK & UNDERWRITING ANALYSIS

USAGE-BASED INSURANCE

CLAIMS ANALYTICS

NEW PRODUCT DEVELOPMENT

CYBER RISK ANALYTICS

Drones & Aerial Imagery

Claims Docs, Notes & Diaries

Weather & Environment

Underwriting Analysis

Policy Histories

Photos

Page 47: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

47 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Progressive Rewards Safe Drivers and Improves Traffic Safety

ETL OFFLOAD

Sensor Data Ingest

DATA DISCOVERY

Web Log Analysis

ACTIVE ARCHIVEIndividual

Driving Histories

100% in2-3 Days

+12 Billion

Web App-Enabled

$2.6 Billion

driving detail captured from Snapshot, in HDF

miles driven stored

customers see driving detail and improve safety

in 2014 Premiums

Existing Data Systems Did Not Scale Efficiently

Usage-Based “Snapshot” Insurance Program

In-Car Sensor Captures IoT Data

~7 Days to Transform Only 25% of UBI Data

S I T UAT I O NDATA

DISCOVERY Online AdPlacement

DATA DISCOVERY Claim Notes

Mining

P R E D I C T I V EA N A LY T I C S

Usage-Based Insurance (UBI)

“We’re looking at datasets that we never dreamed we could look at…It’s joining dots that in the past we didn’t even know we could join.” Pawan Divakarla, Data & Analytics Business Leader

Page 48: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

The Data Journey to Better Health

Page 49: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

49 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Actionable Intelligence Makes Healthcare Precise and Personal

Patient Records

Lab Data

Pharmacy Data

Patient Locations

Wearables

Intra-Network Data

Sensor Data

Claims Data

Social Media Physician

NotesPatient

Satisfaction Data

Clinical (EMR) Data

SINGLE VIEW OF PATIENT

REAL-TIME VITAL S IGN MONITORING

BILLING & REIMBURSEMENTS

EMR OPTIMIZATION

SUPPLY CHAIN OPTIMIZATION

Page 50: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

50 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Mercy Transforms Healthcare Through “One Patient, One Record”

ACTIVE ARCHIVELab Notes

DATADISCOVERY

OPEX Efficiency

SINGLE VIEWBilling

DATA DISCOVERYVital Signs

SINGLE VIEWSingle Patient

Record

ACTIVE ARCHIVEEpic EMR

Replication

ACTIVEARCHIVEPrivacy

Database

DATA ENRICHMENT

Epic Enrichment

PREDICTIVE ANALYTICSDevice Data

Ingest

Move to Clarity wouldn’t enable real-time analytics

Existing platform impeded goals

Data enrichment needed for 1 million patients

Extracting data from Epic to Clarity took a day

S I T UAT I O N

3-5 Minutes

$1M Additional Annual Revenue

From “Never”to “Seconds”

900x more data

move data off Epic to Clarity with HDP

from improvedbilling process

acceleratedresearcher insight

ingest of ICU vital signs

P R E D I C T I V EA N A LY T I C S

Preventive Care

ETL OFFLOADMedical Decision

Support

“[HDP] provides us a place and way to leverage our Epic data in addition to other data that comes from outside of Epic.” Paul Boal, Director of Data Management

Page 51: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

51 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Webtrends

The Data Journey TowardsPersonalized Online Ads

Page 52: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

52 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Massive Volumes of Weblogs Fueled Webtrends Growth—and also its Skyrocketing Storage Costs

Webtrends’ Journey

Webtrends provides digital marketing solutions for more than 2,000 companies in 60 countries – processing 13 billion daily online events

Data used to be processed in relational databases, stored on large NAS appliances, which were not economical at scale

Processing occurred on-premises, without cloud-based capabilities

Diseconomies of scale hampered the company objective to help its customers predict optimal online ad placement

Page 53: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

53 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Webtrends’ Journey

Personalized Online Ads

Per-Customer Click Path

Web LogAnalysis

SQL Server Offload

“We’re able to…look at this data set and process it and do predictions, behavioral analysis. We can do things that allow us to determine ROI for different actions and behavioral patterns.”

Peter Crossley, Chief Architect

Behavioral Segmentation

Ad Click Predictions

LCV Analysis

Innovate

Renovate

Petabytes of Weblogs Analyzed with Sparkat Scale

Data streams from a vast array of desktop and mobile devices

13 billion daily events collected in fewer than milliseconds per event

No data cleansing necessary prior to analysis with Apache Spark

2 clusters consolidated into 1 YARN-based HDP cluster

Launched new product Webtrends Explore™ – powered by HDP

Page 54: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

The Data Journey for Cyber Security

Page 55: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

55 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Symantec’s Journey

Analyzing Streaming Threat Data to Increase Velocity for Time to Protection

The Symantec™ Global Intelligence Network includes more than 57 million attack sensors over 157 countries

Data streams from 75 million users on 120 million devices

Legacy platforms created 3-4 hour processing latencies to analyze logs files for digital threats

Attackers could exploit those processing time windows

Page 56: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

56 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Symantec’s Journey

Digital Security

Metadata Capture

Threat Predictions

Attacker Detection

Unified Security

Security LogAnalysis

Threat Archive

Device Data Ingest

Threat Detection

Greenplum Offload

Innovate

Renovate

Data Science Speeds Time to Protection

Threat detection latency reduced from 4 hours to 2 seconds

Time to protection improved 5000x

Machine learning over tens of petabytes of historical data predicts threats to customers

Cloud team uses Ambari and Cloudbreak for dynamic clusters to meet peak workloads

Page 57: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

57 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hive LLAP

Page 58: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

58 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Fast SQL with Apache Hive at Scale

H I V E O N YA R N

Pluggable Architecturesupports Apache Hive, Pivotal HAWQ and other leading SQL engines

Familiar SQL Query Semanticsenable transactions and SQL:2011 Analyticsfor rich reporting

Unprecedented Speed at Extreme Scalereturns query results in interactive time,even as data sets grow to petabytes

OPERATIONS SECURITY

GOVERNANCE

STO

RA

GE

STO

RA

GE

Page 59: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

59 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Apache Hive: Fast Facts

Most Queries Per Hour

100,000 Queries Per Hour(Yahoo Japan)

Analytics Performance

100 Million rows/s Per Node(with Hive LLAP)

Largest Hive Warehouse

300+ PB Raw Storage(Facebook)

Largest Cluster

4,500+ Nodes(Yahoo)

Page 60: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

60 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

HDP 2.5 is a Major Milestone for Hive

At a High Level:– 2000+ features, improvements and bug fixes in

Hive since HDP 2.4.

– 600+ of these from outside of Hortonworks.

Major Improvements:– Preview: Hive LLAP: Persistent query servers

with intelligent in-memory caching.

– ACID Ready for Production Use: Hardened and proven at scale.

– Expanded SQL Compliance: More capable integration with BI tools.

– Performance: Interactive query, 2x faster ETL.

– Security: Row / Column security extending to views, Column level security for Spark.

– Operations: LLAP integration in Ambari, new Grafana dashboards.

1391

642

From Hortonworks

From Community

Hive 2 Improvements

Interactive Query with Hive LLAP+

SQL ACID Fully Supported+

2x Faster ETL+

Page 61: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

61 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hive 2 with LLAP: Architecture Overview

Dee

p

Sto

rage

HDFSS3 + Other HDFS

Compatible Filesystems

YARN Cluster

LLAP Daemon

QueryExecutors

In-Memory Cache

LLAP Daemon

QueryExecutors

In-Memory Cache

LLAP Daemon

QueryExecutors

In-Memory Cache

LLAP Daemon

QueryExecutors

In-Memory Cache

QueryCoordinators

Coord-inator

Coord-inator

Coord-inator

HiveServer2 (Query

Endpoint)

ODBC /JDBC SQL

Queries

Page 62: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

62 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

0

10

20

30

40

50

0

50

100

150

200

250

Spe

edu

p (

x Fa

cto

r)

Qu

ery

Tim

e(s)

(Lo

wer

is

Bet

ter)

Hive 2 with LLAP averages 26x faster than Hive 1

Hive 1 / Tez Time (s) Hive 2 / LLAP Time(s) Speedup (x Factor)

Hive 2 with LLAP: Performance and Concurrency

0

20

40

60

80

100

120

8 16 32

Tim

e (s

)

Concurrent Queries

Average Query Time by Concurrency

Average Time: 8 Node Average Time: 16 Node

0

50

100

150

200

250

300

8 16 32

Tim

e (s

)Concurrent Queries

Maximum Query Time by Concurrency

Max Time: 8 Node Max Time: 16 Node

Page 63: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

63 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Data Types SQL Features File Formats FuturesNumeric Core SQL Features Columnar Procedural Extensions (PL/SQL)

FLOAT/DOUBLE Date, Time and Arithmetical Functions ORCFile Primary Key / Foreign Key

DECIMAL INNER, OUTER, CROSS and SEMI Joins Parquet Non-Equijoin

INT/TINYINT/SMALLINT/BIGINT Derived Table Subqueries Text Scalable Cross Product

BOOLEAN Correlated + Uncorrelated Subqueries CSV Enhanced OLAP

String UNION ALL Logfile

CHAR / VARCHAR UDFs, UDAFs, UDTFs Nested / Complex ACID MERGE

STRING Common Table Expressions Avro Multi Subquery

BINARY UNION DISTINCT JSON Comparison to sub-select

Date, Time Advanced Analytics XML INTERSECT and EXCEPT

DATE OLAP and Windowing Functions Custom Formats

TIMESTAMP CUBE and Grouping Sets Other Features

Interval Types Nested Data Analytics XPath Analytics

Complex Types Nested Data Traversal

ARRAY Lateral Views

MAP ACID Transactions

STRUCT INSERT / UPDATE / DELETE

UNION MERGE

Apache Hive: Journey to SQL:2011 Analytics

Legend

Existing

Projected: HDP 3.0

Projected: HDP 2.5

Track Hive SQL:2011 Complete: HIVE-13554

Page 64: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

64 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hive SQL:2011 Complete Initiative

Objective: Provide complete ANSI compliance for SQL:2011 analytical capabilities.

Identified 9 specific improvements needed to reach this goal.

Tracked in HIVE-13554.

To be delivered in HDP 3.

Page 65: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

65 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Security Advancements

Dynamic Column-Level Masking:– Value masking or consistent hashing that permits joins.

– Supports per-user policies.

Dynamic Row Filtering:– Filter rows for security policy compliance.

– Supports per-user policies.

Identical Policies for Hive and Spark.

Page 66: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

66 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

LLAP Data Access

User ID Region Total Spend

1 East 5,131

2 East 27,828

3 West 55,493

4 West 7,193

5 East 18,193

Example: Per-User Row Filtering by Region in Hive/SparkSQL

Spark User 2

(East Region)

Spark User 1

(West Region)

Original Query:

SELECT * from CUSTOMERS

WHERE total_spend > 10000

Query Rewrites based on

Dynamic Ranger PoliciesDynamic Rewrite:

SELECT * from CUSTOMERS

WHERE total_spend > 10000

AND region = “east”

Dynamic Rewrite:

SELECT * from CUSTOMERS

WHERE total_spend > 10000

AND region = “west”

Page 67: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

67 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Hive Ambari View v1.5.0: More Robust, More Secure

Robustness: 87 fixes in Q2, 121 fixes in 2016.

JDBC Support:– Enables security / SSL.

– HTTP / Knox integration.

– All JDBC options now possible in Hive View.

Works with Hive 1, Hive 2 and Hive LLAP.

1.5.0

Page 68: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

68 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Secure

Real-time

Adaptive

Integrated

Hortonworks DataFlow for Data in MotionPowered by Apache NiFi

Page 69: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

69 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

HDF Use Cases

Data Ingestion

Optimize Log Collection & Analysis Optimize log analytics such as Splunk with HDF for content based routing from the edge and HDP for lower cost storage options

Ingest telemetry for Cyber SecurityIntegrated, easy and secure telemetry collection for real-time data analytics and threat detection

Capture IoT Data Transport disparate and often remote IoT data in real time, despite any limitations in device footprint, power or connectivity— avoiding data loss

Data MovementOptimize resource utilization by moving data between data centers or on-premises and cloud infrastructure

Streaming AnalyticsAccelerate big data ROI by moving streaming data into analytics systems such as Apache Storm or Spark Streaming faster & easier

Page 70: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

70 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

DATA AT RESTDATA IN MOTION

ACTIONABLEINTELLIGENCE

Modern Data Applications

PERISHABLE INSIGHTS

HISTORICAL INSIGHTS

INTERNETOF

ANYTHING

Hortonworks DataFlow

Hortonworks Data Platform

Hortonworks DeliversConnected Data Platforms

Page 71: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

71 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

HDF Provides “Data Plan of Control” by Managing IoT Dataflows

Constrained

High-Latency

Localized Context

Hybrid – Cloud/On-Premise

Low-Latency

Global Context

Data source agnostic collection of data across heterogeneous environments

Page 72: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

72 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Integrated Processes and Control

C O M M O N A R C H I T E C T U R EW I T H O U T H O R T O N W O R K S D A T A F L O W

W I T H H O R T O N W O R K S D A T A F L O W

Ingest

Scripts

Messaging

Scripts

HORTONWORKS DATAFLOW

Optimize the ArchitectureReduce cost an complexity with the most efficient data collection technologies

Assure Efficient OperationsVia real-time control of data inputs, outputs, transportation and transformations

Rely on a Common FoundationEliminating dependence on multiple customized systems

Page 73: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

73 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

Page 74: Henning Kropp - Abteilung Datenbanken Leipzig of Data... · Henning Kropp Sr. Systems Architect EMEA – Hortonworks University Leipzig Graduate

74 © Hortonworks Inc. 2011 – 2017. All Rights Reserved

ReportingData Ingest

- Demo: HDF NiFi pulls in BeBop 2 Drone images

- Demo: HDF NiFi routes and parses metadata from drone images including geodata

- Demo: HDF NiFi uses TensorFlowInception v3 to recognize objects in image

- Demo: HDF stores images, metadata and enriched data in HDP.

- Demo: HDF NiFi calls ML Vision REST APIs from vendors

- Kafka Sends Job to Spark Streaming for Anomaly Detection and supervised machine learning

- Kafka Sends Job to Spark for VORA connection to SAP HANA