"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovation Cloud Solution Center...


Page 1: "Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovation Cloud Solution Center Lead at DataVard

# 1

Integration of Hadoop in Business landscape

Michal Alexa

Service Line Manager, Data Innovation Lab

December 2016


# 2

What happens on the Internet in 60 seconds (2014):

• 3,472 images pinned
• 72 hours of new video content uploaded
• 204,000,000 emails sent
• 4,000,000 search queries
• 277,000 tweets
• 347,222 photos sent
• Users swipe 416,667 times
• 2,460,000 new items of content shared
• 216,000 photos shared
• $83,000 in online sales
• 48,000 apps downloaded from the iTunes Store
• 26,380 new reviews


# 5

Big-Data and Business world

Big-Data
• Java, Python, Pig Latin
• Massive clusters for big data processing
• Structured & unstructured data
• Apache & open source
• Distributions (e.g. Cloudera)
• Engines (Spark, Impala)
• Fast-paced evolution since 2006

Business
• ABAP
• Client/server
• Classic relational database (RDBMS)
• Proprietary software with interfaces
• Engines: OLTP, OLAP
• World positioning: 76% of finance transactions, 78% of food production, 82% of medical devices
• Steady evolution since 1972


# 6

Story…


# 8

Biggest struggles in Data Management

Scalability
• It is no longer possible to size a platform for its whole lifetime during procurement
• Hardware requirements limit possible growth
• Scaling UP often comes at great cost, and scaling DOWN usually brings no value

Data-Pipelines
• Data transformations are I/O-intensive operations
• They take a lot of time and consume a lot of resources

Granularity and Velocity
• Limitations on the format of data
• Limitations on the granularity of data: often only aggregated and cleaned data are stored
• Raw data are necessary for data science activities

Data-Silos
• Too many places for storing data
• No interconnection between company units, which limits data analysis possibilities

Extensibility
• Data analysis requires many programming languages
• Limited application compatibility
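To make the granularity point concrete, here is a small Python sketch with made-up sales events. The data, the field layout and the function names are illustrative assumptions, not taken from the talk. Once only daily totals per store are kept, a question such as "which hour of the day brings the most revenue?" can no longer be answered; it needs the raw timestamps.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw sales events: (timestamp, store, amount).
RAW_EVENTS = [
    ("2016-12-01T09:15", "store_a", 120.0),
    ("2016-12-01T09:40", "store_a", 80.0),
    ("2016-12-01T17:05", "store_a", 300.0),
    ("2016-12-01T11:30", "store_b", 50.0),
    ("2016-12-01T17:45", "store_b", 150.0),
]


def daily_totals(events):
    """Aggregate to (day, store) -> total: the cleaned summary that is often all that gets stored."""
    totals = defaultdict(float)
    for ts, store, amount in events:
        day = datetime.fromisoformat(ts).date().isoformat()
        totals[(day, store)] += amount
    return dict(totals)


def peak_hour(events):
    """Return the hour of day with the highest revenue; this needs the raw timestamps."""
    by_hour = defaultdict(float)
    for ts, _store, amount in events:
        by_hour[datetime.fromisoformat(ts).hour] += amount
    return max(by_hour, key=by_hour.get)


if __name__ == "__main__":
    print(daily_totals(RAW_EVENTS))
    print(peak_hour(RAW_EVENTS))
```

With the sample data, `daily_totals` yields 500.0 for store_a and 200.0 for store_b, while `peak_hour` returns 17. The second answer cannot be derived from the first, which is why raw data matter for data science.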


# 9

What is Apache Hadoop?

A software framework for storing, processing and analyzing “big data”

• Scalable
• Distributed
• Fault-tolerant
• Open source
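Hadoop's original processing model is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The sketch below simulates the classic word count in a single Python process. It does not use any Hadoop API; on a real cluster many mapper and reducer tasks run in parallel across nodes, and a failed task is re-executed, which is where the scalability and fault tolerance come from.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (the framework's shuffle step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce sequentially in a single process."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    # Sorting mimics the sorted keys a reducer receives.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Big data", "big clusters process big data"]))
```

For the two sample lines the result is `{'big': 3, 'clusters': 1, 'data': 2, 'process': 1}`.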


# 12

“Data-Lake” in business infrastructure

[Diagram with the elements: Data-Lake, BW (SAP Business Warehouse), source systems, logs]


# 14

Emerging new technologies: integration answers to Big Data

SDA: Smart Data Access
• Data federation feature available on SAP HANA
• Not fully read-write
• Sybase ASE, Sybase IQ, Teradata, Hadoop and some other databases

DT: Dynamic Tiering
• Supports only write-optimized DSO and PSA
• Some restrictions
• Sybase IQ only
• Limited disaster recovery
• Read & write, but only on HANA

NLS: Nearline Storage
• Moves data from the online database to a “nearline” database
• Read only
• Uses DAPs (Data Archiving Processes)
• SAP positions Sybase IQ as the “one and only” storage

VORA: SAP HANA VORA
• DB interface between HANA and Hadoop (Spark)
• Heavily Java-based: no ABAP Workbench integration, etc.
• No UI: engine only
• Allows reporting within Hadoop based on Spark

DLM: Data Lifecycle Manager
• HANA native only, no ERP
• Offloading to IQ or Spark

Offloading Integration
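Nearline storage and DLM rest on the same idea: rarely accessed data is moved out of the expensive online database but stays available for reporting. A minimal sketch of an age-based split might look like the following. The `Record` type, the `retention_days` window and both functions are hypothetical; they do not come from SAP's NLS or DLM interfaces.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import chain


@dataclass(frozen=True)
class Record:
    doc_id: str
    posting_date: date


def split_by_age(records, today, retention_days):
    """Split records into an online tier (recent) and a nearline tier (older than the cutoff)."""
    cutoff = today - timedelta(days=retention_days)
    online = [r for r in records if r.posting_date >= cutoff]
    # A tuple of frozen records: the offloaded tier cannot be modified in place.
    nearline = tuple(r for r in records if r.posting_date < cutoff)
    return online, nearline


def query(online, nearline, predicate):
    """Report across both tiers, so offloaded data stays visible to queries."""
    return [r for r in chain(online, nearline) if predicate(r)]


if __name__ == "__main__":
    records = [
        Record("A", date(2016, 11, 15)),
        Record("B", date(2015, 6, 30)),
        Record("C", date(2014, 1, 10)),
    ]
    online, nearline = split_by_age(records, today=date(2016, 12, 1), retention_days=365)
    print([r.doc_id for r in online], [r.doc_id for r in nearline])
    print([r.doc_id for r in query(online, nearline, lambda r: r.posting_date.year >= 2015)])
```

Keeping the nearline tier read-only mirrors the "Read only" property listed for NLS, while `query` stands for reporting that reads both tiers transparently.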


# 15

Business <> Hadoop struggle

Hadoop integration with businesses is difficult for several reasons: technology readiness, IT culture, data integration and operations.

IT culture gap
• Development strategy
• Software logistics
• Rapid prototyping
• Data protection / personal data
• SOX compliance

Data integration gap
• ETL
• Loading of data
• Staging & enriching of data within Hadoop
• Data flows from SAP to Hadoop and back

Operational gap
• Running applications 24x7 between SAP and Hadoop
• Job scheduling
• Testing
• Patching & upgrades

We should aim to close these gaps.
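As an illustration of the data integration gap, the following sketch walks a few hypothetical order rows through staging, enrichment with master data and aggregation, with the aggregate standing for the result that would flow back from Hadoop to SAP. All names (`EXTRACTED`, `MATERIAL_GROUP`, the material numbers) are invented for the example; a real landscape would use extraction tools and Hadoop engines such as Spark rather than in-memory Python lists.

```python
from collections import defaultdict

# Hypothetical extract from an SAP source system: (order_id, material, quantity).
EXTRACTED = [
    ("4711", "MAT-1", 10),
    ("4712", "MAT-2", 5),
    ("4713", "MAT-1", 3),
    ("4714", "MAT-9", 7),  # material missing from the master data below
]

# Hypothetical master data used to enrich the staged records.
MATERIAL_GROUP = {"MAT-1": "pumps", "MAT-2": "valves"}


def stage(rows):
    """Land the raw rows unchanged (the staging layer in the lake)."""
    return [dict(order_id=o, material=m, quantity=q) for o, m, q in rows]


def enrich(staged, lookup):
    """Add the material group; unknown materials are flagged rather than dropped."""
    return [{**row, "group": lookup.get(row["material"], "UNKNOWN")} for row in staged]


def aggregate(enriched):
    """Summarise quantities per material group: the compact result sent back to SAP."""
    totals = defaultdict(int)
    for row in enriched:
        totals[row["group"]] += row["quantity"]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(enrich(stage(EXTRACTED), MATERIAL_GROUP)))
```

Staging the raw rows before enriching them keeps the original granularity in the lake, so the aggregation can be recomputed later with different master data.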


# 16

Summary

• Hadoop is awesome! Let's make it truly available to all businesses.

• Start small: a small amount of data and fast turnover.

• Think about how to make the new technology accessible to others.

# 17

Details, technical slides and further knowledge can be shared during networking.