The State of Advanced Analytics - Enterprise Cloud Data … · 2018-08-30 · & other devices onto...


Welcome to

Informatica Open House Session

The State of Advanced Analytics

2018


Thank You To Our Sponsor:


3 © Informatica. Proprietary and Confidential.

Housekeeping

• Open House Forum rules so questions are encouraged

• Please turn phones on silent

• Get in on the conversation #InformaticaOpenHouse

Snowflake Computing @SnowflakeDB

Informatica ANZ @Informatica_ANZ


Agenda

• Introductions

• Informatica At-a-Glance

• Snowflake Overview

• Informatica + Snowflake – Better Together

• Panel Questions


Today’s Speakers

Daniel Clarke, Head of IoT, Big Data and Emerging Products APAC, Informatica

Clive Astbury, Sales Engineer, Snowflake


27 new APAC Big Data Management Customers in 2017…

Banking & Insurance

Telecom

Transport, Mining & Logistics


Informatica in Big Data

[Chart: yearly values, 2012 to 2018; data labels 32, 77, 120, 198, 250, 312]

[Chart: Perpetual vs. Subscription, 54% / 46%]

[Chart: Big Data Customers by quarter, 2015 Q3 to 2017 Q2; series: Adoption, Production, Total]


Customer adoption trends

[Chart: Adoption++ and Production, 2016 to 2018; data labels 55%, 57%, 56% and 15%, 25%, 31%]

Gartner’s prediction for Production in 2018 (14%)

Gartner’s prediction for beyond pilot in 2018 (40%)


Next Generation Data Architecture


The Big Data Landscape & Informatica Direction

Sumeet Agrawal, Director - Big Data Product Management


Technology Trend

Storage

Stage 1: Basic storage in NoSQL and HDFS
Stage 2: File system innovations (e.g. HDFS, MapR-FS)
Stage 3: Shared-everything Storage Systems (S3, Azure Blob)
Stage 4: Storage As a Service

Deployment

Stage 1: On-Premise, Manual Deployment
Stage 2: Hosted, Manual Deployment
Stage 3: Fully Automated Cloud Deployment
Stage 4: Managed Serverless Deployment

Compute

Stage 1: Basic processing in MapReduce
Stage 2: Cluster-aware and In-Memory processing (e.g. YARN, Spark, Sqoop)
Stage 3: Elastic, Auto-Scaling processing
Stage 4: Compute As a Service

Data Management

Stage 1: Most comprehensive data integration for Big Data
Stage 2: Most comprehensive & intelligent data integration for Big Data
Stage 3: Most comprehensive & intelligent data management for Big Data
Stage 4: Most comprehensive & intelligent hybrid Data Mgmt Platform As A Service

Key Players

Stage 1: Google, Yahoo
Stage 2: Cloudera, Hortonworks, MapR
Stage 3: EMR, HDInsight, Altus
Stage 4: Databricks, Qubole, AWS Glue


Technology Trend - Proof points

– On-premises DI and iPaaS vendors are taking their first steps in Serverless

– Qubole is investing in Auto-scaling and Serverless

– Databricks, creator of Spark, is getting more and more popular

• Microsoft announces strategic relationship with Databricks

• Spark is no longer “open source”. Databricks started its own version of Spark

– AWS announces Serverless Glue, Athena

– Google BigQuery and Dataproc are both serverless

– Cloudera announces “Altus” to be in the Serverless business


Mass Ingestion

File

Cloud Data Integration

Database(with CDC)

Cloud Data Lakes

Advanced ML-based analytics

Streaming Analytics

Ingestion@Scale

• We are extending our mass ingestion functionality to databases & streaming

• Using database-based mass ingestion, customers can quickly bring tens of thousands of relational tables to the cloud. This functionality will support

• Initial Load

• Incremental Load with CDC changes

• Schema Drift

• Using streaming ingestion, customers can ingest data from IoT & other devices onto the data lake or message hubs
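The three database ingestion modes listed above (initial load, incremental load with CDC changes, schema drift) can be illustrated with a minimal sketch. It does not use any Informatica API; the event shape (`op`, `key`, `row`) and the function names are hypothetical, and an in-memory dict stands in for the cloud target.

```python
def initial_load(snapshot, key_column):
    """Copy a full snapshot into the target and record the column list."""
    table, columns = {}, []
    for row in snapshot:
        for name in row:
            if name not in columns:
                columns.append(name)
        table[row[key_column]] = dict(row)
    return table, columns


def apply_change(table, columns, event):
    """Apply one CDC event (hypothetical shape) to the target table."""
    op, key, row = event["op"], event["key"], event.get("row", {})
    if op == "delete":
        table.pop(key, None)
        return
    # Schema drift: a column first seen in a change event is added to the schema.
    for name in row:
        if name not in columns:
            columns.append(name)
    if op == "insert":
        table[key] = dict(row)
    elif op == "update":
        table.setdefault(key, {}).update(row)


# Initial load followed by an incremental batch of changes.
snapshot = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
table, columns = initial_load(snapshot, "id")

changes = [
    {"op": "update", "key": 1, "row": {"name": "Anna"}},
    {"op": "insert", "key": 3, "row": {"id": 3, "name": "Cy", "email": "cy@example.com"}},
    {"op": "delete", "key": 2},
]
for event in changes:
    apply_change(table, columns, event)
```

After the batch, the target holds rows 1 and 3, and the `email` column that appeared mid-stream has been added to the schema rather than rejected.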

Streaming (IoT devices)

DWH Modernization

Cloud Data Lakes

Advanced Analytics


Single DI Solution

Next Gen compute engine for iPaaS

iPaaS DI Today

• Use cases: Data Warehouse Modernization, Database Modernization, Hybrid Integration

• PowerCenter Engine

• Data Integration for Small to Medium Workload

Big Data Management Today

• On-premises Hadoop Deployment

• Requires Big Data skills

• Static Scalability

• High Operational cost

Cloud Data Integration@Scale

• New use cases: Data Science, ML, Streaming

• Optimized for Cloud

• New Serverless Spark Engine

• Optimized for Big Data workload

• Cloud-based compute cluster

• No Big Data Skills

• Auto Scale/Tune

• Reduced Operational Cost

Building a Single DI Offering

• Serverless iPaaS offering

• Informatica will own Compute cluster

• Using container and Kubernetes for compute cluster

• Will use open source project “Spark on Kubernetes”
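As a rough illustration of what running Spark on Kubernetes involves, the sketch below assembles a `spark-submit` command line using the options documented by the open source project (`k8s://` master URL, cluster deploy mode, container image, executor count). The API server address, image name, job name and application path are placeholders, and nothing here reflects how Informatica's own engine is configured.

```python
def spark_submit_args(api_server, image, app_path, executors=2):
    """Build a spark-submit argument list for a Kubernetes-hosted Spark job."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",      # Kubernetes API server as the cluster manager
        "--deploy-mode", "cluster",             # driver runs in a pod, not on the client
        "--name", "di-mapping-job",             # hypothetical job name
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,                               # path inside the container image
    ]


# Placeholder values; replace with a real cluster endpoint and image.
args = spark_submit_args(
    "https://k8s.example.internal:6443",
    "registry.example.internal/spark:latest",
    "local:///opt/spark/app/job.py",
    executors=4,
)
print(" ".join(args))
```

Because executors are ordinary pods, changing `spark.executor.instances` per job is what lets a managed service size compute to each workload instead of keeping a fixed Hadoop cluster running.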


Next Gen Data Integration @ Scale- Reference Architecture

[Diagram: reference architecture with numbered steps 1 to 6]

Cloud sources: Salesforce, Adobe Analytics, Marketo, …

Corporate Data Center (on-prem): Databases, Application Servers

Mass Ingestion on IICS (step 6)

Amazon S3 Input bucket

Compute Cluster (Next Gen Data Integration @ Scale): Discover & Profile, Parse & Prepare, Load to Amazon Redshift / S3

Amazon S3 Output bucket

Amazon Redshift


Roadmap areas: Ingestion, Serverless, Management, Connectivity, Processing, Deployment

Cloud Ready Now

• Large scale file migration between on-premises and cloud

• Continued orchestration innovations: Complete platform for a service; automated deployment and management of Hadoop clusters

• Continued connectivity innovations: Messaging, Storage, and DBs on Azure, AWS, GCP

• Continued expansion of engine support: Distributed engines on HDInsight, EMR, Altus, Cloudera, Hortonworks or MapR on EC2, Azure and GCP

• Continued expansion of deployment options: Single click on Azure and Amazon

iPaaS @ Scale

• Expanded migration of relational databases & streaming processes between on-premises and cloud

• Continued connectivity innovations: Spark serverless (Databricks, Qubole, Dataproc)

• Docker, Containers, Kubernetes


OVO is Lippo Group Digital’s Concierge Platform, integrating mobile payment, loyalty points, and exclusive priority deals.

Lippo Group Assets

OVO Merchant Partners


Indonesian Conglomerate

• Collect and govern click stream data into Hadoop

Phase 1

Products: Big Data Management (BDM), Vibe Data Stream (VDS)

Big Data Governance

Big Data Integration

o Collect level 8 click stream and analytics data in real time directly from the probe using VDS.

o Ingest directly into the Hadoop architecture.

Collect click-stream data in real time with VDS

No hand coding for loading data into Hadoop with BDM


Fast Data Lane implementation at Lippo

[Diagram: streaming pipeline from sources through VDS agents and Kafka to the data lake and real-time actions]

Streaming

• Sources: IoT, Gateways, Social Media, Clickstreams, Weblogs, etc.

• Formats: XML, JSON, Avro

Existing Data Assets

Collection: VDS Agents (five shown), Kafka

Process: Refine, Enrich, Deliver, Analyze

Event, Sense, Reason, Act, Action

Outputs: Kafka, Alerts, Visualization, Data lake, Real time offers

Products: Vibe Data Stream, Intelligent Data Streaming, Big Data Management, Big Data Relationship Management
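The refine, enrich and deliver stages of a fast data lane can be sketched in a few lines. This is only an illustration: plain Python lists stand in for Kafka topics and the data lake, no VDS or Informatica API is used, and the field names (`user_id`, `page`, `segment`) and the offer rule are hypothetical.

```python
import json


def refine(raw):
    """Parse a JSON click event; drop malformed or incomplete records."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "user_id" not in event or "page" not in event:
        return None
    return event


def enrich(event, profiles):
    """Attach the customer segment from existing data assets."""
    return {**event, "segment": profiles.get(event["user_id"], "unknown")}


def deliver(event, data_lake, offers):
    """Persist every event; raise a real-time offer when the rule matches."""
    data_lake.append(event)
    if event["segment"] == "gold" and event["page"] == "/checkout":
        offers.append({"user_id": event["user_id"], "offer": "priority-deal"})


raw_events = [
    '{"user_id": "u1", "page": "/checkout"}',
    "not json",
    '{"user_id": "u2", "page": "/home"}',
]
profiles = {"u1": "gold"}  # stand-in for existing customer data

data_lake, offers = [], []
for raw in raw_events:
    event = refine(raw)
    if event is not None:
        deliver(enrich(event, profiles), data_lake, offers)
```

The malformed record is dropped at the refine stage, both valid events land in the data lake, and only the gold customer at checkout triggers an offer.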


The Staging Toward Monetization and Business Optimization


Indonesian Conglomerate

• Collect and govern click stream data into Hadoop

Phase 2

Big Data Management

Enterprise Information Catalogue

Intelligent Data Lake

Big Data Relationship Manager

Data Cleansing

Cleanse, catalogue, analyze and build reports on any data source…


First steps … use Aggregate Pattern Matching


Complete 360 degree Customer Profile

Correlating all customer information and transactions to understand each customer's profile and preferences in order to interact with them in a personalized way


Smart Vending Machine

Real-time OVO smart payment & face recognition

Face Recognition (with sentiment reader)

Point of sale via OVO pay

Suggestive Advertising

Smart Pricing


YOUR DATA, NO LIMITS

Data and Advanced Analytics Have Arrived


Who needs to lead this?


Analytics, Business, IT

Data Science, Analytics Strategy, Data Strategy

CDAO

Ensure the correct models and algorithms are used to support business requirements

Understand how statistics and analytics can help improve business decisions

Ensure data is complete, available, and with a firm future delivery roadmap


So what do you need?


© 2018 Snowflake Computing Inc. All Rights Reserved.

Legacy Data Platforms vs. Modern Data Platforms

Complexity: Difficult to manage (legacy) / Managed by vendor (modern)

Scalability: Fixed (legacy) / Unlimited (modern)

Diversity: Structured data only (legacy) / Structured & Semi-structured data (modern)

Elasticity: Rigid. Need to plan ahead (legacy) / Instant (modern)

Cost: 24/7, plan for worst day (legacy) / Pay for what you use (modern)
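The cost contrast between "24/7, plan for worst day" and "pay for what you use" comes down to simple arithmetic. The sketch below works it through for one day; the node counts and the hourly rate are made-up illustration values, not Snowflake pricing.

```python
def always_on_cost(peak_nodes, rate_per_node_hour, hours):
    """Legacy model: provision for the peak and run it around the clock."""
    return peak_nodes * rate_per_node_hour * hours


def pay_per_use_cost(hourly_nodes, rate_per_node_hour):
    """Elastic model: pay only for the nodes actually used each hour."""
    return sum(hourly_nodes) * rate_per_node_hour


# Hypothetical day: 20 quiet hours needing 2 nodes, 4 peak hours needing 8.
hourly_nodes = [2] * 20 + [8] * 4
rate = 1.0  # hypothetical cost per node-hour

fixed = always_on_cost(max(hourly_nodes), rate, len(hourly_nodes))
elastic = pay_per_use_cost(hourly_nodes, rate)
print(fixed, elastic)
```

With these assumed numbers the always-on platform pays for 192 node-hours while the elastic one pays for 72; the gap grows as the peak becomes shorter and sharper.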


What is Snowflake?

Built for the cloud

SQL Data Warehouse

Delivered as a service



How does Snowflake make things easier?


Minimal Management

NO Infrastructure

NO Tuning

NO Optimization

NO Indexing

NO Storage worries

NO Vacuuming

NO Partitioning

NO Required sorting

NO Workload mgmt.

NO Manual backups


[Diagram: Snowflake architecture]

Data loading: ETL/ELT, Snowpipe

Compute clusters (sizes shown: XS, S, M, L, XL; multi-cluster): Sales, Data Science, …

Global Services: Transactional Control, Security, Query Planning & Optimisation, Logical Model

AWS QuickSight


But wait! There’s more…


[Diagram: Snowflake architecture, extended]

Data loading: ETL/ELT, Snowpipe

Compute clusters (sizes shown: XS, S, M, L, XL; multi-cluster): External, Finance, Sales, Data Science, Test/Dev, …

Storage: Structured & semi-structured data; Clone; Share; Data protection & time travel

Global Services: Transactional Control, Security, Query Planning & Optimisation, Logical Model

AWS QuickSight


What are customers doing with Snowflake?


Modern data landscape

Data Sources: OLTP Databases, Enterprise Applications, Data Providers, Web/Log Data, IoT

ETL or ELT

Data Lake, EDW, Data-Marts

Data Consumers: BI, Analytics & Data Science


Wow…so much to remember…


Diversity: One place for all your data

Scalability: Any scale of data, users and workloads

Flexible Cost: Pay for what you use, when you use it

Simplicity: Simple, serverless, plug-and-play

Elasticity: Size for what you need right now


Informatica + Snowflake – Better Together


Journey to Snowflake

1) Prototype

2) Extend

3) Lift-and-Shift

[Diagram: for each stage, source DBs and the EDW feeding data consumption endpoints]

New data consumption endpoint, could be an app or BI/Analytics

Existing data consumption endpoint, could be an application or BI/Analytics

Beginning the Journey to Snowflake


Informatica + Snowflake Joint Solution

Intelligent Data Catalog

Data Integration & Management



Cloud Data Management

Optimized Platforms for your Data Journey

Consumers: Analytics & Visualizations, Business Apps, Web Analytics

Cloud Data Warehouse

Optimized Native Snowflake Connector*

Push-down data transformations to Snowflake

Intelligent Cloud Services (iPaaS): Connect | Transform | Filter

Additional Data Management Platform Services: Govern, Cleanse, Catalog, Protect

Data Sources (200+ Data Sources): Traditional DBs, SaaS Apps, Big Data

Structured Data, Semi-structured Data, Unstructured Data

*Also available on PowerCenter and Informatica Big Data Management


Case Study: Laureate Education

Business Scenario

• Worldwide colleges, different rules, processes

• 24/7 data availability requirements

• GDPR compliance

• Expensive, rigid legacy infrastructure

Negative Consequences

• Data scatter

• Delays in data availability

After Snowflake and Informatica

• No data load windows

• One copy of data

• Automatically scale up during peak times; only pay for what you use

• Drastically reduced processing time

• Data Sharing with Blackboard

[Diagram: before, with Legacy DWs: 6-12 hours; after, with Informatica: 15 minutes. Labels in both: Business, Platform, Digital, Analytics]


Power of the Informatica-Snowflake Integration

• Leverage high-performance compute by using Informatica’s mapping language

– In-database transformation push down to Snowflake

• Faster loading of data

– Partitioning of inbound data sets for optimal parallel loading

• Accelerate deployment in complex environments

– Parameterization for rapid implementation

• Deliver data the way the business needs it without coding

– Best-in-class transformation capabilities

• Cloud data management and support, no matter where your data resides

– Supports on-premises, hybrid and 100% PaaS Snowflake adoption patterns

• Leverage your investments in Hadoop and make them compatible with Snowflake

– Spark-based push-down (Big Data Management Offering)
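The idea behind pushdown is that a mapping's filter and expressions are rendered as SQL and executed inside the database, so rows never leave it. The sketch below shows that idea only; the mapping dictionary format and `build_pushdown_sql` are hypothetical (not Informatica's mapping language), and SQLite from the Python standard library stands in for Snowflake.

```python
import sqlite3


def build_pushdown_sql(mapping):
    """Render a (hypothetical) mapping description as one INSERT ... SELECT."""
    select_list = ", ".join(
        f"{expr} AS {name}" for name, expr in mapping["expressions"].items()
    )
    columns = ", ".join(mapping["expressions"])
    return (
        f"INSERT INTO {mapping['target']} ({columns}) "
        f"SELECT {select_list} FROM {mapping['source']} "
        f"WHERE {mapping['filter']}"
    )


# Source, filter, expressions and target of an illustrative mapping.
mapping = {
    "source": "stg_orders",
    "target": "dw_orders",
    "filter": "status = 'SHIPPED'",
    "expressions": {"order_id": "order_id", "total": "qty * unit_price"},
}

conn = sqlite3.connect(":memory:")  # stand-in for the cloud data warehouse
conn.executescript(
    """
    CREATE TABLE stg_orders (order_id INTEGER, qty INTEGER, unit_price REAL, status TEXT);
    CREATE TABLE dw_orders (order_id INTEGER, total REAL);
    INSERT INTO stg_orders VALUES
        (1, 2, 5.0, 'SHIPPED'), (2, 1, 9.5, 'OPEN'), (3, 3, 1.5, 'SHIPPED');
    """
)

pushdown_sql = build_pushdown_sql(mapping)
conn.execute(pushdown_sql)  # the whole transformation runs inside the database
loaded_rows = conn.execute(
    "SELECT order_id, total FROM dw_orders ORDER BY order_id"
).fetchall()
```

The integration tool only sends one statement; filtering and the `qty * unit_price` calculation use the warehouse's own compute, which is the point of the first bullet above.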


Snowflake Cross-Schema Pushdown Example

Taskflow

PDO Mapping


Thanks for joining us today

Get in contact with us today: [email protected]

#InformaticaOpenHouse

Snowflake Computing @SnowflakeDB

Informatica ANZ @Informatica_ANZ