The State of Advanced Analytics - Enterprise Cloud Data … · 2018-08-30 · & other devices onto...
Informatica Open House Session
The State of Advanced Analytics
2018
Welcome to
Thank You To Our Sponsor:
3 © Informatica. Proprietary and Confidential.
Housekeeping
• Open House Forum rules apply, so questions are encouraged
• Please turn phones on silent
• Get in on the conversation #InformaticaOpenHouse
Snowflake Computing @SnowflakeDB
Informatica ANZ @Informatica_ANZ
Agenda
• Introductions
• Informatica At-a-Glance
• Snowflake Overview
• Informatica + Snowflake – Better Together
• Panel Questions
Today’s Speakers
Daniel Clarke, Head of IoT, Big Data and Emerging Products APAC, Informatica
Clive Astbury, Sales Engineer, Snowflake
27 new APAC Big Data Management Customers in 2017…
Banking & Insurance
Telecom
Transport, Mining & Logistics
Informatica in Big Data
[Chart: Big Data Customers by year, 2012 to 2018; data labels shown: 32, 77, 120, 198, 250, 312]
[Chart: Perpetual vs. Subscription split, 54% / 46%]
[Chart: Big Data Customers by quarter, 2015 Q3 to 2017 Q2; series: Adoption, Production, Total]
Customer adoption trends
[Chart: Adoption++ vs. Production, 2016 to 2018]
Adoption++ (data labels in chart order): 55%, 57%, 56%
Production (data labels in chart order): 15%, 25%, 31%
Gartner’s prediction for Production in 2018: 14%
Gartner’s prediction for beyond pilot in 2018: 40%
Next Generation Data Architecture
The Big Data Landscape & Informatica Direction
Sumeet Agrawal
Director - Big Data Product Management
Technology Trend (stages 1 to 4)
Storage
1. Basic storage in NoSQL and HDFS
2. File system innovations (e.g. HDFS, MapR-FS, etc.)
3. Shared-everything Storage Systems (S3, Azure Blob)
4. Storage As a Service
Deployment
1. On-Premise, Manual Deployment
2. Hosted, Manual Deployment
3. Fully Automated Cloud Deployment
4. Managed Serverless Deployment
Compute
1. Basic processing in MapReduce
2. Cluster-aware and In-Memory processing (e.g. YARN, Spark, Sqoop)
3. Elastic, Auto-Scaling processing
4. Compute As a Service
Data Management
1. Most comprehensive data integration for Big Data
2. Most comprehensive & intelligent data integration for Big Data
3. Most comprehensive & intelligent data management for Big Data
4. Most comprehensive & intelligent hybrid Data Mgmt Platform As A Service
Key Players
1. Google, Yahoo
2. Cloudera, Hortonworks, MapR
3. EMR, HDInsight, Altus
4. Databricks, Qubole, AWS Glue
Technology Trend - Proof points
– On-premise DI and iPaaS vendors are taking their first steps in Serverless
– Qubole is investing in auto scaling and Serverless
– Databricks, creator of Spark, is becoming more and more popular
• Microsoft announces strategic relationship with Databricks
• Spark is no longer “open source”; Databricks has started its own version of Spark
– AWS announces serverless Glue and Athena
– Google BigQuery and Dataproc are both serverless
– Cloudera announces “Altus” to enter the Serverless business
Mass Ingestion
File
Cloud Data Integration
Database (with CDC)
Cloud Data Lakes
Advanced ML-based analytics
Streaming Analytics
Ingestion@Scale
• We are extending our mass ingestion functionality to databases & streaming
• Using database-based mass ingestion, customers can quickly bring tens of thousands of relational tables to the cloud. This functionality will support:
• Initial Load
• Incremental Load with CDC changes
• Schema Drift
• Using streaming ingestion, customers can ingest data from IoT & other devices onto the data lake or message hubs
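The three database ingestion capabilities listed above (initial load, incremental load with CDC changes, schema drift) can be illustrated with a minimal in-memory sketch. This is not Informatica's implementation: the function names, the change-event shape (`op` plus `row`) and the sample data are all assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def initial_load(source_rows: list[Row], key: str) -> dict[Any, Row]:
    """Initial load: copy every source row into the target, indexed by key."""
    return {row[key]: dict(row) for row in source_rows}


def apply_cdc(target: dict[Any, Row], changes: list[dict], key: str) -> set[str]:
    """Incremental load: apply insert/update/delete events in order.

    Returns the names of columns that appeared mid-stream (schema drift).
    The event shape {"op": ..., "row": ...} is a hypothetical format.
    """
    known = {col for row in target.values() for col in row}
    drifted: set[str] = set()
    for change in changes:
        op, row = change["op"], change["row"]
        if op == "delete":
            target.pop(row[key], None)
            continue
        # Schema drift: widen every existing row with the unseen columns.
        new_cols = set(row) - known
        if new_cols:
            for existing in target.values():
                for col in new_cols:
                    existing.setdefault(col, None)
            known |= new_cols
            drifted |= new_cols
        # Upsert: start from an all-None row, overlay old values, then new ones.
        merged: Row = {col: None for col in known}
        merged.update(target.get(row[key], {}))
        merged.update(row)
        target[row[key]] = merged
    return drifted


source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
target = initial_load(source, "id")
changes = [
    {"op": "update", "row": {"id": 1, "name": "A"}},
    {"op": "insert", "row": {"id": 3, "name": "c", "email": "c@example.com"}},
    {"op": "delete", "row": {"id": 2}},
]
drifted = apply_cdc(target, changes, "id")
```

After the run, `target` holds rows 1 and 3, both carrying the new `email` column, and `drifted` reports that column as drift. A real pipeline would read the change events from the database log rather than from a list.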
Streaming (IoT devices)
DWH Modernization
Cloud Data Lakes
Advanced Analytics
iPaaS DI Today
• PowerCenter Engine
• Data Integration for Small to Medium Workload
• Use cases: Data Warehouse Modernization, Database Modernization, Hybrid Integration
Big Data Management Today
• On-premises Hadoop Deployment
• Requires Big Data skills
• Static Scalability
• High Operational Cost
Cloud Data Integration@Scale
• Single DI Solution
• Next Gen compute engine for iPaaS
• New use cases: Data Science, ML, Streaming
• Optimized for Cloud
• New Serverless Spark Engine
• Optimized for Big Data workload
• Cloud based compute cluster
• No Big Data Skills
• Auto Scale/Tune
• Reduced Operational Cost
Building a Single DI Offering
• Serverless iPaaS offering
• Informatica will own Compute cluster
• Using container and Kubernetes for compute cluster
• Will use open source project “Spark on Kubernetes”
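For readers unfamiliar with Spark on Kubernetes, the sketch below assembles a `spark-submit` command line using options documented by upstream Apache Spark (`--master k8s://...`, `--deploy-mode cluster`, `spark.kubernetes.container.image`, `spark.executor.instances`). It says nothing about how Informatica's own compute cluster is configured; the helper name, API server URL, image name and application path are hypothetical.

```python
def spark_on_k8s_command(
    api_server: str,
    image: str,
    app_path: str,
    executors: int = 2,
    name: str = "di-job",
) -> list[str]:
    """Build a spark-submit argument list that targets a Kubernetes cluster."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # Kubernetes API server as the master
        "--deploy-mode", "cluster",         # the driver runs in a pod as well
        "--name", name,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_path,                           # e.g. a local:// path inside the image
    ]


# Hypothetical values, for illustration only.
cmd = spark_on_k8s_command(
    "https://k8s.example.com:6443",
    "example/spark:latest",
    "local:///opt/app/job.py",
    executors=4,
)
print(" ".join(cmd))
```

Changing `executors` per job is the simplest form of the scaling the slide describes; automatic scaling would adjust that number without user input.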
Next Gen Data Integration @ Scale- Reference Architecture
[Diagram: reference architecture, callouts 1 to 6]
Cloud sources: Salesforce, Adobe Analytics, Marketo, …
Corporate Data Center (on-prem): Databases, Application Servers
Mass Ingestion on IICS (callout 6)
Amazon S3 Input bucket
Compute Cluster (Next Gen Data Integration @ Scale): Discover & Profile, Parse & Prepare, Load to Amazon Redshift / S3
Amazon S3 Output bucket
Amazon Redshift
Cloud Ready Now
Ingestion
• Large scale file migration between on-premises and cloud
• Expanded migration of relational databases & streaming processes between on-premises and cloud
Management
• Continued orchestration innovations: Complete platform for a service
• Automated deployment and management of Hadoop clusters
Connectivity
• Continued connectivity innovations: Messaging, Storage, and DBs on Azure, AWS, GCP
Processing
• Continued expansion of engine support: Distributed engines on HDInsight, EMR, Altus, Cloudera, Hortonworks or MapR on EC2, Azure and GCP
Deployment
• Continued expansion of deployment options: Single click on Azure and Amazon
Serverless
• iPaaS @ Scale
• Continued connectivity innovations
• Spark serverless: Databricks, Qubole, Dataproc
• Docker, Containers, Kubernetes
OVO is Lippo Group Digital’s Concierge Platform, integrating mobile payment, loyalty points, and exclusive priority deals.
Lippo Group Assets
OVO Merchant Partners
Indonesian Conglomerate: Phase 1
• Collect and govern click stream data into Hadoop
Big Data Management
Vibe Data Stream
Big Data Governance
Big Data Integration
o Collect level 8 click stream and analytics data in real time directly from the probe using VDS.
o Ingest directly into the Hadoop architecture.
Collect click-stream data in real time with VDS
No hand coding for loading data into Hadoop with BDM
Fast Data Lane implementation at Lippo
[Diagram: Fast Data Lane]
Streaming
• Sources: IoT, Gateways, Social Media, Clickstreams, Weblogs, etc.
• Formats: XML, JSON, Avro
Existing Data Assets
VDS agents (five shown) and Kafka
Stages: Process, Refine, Enrich, Deliver, Analyze
Event, Sense, Reason, Act, Action
Outputs: Visualization, Alerts, Data lake, Real time offers
Vibe Data Stream
Big Data Management
Big Data Relationship Management
Intelligent Data Streaming
The Staging Toward Monetization and Business Optimization
Indonesian Conglomerate: Phase 2
• Collect and govern click stream data into Hadoop
Big Data Management
Enterprise Information Catalogue
Intelligent Data Lake
Big Data Relationship Manager
Data Cleansing
Cleanse, catalogue, analyze and build reports on any data source.
First steps … use Aggregate Pattern Matching
Complete 360 degree Customer Profile
Correlating all customer information and transactions to understand their profile and preferences, in order to interact with them in a personalized way
Smart Vending Machine
Real-time OVO smart payment & face recognition
Face recognition (with sentiment reader)
Point of sale via OVO pay.
Suggestive Advertising
Smart Pricing
YOUR DATA, NO LIMITS
Data and Advanced Analytics Have Arrived
Who needs to lead this?
Business, IT, Analytics
Data Science, Analytics Strategy, Data Strategy
CDAO
Ensure the correct models and algorithms are used to support business requirements
Understand how statistics and analytics can help improve business decisions
Ensure data is complete, available, and with a firm future delivery roadmap
So what do you need?
© 2018 Snowflake Computing Inc. All Rights Reserved.
Dimension | Legacy Data Platforms | Modern Data Platforms
Complexity | Difficult to manage | Managed by vendor
Scalability | Fixed | Unlimited
Diversity | Structured data only | Structured & Semi-structured data
Elasticity | Rigid. Need to plan ahead | Instant
Cost | 24/7, plan for worst day | Pay for what you use
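The cost contrast in this comparison ("24/7, plan for worst day" versus "Pay for what you use") comes down to simple arithmetic. The sketch below works one example with made-up numbers: the demand profile, the unit of capacity and the rate are all assumptions, not Snowflake or legacy-vendor pricing.

```python
def fixed_capacity_cost(peak_units: float, rate_per_unit_hour: float, hours: int) -> float:
    """Legacy model: capacity sized for the worst hour, paid for every hour."""
    return peak_units * rate_per_unit_hour * hours


def pay_per_use_cost(hourly_usage: list[float], rate_per_unit_hour: float) -> float:
    """Elastic model: pay only for the units actually used in each hour."""
    return sum(hourly_usage) * rate_per_unit_hour


# Hypothetical day: 20 quiet hours at 2 units, 4 peak hours at 10 units.
usage = [2.0] * 20 + [10.0] * 4
rate = 1.0  # hypothetical price per unit-hour

legacy = fixed_capacity_cost(max(usage), rate, len(usage))  # 10 * 1.0 * 24
elastic = pay_per_use_cost(usage, rate)                     # (40 + 40) * 1.0
print(legacy, elastic)
```

With this profile the fixed platform costs 240 against 80 for pay-per-use. The gap narrows as demand flattens and vanishes when usage sits at peak all day.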
What is Snowflake?
Built for the cloud
SQL Data Warehouse
Delivered as a service
How does Snowflake make things easier?
Minimal Management
NO Infrastructure
NO Tuning
NO Optimization
NO Indexing
NO Storage worries
NO Vacuuming
NO Partitioning
NO Required sorting
NO Workload mgmt.
NO Manual backups
[Diagram: Snowflake architecture]
Loading: ETL/ELT, Snowpipe
Compute sizes shown: XS, S, M, L, XL (Multi-cluster)
Workloads: Sales, Data Science, M…
Global Services: Transactional Control, Security, Query Planning & Optimisation, Logical Model
AWS QuickSight
But wait! There’s more…
[Diagram: Snowflake architecture, extended]
Loading: ETL/ELT, Snowpipe
Compute sizes shown: XS, S, M, L, XL (Multi-cluster)
Workloads: External, Finance, Sales, Data Science, M…, Test/Dev
Storage features: Clone, Share, Data protection & time travel, Structured & semi-structured
Global Services: Transactional Control, Security, Query Planning & Optimisation, Logical Model
AWS QuickSight
What are customers doing with Snowflake?
Modern data landscape
Data Sources: OLTP Databases, Enterprise Applications, Data Providers, Web/Log Data, IoT
ETL or ELT
EDW, Data Lake, Data-Marts
Data Consumers: BI, Analytics & Data Science
Wow…so much to remember…
Diversity: One place for all your data
Scalability: Any scale of data, users and workloads
Flexible Cost: Pay for what you use, when you use it
Simplicity: Simple, serverless, plug-and-play
Elasticity: Size for what you need right now
Informatica + Snowflake – Better Together
Journey to Snowflake
Beginning the Journey to Snowflake
1) Prototype
2) Extend
3) Lift-and-Shift
New data consumption endpoint, could be an app or BI/Analytics
Existing data consumption endpoint, could be an application or BI/Analytics
[Diagram: each stage shows several DBs feeding an EDW]
Informatica + Snowflake Joint Solution
Intelligent Data Catalog
Data Integration & Management
Optimized Platforms for your Data Journey
Cloud Data Management
Data Sources (200+ Data Sources): Traditional DBs, SaaS Apps, Business Apps, Web Analytics
Data types: Structured Data, Semi-structured Data, Unstructured Data
Intelligent Cloud Services (iPaaS): Connect | Transform | Filter
• Optimized Native Snowflake Connector*
• Push-down data transformations to Snowflake
Additional Data Management Platform Services: Govern, Cleanse, Catalog, Protect
Cloud Data Warehouse
Big Data
Analytics & Visualizations
*Also available on PowerCenter and Informatica Big Data Management
Case Study: Laureate Education
Business Scenario
• Worldwide colleges, different rules, processes
• 24/7 data availability requirements
• GDPR compliance
• Expensive, rigid legacy infrastructure
Negative Consequences
• Data scatter
• Delays in data availability
After Snowflake and Informatica
• No data load windows
• One copy of data
• Automatically scale up during peak times; only pay for what you use
• Drastically reduced processing time
• Data Sharing with Blackboard
[Diagram: Business Platform to Digital Analytics]
Legacy DWs: 6-12 hours
Informatica: 15 minutes
Power of the Informatica-Snowflake Integration
• Leverage high-performance compute by using Informatica’s mapping language
– In-database transformation push down to Snowflake
• Faster loading of data
– Partitioning of inbound data sets for optimal parallel loading
• Accelerate deployment in complex environments
– Parameterization for rapid implementation
• Deliver data the way the business needs it without coding
– Best-in-class transformation capabilities
• Cloud data management and support, no matter where your data resides
– Supports on-premises, hybrid and 100% PaaS Snowflake adoption patterns
• Leverage your investments in Hadoop and make them compatible with Snowflake
– Spark-based push-down (Big Data Management Offering)
Snowflake Cross-Schema Pushdown Example
Taskflow
PDO Mapping
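Pushdown means the transformation runs as SQL inside Snowflake instead of moving rows through the integration engine. The slide's mapping and taskflow screenshots did not survive extraction, so the sketch below only illustrates the general shape of a cross-schema statement such a mapping could be translated into. It is not SQL generated by Informatica; the helper and the schema, table and column names are hypothetical.

```python
from typing import Optional


def pushdown_sql(
    source: str,
    target: str,
    columns: dict[str, str],
    where: Optional[str] = None,
) -> str:
    """Render one INSERT ... SELECT that reads one schema and writes another.

    `columns` maps each target column name to the source expression that fills it.
    Illustration only: identifiers are not quoted or validated.
    """
    column_list = ", ".join(columns)
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in columns.items())
    sql = f"INSERT INTO {target} ({column_list}) SELECT {select_list} FROM {source}"
    if where:
        sql += f" WHERE {where}"
    return sql


# Hypothetical schemas and columns.
sql = pushdown_sql(
    source="STAGING.ORDERS_RAW",
    target="ANALYTICS.ORDERS",
    columns={"ORDER_ID": "ORDER_ID", "AMOUNT_USD": "AMOUNT * FX_RATE"},
    where="STATUS = 'SHIPPED'",
)
print(sql)
```

Because both schemas live in the same warehouse, the filter and the expression are evaluated where the data sits and no rows leave Snowflake.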
Thanks for joining us today
Get in contact with us today: [email protected]
#InformaticaOpenHouse
Snowflake Computing @SnowflakeDB
Informatica ANZ @Informatica_ANZ