Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data...
![Page 1: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/1.jpg)
Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven
Alexandre Vasseur, Pivotal @PivotalFrance
![Page 2: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/2.jpg)
© Copyright 2015 Pivotal. All rights reserved.
If you have one thing to do
Store Massive Data Sets
Achieve Continuous Innovation at Scale
Becoming Data Driven with Apps
![Page 3: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/3.jpg)
Data Driven Apps
AGILE DEV & DATA SCIENCE: MODERN, COLLABORATIVE
APP & DEV PLATFORM: MODERN, CLOUD-ORIENTED & OPEN
DATA FABRIC: MODERN, CLOUD-ORIENTED & OPEN
![Page 4: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/4.jpg)
The Big Data Problem
Fragmentation Constraints Complexity
![Page 5: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/5.jpg)
Pivotal + Hortonworks Alliance
• Started July 2014 around Ambari collaboration
• Announcing Pivotal Big Data Suite on Hortonworks Data Platform
• Advanced support from world’s leading Hortonworks support services
• Joint engineering efforts and enhanced Pivotal HD
![Page 6: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/6.jpg)
ODP - Standardize Hadoop Ecosystem
• Deliver ODP Core to build a versioned, packaged, tested set of Hadoop components
• Focus on developing a platform, rather than projects
• Initial scope on Apache Hadoop HDFS / MR / YARN / Ambari
Remove vendor lock-in
Ecosystem Effect
Shorter Innovation Cycles
http://opendataplatform.org
…
![Page 7: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/7.jpg)
Open Sourced but not just Hadoop
• Open sourcing all Pivotal Big Data Suite components
– Pivotal GemFire - premium in-memory NoSQL database
– Pivotal HAWQ - world’s leading SQL compliant enterprise SQL on Hadoop
– Pivotal Greenplum Database - advanced enterprise MPP analytic database with Hadoop interconnect
– SpringXD - Unified, distributed, and extensible system for data driven application development
![Page 8: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/8.jpg)
HAWQ SQL on Hadoop
PROVEN AT SCALE
PRODUCTIVE
NATIVE on HADOOP / ODP
OPEN & EXTENSIBLE
![Page 9: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/9.jpg)
HAWQ SQL on Hadoop
• 10+ years R&D in Massively Parallel SQL
• SQL engine at peta scale analytics in world’s largest industries
• Mature cost based query optimizer
• Full SQL semantics
• Rich ecosystem of ELT/dataviz/BI & partners
• PL/*, built-in analytics, R native framing
• All Hadoop formats (gz, Parquet, HAWQ etc)
• Data node short circuit reads (colocated, not M/R based)
• Predicate pushdown to Hive, HBase
• HAWQ PXF: Query federation to NoSQL, DB, etc
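The "predicate pushdown" item on this slide can be illustrated with a toy sketch. This is not HAWQ or PXF code: `ToySource`, `query_without_pushdown` and `query_with_pushdown` are hypothetical names invented here, and the external store is just an in-memory list. The sketch only shows the general idea that evaluating the filter at the source reduces how many rows have to be shipped to the query engine.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


class ToySource:
    """Hypothetical external store that counts how many rows it ships out."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.rows_shipped = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # When a predicate is supplied, it is evaluated at the source,
        # so only matching rows leave the store.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_shipped += 1
                yield row


def query_without_pushdown(source: ToySource, predicate: Predicate) -> List[Row]:
    # The engine pulls every row and filters afterwards.
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: ToySource, predicate: Predicate) -> List[Row]:
    # The filter travels to the source; only matching rows are shipped.
    return list(source.scan(predicate))


def make_rows() -> List[Row]:
    # Illustrative data: every tenth row is tagged "FR", the rest "US".
    return [{"id": i, "country": "FR" if i % 10 == 0 else "US"} for i in range(100)]


def is_french(row: Row) -> bool:
    return row["country"] == "FR"
```

Both functions return the same rows; they differ only in the `rows_shipped` counter of the source, which stands in for network and I/O cost in a real federated query.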
![Page 10: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/10.jpg)
SpringXD
• Data from anywhere, to anywhere
• Real time & batch
• Ingest + analytics + jobs orchestration
• Developer friendly
• Built in connectors
• With / without Spark
• DSL
• Your choice of Hadoop
• Your choice of messaging
• Standalone, YARN & outside Hadoop
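The "DSL" item refers to defining a data flow as a chain of modules. As a rough illustration of that style, and not of Spring XD itself, the sketch below parses a pipe-separated definition string and composes the named stages in order. The `MODULES` registry, the module names `nonempty` and `uppercase`, and `build_pipeline` are all hypothetical and exist only for this example.

```python
from typing import Callable, Dict, Iterable

Stage = Callable[[Iterable[str]], Iterable[str]]

# Hypothetical module registry; the names are illustrative only.
MODULES: Dict[str, Stage] = {
    "nonempty": lambda items: (s for s in items if s.strip()),
    "uppercase": lambda items: (s.upper() for s in items),
}


def build_pipeline(definition: str) -> Stage:
    """Turn a definition such as 'nonempty | uppercase' into one callable."""
    names = [part.strip() for part in definition.split("|")]
    unknown = [name for name in names if name not in MODULES]
    if unknown:
        raise ValueError(f"unknown modules: {unknown}")

    def run(items: Iterable[str]) -> Iterable[str]:
        # Feed the output of each stage into the next one, left to right.
        for name in names:
            items = MODULES[name](items)
        return items

    return run
```

A call such as `list(build_pipeline("nonempty | uppercase")(["a", "", "b"]))` runs the two stages in sequence. A declarative definition like this keeps the wiring of sources, processors and sinks separate from the code of each module.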
![Page 11: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/11.jpg)
Simplify Data Driven Applications
• PaaS with NoSQL & Big Data choices built-in
• Emergence of vertical services: Mobile, IoT, …
Data centric runtimes built in
– Java/PHP/Node.js/Ruby
– Python
– R/Shiny
– Scala
– SpringXD
Large choice of data services
– DB, clustered MySQL etc
– Memcache, Redis etc
– GemFire, Cassandra etc
– Hadoop, Greenplum etc
Can run virtualized inside PaaS
Can run multi-tenant-ified alongside PaaS
![Page 12: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/12.jpg)
DEMO
PHD (or any ODP Core-based Hadoop Distribution)
HDFS
HAWQ (SQL on Hadoop)
GreenplumDB (Analytics DW)
GemFire (JSON/Object in memory data grid)
Redis (Key Value Store)
RabbitMQ
SpringXD (Stream Processing/scoring)
SpringXD
Cloud Foundry Data Services
HBase Hive
PXF (Filtered Pushdown)
Direct Store Federated
GPHDFS
Write behind Persistence
Analytic Apps Online Apps
Pivotal Big Data Suite
Spark
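The diagram labels one of its arrows "Write behind Persistence". As a minimal sketch of that general pattern, assuming a plain dictionary as the slower backing store, the class below acknowledges writes in memory and pushes them to the store later in batches. `WriteBehindCache` and its `batch_size` parameter are hypothetical names for illustration and do not reflect the GemFire API.

```python
from typing import Dict, List, Optional, Tuple


class WriteBehindCache:
    """Toy in-memory cache that persists writes to a backing store in batches."""

    def __init__(self, backing_store: Dict[str, int], batch_size: int = 3) -> None:
        self.memory: Dict[str, int] = {}
        self.pending: List[Tuple[str, int]] = []
        self.backing_store = backing_store  # stands in for a slower database
        self.batch_size = batch_size

    def put(self, key: str, value: int) -> None:
        # The write is visible in memory immediately...
        self.memory[key] = value
        # ...and only queued for the backing store.
        self.pending.append((key, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def get(self, key: str) -> Optional[int]:
        return self.memory.get(key)

    def flush(self) -> None:
        # Apply queued writes in order, then clear the queue.
        for key, value in self.pending:
            self.backing_store[key] = value
        self.pending.clear()
```

The trade-off is that readers of the backing store may see stale data until the next flush, in exchange for low-latency writes on the in-memory side.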
![Page 13: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/13.jpg)
The New Data Imperatives
Converged Data & Cloud
Open Data-Driven Apps
![Page 14: Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake in becoming Data Driven. - NoSQL matters Paris 2015](https://reader034.fdocuments.us/reader034/viewer/2022042701/55a585471a28ab7d3b8b4657/html5/thumbnails/14.jpg)
A NEW PLATFORM FOR A NEW ERA
Meet us at the booth! Come and do a “HAWQ in 2 min” lab
Win a pair of Beats Solo2 headphones!