Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
![Page 1: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/1.jpg)
Alluxio: Unify Data at Memory Speed
Product Overview
September 26, 2017
![Page 2: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/2.jpg)
Confidential © Alluxio, Inc. All Rights Reserved. 2
Agenda
1 Why we built Alluxio
2 Alluxio’s innovations
3 Use cases
![Page 3: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/3.jpg)
Data Ecosystem Yesterday
• One Compute Framework
• Single Storage System
• Co-located
ETL
![Page 4: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/4.jpg)
Data Ecosystem Today
• Many Compute Frameworks
• Multiple Storage Systems
• Most not co-located
![Page 5: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/5.jpg)
Data Ecosystem Issues
• Each application manages multiple data sources
• Adding or removing data sources requires application changes
• Storage optimizations require application changes
• Lower performance due to lack of locality
![Page 6: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/6.jpg)
Data Ecosystem Challenges
1 Speed & Complexity
• Many storage & compute systems
• Integration and interoperability issues (on prem, hybrid, cloud)
• Many departments & groups
2 Data Freshness
• Real time data?
• Cross-network movement is slow
• Each ETL creates more lag
3 Cost
• Data and App explosion driving cost up
• Data duplication
4 Security & Governance
• Data security & governance is increasingly complex
Heavy integrations create painful organizational drag
![Page 7: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/7.jpg)
This is why we built Alluxio
A unified data solution for the digital economy
![Page 8: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/8.jpg)
Data Ecosystem with Alluxio
• Apps only talk to Alluxio
• Simple Add/Remove
• No App Changes
• Highest performance in Memory
• No Lock in
Interfaces exposed to applications: Native File System, Hadoop Compatible File System, REST Web Service, Key-Value Interface
Interfaces to underlying storage: HDFS Interface, Amazon S3 Interface, Swift Interface, NFS Interface
![Page 9: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/9.jpg)
Fastest Growing Big Data Open Source Project
[Chart: Open Source Contributors by Month (GitHub). Y-axis: number of contributors (0 to 500); x-axis: month (0 to 45). Series: Alluxio, Spark, Kafka, Redis, HDFS, Cassandra, Hive.]
Fastest Growing open-source project in the big data ecosystem
Running world’s largest production clusters
600+ Contributors from 100+ organizations
![Page 10: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/10.jpg)
Selection of customers
![Page 11: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/11.jpg)
Alluxio Design Principles
1 Big Data & Machine Learning
• Interoperability with leading projects
• Large scale data sets
• High IO
2 Data Sharing
• Don’t own the data
• Multiple apps sharing common data
• Data stored in multiple, hybrid systems
3 High Speed Data Access
• Remote data
• Hot/warm/cold data
• Temporary data
• Read/write support
4 Enterprise Class
• Distributed architecture
• Commodity hardware
• Service-oriented
• High availability
• Security
![Page 12: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/12.jpg)
Alluxio Innovation: Unified Namespace
Enables effective data management across different Under Stores
Uses Mounting with Transparent Naming
![Page 13: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/13.jpg)
Alluxio Innovation: Unified Namespace
Create a catalog of available data sources for Data Scientists
alluxio://
/finance/customer-transactions/
/finance/vendor-transactions/
/operations/device-logs/
/operations/phone-call-recordings/
/operations/check-images/
/research/us-economic-data/
/research/intl-economic-data/
/marketing/advertising-dataset/
/marketing/marketing-funnel-dataset/
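As a rough illustration of mounting with transparent naming, the sketch below models a mount table in plain Python. It is not Alluxio's API: the `MountTable` class, the under-store URIs, and the host and bucket names are all hypothetical, chosen only to show how paths in one namespace could resolve to different storage systems.

```python
from typing import Dict, Optional


class MountTable:
    """Toy mount table (hypothetical): maps namespace prefixes to under-store URIs."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, namespace_path: str, under_store_uri: str) -> None:
        # Normalize so "/finance/" and "/finance" name the same mount point.
        key = "/" + namespace_path.strip("/")
        self._mounts[key] = under_store_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Return the under-store URI backing a namespace path (longest prefix wins)."""
        normalized = "/" + path.strip("/")
        best: Optional[str] = None
        for prefix in self._mounts:
            covers = normalized == prefix or normalized.startswith(prefix.rstrip("/") + "/")
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {path!r}")
        # Keep the part of the path below the mount point and append it to the URI.
        remainder = normalized[len(best.rstrip("/")):]
        return self._mounts[best] + remainder


# Hypothetical mounts: the namespace paths come from the slide, the URIs do not.
table = MountTable()
table.mount("/finance/customer-transactions", "hdfs://namenode:8020/warehouse/tx")
table.mount("/operations/device-logs", "s3a://example-bucket/device-logs")

resolved = table.resolve("/operations/device-logs/2017/09/26.log")
print(resolved)
```

An application in this model only ever sees the namespace path; which store sits behind it is a property of the mount table, so a data source can be swapped by remounting rather than by changing application code.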
![Page 14: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/14.jpg)
Alluxio Innovation: Server-side API Translation
Convert from Client-side Interface to Native Storage Interface
Client side: HDFS Interface
Storage side: HDFS Interface, S3A Interface, Swift Interface, Google Cloud Interface
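The idea of translating one client-side interface into several native storage interfaces can be sketched with a small adapter registry. This is an illustrative model only, not Alluxio code: `UnderStore`, `InMemoryStore`, and `TranslationServer` are hypothetical names, and the in-memory stores stand in for real HDFS or S3A clients so the example runs offline.

```python
from abc import ABC, abstractmethod
from typing import Dict
from urllib.parse import urlparse


class UnderStore(ABC):
    """Hypothetical storage-side adapter; one subclass per native interface."""

    @abstractmethod
    def read(self, location: str) -> bytes:
        ...


class InMemoryStore(UnderStore):
    """Stand-in for a real storage client so the sketch needs no network."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def read(self, location: str) -> bytes:
        return self._objects[location]


class TranslationServer:
    """Accepts a single client-side call shape and forwards it to the matching adapter."""

    def __init__(self) -> None:
        self._adapters: Dict[str, UnderStore] = {}

    def register(self, scheme: str, adapter: UnderStore) -> None:
        self._adapters[scheme] = adapter

    def open(self, uri: str) -> bytes:
        # The URI scheme decides which native interface handles the request.
        scheme = urlparse(uri).scheme
        if scheme not in self._adapters:
            raise ValueError(f"no adapter registered for scheme {scheme!r}")
        return self._adapters[scheme].read(uri)


# Hypothetical locations and contents, for illustration only.
server = TranslationServer()
server.register("hdfs", InMemoryStore({"hdfs://nn/a.txt": b"from hdfs"}))
server.register("s3a", InMemoryStore({"s3a://bucket/a.txt": b"from s3a"}))

print(server.open("hdfs://nn/a.txt"))
print(server.open("s3a://bucket/a.txt"))
```

Because the dispatch happens on the server side in this model, the client keeps calling the same `open` method regardless of which storage system holds the data.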
![Page 15: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/15.jpg)
Alluxio Innovation: Server-side API Translation
Convert between different versions of HDFS
Client side: HDFS 2.7 Interface
Storage side: HDP 2.4 Interface, CDH 5.6 Interface, MAPR 5.2 Interface
![Page 16: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/16.jpg)
Alluxio Innovation: Intelligent Cache
Local performance from remote data using native multi-tier storage
RAM (Hot), SSD (Warm), HDD (Cold)
Read & Write Buffering
Transparent to App
Policies for pinning, promotion/demotion, TTL
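To make the tiering policies concrete, here is a minimal sketch of a three-tier cache with pinning, promotion on read, demotion on overflow, and TTL expiry. It is a toy model under stated assumptions, not Alluxio's implementation: the `TieredCache` class, the least-recently-used eviction order, and the tiny capacities are all hypothetical choices made for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Set, Tuple

TIERS = ["RAM", "SSD", "HDD"]  # hot, warm, cold


class TieredCache:
    """Toy multi-tier cache (hypothetical): new and recently read data lives in RAM."""

    def __init__(self, capacity: Dict[str, int],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._clock = clock
        # One LRU-ordered map per tier: key -> (value, expiry time or None).
        self._tiers: Dict[str, "OrderedDict[str, Tuple[Any, Optional[float]]]"] = {
            name: OrderedDict() for name in TIERS
        }
        self._pinned: Set[str] = set()

    def pin(self, key: str) -> None:
        """Pinned keys are never chosen as demotion victims."""
        self._pinned.add(key)

    def put(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        for tier in self._tiers.values():
            tier.pop(key, None)
        expiry = None if ttl is None else self._clock() + ttl
        self._insert(0, key, (value, expiry))

    def get(self, key: str) -> Optional[Any]:
        for name in TIERS:
            entry = self._tiers[name].get(key)
            if entry is None:
                continue
            del self._tiers[name][key]
            value, expiry = entry
            if expiry is not None and self._clock() >= expiry:
                return None  # TTL elapsed: drop the entry
            self._insert(0, key, entry)  # promotion: a read moves data to the hot tier
            return value
        return None

    def tier_of(self, key: str) -> Optional[str]:
        for name in TIERS:
            if key in self._tiers[name]:
                return name
        return None

    def _insert(self, level: int, key: str, entry: Tuple[Any, Optional[float]]) -> None:
        tier = self._tiers[TIERS[level]]
        tier[key] = entry  # most recently used entries sit at the end
        while len(tier) > self._capacity[TIERS[level]]:
            # Demotion: push the least recently used unpinned entry one tier down.
            victim = next((k for k in tier if k not in self._pinned), None)
            if victim is None:
                break  # everything is pinned; this toy lets the tier overflow
            victim_entry = tier.pop(victim)
            if level + 1 < len(TIERS):
                self._insert(level + 1, victim, victim_entry)
            # Entries demoted past the coldest tier are dropped.


cache = TieredCache({"RAM": 1, "SSD": 1, "HDD": 1})
cache.put("a", 1)
cache.put("b", 2)          # "a" is demoted to SSD to make room
print(cache.tier_of("a"), cache.tier_of("b"))
```

The application only calls `put` and `get`; where a block physically sits is decided by the policies, which is the sense in which tiering is transparent to the app.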
![Page 17: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/17.jpg)
Alluxio Innovation: Intelligent Cache
Maintain read & write operations in the event of an outage
RAM (Hot), SSD (Warm), HDD (Cold)
Read & Write Buffering
Transparent to App
Policies for pinning, promotion/demotion, TTL
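One way to picture read and write buffering across an outage is a cache that serves reads locally and queues writes until the remote store is reachable again. The sketch below is an assumption-laden toy, not Alluxio's mechanism: `FlakyStore`, `BufferedCache`, and `UnderStoreDown` are hypothetical names, and the outage is simulated with a boolean flag.

```python
from typing import Dict, List


class UnderStoreDown(Exception):
    """Raised by the simulated remote store while it is unavailable."""


class FlakyStore:
    """Hypothetical remote store that can be switched off to simulate an outage."""

    def __init__(self) -> None:
        self.available = True
        self.data: Dict[str, bytes] = {}

    def write(self, key: str, value: bytes) -> None:
        if not self.available:
            raise UnderStoreDown(key)
        self.data[key] = value

    def read(self, key: str) -> bytes:
        if not self.available:
            raise UnderStoreDown(key)
        return self.data[key]


class BufferedCache:
    """Serves reads from a local copy and buffers writes the store cannot take yet."""

    def __init__(self, store: FlakyStore) -> None:
        self._store = store
        self._cache: Dict[str, bytes] = {}
        self._pending: List[str] = []  # keys written locally but not yet persisted

    def write(self, key: str, value: bytes) -> None:
        self._cache[key] = value
        if key not in self._pending:
            self._pending.append(key)
        self.flush()  # best effort; failures leave the key queued

    def read(self, key: str) -> bytes:
        if key in self._cache:
            return self._cache[key]
        value = self._store.read(key)
        self._cache[key] = value
        return value

    def flush(self) -> int:
        """Try to persist pending writes; return how many are still pending."""
        remaining: List[str] = []
        for key in self._pending:
            try:
                self._store.write(key, self._cache[key])
            except UnderStoreDown:
                remaining.append(key)
        self._pending = remaining
        return len(remaining)


store = FlakyStore()
buffered = BufferedCache(store)
buffered.write("a", b"1")      # store is up: persisted immediately
store.available = False        # simulated outage
buffered.write("b", b"2")      # accepted locally, queued for later
print(buffered.read("a"), buffered.read("b"))
store.available = True
print(buffered.flush())        # pending writes drain once the store is back
```

In this model the application keeps reading and writing during the outage; durability of the queued writes is deferred until the store returns, which is a trade-off a real system would need to manage explicitly.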
![Page 18: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/18.jpg)
Where to use Alluxio
Finding high-fit Alluxio use-cases
Compute Zone: Standalone or managed with Mesos or Yarn
Storage in Different Availability Zone: Either on-prem or cloud
Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.
Spark, TensorFlow, Presto
HDFS
Guidelines
✓ Compute separated from storage
✓ Distributed compute
✓ I/O or network latency exists
✓ Unification of many storage systems
✓ Applications sharing long lived data
More checks result in higher fit applications
![Page 19: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/19.jpg)
Where to use Alluxio
Finding high-fit Alluxio use-cases
Compute Zone: Standalone or managed with Mesos or Yarn
Storage in Different Availability Zone: Either on-prem or cloud
Alluxio is installed with or near compute to unify data stores, stage remote data, and improve system performance.
Spark, TensorFlow, Presto
HDFS
Example First Projects
✓ Big Data Hybrid Storage
✓ Common Data Catalog
✓ Data Center Containerization
✓ Cloud Migration
✓ ETL Alternative
![Page 20: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/20.jpg)
Alluxio Offerings
Axis: Capability/Value
Alluxio Open Source (AOS), for Technology Validation
• Open Source
Alluxio Enterprise Edition (AEE), for Enterprise Deployment
• Kerberos Authentication
• LDAP Integration
• Encryption
• Data Replication
• Fast Durable Write
• Support
• Alluxio Manager
• Open Source
![Page 21: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/21.jpg)
Use Cases
![Page 22: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/22.jpg)
Next Gen Analytics Platform
Leading US Technology Company
![Page 23: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/23.jpg)
HPC/Deep Learning Partnership -
Alluxio maximizes GPU investment:
• Self-serve data access for data scientists
• Rapid integration of new data sources
• Improved memory management & performance
![Page 24: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/24.jpg)
Machine Learning Case Study –
Challenge – Slow model training for algorithmic trading at a $46B data-driven hedge fund. Data access was slow, which raised compute costs and lowered modeler productivity.
Solution – With Alluxio, data access is 10-30X faster.
Impact – More efficient training of ML algorithms, lower compute cost, and higher modeler productivity, resulting in a 14-day ROI on Alluxio.
[Diagram labels: Spark, Mesos, HDFS, Public Internet]
![Page 25: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/25.jpg)
Consumer Intelligence Use Case – Top 3 Telco
Challenge – Desired a central view of consumer information in near real time for proactive support. Many HDFS clusters across different distributions and incompatible versions, both on-prem and in the cloud, integrated through heavy ETL.
Solution – Alluxio integrates data into a central catalog for fast access to consumer interaction records.
Impact –
• Reduced integration time
• Faster data speed & freshness
[Diagram labels: ML, Hadoop, ETL, HDFS (HDP, CDH, MapR)]
![Page 26: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/26.jpg)
Big Data Case Study – Top 3 Retailer
Challenge – Bottleneck in trend analysis for mission-critical daily sales and inventory management. Queries were slow and not interactive, resulting in operational inefficiency.
Solution – With Alluxio, data queries are 10X faster.
Impact – Higher operational efficiency
[Diagram labels: Spark, HDFS]
Use case: http://bit.ly/2ook8Nh
![Page 27: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/27.jpg)
Big Data Case Study –
Challenge – Gain an end-to-end view of the business with a large volume of data. Queries were slow and not interactive, resulting in operational inefficiency.
Solution – ETL data from Teradata to Alluxio
Impact – Faster time to market: “Now we don’t have to work Sundays”
[Diagram labels: Spark, Teradata]
Use Case: http://bit.ly/2oMx95W
![Page 28: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/28.jpg)
Enabling Next Gen Big Data Analytics
1 Unified Storage Bridge
2 Unified Cache Management
3 Security & Governance
![Page 29: Alluxio: Unify Data at Memory Speed · Compute Zone Mesosor Yarn Storage in Different Availability Zone Either on-premor cloud Alluxio is installed with or near compute to unify data](https://reader034.fdocuments.us/reader034/viewer/2022050308/5f70289f6094573931187025/html5/thumbnails/29.jpg)
Social Media
Twitter.com/alluxio
Linkedin.com/alluxio
Website: www.alluxio.com
Thank You!