Alluxio: Unify Data at Memory Speed; 2016-11-18

25
UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li @ AMPLab End of Project Celebration November 18 th , 2016

Transcript of Alluxio: Unify Data at Memory Speed; 2016-11-18

Page 1: Alluxio: Unify Data at Memory Speed; 2016-11-18

UNIFY DATA AT MEMORY SPEED Haoyuan (HY) Li @ AMPLab End of Project Celebration

November 18th, 2016

Page 2: Alluxio: Unify Data at Memory Speed; 2016-11-18

HISTORY

2

Trex12-13

Page 3: Alluxio: Unify Data at Memory Speed; 2016-11-18

HISTORY

2

Trex12-13

Tachyon13-15

Page 4: Alluxio: Unify Data at Memory Speed; 2016-11-18

HISTORY

2

Trex12-13

Tachyon13-15

Alluxio15-

Page 5: Alluxio: Unify Data at Memory Speed; 2016-11-18

FASTEST-GROWING BIG DATA PROJECT

3

Page 6: Alluxio: Unify Data at Memory Speed; 2016-11-18

FASTEST-GROWING BIG DATA PROJECT

3

• Fastest growing open-source project in the big data ecosystem

• 400+ contributors from 100+ organizations

• Running world’s largest production clusters

• Welcome to join the community!

Page 7: Alluxio: Unify Data at Memory Speed; 2016-11-18

CURRENT STATUS

4

Haoyuan Li, CEO

Alluxio (formerly Tachyon) Co-creator, Joined AMPLab Ph.D. Program 2011FOUNDER

INVESTOR

TEAM

From AMD, Dell, Google, Palantir, Uber, Yahoo; Experts in Distributed Systems

MSs and PhDs in CS from CMU,, Stanford, UC Berkeley

Top 10 Committers of the Alluxio Open Source Project

We are Hiring!

COMPANY Founded 2015

Page 8: Alluxio: Unify Data at Memory Speed; 2016-11-18

BIG DATA ECOSYSTEM YESTERDAY

5

Page 9: Alluxio: Unify Data at Memory Speed; 2016-11-18

BIG DATA ECOSYSTEM TODAY

5

Page 10: Alluxio: Unify Data at Memory Speed; 2016-11-18

5

BIG DATA ECOSYSTEM ISSUES

Page 11: Alluxio: Unify Data at Memory Speed; 2016-11-18

BIG DATA ECOSYSTEM WITH ALLUXIO

5

FUSE Compatible File System

Hadoop Compatible File System

Native Key-Value Interface

Native File System

GlusterFS InterfaceAmazon S3 Interface Swift InterfaceHDFS Interface

Page 12: Alluxio: Unify Data at Memory Speed; 2016-11-18

BIG DATA ECOSYSTEM WITH ALLUXIO

5

FUSE Compatible File System

Hadoop Compatible File System

Native Key-Value Interface

Native File System

Enabling Application to Access Data from any Storage System at Memory-speed

GlusterFS InterfaceAmazon S3 Interface Swift InterfaceHDFS Interface

Page 13: Alluxio: Unify Data at Memory Speed; 2016-11-18

WHY ALLUXIO

6

Co-located compute and data with memory-speed access to data

Virtualized across different storage systems under a unified namespace

Scale-out architecture

File system API, software only

Page 14: Alluxio: Unify Data at Memory Speed; 2016-11-18

ALLUXIO BENEFITS

7

Unification

New workflows across any data in any storage system

Orders of magnitude improvement in run time

Choice in compute and storage – grow each independently, buy only what is needed

Performance Flexibility

Page 15: Alluxio: Unify Data at Memory Speed; 2016-11-18

TRUSTED BY THE WORLD LEADING COMPANIES

8

Page 16: Alluxio: Unify Data at Memory Speed; 2016-11-18

ALLUXIO USE CASES

9

Accelerating I/O to and from remote storage

Managing data across disparate storage systems

Sharing data across workloads at memory speed

Page 17: Alluxio: Unify Data at Memory Speed; 2016-11-18

ACCELERATE I/O TO/FROM REMOTE STORAGE

10

Baidu’s PMs and analysts run

interactive queries to gain insights

into their products and business

• 200+ nodes deployment

• 2+ petabytes of storage

• Mix of memory + HDD

ALLUXIO

Baidu File System

Page 18: Alluxio: Unify Data at Memory Speed; 2016-11-18

ACCELERATE I/O TO/FROM REMOTE STORAGE

10

The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds. - Baidu

RESULTS

• Data queries are now 30x faster with Alluxio

• Alluxio cluster runs stably, providing over 50TB of RAM space

• By using Alluxio, batch queries usually lasting over 15 minutes were transformed into an interactive query taking less than 30 seconds

Baidu’s PMs and analysts run

interactive queries to gain insights

into their products and business

• 200+ nodes deployment

• 2+ petabytes of storage

• Mix of memory + HDD

ALLUXIO

Baidu File System

Page 19: Alluxio: Unify Data at Memory Speed; 2016-11-18

SHARE DATA ACROSS JOBS @ MEMORY SPEED

11

Barclays uses query and machine

learning to train models for risk

management

• 6 node deployment

• 1TB of storage

• Memory only

ALLUXIO

Relational Database

Page 20: Alluxio: Unify Data at Memory Speed; 2016-11-18

SHARE DATA ACROSS JOBS @ MEMORY SPEED

11

Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity. - Barclays

RESULTS

• Barclays workflow iteration time decreased from hours to seconds

• Alluxio enabled workflows that were impossible before

• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds

Barclays uses query and machine

learning to train models for risk

management

• 6 node deployment

• 1TB of storage

• Memory only

ALLUXIO

Relational Database

Page 21: Alluxio: Unify Data at Memory Speed; 2016-11-18

MANAGE DATA ACROSS STORAGE SYSTEMS

12

• 200+ nodes deployment

• 6 billion logs (4.5 TB) daily

• Mix of Memory + HDD

ALLUXIO

Qunar uses real-time machine

learning for their website ads.

Page 22: Alluxio: Unify Data at Memory Speed; 2016-11-18

MANAGE DATA ACROSS STORAGE SYSTEMS

12

We’ve been running Alluxio in production for over 9 months, Alluxio’s unified namespace enable different applications and frameworks to easily interact with data from different storage systems - Qunar

RESULTS

• Data sharing among Spark Streaming, Spark batch and Flink jobs provide efficient data sharing

• Improved the performance of their system with 15x – 300x speedups

• Tiered storage feature manages storage resources including memory, SSD and disk

• 200+ nodes deployment

• 6 billion logs (4.5 TB) daily

• Mix of Memory + HDD

ALLUXIO

Qunar uses real-time machine

learning for their website ads.

Page 23: Alluxio: Unify Data at Memory Speed; 2016-11-18

ALLUXIO, INC PRODUCT OFFERINGS

13

Capa

bilit

y/Va

lue

Technology Validation

Alluxio Open Source

Open Source

Alluxio Community

Edition (ACE)

Accelerate Adoption

Alluxio Manager

Open Source

Alluxio Enterprise

Edition (AEE)

Enterprise Deployment

• Kerberos Authentication • Data Replication • Support

Alluxio Manager

Open Source

Page 24: Alluxio: Unify Data at Memory Speed; 2016-11-18

GOING INTO THE FUTURE

14

Page 25: Alluxio: Unify Data at Memory Speed; 2016-11-18

Congrats to the AMPLab!Thank you!Contact: [email protected] or [email protected]

15