BDAAS on the Cloud

BDaaS on the Cloud: Challenges and Optimizations

Abhishek Somani, 20th January 2017

Why Cloud?

Where Big Data falls short:

• 6-18 month implementation time
• Only 27% of Big Data initiatives were classified as "successful" in 2014
• Rigid and inflexible infrastructure
• Non-adaptive software services
• Highly specialized systems
• Difficult to build and operate
• Only 13% of organizations achieve full-scale production
• 57% of organizations cite the skills gap as a major inhibitor

Why Cloud?

1. Flexible infrastructure
2. Pay only for what you actually use
3. Shared storage
4. Heterogeneous clusters

Agenda

• Cloud compute (cluster) management
   – Challenges
   – Solutions
   – Advanced optimizations
• Cloud storage
   – Challenges
   – Solutions and optimizations

Cloud Compute

1. Properties:
   a. Ephemeral
   b. Volatile (Spot instances on AWS, Preemptible VMs on GCP)
2. Challenges:
   a. Scaling with the workload
   b. Separating compute from storage
   c. Persisting job histories, log files and results
   d. Adapting YARN/HDFS to account for ephemeral cloud nodes

Up-scaling for MR jobs

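[Diagram: the user submits a job to the Resource Manager, which launches the MR AppMaster on Node 1's NodeManager. The AppMaster sends container requests, and the Resource Manager allocates containers C1 and C2 on Node 2. Based on task progress, an up-scale request goes to the Cluster Manager, which adds Node 3, whose NodeManager runs containers C3 and C4.]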

Generic Up-scaling

[Diagram: the MR, Spark and Tez AppMasters can each raise an up-scale request; the Resource Manager and Cluster Manager handle it, and the Cluster Manager adds a node (Node 2).]
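To make the up-scaling decision concrete, here is a minimal Python sketch of the kind of estimate an AppMaster could make. It is not Qubole's algorithm: the inputs (pending task count, average task duration, a target completion time, slots per node) and the function name `nodes_to_add` are assumptions for illustration. It also shows why expected time alone is not enough: slots beyond the job's parallelism would sit idle.

```python
import math


def nodes_to_add(pending_tasks: int, avg_task_secs: float, running_slots: int,
                 slots_per_node: int, target_secs: float, max_new_nodes: int) -> int:
    """Hypothetical up-scale heuristic: how many nodes to add so the pending
    tasks can finish in roughly target_secs, without exceeding the job's parallelism."""
    if pending_tasks <= 0 or target_secs <= 0:
        return 0
    # Slots needed if the remaining work is spread evenly over target_secs.
    slots_needed = math.ceil(pending_tasks * avg_task_secs / target_secs)
    # Parallelism matters: a job cannot use more slots than it has pending tasks,
    # so a long expected time alone is not a reason to keep adding nodes.
    slots_needed = min(slots_needed, pending_tasks)
    extra_slots = max(0, slots_needed - running_slots)
    extra_nodes = math.ceil(extra_slots / slots_per_node)
    return min(extra_nodes, max_new_nodes)


# 100 pending one-minute tasks, 10 slots today, 5-minute target: 20 slots, so 3 more 4-slot nodes.
print(nodes_to_add(100, 60.0, 10, 4, 300.0, 50))
# 8 pending ten-minute tasks with a 1-minute target: capped at 8 slots, so only 1 more node.
print(nodes_to_add(8, 600.0, 4, 4, 60.0, 50))
```

Without the parallelism cap, the second case would ask for 19 nodes that the job could never use.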

Down-scaling

[Diagram: the user submits jobs (Job 1, Job 2, Job 3) and the Resource Manager allocates containers on the NodeManagers of the cluster's nodes. From the NodeManagers' status updates, it evaluates whether the cluster is underutilized and can be down-scaled, then selects the node whose estimated task completion time is lowest. That node is shut down gracefully: once Job 1 completes, the node is decommissioned and the Cluster Manager removes it.]
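The node-selection step can be sketched as follows. The utilization threshold, the slots per node and the per-task remaining-time estimates are hypothetical inputs; the deck does not say how they are computed, and `pick_node_to_decommission` is an illustrative name.

```python
from typing import Dict, List, Optional


def pick_node_to_decommission(running: Dict[str, List[float]], slots_per_node: int,
                              utilization_threshold: float = 0.5,
                              min_nodes: int = 1) -> Optional[str]:
    """running maps a node name to the estimated remaining seconds of each task on it.
    Returns the node to shut down gracefully, or None if the cluster should stay as is."""
    if len(running) <= min_nodes:
        return None
    used = sum(len(tasks) for tasks in running.values())
    utilization = used / (slots_per_node * len(running))
    if utilization >= utilization_threshold:
        return None  # not underutilized: keep every node
    # A node can leave once its slowest task finishes; an idle node can leave now.
    return min(running, key=lambda node: max(running[node], default=0.0))


cluster = {"node1": [120.0, 30.0], "node2": [300.0], "node3": [15.0]}
print(pick_node_to_decommission(cluster, slots_per_node=4))  # node3
```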

Why is it hard?

1. Up-scaling:
   a. Needs engine-specific algorithms
   b. Cannot just look at expected completion time (parallelism matters)
2. Down-scaling:
   a. Decommissioning takes time
   b. Need to consider hourly billing boundaries
   c. Decommissioning can get stuck on mapper output that reducers still need
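Under hourly billing (as on AWS at the time), a node that has started a new hour is already paid for until the end of it, so releasing it early saves nothing. A minimal sketch of a release check, assuming billing periods measured from launch and a hypothetical release window sized to cover graceful decommissioning:

```python
def should_release(uptime_secs: float, release_window_secs: float = 300.0,
                   billing_period_secs: float = 3600.0) -> bool:
    """Hypothetical policy: only start releasing an idle node near the end of a
    billing period it has already paid for."""
    paid_secs_left = billing_period_secs - (uptime_secs % billing_period_secs)
    # The window should be at least as long as graceful decommissioning takes,
    # so the node is gone before the next period is billed.
    return paid_secs_left <= release_window_secs


print(should_release(3400.0))  # True: 200 paid seconds left
print(should_release(600.0))   # False: 50 paid minutes left
```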

Job History – Terminated Cluster

[Diagram: the user clicks a job history link in the Qubole UI. The link has been proxified, so the request goes to a proxy, which authenticates it. On finding that the user's cluster is down, the jhist file is fetched from cloud storage and the Job History Server returns the rendered job history.]
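The routing decision in this flow can be sketched as below. `ClusterInfo`, the three callables, the `/jobhistory/job/<id>` URL shape and the `<job_id>.jhist` object name are all hypothetical; real jhist file names carry more fields.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ClusterInfo:
    cluster_id: str
    is_running: bool
    history_server_url: str   # job history server of the live cluster
    history_location: str     # where jhist files were persisted in cloud storage


def resolve_job_history(user: str, cluster: ClusterInfo, job_id: str,
                        is_authorized: Callable[[str, str], bool],
                        fetch_object: Callable[[str], bytes],
                        render_jhist: Callable[[bytes], str]) -> Tuple[str, str]:
    """Returns ("proxy", url) for a live cluster or ("rendered", page) for a terminated one."""
    if not is_authorized(user, cluster.cluster_id):
        raise PermissionError(f"{user} may not view cluster {cluster.cluster_id}")
    if cluster.is_running:
        # Live cluster: forward the request to its own job history server.
        return "proxy", f"{cluster.history_server_url}/jobhistory/job/{job_id}"
    # Terminated cluster: read the persisted jhist file and render it here.
    jhist = fetch_object(f"{cluster.history_location}/{job_id}.jhist")
    return "rendered", render_jhist(jhist)
```

Persisting the jhist file to cloud storage is what makes the second branch possible: the history outlives the cluster that produced it.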

Advanced Optimizations

1. Volatile nodes:
   a. Lower-priced nodes: Spot instances on AWS (bought in an auction), Preemptible VMs on GCE
2. Hybrid clusters:
   a. A mix of stable and volatile nodes to improve stability
3. Heterogeneous clusters:
   a. Preferred machine types may not be available
   b. Preferred machine types may be more expensive than larger machines
4. Autoscaling optimizations:
   a. Packing of tasks
   b. Uploading intermediate data to cloud storage
   c. Recommissioning nodes
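Packing of tasks works together with down-scaling: if new containers go to nodes that are already busy, lightly loaded nodes drain sooner and can be removed. A minimal sketch of that placement rule, assuming uniform container sizes counted in slots (real YARN schedulers account for memory and vcores, and this is not the scheduler Qubole shipped); `place_container` is a hypothetical name.

```python
from typing import Dict, Optional


def place_container(used: Dict[str, int], capacity: Dict[str, int]) -> Optional[str]:
    """Packing: put the next container on the fullest node that still has room,
    so lightly loaded nodes drain and become candidates for down-scaling."""
    candidates = [node for node in used if used[node] < capacity[node]]
    if not candidates:
        return None  # every node is full
    best = max(candidates, key=lambda node: used[node] / capacity[node])
    used[best] += 1
    return best


used = {"a": 3, "b": 1, "c": 0}
capacity = {"a": 4, "b": 4, "c": 4}
print([place_container(used, capacity) for _ in range(4)])  # ['a', 'b', 'b', 'b']
print(used)  # {'a': 4, 'b': 4, 'c': 0}: node c stays empty
```

An even-spreading scheduler would have placed work on node c, keeping it alive for no benefit.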

Agenda

1. Cloud compute (cluster) management
   a. Challenges
   b. Scaling
   c. Advanced optimizations
2. Cloud storage
   a. Challenges
   b. Solutions and optimizations

Cloud Storage

1. Properties:
   a. Simple key-value store
   b. Inexpensive
   c. Accessed via REST APIs/SDKs
   d. The source of truth
2. Challenges:
   a. Connection establishment is expensive
   b. Copying/moving is expensive: there is no native rename, so a move is a copy followed by a delete
3. Some positives:
   a. Prefix listing
   b. PUTs are atomic: an object is created when its upload completes, unlike HDFS, where a file is created on the first write
   c. Multipart uploads
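Because there is no rename, "moving" data means a copy plus a delete for every object under a prefix, so the cost grows with the number (and size) of objects, whereas an HDFS rename is a single metadata operation on the NameNode. A minimal sketch against a hypothetical in-memory store that counts requests:

```python
from typing import Dict, List


class InMemoryObjectStore:
    """Hypothetical stand-in for a key-value object store; counts requests."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.requests = 0

    def put(self, key: str, data: bytes) -> None:
        self.requests += 1
        self.objects[key] = data

    def copy(self, src: str, dst: str) -> None:
        self.requests += 1
        self.objects[dst] = self.objects[src]  # a real store copies the bytes server-side

    def delete(self, key: str) -> None:
        self.requests += 1
        del self.objects[key]

    def list_prefix(self, prefix: str) -> List[str]:
        self.requests += 1
        return sorted(k for k in self.objects if k.startswith(prefix))


def rename_prefix(store: InMemoryObjectStore, src: str, dst: str) -> None:
    """With no rename primitive, 'moving' a directory is a copy and a delete per object."""
    for key in store.list_prefix(src):
        store.copy(key, dst + key[len(src):])
        store.delete(key)
```

Moving three objects this way costs seven requests (one listing, then a copy and a delete each).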

Storage Optimizations

Prefix Listing

• Naive: one request per path

    for path in ['/x/y/a', '/x/y/b', '/x/z/c', … ]:
        result << listObject(path)

• Smart: one prefix listing, filtered locally

    pathList = listPrefix('/x')
    while (entry = pathList.next()):
        if entry in ['/x/y/a', '/x/y/b', '/x/z/c', … ]:
            result << entry

• Up to 1000x improvement
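A runnable version of the naive and smart listings, against a hypothetical in-memory bucket that counts requests. It assumes a page size of 1,000 keys, the maximum an S3 LIST response returns, which is plausibly where the "up to 1000x" figure comes from; an existence check per path stands in for `listObject`.

```python
from typing import Iterable, Iterator, List

PAGE_SIZE = 1000  # an S3 LIST response returns at most 1,000 keys


class FakeBucket:
    """Hypothetical in-memory bucket that counts requests like a remote store would."""

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys = sorted(keys)
        self.key_set = set(self.keys)
        self.requests = 0

    def exists(self, key: str) -> bool:
        self.requests += 1  # one metadata request per object
        return key in self.key_set

    def list_prefix(self, prefix: str) -> Iterator[str]:
        matching = [k for k in self.keys if k.startswith(prefix)]
        # One request per page; an empty listing still costs one request.
        for start in range(0, max(len(matching), 1), PAGE_SIZE):
            self.requests += 1
            yield from matching[start:start + PAGE_SIZE]


def naive_listing(bucket: FakeBucket, paths: List[str]) -> List[str]:
    return [p for p in paths if bucket.exists(p)]


def smart_listing(bucket: FakeBucket, prefix: str, paths: List[str]) -> List[str]:
    wanted = set(paths)
    return [k for k in bucket.list_prefix(prefix) if k in wanted]


paths = [f"/x/y/{i:05d}" for i in range(3000)]
naive, smart = FakeBucket(paths), FakeBucket(paths)
assert naive_listing(naive, paths) == smart_listing(smart, "/x", paths)
print(naive.requests, smart.requests)  # 3000 3
```

The trade-off is that the smart version may page through keys it does not need, so it wins when the wanted paths are a dense subset of the prefix.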

Prefix Listing: Use Cases

1. Split computation: dividing input files into tasks for MapReduce/Spark/Presto
2. Recovering partitions
3. Listing paths that match a wildcard pattern ('/x/y/z/*/*')
4. And many more
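Pattern matching can reuse the same trick: list once under the pattern's literal prefix, then match locally. A minimal sketch; `literal_prefix` and `glob_keys` are hypothetical helpers, and `list_prefix` is any callable that returns the keys under a prefix.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List

WILDCARD_CHARS = set("*?[")


def literal_prefix(pattern: str) -> str:
    """Leading path segments of the pattern that contain no wildcard."""
    literal = []
    for segment in pattern.split("/"):
        if WILDCARD_CHARS & set(segment):
            return "/".join(literal) + "/" if literal else ""
        literal.append(segment)
    return pattern  # no wildcard at all


def glob_keys(list_prefix: Callable[[str], Iterable[str]], pattern: str) -> List[str]:
    """One listing of the literal prefix, then segment-by-segment matching locally."""
    wanted = pattern.split("/")
    matches = []
    for key in list_prefix(literal_prefix(pattern)):
        parts = key.split("/")
        # Match per segment so that '*' never crosses a '/' boundary.
        if len(parts) == len(wanted) and all(map(fnmatchcase, parts, wanted)):
            matches.append(key)
    return matches
```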


Direct Writes

• Normally:
   – Write data to a temporary location, then atomically rename it to the final location
• With S3:
   – Write data directly to the final location
   – Atomic PUTs take care of speculation and retries
• Used by default in Hive; DirectFileOutputCommitter in MR/Spark
• Tricky: retries and speculative attempts must use the same path
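A minimal sketch of the "same path" rule, with a plain dict standing in for the object store (a dict assignment plays the role of an atomic PUT). The `part-NNNNN` naming and the function names are hypothetical. Overwriting is only safe because every attempt of a task produces the same output.

```python
from typing import Dict, List


def output_key(output_dir: str, task_id: int) -> str:
    # Derived from the task, not the attempt, so a retry or a speculative
    # attempt of the same task targets the same object.
    return f"{output_dir}/part-{task_id:05d}"


def run_task_attempt(store: Dict[str, bytes], output_dir: str, task_id: int,
                     rows: List[str]) -> None:
    # Build the whole object first, then write it with a single PUT. Because
    # PUTs are atomic, readers see either no object or a complete one.
    data = "\n".join(rows).encode()
    store[output_key(output_dir, task_id)] = data  # dict assignment stands in for a PUT


store: Dict[str, bytes] = {}
run_task_attempt(store, "s3://bucket/out", 7, ["a", "b"])  # original attempt
run_task_attempt(store, "s3://bucket/out", 7, ["a", "b"])  # speculative attempt
print(store)  # {'s3://bucket/out/part-00007': b'a\nb'}
```

Had the key included the attempt number, the two attempts would have left two copies of the same data in the output directory.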


S3 Optimizations

• Object caches (per bucket): high gain for role-based accounts
• Connection pools
• Read-ahead optimizations
• Streaming upload
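Read-ahead trades a few extra bytes per request for far fewer requests, which pays off when every request to the store is expensive. A minimal sketch, assuming a hypothetical `fetch_range(start, end)` callable that issues one ranged GET (on S3, a GET with a `Range` header):

```python
from typing import Callable


class ReadAheadStream:
    """Sequential reader that fetches at least `readahead` bytes per ranged request."""

    def __init__(self, fetch_range: Callable[[int, int], bytes], size: int,
                 readahead: int = 64 * 1024) -> None:
        self.fetch_range = fetch_range  # hypothetical: returns bytes [start, end) in one GET
        self.size = size
        self.readahead = readahead
        self.pos = 0
        self.buf = b""
        self.buf_start = 0

    def read(self, n: int) -> bytes:
        if self.pos >= self.size:
            return b""
        end = min(self.pos + n, self.size)
        buf_end = self.buf_start + len(self.buf)
        if not (self.buf_start <= self.pos and end <= buf_end):
            # Miss: one request that covers this read plus the read-ahead window.
            # (Simplification: any still-buffered overlap is fetched again.)
            fetch_end = min(self.size, max(end, self.pos + self.readahead))
            self.buf = self.fetch_range(self.pos, fetch_end)
            self.buf_start = self.pos
        out = self.buf[self.pos - self.buf_start:end - self.buf_start]
        self.pos = end
        return out
```

Reading 100 bytes in 10-byte chunks with a 64-byte window takes two requests instead of ten.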

Cache! Cache! Cache!

• RubiX: block-level file cache
• Metadata caching for ORC and Parquet


RubiX

• Caches blocks on local disks
• Open source
• Engine agnostic
• Works well with auto-scaling
• Uses consistent hashing to assign files or blocks to nodes
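A minimal consistent-hashing sketch of the kind of assignment RubiX describes. It is not RubiX's code; MD5 ring positions, 100 virtual nodes per node and the block-key format are assumptions. The property that matters for auto-scaling is that removing a node only reassigns the blocks that node owned, so the rest of the cache stays warm.

```python
import bisect
import hashlib
from typing import List, Tuple


def _position(key: str) -> int:
    # Stable 64-bit ring position (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # several points per node smooth out the load
        self.ring: List[Tuple[int, str]] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self.ring = [entry for entry in self.ring if entry[1] != node]

    def node_for(self, block_key: str) -> str:
        # First ring point at or after the key's position, wrapping around.
        i = bisect.bisect_left(self.ring, (_position(block_key), ""))
        return self.ring[i % len(self.ring)][1]
```

With plain `hash(key) % num_nodes`, by contrast, removing one node would move most blocks and invalidate most of the cache.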



[Diagram: metadata caching for the ORC file format]

[Diagram: metadata caching for the Parquet file format]

Metadata Caching

• Cache on a Redis server running on the master node
• Effective and efficient split computation with predicate pushdown (PPD)
• ORC and Parquet
• Engine agnostic
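A minimal sketch of footer caching combined with predicate pushdown during split computation. A dict stands in for the Redis server; keying entries by path plus ETag, the `read_footer` callable and the single integer column are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class StripeStats:
    """Per-stripe (ORC) or per-row-group (Parquet) statistics for one column."""
    offset: int
    length: int
    min_value: int
    max_value: int


class FooterCache:
    """Dict-based stand-in for the metadata cache on the master. Keys include the
    object's ETag, so a rewritten file is never served stale statistics."""

    def __init__(self, read_footer: Callable[[str], List[StripeStats]]) -> None:
        self.read_footer = read_footer  # hypothetical: reads and parses the footer from cloud storage
        self.entries: Dict[Tuple[str, str], List[StripeStats]] = {}

    def get(self, path: str, etag: str) -> List[StripeStats]:
        key = (path, etag)
        if key not in self.entries:
            self.entries[key] = self.read_footer(path)
        return self.entries[key]


def splits_where_greater(cache: FooterCache, path: str, etag: str,
                         threshold: int) -> List[Tuple[int, int]]:
    """Split computation with predicate pushdown for `column > threshold`:
    stripes whose max cannot satisfy the predicate are never scheduled."""
    return [(s.offset, s.length) for s in cache.get(path, etag)
            if s.max_value > threshold]
```

Repeated queries over the same files then skip the footer reads from cloud storage entirely, which is where the split-computation speedup comes from.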

Thank You!
