Nebula: Distributed Edge Cloud for Data-Intensive...

1
Single Datacenter Nebula: Distributed Edge Cloud Nebula: Distributed Edge Cloud for Data-Intensive Computing Students: Mathew Ryden, Kwangsung Oh PIs: Abhishek Chandra, Jon Weissman Motivation MapReduce is standard for data-intensive computing. But it is designed for a single data-center. Widely distributed data does not fit its design assumptions. Design Goal & Architecture MapReduce on Nebula Future Work Nebula Evaluation Explore how external data can be inserted into Nebula for aggregation and decomposition. Expand the range of data-intensive frameworks and applications ported to Nebula. Partition resources across frameworks and applications. 0 200 400 600 800 1000 1200 1400 1600 10 20 30 Run Time (s) Node Scalability of Nebula MAP REDUCE 0 1000 2000 3000 4000 5000 6000 7000 8000 CSCI CSDI NEBULA CSCI CSDI NEBULA Run Time (s) Comparison to CSCI and CSDI MAP REDUCE 500MB 1GB 0 500 1000 1500 2000 2500 3000 Random LA Random LA Random LA Run Time (s) Nebula scheduler comparison MAP REDUCE 500MB 1GB 2GB Nebula outperforms centralized infrastructure. 0 500 1000 1500 2000 2500 3000 0% 10% 20% 30% 40% 50% 60% 65% 70% Run Time (s) Fixed proportion of compute node failures MAP REDUCE Nebula’s location-awareness improves performance. Nebula provides fault-tolerance. Nebula provides scalability. A location and context-aware Distributed Edge Cloud Infrastructure. Using computation and storage resources of edge volunteer and dedicated nodes. Nebula Application Voluntary compute & storage nodes can easily join, leave, and fail at any time. Distributed data-intensive computing. Location-aware. Sandbox for security. Fault tolerance. Nebula MapReduce reduces network traffic. The data is already dispersed on Nebula. Only intermediate files need to be sent for results. Nebula MapReduce provides fault-tolerance and scalability. Dispersed data can be processed by nodes in- situ or close to the data on Nebula cloud. Ex: Analyzing video feeds taken from distributed airports looking for suspicious activity. Results are only sent when there are suspicious activities. Lots of network traffic Nebula Services Dedicated Nodes Data Nodes Compute Nodes DataStore Master Nebula Central Nebula Monitor ComputePool Master Volunteer Nodes Nebula Users dcsg.cs.umn.edu

Transcript of Nebula: Distributed Edge Cloud for Data-Intensive...

Page 1: Nebula: Distributed Edge Cloud for Data-Intensive …dcsg.cs.umn.edu/Projects/Nebula/posters/Nebula-Poster...Nebula for aggregation and decomposition. Expand the range of data-intensive

Single

Datacenter

Nebula: Distributed Edge Cloud

Nebula: Distributed Edge Cloud for Data-Intensive ComputingStudents: Mathew Ryden, Kwangsung Oh

PIs: Abhishek Chandra, Jon Weissman

Motivation

MapReduce is standard for data-intensive

computing.

But it is designed for a single data-center.

Widely distributed data does not fit its design

assumptions.

Design Goal & Architecture MapReduce on Nebula

Future Work

Nebula

Evaluation

Explore how external data can be inserted into

Nebula for aggregation and decomposition.

Expand the range of data-intensive frameworks

and applications ported to Nebula.

Partition resources across frameworks and

applications.

0

200

400

600

800

1000

1200

1400

1600

10 20 30

Run T

ime (

s)

Node Scalability of Nebula

MAP REDUCE

0

1000

2000

3000

4000

5000

6000

7000

8000

CSCI CSDI NEBULA CSCI CSDI NEBULA

Run T

ime (

s)

Comparison to CSCI and CSDI

MAP REDUCE

500MB

1GB

0

500

1000

1500

2000

2500

3000

Random LA Random LA Random LA

Run T

ime (

s)

Nebula scheduler comparison

MAP REDUCE

500MB

1GB

2GB

Nebula outperforms centralized

infrastructure.

0

500

1000

1500

2000

2500

3000

0% 10% 20% 30% 40% 50% 60% 65% 70%

Run T

ime (

s)

Fixed proportion of compute node failures

MAP REDUCE

Nebula’s location-awareness

improves performance.

Nebula provides fault-tolerance. Nebula provides scalability.

A location and context-aware Distributed Edge

Cloud Infrastructure.

Using computation and storage resources of edge

volunteer and dedicated nodes.

Nebula Application

Voluntary compute & storage nodes can easily

join, leave, and fail at any time.

Distributed data-intensive computing.

Location-aware.

Sandbox for security.

Fault tolerance.

Nebula MapReduce reduces network traffic.

The data is already dispersed on Nebula.

Only intermediate files need to be sent for results.

Nebula MapReduce provides fault-tolerance and

scalability.

Dispersed data can be processed by nodes in-

situ or close to the data on Nebula cloud.

Ex: Analyzing video feeds taken from distributed

airports looking for suspicious activity.

Results are only sent when there are

suspicious activities.

Lots of network traffic

Nebula ServicesDedicated Nodes

Data Nodes Compute Nodes

DataStoreMaster

NebulaCentral

NebulaMonitor

ComputePoolMaster

Volunteer Nodes

NebulaUsers

dcsg.cs.umn.edu