Nebula: Distributed Edge Cloud for Data-Intensive...

Single

Datacenter

Nebula: Distributed Edge Cloud

Nebula: Distributed Edge Cloud for Data-Intensive ComputingStudents: Mathew Ryden, Kwangsung Oh

PIs: Abhishek Chandra, Jon Weissman

Motivation

MapReduce is standard for data-intensive

computing.

But it is designed for a single data-center.

Widely distributed data does not fit its design

assumptions.

Design Goal & Architecture MapReduce on Nebula

Future Work

Nebula

Evaluation

Explore how external data can be inserted into

Nebula for aggregation and decomposition.

Expand the range of data-intensive frameworks

and applications ported to Nebula.

Partition resources across frameworks and

applications.

0

200

400

600

800

1000

1200

1400

1600

10 20 30

Run T

ime (

s)

Node Scalability of Nebula

MAP REDUCE

0

1000

2000

3000

4000

5000

6000

7000

8000

CSCI CSDI NEBULA CSCI CSDI NEBULA

Run T

ime (

s)

Comparison to CSCI and CSDI

MAP REDUCE

500MB

1GB

0

500

1000

1500

2000

2500

3000

Random LA Random LA Random LA

Run T

ime (

s)

Nebula scheduler comparison

MAP REDUCE

500MB

1GB

2GB

Nebula outperforms centralized

infrastructure.

0

500

1000

1500

2000

2500

3000

0% 10% 20% 30% 40% 50% 60% 65% 70%

Run T

ime (

s)

Fixed proportion of compute node failures

MAP REDUCE

Nebula’s location-awareness

improves performance.

Nebula provides fault-tolerance. Nebula provides scalability.

A location and context-aware Distributed Edge

Cloud Infrastructure.

Using computation and storage resources of edge

volunteer and dedicated nodes.

Nebula Application

Voluntary compute & storage nodes can easily

join, leave, and fail at any time.

Distributed data-intensive computing.

Location-aware.

Sandbox for security.

Fault tolerance.

Nebula MapReduce reduces network traffic.

The data is already dispersed on Nebula.

Only intermediate files need to be sent for results.

Nebula MapReduce provides fault-tolerance and

scalability.

Dispersed data can be processed by nodes in-

situ or close to the data on Nebula cloud.

Ex: Analyzing video feeds taken from distributed

airports looking for suspicious activity.

Results are only sent when there are

suspicious activities.

Lots of network traffic

Nebula ServicesDedicated Nodes

Data Nodes Compute Nodes

DataStoreMaster

NebulaCentral

NebulaMonitor

ComputePoolMaster

Volunteer Nodes

NebulaUsers

dcsg.cs.umn.edu

Nebula: Distributed Edge Cloud for Data-Intensive...

Documents

Transcript of Nebula: Distributed Edge Cloud for Data-Intensive...