CLOUD HOSTED COMPUTING FOR DEMANDING REAL-TIME SETTINGS (…jnfoster/systems-industry/ken.pdf)

KEN BIRMAN Rao Professor of Computer Science Cornell University CLOUD HOSTED COMPUTING FOR DEMANDING REAL-TIME SETTINGS




CAN THE CLOUD DO REAL-TIME?

Internet of Things, Smart Grid / Buildings / Cars, ...

Shared requirement:

➢ We want a system that can carry out some form of continuous monitoring, or continuous control.

➢ It must stay robust despite “cloudy weather” and respond quickly, often with some form of consistency or fault-tolerance requirement added to the mix.


CLOUD COMPUTING FOR THE SMART GRID

Real-time collection of data from widely deployed Synchronized Phasor Measurement Units (PMUs) and other SCADA data sources

➢ Each PMU device captures 44-byte records at 30 Hz
➢ One PMU per “bus”, but there can be many PMUs, so the aggregate data rate is high
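The two figures on this slide (44-byte records, 30 Hz) are enough for a back-of-the-envelope data rate. The sketch below counts record payload only; the deployment sizes are hypothetical and protocol overhead is ignored.

```python
RECORD_BYTES = 44       # record size stated on the slide
SAMPLE_RATE_HZ = 30     # records per second per PMU, stated on the slide
SECONDS_PER_DAY = 86_400


def pmu_rate_bytes_per_sec(num_pmus: int) -> int:
    """Aggregate payload rate in bytes per second for num_pmus devices."""
    return num_pmus * RECORD_BYTES * SAMPLE_RATE_HZ


def bytes_per_day(num_pmus: int) -> int:
    """Payload volume accumulated over one day."""
    return pmu_rate_bytes_per_sec(num_pmus) * SECONDS_PER_DAY


# Hypothetical deployment sizes, chosen only to show how the rate scales.
for n in (1, 1_000, 10_000):
    rate = pmu_rate_bytes_per_sec(n)
    print(f"{n:>6} PMUs: {rate:>12,} B/s, {bytes_per_day(n) / 1e9:.2f} GB/day")
```

A single PMU produces only 1,320 bytes per second; the aggregate becomes significant because the rate grows linearly with the number of devices and never pauses.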

Robust real-time tracking enables shared, consistent situational awareness and coordination


KILLER APPLICATIONS?

Over-the-horizon “grid radar” helps operators understand wide-area grid stress and disturbances

Tools (“apps for the smart grid”) help operators cooperate to solve problems, search a knowledge base for past situations with a similar fingerprint, and explore what-if scenarios


CLOUD COMPUTING FOR THE SMART GRID

Why use the cloud? It comes down to money…

By reusing today’s scalable cloud infrastructure, we:
➢ Benefit from a low-cost solution
➢ Leverage a proven, universally accessible technology
➢ Gain geographic diversity, because the cloud is hosted at geographically diverse places

But cloud platforms aren’t known for high assurance


THE CAP DILEMMA: PICK 2 FROM {CONSISTENCY, AVAILABILITY AND PARTITION TOLERANCE}

Today’s cloud offers scalable, snappy response, but is optimized for applications with weak security needs. It lacks:

➢ Hardened network protocols aimed at consistent but tightly controlled sharing for collaboration

➢ A new distributed security model supporting total control by regional operator, controlled data flows

Our approach is to run a stronger infrastructure within Amazon’s EC2, augmenting the standard solution


FROM THE SENSOR TO THE SHARD


The shard members keep logs of values received indexed by time.

Due to network delay, not all have the same data at the same time.
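A minimal sketch of the point made above, assuming two hypothetical shard members with different one-way network delays. Each keeps a log indexed by timestamp; at a given instant the logs can differ, and they agree once the delayed records arrive. The timestamps, values and delays are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ShardMember:
    name: str
    delay_ms: int                              # hypothetical one-way network delay
    log: dict = field(default_factory=dict)    # timestamp (ms) -> value

    def deliver_up_to(self, now_ms, records):
        """Log every record whose arrival time (timestamp + delay) is <= now_ms."""
        for ts_ms, value in records:
            if ts_ms + self.delay_ms <= now_ms:
                self.log[ts_ms] = value


# Three readings roughly 1/30 s apart (values are made up).
records = [(0, 59.98), (33, 60.01), (67, 60.02)]
fast = ShardMember("fast", delay_ms=10)
slow = ShardMember("slow", delay_ms=50)

for member in (fast, slow):
    member.deliver_up_to(70, records)
view_at_70 = (sorted(fast.log), sorted(slow.log))
print(view_at_70)              # the two members hold different data at t = 70 ms

for member in (fast, slow):
    member.deliver_up_to(120, records)
print(fast.log == slow.log)    # once every record has arrived, the logs agree
```

Because entries are keyed by the time of measurement rather than the time of arrival, the members converge on identical logs even though they received the data at different moments.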

We use IronStack as our transport layer, then run TCP or TCP/SSL on it

Figure labels: Private network portion; Internet portion


GRIDCLOUD: MILE HIGH OVERVIEW


MAIN COMPONENTS: RT-HDFS

GridCloud File System: A real-time in-memory file system for secure, strongly consistent real-time mirrored data sharing, extends HDFS
➢ Accepts streams of updates, offers a convenient snapshot feature
➢ Optimized for management of very large memory-mapped files
➢ Leverages RDMA functionality for network line-speed data transfers
➢ Easily integrated with Hadoop
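One way to picture a snapshot over streams of updates is a read "as of time T": for each stream, return the latest value whose timestamp is at or before T. The sketch below illustrates that idea only. `StreamLog` and `snapshot` are hypothetical names, not the GridCloud file system API, and the slide does not say how its snapshot feature is implemented.

```python
import bisect


class StreamLog:
    """Append-only, time-ordered log for one update stream (hypothetical helper)."""

    def __init__(self):
        self.times = []
        self.values = []

    def append(self, ts, value):
        if self.times and ts < self.times[-1]:
            raise ValueError("updates must be appended in timestamp order")
        self.times.append(ts)
        self.values.append(value)

    def value_at(self, ts):
        """Latest value with timestamp <= ts, or None if there is none yet."""
        i = bisect.bisect_right(self.times, ts)
        return self.values[i - 1] if i else None


def snapshot(logs, ts):
    """Read every stream as of the same instant."""
    return {name: log.value_at(ts) for name, log in logs.items()}


logs = {"pmu_a": StreamLog(), "pmu_b": StreamLog()}
for ts, v in [(0, 1.00), (33, 1.01), (67, 1.02)]:
    logs["pmu_a"].append(ts, v)
for ts, v in [(10, 0.99), (43, 0.98)]:
    logs["pmu_b"].append(ts, v)

print(snapshot(logs, 40))   # {'pmu_a': 1.01, 'pmu_b': 0.99}
print(snapshot(logs, 5))    # {'pmu_a': 1.0, 'pmu_b': None}
```

Reading all streams against one timestamp gives every reader the same picture of the data, whichever moment the query is actually issued.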

Lead developer: Weijia Song. Under the hood, makes use of Isis2 (Birman)


MAIN COMPONENTS: GC-COLLAB

GridCloud Collaboration Tool: A tool for creating a kind of sharable virtual iPad
➢ It graphs the current power network and can show you the status of any line at a click
➢ Various “apps” can be dragged onto the network, and this triggers actions, like a transient stability analysis or listing “similar network states seen in the past” (we provide the framework; other people build these apps)
➢ Shared with real-time consistency as needed

MEng students reporting to Theo Gkountouvas. Leverages Live Distributed Objects + Isis2 (Ostrowski, Birman)


MAIN COMPONENTS: IRONSTACK

IronStack: A software defined network manager
➢ Focus is on “owned” networks operating under difficult conditions
➢ Employs SDN routers and uses a variety of techniques to circumvent disruption in the event of storms or component failures
➢ Interfaces aimed at owners who may have limited IT skill sets

Elegant…

Dead Simple

Rock Solid

Lead developer: Z Teo


MAIN COMPONENTS: DMAKE

DMake: Manages your GridCloud applications
➢ Based on the popular Unix “makefile” concept
➢ But generalized to support distributed programs whose operating parameters can be modified at runtime
➢ It handles system repair after failures, load balancing, mapping of your computation to the cloud computing nodes, etc.
➢ Incredibly easy to use
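The makefile idea carries over naturally to runtime parameters: when something changes, only the targets that depend on it, directly or transitively, need attention, and in dependency order. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). The target names and the `affected_targets` helper are hypothetical; this is not DMake's syntax or implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each target maps to the things it depends on.
deps = {
    "collector": {"param:sample_rate"},
    "state_estimator": {"collector"},
    "dashboard": {"state_estimator"},
    "archiver": {"collector"},
}


def affected_targets(deps, changed):
    """Targets to re-run, in dependency order, after `changed` is modified."""
    order = list(TopologicalSorter(deps).static_order())  # dependencies come first
    dirty = {changed}
    result = []
    for node in order:
        if node != changed and deps.get(node, set()) & dirty:
            dirty.add(node)
            result.append(node)
    return result


# Changing a runtime parameter touches everything downstream of it.
print(affected_targets(deps, "param:sample_rate"))
# Restarting one component touches only what depends on it.
print(affected_targets(deps, "state_estimator"))   # ['dashboard']
```

The same traversal could in principle drive repair after a failure: treat the failed component as the changed node and restart whatever depends on it.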

Lead developer: Theo Gkountouvas, uses Isis2


UNDER THE COVERS: POWERED BY ISIS2

Used internally by these other tools
• Provides secure, fault-tolerant data replication, coordination and self-repair. Lead: Birman
• Employs cutting edge “virtual synchrony” programming model (basis of CORBA FT standard)
• Open source, more than 4250 downloads to date from isis2.codeplex.com


Egyptian myth: After her brother Osiris was torn apart by Seth, Isis restored him to life


CONSISTENCY MODEL: INTEGRATES VIRTUAL SYNCHRONY WITH PAXOS

Virtual synchrony is a “consistency” model:
• Membership epochs: each epoch begins when a new configuration is installed and reported by delivery of a new “view” and associated state
• Protocols run “during” a single epoch. A new view is reported if a failure occurs
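The epoch idea can be sketched as a toy model: a group has a current view (an epoch number plus a member list), each multicast is tagged with the epoch in which it was sent, and a reported failure installs a new view. The `Group` class and its method names are hypothetical and are not the Isis2 API. The sketch omits state transfer and the flushing of in-flight messages that a real implementation needs.

```python
class Group:
    """Toy model of membership epochs (not the Isis2 API)."""

    def __init__(self, members):
        self.view_id = 0
        self.members = list(members)
        self.history = [("view", 0, tuple(self.members))]

    def multicast(self, msg):
        """Deliver msg to the members of the current view only."""
        self.history.append(("msg", self.view_id, msg))
        return list(self.members)

    def report_failure(self, member):
        """A failure ends the epoch; a new view without the member is installed."""
        self.members.remove(member)
        self.view_id += 1
        self.history.append(("view", self.view_id, tuple(self.members)))


group = Group("pqrst")
first = group.multicast("m1")     # delivered in view 0 to p, q, r, s, t
group.report_failure("r")         # view 1 is installed without r
second = group.multicast("m2")    # delivered in view 1 to p, q, s, t

for event in group.history:
    print(event)
```

Every message in the history belongs to exactly one epoch, so all surviving members can agree on who was in the group when each message was delivered.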

Figure: timelines for processes p, q, r, s, t against a time axis running from 0 to 70. Panels: non-replicated reference execution (A=3, B=7, B = B-A, A = A+1); synchronous execution; virtually synchronous execution.
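The reference execution on this slide (A=3, B=7, then B = B-A, then A = A+1) makes a handy worked example of why delivery order matters for replicas. The sketch below assumes each replica simply applies the updates it is given in sequence; it illustrates the goal of matching the reference execution, not how Isis2 achieves it.

```python
def b_minus_a(state):
    state["B"] = state["B"] - state["A"]


def a_plus_1(state):
    state["A"] = state["A"] + 1


def apply_all(ops, initial):
    """Apply ops in order to a private copy of the initial state."""
    state = dict(initial)
    for op in ops:
        op(state)
    return state


initial = {"A": 3, "B": 7}
ops = [b_minus_a, a_plus_1]

reference = apply_all(ops, initial)                     # the non-replicated run
replicas = [apply_all(ops, initial) for _ in range(3)]  # same ops, same order
reordered = apply_all(list(reversed(ops)), initial)     # same ops, other order

print(reference)                                  # {'A': 4, 'B': 4}
print(all(r == reference for r in replicas))      # True
print(reordered)                                  # {'A': 4, 'B': 3}
```

Replicas that see both updates in the same order end in the reference state; a replica that sees them in the opposite order computes B from the already incremented A and diverges.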


WHY NOT JUST UDP MULTICAST?

Figure: Isis2 architecture. Isis2 user objects sit on top of the Isis2 library. Labeled components: group instances and multicast protocols (Send, CausalSend, OrderedSend, SafeSend, Query, ...); Membership Oracle (group membership, views, reports of suspected failures, other group members); Flow Control; Large Group Layer; TCP tunnels (overlay); Dr. Multicast; Platform Security; Reliable Sending; Fragmentation; Group Security; Sense Runtime Environment; Self-stabilizing Bootstrap Protocol; Socket Mgt/Send/Rcv; Message Library; “Wrapped” locks; Bounded Buffers.

◻ These systems are complex, especially if you want to run on platforms like EC2


SafeSend and Send are two of the protocol components hosted over what we call the large-scale properties sandbox. The sandbox addresses issues like flow control, security, etc. All protocols share and benefit from those properties.


The SandBox itself is mostly composed of “convergent” protocols that use probabilistic methods
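Gossip is one well-known family of convergent, probabilistic protocols, and a small simulation shows what "convergent" means in practice. Whether the sandbox uses this particular scheme is an assumption; the slide does not say. In the sketch, every node pushes its value to one randomly chosen peer per round, and each node keeps the maximum it has seen.

```python
import random


def gossip_rounds(num_nodes=16, seed=0, max_rounds=100):
    """Push gossip of a maximum; returns (rounds_needed or None, final values)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    values = list(range(num_nodes))    # each node starts with a different value
    target = max(values)
    for round_no in range(1, max_rounds + 1):
        before = list(values)          # all nodes push what they held at round start
        for node in range(num_nodes):
            peer = rng.randrange(num_nodes - 1)
            if peer >= node:           # skip over the sender itself
                peer += 1
            values[peer] = max(values[peer], before[node])
        if all(v == target for v in values):
            return round_no, values
    return None, values


rounds, final = gossip_rounds()
print(f"converged after {rounds} rounds; every node holds {final[0]}")
```

No node ever needs a global view: each round uses only local random choices, yet the whole group ends up in the same state with overwhelming probability.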


SUMMARY

With help from many organizations, Cornell is creating the world’s most robust, secure and consistent system for
• Monitoring sensors, like PMUs, even at large scale and with high data rates
• Hosting smart applications
• Enabling collaborative problem solving, for example by offering grid operators a sharable “virtual iPad” that gives easy access to powerful applications

Our prototype enhances Amazon’s EC2 cloud to host this solution cost-effectively with no compromise in its key properties
