Scalable Elastic System Architecture (SESA)

Scalable Elastic System Architecture (SESA). Dan Schatzberg, Boston University; Jonathan Appavoo, Boston University; Orran Krieger, VMware; Eric Van Hensbergen, IBM Research Austin. Thursday, March 3, 2011

Transcript of Scalable Elastic System Architecture (SESA)

Page 1: Scalable Elastic System Architecture (SESA)

Scalable Elastic System Architecture (SESA)

Dan Schatzberg, Boston University

Jonathan Appavoo, Boston University

Orran Krieger, VMware

Eric Van Hensbergen, IBM Research Austin

Thursday, March 3, 2011

Page 2: Scalable Elastic System Architecture (SESA)

The goal

Perform more computation with fewer resources


Page 3: Scalable Elastic System Architecture (SESA)

Fixed Resources

• Hardware as a fixed resource

• Focus on reducing computation’s need for hardware resources

• Multiplex hardware resources for different computations


Page 4: Scalable Elastic System Architecture (SESA)

Elastic Resources

• Cloud Computing

• Pay as you go hardware

• Focus on providing hardware to the computation that requires it


Page 5: Scalable Elastic System Architecture (SESA)

Time to scale hardware

Days

Fixed Hardware

Minutes

Cloud Computing


Page 6: Scalable Elastic System Architecture (SESA)

Time to scale hardware

Days

Fixed Hardware

Minutes

Cloud Computing

Elastic Applications


Page 7: Scalable Elastic System Architecture (SESA)

Time to scale hardware

Days

Fixed Hardware

Minutes

Cloud Computing

Elastic Applications

Milliseconds

?


Page 8: Scalable Elastic System Architecture (SESA)

Interactive HPC

• Medical imaging application

• interactive

• 1 megapixel image

• quadratic memory consumption: ~14 TB


Page 9: Scalable Elastic System Architecture (SESA)

Interactive HPC

• Fixed Hardware

• Purchase a cluster


Page 10: Scalable Elastic System Architecture (SESA)

Interactive HPC

• Cloud Computing

• Allocate a cluster

• Maintain interactivity

• 650+ EC2 instances: $8,000 per 8-hour day


Page 11: Scalable Elastic System Architecture (SESA)

Can we do better?


Page 12: Scalable Elastic System Architecture (SESA)

Where we’re starting

Treat elasticity as a first-class system characteristic


Page 13: Scalable Elastic System Architecture (SESA)

OUTLINE

1. THE PROBLEM

2. OBSERVATIONS

1. Top-Down Demand

2. Bottom-Up Support

3. Modularity

3. OUR TAKE ON A SOLUTION

4. PROTOTYPE & CHALLENGES


Page 14: Scalable Elastic System Architecture (SESA)

Top-Down Demand

System Interface

Hardware

Software


Page 20: Scalable Elastic System Architecture (SESA)

Events as Load

• Treat a service request as an event that is dispatched to resources

• As events occur, load increases

• As events are handled, load decreases

• Each layer being event-driven forces demand to flow top-down
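A minimal sketch of the events-as-load idea above, assuming each layer keeps a queue of pending events and that handling one event at a layer raises exactly one event at the layer below. The `Layer` class and the three layer names are illustrative and not part of SESA.

```python
from collections import deque


class Layer:
    """One event-driven layer; its load is the number of pending events."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower          # next layer down, or None at the bottom
        self.pending = deque()

    @property
    def load(self):
        return len(self.pending)

    def raise_event(self, event):
        # An event occurring increases this layer's load.
        self.pending.append(event)

    def handle_one(self):
        # Handling an event decreases load here and, in this sketch,
        # raises one event on the layer below (demand flows top-down).
        event = self.pending.popleft()
        if self.lower is not None:
            self.lower.raise_event((self.name, event))
        return event


hardware = Layer("hardware")
system = Layer("system", lower=hardware)
app = Layer("app", lower=system)

app.raise_event("request-1")
app.raise_event("request-2")
loads_before = (app.load, system.load, hardware.load)

app.handle_one()
loads_after = (app.load, system.load, hardware.load)
```

Because load is only ever created by the layer above, the lowest layer sees demand only after the upper layers have dispatched it.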


Page 21: Scalable Elastic System Architecture (SESA)

Bottom-Up Support

System Interface

Hardware


Page 22: Scalable Elastic System Architecture (SESA)

Bottom-Up Support

System Interface

Hardware

Allocate/Deallocate Resources


Page 24: Scalable Elastic System Architecture (SESA)

Elastic Interface

• Support elasticity by interfacing via allocation and deallocation of physical or logical resources

• Each layer is constructed by being explicit with respect to resource consumption

• Be explicit with respect to time to meet a request
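One way the interface described above might look, as a sketch only. `ElasticPool`, `Grant`, and the fixed per-unit latency are assumptions made for illustration; the slides do not specify an API.

```python
from dataclasses import dataclass


@dataclass
class Grant:
    units: int          # resources actually handed over
    latency_ms: float   # time the layer states it needs to meet the request


class ElasticPool:
    """Hypothetical layer that exposes only allocate and deallocate."""

    def __init__(self, capacity, ms_per_unit):
        self.capacity = capacity
        self.in_use = 0
        self.ms_per_unit = ms_per_unit

    def allocate(self, units):
        # Explicit about consumption: never grant more than is free.
        granted = min(units, self.capacity - self.in_use)
        self.in_use += granted
        # Explicit about time: report the cost of meeting the request.
        return Grant(granted, granted * self.ms_per_unit)

    def deallocate(self, units):
        # Return resources; never release more than is currently held.
        released = min(units, self.in_use)
        self.in_use -= released
        return released


pool = ElasticPool(capacity=8, ms_per_unit=2.5)
first = pool.allocate(6)
second = pool.allocate(6)      # only 2 units remain, so only 2 are granted
released = pool.deallocate(5)
```

Returning the granted amount and the stated latency lets the caller, which is itself a layer, decide whether to wait, ask elsewhere, or shed load.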


Page 25: Scalable Elastic System Architecture (SESA)

Modularity

System Interface

Hardware


Page 29: Scalable Elastic System Architecture (SESA)

Object model

• Objects can take advantage of:

• the semantics of their request patterns

• the lifetime of an instance

• the occupancy w.r.t. memory, processing, and communication

• We can optimize for elasticity by taking advantage of modularity in a system


Page 30: Scalable Elastic System Architecture (SESA)

OUTLINE

1. THE PROBLEM

2. OBSERVATIONS

3. OUR TAKE ON A SOLUTION: SESA

1. EBBs: Elastic Building Blocks

2. SEE: Scalable Elastic Executive - A LibOS

3. EPIC: Events as Interrupts

4. PROTOTYPE & CHALLENGES


Page 31: Scalable Elastic System Architecture (SESA)

Architecture Overview

Hardware


Page 32: Scalable Elastic System Architecture (SESA)

Architecture Overview

SSD

FAWN


Page 33: Scalable Elastic System Architecture (SESA)

Architecture Overview

Partitioning

SSD

FAWN


Page 34: Scalable Elastic System Architecture (SESA)

Architecture Overview

SSD

FAWN

Kittyhawk


Page 35: Scalable Elastic System Architecture (SESA)

Architecture Overview

System Software

SSD

FAWN

Kittyhawk


Page 36: Scalable Elastic System Architecture (SESA)

Architecture Overview

SSD

FAWN

Kittyhawk

HAL


Page 37: Scalable Elastic System Architecture (SESA)

Architecture Overview

Applications

SSD

FAWN

Kittyhawk

HAL


Page 38: Scalable Elastic System Architecture (SESA)

Architecture Overview

SSD

FAWN

Kittyhawk

?HAL


Page 40: Scalable Elastic System Architecture (SESA)

SESA

SEHAL SEHAL SEHAL

VM/Node

Elastic Partition of Nodes

VM/Node

VM/Node

SEMachine

SEExecutive

SEE SEE SEE

EBB Namespace

SE APP/SERVICE

Partitioning Layer

System Software Layers

Hardware Abstraction Layer

LibOS Layer

Component Layer


Page 41: Scalable Elastic System Architecture (SESA)

EBBs

EBB NameSpace

A new component model for expressing and encapsulating fine-grained elasticity.

The Next Generation of Clustered Objects.


Page 42: Scalable Elastic System Architecture (SESA)

Clustered Objects (CO)

[Figure: a counter object with the interface inc(), val(), dec(), shown over Memory and Processors. Each of the eight processors (c) is paired with its own counter instance (C) exposing inc, val, and dec.]

dref(ctr)->inc();
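The counter pictured above can be approximated as follows. This is a simplified sketch, not the authors' Clustered Objects code: the processor id is passed explicitly to `dref`, whereas the slide's `dref(ctr)->inc()` resolves the local instance implicitly, and the class names are hypothetical.

```python
class CounterRep:
    """Per-processor representative holding a local partial count."""

    def __init__(self):
        self.count = 0

    def inc(self):
        self.count += 1

    def dec(self):
        self.count -= 1


class ClusteredCounter:
    def __init__(self):
        self.reps = {}  # processor id -> CounterRep, created lazily

    def dref(self, cpu):
        # Resolve the reference to the caller's local representative,
        # creating it on first use.
        if cpu not in self.reps:
            self.reps[cpu] = CounterRep()
        return self.reps[cpu]

    def val(self):
        # Reading the value is the expensive path: sum every representative.
        return sum(rep.count for rep in self.reps.values())


ctr = ClusteredCounter()
for cpu in range(4):
    ctr.dref(cpu).inc()     # each processor touches only its own rep
ctr.dref(0).inc()
ctr.dref(3).dec()
total = ctr.val()
```

Updates stay local to a processor, so `inc` and `dec` never contend; only `val` has to visit every representative.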


Page 43: Scalable Elastic System Architecture (SESA)

What did we learn?

• Event-driven architecture for lazy and dynamic instantiation of resources

• Mechanism to create scalable software


Page 44: Scalable Elastic System Architecture (SESA)

Elastic Building Blocks

• Programming Model for Elastic and Scalable Components

• Span multiple nodes

• Built-in on-demand nature: encapsulates policies for both allocation and deallocation of resources
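To make the on-demand idea concrete, here is a sketch of a component that allocates a per-node representative on first use and reclaims it after a period of idleness. The `ElasticBlock` class, the tick-based idle policy, and the node names are assumptions for illustration; the slides do not describe the actual EBB policies.

```python
class ElasticBlock:
    """Hypothetical EBB-like component with its own alloc/dealloc policy."""

    def __init__(self, idle_limit):
        self.idle_limit = idle_limit
        self.reps = {}   # node id -> ticks since last use

    def invoke(self, node):
        # On-demand allocation: the first call on a node creates its rep.
        created = node not in self.reps
        self.reps[node] = 0
        return created

    def tick(self):
        # Deallocation policy: drop reps idle for more than idle_limit ticks.
        for node in list(self.reps):
            self.reps[node] += 1
            if self.reps[node] > self.idle_limit:
                del self.reps[node]

    def span(self):
        # Nodes the component currently occupies.
        return sorted(self.reps)


ebb = ElasticBlock(idle_limit=2)
created_a = ebb.invoke("node-a")
created_b = ebb.invoke("node-b")
ebb.tick()
ebb.tick()
reused_a = ebb.invoke("node-a")   # node-a is used again and stays allocated
ebb.tick()                        # node-b has now been idle for 3 ticks
```

The component grows to a node only when a request arrives there and shrinks on its own, so callers never manage its footprint directly.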


Page 45: Scalable Elastic System Architecture (SESA)

SEE

SEExecutive

SEE SEE SEE

A distributed library OS model designed to enable elastic software within the context of legacy environments.

Next Generation of Libra


Page 46: Scalable Elastic System Architecture (SESA)

Libra

[Figure 1. Proposed system architecture. A hypervisor hosts a controller partition running a general-purpose OS with a gateway, plus application partitions (JVM, DB, App), each running on Libra.]

nels, hypervisors run other operating systems with few or no modifications [27, 5, 42]. By running an operating system (the controller) in a hypervisor partition, applications can be migrated incrementally to an exokernel system. In the first step of the migration, the application runs in its own partition instead of as an operating system process but accesses system services on the controller remotely through the 9P distributed filesystem protocol [32]. Later, as needed, these relatively expensive remote calls are replaced by efficient library implementations of the same abstractions.

Proponents of exokernels object to hypervisors because “[hypervisors] confine specialized operating systems and associated processes to isolated virtual machines...sacrificing a single view of the machine” [25]. However, in our approach, a controller partition provides a single view of the machine: applications running in other partitions correspond to processes at the controller. This is true even for applications that run on other machines in a cluster. Section 2 explains how Libra achieves this unified machine view.

This paper makes three main contributions:

• The hypervisor approach for transforming existing systems into high-performance, specialized systems.

• A case study in which Nutch, a large distributed application that includes a commercial JVM (IBM’s J9 [4]), is transformed into such a system.

• Examples of Nutch-specific optimizations that result in good performance.

The rest of the paper is organized as follows. Section 2 explains the overall architecture we are proposing. Section 3 describes the implementation of Libra and our port of J9 to Libra. Details of our port of Nutch to J9/Libra, including specializations that improve its throughput, are described in Section 4. Section 5 evaluates J9/Libra’s performance on various benchmarks, including Nutch, the standard SPECjvm98 [39] and SPECjbb2000 [38] benchmarks, and two microbenchmarks. Section 6 discusses related work. Finally, Section 7 outlines future work and Section 8 concludes the paper.

2. Design

This section explains our proposed architecture, which is depicted in Figure 1. At the bottom of the software stack, a hypervisor hosts a controller partition and one or more application partitions. The hypervisor interacts directly with the hardware to provision hardware resources among partitions, providing high-level memory, processor, and device management.

Above the hypervisor, a general-purpose operating system such as Linux runs as a “controller” partition. The controller is the point of administrative control for the system and provides a familiar environment for both users and applications. Each application partition is launched from the controller by a script that invokes the hypervisor to create a new partition and load the application into it. This script also launches a gateway server that permits the application to access the controller’s resources, services, and environment.

The gateway server is an extended version of Inferno [31], which is a compact operating system that can run on other operating systems. Inferno creates a private, file-like namespace that contains services such as the user’s console, the controller’s filesystem, and the network (see Figure 2). The application accesses this namespace remotely via the 9P resource sharing protocol [30], which runs over a shared-memory transport established between the controller and application partitions. A more detailed description of our extensions to Inferno and a preliminary performance analysis of the transport is available in another paper [20].

Note that nothing in the architecture requires applications to access all resources through the gateway, because the hypervisor allows resources and peripherals to be dedicated to an application partition and accessed directly. Alternatively, applications can use 9P to access resources across the network, either directly or through the gateway. Facilities for redundant resource servers, fail over, and automated recovery have also been explored [16].

As in an exokernel system, each application is linked with a library operating system (libOS). However, unlike in exokernel systems, the libOS focuses only on performance-critical services; other services are obtained from the controller through the gateway. For example, Libra, the libOS described in this paper, contains a thread library implementation but accesses files remotely.

This approach not only reduces the cost of developing a libOS but also reduces the cost of administering new partitions. Because applications share filesystems, network configuration, and system configuration with the controller, administering an additional application partition is cheaper than administering an additional operating system partition. Also, because 9P gateways run as the user who launched the application, they inherit the same permissions and limitations, taking advantage of existing security mechanisms and resource policies.

Finally, because we use a hypervisor instead of an exokernel to multiplex system resources, applications can use privileged execution modes and instructions. This enables optimizations and eases migration, because a libOS can be simply a pared-down, general-purpose operating system. The combination of supervisor-mode capability and 9P for access to remote services is flexible: application partitions can be like traditional operating systems, self-contained with statically-allocated hardware resources; like microkernel applications, with system services spread across various protection domains; or like a hybrid of the two architectures that makes sense for the particular workload being executed.

3. J9/Libra Implementation

This section describes the implementation of Libra and the port of the J9 [4] virtual machine to this new platform. It was a somewhat atypical porting process, since we were simultaneously porting J9 to the Libra abstractions while designing and implementing new Libra abstractions to support J9’s execution. We begin by introducing the relevant aspects of J9 and briefly describing our porting and debugging methodology. Next, we discuss the major Libra subsystems needed by J9. Finally we highlight the major limitations of our current implementation.

3.1 J9 Overview

J9 is one of IBM’s production JVMs and is deployed on more than a dozen major platforms ranging from cell phones to zSeries mainframes. Because it runs on such a diverse set of platforms, a great deal of effort has been invested by the J9 team in defining

Architecture


Page 47: Scalable Elastic System Architecture (SESA)

Libra

PowerPC Blades: Libra Workers

Pool of Libra Partitions

X86 Linux Front Ends

9p

$

$


Page 48: Scalable Elastic System Architecture (SESA)

Libra

PowerPC Blades: Libra Workers

Pool of Libra Partitions

X86 Linux Front Ends

$ java -cp my.jar

9p

$


Page 49: Scalable Elastic System Architecture (SESA)

Libra

PowerPC Blades: Libra Workers

Pool of Libra Partitions

X86 Linux Front Ends

9p

$

$ java -cp my.jar


Page 50: Scalable Elastic System Architecture (SESA)

Libra

PowerPC Blades: Libra Workers

Pool of Libra Partitions

X86 Linux Front Ends

$ java -cp my.jar

9p

$ for ((i=0;i<44;i++)); do java -cp my.jar & done


Page 53: Scalable Elastic System Architecture (SESA)

What did we learn?

• Specialized environment for each application

• Lightweight system layer implementing services for performance

• General purpose OS for non-performance critical services
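The split between a lightweight library layer and a general-purpose OS can be sketched as a dispatch table with a remote fallback. Everything here is hypothetical: `LibOS`, `Controller`, and the service names stand in for the Libra libOS and the 9P gateway, and no real 9P traffic is involved.

```python
class Controller:
    """Stands in for the general-purpose OS reached through the gateway."""

    def __init__(self):
        self.calls = []

    def remote_call(self, service, *args):
        # Record the request; a real system would forward it over 9P.
        self.calls.append(service)
        return f"remote:{service}"


class LibOS:
    def __init__(self, controller, local_services):
        self.controller = controller
        self.local = dict(local_services)

    def call(self, service, *args):
        handler = self.local.get(service)
        if handler is not None:
            return handler(*args)        # fast path inside the partition
        return self.controller.remote_call(service, *args)

    def specialize(self, service, handler):
        # Later migration step: replace a remote call with a library version.
        self.local[service] = handler


controller = Controller()
libos = LibOS(controller, {"thread_create": lambda name: f"local:thread:{name}"})

a = libos.call("thread_create", "worker")   # performance-critical, local
b = libos.call("file_open", "/data")        # not implemented locally, remote
libos.specialize("file_open", lambda path: f"local:file:{path}")
c = libos.call("file_open", "/data")        # now served by the library
```

The fallback keeps every service available from day one, and `specialize` models moving a service into the library only once it proves to be a bottleneck.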


Page 54: Scalable Elastic System Architecture (SESA)

SEE: A LibOS for SESA

• Distributed LibOS that can elastically span nodes

• Instances cooperate to support the allocation and deallocation of EBBs

• Enables compatibility with front-end nodes via a unified 9p namespace

locality-aware memory allocator

event dispatcher

inter-node communication protocols and primitives

EBB Infrastructure

FS Name Space Protocol (9p)

Scalable Elastic Executive (SEE)

Per-node EBB manifestations

SEHAL


Page 55: Scalable Elastic System Architecture (SESA)

SEMachines and EPICs

SEHAL SEHAL SEHAL

SEMachine Hardware Abstraction Layer : EPIC


Page 56: Scalable Elastic System Architecture (SESA)

Programmable Interrupt Controller

1 10 0

Interrupt

Execution

Source


Page 57: Scalable Elastic System Architecture (SESA)

1 11 0

Interrupt

Execution

Source

Programmable Interrupt Controller


Page 58: Scalable Elastic System Architecture (SESA)

1 11 0

Event

Action

Source

Programmable Interrupt Controller


Page 59: Scalable Elastic System Architecture (SESA)

Elastic Programmable Interrupt Controller

Event

Action

Source

1 11 0 0 001


Page 60: Scalable Elastic System Architecture (SESA)

1101

Event

Action

...

...

110 1

...

Elastic Programmable Interrupt Controller

Source


Page 61: Scalable Elastic System Architecture (SESA)

Elastic Programmable Interrupt Controller

• Programmed by the SEE

• Provides the minimum requirement of elastic applications - mapping load to resources

• Portable layer

• Take advantage of network features such as broadcast and multicast
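As an illustration of mapping load to resources, the sketch below models an EPIC-like table that maps each event source to a bitmask of target nodes and lets that mask grow or shrink at run time. The `Epic` class, its method names, and the bitmask encoding are assumptions suggested by the bit patterns on the earlier slides, not a description of the real EPIC.

```python
class Epic:
    """Hypothetical table: event source -> bitmask of target nodes."""

    def __init__(self):
        self.masks = {}   # source -> int, bit i set means node i is a target

    def program(self, source, nodes):
        # Install the initial set of nodes that receive this source's events.
        mask = 0
        for n in nodes:
            mask |= 1 << n
        self.masks[source] = mask

    def grow(self, source, node):
        # Elastic step: add a node to the set handling this source.
        self.masks[source] = self.masks.get(source, 0) | (1 << node)

    def shrink(self, source, node):
        # Elastic step: remove a node from the set handling this source.
        self.masks[source] = self.masks.get(source, 0) & ~(1 << node)

    def dispatch(self, source):
        # Return the node ids an event from this source is delivered to.
        mask = self.masks.get(source, 0)
        return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


epic = Epic()
epic.program("net-rx", [0, 1])
epic.grow("net-rx", 5)
targets_grown = epic.dispatch("net-rx")
epic.shrink("net-rx", 1)
targets_shrunk = epic.dispatch("net-rx")
```

A dispatch that returns several nodes is where network broadcast or multicast could be used instead of sending one message per node.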


Page 62: Scalable Elastic System Architecture (SESA)

OUTLINE

1. THE PROBLEM

2. OBSERVATIONS

3. OUR TAKE ON A SOLUTION

4. PROTOTYPE & CHALLENGES


Page 63: Scalable Elastic System Architecture (SESA)

PROTOTYPE APP

Kittyhawk

Elastic Matrix Cache

ElasticMatrixOps

Sage*

OL

SESA SAGE SERVICES

SEE: EBBs + EHAL

Traditional HW

Advanced HW


Page 64: Scalable Elastic System Architecture (SESA)

Challenges and Discussion


Page 65: Scalable Elastic System Architecture (SESA)

OUTLINE

1. THE PROBLEM

1. Pay as you go computing

2. Insufficient systems support for elasticity

2. OBSERVATIONS

3. OUR TAKE ON A SOLUTION

4. PROTOTYPE & CHALLENGES


Page 66: Scalable Elastic System Architecture (SESA)

Provider

Consumer

Software

Pay as you go hardware


Page 67: Scalable Elastic System Architecture (SESA)

Provider

Consumer

Request

Software

Pay as you go hardware


Page 69: Scalable Elastic System Architecture (SESA)

Elastic Website

Provider

Consumer

Load Balancer


Page 72: Scalable Elastic System Architecture (SESA)

Other Elastic Applications

• Analytics

• Batch computation

• Stream processing


Page 73: Scalable Elastic System Architecture (SESA)

What’s the problem?

• Allocation/Boot-time

• Programmability


Page 74: Scalable Elastic System Architecture (SESA)

Medical Imaging Application

• Megapixel image

• Quadratic algorithm

• (1 million pixels × 4 bytes/pixel)^2 ≈ 14 TB

• On Amazon EC2: ~$8,000 per 8-hour day
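The figures above can be checked with a few lines of arithmetic. The 650-instance count and the 8-hour day come from the earlier Interactive HPC slide; the per-instance memory and hourly price are derived here under the assumption that memory and cost are split evenly across exactly 650 instances, which the slides do not state.

```python
pixels = 1_000_000
bytes_per_pixel = 4

# Quadratic memory consumption: (bytes in one image) squared.
total_bytes = (pixels * bytes_per_pixel) ** 2

tb_decimal = total_bytes / 10**12    # terabytes, 10^12 bytes each
tib_binary = total_bytes / 2**40     # tebibytes, 2^40 bytes each

# Assumption: the load is spread evenly over exactly 650 instances.
instances = 650
per_instance_gb = total_bytes / instances / 10**9

# Assumption: $8,000 buys all 650 instances for one 8-hour day.
cost_per_instance_hour = 8000 / (instances * 8)

print(f"{total_bytes:.2e} bytes, {tb_decimal:.1f} TB, {tib_binary:.2f} TiB")
print(f"{per_instance_gb:.1f} GB per instance")
print(f"${cost_per_instance_hour:.2f} per instance-hour")
```

The slide's "~14 TB" matches the binary figure: 1.6 × 10^13 bytes is 16 TB in decimal units and about 14.55 TiB in binary units.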


Page 75: Scalable Elastic System Architecture (SESA)

Snowflock

Provider

Consumer


Page 78: Scalable Elastic System Architecture (SESA)

Distributing an Object

Non-Distributed Object Instance

LR0

R1

R2

Region List Lock

Region List

Other Data Structures


Page 79: Scalable Elastic System Architecture (SESA)

LR0

R1

R2

Root

LR0

Rep0

LR0

Rep2R2

LR1

Rep1


Page 80: Scalable Elastic System Architecture (SESA)

Elastic Programmable Interrupt Controller

1 11 0

Event

Action

Source
