AppScale at CLOUDCOMP '09

Scalable and Open AppEngine Development and Deployment

Navraj Chohan, Chris Bunch, Sydney Pang, Chandra Krintz, Nagy Mostafa, Sunil Soman, Rich Wolski

http://www.capgemini.com/technology-blog/2009/04/from_lamp_to_leap_and_beyond.php

Terminology

Infrastructure-as-a-Service (IaaS) e.g., Amazon Web Services

Provides full system images

Platform-as-a-Service (PaaS) e.g., Google App Engine

Provides scalable runtime stack

Software-as-a-Service (SaaS) e.g., SalesForce, Gmail

Provides remote application access

•  Open-source, Platform-as-a-Service for research and engineering of cloud computing components, applications, and services

•  Automated deployment of applications to high-performance databases

•  Fine-grained control over application environment
•  Google App Engine app hosting on your cluster
–  Real applications
–  Familiar API (that is extensible for lock-in avoidance)
–  Your data and code on your resources

From Google App Engine (GAE) to AppScale


•  GAE Application Programming Interface
–  Datastore (get/put): BigTable
–  Memcache: Memcached
–  URL Fetching
–  Mail: GMail
–  Images
–  Authentication: Google Accounts
•  Write Python/Java GAE app
–  Use SDK locally to test and generate indexes
   •  APIs implemented as non-scalable, simple versions
–  Upload to Google resources
   •  Highly scalable API implementation

Sandboxed Runtime

•  Restricted subset of library calls
•  No reading/writing from/to file system
•  Data persistence only via get/put interface
•  Computation bounded: 30 secs per request
•  Access web services via HTTP / HTTPS only (ports 80 and 443)

Recent GAE Additions

•  Python and JVM SDKs
–  JRuby, Clojure, etc. available through Java
•  Task Queue, Cron, XMPP APIs
•  New SLAs for paying customers
–  $0.10 per CPU core hour
–  $0.10 per GB bandwidth in
–  $0.12 per GB bandwidth out
–  $0.15 per GB data stored per month
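As a rough illustration of how the listed rates combine, the sketch below totals a monthly bill. The four rates come from the slide; the function name, the usage figures, and the assumption that the charges simply add up (with no free quota or other fees) are illustrative only.

```python
# Rates as listed on the slide (US dollars).
CPU_PER_CORE_HOUR = 0.10
BANDWIDTH_IN_PER_GB = 0.10
BANDWIDTH_OUT_PER_GB = 0.12
STORAGE_PER_GB_MONTH = 0.15


def estimate_monthly_cost(cpu_core_hours, gb_in, gb_out, gb_stored):
    """Sum the four charges; assumes no free quota and no other fees."""
    return (cpu_core_hours * CPU_PER_CORE_HOUR
            + gb_in * BANDWIDTH_IN_PER_GB
            + gb_out * BANDWIDTH_OUT_PER_GB
            + gb_stored * STORAGE_PER_GB_MONTH)


# Hypothetical usage: 100 core hours, 10 GB in, 20 GB out, 5 GB stored.
cost = estimate_monthly_cost(100, 10, 20, 5)
print(f"${cost:.2f}")  # prints $14.15
```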

Protocol Buffers

•  Google App Engine’s internal data format
–  And AppScale’s
•  Similar to C-style structs:

message Person {
  required int32 id = 1;
  optional string name = 2;
}
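To make the format concrete, here is a minimal hand-rolled sketch of how the Person message above maps to the Protocol Buffers wire format. It is not the protobuf library and is not taken from AppScale; the function names are hypothetical, and it handles only non-negative ids.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_person(person_id, name=None):
    """Serialize Person: field 1 is a varint, field 2 is length-delimited."""
    # Each field starts with a tag byte: (field_number << 3) | wire_type.
    data = bytes([(1 << 3) | 0]) + encode_varint(person_id)
    if name is not None:  # an unset optional field is omitted entirely
        raw = name.encode("utf-8")
        data += bytes([(2 << 3) | 2]) + encode_varint(len(raw)) + raw
    return data


encoded = encode_person(150, "Al")
print(encoded.hex())  # prints 0896011202416c
```

The field numbers (1 and 2) travel on the wire while the field names do not, which is why they are fixed in the message definition.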

•  AppScale extends the GAE SDK
–  Replaces the simple, non-scalable API implementation with pluggable, distributed, scalable components
   •  Using open-source solutions as available/possible
   •  Communication over SSL
•  Available as source and as system image
–  Each instance can implement any component
•  Self configuring as part of AppScale cloud deployment
–  Deploys over
   •  Virtual machine monitors (Xen, KVM)
   •  Infrastructure (IaaS) cloud layers

IaaS Cloud Systems

•  Amazon Web Services (AWS)
–  Elastic Compute Cloud (EC2), Persistent Storage (S3, EBS)
–  For-fee, as negotiated in SLA (CPU, network, storage)
–  Vast resources available
   •  Users access small (opaque) subset, can scale-out
•  Eucalyptus
–  Open source implementation of the AWS APIs
–  Inspiration for AppScale: familiar, widely-used API implementation for execution on your cluster
   •  Limited only by the hardware you have available

Differences in AppScale Deployment Options

•  Xen / KVM:
–  Static deployment
   •  Can use as many nodes as are manually configured
•  Eucalyptus / EC2
–  Dynamic deployment
   •  Can use as many nodes as the system can support (or as you pay for in an EC2 deployment)
–  As part of ongoing/future work: support for dynamic scaling
   •  Front-end (user-facing) & back-end (data management & computation)
   •  SLA renegotiation

AppScale System Layout

[Diagram labels: GAE App Developer (AppScale Admin), AppScale tools, App Controller, ALB, AS, DB M/P, DB S/P, GAE App, GAE App Users]

•  AppLoadBalancer (ALB)
•  AppServer (AS)
•  Database Master/Slave/Peer (DB M/S/P)

AppController (AC)

•  SOAP Server written in Ruby
–  Runs on all nodes
•  Middleware layer
•  Controls and sets up a node for use
–  Sets up configuration files (data replication)
–  Sets up firewall for security
•  Master AC “heartbeats” all other nodes
–  Collects performance info as well
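The heartbeat bookkeeping described above can be sketched roughly as follows. This is an illustration in Python, not the AppController's Ruby code: the class name, the 30-second timeout, and the shape of the performance data are all assumptions.

```python
import time


class HeartbeatMonitor:
    """Tracks when each node last answered the master's heartbeat."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds  # assumed value, not from the talk
        self.clock = clock
        self.last_seen = {}
        self.stats = {}

    def record_reply(self, node, performance_info):
        """Store the reply time and the performance data the node reported."""
        self.last_seen[node] = self.clock()
        self.stats[node] = performance_info

    def failed_nodes(self):
        """Return nodes whose last reply is older than the timeout."""
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)


# A fake clock keeps the example deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=30.0, clock=lambda: fake_now[0])
monitor.record_reply("node-1", {"cpu": 0.42})
monitor.record_reply("node-2", {"cpu": 0.10})
fake_now[0] = 20.0
monitor.record_reply("node-1", {"cpu": 0.40})
fake_now[0] = 45.0
print(monitor.failed_nodes())  # prints ['node-2']
```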

AppLoadBalancer (ALB)

•  Ruby on Rails application
•  Handles authentication and routing of users to AppServers
•  Three copies are deployed via Mongrel
–  Load balanced via nginx
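As a sketch of what balancing across the three Mongrel copies amounts to, the snippet below cycles through backends in round-robin order, which is nginx's default strategy for an upstream group. The talk does not say which strategy AppScale configures, and the addresses and class name here are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self):
        return next(self._cycle)


# Hypothetical local addresses for the three Mongrel copies.
balancer = RoundRobinBalancer(
    ["127.0.0.1:8000", "127.0.0.1:8001", "127.0.0.1:8002"])
picks = [balancer.next_backend() for _ in range(4)]
print(picks)  # the fourth pick wraps around to the first backend
```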

Database Management

•  Five databases currently available:
–  HBase, Hypertable: Master / Slave
–  Cassandra, Voldemort: Peer / Peer
–  Clustered MySQL: Relational
•  Two main components
–  Protocol Buffer Server: Data access / storage
–  User / App Server: Authentication

AppServer (AS)

•  Modified Google App Engine SDK
•  App requests internally are Protocol Buffers
–  Forwards requests to PB Server
•  Minimal request set:
–  Put(id)
–  Get(id)
–  Query: Equivalent to get_all_in_table
–  Delete(id)
–  Count: Total number of items in database
–  GetSchema
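The minimal request set can be pictured as a small interface that each database backend has to provide. The following is an in-memory stand-in written for illustration, not AppScale's PB Server; the class name, method signatures, and sample data are assumptions.

```python
class InMemoryStore:
    """Toy stand-in for a database behind the PB Server (hypothetical)."""

    def __init__(self, schema):
        self._schema = list(schema)
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def query(self):
        """Return every (key, value) pair, like get_all_in_table."""
        return sorted(self._rows.items())

    def delete(self, key):
        self._rows.pop(key, None)

    def count(self):
        return len(self._rows)

    def get_schema(self):
        return list(self._schema)


store = InMemoryStore(schema=["author", "message"])
store.put("g1", {"author": "alice", "message": "hi"})
store.put("g2", {"author": "bob", "message": "hello"})
store.delete("g1")
# query() returns the whole table, so any filtering happens in the caller.
by_bob = [key for key, row in store.query() if row["author"] == "bob"]
print(store.count(), by_bob)
```

Because Query returns the full table, a backend only needs key-based operations plus a scan, which keeps the interface easy to implement across very different databases.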

AppScale Tools

•  Ruby scripts that initiate AppScale deployment
–  Initializes the first AppController for use
–  Uploads AppEngine app
•  Conceptually similar to Amazon AWS EC2 tools
–  describe-instances
–  upload-app: Introduce additional apps
–  terminate-instances

Fault Tolerance

•  System can survive the following failures:
–  AppServer failure
–  Database Slave failure
–  Database Peer failure
–  AppLoadBalancer failure *
–  AppController failure *

Testing Methodology

•  Load testing done via the Grinder
•  Test specifics:
–  Initially 3 users
–  3 users added every 5 seconds
–  Done until 160 seconds have passed
•  Each user navigates the page, performs some scripted action
•  Measured total transactions performed and average response time
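The ramp-up schedule can be written as a small function. This sketch assumes new users join exactly at each 5-second boundary and that no user leaves before the run ends; the function name is hypothetical.

```python
def active_users(elapsed_seconds, initial=3, step=3, interval=5):
    """Number of simulated users active after the given elapsed time."""
    if elapsed_seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    return initial + step * int(elapsed_seconds // interval)


for t in (0, 5, 60, 159):
    print(t, active_users(t))
# 0 3
# 5 6
# 60 39
# 159 96
```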

AppScale Evaluation Cluster

•  Three Grinder nodes, four AppScale nodes
–  One master, three slaves
–  Virtualized via Xen
–  Database: HBase (3x replication), 64 MB HDFS blocks
   •  PBServer via Thrift; stores entire protocol buffers
•  Hardware
–  Quad-core 2.66 GHz machines
–  8 GB of RAM
–  Connected via Gigabit Ethernet

Applications Tested

•  Tasks: a to-do list
–  Read and write intensive (44 transactions per user)
•  Cccwiki: allows users to edit web pages
–  Read intensive, updates only (74 transactions per user)
•  Guestbook: allows users to post messages
–  Retrieves ten most recent posts only (9 transactions per user)
•  Shell: provides an interactive Python shell
–  Compute intensive (14 transactions per user)

Transactions per App

App Response Time

Comparison with Google

Room for Improvement

•  Current bottlenecks:
–  Queries perform filtering server-side
–  Filtering is done outside of the DB
–  AppEngine, PB Server are single-threaded
–  Entry point to some DBs is single-threaded
•  Future work will address these problems
–  Will also compare performance across DBs
–  e.g., BigTable-like DBs vs. P2P DBs

Related Work

•  AppDrop
–  Proof-of-concept Rails app
•  TyphoonAE
–  Relatively new (alpha release)
–  Runs MongoDB only
•  Microsoft Azure
–  Uses .NET as the platform
–  Has a similar pricing model to AppEngine

AppScale Recap

•  Distributed, multi-component system
–  Deployed as a single system image (self configuring)
   •  Static deployment over Xen/KVM
   •  Dynamic deployment over Eucalyptus/EC2
•  Databases supported:
–  HBase, Hypertable, MySQL, Cassandra, Voldemort
•  Fault-tolerant

AppScale Recap

•  Open cloud research platform
–  International user community
•  Goals
–  Easy to use and extend
–  Automatic deployment of PaaS cloud and GAE apps on resources other than Google’s
–  Support real applications and users
   •  Experimentation and testing in real environments
•  Current performance results are a baseline

Performance Improvements

•  AppEngine now multi-process, load balanced
•  PB Server now multi-threaded
•  Storing data like Google for HBase and Hypertable
–  Three tables: Reference, Sort Ascending, Sort Descending
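The talk does not spell out what the three tables hold. One plausible reading, sketched below purely as an assumption, is that the Reference table maps entity keys to entities while the two sort tables are index tables whose row keys order a property value ascending or descending, so a plain sorted row scan answers an ordered query. All names and the key layout are hypothetical.

```python
WIDTH = 10  # zero-pad integers so string order matches numeric order
MAX_VALUE = 10 ** WIDTH - 1

reference = {}        # entity key -> entity
sort_ascending = {}   # index row key -> entity key
sort_descending = {}  # index row key -> entity key


def asc_key(value, entity_key):
    return f"{value:0{WIDTH}d}/{entity_key}"


def desc_key(value, entity_key):
    # Subtracting from the maximum reverses the sort order of the value.
    return f"{MAX_VALUE - value:0{WIDTH}d}/{entity_key}"


def put(entity_key, entity, prop):
    """Write the entity and one index row per sort direction."""
    reference[entity_key] = entity
    sort_ascending[asc_key(entity[prop], entity_key)] = entity_key
    sort_descending[desc_key(entity[prop], entity_key)] = entity_key


def scan(index, limit):
    """Emulate the sorted row scan a store like HBase provides natively."""
    return [reference[index[row]] for row in sorted(index)[:limit]]


put("post1", {"text": "first", "posted_at": 100}, "posted_at")
put("post2", {"text": "second", "posted_at": 250}, "posted_at")
put("post3", {"text": "third", "posted_at": 175}, "posted_at")

newest = [e["text"] for e in scan(sort_descending, 2)]
oldest = [e["text"] for e in scan(sort_ascending, 2)]
print(newest, oldest)
```

With this layout, a query such as "ten most recent posts" becomes a short scan of the descending table instead of a full table read followed by filtering outside the database.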

Future Work

•  Expand out of the web services domain
–  Investigating opportunities in streaming
–  Integrated MapReduce support for high-performance computing (HPC)
–  Co-locate AppEngines and use shared memory
•  Additional databases:
–  MongoDB, Scalaris, CouchDB

Thanks!

•  To the AppScale team!
–  Co-lead Navraj Chohan
–  Advisor Prof. Chandra Krintz
•  To the open-source community
•  To Google, NSF, and IBM for financial support
•  To you all for coming out today
•  Check us out on the web:
–  http://appscale.cs.ucsb.edu
