Building Web APIs that Scale

Designing for Graceful Degradation

Evan Cooke, Twilio, CTO

@emcooke



Cloud services and the APIs they power are becoming the backbone of modern society. APIs support the apps that structure how we work, play, and communicate.

Twilio
Observations today based on experience building @twilio
• Founded in 2008
• Infrastructure APIs to automate phone and SMS communications
• 120 employees
• >1000 servers running 24x7

Cloud Workloads Can Be Unpredictable
[Figure: Twilio SMS API traffic, showing a 6x spike in 5 mins]
No time…
• for a human to respond to a pager
• to boot new servers

Typical Scenario
[Figure: request latency vs. load, ending in FAIL]
Danger! Load higher than instantaneous throughput

Goal Today

Support graceful degradation of API performance under extreme load

No Failure

[Diagram: incoming requests → load balancer → app servers, each with AAA and throttling layers → worker pool]

Why Failures?

[Figure: failed requests over time; labels 10%, 70%, 100%+]

Problem Summary
Worker pools, e.g., Apache/Nginx
• Cloud services often use worker pools to handle incoming requests
• When load goes beyond the size of the worker pool, requests fail
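The worker-pool failure mode can be sketched in a few lines. This is a toy model only, assuming every in-flight request holds one worker until it completes; the class `FixedWorkerPool` and the function `simulate_burst` are hypothetical names chosen for illustration, and the pool size of 8 is arbitrary.

```python
class FixedWorkerPool:
    """Toy model of a fixed-size worker pool (hypothetical, for illustration)."""

    def __init__(self, size):
        self.size = size
        self.busy = 0

    def try_accept(self):
        """Claim a worker for a new request; return False if none is free."""
        if self.busy >= self.size:
            return False  # pool exhausted: this request fails
        self.busy += 1
        return True

    def finish(self):
        """Release a worker when its request completes."""
        if self.busy > 0:
            self.busy -= 1


def simulate_burst(pool, requests):
    """Send a burst of requests that all arrive before any one completes."""
    accepted = sum(1 for _ in range(requests) if pool.try_accept())
    return accepted, requests - accepted


# A 6x spike against a pool sized for the normal load of 8 concurrent requests.
served, failed = simulate_burst(FixedWorkerPool(size=8), requests=48)
print(served, failed)
```

With blocking workers, everything beyond the pool size fails outright, which is the behavior the rest of the talk tries to avoid.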

Queues to the rescue?
[Diagram: incoming requests → queue → process & respond]
1. If we synchronously respond, each item in the queue still ties up a worker. Doh.
2. If we close the incoming connection and free the worker, then we need an asynchronous callback to respond to the request. Doh.

Observation 1

A synchronous web API is often much easier for developers to integrate, because callbacks add complexity.
Implication: Responding to requests synchronously is often preferable to queuing the request and responding with an asynchronous callback.

Synchronous vs. Asynchronous Interfaces
Task: take POST data from a web form, send it to a geo lookup API, store the result in the DB, and return a status page to the user.

Sync:
d = read_form();
geo = api->lookup(d);
db->store(d, geo);
return "success";

Async:
d = read_form();
api->lookup(d);
# in /geo-result
db->store(d, geo);
ws->send("success");

The async interface needs a separate URL handler and a websocket connection to return the result.

Observation 2

For many APIs, taking additional time to service a request is better than failing that specific request.
Implication: In many cases, it is better to service a request with some delay than to fail it.

Observation 3
It is better to fail some requests than all incoming requests.
Implication: Under load, it may be better to selectively drop expensive requests that can't be serviced and allow others.

Event-driven programming and the Reactor Pattern

Thread/Worker Model
req = 'GET /';
req.append('\r\n\r\n');
socket.write(req);
resp = socket.read();
print(resp);
[Figure: relative worker time per line: 1, 1, 10000x, 10000000x, 10]


Huge IO latency blocks worker

Event-based Programming
req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});
Make IO operations async and "callback" when done.

Reactor Dispatcher
req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});
reactor.run_forever();
Central dispatch to coordinate event callbacks.
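To make the dispatcher idea concrete, here is a minimal sketch of a reactor built on Python's standard `selectors` module. It is not how any of the frameworks in this talk are implemented; the `Reactor` class, its method names, and the `handle_response` callback are assumptions made for illustration. A local socket pair stands in for a real network connection.

```python
import selectors
import socket


class Reactor:
    """Minimal single-threaded reactor: wait for IO readiness, then dispatch."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def on_readable(self, sock, callback):
        # Register interest; the callback runs only once data is ready to read.
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def remove(self, sock):
        self.selector.unregister(sock)

    def run_until_idle(self, timeout=1.0):
        # Dispatch events until nothing is registered, or until every
        # registered socket stays silent for a full timeout.
        while self.selector.get_map():
            events = self.selector.select(timeout=timeout)
            if not events:
                break
            for key, _mask in events:
                key.data(key.fileobj)  # key.data holds the callback


received = []
reactor = Reactor()
client, server = socket.socketpair()
client.setblocking(False)
server.setblocking(False)


def handle_response(sock):
    # Called by the reactor, so recv() returns at once without blocking.
    received.append(sock.recv(1024))
    reactor.remove(sock)
    sock.close()


reactor.on_readable(client, handle_response)
server.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
reactor.run_until_idle()
server.close()
print(received)
```

The worker never sits inside a blocking `read()`; it only runs a callback after the selector reports that data is available.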

Non-blocking IO
[Figure: relative worker time per line for the same code: 1, 1, 10, 10, 10]
No delay blocking the worker waiting for IO.

Request Response Decoupling
Using this approach we can decouple the socket of an incoming connection from the processing of that connection.

(Some) Reactor-Pattern Frameworks
• js/node.js
• python/twisted
• python/gevent
• c/libevent
• c/libev
• ruby/eventmachine
• java/nio/netty
• Goliath
• Cramp

Callback Spaghetti
Example of callback nesting complexity with Python Twisted (also node.js):
req = 'GET /'
req += '\r\n\r\n'

def r(resp):
    print(resp)

def w(_):
    # Runs after the write completes; chain the read and its callback.
    socket.read().addCallback(r)

socket.write(req).addCallback(w)

inlineCallbacks to the Rescue
req = 'GET /'
req += '\r\n\r\n'
yield socket.write(req)
resp = yield socket.read()
print(resp)
We can clean up the callbacks using deferred generators and inline callbacks (similar frameworks also exist for js).

Easy Sequential Programming
The same code gives easy sequential programming with mostly implicit asynchronous IO.
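The generator trick can be illustrated without any framework. The sketch below is not Twisted's `inlineCallbacks` implementation; it is a simplified stand-in in which `Op`, `fetch`, and `run` are hypothetical names, and every "IO operation" completes immediately instead of waiting on a socket. It shows only the control flow: each `yield` suspends the function, and a driver resumes it with the result.

```python
from collections import deque


class Op:
    """A pending IO operation; `result` is what it will eventually produce."""

    def __init__(self, name, result=None):
        self.name = name
        self.result = result


def fetch(log):
    # Reads top to bottom like blocking code, but each yield suspends the
    # generator until the driver resumes it with the operation's result.
    yield Op("write")
    resp = yield Op("read", result="HTTP/1.0 200 OK")
    log.append(resp)


def run(generators):
    """Round-robin driver standing in for a framework's event loop."""
    ready = deque((gen, None) for gen in generators)
    trace = []
    while ready:
        gen, value = ready.popleft()
        try:
            op = gen.send(value)  # resume until the next yield
        except StopIteration:
            continue  # this generator has finished
        trace.append(op.name)
        # A real loop would wait for the socket; here the op completes at once.
        ready.append((gen, op.result))
    return trace


log = []
trace = run([fetch(log), fetch(log)])
print(trace)
print(log)
```

The trace shows the two requests interleaving at every `yield`, even though each one is written as straight-line code.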


“gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.”

Event Python gevent
Natively asynchronous:
socket.write()
resp = socket.read()
print(resp)
Easy sequential model yet fully asynchronous.

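gevent Example: Simple Echo Server
from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall(b'Welcome to the echo server!\r\n')
    # The slide omits this step: wrap the socket in a file-like object.
    fileobj = socket.makefile(mode='rwb')
    line = fileobj.readline()  # read one line from the client
    fileobj.write(line)        # echo it back
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    # Listen on port 6000; each connection runs echo() in its own greenlet.
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()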


However, gevent needs daemonization, logging, and other servicification functionality for production use, such as what Twisted’s twistd provides.
https://github.com/progrium/ginkgo

Async Services with Ginkgo
Ginkgo is a simple framework for composing asynchronous gevent services with common configuration, logging, daemonizing, etc.
https://github.com/progrium/ginkgo
Let’s look at a simple example that implements a TCP and HTTP server...

Ginkgo Example: Import WSGI/TCP Servers
import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer
from ginkgo.core import Service

Ginkgo Example: HTTP Handler
def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print('new http request!')
    return [b"hello world"]

Ginkgo Example: TCP Handler
def handle_tcp(socket, address):
    print('new tcp connection!')
    while True:
        socket.send(b'hello\n')
        gevent.sleep(1)  # yields to other greenlets while waiting

Ginkgo Example: Service Composition
app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()

Toward a Fully Asynchronous API
Using Ginkgo or another async framework, let’s look at our web-worker architecture and see how we can modify it to become fully asynchronous.

The Old Way
[Diagram: incoming requests → load balancer → app servers, each with AAA and throttling layers → worker pool]

Step 1 - Let’s start by replacing our threaded workers with asynchronous app servers
[Diagram: incoming requests → load balancer → async servers]


Huzzah, now idle open connections will use very few server resources.

Step 2 – Define an authentication and authorization layer to identify the user and the resource requested
[Diagram: incoming requests → load balancer → async servers, each with an AAA layer]

AAA Manager
Goal: perform authentication, authorization, and accounting for each incoming API request
Extract key parameters
• Account
• Resource Type
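A minimal sketch of the extraction step, under loudly stated assumptions: the path layout `/Accounts/<account>/<resource>/...`, the in-memory `TOKENS` store, and the function `identify` are all hypothetical and are not taken from the talk or from any real API. Accounting is left out to keep the example short.

```python
import hmac
from typing import NamedTuple, Optional


class RequestKey(NamedTuple):
    account: str
    resource_type: str


# Hypothetical credential store: account id -> API token.
TOKENS = {"AC123": "s3cret"}


def identify(path: str, account: str, token: str) -> Optional[RequestKey]:
    """Authenticate the caller and extract (account, resource type).

    Assumes paths shaped like /Accounts/<account>/<resource>/...; returns
    None when authentication or authorization fails.
    """
    expected = TOKENS.get(account)
    if expected is None or not hmac.compare_digest(
        expected.encode(), token.encode()
    ):
        return None  # authentication failed
    parts = [p for p in path.split("/") if p]
    if len(parts) < 3 or parts[0] != "Accounts":
        return None  # malformed path
    if parts[1] != account:
        return None  # authorization: callers may only touch their own account
    return RequestKey(account=account, resource_type=parts[2])


print(identify("/Accounts/AC123/SMS/Messages", "AC123", "s3cret"))
```

The returned pair is exactly what a later throttling layer needs in order to decide per account and per resource type.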

Step 3 – Add a concurrency manager that determines whether to throttle each request
[Diagram: incoming requests → load balancer → async servers, each with AAA and throttling layers, connected to a concurrency manager]

Concurrency Manager

Goal: determine whether to delay or drop an individual request to limit access to API resources

Possible inputs

• By Account

• By Resource Type

• By Availability of Dependent Resources

Concurrency Manager

What we’ve found useful

• Tuple (Account, Resource Type)
Supports multi-tenancy
• Protection between accounts
• Protection within an account between resource types, e.g., Calls & SMS
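Keying on the tuple might look like the sketch below. It is an in-process illustration only, assuming a single per-key limit; the class `ConcurrencyTracker`, its methods, and the limit of 2 are hypothetical, and a real deployment would need state shared across servers.

```python
from collections import defaultdict


class ConcurrencyTracker:
    """Counts in-flight requests per (account, resource type) key."""

    def __init__(self, limit_per_key):
        self.limit_per_key = limit_per_key
        self.in_flight = defaultdict(int)

    def try_acquire(self, account, resource_type):
        """Reserve a slot for this key; return False if the key is at its limit."""
        key = (account, resource_type)
        if self.in_flight[key] >= self.limit_per_key:
            return False
        self.in_flight[key] += 1
        return True

    def release(self, account, resource_type):
        """Free a slot when the request finishes."""
        key = (account, resource_type)
        if self.in_flight[key] > 0:
            self.in_flight[key] -= 1


tracker = ConcurrencyTracker(limit_per_key=2)
results = [
    tracker.try_acquire("AC1", "SMS"),    # first SMS request for AC1
    tracker.try_acquire("AC1", "SMS"),    # second, reaches the limit
    tracker.try_acquire("AC1", "SMS"),    # third, over the limit
    tracker.try_acquire("AC1", "Calls"),  # same account, other resource type
    tracker.try_acquire("AC2", "SMS"),    # other account entirely
]
print(results)
```

Because each key has its own counter, a flood of SMS requests from one account exhausts neither that account's Calls capacity nor any other account's SMS capacity.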

Concurrency Manager

The concurrency manager returns one of:
1. Allow the request immediately
2. Delay the request before processing it
3. Drop the request and return an error: HTTP 429 - Concurrency Limit Reached
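The three outcomes can be expressed as a small decision function. This is a sketch under assumptions: the two thresholds `soft_limit` and `hard_limit` are hypothetical tuning knobs, not values or a policy described in the talk, and `http_status` simplifies every served request to 200.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DELAY = "delay"
    DROP = "drop"


def decide(in_flight, soft_limit, hard_limit):
    """Map the in-flight count for one (account, resource type) to a decision.

    soft_limit and hard_limit are hypothetical thresholds for illustration.
    """
    if in_flight < soft_limit:
        return Decision.ALLOW  # capacity to spare: serve immediately
    if in_flight < hard_limit:
        return Decision.DELAY  # busy: hold the connection, serve later
    return Decision.DROP       # saturated: reject with an error


def http_status(decision):
    # Only dropped requests see an error; delayed ones are eventually served.
    return 429 if decision is Decision.DROP else 200


for load in (3, 12, 40):
    d = decide(load, soft_limit=10, hard_limit=25)
    print(load, d.value, http_status(d))
```

Delaying is cheap only because the servers are asynchronous: a held connection does not occupy a worker.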

Step 4 – Provide for concurrency control between the servers and backend resources
[Diagram: incoming requests → load balancer → async servers, each with AAA and two throttling layers, connected to a concurrency manager and to dependent services]
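One way to picture backend concurrency control is a semaphore in front of the dependent service. The sketch below uses Python's standard `asyncio` rather than gevent so that it runs without extra packages, and it covers a single process only; limits enforced across several servers would need a shared manager. The names `call_backend` and `main`, and the limit of 3, are assumptions for illustration.

```python
import asyncio


async def call_backend(limiter, stats):
    """Run one backend call while holding a slot in the shared limiter."""
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stands in for a database or carrier call
        stats["active"] -= 1


async def main(requests, max_backend_concurrency):
    limiter = asyncio.Semaphore(max_backend_concurrency)
    stats = {"active": 0, "peak": 0}
    # All requests are accepted at once; only backend access is rationed.
    await asyncio.gather(
        *(call_backend(limiter, stats) for _ in range(requests))
    )
    return stats["peak"]


peak = asyncio.run(main(requests=20, max_backend_concurrency=3))
print(peak)
```

All twenty requests complete, but the backend never sees more than three at a time; the rest wait without tying up a worker.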

Conclusion 1

A synchronous web API is often much easier for developers to integrate, because callbacks add complexity.
The proposed asynchronous API framework provides for synchronous API calls without worrying about worker pools filling up. It is also easy to add callbacks where needed.

Conclusion 2

For many APIs, taking additional time to service a request is better than failing that specific request.
The proposed asynchronous API framework provides the ability to inject delay into the processing of incoming requests rather than dropping them.

Example of Delay Injection
[Figure: latency vs. load]
Spread load across a longer time period.
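Spreading a burst over time can be sketched as a simple pacing calculation. This is an illustration of the idea, not the scheduling policy used in the talk; the function `spread_burst` and the rate of 2 requests per second are hypothetical.

```python
def spread_burst(arrivals, max_rate):
    """Assign start times so consecutive requests begin 1/max_rate apart.

    arrivals: sorted arrival times in seconds.
    Returns (arrival, start) pairs; start - arrival is the injected delay.
    """
    interval = 1.0 / max_rate
    next_free = 0.0
    schedule = []
    for arrival in arrivals:
        start = max(arrival, next_free)  # never start before arriving
        schedule.append((arrival, start))
        next_free = start + interval
    return schedule


burst = [0.0] * 6  # six requests arriving at the same instant
schedule = spread_burst(burst, max_rate=2)
delays = [start - arrival for arrival, start in schedule]
print(delays)
```

Every request in the burst is served; the later ones simply see more latency, which is the trade that Conclusion 2 argues for.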

Conclusion 3

It is better to fail some incoming requests than to fail all requests.
The proposed asynchronous API framework provides the ability to selectively drop requests to limit contention on limited resources.

Example of Dropping Requests
[Figure: load, with latency for /x (dropped) and latency for /*]
Drop only the requests that we must due to scarce backend resources.

Summary

Async frameworks like gevent allow you to easily decouple a request from access to constrained resources.
[Figure: request latency over time, with an API outage marked]
