Building Web APIs that Scale

Designing for Graceful Degradation Evan Cooke, Twilio, CTO @emcooke Building Web APIs that Scale


Page 1: Building Web APIs that Scale

Designing for Graceful Degradation

Evan Cooke, Twilio, CTO

@emcooke

Building Web APIs that Scale

Page 2: Building Web APIs that Scale

Safe Harbor

Safe harbor statement under the Private Securities Litigation Reform Act of 1995:

This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.

The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.

Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

Page 5: Building Web APIs that Scale

Cloud services and the APIs they power are becoming the backbone of modern society. APIs support the apps that structure how we work, play, and communicate.

Page 6: Building Web APIs that Scale

Twilio

Observations today based on experience building @twilio

• Founded in 2008
• Infrastructure APIs to automate phone and SMS communications
• 120 Employees
• >1000 servers running 24x7

Page 7: Building Web APIs that Scale

Cloud Workloads Can Be Unpredictable

Page 8: Building Web APIs that Scale

Twilio SMS API Traffic

[Figure: Twilio SMS API traffic, showing a 6x spike in 5 mins]

No time for…
• a human to respond to a pager
• new servers to boot

Page 9: Building Web APIs that Scale

Typical Scenario

[Figure: request latency plotted against load, with a FAIL region marked "Danger! Load higher than instantaneous throughput"]

Page 10: Building Web APIs that Scale

Goal Today

Support graceful degradation of API performance under extreme load

No Failure

Page 11: Building Web APIs that Scale

Why Failures?

[Diagram: incoming requests pass through a load balancer to app servers, each with AAA and throttling layers, which hand work to a worker pool (W)]

Page 12: Building Web APIs that Scale

Worker Pools e.g., Apache/Nginx

[Figure: worker pool utilization over time at 10%, 70%, and 100%+, with failed requests once the pool is full]

Page 13: Building Web APIs that Scale

Problem Summary

• Cloud services often use worker pools to handle incoming requests
• When load exceeds the size of the worker pool, requests fail
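The failure mode summarized above can be sketched in a few lines of Python. This is a toy model, not how Apache or Nginx is implemented: the `WorkerPool` class and its method names are hypothetical, and a semaphore stands in for the pool of workers.

```python
import threading


class WorkerPool:
    """Toy fixed-size worker pool: each in-flight request holds one slot."""

    def __init__(self, size):
        self._slots = threading.BoundedSemaphore(size)

    def try_start(self):
        # Non-blocking acquire: returns False when every worker is busy.
        return self._slots.acquire(blocking=False)

    def finish(self):
        # A request completed, so its worker becomes available again.
        self._slots.release()


pool = WorkerPool(size=3)
# Five requests arrive before any of them finishes.
results = ["accepted" if pool.try_start() else "failed" for _ in range(5)]
print(results)
```

With three workers and five simultaneous requests, the last two are refused even though they would have succeeded had they arrived slightly later.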

Page 14: Building Web APIs that Scale

Queues to the rescue?

[Diagram: incoming requests enter a queue, then process & respond]

1. If we respond synchronously, each item in the queue still ties up a worker. Doh
2. If we close the incoming connection and free the worker, then we need an asynchronous callback to respond to the request. Doh

Page 15: Building Web APIs that Scale

Observation 1

A synchronous web API is often much easier for developers to integrate, due to the additional complexity of callbacks

Implication: Responding to requests synchronously is often preferable to queuing the request and responding with an asynchronous callback

Page 16: Building Web APIs that Scale

Synchronous vs. Asynchronous Interfaces

Take POST data from a web form, send it to a geo lookup API, store the result in the DB, and return a status page to the user

Sync

d = read_form();
geo = api->lookup(d);
db->store(d, geo);
return "success";

Async

d = read_form();
api->lookup(d);

# in /geo-result
db->store(d, geo);
ws->send("success");

The async interface needs a separate URL handler and a websocket connection to return the result

Page 17: Building Web APIs that Scale

Observation 2

For many APIs, taking additional time to service a request is better than failing that specific request

Implication: In many cases, it is better to service a request with some delay rather than failing it

Page 18: Building Web APIs that Scale

Observation 3

It is better to fail some requests than all incoming requests

Implication: Under load, it may be better to selectively drop expensive requests that can’t be serviced and allow others

Page 19: Building Web APIs that Scale

Event-driven programming and the Reactor Pattern

Page 20: Building Web APIs that Scale

Thread/Worker Model

Relative time a worker spends on each line:

req = 'GET /';              # 1
req.append('\r\n\r\n');     # 1
socket.write(req);          # 10000x
resp = socket.read();       # 10000000x
print(resp);                # 10

Page 21: Building Web APIs that Scale

Thread/Worker Model

Huge IO latency blocks worker

Page 22: Building Web APIs that Scale

Event-based Programming

req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});

Make IO operations async and "callback" when done

Page 23: Building Web APIs that Scale

Reactor Dispatcher

req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
  socket.read(fn(resp) {
    print(resp);
  });
});
reactor.run_forever();

Central dispatch to coordinate event callbacks

Page 24: Building Web APIs that Scale

Non-blocking IO

Relative time a worker spends on each line:

req = 'GET /';                # 1
req.append('\r\n\r\n');       # 1
socket.write(req, fn() {      # 10
  socket.read(fn(resp) {      # 10
    print(resp);              # 10
  });
});
reactor.run_forever();

No delay blocking the worker waiting for IO

Page 25: Building Web APIs that Scale

Request Response Decoupling

Using this approach we can decouple the socket of an incoming connection from the processing of that connection
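To make the decoupling concrete, here is a minimal sketch that uses Python's standard-library asyncio event loop as a stand-in for the reactor frameworks discussed in this talk. The handler name, the 0.1 s delay, and the request count are all illustrative assumptions; `asyncio.sleep` simulates waiting on a socket.

```python
import asyncio
import time


async def handle(request_id, io_delay):
    # Awaiting yields control to the event loop, so other requests
    # make progress while this one waits on (simulated) IO.
    await asyncio.sleep(io_delay)
    return f"response {request_id}"


async def main():
    start = time.monotonic()
    # 100 requests, each waiting 0.1 s on IO, served by a single thread.
    responses = await asyncio.gather(*(handle(i, 0.1) for i in range(100)))
    elapsed = time.monotonic() - start
    return responses, elapsed


responses, elapsed = asyncio.run(main())
print(len(responses), "responses in", round(elapsed, 2), "s")
```

A blocking worker handling these one at a time would need about 10 seconds; here the waits overlap, so the total stays close to the duration of a single wait.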

Page 26: Building Web APIs that Scale

(Some) Reactor-Pattern Frameworks

js/node.js
python/twisted
python/gevent
c/libevent
c/libev
ruby/eventmachine (Goliath, Cramp)
java/nio/netty

Page 27: Building Web APIs that Scale

Callback Spaghetti

req = 'GET /'
req += '\r\n\r\n'

def r(resp):
    print(resp)

def w(_):
    # Called once the write completes; only then can we start the read.
    socket.read().addCallback(r)

socket.write(req).addCallback(w)

Example of callback nesting complexity with Python Twisted (also node.js)

Page 28: Building Web APIs that Scale

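inlineCallbacks to the Rescue

from twisted.internet import defer

@defer.inlineCallbacks
def fetch(socket):
    req = 'GET /'
    req += '\r\n\r\n'
    yield socket.write(req)      # pseudocode: assumes write() returns a Deferred
    resp = yield socket.read()   # pseudocode: assumes read() returns a Deferred
    print(resp)

We can clean up the callbacks using deferred generators and inline callbacks (similar frameworks also exist for js)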

Page 29: Building Web APIs that Scale

Easy Sequential Programming

Easy sequential programming with mostly implicit asynchronous IO
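For readers curious how sequential-looking code can still be callback-driven, the following toy driver shows the general idea behind generator-based inline callbacks. It is not Twisted's implementation: `run_inline` and `fetch` are hypothetical names, and each yielded operation runs immediately, where a real reactor would resume the generator only once the IO event fires.

```python
def run_inline(generator_func, *args):
    """Drive a generator: run each yielded operation, send its result back in."""
    gen = generator_func(*args)
    result = None
    try:
        while True:
            operation = gen.send(result)  # resume until the next yield
            result = operation()          # a real reactor would wait for an event here
    except StopIteration as stop:
        return stop.value


def fetch(outbox):
    # Each yield hands a pending operation to the driver and pauses here.
    yield lambda: outbox.append("GET /\r\n\r\n")  # stands in for socket.write
    resp = yield lambda: "HTTP/1.0 200 OK"        # stands in for socket.read
    return resp


sent = []
print(run_inline(fetch, sent))
```

The body of `fetch` reads top to bottom like blocking code, yet control returns to the driver at every `yield`, which is where an event loop would be free to service other connections.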

Page 31: Building Web APIs that Scale

Event Python gevent

“gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.”

socket.write()
resp = socket.read()
print(resp)

Natively asynchronous

Page 32: Building Web APIs that Scale

gevent Example

Simple Echo Server

from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall(b'Welcome to the echo server!\r\n')
    # Wrap the socket in a file object so we can use readline().
    fileobj = socket.makefile(mode='rwb')
    line = fileobj.readline()
    fileobj.write(line)
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()

Easy sequential model yet fully asynchronous

Page 33: Building Web APIs that Scale

gevent Example

However, gevent needs daemonization, logging, and other servicification functionality for production use, such as what Twisted’s twistd provides

Page 34: Building Web APIs that Scale

Async Services with Ginkgo

Ginkgo is a simple framework for composing asynchronous gevent services with common configuration, logging, daemonizing, etc.

https://github.com/progrium/ginkgo

Let’s look at a simple example that implements a TCP and HTTP server...

Page 35: Building Web APIs that Scale

Ginkgo Example

Import WSGI/TCP Servers

import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer
from ginkgo.core import Service

Page 36: Building Web APIs that Scale

Ginkgo Example

HTTP Handler

def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print('new http request!')
    return [b"hello world"]

Page 37: Building Web APIs that Scale

Ginkgo Example

TCP Handler

def handle_tcp(socket, address):
    print('new tcp connection!')
    while True:
        socket.send(b'hello\n')
        gevent.sleep(1)

Page 38: Building Web APIs that Scale

Ginkgo Example

Service Composition

import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer
from ginkgo.core import Service

def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print('new http request!')
    return [b"hello world"]

def handle_tcp(socket, address):
    print('new tcp connection!')
    while True:
        socket.send(b'hello\n')
        gevent.sleep(1)

app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()

Page 39: Building Web APIs that Scale

Toward a Fully Asynchronous API

Using Ginkgo or another async framework, let’s look at our web-worker architecture and see how we can modify it to become fully asynchronous

Page 40: Building Web APIs that Scale

The Old Way

[Diagram: incoming requests pass through a load balancer to app servers, each with AAA and throttling layers, which hand work to a worker pool (W)]

Page 41: Building Web APIs that Scale

Step 1 - Let’s start by replacing our threaded workers with asynchronous app servers

[Diagram: incoming requests pass through a load balancer to a set of async servers]

Page 42: Building Web APIs that Scale

Step 1 - Let’s start by replacing our threaded workers with asynchronous app servers

Huzzah, now idle open connections will use very few server resources

Page 43: Building Web APIs that Scale

Step 2 – Define authentication and authorization layer to identify the user and resource requested

[Diagram: incoming requests pass through a load balancer to async servers, each fronted by an AAA layer]

Page 44: Building Web APIs that Scale

AAA Manager

Goal: Perform authentication, authorization, and accounting for each incoming API request

Extract key parameters
• Account
• Resource Type
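Extracting those two parameters might look like the sketch below. Everything specific here is an assumption made for illustration: the `/Accounts/<account>/<resource_type>/...` path layout, HTTP Basic credentials of the form `account:token`, the in-memory `API_KEYS` store, and the `authenticate` function itself. Accounting is left out.

```python
import base64

# Hypothetical credential store; a real service would query a database.
API_KEYS = {"AC123": "secret-token"}


def authenticate(path, authorization_header):
    """Return (account, resource_type) for a valid request, else None."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 3 or parts[0] != "Accounts":
        return None
    account, resource_type = parts[1], parts[2]

    scheme, _, encoded = authorization_header.partition(" ")
    if scheme != "Basic":
        return None
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    user, _, token = decoded.partition(":")

    # Authentication (valid token) plus authorization (own account only).
    if user != account or API_KEYS.get(user) != token:
        return None
    return account, resource_type


header = "Basic " + base64.b64encode(b"AC123:secret-token").decode("ascii")
print(authenticate("/Accounts/AC123/SMS/Messages", header))
```

The returned pair is exactly what a downstream throttling layer needs in order to decide how to treat the request.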

Page 45: Building Web APIs that Scale

Step 3 – Add a concurrency manager that determines whether to throttle each request

[Diagram: incoming requests pass through a load balancer to async servers, each with AAA and throttling layers that consult a concurrency manager]

Page 46: Building Web APIs that Scale

Concurrency Manager

Goal: determine whether to delay or drop an individual request to limit access to API resources

Possible inputs
• By Account
• By Resource Type
• By Availability of Dependent Resources

Page 47: Building Web APIs that Scale

Concurrency Manager

What we’ve found useful
• Tuple (Account, Resource Type)

Supports multi-tenancy
• Protection between accounts
• Protection within an account between resource types, e.g., Calls & SMS
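One way to picture the (Account, Resource Type) tuple is as the key of an in-flight counter. The sketch below is a single-process illustration under assumed names (`ConcurrencyTracker`, `limit_per_key`); it is not Twilio's implementation, and a deployment with several servers would need a shared store instead of a local dictionary.

```python
from collections import defaultdict


class ConcurrencyTracker:
    """Counts in-flight requests per (account, resource_type) key."""

    def __init__(self, limit_per_key):
        self.limit = limit_per_key
        self.in_flight = defaultdict(int)

    def try_acquire(self, account, resource_type):
        key = (account, resource_type)
        if self.in_flight[key] >= self.limit:
            return False
        self.in_flight[key] += 1
        return True

    def release(self, account, resource_type):
        key = (account, resource_type)
        if self.in_flight[key] > 0:
            self.in_flight[key] -= 1


tracker = ConcurrencyTracker(limit_per_key=2)
print(tracker.try_acquire("acct-A", "SMS"))    # first SMS request
print(tracker.try_acquire("acct-A", "SMS"))    # second SMS request
print(tracker.try_acquire("acct-A", "SMS"))    # over the limit for this key
print(tracker.try_acquire("acct-A", "Calls"))  # different resource type
print(tracker.try_acquire("acct-B", "SMS"))    # different account
```

Because the counter is keyed on the pair, a burst of SMS traffic from one account exhausts only that account's SMS budget, leaving its calls and every other account unaffected.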

Page 48: Building Web APIs that Scale

Concurrency Manager

Concurrency manager returns one of
1. Allow the request immediately
2. Delay the request before processing it
3. Drop the request and return an error: HTTP 429 - Concurrency Limit Reached
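The three outcomes could be modeled as a small decision function. The soft and hard limits below are assumptions introduced for this sketch (the talk does not say how the thresholds are chosen), as are the names `Decision`, `decide`, and `to_http_status`.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DELAY = "delay"
    DROP = "drop"


def decide(in_flight, soft_limit, hard_limit):
    """Map the current in-flight count for a key to one of three outcomes."""
    if in_flight < soft_limit:
        return Decision.ALLOW
    if in_flight < hard_limit:
        return Decision.DELAY
    return Decision.DROP


def to_http_status(decision):
    # 429 is the status named above for a dropped request; delayed
    # requests are still served, so this sketch gives them a normal 200.
    return 429 if decision is Decision.DROP else 200


for count in (0, 5, 10):
    outcome = decide(count, soft_limit=5, hard_limit=10)
    print(count, outcome.value, to_http_status(outcome))
```

Keeping the decision separate from the mechanism that delays or rejects makes it straightforward to tune thresholds per account or per resource type.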

Page 49: Building Web APIs that Scale

Step 4 – provide for concurrency control between the servers and backend resources

[Diagram: as in Step 3, with a second throttling layer on each async server between the server and its dependent services]
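As a rough illustration of Step 4, a semaphore can cap how many requests reach a dependent service at once while the rest wait cheaply. The sketch uses the standard-library asyncio rather than gevent; the limit of 4, the request count of 20, and the simulated 10 ms backend call are all assumed values.

```python
import asyncio


async def call_backend(limiter, stats, request_id):
    # At most `limit` coroutines get past this line at once; the rest
    # wait here without tying up a thread.
    async with limiter:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated call to a dependent service
        stats["active"] -= 1
    return request_id


async def main(num_requests=20, limit=4):
    limiter = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    done = await asyncio.gather(
        *(call_backend(limiter, stats, i) for i in range(num_requests))
    )
    return len(done), stats["peak"]


completed, peak = asyncio.run(main())
print("completed:", completed, "peak backend concurrency:", peak)
```

Every request completes, but the number of simultaneous backend calls never exceeds the limit, which is the protection a constrained downstream resource needs.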

Page 50: Building Web APIs that Scale

Conclusion 1

A synchronous web API is often much easier for developers to integrate, due to the additional complexity of callbacks

The proposed asynchronous API framework provides for synchronous API calls without worrying about worker pools filling up. It is also easy to add callbacks where needed.

Page 51: Building Web APIs that Scale

Conclusion 2

For many APIs, taking additional time to service a request is better than failing that specific request

The proposed asynchronous API framework provides the ability to inject delay into the processing of incoming requests rather than dropping them.

Page 52: Building Web APIs that Scale

Example of Delay Injection

[Figure: latency and load]

Spread load across a longer time period
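The arithmetic behind delay injection can be sketched as a simple scheduler: serve requests in arrival order, but never faster than a fixed rate. The function name and the rate of 2 requests per second are assumptions chosen to keep the numbers easy to follow.

```python
def schedule_with_delay(arrival_times, max_per_second):
    """Return the time at which each request starts processing.

    Requests are served in arrival order, no faster than max_per_second;
    a request that arrives during a burst waits instead of failing.
    """
    spacing = 1.0 / max_per_second
    next_free = 0.0
    start_times = []
    for arrival in sorted(arrival_times):
        start = max(arrival, next_free)
        start_times.append(start)
        next_free = start + spacing
    return start_times


# Six requests arrive in the same instant; capacity is 2 requests/second.
burst = [0.0] * 6
starts = schedule_with_delay(burst, max_per_second=2)
delays = [s - a for s, a in zip(starts, burst)]
print(starts)
print(delays)
```

The burst that arrived in one instant is processed over 2.5 seconds: latency rises for the later requests, but none of them fail.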

Page 53: Building Web APIs that Scale

Conclusion 3

It is better to fail some incoming requests than to fail all requests

The proposed asynchronous API framework provides the ability to selectively drop requests to limit contention on limited resources

Page 54: Building Web APIs that Scale

Example of Dropping Requests

[Figure: load, latency for /x (dropped), and latency for /*]

Drop only the requests that we must due to scarce backend resources

Page 55: Building Web APIs that Scale

Summary

Async frameworks like gevent allow you to easily decouple a request from access to constrained resources

[Figure: request latency over time, with an API outage marked]
