Post on 08-May-2015
Designing for Graceful Degradation
Evan Cooke, Twilio, CTO
@emcooke
Building Web APIs that Scale
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results
expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be
deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other
financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new
functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our
operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of
intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we
operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new
releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization
and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of
salesforce.com, inc. is included in our annual report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently
available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based
upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-
looking statements.
Cloud services and the APIs they power are
becoming the backbone of modern society. APIs
support the apps that structure how we work, play,
and communicate.
Observations today based on
experience building @twilio
• Founded in 2008
• Infrastructure APIs to automate
phone and SMS
communications
• 120 Employees
• >1000 servers running 24x7
Twilio
Cloud Workloads
Can Be
Unpredictable
6x spike in 5 mins
No time for…
• a human to respond to a pager
• new servers to boot
[Chart: Twilio SMS API traffic; request latency spikes and requests FAIL when load rises higher than instantaneous throughput. Danger!]
Typical Scenario
Goal Today
Support graceful degradation of API
performance under extreme load
No Failure
[Diagram: incoming requests pass through a load balancer to app servers, each with AAA and throttling layers in front of a shared worker pool of workers (W)]
Why Failures?
[Chart: failed requests vs. time at 10%, 70%, and 100%+ load; failures appear only once load exceeds worker-pool capacity]
Worker Pools e.g., Apache/Nginx
• Cloud services often use worker pools
to handle incoming requests
• When load goes beyond size of the
worker pool, requests fail
Problem Summary
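The failure mode above can be sketched in a few lines of Python; the pool size and burst size here are made up for illustration:

```python
# Toy model of a fixed worker pool (hypothetical size of 4):
# once every worker is busy, additional requests simply fail.
POOL_SIZE = 4
in_flight = 0

def accept(request):
    global in_flight
    if in_flight >= POOL_SIZE:
        return "FAIL"       # no free worker: the request is rejected
    in_flight += 1          # a worker is now tied up with this request
    return "OK"

# A 6-request burst against a 4-worker pool: the last two requests fail
results = [accept(i) for i in range(6)]
```

In a real Apache/Nginx deployment the effect is the same: once the burst outlasts the pool and its backlog, clients see errors or timeouts.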
Queues to the rescue?
1. If we synchronously respond, each item in the queue still ties up a worker.
2. If we close the incoming connection and free the worker, then we need an asynchronous callback to respond to the request.
[Diagram: incoming requests queued ahead of "process & respond"; both options fail ("Doh")]
Observation 1
A synchronous web API is often much easier for developers to integrate, due to the additional complexity of callbacks
Implication Responding to requests
synchronously is often preferable to queuing
the request and responding with an
asynchronous callback
Synchronous vs. Asynchronous Interfaces
Task: take POST data from a web form, send it to a geo lookup API, store the result in the DB, and return a status page to the user.
Sync:
d = read_form();
geo = api->lookup(d);
db->store(d, geo);
return "success";
Async:
d = read_form();
api->lookup(d);
# in the /geo-result handler
db->store(d, geo);
ws->send("success");
An async interface needs a separate URL handler and a websocket connection to return the result.
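The contrast can be made concrete in Python; `read_form`, `geo_lookup`, and the `store` dict below are hypothetical stand-ins for the form, geo API, and DB in the slide:

```python
# Hypothetical stand-ins for the slide's form, geo-lookup API, and DB
def read_form():
    return {"addr": "123 Main St"}

def geo_lookup(d):
    return (37.42, -122.08)

store = {}

# Sync: one linear handler; the result goes back on the same connection
def handle_sync():
    d = read_form()
    geo = geo_lookup(d)
    store["sync"] = (d, geo)
    return "success"

# Async: the lookup finishes later, so a separate callback must store the
# result and push "success" over a second channel (e.g. a websocket)
def handle_async(push):
    d = read_form()
    def on_geo_result(geo):
        store["async"] = (d, geo)
        push("success")
    on_geo_result(geo_lookup(d))  # simulate the API invoking the callback later

pushed = []
handle_async(pushed.append)
```

The sync handler returns its status directly; the async handler can only deliver it through whatever `push` channel the caller supplied.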
Observation 2
For many APIs, taking additional time to
service a request is better than failing
that specific request
Implication In many cases, it is better to service
a request with some delay rather than failing it
Observation 3
It is better to fail some requests than all
incoming requests
Implication Under load, it may be better to selectively drop expensive requests that can't be serviced and allow others through
Event-driven programming and the
Reactor Pattern
req = 'GET /';
req.append('\r\n\r\n');
socket.write(req);
resp = socket.read();
print(resp);
[Timeline: relative worker time per step: ~1 unit for the CPU-bound lines, ~10,000x for socket.write, ~10,000,000x for socket.read, ~10 for print]
Thread/Worker Model
Thread/Worker Model
Huge IO latency blocks worker
req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
    socket.read(fn(resp) {
        print(resp);
    });
});
Make IO operations async and "callback" when done
Event-based Programming
req = 'GET /';
req.append('\r\n\r\n');
socket.write(req, fn() {
    socket.read(fn(resp) {
        print(resp);
    });
});
reactor.run_forever();
Central dispatch to coordinate event callbacks
Reactor Dispatcher
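A reactor is, at its core, just a loop dispatching queued callbacks. This toy Python version (the names `call_later` and `run_forever` are made up here, loosely echoing real reactor APIs) shows the shape of the dispatcher the slide describes:

```python
import collections

# Minimal reactor: a queue of ready callbacks, dispatched in order
class Reactor:
    def __init__(self):
        self.ready = collections.deque()

    def call_later(self, fn, *args):
        self.ready.append((fn, args))   # schedule a callback

    def run_forever(self):
        # "forever" in a real reactor; here, until no events remain
        while self.ready:
            fn, args = self.ready.popleft()
            fn(*args)

reactor = Reactor()
log = []

# Mirror the slide: a write completes, then a read completes via callback
def on_read(resp):
    log.append(resp)

def on_write():
    reactor.call_later(on_read, "HTTP/1.1 200 OK")  # simulated read result

reactor.call_later(on_write)
reactor.run_forever()
```

A production reactor would block on an IO readiness facility (select/epoll/kqueue) instead of a plain queue, but the control flow is the same.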
[Timeline: with non-blocking IO every step costs roughly 1-10 units of worker time; no delay blocks the worker waiting for IO]
Non-blocking IO
Request Response Decoupling
Using this approach we can decouple the socket of an incoming connection from the processing of that connection.
(Some) Reactor-Pattern Frameworks
• js: node.js
• python: twisted, gevent
• c: libevent, libev
• ruby: eventmachine, Goliath, Cramp
• java: nio/netty
req = 'GET /'
req += '\r\n\r\n'
def r(resp):
    print resp
def w():
    socket.read().addCallback(r)
socket.write(req).addCallback(w)
Callback Spaghetti
Example of callback nesting complexity with Python Twisted (also node.js)
req = 'GET /'
req += '\r\n\r\n'
yield socket.write(req)
resp = yield socket.read()
print resp
inlineCallbacks to the Rescue
We can clean up the callbacks using deferred generators and inline callbacks (similar frameworks also exist for js).
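The inline-callbacks idea works by driving a generator: each yield hands an asynchronous operation to a trampoline, which resumes the generator with the result once it arrives. A toy driver, with made-up names and plain thunks standing in for Twisted's Deferreds, looks like:

```python
# Toy inline-callbacks driver: advance the generator, run each yielded
# operation, and feed its result back in so the code reads sequentially
def run_inline(gen):
    result = None
    try:
        while True:
            op = gen.send(result)  # generator yields an operation (a thunk)
            result = op()          # the "asynchronous" result arrives here
    except StopIteration as stop:
        return stop.value

def handler():
    yield (lambda: None)                      # like: yield socket.write(req)
    resp = yield (lambda: "HTTP/1.1 200 OK")  # like: resp = yield socket.read()
    return resp

out = run_inline(handler())
```

In real Twisted the driver does not call the operation itself; it attaches a callback to the Deferred and resumes the generator when the event loop fires it, but the sequential reading order is exactly what this sketch shows.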
Easy Sequential Programming
Easy sequential programming with mostly implicit asynchronous IO.
“gevent is a coroutine-based Python
networking library that uses greenlet to
provide a high-level synchronous API on
top of the libevent event loop.”
socket.write()
resp = socket.read()
print resp
Natively asynchronous
Python gevent
Easy sequential
model yet fully
asynchronous
gevent Example
from gevent.server import StreamServer

def echo(socket, address):
    print('New connection from %s:%s' % address)
    socket.sendall('Welcome to the echo server!\r\n')
    fileobj = socket.makefile()
    line = fileobj.readline()
    fileobj.write(line)
    fileobj.flush()
    print("echoed %r" % line)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 6000), echo)
    server.serve_forever()
Simple Echo Server
However, gevent lacks the daemonization, logging, and other servicification functionality needed for production use, of the kind that Twisted's twistd provides.
https://github.com/progrium/ginkgo
Let's look at a simple example that implements a TCP and HTTP server...
Async Services with Ginkgo
Ginkgo is a simple framework for
composing asynchronous gevent services
with common configuration, logging,
daemonizing, etc.
import gevent
from gevent.pywsgi import WSGIServer
from gevent.server import StreamServer
from ginkgo.core import Service
Ginkgo Example
Import
WSGI/TCP
Servers
def handle_http(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    print 'new http request!'
    return ["hello world"]
Ginkgo Example
HTTP Handler
def handle_tcp(socket, address):
    print 'new tcp connection!'
    while True:
        socket.send('hello\n')
        gevent.sleep(1)
Ginkgo Example
TCP Handler
app = Service()
app.add_service(StreamServer(('127.0.0.1', 1234), handle_tcp))
app.add_service(WSGIServer(('127.0.0.1', 8080), handle_http))
app.serve_forever()
Ginkgo Example
Service Composition
Using Ginkgo or another async framework, let's look at our web-worker architecture and see how we can modify it to become fully asynchronous.
Toward a Fully Asynchronous API
The Old Way
[Diagram: incoming requests pass through a load balancer to app servers, each with AAA and throttling layers in front of a shared worker pool]
[Diagram: incoming requests pass through the load balancer to a row of async servers]
Step 1 - Let’s start by replacing our threaded
workers with asynchronous app servers
Huzzah, now idle open connections will use very few server resources.
[Diagram: each async server is now fronted by an AAA layer]
Step 2 – Define authentication and authorization
layer to identify the user and resource requested
AAA Manager
Goal Perform authentication,
authorization and accounting for each
incoming API request
Extract key parameters
• Account
• Resource Type
[Diagram: async servers with AAA and throttling layers, coordinated by a central concurrency manager]
Step 3 – Add a concurrency manager that
determines whether to throttle each request
Concurrency Manager
Goal determine whether to delay or drop
an individual request to limit access to
API resources
Possible inputs
• By Account
• By Resource Type
• By Availability of Dependent Resources
Concurrency Manager
What we've found useful:
• The tuple (Account, Resource Type)
This supports multi-tenancy:
• Protection between accounts
• Protection within an account between resource types, e.g., Calls & SMS
Concurrency Manager
The concurrency manager returns one of:
1. Allow the request immediately
2. Delay the request before it is processed
3. Drop the request and return an error: HTTP 429 - Concurrency Limit Reached
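A minimal sketch of such a manager, keyed by the (account, resource type) tuple from the previous slide; the limit and headroom values below are invented for illustration:

```python
# Hypothetical per-resource-type concurrency limits, applied per account
LIMITS = {"sms": 2, "calls": 3}
DELAY_HEADROOM = 1   # requests beyond the limit that we delay, not drop

in_flight = {}

def admit(account, resource_type):
    key = (account, resource_type)
    n = in_flight.get(key, 0)
    limit = LIMITS[resource_type]
    if n < limit:
        in_flight[key] = n + 1
        return "ALLOW"                # process immediately
    if n < limit + DELAY_HEADROOM:
        in_flight[key] = n + 1
        return "DELAY"                # process after an injected pause
    return "DROP"                     # HTTP 429 - Concurrency Limit Reached

def release(account, resource_type):
    in_flight[(account, resource_type)] -= 1   # call when a request finishes

# Four simultaneous SMS requests from one account against a limit of 2
decisions = [admit("AC123", "sms") for _ in range(4)]
```

Because the counters are keyed per (account, resource type), one account saturating SMS neither blocks another account nor blocks its own calls.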
[Diagram: async servers with throttling both at the AAA layer and in front of dependent backend services, coordinated by the concurrency manager]
Step 4 – Provide for concurrency control between the servers and backend resources
Conclusion 1
A synchronous web API is often much easier for developers to integrate, due to the additional complexity of callbacks.
The proposed asynchronous API framework provides for synchronous API calls without worrying about worker pools filling up. It is also easy to add callbacks where needed.
Conclusion 2
For many APIs, taking additional time to
service a request is better than failing
that specific request
The proposed asynchronous API framework provides the ability to inject delay into the processing of incoming requests rather than dropping them.
[Chart: latency vs. load with delay injection]
Example of Delay Injection
Spread load across a longer time period
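One way to sketch delay injection: requests beyond immediate capacity get a growing artificial pause instead of an error (in a gevent server this pause would be a `gevent.sleep` before processing). The capacity and step values below are hypothetical:

```python
# Spread a burst over time instead of failing the overflow
CAPACITY = 2        # requests we can start immediately (hypothetical)
DELAY_STEP = 0.5    # seconds of extra delay per queued request

def schedule(burst):
    delays = []
    for i, _ in enumerate(burst):
        extra = max(0, i - CAPACITY + 1) * DELAY_STEP
        delays.append(extra)  # in a gevent handler: gevent.sleep(extra)
    return delays

delays = schedule(["r1", "r2", "r3", "r4"])
# later requests wait longer, but none fail
```

Because each waiting request is just an idle greenlet holding an open socket, the delayed requests cost almost nothing while they wait.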
Conclusion 3
It is better to fail some incoming
requests than to fail all requests
The proposed asynchronous API framework
provides the ability to selectively drop requests
to limit contention on limited resources
[Chart: latency vs. load; dropped requests marked alongside served ones]
Example of Dropping Requests
Drop only the requests that we must, due to scarce backend resources
Summary
Async frameworks like gevent allow you to easily
decouple a request from access to constrained
resources
[Chart: request latency over time during an API outage]