AWS Lambda from the trenches

Page 1: AWS Lambda from the trenches

AWS LAMBDA from the TRENCHES

what you should know before you go to production

Page 2: AWS Lambda from the trenches

hi, I’m Yan Cui

Page 8: AWS Lambda from the trenches

AWS user since 2009

Page 11: AWS Lambda from the trenches

apr, 2016

Page 12: AWS Lambda from the trenches

hidden complexities and dependencies

low utilisation to leave room for traffic spikes

EC2 scaling is slow, so scale earlier

lots of cost for unused resources

up to 30 mins for deployment

deployment required downtime

Page 13: AWS Lambda from the trenches

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

Page 15: AWS Lambda from the trenches

“what would good look like for us?”

Page 16: AWS Lambda from the trenches

DEPLOYMENTS SHOULD...

be small

be fast

have zero downtime

have no lock-step

Page 17: AWS Lambda from the trenches

FEATURES SHOULD...

be deployable independently

be loosely-coupled

Page 18: AWS Lambda from the trenches

WE WANT TO...

minimise cost for unused resources

minimise ops effort

reduce tech mess

deliver visible improvements faster

Page 19: AWS Lambda from the trenches

nov, 2016

Page 20: AWS Lambda from the trenches

170 Lambda functions in prod

1.2 GB deployment packages in prod

95% cost saving vs EC2

15x no. of prod releases per month

Page 21: AWS Lambda from the trenches

time is a good fit

Page 22: AWS Lambda from the trenches

1st function in prod!

time is a good fit

Page 23: AWS Lambda from the trenches

?

time is a good fit

1st function in prod!

Page 24: AWS Lambda from the trenches

ALERTING

CI / CD

TESTING

LOGGING

MONITORING

Page 25: AWS Lambda from the trenches

170 functions

WOOF!

? ?

time is a good fit

1st function in prod!

Page 26: AWS Lambda from the trenches

SECURITY

DISTRIBUTED TRACING

CONFIG MANAGEMENT

Page 27: AWS Lambda from the trenches

evolving the PLATFORM

Page 28: AWS Lambda from the trenches

rebuilt search

Page 29: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Amazon CloudSearch

Page 30: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Amazon CloudSearch

Amazon API Gateway -> AWS Lambda -> Amazon CloudSearch

Page 31: AWS Lambda from the trenches

new analytics pipeline

Page 32: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Google BigQuery

Page 33: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Google BigQuery

1 developer, 2 days from design to production

(his 1st serverless project)

Page 34: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Google BigQuery

“nothing ever got done this fast at Skype!”

- Chris Twamley

Page 35: AWS Lambda from the trenches

“lead time to someone saying thank you is the only reputation metric that matters.”

- Dan North

Page 36: AWS Lambda from the trenches

Rebuilt with Lambda

Page 43: AWS Lambda from the trenches

Rebuilt with Lambda

Page 44: AWS Lambda from the trenches

BigQuery

Page 45: AWS Lambda from the trenches

BigQuery

Page 46: AWS Lambda from the trenches

grapheneDB

BigQuery

Page 47: AWS Lambda from the trenches

grapheneDB

BigQuery

Page 48: AWS Lambda from the trenches

grapheneDB

BigQuery

Page 49: AWS Lambda from the trenches

getting PRODUCTION READY

Page 50: AWS Lambda from the trenches

CHOOSE A DEPLOYMENT FRAMEWORK

Page 51: AWS Lambda from the trenches

http://serverless.com

Page 52: AWS Lambda from the trenches

http://apex.run

Page 53: AWS Lambda from the trenches

https://github.com/claudiajs/claudia

Page 54: AWS Lambda from the trenches

https://github.com/Miserlou/Zappa

Page 55: AWS Lambda from the trenches

http://gosparta.io/

Page 56: AWS Lambda from the trenches

TESTING

Page 57: AWS Lambda from the trenches

amzn.to/29Lxuzu

Page 58: AWS Lambda from the trenches

Level of Testing

1. Unit: do our objects do the right thing? are they easy to work with?

Page 60: AWS Lambda from the trenches

Level of Testing

1. Unit
2. Integration: does our code work against code we can’t change?

Page 61: AWS Lambda from the trenches

handler

Page 62: AWS Lambda from the trenches

handler

test by invoking the handler
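Here is a minimal Node.js sketch of "test by invoking the handler", assuming an API Gateway proxy-style handler. The test calls the handler function directly with a hand-built event and a fake context, instead of going through API Gateway. The handler, the `users` lookup and the `invokeHandler`/`testGetUser` helpers are all hypothetical. In a real integration test the handler would talk to the real downstream services. The in-memory lookup only stands in for them so the sketch runs on its own.

```javascript
// Hypothetical handler under test (API Gateway proxy-style event and response).
// In a real integration test it would call the real services it depends on;
// this in-memory lookup only stands in for them so the sketch is self-contained.
const users = { "42": { id: "42", name: "yan" } };

async function handler(event, context) {
  const id = event.pathParameters && event.pathParameters.id;
  const user = users[id];
  if (!user) {
    return { statusCode: 404, body: JSON.stringify({ message: "not found" }) };
  }
  return { statusCode: 200, body: JSON.stringify(user) };
}

// Invoke the handler the way Lambda would: with an event and a context object.
// Only the context fields a handler is likely to touch are faked here.
function invokeHandler(event) {
  const context = {
    awsRequestId: "test-request-id",
    getRemainingTimeInMillis: () => 3000,
  };
  return handler(event, context);
}

// A test that exercises the handler through its real entry point.
async function testGetUser() {
  const res = await invokeHandler({ httpMethod: "GET", pathParameters: { id: "42" } });
  if (res.statusCode !== 200) throw new Error(`expected 200, got ${res.statusCode}`);
  return JSON.parse(res.body);
}
```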

Page 63: AWS Lambda from the trenches

Level of Testing

1. Unit
2. Integration
3. Acceptance: does the whole system work?

Page 64: AWS Lambda from the trenches

Level of Testing

(chart: unit, integration and acceptance tests compared by feedback and confidence)

Page 65: AWS Lambda from the trenches

“…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise.

The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…”

Don’t Mock Types You Can’t Change

Page 66: AWS Lambda from the trenches

“…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do…

Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…”

Don’t Mock Types You Can’t Change

Page 67: AWS Lambda from the trenches

Don’t Mock Services You Can’t Change (a twist on “Don’t Mock Types You Can’t Change”)

Page 68: AWS Lambda from the trenches

“…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code.

An end-to-end test interacts with the system only from the outside: through its interface…”

Testing End-to-End

Page 69: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Amazon CloudSearch

Amazon API Gateway -> AWS Lambda -> Amazon CloudSearch

Page 70: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Amazon CloudSearch

Amazon API Gateway -> AWS Lambda -> Amazon CloudSearch

Test Input

Page 71: AWS Lambda from the trenches

Legacy Monolith -> Amazon Kinesis -> AWS Lambda -> Amazon CloudSearch

Amazon API Gateway -> AWS Lambda -> Amazon CloudSearch

Test Input

Validate
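The pipeline above (Kinesis, then Lambda, then CloudSearch) processes test input asynchronously, so the validate step usually has to poll until the result shows up. Below is a minimal polling helper. It assumes a hypothetical `check` function that queries the system from the outside, for example through the search API. The timeout and interval defaults are arbitrary.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// Acceptance tests against asynchronous pipelines need this because the effect
// of the test input only becomes visible some time after it is sent.
async function waitFor(check, { timeoutMs = 30000, intervalMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

In a test you would send the input, then `await waitFor(() => searchFor(testInput.id))`. Here `searchFor` stands for your own (hypothetical) client for the public API.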

Page 72: AWS Lambda from the trenches

CI + CD PIPELINE

Page 73: AWS Lambda from the trenches

“the earlier you consider CI + CD, the more time you save in the long run”

- me

Page 74: AWS Lambda from the trenches

“…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed…

This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…”

Testing End-to-End

Page 75: AWS Lambda from the trenches

“deployment scripts that only live on the CI box are a disaster waiting to happen”

- me

Page 76: AWS Lambda from the trenches

Jenkins build config deploys and tests:

unit + integration tests -> deploy -> acceptance tests

Page 77: AWS Lambda from the trenches

build.sh allows repeatable builds on both local & CI

Page 79: AWS Lambda from the trenches

Auto Auto Manual

Page 80: AWS Lambda from the trenches

LOGGING

Page 82: AWS Lambda from the trenches

2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?

Page 83: AWS Lambda from the trenches

2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?

UTC timestamp | Lambda request ID | your log message
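A log line with the layout above can be split back into its fields in code. This sketch assumes the Node.js runtime's tab-separated `timestamp<TAB>request id<TAB>message` layout. Newer Node.js runtimes also insert a log level such as `INFO`, and the sketch treats that field as optional. The `parseLambdaLogLine` name and the set of levels are assumptions.

```javascript
// Assumed set of log levels that newer runtimes may insert after the request ID.
const LEVELS = new Set(["INFO", "WARN", "ERROR", "DEBUG", "TRACE", "FATAL"]);

// Split "<UTC timestamp>\t<request id>\t[<level>\t]<message>" into its parts.
// Returns null for lines that don't follow this layout (e.g. START/END/REPORT).
function parseLambdaLogLine(line) {
  const parts = line.split("\t");
  if (parts.length < 3) return null;
  const [timestamp, requestId, ...rest] = parts;
  if (Number.isNaN(Date.parse(timestamp))) return null;
  let level = null;
  if (rest.length > 1 && LEVELS.has(rest[0])) level = rest.shift();
  // Re-join in case the message itself contained tabs.
  return { timestamp, requestId, level, message: rest.join("\t") };
}
```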

Page 84: AWS Lambda from the trenches

function name

date

function version

Page 85: AWS Lambda from the trenches

LOG OVERLOAD

Page 86: AWS Lambda from the trenches

CENTRALISE LOGS

Page 87: AWS Lambda from the trenches

CENTRALISE LOGS

MAKE THEM EASILY SEARCHABLE

Page 88: AWS Lambda from the trenches

Elasticsearch + Logstash + Kibana = the ELK stack

Page 89: AWS Lambda from the trenches

CloudWatch Logs

Page 90: AWS Lambda from the trenches

CloudWatch Logs -> AWS Lambda -> ELK stack

Page 91: AWS Lambda from the trenches

CloudWatch Events

Page 92: AWS Lambda from the trenches

DISTRIBUTED TRACING

Page 94: AWS Lambda from the trenches

“my followers didn’t receive my new post!”

- a user

Page 95: AWS Lambda from the trenches

where could the problem be?

Page 96: AWS Lambda from the trenches

correlation IDs*

* e.g. request-id, user-id, yubl-id, etc.

Page 97: AWS Lambda from the trenches

ROLL YOUR OWN CLIENTS

Page 98: AWS Lambda from the trenches

kinesis client

http client

sns client
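Here is a sketch of what rolling your own clients can look like, assuming an API Gateway entry point. Correlation IDs are captured from incoming headers once per invocation, and every outgoing HTTP call and Kinesis record then carries them. The `x-correlation-` header prefix, the `__correlationIds` payload field and all function names are hypothetical conventions for this sketch, not an AWS or library API. An SNS client would follow the same pattern, for example via message attributes.

```javascript
// Hypothetical convention: correlation IDs travel as "x-correlation-*" headers.
const PREFIX = "x-correlation-";

// Module-scoped, so it must be reset at the start of every invocation.
let current = {};

// Capture correlation IDs from an incoming API Gateway event's headers.
function captureFromHttpEvent(event, awsRequestId) {
  current = {};
  for (const [name, value] of Object.entries(event.headers || {})) {
    const key = name.toLowerCase();
    if (key.startsWith(PREFIX)) current[key] = value;
  }
  // Start a new chain if the caller didn't send a correlation ID.
  if (!current["x-correlation-id"]) current["x-correlation-id"] = awsRequestId;
  return current;
}

// HTTP client wrapper: merge correlation IDs into outgoing request headers.
function withCorrelationHeaders(headers = {}) {
  return { ...headers, ...current };
}

// Kinesis client wrapper: records have no headers, so embed the IDs in the payload.
function toKinesisData(payload) {
  return JSON.stringify({ ...payload, __correlationIds: current });
}
```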

Page 99: AWS Lambda from the trenches

ROLL YOUR OWN CLIENTS

X-RAY

Page 101: AWS Lambda from the trenches

MONITORING + ALERTING

Page 102: AWS Lambda from the trenches

“where do I install monitoring agents?”

Page 103: AWS Lambda from the trenches

you can’t

Page 104: AWS Lambda from the trenches

• invocation count
• error count
• latency
• throttling
• granular to the minute
• support custom metrics

Page 105: AWS Lambda from the trenches

• same metrics as CW
• better dashboard
• support custom metrics

https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/

Page 107: AWS Lambda from the trenches

“how do I batch up and send logs in the background?”

Page 108: AWS Lambda from the trenches

you can’t (kinda)

Page 109: AWS Lambda from the trenches

console.log("hydrating yubls from db…");                              // log
console.log("fetching user info from user-api");                      // log
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");   // metric
console.log("MONITORING|1489795335|8|count|yubls-served");            // metric

metric format: MONITORING|timestamp|metric value|metric type|metric name
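Below is a small helper that writes metrics in the `MONITORING|timestamp|value|type|name` format shown above. The function names (`formatMetric`, `recordLatency`, `incrCount`) are hypothetical. The timestamp is in epoch seconds to match the example lines.

```javascript
// Format a metric as a single log line:
//   MONITORING|<epoch seconds>|<value>|<type>|<name>
// The metric is only written to stdout, so no call to a metrics service happens
// during the invocation; a separate log-processing function picks it up later.
function formatMetric(name, value, type, date = new Date()) {
  const timestamp = Math.floor(date.getTime() / 1000);
  return `MONITORING|${timestamp}|${value}|${type}|${name}`;
}

// Hypothetical convenience wrappers.
const recordLatency = (name, ms) => console.log(formatMetric(name, ms, "latency"));
const incrCount = (name, n = 1) => console.log(formatMetric(name, n, "count"));
```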

Page 110: AWS Lambda from the trenches

CloudWatch Logs -> AWS Lambda -> ELK stack (logs)

AWS Lambda -> CloudWatch (metrics)
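On the receiving side, a log-processing function can split each batch into plain logs and metrics. This sketch assumes the same MONITORING format. It produces objects shaped like CloudWatch `MetricDatum` entries (`MetricName`, `Value`, `Unit`, `Timestamp`), which would then be sent with a `PutMetricData` call (not shown). Mapping `latency` to `Milliseconds` and `count` to `Count` is an assumption.

```javascript
// Assumed mapping from the metric type in the log line to a CloudWatch unit.
const UNITS = { latency: "Milliseconds", count: "Count" };

// Parse "MONITORING|<epoch secs>|<value>|<type>|<name>" (optionally preceded by
// the runtime's timestamp/request-id prefix) into a MetricDatum-shaped object.
function parseMetric(message) {
  const start = message.indexOf("MONITORING|");
  if (start < 0) return null; // not a metric: an ordinary log message
  const parts = message.slice(start).trim().split("|");
  if (parts.length !== 5) return null;
  const [, epochSecs, value, type, name] = parts;
  const secs = Number(epochSecs);
  const num = Number(value);
  if (!Number.isFinite(secs) || !Number.isFinite(num) || !UNITS[type]) return null;
  return { MetricName: name, Value: num, Unit: UNITS[type], Timestamp: new Date(secs * 1000) };
}

// Split a batch of log messages: metrics go to CloudWatch, everything else to the ELK stack.
function partition(messages) {
  const metrics = [];
  const logs = [];
  for (const message of messages) {
    const datum = parseMetric(message);
    if (datum) metrics.push(datum);
    else logs.push(message);
  }
  return { metrics, logs };
}
```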

Page 111: AWS Lambda from the trenches

DASHBOARDS

Page 112: AWS Lambda from the trenches

DASHBOARDS

SET ALARMS

Page 113: AWS Lambda from the trenches

DASHBOARDS

SET ALARMS

TRACK APP-LEVEL METRICS

Page 114: AWS Lambda from the trenches

Not Only CloudWatch

Page 116: AWS Lambda from the trenches

“you really don't want your monitoring system to fail at the same time as the system it monitors”

- me

Page 117: AWS Lambda from the trenches

CONFIG MANAGEMENT

Page 118: AWS Lambda from the trenches

easily and quickly propagate config changes

Page 120: AWS Lambda from the trenches

CENTRALISED CONFIG SERVICE

Page 121: AWS Lambda from the trenches

config service goes here

Page 125: AWS Lambda from the trenches

CENTRALISED CONFIG SERVICE

CLIENT LIBRARY
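Here is a sketch of the client-library side. It assumes a hypothetical `fetchConfig(key)` that calls the config API and handles decryption. The client is created at module scope and caches each value for `ttlMs`, so warm invocations skip the network call. If the config service is briefly unavailable, the client reuses the last known value. The TTL default and the stale-on-error fallback are design assumptions, not part of the original talk.

```javascript
// Hypothetical client library for a centralised config service.
// `fetchConfig(key)` stands in for the call to the config API.
function createConfigClient(fetchConfig, { ttlMs = 60000, now = () => Date.now() } = {}) {
  const cache = new Map(); // key -> { value, fetchedAt }

  return async function getConfig(key) {
    const entry = cache.get(key);
    if (entry && now() - entry.fetchedAt < ttlMs) return entry.value; // still fresh
    try {
      const value = await fetchConfig(key);
      cache.set(key, { value, fetchedAt: now() });
      return value;
    } catch (err) {
      // Config service unavailable: fall back to the last known value, if any.
      if (entry) return entry.value;
      throw err;
    }
  };
}
```

Create the client once, outside the handler, so its cache survives across warm invocations of the same container.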

Page 127: AWS Lambda from the trenches

sensitive data should be encrypted in-flight, and at rest

(credentials, connection string, etc.)

Page 128: AWS Lambda from the trenches

role-based access

Page 129: AWS Lambda from the trenches

KMS

Page 130: AWS Lambda from the trenches

config API

encrypt

role-based access

Page 131: AWS Lambda from the trenches

config API

HTTPS

encrypted at rest

encrypted in-flight

Page 132: AWS Lambda from the trenches

config API

HTTPS

encrypted in-flight

Page 133: AWS Lambda from the trenches

config API

decrypt

role-based access

Page 134: AWS Lambda from the trenches

config API

HTTPS

access to config API can be controlled with IAM roles*

*http://amzn.to/2mxTOyH

role-based access

Page 135: AWS Lambda from the trenches

KMS

FRAMEWORK PLUG-INS

Page 136: AWS Lambda from the trenches

plug-ins

serverless-plugin-kmsvariables

serverless-secrets

serverless-meta-sync

Page 137: AWS Lambda from the trenches

PRO TIPS

Page 138: AWS Lambda from the trenches

MAP TIMEOUTS TO HTTP 504
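Here is a sketch of the idea, assuming an API Gateway proxy integration. The real work races against a timer that fires just before the function would be killed, based on `context.getRemainingTimeInMillis()`. If the timer wins, the function returns an explicit 504 rather than letting the invocation time out. `withTimeout504` and the `safetyMarginMs` default are hypothetical.

```javascript
// Wrap a handler so that running out of time becomes an HTTP 504 response.
function withTimeout504(work, { safetyMarginMs = 100 } = {}) {
  return async (event, context) => {
    // Leave a small margin so we can still respond before Lambda kills the function.
    const budget = Math.max(context.getRemainingTimeInMillis() - safetyMarginMs, 0);
    let timer;
    const timeout = new Promise((resolve) => {
      timer = setTimeout(
        () => resolve({ statusCode: 504, body: JSON.stringify({ message: "timed out" }) }),
        budget
      );
    });
    try {
      return await Promise.race([work(event, context), timeout]);
    } finally {
      clearTimeout(timer); // don't leave the timer pending after a fast response
    }
  };
}
```

Wrap your real handler once at module scope. The slow work is not cancelled. The wrapper only stops waiting for it.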

Page 141: AWS Lambda from the trenches

AVOID 128 MB FOR PRODUCTION

Page 142: AWS Lambda from the trenches

continuous timeout loop…

Page 143: AWS Lambda from the trenches

AVOID COLDSTARTS

Page 144: AWS Lambda from the trenches

functions are unloaded if idle for a while

Page 145: AWS Lambda from the trenches

noticeable coldstart time (package size matters)

Page 146: AWS Lambda from the trenches

CloudWatch Event -> AWS Lambda

Page 147: AWS Lambda from the trenches

CloudWatch Event -> AWS Lambda

ping, ping, ping, ping

Page 148: AWS Lambda from the trenches

CloudWatch Event -> AWS Lambda

ping, ping, ping, ping

Page 149: AWS Lambda from the trenches

CloudWatch Event -> AWS Lambda

ping, ping, ping, ping

HEALTH CHECKS?
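When functions are kept warm with scheduled pings, the handler should recognise a ping and return early, so the ping doesn't run business logic. This sketch assumes the ping comes from a CloudWatch Events schedule rule with no custom input configured. Such events carry `source: "aws.events"` and `detail-type: "Scheduled Event"`. If you set a constant input on the rule target, check for that instead. `isWarmupPing` and `keepWarm` are hypothetical names.

```javascript
// Default shape of a CloudWatch Events scheduled event (no custom input on the target).
function isWarmupPing(event) {
  return Boolean(event) &&
    event.source === "aws.events" &&
    event["detail-type"] === "Scheduled Event";
}

// Wrap a handler so keep-warm pings return immediately.
function keepWarm(handler) {
  return async (event, context) => {
    if (isWarmupPing(event)) return { warmup: true };
    return handler(event, context);
  };
}
```

This assumes the function is not also triggered by scheduled events for real work. If it is, tag the ping with a custom input instead.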

Page 150: AWS Lambda from the trenches

even then…

Page 151: AWS Lambda from the trenches

functions are recycled every 4 hours

Page 152: AWS Lambda from the trenches

https://www.iopipe.com/2016/09/understanding-aws-lambda-coldstarts/

Page 153: AWS Lambda from the trenches

https://www.iopipe.com/2016/09/understanding-aws-lambda-coldstarts/

Coldstarts happen, with few exceptions, 4 hours from the creation of a host VM.

Page 154: AWS Lambda from the trenches

AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME

Page 155: AWS Lambda from the trenches

USE STATE FOR OPTIMISATION
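A common way to use container state for optimisation is to create expensive resources, such as connections or parsed config, lazily and once per container, then reuse them on warm invocations. Here is a sketch. `once` is a hypothetical helper. Containers are recycled unpredictably, so the code must still work when this cache starts empty.

```javascript
// Memoise an async factory for the lifetime of the container.
function once(factory) {
  let promise = null;
  return () => {
    if (!promise) {
      promise = Promise.resolve().then(factory).catch((err) => {
        promise = null; // don't cache failures; try again on the next call
        throw err;
      });
    }
    return promise;
  };
}

// Hypothetical usage: `connectToDb` is a placeholder for your own client setup.
// const getDb = once(() => connectToDb(process.env.DB_URL));
// exports.handler = async (event) => { const db = await getDb(); /* ... */ };
```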

Page 156: AWS Lambda from the trenches

CLEAN UP OLD PACKAGES

Page 157: AWS Lambda from the trenches

max 50 MB deployment package size

Page 158: AWS Lambda from the trenches

max 50 MB deployment package size

max 75 GB total deployment package size*

* limit is per AWS region

Page 159: AWS Lambda from the trenches

Janitor Monkey

Page 160: AWS Lambda from the trenches

Janitor Lambda

http://bit.ly/2nOAzlt
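Below is a sketch of the selection logic a janitor function might use. It was written for illustration and is not taken from the linked project. It never touches `$LATEST` or any version an alias points to, and it keeps the N most recent versions for rollback. The inputs would come from the Lambda `ListVersionsByFunction` and `ListAliases` APIs, and deletion would be a `DeleteFunction` call with a `Qualifier`. Those calls are left out. `versionsToDelete` and the `keep` default are hypothetical.

```javascript
// Decide which published versions of one function are safe to delete.
// `versions`: version strings, e.g. ["$LATEST", "1", "2", ...]
// `aliasVersions`: the FunctionVersion each alias points at, e.g. ["2"]
function versionsToDelete(versions, aliasVersions, keep = 3) {
  const protectedVersions = new Set(aliasVersions);
  const numbered = versions
    .filter((v) => v !== "$LATEST")
    .map(Number)
    .filter(Number.isInteger)
    .sort((a, b) => b - a); // newest first
  return numbered
    .slice(keep) // always keep the `keep` most recent versions
    .filter((v) => !protectedVersions.has(String(v))) // never delete aliased versions
    .map(String);
}
```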

Page 161: AWS Lambda from the trenches

USE RECURSION FOR LONG-RUNNING TASKS

Page 162: AWS Lambda from the trenches

max 5 mins execution time
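Here is a sketch of the recursion pattern under that limit. The function works through the job while enough time remains, then hands the rest to a fresh invocation of itself. `processItem`, `invokeSelf` and the `minRemainingMs` margin are hypothetical. In practice `invokeSelf` would typically be an asynchronous Lambda `Invoke` (`InvocationType: "Event"`) of `context.functionName`. Invocation payloads are size-limited, so a real job would usually pass a cursor rather than the full item list.

```javascript
// Process `event.items` starting at `event.position`; recurse before time runs out.
function createRecursiveHandler(processItem, invokeSelf, { minRemainingMs = 10000 } = {}) {
  return async (event, context) => {
    const { items } = event;
    let position = event.position || 0;
    while (position < items.length) {
      if (context.getRemainingTimeInMillis() < minRemainingMs) {
        // Not enough time left: hand the remaining work to a new invocation.
        await invokeSelf({ items, position });
        return { done: false, position };
      }
      await processItem(items[position]);
      position++;
    }
    return { done: true, position };
  };
}
```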

Page 163: AWS Lambda from the trenches

CONSIDER PARTIAL FAILURES

Page 164: AWS Lambda from the trenches

“AWS Lambda polls your stream and invokes your Lambda function. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires…”

http://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html

Page 165: AWS Lambda from the trenches

should the function fail the whole batch when only some records fail?

Page 166: AWS Lambda from the trenches

use local state to facilitate partial retries

Page 167: AWS Lambda from the trenches

DLQ after max attempts
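Below is a sketch that combines the last three slides for a Kinesis-triggered function. Successful records are remembered in container state, so a retried batch only reprocesses the failures. After `maxAttempts`, the remaining failures go to a dead-letter queue, and the batch succeeds so the shard stops retrying. `processRecord`, `sendToDlq` and `createBatchHandler` are hypothetical. A retry may land on a different container, so this state is only an optimisation, and processing should still be idempotent.

```javascript
function createBatchHandler(processRecord, sendToDlq, { maxAttempts = 3 } = {}) {
  // Lives as long as the container does; only helps if retries land on this container.
  const succeeded = new Set(); // sequence numbers processed successfully
  const attempts = new Map(); // batch key -> number of attempts so far

  return async (event) => {
    const records = event.Records.map((r) => r.kinesis);
    const batchKey = records.map((r) => r.sequenceNumber).join(",");
    const attempt = (attempts.get(batchKey) || 0) + 1;
    attempts.set(batchKey, attempt);

    const failed = [];
    for (const record of records) {
      if (succeeded.has(record.sequenceNumber)) continue; // done on an earlier attempt
      try {
        const payload = JSON.parse(Buffer.from(record.data, "base64").toString("utf8"));
        await processRecord(payload);
        succeeded.add(record.sequenceNumber);
      } catch (err) {
        failed.push(record);
      }
    }

    if (failed.length > 0 && attempt < maxAttempts) {
      // Throwing makes Lambda retry the whole batch; successes above are skipped next time.
      throw new Error(`${failed.length} record(s) failed on attempt ${attempt}`);
    }

    // All succeeded, or out of attempts: park failures on the DLQ and forget this batch.
    if (failed.length > 0) await sendToDlq(failed);
    attempts.delete(batchKey);
    for (const record of records) succeeded.delete(record.sequenceNumber);
    return { failed: failed.length, attempt };
  };
}
```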

Page 168: AWS Lambda from the trenches

PROCESS SQS WITH RECURSIVE FUNCTIONS

Page 169: AWS Lambda from the trenches

http://bit.ly/2npomX6

Page 170: AWS Lambda from the trenches

AVOID HOT KINESIS STREAMS

Page 171: AWS Lambda from the trenches

“Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.”

http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html

Page 172: AWS Lambda from the trenches

“If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.”

http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html

Page 173: AWS Lambda from the trenches

when no. of processors goes up…

Page 174: AWS Lambda from the trenches

ReadProvisionedThroughputExceeded

can have too many Kinesis read operations…
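The two quotes above combine into a quick capacity check. With shared-throughput consumers, every function subscribed to a stream reads every shard and shares that shard's read limits. Adding processors therefore multiplies the read load. This sketch assumes each consumer polls each shard about once per second, which is Lambda's base polling rate. `shardReadLoad` is a hypothetical helper.

```javascript
// Per-shard read limits quoted above.
const READS_PER_SHARD_PER_SEC = 5;
const READ_MB_PER_SHARD_PER_SEC = 2;

// Estimate the read load on one shard from `consumers` subscribed functions.
// `writeMbPerSec` is how much data is written to the shard per second.
function shardReadLoad(consumers, { pollsPerConsumerPerSec = 1, writeMbPerSec = 0 } = {}) {
  const readsPerSec = consumers * pollsPerConsumerPerSec;
  // Every consumer reads every record, so data read also scales with consumer count.
  const readMbPerSec = consumers * writeMbPerSec;
  return {
    readsPerSec,
    readMbPerSec,
    throttled: readsPerSec > READS_PER_SHARD_PER_SEC || readMbPerSec > READ_MB_PER_SHARD_PER_SEC,
  };
}
```

For example, six functions polling the same shard once per second each already exceed the 5 reads per second limit, which surfaces as `ReadProvisionedThroughputExceeded`.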

Page 175: AWS Lambda from the trenches

GetRecords.IteratorAge

unpredictable spikes in read ‘latency’…

Page 176: AWS Lambda from the trenches

can kinda work around…

Page 177: AWS Lambda from the trenches

@theburningmonk

theburningmonk.com

github.com/theburningmonk

Page 178: AWS Lambda from the trenches

Yubl’s journey to Serverless

part 1: overview http://theburningmonk.com/2016/12/yubls-road-to-serverless-architecture-part-1/

part 2: test + CI/CD http://theburningmonk.com/2017/02/yubls-road-to-serverless-architecture-part-2/

part 3: ops http://theburningmonk.com/2017/03/yubls-road-to-serverless-architecture-part-3/

Page 179: AWS Lambda from the trenches

QUESTIONS?