Evolution of Edge @Netflix QCon San Francisco - 2019 Vasily Vlasov


Page 1

Evolution of Edge@Netflix

QCon San Francisco - 2019 Vasily Vlasov

Page 2

Twitter: @VlasovVasily

Vasily Vlasov

● 15+ years in Software Engineering
● 2018-now Edge Services @Netflix
● 2013-2018 Edge Services @Apple

Page 3

One of the most popular questions

What is Edge?

Page 4

The closer a concern is to a client, the Edgier it is.

Just for the purpose of this talk

Page 5

Time to market is key

Stage 1/4

What’s important right now?

Page 6

Introduce sane practices that don’t override good judgement

Standards and tooling take effort. A single app may not need them.

Page 7

Common early stage architecture

[Diagram labels: Internet, Company's ecosystem, Load Balancer, HTTP, HTTPS, API (Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging), DB]

MyApp is an Edge concern. Load balancer and DNS are as well.

Page 8

Netflix: Early Streaming days

[Diagram labels: Internet, Netflix Datacenter, Load Balancer, HTTP, HTTPS, DB, and two services, API and NCCP, each containing Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging]

Find the differences with previous slide

Page 9

Stage 1 (summary)

● Optimizing for time to market:
  ○ 3-tier architecture
  ○ Monolithic API service

● Edge concerns:
  ○ API is Edge Service
  ○ Load-balancer + TLS
  ○ DNS: Single domain name

Keep it simple

Page 10

Scale and engineering velocity

Stage 2/4

We are still in business!

Page 11

Introducing microservices

Microservices bring costs and benefits

[Diagram labels: Internet, Company ecosystem, Load Balancer, HTTP, HTTPS, API (Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging), DB, and microservices Order, Product, Account, Price]

Page 12

Thank you API team for increased velocity, how can we help you?

The return of a monolith © Josh Evans

Page 13

Client-side orchestration vs API Gateway orchestration

Microservices at the Edge

[Diagram labels: API Gateway variant: LB, LB, LB, APIGW. Client-Side variant: LB, INIT.]

Page 14

Splitting Edge@Netflix

First split by functionality

[Diagram labels: Internet, Netflix Datacenter, two Load Balancers, API and NCCP (each containing Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging), Content Discovery, Playback]

Page 15

Reducing coupling

[Diagram labels: Internet, Netflix, two Load Balancers, API Gateway (Zuul) with Routing, AuthN, Insights, Rate Limiting, Enrichment, ... and Logs; API, NCCP and one more service, each shown as Business Logic on a Base Server]

Add a level of indirection and reduce coupling

Page 16

Reducing coupling

[Diagram labels: Internet, Netflix, two Load Balancers, API Gateway (Zuul) with Routing, AuthN, Insights, Rate Limiting, Enrichment, ... and Logs; API, NCCP and one more service, each shown as Business Logic on a Base Server; Node.js]

Add a level of indirection and reduce coupling

Page 17

API Gateway reduces coupling between clients and edge services.

Which pattern is this: Bridge, Decorator or something else? Please, let me know @vlasovvasily

Page 18

Authentication

[Diagram labels: Internet, Netflix, Zuul, Service, OAuth, mTLS, more...]

+ HTTP Header (E2E Identity)

JWT: { "cid": "1234567890", "name": "Vasily Vlasov", "atype": "oauth" }

Forward E2E Identity token

Have you tried writing tests for an app that requires 2-factor auth?
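
The idea on this slide, where the gateway authenticates the caller once and forwards an end-to-end identity token to services, can be sketched as below. This is a minimal illustration and not Netflix's implementation: the header name `X-E2E-Identity`, the shared secret, and the HS256 signing choice are all assumptions; only the three claims come from the slide.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret; a real deployment would use managed keys.
SECRET = b"demo-secret"
# Hypothetical header name; the slide only says "HTTP Header (E2E Identity)".
IDENTITY_HEADER = "X-E2E-Identity"


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_identity_token(claims: dict) -> str:
    """Gateway side: sign the claims after the caller has been authenticated."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_identity_token(token: str) -> dict:
    """Service side: check the signature and return the claims."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url_decode(signature), expected):
        raise ValueError("invalid identity token signature")
    payload = signing_input.split(".")[1]
    return json.loads(_b64url_decode(payload))


def gateway_forward_headers(claims: dict) -> dict:
    """Headers the gateway would add before proxying to a service."""
    return {IDENTITY_HEADER: issue_identity_token(claims)}
```

With a scheme like this, a service test can mint a token directly instead of going through OAuth or a 2-factor flow, which is one way to read the question on the slide.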

Page 19

Routing

Imagine what happens if an incorrect routing rule is deployed...

[Diagram labels: Internet, Netflix, Zuul, Routing Config, api-tvui, api-global, ab, ab-vvlasov]
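
To make the risk of a bad routing rule concrete, here is a small sketch of a routing config that is validated before it is used. Treating the names on the slide as routing targets is an assumption, and the `Rule` shape, the path prefixes, and the default target are hypothetical; this is not Zuul's routing format.

```python
from dataclasses import dataclass

# Target names taken from the slide; everything else here is hypothetical.
KNOWN_ORIGINS = {"api-global", "api-tvui", "ab", "ab-vvlasov"}
DEFAULT_ORIGIN = "api-global"


@dataclass(frozen=True)
class Rule:
    path_prefix: str
    origin: str


def validate(rules):
    """Reject a config before deployment if any rule targets an unknown origin."""
    unknown = sorted({r.origin for r in rules if r.origin not in KNOWN_ORIGINS})
    if unknown:
        raise ValueError(f"unknown origins in routing config: {unknown}")
    return rules


def route(path, rules):
    """Return the origin of the longest matching prefix, else the default."""
    matches = [r for r in rules if path.startswith(r.path_prefix)]
    if not matches:
        return DEFAULT_ORIGIN
    return max(matches, key=lambda r: len(r.path_prefix)).origin


RULES = validate([
    Rule("/tvui", "api-tvui"),
    Rule("/ab", "ab"),
    Rule("/ab/vvlasov", "ab-vvlasov"),
])
```

A typo in an origin name fails in `validate` instead of silently sending production traffic to a target that does not exist.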

Page 20

API Gateway provides insights and increases client-perceived resiliency.

Being centrally located, it’s a great place for insights

Page 21

Insights

Centralized metrics, fine-grained tracing of requests and zero-config anomaly detection

[Diagram labels: Internet, Netflix, Zuul, Errors, RAJU ALERT, Atlas]

WHERE status >= 500 && path ==~ /^\/ios.*/
SAMPLE 5%
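
The query on this slide (server errors on iOS paths, sampled at 5%) can be read as a filter plus a sampler. The sketch below shows that reading in Python; the event shape (`status`, `path`) and the random sampling strategy are assumptions, and it does not reproduce the actual query engine.

```python
import random
import re

# path ==~ /^\/ios.*/ from the slide, written as a Python regular expression.
IOS_PATH = re.compile(r"^/ios.*")


def matches(event):
    """The WHERE clause: status >= 500 and the path starts with /ios."""
    return event["status"] >= 500 and IOS_PATH.match(event["path"]) is not None


def sample(events, rate=0.05, rng=None):
    """Yield matching events, keeping each one with probability `rate`."""
    if rng is None:
        rng = random.Random()
    for event in events:
        if matches(event) and rng.random() < rate:
            yield event
```

Sampling after filtering keeps the traced volume small even when the error rate spikes.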

Page 22

Perceived resilience with API gateway

Resiliency

● Custom load-balancing (choice of 2)
● Retries on behalf of client

[Diagram labels: Zuul, Zuul, 🔥]
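
The two techniques named on this slide can be sketched together. "Choice of 2" is commonly known as the power of two random choices: pick two servers at random and send the request to the less loaded one. Using the in-flight request count as the load signal, the `Server` class, and the retry limit are assumptions for illustration, not a description of Zuul's load balancer.

```python
import random


class Server:
    """A stand-in for an origin instance; `healthy=False` simulates a failure."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.in_flight = 0

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} failed")
        return f"{self.name} served {request}"


def pick_two_choices(servers, rng):
    """Sample two distinct servers at random and keep the less loaded one."""
    first, second = rng.sample(servers, 2)
    return first if first.in_flight <= second.in_flight else second


def call_with_retries(servers, request, rng, max_attempts=3):
    """Retry on a server not yet tried, so the client does not see the failure."""
    tried = set()
    last_error = None
    for _ in range(max_attempts):
        candidates = [s for s in servers if s.name not in tried]
        if not candidates:
            break
        if len(candidates) == 1:
            server = candidates[0]
        else:
            server = pick_two_choices(candidates, rng)
        tried.add(server.name)
        server.in_flight += 1
        try:
            return server.handle(request)
        except ConnectionError as error:
            last_error = error
        finally:
            server.in_flight -= 1
    raise last_error or ConnectionError("no servers available")
```

From the client's point of view a single request succeeds, even if the gateway had to try more than one server to get there.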

Page 23

How do we scale the Gateway team to provide these features?

Well, we don’t. Instead, there is a contribution model.

Page 24

Extensibility

Network Edge: Why and when, Vasily Vlasov

[Diagram labels: Internet, Netflix, Request, Response, Netty Server Handlers, Inbound Filters, Endpoint Filter, Outbound Filters, Netty Client Handlers, Origin Service]

Zuul: Netty, Java
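
The filter stages named on this slide (inbound filters, an endpoint filter, outbound filters) suggest a simple pipeline, and a contribution model then amounts to other teams adding filters to it. The sketch below shows that shape in Python for brevity. Zuul itself is Java on Netty, and none of the function names here are Zuul APIs; the request and response dictionaries are hypothetical.

```python
def add_request_id(request):
    """Inbound filter: decorate the request before it is routed."""
    return {**request, "headers": {**request["headers"], "x-request-id": "req-1"}}


def origin_endpoint(request):
    """Endpoint filter: stands in for proxying the request to the origin service."""
    return {"status": 200, "headers": {}, "body": f"origin saw {request['path']}"}


def echo_request_id(request, response):
    """Outbound filter: decorate the response on its way back to the client."""
    headers = {**response["headers"], "x-request-id": request["headers"]["x-request-id"]}
    return {**response, "headers": headers}


def run_pipeline(request, inbound_filters, endpoint, outbound_filters):
    """Run inbound filters, then the endpoint, then outbound filters, in order."""
    for inbound in inbound_filters:
        request = inbound(request)
    response = endpoint(request)
    for outbound in outbound_filters:
        response = outbound(request, response)
    return response
```

Because each filter is a small function with a fixed signature, a team outside the gateway group can contribute one without touching the rest of the pipeline.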

Page 25

Stage 2 (summary)

● Optimizing for engineering velocity:
  ○ Introducing microservices
  ○ Splitting Edge services

● Edge concerns:
  ○ Multiple domain names
  ○ INIT service to enable domain migration
  ○ API Gateway (Zuul):
    ■ Reduces coupling between clients and services
    ■ A leverage point for cross-cutting concerns

Mission accomplished, microservices are in!

Page 26

Resiliency and QoS

Stage 3/4

We care about our customers more and more!

Page 27

Most of the incidents are self-inflicted

We just can’t rely on 99.(9) availability of infrastructure

Page 28

All regions are equal

Multi-Region Deployment

Client

Page 29

DNS-based steering

Such fine-grained routing is not easy to achieve with anycast.

api.netflix.com
api.geo.netflix.com
api.us-west-2.prodaa.netflix.com.
api.us-east-1-na.prodaa.netflix.com.
api.us-east-1-sa.prodaa.netflix.com.
api.eu-west-1.prodaa.netflix.com.

[Diagram labels: ALBs in front of groups of Zuul instances, connections labeled HTTP/2]
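
One way to read the names on this slide is that each regional name is a separate steering bucket, so a subset of clients can be moved without touching the rest. The sketch below illustrates that reading. The regional names are from the slide, but the geo labels, the interpretation of the `-na` and `-sa` suffixes as North and South America, the default bucket, and the override mechanism are all assumptions.

```python
# Regional names as they appear on the slide.
REGIONAL_NAMES = {
    "us-west-2": "api.us-west-2.prodaa.netflix.com.",
    "us-east-1-na": "api.us-east-1-na.prodaa.netflix.com.",
    "us-east-1-sa": "api.us-east-1-sa.prodaa.netflix.com.",
    "eu-west-1": "api.eu-west-1.prodaa.netflix.com.",
}

# Hypothetical assignment of client locations to steering buckets.
GEO_BUCKETS = {
    "US-West": "us-west-2",
    "US-East": "us-east-1-na",
    "South America": "us-east-1-sa",
    "Europe": "eu-west-1",
}
DEFAULT_BUCKET = "us-east-1-na"


def steer(client_geo, overrides=None):
    """Return the regional name a geo-aware DNS answer could point to.

    `overrides` maps a bucket to a replacement bucket, for example while
    traffic is being moved away from a region.
    """
    bucket = GEO_BUCKETS.get(client_geo, DEFAULT_BUCKET)
    bucket = (overrides or {}).get(bucket, bucket)
    return REGIONAL_NAMES[bucket]
```

For example, `steer("South America", {"us-east-1-sa": "us-west-2"})` moves only that bucket, while `steer("US-East", ...)` with the same override still resolves to the us-east-1-na name.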

Page 30

Stage 3 (summary)

● Resiliency and improved latency:
  ○ Multiple regions
  ○ Active-active prod data replication

● Edge concerns:
  ○ Geo-DNS traffic steering
  ○ Tooling for failover
  ○ Cross-region proxying in failed over state

We’ve done a lot!

Page 31

Speed of light

Stage 4/4

If only we knew how to fix that…

Page 32

Speed of light is finite. Distance affects round-trip time.

300,000 km/s (186,000 miles/s) in vacuum. It is only slower in any other medium.
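
As a rough worked example of how distance bounds round-trip time, the sketch below computes the minimum RTT for a given one-way distance. The 300,000 km/s figure is from the slide; the 15,000 km distance and the two-thirds factor for optical fiber are assumptions used only for illustration, and real paths add routing and queuing delay on top.

```python
SPEED_OF_LIGHT_KM_S = 300_000  # vacuum, as quoted on the slide
FIBER_FACTOR = 2 / 3           # assumption: light in optical fiber travels at roughly 2/3 of c


def min_rtt_ms(distance_km, medium_factor=1.0):
    """Lower bound on round-trip time: twice the distance divided by signal speed."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * medium_factor
    return 2 * distance_km / speed_km_s * 1000


# Hypothetical 15,000 km path between a client and a distant region.
vacuum_rtt = min_rtt_ms(15_000)               # about 100 ms
fiber_rtt = min_rtt_ms(15_000, FIBER_FACTOR)  # about 150 ms
```

No amount of server tuning removes this floor, which is why the rest of this stage is about moving the first hops closer to the client.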

Page 33

Some clients have limited capabilities and are non-upgradable

Long distance connection

[Diagram labels: Client, HTTP/1.x]

Page 34

TCP + TLS Handshake

Client to AWS: 100 ms RTT

TCP - 100 ms: SYN, SYN ACK, ACK
TLS - 200 ms: ClientHello; ServerHello, Certificate, ServerHelloDone; ClientKeyExchange, ChangeCipherSpec, Finished; ChangeCipherSpec, Finished
HTTP Request
HTTP Response

TTFB: 400 ms

[Time axis: 0 ms to 300 ms in 50 ms steps]

Page 35

Point of presence, close to clients and backbone

Bringing TLS close to customer

Client

PoP

HTTP/2

HTTP/1.x

Page 36

Request via PoP

Client to PoP: 30 ms RTT
PoP to AWS: 100 ms RTT (persistent connection)

TCP - 30 ms: SYN, SYN ACK, ACK
TLS - 60 ms: ClientHello; ServerHello, Certificate, ServerHelloDone; ClientKeyExchange, ChangeCipherSpec, Finished; ChangeCipherSpec, Finished
HTTP Request
HTTP Response

TTFB: 220 ms

[Time axis: 0 ms to 90 ms in 15 ms steps]
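
The two time-to-first-byte figures (400 ms direct, 220 ms via a PoP) follow from counting round trips. The sketch below reproduces that arithmetic; it assumes, as the diagrams appear to, one round trip for TCP, two for the TLS handshake, one for the HTTP request and response, and zero server processing time. The function and parameter names are hypothetical.

```python
def ttfb_ms(client_rtt_ms, backend_rtt_ms=0, tcp_round_trips=1, tls_round_trips=2):
    """Time to first byte for a new connection, ignoring server processing time.

    client_rtt_ms:  RTT between the client and where TCP/TLS terminate.
    backend_rtt_ms: extra RTT from that point to the backend over an already
                    established (persistent) connection; 0 if there is none.
    """
    handshake = (tcp_round_trips + tls_round_trips) * client_rtt_ms
    request = client_rtt_ms + backend_rtt_ms
    return handshake + request


direct = ttfb_ms(client_rtt_ms=100)                       # 100 + 200 + 100 = 400
via_pop = ttfb_ms(client_rtt_ms=30, backend_rtt_ms=100)   # 30 + 60 + (30 + 100) = 220
```

The request itself still crosses the long distance once, but the three handshake round trips now happen over the short client-to-PoP leg, which accounts for the whole 180 ms saving.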

Page 37

Stage 4 (summary)

● Optimizing client connectivity:
  ○ TCP + TLS
  ○ TCP loss recovery
  ○ Congestion avoidance

● Edge concerns:
  ○ Points-of-presence
  ○ Backbone (buy or rent)
  ○ Anycast/unicast combo for user steering?
  ○ PoP => API Gateway custom protocol?

Now a truly world-wide deployment!

Page 38

A well-designed Edge enables evolution of the business

A takeaway - evolution is key.

Page 39

Thank you