
Evolution of Edge@Netflix

QCon San Francisco - 2019

Vasily Vlasov

Twitter: @VlasovVasily

Vasily Vlasov

● 15+ years in Software Engineering
● 2018-now Edge Services @Netflix
● 2013-2018 Edge Services @Apple

One of the most popular questions

What is Edge?

The closer a concern is to a client, the Edgier it is.

Just for the purpose of this talk

Time to market is key

Stage 1/4

What’s important right now?

Introduce sane practices that don’t override good judgement

Standards and tooling take effort. A single app may not need them.

Common early stage architecture

[Diagram: Internet, Load Balancer, and the company's ecosystem containing an API (Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging) and a DB; links labeled HTTPS and HTTP]

MyApp is an Edge concern. Load balancer and DNS are as well.

Netflix: Early Streaming days

[Diagram: Internet, Load Balancer, and the Netflix Datacenter containing API and NCCP, each with Business Logic, AuthN/AuthZ, Rate Limiting, Insights, and Logging, plus a DB; links labeled HTTPS and HTTP]

Find the differences with the previous slide

Stage 1 (summary)

● Optimizing for time to market:
  ○ 3-tier architecture
  ○ Monolithic API service

● Edge concerns:
  ○ API is Edge Service
  ○ Load-balancer + TLS
  ○ DNS: Single domain name

Keep it simple

Scale and engineering velocity

Stage 2/4

We are still in business!

Introducing microservices

Microservices bring costs and benefits

[Diagram: Internet, Load Balancer, and the company ecosystem containing an API (Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging), a DB, and the microservices Order, Product, Account, and Price; links labeled HTTPS and HTTP]

Thank you API team for increased velocity, how can we help you?

The return of a monolith © Josh Evans

Client-side orchestration vs API Gateway orchestration

Microservices at the Edge

[Diagram: two panels. API Gateway: LB, LB, LB, APIGW. Client-Side: LB, INIT, API.]

Splitting Edge@Netflix

First split by functionality

[Diagram: Internet and the Netflix Datacenter. Two load balancers in front of API and NCCP, each with Business Logic, AuthN/AuthZ, Rate Limiting, Insights, and Logging. Other labels: Content Discovery, Playback, Logs.]

Reducing coupling

[Diagram: Internet and Netflix. Load balancers and an API Gateway (Zuul) providing Routing, AuthN, Insights, Rate Limiting, Enrichment, ...; behind it three services (labels: API, NCCP, Logs), each shown as Business Logic on a Base Server. A later build of the slide adds the label Node.js.]

Add a level of indirection and reduce coupling

API Gateway reduces coupling between clients and edge services.

Which pattern is this: Bridge, Decorator, or something else? Please let me know @vlasovvasily

Authentication

[Diagram: Internet and Netflix. Zuul accepts OAuth, mTLS, and more, and sits in front of the Service.]

+ HTTP Header (E2E Identity)
JWT: { "cid": "1234567890", "name": "Vasily Vlasov", "atype": "oauth" }

Forward E2E Identity token

Have you tried writing tests for an app that requires 2-factor auth?

Routing

Imagine what happens if an incorrect routing rule is deployed...

[Diagram: Internet and Netflix. Zuul reads a Routing Config and routes to the origins api-tvui, api-global, ab, and ab-vvlasov.]
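One way to picture a routing config, and why a bad rule is dangerous, is a first-match-wins rule list that is validated before it is deployed. This is a sketch under assumptions: the origin names come from the slide, while the matching conditions, the `dev` and `device` request fields, and the validation checks are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Origin names appear on the slide; everything else here is illustrative.
KNOWN_ORIGINS = {"api-tvui", "api-global", "ab", "ab-vvlasov"}


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Dict[str, str]], bool]
    origin: str


RULES: List[Rule] = [
    # Hypothetical: a developer flag sends A/B traffic to a personal stack.
    Rule("ab-dev", lambda r: r["path"].startswith("/ab") and r.get("dev") == "vvlasov", "ab-vvlasov"),
    Rule("ab", lambda r: r["path"].startswith("/ab"), "ab"),
    Rule("tvui", lambda r: r.get("device") == "tv", "api-tvui"),
    Rule("default", lambda r: True, "api-global"),
]


def validate(rules: List[Rule]) -> List[str]:
    """Return problems that should block a deploy of this rule list."""
    errors = [
        f"{rule.name}: unknown origin {rule.origin}"
        for rule in rules
        if rule.origin not in KNOWN_ORIGINS
    ]
    # Cheap probe for a catch-all: the last rule must accept a bare request.
    if not rules or not rules[-1].matches({"path": "/"}):
        errors.append("last rule must match every request")
    return errors


def route(request: Dict[str, str], rules: List[Rule] = RULES) -> str:
    """Return the origin of the first rule that matches the request."""
    for rule in rules:
        if rule.matches(request):
            return rule.origin
    raise LookupError("no routing rule matched")
```

Because every request passes through the same table, a single typo in an origin name affects all clients at once, which is the argument for validating the config before it goes live.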

API Gateway provides insights and increases client-perceived resiliency.

Being centrally located, it’s a great place for insights

Insights

Centralized metrics, fine-grained tracing of requests, and zero-config anomaly detection

[Diagram: Internet and Netflix. Zuul emits request events; other labels: Errors, RAJU ALERT, Atlas.]

WHERE status >= 500 && path ==~ /^\/ios.*/
SAMPLE 5%
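The query on the slide reads as a filter followed by sampling. A rough Python equivalent, assuming each request is available as an event with `status` and `path` fields (hypothetical names) and that sampling is applied after the filter:

```python
import random
import re
from typing import Dict, Iterable, Iterator, Optional

# Mirrors the slide's predicate: WHERE status >= 500 && path ==~ /^\/ios.*/
IOS_PATH = re.compile(r"^/ios.*")


def matches(event: Dict) -> bool:
    """True for server errors on paths that start with /ios."""
    return event["status"] >= 500 and IOS_PATH.match(event["path"]) is not None


def sample(
    events: Iterable[Dict],
    rate: float = 0.05,  # SAMPLE 5%
    rng: Optional[random.Random] = None,
) -> Iterator[Dict]:
    """Yield roughly `rate` of the matching events."""
    rng = rng or random.Random()
    for event in events:
        if matches(event) and rng.random() < rate:
            yield event
```

Sampling keeps the cost of fine-grained tracing bounded even when an error spike makes the filter match a large share of traffic.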

Perceived resilience with API gateway

Resiliency

[Diagram: two Zuul instances; one component is on fire (🔥)]

● Custom load-balancing (choice of 2)
● Retries on behalf of client
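The two bullets above can be sketched together. "Choice of 2" is read here as the power-of-two-choices technique: pick two servers at random and use the less loaded one. The `Server` class, the in-flight counter as the load signal, and the retry limit are assumptions made for illustration.

```python
import random
from typing import List, Optional


class Server:
    """A stand-in origin server that tracks its in-flight requests."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.in_flight = 0

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} failed")
        return f"{self.name} served {request}"


def choose_of_two(servers: List[Server], rng: random.Random) -> Server:
    """Pick two distinct servers at random and keep the less loaded one."""
    first, second = rng.sample(servers, 2)
    return first if first.in_flight <= second.in_flight else second


def proxy(
    request: str,
    servers: List[Server],
    rng: Optional[random.Random] = None,
    max_attempts: int = 3,
) -> str:
    """Send the request, retrying on a different server if one fails."""
    rng = rng or random.Random()
    tried = set()
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        candidates = [s for s in servers if s.name not in tried]
        if not candidates:
            break
        server = choose_of_two(candidates, rng) if len(candidates) >= 2 else candidates[0]
        tried.add(server.name)
        server.in_flight += 1
        try:
            return server.handle(request)
        except ConnectionError as error:
            last_error = error  # the client never sees this failure
        finally:
            server.in_flight -= 1
    raise last_error or ConnectionError("no servers available")
```

The client sees a single successful response even when the first server fails, which is where the perceived resiliency comes from. Retrying like this is only safe for requests that can be repeated without side effects.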

How do we scale Gateway team to provide these features?

Well, we don’t. Instead, there is a contribution model.

Extensibility

[Diagram: Internet and Netflix. Components: Netty Server Handlers, Inbound Filters, Endpoint Filter, Outbound Filters, Netty Client Handlers, Origin Service; arrows labeled Request and Response.]

Zuul: Netty, Java
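The filter model in the diagram is what makes a contribution model workable: other teams add filters without touching the gateway core. Zuul itself is written in Java on Netty; the Python below only illustrates the shape of an inbound, endpoint, and outbound chain, and all filter names and message fields are hypothetical.

```python
from typing import Callable, List

Message = dict
Filter = Callable[[Message], Message]


def tag_device(request: Message) -> Message:
    """Inbound filter (hypothetical): derive a device type from the path."""
    device = str(request["path"]).strip("/").split("/")[0] or "unknown"
    return {**request, "device": device}


def proxy_to_origin(request: Message) -> Message:
    """Endpoint filter: a stand-in for the call to the origin service."""
    return {"status": 200, "headers": {}, "body": f"origin saw device={request['device']}"}


def add_gateway_header(response: Message) -> Message:
    """Outbound filter (hypothetical): decorate the response on the way out."""
    return {**response, "headers": {**response["headers"], "via": "gateway"}}


def run_pipeline(
    request: Message,
    inbound: List[Filter],
    endpoint: Filter,
    outbound: List[Filter],
) -> Message:
    """Apply inbound filters, call the endpoint, then apply outbound filters."""
    for inbound_filter in inbound:
        request = inbound_filter(request)
    response = endpoint(request)
    for outbound_filter in outbound:
        response = outbound_filter(response)
    return response
```

Adding a cross-cutting feature then means appending one more function to the inbound or outbound list.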

Stage 2 (summary)

● Optimizing for engineering velocity:
  ○ Introducing microservices
  ○ Splitting Edge services

● Edge concerns:
  ○ Multiple domain names
  ○ INIT service to enable domain migration
  ○ API Gateway (Zuul):
    ■ Reduces coupling between clients and services
    ■ A leverage point for cross-cutting concerns

Mission accomplished, microservices are in!

Resiliency and QoS

Stage 3/4

We care about our customers more and more!

Most of the incidents are self-inflicted

We just can’t rely on 99.(9) availability of infrastructure

All regions are equal

Multi-Region Deployment

[Diagram: a Client and two Zuul deployments]

DNS-based steering

Such fine-grained routing is not easy to achieve with anycast.

[Diagram: the DNS names api.netflix.com and api.geo.netflix.com lead to the regional names below; each region shows an ALB in front of a group of Zuul instances, with links labeled HTTP/2.]

api.us-west-2.prodaa.netflix.com.
api.us-east-1-sa.prodaa.netflix.com.
api.us-east-1-na.prodaa.netflix.com.
api.eu-west-1.prodaa.netflix.com.
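A toy model of geo-based DNS steering with failover. The regional names are the ones on the slide; the geography keys, the mapping from geography to target, and the failover table are assumptions made for illustration, not Netflix's actual steering policy.

```python
from typing import Dict, Optional

# Regional names as shown on the slide.
REGIONAL_NAMES = {
    "us-west-2": "api.us-west-2.prodaa.netflix.com.",
    "us-east-1-sa": "api.us-east-1-sa.prodaa.netflix.com.",
    "us-east-1-na": "api.us-east-1-na.prodaa.netflix.com.",
    "eu-west-1": "api.eu-west-1.prodaa.netflix.com.",
}

# Hypothetical mapping from client geography to a steering target.
GEO_TO_TARGET = {
    "na-west": "us-west-2",
    "na-east": "us-east-1-na",
    "sa": "us-east-1-sa",
    "eu": "eu-west-1",
}


def resolve(geo: str, failed_over: Optional[Dict[str, str]] = None) -> str:
    """Return the regional name a client in `geo` would be steered to.

    `failed_over` maps an evacuated target to the target that takes
    over its traffic.
    """
    target = GEO_TO_TARGET[geo]
    target = (failed_over or {}).get(target, target)
    return REGIONAL_NAMES[target]
```

Keeping a separate name per client population lets each population be moved independently during a failover, which is hard to express with anycast alone.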

Stage 3 (summary)

● Resiliency and improved latency:
  ○ Multiple regions
  ○ Active-active prod data replication

● Edge concerns:
  ○ Geo-DNS traffic steering
  ○ Tooling for failover
  ○ Cross-region proxying in failed over state

We’ve done a lot!

Speed of light

Stage 4/4

If only we knew how to fix that…

Speed of light is finite. Distance affects round-trip time.

300,000 km/s (186,000 miles/s) in vacuum. Only gets lower in non-vacuum
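The figures above put a hard floor under round-trip time. A small calculation, using the slide's 300,000 km/s and, as an added assumption, a factor of roughly 2/3 for light in optical fiber; the 15,000 km distance in the usage note is hypothetical:

```python
C_VACUUM_KM_S = 300_000  # figure from the slide
FIBER_FACTOR = 2 / 3     # assumption: light in fiber travels at about 2/3 of c


def min_rtt_ms(distance_km: float, medium_factor: float = 1.0) -> float:
    """Lower bound on round-trip time over a straight path of `distance_km`."""
    speed_km_s = C_VACUUM_KM_S * medium_factor
    return 2 * distance_km * 1000 / speed_km_s
```

For a 15,000 km path, `min_rtt_ms(15_000)` gives about 100 ms in vacuum and `min_rtt_ms(15_000, FIBER_FACTOR)` about 150 ms, before any routing, queuing, or processing delay is added.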

Some clients have limited capabilities and are non-upgradable

Long distance connection

[Sequence diagram: Client to AWS over HTTP/1.x, 100 ms RTT, time axis 0 to 300 ms]

TCP handshake (100 ms): SYN, SYN ACK, ACK
TLS handshake (200 ms): ClientHello; ServerHello, Certificate, ServerHelloDone; ClientKeyExchange, ChangeCipherSpec, Finished; ChangeCipherSpec, Finished
HTTP Request, HTTP Response

TTFB: 400 ms
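The 400 ms on this slide is round trips multiplied by RTT: one for TCP, two for the TLS handshake shown, and one for the HTTP request and response. A short check, assuming server processing time is negligible:

```python
def ttfb_ms(
    rtt_ms: float,
    tcp_round_trips: int = 1,
    tls_round_trips: int = 2,
    request_round_trips: int = 1,
) -> float:
    """Time to first byte on a new connection, ignoring server processing time."""
    return rtt_ms * (tcp_round_trips + tls_round_trips + request_round_trips)
```

`ttfb_ms(100)` reproduces the slide's 400 ms. Every millisecond of RTT is paid four times, which is why shortening the distance to the TLS endpoint matters so much.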

Point of presence, close to clients and backbone

Bringing TLS close to customer

[Sequence diagram: Client, PoP, AWS. Client to PoP: 30 ms RTT. Request via PoP to AWS: 100 ms RTT (persistent connection). Protocol labels: HTTP/1.x and HTTP/2. Time axis 0 to 90 ms.]

TCP handshake (30 ms): SYN, SYN ACK, ACK
TLS handshake (60 ms): ClientHello; ServerHello, Certificate, ServerHelloDone; ClientKeyExchange, ChangeCipherSpec, Finished; ChangeCipherSpec, Finished
HTTP Request, HTTP Response

TTFB: 220 ms
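The 220 ms figure follows from paying the handshakes on the short client-to-PoP leg only, while the request rides an already-open connection from the PoP to AWS. A sketch of that arithmetic, assuming processing time at the PoP and at the origin is negligible:

```python
def ttfb_via_pop_ms(
    client_pop_rtt_ms: float,
    pop_origin_rtt_ms: float,
    handshake_round_trips: int = 3,  # 1 for TCP + 2 for TLS, as on the slide
) -> float:
    """Time to first byte when handshakes terminate at a nearby PoP."""
    # Handshakes only cross the short client-to-PoP leg.
    handshake = client_pop_rtt_ms * handshake_round_trips
    # The request crosses both legs; the PoP-to-origin connection is
    # persistent, so no new handshake is needed there.
    request = client_pop_rtt_ms + pop_origin_rtt_ms
    return handshake + request
```

`ttfb_via_pop_ms(30, 100)` gives 90 ms of handshakes plus 130 ms for the request, matching the slide's 220 ms.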

Stage 4 (summary)

● Optimizing client connectivity:
  ○ TCP + TLS
  ○ TCP loss recovery
  ○ Congestion avoidance

● Edge concerns:
  ○ Points-of-presence
  ○ Backbone (buy or rent)
  ○ Anycast/unicast combo for user steering?
  ○ PoP => API Gateway custom protocol?

Now a truly world-wide deployment!

A well-designed Edge enables evolution of the business

A takeaway - evolution is key.

Thank you