Evolution of Edge@Netflix
QCon San Francisco - 2019
Vasily Vlasov
Twitter: @VlasovVasily
Vasily Vlasov
● 15+ years in Software Engineering
● 2018-now Edge Services @Netflix
● 2013-2018 Edge Services @Apple
One of the most popular questions
What is Edge?
The closer a concern is to a client, the Edgier it is.
Just for the purpose of this talk
Time to market is key
Stage 1/4
What’s important right now?
Introduce sane practices that don’t override good judgement
Standards and tooling take effort. Single app may not need them
Common early stage architecture
[Diagram. Labels: Internet; HTTP; HTTPS; Load Balancer; Company's ecosystem; API (Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging); DB]
MyApp is an Edge concern. Load balancer and DNS are as well.
Netflix: Early Streaming days
[Diagram. Labels: Internet; HTTP; HTTPS; Load Balancer; Netflix Datacenter; API and NCCP, each with Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging; DB]
Find the differences with previous slide
Stage 1 (summary)
● Optimizing for time to market:
  ○ 3-tier architecture
  ○ Monolithic API service
● Edge concerns:
  ○ API is Edge Service
  ○ Load-balancer + TLS
  ○ DNS: Single domain name
Keep it simple
Scale and engineering velocity
Stage 2/4
We are still in business!
Introducing microservices
Microservices bring costs and benefits
[Diagram. Labels: Internet; HTTP; HTTPS; Load Balancer; Company ecosystem; API (Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging); microservices: Order, Product, Account, Price; DB]
Thank you API team for increased velocity, how can we help you?
The return of a monolith © Josh Evans
Client-side orchestration vs API Gateway orchestration
Microservices at the Edge
[Diagram. Labels: API Gateway; Client-Side; LB; APIGW; INIT; API]
Splitting Edge@Netflix
First split by functionality
[Diagram. Labels: Internet; Netflix Datacenter; Load Balancer (two); NCCP; two service boxes, each listing Business Logic, AuthN/AuthZ, Rate Limiting, Insights, Logging; Content Discovery; Playback; Logs]
Reducing coupling
[Diagram. Labels: Internet; Netflix; API Gateway (Zuul): Routing, AuthN, Insights, Rate Limiting, Enrichment, ...; Load Balancer (two); API; NCCP; Logs; three service boxes, each showing Business Logic on a Base Server]
Add a level of indirection and reduce coupling
Reducing coupling
[Diagram. Labels: Internet; Netflix; API Gateway (Zuul): Routing, AuthN, Insights, Rate Limiting, Enrichment, ...; Load Balancer (two); API; NCCP; Node.js; three service boxes, each showing Business Logic on a Base Server]
Add a level of indirection and reduce coupling
API Gateway reduces coupling between clients and edge services.
Which pattern is this: Bridge, Decorator or something else? Please, let me know @vlasovvasily
Authentication
[Diagram. Labels: Internet; Netflix; Zuul; Service; authentication schemes: OAuth, mTLS, more...]
+ HTTP Header (E2E Identity)
JWT: { "cid": "1234567890", "name": "Vasily Vlasov", "atype": "oauth" }
Forward E2E Identity token
Have you tried writing tests for app that required 2-factor-auth?
Routing
Imagine what happens if an incorrect routing rule is deployed...
[Diagram. Labels: Internet; Netflix; Zuul; Routing Config; origins: api-tvui, api-global, ab, ab-vvlasov]
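A routing config of this kind can be pictured as an ordered list of rules in which the first match wins. The sketch below is an illustration only: the origin names come from the slide, while the match conditions, the Rule type, and the validate step are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Request = Dict[str, str]  # hypothetical shape: {"path": "/tvui/home", "user": "vvlasov"}


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[Request], bool]
    origin: str


# Origin names appear on the slide; the match conditions are invented.
KNOWN_ORIGINS = {"api-tvui", "api-global", "ab", "ab-vvlasov"}
DEFAULT_ORIGIN = "api-global"

RULES: List[Rule] = [
    Rule("dev-override",
         lambda r: r["path"].startswith("/ab") and r.get("user") == "vvlasov",
         "ab-vvlasov"),
    Rule("ab", lambda r: r["path"].startswith("/ab"), "ab"),
    Rule("tvui", lambda r: r["path"].startswith("/tvui"), "api-tvui"),
]


def validate(rules: List[Rule]) -> None:
    """Refuse a config that points at an origin we do not know about."""
    for rule in rules:
        if rule.origin not in KNOWN_ORIGINS:
            raise ValueError(
                f"rule {rule.name!r} targets unknown origin {rule.origin!r}")


def route(request: Request, rules: List[Rule] = RULES) -> str:
    """Return the origin of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.origin
    return DEFAULT_ORIGIN
```

Running a check such as validate before a config goes live is one simple guard against the bad-rule scenario the slide asks the audience to imagine. It catches only unknown origins, not rules that are valid but wrong.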
API Gateway provides insights and increases client-perceived resiliency.
Being centrally located - it’s a great place for insights
Insights
Centralized metrics, fine-grained tracing of requests and zero-config anomaly detection
[Diagram. Labels: Internet; Netflix; Zuul; Errors; RAJU ALERT; Atlas]
WHERE status >= 500 && path ==~ /^\/ios.*/
SAMPLE 5%
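The query on the slide filters requests by status and path, then samples 5% of the matches. A rough Python equivalent is sketched below. The event shape and function names are assumptions, and this is not the query engine used at Netflix.

```python
import random
import re
from typing import Dict, Iterable, Iterator, Optional

Event = Dict[str, object]  # hypothetical shape: {"status": 503, "path": "/ios/home"}

# Same pattern as the slide's  path ==~ /^\/ios.*/
IOS_PATH = re.compile(r"^/ios.*")


def matches(event: Event) -> bool:
    """Mirror of: WHERE status >= 500 && path matches the iOS pattern."""
    is_server_error = int(event["status"]) >= 500
    is_ios_path = IOS_PATH.match(str(event["path"])) is not None
    return is_server_error and is_ios_path


def sample(events: Iterable[Event], rate: float = 0.05,
           rng: Optional[random.Random] = None) -> Iterator[Event]:
    """Yield roughly `rate` of the matching events (SAMPLE 5% by default)."""
    rng = rng or random.Random()
    for event in events:
        if matches(event) and rng.random() < rate:
            yield event
```

Filtering before sampling keeps the cost of tracing proportional to the traffic of interest rather than to all traffic through the gateway.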
Perceived resilience with API gateway
Resiliency
Custom load-balancing (choice of 2)
Retries on behalf of client
[Diagram. Labels: Zuul (two instances); 🔥]
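Both ideas on this slide fit in a few lines. The sketch below assumes the gateway tracks in-flight requests per server and that a failed attempt surfaces as ConnectionError. The function names and the load metric are hypothetical, not Zuul's implementation.

```python
import random
from typing import Callable, Dict, Optional, Set


def pick_server(in_flight: Dict[str, int], rng: random.Random,
                exclude: Optional[Set[str]] = None) -> str:
    """Choice of 2: sample two candidates and keep the less loaded one."""
    candidates = [s for s in in_flight if not exclude or s not in exclude]
    if not candidates:
        raise RuntimeError("no servers available")
    picks = rng.sample(candidates, k=min(2, len(candidates)))
    return min(picks, key=lambda s: in_flight[s])


def call_with_retries(send: Callable[[str], str], in_flight: Dict[str, int],
                      rng: random.Random, max_attempts: int = 3) -> str:
    """Retry on a different server so the client sees a single response."""
    tried: Set[str] = set()
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        if len(tried) == len(in_flight):
            break  # every server has already been tried once
        server = pick_server(in_flight, rng, exclude=tried)
        tried.add(server)
        in_flight[server] += 1
        try:
            return send(server)
        except ConnectionError as err:
            last_error = err
        finally:
            in_flight[server] -= 1
    raise RuntimeError("all attempts failed") from last_error
```

Retrying on the client's behalf is only safe for requests that can be repeated without side effects, so a real gateway would need to decide per request whether a retry is allowed.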
How do we scale Gateway team to provide these features?
Well, we don’t. Instead, there is a contribution model.
Extensibility
[Diagram. Labels: Internet; Netflix; Request; Netty Server Handlers; Inbound Filters; Endpoint Filter; Outbound Filters; Netty Client Handlers; Origin Service; Response]
Zuul: Netty, Java
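Zuul itself is written in Java on top of Netty. The Python sketch below only mirrors the shape of the pipeline named in the diagram (inbound filters, an endpoint filter, outbound filters) and does not use Zuul's API. The filter names and message fields are hypothetical.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Filter = Callable[[Message], Message]


def add_request_id(request: Message) -> Message:
    """Inbound filter: enrich the request before it is routed."""
    return {**request, "request_id": "req-1"}


def proxy_endpoint(request: Message) -> Message:
    """Endpoint filter: stand-in for the call to the origin service."""
    return {
        "status": 200,
        "body": f"origin saw {request['path']}",
        "request_id": request.get("request_id"),
    }


def tag_response(response: Message) -> Message:
    """Outbound filter: decorate the response on its way back."""
    headers = {**response.get("headers", {}), "via": "gateway"}
    return {**response, "headers": headers}


def run_pipeline(request: Message, inbound: List[Filter],
                 endpoint: Filter, outbound: List[Filter]) -> Message:
    """Apply inbound filters, call the endpoint, then apply outbound filters."""
    for inbound_filter in inbound:
        request = inbound_filter(request)
    response = endpoint(request)
    for outbound_filter in outbound:
        response = outbound_filter(response)
    return response
```

Because each filter is a small independent function, other teams can contribute one without touching the rest of the gateway, which fits the contribution model mentioned on the previous slide.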
Stage 2 (summary)
● Optimizing for engineering velocity:
  ○ Introducing microservices
  ○ Splitting Edge services
● Edge concerns:
  ○ Multiple domain names
  ○ INIT service to enable domain migration
  ○ API Gateway (Zuul):
    ■ Reduces coupling between clients and services
    ■ A leverage point for cross-cutting concerns
Mission accomplished, microservices are in!
Resiliency and QoS
Stage 3/4
We care about our customers more and more!
Most of the incidents are self-inflicted
We just can’t rely on 99.(9) availability of infrastructure
All regions are equal
Multi-Region Deployment
[Diagram. Labels: Client; Zuul; Zuul]
DNS-based steering
Such fine-grained routing is not easy to achieve with anycast.
api.netflix.com
api.geo.netflix.com
api.us-west-2.prodaa.netflix.com.
api.us-east-1-sa.prodaa.netflix.com.
api.us-east-1-na.prodaa.netflix.com.
api.eu-west-1.prodaa.netflix.com.
[Diagram. Labels: ALB (three); groups of Zuul instances behind the ALBs; HTTP/2]
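DNS-based steering amounts to answering the same public name with a different regional name depending on where the client is. The sketch below uses the regional hostnames from the slide. The client-area keys, the reading of "-na" and "-sa" as North and South America, the default, and the evacuation override are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hostnames are from the slide; the area keys are hypothetical labels.
STEERING: Dict[str, str] = {
    "us-west": "api.us-west-2.prodaa.netflix.com.",
    "north-america": "api.us-east-1-na.prodaa.netflix.com.",
    "south-america": "api.us-east-1-sa.prodaa.netflix.com.",
    "europe": "api.eu-west-1.prodaa.netflix.com.",
}
# Hypothetical fallback for clients whose area is not in the table.
DEFAULT_AREA = "north-america"


def resolve(area: str, evacuated: Optional[Dict[str, str]] = None) -> str:
    """Return the regional name a geo-aware DNS answer would point to.

    `evacuated` maps a failed-over regional name to its replacement.
    """
    target = STEERING.get(area, STEERING[DEFAULT_AREA])
    return (evacuated or {}).get(target, target)
```

For example, resolve("europe") returns the eu-west-1 name, while passing an evacuated map that redirects that name sends European clients to another region during a failover.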
Stage 3 (summary)
● Resiliency and improved latency:
  ○ Multiple regions
  ○ Active-active prod data replication
● Edge concerns:
  ○ Geo-DNS traffic steering
  ○ Tooling for failover
  ○ Cross-region proxying in failed over state
We’ve done a lot!
Speed of light
Stage 4/4
If only we knew how to fix that…
Speed of light is finite. Distance affects round-trip time.
300,000 km/s (186,000 miles/s) in vacuum. Only gets lower in non-vacuum
Some clients have limited capabilities and are non-upgradable
Long distance connection
[Sequence diagram. Client (HTTP/1.x) to AWS, 100 ms RTT. Time axis: 0 ms to 300 ms]
TCP handshake (100 ms): SYN; SYN ACK; ACK
TLS handshake (200 ms): ClientHello; ServerHello, Certificate, ServerHelloDone; ClientKeyExchange, ChangeCipherSpec, Finished; ChangeCipherSpec, Finished
HTTP Request
HTTP Response
TTFB: 400 ms
Point of presence, close to clients and backbone
Bringing TLS close to customer
[Sequence diagram. Client to PoP, 30 ms RTT; PoP to AWS, 100 ms RTT (persistent connection). Protocol labels: HTTP/1.x, HTTP/2. Time axis: 0 ms to 90 ms]
TCP handshake (30 ms): SYN; SYN ACK; ACK
TLS handshake (60 ms): ClientHello; ServerHello, Certificate, ServerHelloDone; ClientKeyExchange, ChangeCipherSpec, Finished; ChangeCipherSpec, Finished
HTTP Request: request via PoP to AWS
HTTP Response
TTFB: 220 ms
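The two TTFB figures follow from counting round trips: one for the TCP handshake, two for the TLS handshake shown on the slides, and one for the request and response. A small sketch of that arithmetic, with hypothetical function names:

```python
def ttfb_direct(rtt_ms: float, tcp_rtts: int = 1, tls_rtts: int = 2) -> float:
    """Handshakes and the request all cross the full client-to-origin path."""
    return (tcp_rtts + tls_rtts + 1) * rtt_ms


def ttfb_via_pop(client_pop_rtt_ms: float, pop_origin_rtt_ms: float,
                 tcp_rtts: int = 1, tls_rtts: int = 2) -> float:
    """Handshakes end at the PoP; only the request travels on to the origin
    over an already established (persistent) connection."""
    handshake = (tcp_rtts + tls_rtts) * client_pop_rtt_ms
    request = client_pop_rtt_ms + pop_origin_rtt_ms
    return handshake + request


ttfb_direct(100)       # 400 ms: TCP 100 + TLS 200 + request 100
ttfb_via_pop(30, 100)  # 220 ms: TCP 30 + TLS 60 + request 30 + 100
```

The request itself still pays the long-haul round trip. The saving comes entirely from moving the three handshake round trips onto the short client-to-PoP leg.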
Stage 4 (summary)
● Optimizing client connectivity:
  ○ TCP + TLS
  ○ TCP loss recovery
  ○ Congestion avoidance
● Edge concerns:
  ○ Points-of-presence
  ○ Backbone (buy or rent)
  ○ Anycast/unicast combo for user steering?
  ○ PoP => API Gateway custom protocol?
Now a truly world-wide deployment!
A well-designed Edge enables evolution of the business
A takeaway - evolution is key.
Thank you