Service Mesh - QConSP
Service Mesh
Technology Deep Dive and Reasons for Adoption
Diógenes Rettori - @rettori
Executive Director - Cloud Architecture
JPMorgan Chase & Co.
One Message.
rettori
Agenda
Quick Introduction to Service Mesh 5m
Service Mesh x Distributed Systems 5m
Technology Options 2m
Istio and Linkerd Deep Dive 20m
How to choose 5m
Recommended tools 3m
Service Mesh
Istio & Linkerd
Diogenes Rettori & Tiago Vieira
Currently Writing
Quick Intro to Service Mesh
[Diagram sequence: the booking, payments, catalog and notifications services; an unknown link marked "?"; the services spread across AZ-1 and AZ-2; replicated payments, catalog and notifications instances; traffic annotations "1000/day" and "50/second".]
A Service Mesh is an intelligent communications network that understands the relationships
between microservices and intervenes in the traffic to increase the reliability and security of the
whole system.
Addresses needs of distributed systems.
Service Mesh x Distributed Systems
Fallacies of Distributed Systems
Given that they are fallacies, we should assume the opposite.
Fallacy | Service Mesh feature
The network is reliable. | Circuit breaking and load balancing
Latency is zero. | Timeouts and retries
Bandwidth is infinite. | Rate limiting
The network is secure. | Mutual TLS
Topology doesn’t change. | Service discovery
There is one administrator. | Role-based access control
Transport cost is zero. | gRPC / RSocket
The network is homogeneous. | Dynamic routing (A/B, canary deployments)
Technology Options
AWS App Mesh
Service Mesh
Istio & Linkerd Deep Dive
● Traffic Management
● Security
● Installation / Configuration
● Supported Environments
● Observability
● Policy Management
● Performance
Traffic Management
Feature | Istio | Linkerd
TCP Proxying | Yes | Yes
Load Balancing | Yes. Supports Round Robin, Least Conn, Random and Passthrough | Yes. Uses EWMA (exponentially weighted moving average) to identify optimal targets
Subset Load Balancing | Yes. Useful for canary, blue/green deployments and A/B tests | No
Session Affinity | Yes. Cookie hash-based LB for HTTP, providing soft session affinity | No
Circuit Breaking | Yes, as Outlier Detection | See comments: no configuration options. EWMA balancing will give less preference to unhealthy targets, achieving something circuit-breaker-like
Retries | Yes | Yes
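To make the "circuit breaking as outlier detection" row more concrete, here is a minimal sketch of the general idea: a host that fails several times in a row is temporarily ejected from the load-balancing pool. The class name, the threshold and the ejection time are all hypothetical values chosen for illustration; this is not Istio's or Envoy's actual implementation or configuration.

```python
class OutlierDetector:
    """Hypothetical sketch: eject a host after N consecutive failures."""

    def __init__(self, max_consecutive_errors=5, ejection_seconds=30.0):
        # Both defaults are assumptions made for this example.
        self.max_consecutive_errors = max_consecutive_errors
        self.ejection_seconds = ejection_seconds
        self._errors = {}         # host -> current run of consecutive failures
        self._ejected_until = {}  # host -> time at which the host may return

    def record(self, host, ok, now):
        """Record the outcome of one request to `host` at time `now` (seconds)."""
        if ok:
            self._errors[host] = 0  # any success resets the failure run
            return
        self._errors[host] = self._errors.get(host, 0) + 1
        if self._errors[host] >= self.max_consecutive_errors:
            self._ejected_until[host] = now + self.ejection_seconds
            self._errors[host] = 0

    def healthy_hosts(self, hosts, now):
        """Return the hosts that are not currently ejected."""
        return [h for h in hosts if self._ejected_until.get(h, 0.0) <= now]
```

A load balancer would call `record` after every response and choose only among `healthy_hosts`, so an ejected host gets a pause instead of more traffic.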
Traffic Management
Points to Consider
- Load Balancing algorithms
- Subset Load Balancing
- Session Affinity
Istio: Round Robin, Least Conn, Random and Passthrough.
Linkerd: Peak EWMA, which maintains a moving average of each replica’s round-trip time, weighted by the number of outstanding requests, and distributes traffic to the replicas where that cost function is smallest.
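The Peak EWMA description can be sketched as a small cost function. This is an illustration under stated assumptions, not Linkerd's code: the `Replica` class, the fixed `decay` factor and the initial round-trip time are hypothetical, and the "peak" rule used here (jump straight to a worse sample, decay gradually toward a better one) is one common reading of the name. Real implementations typically use time-based decay rather than a fixed factor.

```python
class Replica:
    """Hypothetical replica tracked by a Peak-EWMA-style balancer."""

    def __init__(self, name, initial_rtt_ms=30.0, decay=0.5):
        self.name = name
        self.ewma_rtt_ms = initial_rtt_ms  # assumed starting estimate
        self.decay = decay                 # assumed fixed smoothing factor
        self.outstanding = 0               # requests currently in flight

    def observe(self, rtt_ms):
        """Fold one measured round-trip time into the moving average."""
        if rtt_ms > self.ewma_rtt_ms:
            # React immediately to a latency spike (the "peak").
            self.ewma_rtt_ms = rtt_ms
        else:
            # Recover gradually once latency improves.
            self.ewma_rtt_ms = (self.decay * rtt_ms
                                + (1 - self.decay) * self.ewma_rtt_ms)

    def cost(self):
        """Latency estimate weighted by the number of outstanding requests."""
        return self.ewma_rtt_ms * (self.outstanding + 1)


def pick(replicas):
    """Choose the replica with the smallest cost."""
    return min(replicas, key=lambda r: r.cost())
```

With this cost function a slow replica loses traffic quickly, and a fast but busy replica is also avoided until its in-flight requests drain.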
[Diagram: subset load balancing across catalog v1, v2 and v3, and across catalog blue and catalog green.]
Peak EWMA
$$\mathrm{EWMA}_t = \lambda Y_t + (1 - \lambda)\,\mathrm{EWMA}_{t-1}, \qquad t = 1, 2, \ldots, n$$
Where:
- $\mathrm{EWMA}_0$ is the mean of historical data (target)
- $Y_t$ is the observation at time $t$
- $n$ is the number of observations to be monitored, including $\mathrm{EWMA}_0$
- $0 < \lambda \le 1$ is a constant that determines the depth of memory of the EWMA.
T | node1 | node2 | node3 | EWMA node1 | EWMA node2 | EWMA node3
1 | 32 | 43 | 33 | | |
2 | 35 | 64 | 43 | 30.00 | 30.00 | 30.00
3 | 64 | 24 | 53 | 32.50 | 47.00 | 36.50
4 | 53 | 53 | 63 | 48.25 | 35.50 | 44.75
5 | 13 | 31 | 24 | 50.63 | 44.25 | 53.88
6 | 24 | 14 | 35 | 31.81 | 37.63 | 38.94
7 | 53 | 32 | 64 | 27.91 | 25.81 | 36.97
8 | 45 | 43 | 52 | 40.45 | 28.91 | 50.48
9 | 65 | 352 | 22 | 42.73 | 35.95 | 51.24
10 | 75 | 124 | 3402 | 53.86 | 193.98 | 36.62
11 | 14 | 464 | 35 | 64.43 | 158.99 | 1719.31
12 | 24 | 32 | 45 | 39.22 | 311.49 | 877.16
13 | 26 | 35 | 452 | 31.61 | 171.75 | 461.08
14 | 63 | 131 | 53 | 28.80 | 103.37 | 456.54
15 | 134 | 24 | 234 | 45.90 | 117.19 | 254.77
16 | 1353 | 531 | 53 | 89.95 | 70.59 | 244.38
17 | 314 | 132 | 522 | 721.48 | 300.80 | 148.69
18 | | | | 517.74 | 216.40 | 335.35
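The EWMA recurrence can be reproduced in a few lines. The numbers in the table appear consistent with λ = 0.5 and a starting value EWMA_0 = 30, with each EWMA entry incorporating the observation from the row before it. Those two parameters are inferred from the table rather than stated on the slide, so treat them as assumptions; `ewma_series` is a hypothetical helper name.

```python
def ewma_series(observations, lam=0.5, ewma0=30.0):
    """Apply EWMA_t = lam * Y_t + (1 - lam) * EWMA_{t-1} to each observation.

    lam=0.5 and ewma0=30.0 are assumptions inferred from the slide's table.
    """
    values = []
    current = ewma0
    for y in observations:
        current = lam * y + (1 - lam) * current
        values.append(current)
    return values


# node1 observations at T = 2, 3, 4 from the table
print(ewma_series([35, 64, 53]))  # [32.5, 48.25, 50.625]
```

The printed values match the EWMA node1 column at T = 3, 4 and 5 (32.50, 48.25, 50.63 after rounding). The same function applied to node3's 3402 ms outlier at T = 10 shows why that replica's estimate jumps to about 1719 on the next row and then takes several rows to recover.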
Traffic Management
Feature | Istio | Linkerd
Retry Budgets | No | Yes. Used to avoid retry storms and unnecessary retries.
Timeouts | Yes | Yes
Fault Injection | Yes | No
Ingress | Yes. Provided by the Istio ingress-gateway; other gateways are supported as well. | See comments: Linkerd does not ship its own Ingress proxy but can be configured to work with popular options such as Nginx, Gloo, and others.
Traffic Filters | Yes. Custom Envoy filters can be added to the chain. | No
External Routing | Yes | Yes
Header-based matching | Yes | No, only path-based matching
Add/Change/Remove custom headers | Yes | No
Traffic Management
Points to Consider
- Fault Injection
- Custom Envoy Filters
- Header Based Matching
- Add / Remove Headers
Custom Envoy Filter - Gloo Example
Traffic Management
Retry Budgets and Retry Storm
A retry storm is an undesirable client/server failure mode where one or more peers become unhealthy, causing clients to retry a significant fraction of requests. This has the effect of multiplying the volume of traffic sent to the unhealthy peers, exacerbating the problem.
[Diagram: booking calling payments; failures ("!") multiply as booking retries.]
A retry budget is defined by:
- Retry Ratio: the amount of retries allowed relative to the number of requests (for example, 20%)
- TTL: how long requests are considered when computing the ratio
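A retry budget built from these two parameters can be sketched as a sliding window. This is an illustrative assumption about how such a budget might work, not Linkerd's implementation: the `RetryBudget` class and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class RetryBudget:
    """Hypothetical sliding-window retry budget (ratio + TTL)."""

    def __init__(self, retry_ratio=0.2, ttl_seconds=10.0):
        self.retry_ratio = retry_ratio  # e.g. 0.2 = at most 20% extra load
        self.ttl_seconds = ttl_seconds  # how long a request counts
        self._requests = deque()        # timestamps of original requests
        self._retries = deque()         # timestamps of granted retries

    def _expire(self, now):
        # Drop entries that are older than the TTL window.
        for queue in (self._requests, self._retries):
            while queue and now - queue[0] > self.ttl_seconds:
                queue.popleft()

    def record_request(self, now):
        """Count one original (non-retry) request."""
        self._expire(now)
        self._requests.append(now)

    def try_retry(self, now):
        """Return True and spend budget if a retry is allowed right now."""
        self._expire(now)
        allowed = self.retry_ratio * len(self._requests)
        if len(self._retries) + 1 <= allowed:
            self._retries.append(now)
            return True
        return False
```

Because retries are capped as a fraction of recent traffic, an unhealthy payments service sees at most 20% extra load from booking instead of a multiple of it, which is the property that prevents a retry storm.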
Security
Feature | Istio | Linkerd
Supports mTLS | Yes | Yes
TLS on By Default | See comments: Istio instructions include details on how to install with TLS in both permissive and strict mode | Yes
Certificate Rotation | Yes | Yes
External Root Certificate Support | Yes | Yes
Both technologies support Mutual TLS and can rely on external Root certificates.
Installation and Configuration
Feature | Istio | Linkerd | Comments
Prerequisites check | No | Yes | linkerd check --pre
Requires Sidecar | No | Yes |
Supports automatic Sidecar Injection | Yes | Yes |
For Linkerd, the pre-check (linkerd check --pre) verifies whether you have permission to create the Kubernetes resources required during the install process.
Supported Environments and Deployment Models
Feature | Istio | Linkerd
Kubernetes | Yes | Yes
Non Kubernetes | Yes: Virtual Machines, Cloud Foundry, Consul/Nomad | No
Multi-cluster Support - Multiple Control Planes | Yes | Yes
Multi-cluster Support - Single Control Plane | Yes | No
Points to Consider
- Linkerd 2.3 only supports Kubernetes.
- Both support multi-cluster with multiple control planes.
- Istio handles more complex multi-cluster scenarios.
Istio - Multi-Cluster - Multiple Control Planes
Istio - Multi-Cluster - Single Control Plane - VPN
Istio - Multi-Cluster - SCP - Border Gateways
Observability
Feature | Istio | Linkerd
Admin Dashboard | No | Yes
Observability Dashboard | Yes. Includes Kiali | Yes. Includes the Linkerd dashboard and also pre-configured Grafana dashboards
Tracing | Yes | No. Tracing can still be achieved by instrumenting applications. For debugging purposes, the Tap feature allows you to 'listen' to traffic on a resource.
Point to Consider
- Linkerd does not have Distributed Tracing but has a Tap feature.
Policy Management
Template | Provider
API Key |
Analytics | Apigee
Authorization | Apigee, OPA
Check Nothing | Denier
Edge |
Kubernetes | Kubernetes Env
List Entry | Denier, List
Log Entry | Fluentd, SolarWinds, Stackdriver, Stdio
Metric | Apache SkyWalking, Circonus, CloudWatch, Datadog, Prometheus, SignalFx, SolarWinds, Stackdriver, StatsD, Stdio, Wavefront by VMware
Quota | Denier, Memory quota, Redis Quota
Report Nothing |
Trace Span | SignalFx, Stackdriver, Zipkin
Policy Management
Feature | Istio | Linkerd | Comments
OIDC/OAuth2 | Yes | No | Principal authentication is delegated to the applications
Rate Limits | Yes | No |
Adapter Support | Yes | No |
Point to Consider
- Linkerd does not have a policy management system such as Istio's. Policy needs to be implemented at an Ingress or Application level.
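As an illustration of what implementing a rate-limit policy at the application level could look like, here is a minimal token-bucket sketch. Token bucket is a technique chosen for this example, not something the slides prescribe, and the `TokenBucket` class, its parameters and the explicit `now` timestamps are all hypothetical.

```python
class TokenBucket:
    """Hypothetical application-level rate limiter (token bucket)."""

    def __init__(self, rate_per_second, burst, now=0.0):
        self.rate = rate_per_second  # tokens added per second
        self.capacity = burst        # maximum number of stored tokens
        self.tokens = float(burst)   # start with a full bucket
        self.last = now              # time of the last refill

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill according to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

An application or ingress layer would call `allow` per request, keyed by caller or route, and reject requests (for example with HTTP 429) when it returns False.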
Performance
On the server side, the Istio/Envoy sidecar uses ~60% more CPU than Linkerd.
Source: https://medium.com/@michael_87395/benchmarking-istio-linkerd-cpu-c36287e32781
Performance
On the Linkerd2-meshed setup, the p99.9 latency (red) ranged from 8.0 ms to 12.0 ms.
On the Istio-meshed setup, the p99.9 latency (red) ranged from 35.0 ms to 55.0 ms. The p99 latency (orange) fell in the range of 22.6 ms to 27.2 ms.
Source: https://medium.com/@ihcsim/linkerd-2-0-and-istio-performance-benchmark-df290101c2bb
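For readers less familiar with tail-latency notation, p99 and p99.9 are the values below which 99% and 99.9% of requests complete. The sketch below computes them with the nearest-rank method; that method and the `percentile` helper are assumptions made for illustration, and the cited benchmark may calculate its percentiles differently.

```python
import math
from fractions import Fraction


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Fraction avoids floating-point rounding surprises for values like 99.9.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)
    return ordered[max(1, rank) - 1]


latencies_ms = list(range(1, 1001))  # synthetic data: 1 ms .. 1000 ms
print(percentile(latencies_ms, 99))    # 990
print(percentile(latencies_ms, 99.9))  # 999
```

With 1000 samples, p99.9 is determined by the two slowest requests, which is why tail percentiles are sensitive to occasional slow responses from a sidecar proxy.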
How to Know if You Need a Service Mesh
01 Running Distributed Systems
02 Advanced CI/CD Pipelines
03 Service Availability / SLA
04 Multiple Language Platforms
05 Service Governance
Recommended Tools
SuperGloo (supergloo.solo.io)
$ supergloo install istio
$ supergloo install linkerd
Kiali (kiali.io): Service Mesh Observability
Flagger (flagger.app): Flagger is a Kubernetes operator that automates the promotion of canary deployments using Istio or App Mesh routing for traffic shifting and Prometheus metrics for canary analysis.
Thank you.