Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
-
Upload
datawire -
Category
Technology
-
view
280 -
download
7
Transcript of Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google
Google confidential │ Do not distribute Google confidential │ Do not distribute
Bringing learnings from Googley microservices with gRPC
Microservices Summit
Varun Talwar
Contents1. Context: Why are we here?2. Learnings from Stubby experience
a. HTTP/JSON doesnt cut itb. Establish a Lingua Francac. Design for fault tolerance and control: Sync/Async, Deadlines, Cancellations, Flow controld. Flying blind without stats e. Diagnosing with tracingf. Load Balancing is critical
3. gRPCa. Cross platform matters !b. Performance and Standards matter: HTTP/2c. Pluggability matters: Interceptors, Name Resolvers, Auth pluginsd. Usability matters !
CONTEXTWHY ARE WE HERE?
Business Agility
Developer Productivity
Performance
INTRODUCING STUBBY
Google confidential │ Do not distribute
Microservices at Google ~O(1010) RPCs per second.
Images by Connie Zhou
Stubby Magic @ Google
Making Google magic available to all
Kubernetes
Borg
Stubby
LEARNINGS FROM
STUBBY
Key learnings
1. HTTP/JSON doesnt cut it !2. Establish a lingua franca3. Design for fault tolerance and provide control knobs4. Dont fly blind: Service Analytics5. Diagnosing problems: Tracing6. Load Balancing is critical
HTTP/JSON doesn’t cut it !
1. WWW, browser growth - bled into services2. Stateless3. Text on the wire4. Loose contracts5. TCP connection per request6. Nouns based7. Harder API evolution8. Think compute, network on cloud platforms
1
Establish a lingua franca
1. Protocol Buffers - Since 2003.2. Start with IDL3. Have a language agnostic way of agreeing on data semantics4. Code Gen in various languages5. Forward and Backward compatibility6. API Evolution
2
How we roll at Google
Google Cloud Platform
Service Definition (weather.proto)
syntax = "proto3";
service Weather { rpc GetCurrent(WeatherRequest) returns (WeatherResponse);}
message WeatherRequest { Coordinates coordinates = 1;
message Coordinates { fixed64 latitude = 1; fixed64 longitude = 2; }}
message WeatherResponse { Temperature temperature = 1; float humidity = 2;}
message Temperature { float degrees = 1; Units units = 2;
enum Units { FAHRENHEIT = 0; CELSIUS = 1; KELVIN = 2; }}
Design for fault tolerance and control
● Sync and Async APIs
● Need fault tolerance: Deadlines, Cancellations
● Control Knobs: Flow control, Service Config, Metadata
3
18
First-class feature in gRPC.
Deadline is an absolute point in time.
Deadline indicates to the server how long the client is willing to wait for an answer.
RPC will fail with DEADLINE_EXCEEDED status code when deadline reached.
gRPC Deadlines
Google Cloud Platform
Deadline Propagation
Gateway
90 ms
Now = 1476600000000
Deadline =
1476600000200
40 ms
20 ms
20 ms 60 ms
withDeadlineAfter(200, MILLISECONDS)
Now = 1476600000040
Deadline =
1476600000200
Now = 1476600000150
Deadline =
1476600000200
Now = 1476600000230
Deadline =
1476600000200
DEADLINE_EXCEEDED DEADLINE_EXCEEDED DEADLINE_EXCEEDED DEADLINE_EXCEEDED
20
Deadlines are expected.
What about unpredictable cancellations?
• User cancelled request.
• Caller is not interested in the result any more.
• etc
Cancellation?
Google Cloud Platform
Cancellation?
GW
Busy Busy Busy
Busy Busy Busy
Busy Busy Busy
Active RPC Active RPC
Active RPC
Active RPC Active RPCActive RPC
Active RPC Active RPC
Active RPC
Google Cloud Platform
Cancellation Propagation
GW
Idle Idle Idle
Idle Idle Idle
Idle Idle Idle
23
Automatically propagated.
RPC fails with CANCELLED status code.
Cancellation status be accessed by the receiver.
Server (receiver) always knows if RPC is valid!
Cancellation
Google Cloud Platform
BiDi Streaming - Slow Client
Fast Server
Request
Responses
Slow Client
CANCELLEDUNAVAILABLE
RESOURCE_EXHAUSTED
Google Cloud Platform
BiDi Streaming - Slow Server
Slow ServerRequest
Response
Fast Client
CANCELLEDUNAVAILABLE
RESOURCE_EXHAUSTED
Requests
26
Flow-control helps to balance computing power and network capacity between client and server.
gRPC supports both client- and server-side flow control.
Flow-Control
Photo taken by Andrey Borisenko.
27
Policies where server tells client what they should do
Can specify deadlines, lb policy, payload size per method of a service
Loved by SREs, they have more control
Discovery via DNS
Service Config
Metadata Exchange - Common cross-cutting concerns
like authentication or tracing rely on the exchange of
data that is not part of the declared interface of a
service. Deployments rely on their ability to evolve these
features at a different rate to the individual APIs
exposed by services.
Metadata helps in exchange of useful information
Don’t fly blind: Stats4
● What is the mean latency time per RPC?● How many RPCs per hour for a service?● Errors in last minute/hour?● How many bytes sent? How many connections to my server?
Data collection by arbitrary metadata is useful● Any service’s resource usage and performance stats in real time by (almost)
any arbitrary metadata1. Service X can monitor CPU usage in their jobs broken down by the name of the invoked RPC
and the mdb user who sent it.2. Social can monitor the RPC latency of shared bigtable jobs when responding to their requests,
broken down by whether the request originated from a user on web/Android/iOS.3. Gmail can collect usage on servers, broken down by according POP/IMAP/web/Android/iOS.
Layer propagates Gmail's metadata down to every service, even if the request was made by an intermediary job that Gmail doesn't own
● Stats layer export data to varz and streamz, and provides stats to many monitoring systems and dashboards
Diagnosing problems: Tracing5
● 1/10K requests takes very long. Its an ad query :-) I need to find out.● Take a sample and store in database; help identify request in sample which
took similar amount of time
● I didnt get a response from the service. What happened? Which link in the service dependency graph got stuck? Stitch a trace and figure out.
● Where is it taking time for a trace? Hotspot analysis● What all are the dependencies for a service?
Load Balancing is important !5
Iteration 1: Stubby BalancerIteration 2: Client side load balancingIteration 3: HybridIteration 4: gRPC-lb
● Current client support intentionally dumb (simplicity).○ Pick first available - Avoid connection establishment latency○ Round-robin-over-list - Lists not sets → ability to represent weights
● For anything more advanced, move the burden to an external "LB Controller", a regular gRPC server and rely on a client-side implementation of the so-called gRPC LB policy.
client LB Controller
backends
1) Control RPC
2) address-list
3) RR over addresses of address-list
gRPC LB
Next gen of load balancing
In summary, what did we learn● Contracts should be strict● Common language helps● Common understanding for deadlines, cancellations, flow control● Common stats/tracing framework is essential for monitoring, debugging● Common framework lets uniform policy application for control and lb
Single point of integration for logging, monitoring, tracing, service discovery and load balancing makes lives much easier !
INTRODUCING gRPC
Open source on Github for C, C++, Java, Node.js, Python, Ruby, Go, C#, PHP, Objective-C
gRPC core
gRPC Java
gRPC Go
● 1.0 with stable APIs● Well documented with an active community● Reliable with continuous running tests on GCE
○ Deployable in your environment
● Measured with an open performance dashboard○ Deployable in your environment
● Well adopted inside and outside Google
Where is the project today?
1. Cross language & Cross platform matters !2. Performance and Standards matter: HTTP/23. Pluggability matters: Interceptors, Name Resolvers,
Auth plugins4. Usability matters !
More lessons
1. Cross language & Cross platform matters !2. Performance and Standards matter: HTTP/23. Pluggability matters: Interceptors, Name Resolvers,
Auth plugins4. Usability matters !
More lessons
Google Cloud Platform
Coverage & Simplicity
The stack should be available on every popular development platform and easy for someone to build for their platform of choice. It should be viable on CPU & memory limited devices.
gRPC Principles & Requirements
http://www.grpc.io/blog/principles
Google Cloud Platform
gRPC Speaks Your Language
● Java● Go● C/C++● C#● Node.js● PHP● Ruby● Python● Objective-C
● MacOS● Linux● Windows● Android● iOS
Service definitions and client libraries Platforms supported
Google Cloud Platform
Interoperability
Java Service
Python Service
GoLang Service
C++ Service
gRPC Service
gRPC Stub
gRPC Stub
gRPC Stub
gRPC Stub
gRPC Service
gRPC Service
gRPC Service
gRPC Stub
1. Cross language & Cross platform matters !2. Performance and Standards matter: HTTP/23. Pluggability matters: Interceptors, Name Resolvers,
Auth plugins4. Usability matters !
More lessons
Google Cloud Platform
• Single TCP connection.
• No Head-of-line blocking.
• Binary framing layer.
• Request –> Stream.
• Header Compression.
HTTP/2 in One Slide
Transport(TCP)
Application (HTTP/2)
Network (IP)
Session (TLS) [optional]
Binary Framing
HEADERS Frame
DATA Frame
HTTP/2
POST: /upload HTTP/1.1 Host: www.javaday.org.ua Content-Type: application/json Content-Length: 27
HTTP/1.x
{“msg”: “Welcome to 2016!”}
Google Cloud Platform
HTTP/2 breaks down the HTTP protocol communication into an exchange of binary-encoded frames, which are then mapped to messages that belong to a stream, and all of which are multiplexed within a single TCP connection.
Binary Framing Stream 1 HEADERS
Stream 2
:method: GET :path: /kyiv :version: HTTP/2 :scheme: https
HEADERS :status: 200 :version: HTTP/2 :server: nginx/1.10.1 ...
DATA
<payload>
Stream N
Request
Response
TCP
Google Cloud Platform
HTTP/1.x vs HTTP/2http://http2.golang.org/gophertileshttp://www.http2demo.io/
Google Cloud Platform
gRPC Service Definitions
Unary RPCs where the client sends a single request to the server and gets a single response back, just like a normal function call.
The client sends a request to the server and gets a stream to read a sequence of messages back. The client reads from the returned stream until there are no more messages.
The client send a sequence of messages to the server using a provided stream. Once the client has finished writing the messages, it waits for the server to read them and return its response.
Client streaming
Both sides send a sequence of messages using a read-write stream. The two streams operate independently. The order of messages in each stream is preserved.
BiDi streamingUnary Server streaming
48
Messaging applications.
Games / multiplayer tournaments.
Moving objects.
Sport results.
Stock market quotes.
Smart home devices.
You name it!
BiDi Streaming Use-Cases
● Open Performance Benchmark and Dashboard ● Benchmarks run in GCE VMs per Pull Request for regression testing.● gRPC Users can run these in their environments.● Good Performance across languages:
○ Java Throughput: 500 K RPCs/Sec and 1.3 M Streaming messages/Sec on 32 core VMs○ Java Latency: ~320 us for unary ping-pong (netperf 120us)○ C++ Throughput: ~1.3 M RPCs/Sec and 3 M Streaming Messages/Sec on 32 core VMs.
Performance
1. Cross language & Cross platform matters !2. Performance and Standards matter: HTTP/23. Pluggability matters: Interceptors, Auth 4. Usability matters !
More lessons
Google Cloud Platform
Pluggable
Large distributed systems need security, health-checking, load-balancing and failover, monitoring, tracing, logging, and so on. Implementations should provide extensions points to allow for plugging in these features and, where useful, default implementations.
gRPC Principles & Requirements
http://www.grpc.io/blog/principles
Google Cloud Platform
Interceptors
Client Server
Request
Response
Client interceptors
Server interceptors
● Auth & Security - TLS [Mutual], Plugin auth mechanism (e.g. OAuth)● Proxies
○ Basic: nghttp2, haproxy, traefik○ Advanced: Envoy, linkerd, Google LB, Nginx (in progress)
● Service Discovery○ etcd, Zookeeper, Eureka, …
● Monitor & Trace○ Zipkin, Prometheus, Statsd, Google, DIY
Pluggability
1. Cross language & Cross platform matters !2. Performance and Standards matter: HTTP/23. Pluggability matters: Interceptors, Auth 4. Usability matters !
More lessons
Get Started
1. Server reflection2. Health Checking3. Automatic retries4. Streaming compression5. Mechanism to do caching6. Binary Logging
a. Debugging, auditing though costly7. Unit Testing support
a. Automated mock testingb. Dont need to bring up all dependent services just to test
8. Web support
Coming soon !
Microservices: in data centres
Streaming telemetry from network devices
Client Server communication/Internal APIs
Some early adopters
Mobile Apps
Thank you!Thank you!
Twitter: @grpcioSite: grpc.ioGroup: [email protected]: github.com/grpc
github.com/grpc/grpc-java github.com/grpc/grpc-go
Q & A
Why gRPC?
Multi-language
9 languages
Open
Open source and growing community
Strict Service contracts
Define and enforce contracts, backward compatible
Performant
1m+ QPS - unary, 3m+ streaming(dashboard)
Pluggable design
Auth, Transport, IDL, LB
Efficiency on wire
2-3X gains
Streaming APIs
Large payloads, speech, logs
Standard compliant
HTTP/2
Easy to use
Single line installation
Google Cloud Platform
The Fallacies of Distributed Computing
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
https://blogs.oracle.com/jag/resource/Fallacies.html
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous
How is gRPC Used?
Direct RPCs :Microservices
On Prem GCP Other
Cloud
How is gRPC Used?
Direct RPCs :Microservices
RPCs to access APIs
Google APIs
Your APIs
On Prem
GCP OtherCloud
How is gRPC Used?
Direct RPCs :Microservices
RPCs to access APIs
Google APIs
Your APIs
Mobile/Web RPCs
Your Mobile/WebApps
On Prem GCP Other
Cloud
Google confidential │ Do not distribute
What are the benefits?
Ease of use
Performance
Versioning
Programming model
Developers
Uniform Monitoring
Debugging/Tracing
Cross platform/language
Operators
Defined Contracts
Single uniform framework for control
Visibility
Architects/Managers
Google Cloud Platform
gRPC Principles & Requirements
Layered
Key facets of the stack must be able to evolve independently. A revision to the wire-format should not disrupt application layer bindings.
http://www.grpc.io/blog/principles
Layered Architecture
Stub
Code Gen’dService API
Code GenSupport Code
Channel API
Transport API
Standard applications
Initialization, interceptors,and advanced applications
Google Cloud Platform
Layered Architecture
HTTP/2
RPC Client-Side App
Channel
Stub Future Stub
Blocking Stub
ClientCall
RPC Server-side Apps
Tran #1 Tran #2 Tran #N
Service Definition(extends generated definition)
ServerCall handler
Transport
ServerCall
NameResolver LoadBalancer
Pluggable Load
Balancing and
Service Discovery
Google Cloud Platform
Takeaways
HTTP/2 is a high performance production-ready multiplexed bidirectional protocol.
gRPC (http://grpc.io): • HTTP/2 transport based, open source, general purpose
standards-based, feature-rich RPC framework.• Bidirectional streaming over one single TCP connection.• Netty transport provides asynchronous and non-blocking I/O.• Deadline and cancellations propagation.• Client- and server-side flow-control.• Layered, pluggable and extensible.• Supports 10 programming languages.• Build-in testing support.• Production-ready (current version is 1.0.1) and growing ecosystem.
Growing Ecosystem
74
Migration.
Testing.
Swagger / OpenAPI tooling.
gRPC Gateway
Photo taken by Andrey Borisenko.
https://github.com/grpc-ecosystem/grpc-gateway
● Protocol Structure○ Request → <Call Spec> <Header Metadata> <Messages>*○ Response → <Header Metadata> <Messages>* <Trailing Metadata> <Status>
● Generic mechanism for attaching metadata to requests and responses● Commonly used to attach “bearer tokens” to requests for Auth
○ OAuth2 access tokens○ JWT e.g. OpenId Connect Id Tokens
● Session state for specific Auth mechanisms is encapsulated in an Auth-credentials object
Metadata and Auth