Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Transcript of Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Page 1: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Bringing learnings from Googley microservices with gRPC

Microservices Summit

Varun Talwar

Page 2: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Contents

1. Context: Why are we here?
2. Learnings from Stubby experience
   a. HTTP/JSON doesn't cut it
   b. Establish a lingua franca
   c. Design for fault tolerance and control: Sync/Async, Deadlines, Cancellations, Flow control
   d. Flying blind without stats
   e. Diagnosing with tracing
   f. Load balancing is critical
3. gRPC
   a. Cross platform matters!
   b. Performance and standards matter: HTTP/2
   c. Pluggability matters: Interceptors, Name Resolvers, Auth plugins
   d. Usability matters!

Page 3: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

CONTEXT: WHY ARE WE HERE?

Page 4: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Business Agility

Page 5: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Developer Productivity

Page 6: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Performance

Page 7: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

INTRODUCING STUBBY

Page 8: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Microservices at Google: ~O(10^10) RPCs per second.

Images by Connie Zhou

Page 9: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Stubby Magic @ Google

Page 10: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Making Google magic available to all

Kubernetes

Borg

Stubby

Page 11: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

LEARNINGS FROM

STUBBY

Page 12: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Key learnings

1. HTTP/JSON doesn't cut it!
2. Establish a lingua franca
3. Design for fault tolerance and provide control knobs
4. Don't fly blind: Service Analytics
5. Diagnosing problems: Tracing
6. Load Balancing is critical

Page 13: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

HTTP/JSON doesn’t cut it !

1. WWW, browser growth - bled into services
2. Stateless
3. Text on the wire
4. Loose contracts
5. TCP connection per request
6. Nouns based
7. Harder API evolution
8. Think compute, network on cloud platforms

Page 14: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Establish a lingua franca

1. Protocol Buffers - Since 2003.
2. Start with IDL
3. Have a language agnostic way of agreeing on data semantics
4. Code Gen in various languages
5. Forward and Backward compatibility
6. API Evolution

Page 15: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

How we roll at Google

Page 16: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Service Definition (weather.proto)

syntax = "proto3";

service Weather {
  rpc GetCurrent(WeatherRequest) returns (WeatherResponse);
}

message WeatherRequest {
  Coordinates coordinates = 1;

  message Coordinates {
    fixed64 latitude = 1;
    fixed64 longitude = 2;
  }
}

message WeatherResponse {
  Temperature temperature = 1;
  float humidity = 2;
}

message Temperature {
  float degrees = 1;
  Units units = 2;

  enum Units {
    FAHRENHEIT = 0;
    CELSIUS = 1;
    KELVIN = 2;
  }
}

Page 17: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Design for fault tolerance and control

● Sync and Async APIs

● Need fault tolerance: Deadlines, Cancellations

● Control Knobs: Flow control, Service Config, Metadata

3

Page 18: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

gRPC Deadlines

First-class feature in gRPC.

Deadline is an absolute point in time.

Deadline indicates to the server how long the client is willing to wait for an answer.

RPC will fail with DEADLINE_EXCEEDED status code when deadline reached.

Page 19: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Deadline Propagation

[Diagram: the Gateway starts a call with withDeadlineAfter(200, MILLISECONDS) at Now = 1476600000000, which gives Deadline = 1476600000200. The same absolute deadline travels with the call through each downstream hop, observed at Now = 1476600000040, Now = 1476600000150, and Now = 1476600000230. At the last hop Now is past the deadline, and DEADLINE_EXCEEDED is returned at every hop back to the Gateway. Latency labels in the figure: 90 ms, 40 ms, 20 ms, 20 ms, 60 ms.]
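
The propagation on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not gRPC's API: `call_chain` and `DeadlineExceeded` are hypothetical names, time is simulated with plain integers, and the per-hop delays (40, 110, 80 ms) are assumed from the differences between the Now values on the slide.

```python
class DeadlineExceeded(Exception):
    """Stands in for the DEADLINE_EXCEEDED status code."""


def call_chain(now_ms, deadline_ms, hop_delays_ms):
    """Walk a chain of hops, carrying one absolute deadline through all of them.

    Returns the (now, deadline) pair seen at each hop, and raises
    DeadlineExceeded as soon as a hop finds the deadline already passed.
    """
    seen = [(now_ms, deadline_ms)]
    for delay in hop_delays_ms:
        now_ms += delay
        seen.append((now_ms, deadline_ms))
        if now_ms >= deadline_ms:
            raise DeadlineExceeded(f"now={now_ms} deadline={deadline_ms}")
    return seen


start = 1476600000000
deadline = start + 200  # the effect of withDeadlineAfter(200, MILLISECONDS)
try:
    call_chain(start, deadline, [40, 110, 80])
    status = "OK"
except DeadlineExceeded:
    status = "DEADLINE_EXCEEDED"
```

Because the deadline is an absolute point in time, no hop has to recompute a remaining budget from a relative timeout; each one only compares its own clock against the same value.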

Page 20: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Cancellation?

Deadlines are expected.

What about unpredictable cancellations?

• User cancelled request.

• Caller is not interested in the result any more.

• etc.

Page 21: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Cancellation?

[Diagram: a gateway (GW) fanning out to nine backends, each marked Busy and each holding an Active RPC.]

Page 22: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Cancellation Propagation

[Diagram: the same gateway (GW) and nine backends, now all marked Idle.]

Page 23: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Cancellation

Automatically propagated.

RPC fails with CANCELLED status code.

Cancellation status can be accessed by the receiver.

Server (receiver) always knows if RPC is valid!
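
A minimal sketch of automatic propagation, assuming a tree of per-call contexts. `CallContext` is a hypothetical class written for this example and is not a gRPC type; the fan-out of three mid-tier calls with two leaf calls each is also an assumption.

```python
import threading


class CallContext:
    """Hypothetical per-RPC context; cancelling a parent cancels its children."""

    def __init__(self, parent=None):
        self._cancelled = threading.Event()
        self._children = []
        if parent is not None:
            parent._children.append(self)
            if parent.is_cancelled():
                # A call started under an already cancelled parent is cancelled too.
                self._cancelled.set()

    def cancel(self):
        self._cancelled.set()
        for child in self._children:
            child.cancel()

    def is_cancelled(self):
        return self._cancelled.is_set()


gateway = CallContext()
mid_tier = [CallContext(gateway) for _ in range(3)]
leaves = [CallContext(m) for m in mid_tier for _ in range(2)]

gateway.cancel()  # for example, the user abandoned the request
all_idle = all(c.is_cancelled() for c in mid_tier + leaves)
```

A handler that polls `is_cancelled()` between units of work can stop early, which is what moves the backends from Busy to Idle in the two diagrams.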

Page 24: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

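BiDi Streaming - Slow Client

[Diagram: a slow client sends a Request to a fast server, which streams Responses back faster than the client can consume them. Status codes shown: CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED.]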

Page 25: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

BiDi Streaming - Slow Server

[Diagram: a fast client streams Requests to a slow server, which sends a Response. Status codes shown: CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED.]

Page 26: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Flow-Control

Flow-control helps to balance computing power and network capacity between client and server.

gRPC supports both client- and server-side flow control.

Photo taken by Andrey Borisenko.
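
The effect of flow control can be approximated with a bounded buffer between a fast sender and a slower receiver. This is an analogy only: gRPC relies on HTTP/2 flow-control windows counted in bytes, whereas the sketch below counts messages, and `WINDOW` and `stream` are hypothetical names.

```python
import queue
import threading

WINDOW = 4  # assumed limit on messages in flight


def stream(n_messages, window=WINDOW):
    """Send n_messages through a bounded buffer; return (received, peak_buffered)."""
    buf = queue.Queue(maxsize=window)
    received, peak = [], 0

    def fast_sender():
        for i in range(n_messages):
            buf.put(i)   # blocks while the window is full
        buf.put(None)    # end-of-stream marker

    sender = threading.Thread(target=fast_sender)
    sender.start()
    while True:
        peak = max(peak, buf.qsize())
        item = buf.get()  # the receiver drains at its own pace
        if item is None:
            break
        received.append(item)
    sender.join()
    return received, peak
```

The sender is never told to slow down explicitly. It simply cannot get more than `window` messages ahead of the receiver, so neither side is overwhelmed.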

Page 27: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Service Config

Policies where server tells client what they should do

Can specify deadlines, LB policy, payload size per method of a service

Loved by SREs, they have more control

Discovery via DNS
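
As an illustration, a service config for the Weather service from the earlier slide might look like the structure below. The field names follow the gRPC service config JSON format as I understand it and should be checked against the current documentation; the values and the `method_settings` helper are made up for this example.

```python
import json

# Assumed example values; only the overall shape is the point here.
SERVICE_CONFIG = {
    "loadBalancingConfig": [{"round_robin": {}}],
    "methodConfig": [
        {
            "name": [{"service": "Weather", "method": "GetCurrent"}],
            "timeout": "0.2s",
            "maxRequestMessageBytes": 4096,
            "maxResponseMessageBytes": 65536,
        }
    ],
}


def method_settings(config, service, method):
    """Return the settings that apply to service/method, or {} if none match."""
    for entry in config.get("methodConfig", []):
        for name in entry.get("name", []):
            if name.get("service") == service and name.get("method") in (None, method):
                return {k: v for k, v in entry.items() if k != "name"}
    return {}


settings = method_settings(SERVICE_CONFIG, "Weather", "GetCurrent")
as_json = json.dumps(SERVICE_CONFIG)  # the form a service owner would publish
```

Keeping these knobs in published configuration rather than in client code is what lets service owners change them without a client release.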

Page 28: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Metadata helps in exchange of useful information

Metadata Exchange - Common cross-cutting concerns like authentication or tracing rely on the exchange of data that is not part of the declared interface of a service. Deployments rely on their ability to evolve these features at a different rate to the individual APIs exposed by services.

Page 29: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Don't fly blind: Stats

● What is the mean latency time per RPC?
● How many RPCs per hour for a service?
● Errors in last minute/hour?
● How many bytes sent? How many connections to my server?
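
The questions above map onto a small per-method recorder. The sketch below is a hypothetical in-memory collector (`RpcStats` is not a gRPC class); a real stats layer would also bucket by time window and export to a monitoring system.

```python
from collections import defaultdict
from statistics import mean


class RpcStats:
    """Hypothetical in-memory recorder of per-method RPC statistics."""

    def __init__(self):
        # method name -> list of (latency_ms, ok, bytes_sent)
        self._records = defaultdict(list)

    def record(self, method, latency_ms, ok, bytes_sent):
        self._records[method].append((latency_ms, ok, bytes_sent))

    def summary(self, method):
        rows = self._records[method]
        return {
            "count": len(rows),
            "mean_latency_ms": mean(r[0] for r in rows) if rows else 0.0,
            "errors": sum(1 for r in rows if not r[1]),
            "bytes_sent": sum(r[2] for r in rows),
        }
```

Calling `record` from one shared place, such as an interceptor, is what gives every service the same numbers without per-service instrumentation.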

Page 30: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Data collection by arbitrary metadata is useful

● Any service's resource usage and performance stats in real time by (almost) any arbitrary metadata
  1. Service X can monitor CPU usage in their jobs broken down by the name of the invoked RPC and the mdb user who sent it.
  2. Social can monitor the RPC latency of shared bigtable jobs when responding to their requests, broken down by whether the request originated from a user on web/Android/iOS.
  3. Gmail can collect usage on servers, broken down by POP/IMAP/web/Android/iOS. The layer propagates Gmail's metadata down to every service, even if the request was made by an intermediary job that Gmail doesn't own.
● The stats layer exports data to varz and streamz, and provides stats to many monitoring systems and dashboards

Page 31: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Diagnosing problems: Tracing

● 1/10K requests takes very long. It's an ad query :-) I need to find out.
● Take a sample and store in database; help identify requests in the sample which took a similar amount of time
● I didn't get a response from the service. What happened? Which link in the service dependency graph got stuck? Stitch a trace and figure out.
● Where is it taking time for a trace? Hotspot analysis
● What all are the dependencies for a service?
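
Hotspot analysis on a stitched trace can be sketched as follows. The span tuple layout, the service names, and the timings are all hypothetical. The sketch also assumes that each service appears once per trace and that child calls do not overlap in time.

```python
from collections import defaultdict

# Hypothetical span records: (span_id, parent_id, service, start_ms, end_ms)
SPANS = [
    ("a", None, "gateway", 0, 230),
    ("b", "a", "weather", 40, 220),
    ("c", "b", "storage", 60, 200),
]


def self_times(spans):
    """Time spent in each service, excluding time spent in its direct children."""
    child_durations = defaultdict(list)
    for span_id, parent_id, _, start, end in spans:
        if parent_id is not None:
            child_durations[parent_id].append(end - start)
    return {
        service: (end - start) - sum(child_durations[span_id])
        for span_id, _, service, start, end in spans
    }


def hotspot(spans):
    """Return the service with the largest self time in the trace."""
    times = self_times(spans)
    return max(times, key=times.get)
```

Following the parent ids from the root span downwards also yields the dependency list for a service, which answers the last question on the slide.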

Page 32: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Load Balancing is important!

Iteration 1: Stubby Balancer
Iteration 2: Client side load balancing
Iteration 3: Hybrid
Iteration 4: gRPC-lb

Page 33: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

gRPC LB

Next gen of load balancing

● Current client support intentionally dumb (simplicity).
  ○ Pick first available - Avoid connection establishment latency
  ○ Round-robin-over-list - Lists not sets → ability to represent weights
● For anything more advanced, move the burden to an external "LB Controller", a regular gRPC server, and rely on a client-side implementation of the so-called gRPC LB policy.

[Diagram: client, LB Controller, and backends. 1) Control RPC from the client to the LB Controller. 2) The LB Controller returns an address-list. 3) The client does RR over the addresses of the address-list.]
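
The two built-in client policies are simple enough to sketch directly. The function names and addresses below are hypothetical; the point is the "lists not sets" remark, where repeating an address in the list gives it a proportionally larger share of the traffic.

```python
import itertools


def pick_first(addresses, is_reachable):
    """Return the first reachable address in list order, or None."""
    for addr in addresses:
        if is_reachable(addr):
            return addr
    return None


def round_robin(addresses):
    """Cycle over a list; a repeated address gets proportionally more picks."""
    return itertools.cycle(addresses)


# 10.0.0.1 is listed twice, so it carries twice the weight of 10.0.0.2.
address_list = ["10.0.0.1:50051", "10.0.0.1:50051", "10.0.0.2:50051"]
picker = round_robin(address_list)
picks = [next(picker) for _ in range(6)]
```

With this design the client never needs to understand weights. An external controller expresses them simply by how it builds the address list.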

Page 34: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

In summary, what did we learn?

● Contracts should be strict
● Common language helps
● Common understanding for deadlines, cancellations, flow control
● Common stats/tracing framework is essential for monitoring, debugging
● A common framework allows uniform policy application for control and LB

A single point of integration for logging, monitoring, tracing, service discovery and load balancing makes lives much easier!

Page 35: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

INTRODUCING gRPC

Page 36: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Open source on Github for C, C++, Java, Node.js, Python, Ruby, Go, C#, PHP, Objective-C

gRPC core

gRPC Java

gRPC Go

Page 37: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Where is the project today?

● 1.0 with stable APIs
● Well documented with an active community
● Reliable with continuous running tests on GCE
  ○ Deployable in your environment
● Measured with an open performance dashboard
  ○ Deployable in your environment
● Well adopted inside and outside Google

Page 38: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

More lessons

1. Cross language & Cross platform matters!
2. Performance and Standards matter: HTTP/2
3. Pluggability matters: Interceptors, Name Resolvers, Auth plugins
4. Usability matters!

Page 40: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Coverage & Simplicity

The stack should be available on every popular development platform and easy for someone to build for their platform of choice. It should be viable on CPU & memory limited devices.

gRPC Principles & Requirements

http://www.grpc.io/blog/principles

Page 41: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

gRPC Speaks Your Language

Service definitions and client libraries

● Java
● Go
● C/C++
● C#
● Node.js
● PHP
● Ruby
● Python
● Objective-C

Platforms supported

● MacOS
● Linux
● Windows
● Android
● iOS

Page 42: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Interoperability

[Diagram: a Java Service, a Python Service, a GoLang Service, and a C++ Service. Each exposes a gRPC Service and calls the others through gRPC Stubs.]

Page 43: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

More lessons

1. Cross language & Cross platform matters!
2. Performance and Standards matter: HTTP/2
3. Pluggability matters: Interceptors, Name Resolvers, Auth plugins
4. Usability matters!

Page 44: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

HTTP/2 in One Slide

• Single TCP connection.

• No Head-of-line blocking.

• Binary framing layer.

• Request → Stream.

• Header Compression.

[Diagram: protocol stack, top to bottom: Application (HTTP/2) with Binary Framing, Session (TLS) [optional], Transport (TCP), Network (IP).]

[Diagram: one request shown both ways. In HTTP/2 it is a HEADERS Frame followed by a DATA Frame. In HTTP/1.x it is text on the wire:]

POST /upload HTTP/1.1
Host: www.javaday.org.ua
Content-Type: application/json
Content-Length: 27

{"msg": "Welcome to 2016!"}

Page 45: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

HTTP/2 breaks down the HTTP protocol communication into an exchange of binary-encoded frames, which are then mapped to messages that belong to a stream, and all of which are multiplexed within a single TCP connection.

[Diagram: Binary Framing. Streams 1 through N share one TCP connection. A Request is a HEADERS frame (:method: GET, :path: /kyiv, :version: HTTP/2, :scheme: https). A Response is a HEADERS frame (:status: 200, :version: HTTP/2, :server: nginx/1.10.1, ...) followed by a DATA frame (<payload>).]
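
To make "binary-encoded frames" concrete, here is a sketch of the fixed 9-byte frame header that precedes every HTTP/2 frame: a 24-bit payload length, an 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream identifier. The layout and constants are taken from the HTTP/2 specification as I understand it. The helper names are hypothetical, and header compression (HPACK) is not covered.

```python
import struct

FRAME_DATA, FRAME_HEADERS = 0x0, 0x1
FLAG_END_STREAM, FLAG_END_HEADERS = 0x1, 0x4


def encode_frame(frame_type, flags, stream_id, payload):
    """Prefix payload with a 9-byte HTTP/2 frame header."""
    length = len(payload)
    if length >= 1 << 24 or not 0 <= stream_id < 1 << 31:
        raise ValueError("length or stream id out of range")
    header = struct.pack(">I", length)[1:]  # keep the low 24 bits
    header += struct.pack(">BBI", frame_type, flags, stream_id)
    return header + payload


def decode_header(data):
    """Return (length, type, flags, stream_id) from the first 9 bytes."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    return length, frame_type, flags, stream_id & 0x7FFFFFFF


frame = encode_frame(FRAME_DATA, FLAG_END_STREAM, 1, b"<payload>")
```

Since every frame names its stream, frames from different streams can be interleaved on one connection and reassembled by the receiver.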

Page 46: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

HTTP/1.x vs HTTP/2

http://http2.golang.org/gophertiles
http://www.http2demo.io/

Page 47: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

gRPC Service Definitions

Unary

Unary RPCs where the client sends a single request to the server and gets a single response back, just like a normal function call.

Server streaming

The client sends a request to the server and gets a stream to read a sequence of messages back. The client reads from the returned stream until there are no more messages.

Client streaming

The client sends a sequence of messages to the server using a provided stream. Once the client has finished writing the messages, it waits for the server to read them and return its response.

BiDi streaming

Both sides send a sequence of messages using a read-write stream. The two streams operate independently. The order of messages in each stream is preserved.
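
The four call shapes can be mimicked in-process with plain functions and generators. These are stand-ins to show the shapes only, with hypothetical names and no network or gRPC library involved. The bidirectional example also answers in lock-step, whereas real streams operate independently.

```python
def unary(request):
    """One request in, one response out."""
    return f"echo:{request}"


def server_streaming(request, count=3):
    """One request in, a stream of responses out."""
    for i in range(count):
        yield f"{request}#{i}"


def client_streaming(requests):
    """A stream of requests in, one response out once the stream ends."""
    return sum(requests)


def bidi_streaming(requests):
    """A stream in and a stream out; order is preserved within each stream."""
    for request in requests:
        yield request.upper()
```

For example, `list(server_streaming("kyiv"))` reads the returned stream until it is exhausted, which corresponds to the client reading until there are no more messages.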

Page 48: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

BiDi Streaming Use-Cases

Messaging applications.

Games / multiplayer tournaments.

Moving objects.

Sport results.

Stock market quotes.

Smart home devices.

You name it!

Page 49: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Performance

● Open Performance Benchmark and Dashboard
● Benchmarks run in GCE VMs per Pull Request for regression testing.
● gRPC Users can run these in their environments.
● Good Performance across languages:
  ○ Java Throughput: 500 K RPCs/Sec and 1.3 M Streaming messages/Sec on 32 core VMs
  ○ Java Latency: ~320 µs for unary ping-pong (netperf 120 µs)
  ○ C++ Throughput: ~1.3 M RPCs/Sec and 3 M Streaming Messages/Sec on 32 core VMs.

Page 50: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

More lessons

1. Cross language & Cross platform matters!
2. Performance and Standards matter: HTTP/2
3. Pluggability matters: Interceptors, Auth
4. Usability matters!

Page 51: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Pluggable

Large distributed systems need security, health-checking, load-balancing and failover, monitoring, tracing, logging, and so on. Implementations should provide extension points to allow for plugging in these features and, where useful, default implementations.

gRPC Principles & Requirements

http://www.grpc.io/blog/principles

Page 52: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

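Interceptors

[Diagram: a Request travels from Client to Server and a Response travels back. Client interceptors sit on the client side of the call path and server interceptors on the server side.]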
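
An interceptor chain is essentially function wrapping. The sketch below shows server-side interceptors around a handler. All names, the metadata key, and the token value are hypothetical, and real gRPC interceptor interfaces differ from language to language.

```python
def with_interceptors(handler, interceptors):
    """Wrap handler so that the first interceptor in the list runs outermost."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler


log = []


def logging_interceptor(next_handler):
    def call(request, metadata):
        log.append(f"request {request}")
        response = next_handler(request, metadata)
        log.append(f"response {response}")
        return response
    return call


def auth_interceptor(next_handler):
    def call(request, metadata):
        # Assumed check: a fixed bearer token stands in for real authentication.
        if metadata.get("authorization") != "Bearer secret":
            return "UNAUTHENTICATED"
        return next_handler(request, metadata)
    return call


def get_current(request, metadata):
    """The application handler; it knows nothing about logging or auth."""
    return f"weather for {request}"


server = with_interceptors(get_current, [logging_interceptor, auth_interceptor])
```

The handler stays free of cross-cutting code, and the order of the list decides which concern sees the request first.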

Page 53: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Pluggability

● Auth & Security - TLS [Mutual], Plugin auth mechanism (e.g. OAuth)
● Proxies
  ○ Basic: nghttp2, haproxy, traefik
  ○ Advanced: Envoy, linkerd, Google LB, Nginx (in progress)
● Service Discovery
  ○ etcd, Zookeeper, Eureka, …
● Monitor & Trace
  ○ Zipkin, Prometheus, Statsd, Google, DIY

Page 54: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

More lessons

1. Cross language & Cross platform matters!
2. Performance and Standards matter: HTTP/2
3. Pluggability matters: Interceptors, Auth
4. Usability matters!

Page 55: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Get Started

Page 56: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Coming soon!

1. Server reflection
2. Health Checking
3. Automatic retries
4. Streaming compression
5. Mechanism to do caching
6. Binary Logging
   a. Debugging, auditing though costly
7. Unit Testing support
   a. Automated mock testing
   b. Don't need to bring up all dependent services just to test
8. Web support

Page 57: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Some early adopters

Microservices: in data centres

Streaming telemetry from network devices

Client Server communication/Internal APIs

Mobile Apps

Page 58: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Thank you!

Twitter: @grpcio
Site: grpc.io
Group: [email protected]
GitHub: github.com/grpc

github.com/grpc/grpc-java
github.com/grpc/grpc-go

Page 59: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Q & A

Page 60: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Why gRPC?

Multi-language: 9 languages

Open: Open source and growing community

Strict Service contracts: Define and enforce contracts, backward compatible

Performant: 1m+ QPS - unary, 3m+ streaming (dashboard)

Pluggable design: Auth, Transport, IDL, LB

Efficiency on wire: 2-3X gains

Streaming APIs: Large payloads, speech, logs

Standard compliant: HTTP/2

Easy to use: Single line installation

Page 61: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

The Fallacies of Distributed Computing

The network is reliable

Latency is zero

Bandwidth is infinite

The network is secure

Topology doesn't change

There is one administrator

Transport cost is zero

The network is homogeneous

https://blogs.oracle.com/jag/resource/Fallacies.html

Page 65: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

How is gRPC Used?

Direct RPCs: Microservices

On Prem, GCP, Other Cloud

Page 66: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

How is gRPC Used?

Direct RPCs: Microservices

RPCs to access APIs: Google APIs, Your APIs

On Prem, GCP, Other Cloud

Page 67: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

How is gRPC Used?

Direct RPCs: Microservices

RPCs to access APIs: Google APIs, Your APIs

Mobile/Web RPCs: Your Mobile/Web Apps

On Prem, GCP, Other Cloud

Page 68: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

What are the benefits?

Developers

● Ease of use
● Performance
● Versioning
● Programming model

Operators

● Uniform Monitoring
● Debugging/Tracing
● Cross platform/language

Architects/Managers

● Defined Contracts
● Single uniform framework for control
● Visibility

Page 69: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

gRPC Principles & Requirements

Layered

Key facets of the stack must be able to evolve independently. A revision to the wire-format should not disrupt application layer bindings.

http://www.grpc.io/blog/principles

Page 70: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Layered Architecture

[Diagram: three layers, top to bottom: Stub (Code Gen'd Service API, Code Gen Support Code), Channel API, Transport API. Annotations: "Standard applications" and "Initialization, interceptors, and advanced applications".]

Page 71: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Layered Architecture

[Diagram: client side components: RPC Client-Side App, Stub (Future Stub, Blocking Stub), ClientCall, Channel, NameResolver and LoadBalancer (Pluggable Load Balancing and Service Discovery), Transports #1 to #N. Server side components: RPC Server-side Apps, Service Definition (extends generated definition), ServerCall handler, ServerCall, Transport. The two sides communicate over HTTP/2.]

Page 72: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Google Cloud Platform

Takeaways

HTTP/2 is a high performance production-ready multiplexed bidirectional protocol.

gRPC (http://grpc.io):

• HTTP/2 transport based, open source, general purpose, standards-based, feature-rich RPC framework.
• Bidirectional streaming over one single TCP connection.
• Netty transport provides asynchronous and non-blocking I/O.
• Deadline and cancellation propagation.
• Client- and server-side flow-control.
• Layered, pluggable and extensible.
• Supports 10 programming languages.
• Built-in testing support.
• Production-ready (current version is 1.0.1) and growing ecosystem.

Page 73: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Growing Ecosystem

Page 74: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

gRPC Gateway

Migration.

Testing.

Swagger / OpenAPI tooling.

https://github.com/grpc-ecosystem/grpc-gateway

Photo taken by Andrey Borisenko.

Page 75: Bringing Learnings from Googley Microservices with gRPC - Varun Talwar, Google

Metadata and Auth

● Protocol Structure
  ○ Request → <Call Spec> <Header Metadata> <Messages>*
  ○ Response → <Header Metadata> <Messages>* <Trailing Metadata> <Status>
● Generic mechanism for attaching metadata to requests and responses
● Commonly used to attach "bearer tokens" to requests for Auth
  ○ OAuth2 access tokens
  ○ JWT e.g. OpenId Connect Id Tokens
● Session state for specific Auth mechanisms is encapsulated in an Auth-credentials object
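
Attaching a bearer token can be pictured as adding one key/value pair to the header metadata. The sketch below treats metadata as a list of pairs, which is an assumption about representation. The helper names, the `x-request-id` key, and the token value are hypothetical.

```python
def with_bearer_token(metadata, token):
    """Return a copy of metadata with an authorization entry appended."""
    return list(metadata) + [("authorization", f"Bearer {token}")]


def extract_bearer_token(metadata):
    """Return the token from the first bearer authorization entry, or None."""
    for key, value in metadata:
        if key.lower() == "authorization" and value.startswith("Bearer "):
            return value[len("Bearer "):]
    return None


request_metadata = with_bearer_token([("x-request-id", "abc123")], "example-token")
```

The token travels alongside the messages rather than inside them, so the auth mechanism can change without touching any service definition.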