Distributed Tracing Velocity2016

Distributed Tracing: How to do latency analysis for microservice-based applications

Reshmi Krishna, @reshmi9k

About Me

Software Engineer

Platform Architect, Pivotal

Women in Tech community member

Twitter : @reshmi9k

MeetUp : Cloud-Native-New-York

http://www.meetup.com/Cloud-Native-New-York/

Agenda

Distributed Tracing

Tracers and Tracing Systems

Zipkin

Incorporating distributed tracing into an existing microservice

Demo

From Monolith …

Customer

Loyalty

Notifications

Payment

Web Frontend

To Microservices …

Troubleshooting Latency Issues

When was the event?

How long did it take?

How do I know it was slow?

Why did it take so long?

Which microservice was responsible?

Distributed Tracing

Distributed tracing is the process of collecting end-to-end transaction graphs in near real time

A trace represents the entire journey of a request

A span represents a single operation call

Distributed tracing systems are often used for this purpose; Zipkin is an example

As a request flows from one microservice to another, tracers add logic to create a unique trace ID for the request and a span ID for each operation
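The ID handling described above can be sketched in a few lines. This is a minimal illustration, not the code of any particular tracer: the function names are hypothetical, and the header names follow Zipkin's B3 convention (`X-B3-TraceId`, `X-B3-SpanId`) while the propagation itself is simplified.

```python
import secrets


def new_id() -> str:
    """Return a random 64-bit identifier as 16 hex characters."""
    return secrets.token_hex(8)


def start_span(incoming_headers: dict) -> dict:
    """Create the span context for one service handling one request.

    Hypothetical helper: if the caller sent a trace ID, the request already
    belongs to a trace, so reuse that ID and record the caller's span as the
    parent. Otherwise this service is the entry point and starts a new trace.
    """
    trace_id = incoming_headers.get("X-B3-TraceId")
    parent_id = incoming_headers.get("X-B3-SpanId")
    if trace_id is None:
        trace_id = new_id()
        parent_id = None
    return {"trace_id": trace_id, "span_id": new_id(), "parent_id": parent_id}


def outgoing_headers(span: dict) -> dict:
    """Headers to attach to a downstream call so the trace continues."""
    return {"X-B3-TraceId": span["trace_id"], "X-B3-SpanId": span["span_id"]}


# The UI receives a request with no tracing headers and calls a back end.
ui_span = start_span({})
backend_span = start_span(outgoing_headers(ui_span))
```

Both spans share one trace ID, and the back-end span names the UI span as its parent. That parent link is what later lets a tracing system reassemble the request's journey.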

Visualization - Traces & Spans

UI : Trace Id : 1, Span Id : 1

Back-Office-Microservice : Trace Id : 1, Parent Id : 1, Span Id : 2

Customer-Microservice : Trace Id : 1, Parent Id : 2, Span Id : 4

Account-Microservice : Trace Id : 1, Parent Id : 2, Span Id : 5
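The parent IDs on this slide are enough to rebuild the call tree. The sketch below uses the four spans exactly as listed; the dictionary layout and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# The four spans from the slide. The UI span has no parent, so it is the root.
SPANS = [
    {"service": "UI", "trace_id": 1, "span_id": 1, "parent_id": None},
    {"service": "Account-Microservice", "trace_id": 1, "span_id": 5, "parent_id": 2},
    {"service": "Back-Office-Microservice", "trace_id": 1, "span_id": 2, "parent_id": 1},
    {"service": "Customer-Microservice", "trace_id": 1, "span_id": 4, "parent_id": 2},
]


def build_tree(spans):
    """Index spans by parent ID so each span's children can be looked up."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["span_id"])
    return children


def render(children, parent_id=None, depth=0):
    """Return the tree as indented lines, starting from the root span."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append("  " * depth + f'{span["service"]} (span {span["span_id"]})')
        lines.extend(render(children, span["span_id"], depth + 1))
    return lines


print("\n".join(render(build_tree(SPANS))))
# UI (span 1)
#   Back-Office-Microservice (span 2)
#     Customer-Microservice (span 4)
#     Account-Microservice (span 5)
```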

Dapper Paper By Google

The paper describes Dapper, Google’s production distributed systems tracing infrastructure

Design goals:

Low overhead

Application-level transparency

Scalability

Zipkin

Zipkin is a distributed tracing system

Implementation based on Google’s Dapper paper

Aggregates spans into trace trees

Manages both collection and lookup of the data

In 2015, OpenZipkin became the primary fork
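Once spans are aggregated into a trace tree, the question "which microservice was responsible?" can be approached by comparing each span's duration with the durations of its children. The sketch below is an assumption-laden illustration: the durations are invented, the field names are hypothetical, and it assumes child calls run one after another rather than in parallel.

```python
# Hypothetical spans for one trace; durations are made-up values in microseconds.
TIMED_SPANS = [
    {"service": "UI", "span_id": 1, "parent_id": None, "duration_us": 500_000},
    {"service": "Back-Office-Microservice", "span_id": 2, "parent_id": 1, "duration_us": 450_000},
    {"service": "Customer-Microservice", "span_id": 4, "parent_id": 2, "duration_us": 50_000},
    {"service": "Account-Microservice", "span_id": 5, "parent_id": 2, "duration_us": 380_000},
]


def self_times(spans):
    """Time spent in each service itself: its duration minus its children's.

    Assumes child spans do not overlap in time (sequential calls).
    """
    child_total = {}
    for span in spans:
        parent = span["parent_id"]
        if parent is not None:
            child_total[parent] = child_total.get(parent, 0) + span["duration_us"]
    return {
        span["service"]: span["duration_us"] - child_total.get(span["span_id"], 0)
        for span in spans
    }


def slowest_service(spans):
    """Name of the service with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


print(slowest_service(TIMED_SPANS))  # Account-Microservice
```

With these invented numbers the UI span is the longest overall, but most of that time is spent waiting on downstream calls. Subtracting the children's durations points at the Account-Microservice instead.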

Initial Zipkin Architecture

Tracers

Tracers add logic to create a unique trace ID

The trace ID is generated when the first request is made

A span ID is generated as the request arrives at each microservice

An example tracer is Spring Cloud Sleuth

Tracers execute in your production apps! They are written to not log too much

Tracers have an instrumentation or sampling policy
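A sampling policy keeps tracer overhead low by recording only a fraction of traces. The sketch below is a generic illustration, not Spring Cloud Sleuth's sampler: the class and function names are hypothetical, and only the `X-B3-Sampled` header name (with values `1` and `0`) comes from Zipkin's B3 convention.

```python
import random


class ProbabilitySampler:
    """Hypothetical sampler that keeps roughly `rate` of all new traces."""

    def __init__(self, rate: float, rng=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        self.rate = rate
        self._rng = rng if rng is not None else random.Random()

    def is_sampled(self) -> bool:
        # random() returns a value in [0.0, 1.0), so rate=1.0 always samples
        # and rate=0.0 never does.
        return self._rng.random() < self.rate


def sampling_decision(incoming_headers: dict, sampler: ProbabilitySampler) -> bool:
    """Decide whether to record spans for this request.

    A decision made upstream is honored so that a trace is either recorded
    by every service or by none. Only the first service in the chain
    consults its own sampler.
    """
    upstream = incoming_headers.get("X-B3-Sampled")
    if upstream is not None:
        return upstream == "1"
    return sampler.is_sampled()


# Entry point with a 10% policy; downstream services follow its decision.
sampler = ProbabilitySampler(rate=0.1)
record = sampling_decision({}, sampler)
downstream_headers = {"X-B3-Sampled": "1" if record else "0"}
```

Making the decision once at the start of the trace and propagating it avoids partial traces in which some services recorded their spans and others did not.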

Demo : Architecture Diagram

[Diagram: four apps, each instrumented with Spring Cloud Sleuth, report spans over a transport (MQ / HTTP / Log) to Zipkin, which comprises the Collector, Span Store, Query Server, and Zipkin UI]

Let’s look at some code & Demo

Summary

Distributed tracing allows you to quickly see latency issues in your system

Zipkin is a great tool to visualize the latency graph and system dependencies

Spring Cloud Sleuth integrates with Zipkin and grants you log correlation

Log correlation allows you to match logs for a given trace
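Log correlation can be pictured as grouping log lines by the trace ID they carry. The sketch below assumes a simplified, hypothetical log format in which each line contains `[service,traceId,spanId]`; the sample lines and IDs are invented, and real tracer output may be laid out differently.

```python
import re
from collections import defaultdict

# Invented log lines from three services, interleaved as they might be
# in an aggregated log stream.
LOG_LINES = [
    "INFO [ui,aaa1,aaa1] received GET /start",
    "INFO [ui,bbb2,bbb2] received GET /start",
    "INFO [back-office,aaa1,c3d4] looking up customer",
    "WARN [account,aaa1,e5f6] account lookup was slow",
    "INFO [back-office,bbb2,a7b8] looking up customer",
]

# Matches the assumed "[service,traceId,spanId]" marker anywhere in a line.
MARKER = re.compile(
    r"\[(?P<service>[^,\]]+),(?P<trace_id>[^,\]]+),(?P<span_id>[^,\]]+)\]"
)


def group_by_trace(lines):
    """Map each trace ID to the log lines that belong to it, in order."""
    grouped = defaultdict(list)
    for line in lines:
        match = MARKER.search(line)
        if match:
            grouped[match.group("trace_id")].append(line)
    return dict(grouped)


for line in group_by_trace(LOG_LINES)["aaa1"]:
    print(line)
# INFO [ui,aaa1,aaa1] received GET /start
# INFO [back-office,aaa1,c3d4] looking up customer
# WARN [account,aaa1,e5f6] account lookup was slow
```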

Pivotal Cloud Foundry makes integration of your apps and Spring Cloud Sleuth and Zipkin easier

Links

Dapper, Google: http://research.google.com/pubs/pub36356.html

Code for this presentation: https://github.com/reshmik/DistributedTracingDemo_Velocity2016.git

Sleuth’s documentation: http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html

Repo with Spring Boot Zipkin server: https://github.com/openzipkin/zipkin-reporter-java.git

Zipkin deployed on PCF: https://github.com/reshmik/Zipkin/tree/master/spring-cloud-sleuth-samples/spring-cloud-sleuth-sample-zipkin-stream

Pivotal Web Services trial: https://run.pivotal.io/

Pivotal Cloud Foundry on your laptop: https://docs.pivotal.io/pcf-dev/
