Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber...

35
Introducing Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1

Transcript of Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber...

Page 1: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

IntroducingJaeger 1.0Yuri Shkuro (Uber Technologies)

CNCF Webinar Series, Jan-16-2018

1

Page 2: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

● What is distributed tracing● Jaeger in a HotROD● Jaeger under the hood● Jaeger v1.0● Roadmap● Project governance, public meetings, contributions● Q & A

Agenda

2

Page 3: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

● Software engineer at Uber○ NYC Observability team

● Founder of Jaeger● Co-author of OpenTracing Specification

About

3

Page 4: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

4

BILLIONS times a day!

Page 5: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

5

How do we knowwhat’s going on?

Page 6: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Metrics / Stats● Counters, timers,

gauges, histograms● Four golden signals

○ utilization○ saturation○ throughput○ errors

● Prometheus, Grafana

We use MONITORING tools

6

Logging● Application events● Errors, stack traces● ELK, Splunk, Sentry

Monitoring tools must “tell stories” about your system

Page 7: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

2017/12/04 21:30:37 scanning error: bufio.Scanner: token too long

How do you debug this?

7

WHAT IS THE CONTEXT?

Page 8: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Metrics and logs don’t cut it anymore!

Metrics and logs are● per-instance● missing the context

It’s like debugging without a stack traceWe need to monitordistributed transactions

8

Page 9: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Distributed Tracing In A Nutshell

9

A

B

C D

E

{context}{context}

{context}{context}

Unique ID → {context}

Edge service

A

B

E

C

D

time

TRACE

SPANS

Page 10: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Let’s look at some tracesdemo time: http://bit.do/jaeger-hotrod

10

Page 11: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

11

performanceand latencyoptimization

distributed transaction monitoring

service dependency

analysis

root cause analysis

distributed context propagation

Distributed Tracing Systems

Page 12: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Jaeger under the hoodArchitecture, etc.

12

Page 13: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

• Inspired by Google’s Dapper and OpenZipkin

• Started at Uber in August 2015

• Open sourced in April 2017

• Official CNCF project since Sep 2017

• Built-in OpenTracing support

• http://jaegertracing.io

Jaeger - /ˈyāɡər/, noun: hunter

13

Page 14: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Community

● 10 full time engineers at Uber and Red Hat

● 80+ contributors on GitHub

● Already used by many organizations

○ including Uber, Symantec, Red Hat, Base CRM,

Massachusetts Open Cloud, Nets, FarmersEdge,

GrafanaLabs, Northwestern Mutual, Zenly

14

Page 15: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Technology Stack

● Backend components in Go● Pluggable storage

○ Cassandra, Elasticsearch, memory, ...● Web UI in React/Javascript● OpenTracing instrumentation libraries

15

Page 16: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Architecture

16

Host or Container

Application

Instrumentation

OpenTracing API

jaeger-client

jaeger-agent (Go)

jaeger-collector(Go)

memory queue

Data Store(Cassandra)

jaeger-query(Go)

jaeger-ui(React)

Control Flow

TraceReporting

Thrift overTChannel

Control Flow Trace ReportingThrift over UDP

Adaptive Sampling

data miningpipeline

Page 17: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Data model

17

Page 18: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Understanding Sampling

Tracing data can exceed business traffic.

Most tracing systems sample transactions:

● Head-based sampling: the sampling decision is made just before the trace is started, and it is respected by all nodes in the graph

● Tail-based sampling: the sampling decision is made after the trace is completed / collected

18

Page 19: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Jaeger 1.0Released 06-Dec-2017

19

Page 20: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Jaeger 1.0 Highlights

Announcement: http://bit.do/jaeger-v1 ● Multiple storage backends● Various UI improvements● Prometheus metrics by default● Templates for Kubernetes deployment

○ Also a Helm chart● Instrumentation libraries● Backwards compatibility with Zipkin

20

Page 21: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Official● Cassandra 3.4+● Elasticsearch 5.x, 6.x● Memory storage

Experimental (by community)● InfluxDB, ScyllaDB, AWS DynamoDB, …● https://github.com/jaegertracing/jaeger/issues/638

Multiple storage backends

21

Page 22: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

● Improved performance in all screens

● Viewing large traces (e.g. 80,000 spans)

● Keyboard navigation

● Minimap navigation, zooming in & out

● Top menu customization

Jaeger UI

22

Page 23: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Zipkin drop-in replacement

Collector can accept Zipkin spans:

• JSON v1/v2 and Thrift over HTTP• Kafka transport not supported yet

Clients:

• B3 propagation• Jaeger clients in Zipkin environment

23

Page 24: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

● Metrics○ --metrics-backend

■ prometheus (default), expvar

○ --metrics-http-route■ /metrics (default)

● Scraping Endpoints○ Query service - API port 16686○ Collector - HTTP API port 14268○ Agent - sampler port 5778

Monitoring

24

Page 25: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

RoadmapThings we are working on

25

Page 26: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

● APIs have endpoints with different QPS

● Service owners do not know the full impact of

sampling probability

Adaptive Sampling is per service + endpoint,

decided by Jaeger backend based on traffic

Adaptive Sampling

26

Page 27: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

● Based on Kafka and Apache Flink

● Support aggregations and data mining

● Examples:○ Pairwise dependency graph

○ Path-based, per endpoint dependency graph

○ Latency histograms by upstream caller

Data Pipeline

27

Page 28: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Service Dependency Graph

Page 29: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Does Dingo Depend on Dog?

29

Page 30: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Latency Histogram

30

Page 31: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Project & CommunityContributors are welcome

31

Page 32: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Contributing

32

Page 33: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Contributing

• Agree to the Certificate of Origin• Sign all commits (git commit -s)• Test coverage cannot go ↓ (backend - 100%)• Plenty of work to go around

– Backend– Client libraries– Kubernetes templates– Documentation

33

Page 34: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

References

• GitHub: https://github.com/jaegertracing

• Chat: https://gitter.im/jaegertracing/

• Mailing List - [email protected]

• Blog: https://medium.com/jaegertracing

• Twitter: https://twitter.com/JaegerTracing

• Bi-Weekly Online Community Meetings34

Page 35: Jaeger 1.0 Introducing - Cloud Native Computing Foundation · Jaeger 1.0 Yuri Shkuro (Uber Technologies) CNCF Webinar Series, Jan-16-2018 1 What is distributed tracing Jaeger in a

Q & AOpen Discussion

35