Hardware Latencies How to flush them out (A use case) Steven Rostedt Red Hat

Hardware Latencies
How to flush them out
(A use case)

Steven Rostedt
Red Hat

Here’s a story, of a lovely lady...

● No, this isn’t the Brady Bunch
– Nor is it about a lovely lady
– But it probably could have been a Brady Bunch episode.

Here’s a story, of an upset customer

● Who were seeing lots of latencies on their own
● The machine wasn’t verified yet
● Real time requires not just a kernel
– Requires the entire spectrum
  ● Application
  ● Kernel
  ● Hardware

Verification of Hardware

● rteval
– A tool by Red Hat to stress the machine
– Measures jitter (using cyclictest)
● Was a large machine
– 40 CPUs
– For such a box, we expect no more than
  ● 200us jitter
– We'd like less, but we are lenient with large HW

Latencies

● Seeing 500us latencies!!!!
– May not sound big to you
– But it's huge for PREEMPT_RT
● Took a while to hit that
● Was it HW? SW?

– We control the app (rteval)

– Of course I blamed the HW ;-)

– Of course the HW vendor blamed SW

The Enemy
● 500 microsecond latency

The Weapons
● Function tracing
● Latency tracers
● HW Lat detector
● Event tracing
● trace_printk()

Function Tracing

● echo function > current_tracer
● echo function_graph > current_tracer
● trace-cmd is nicer

– trace-cmd start -p function_graph

– trace-cmd stop

– trace-cmd extract

– trace-cmd report

rteval

● hackbench
● kernel builds
● cyclictest
● rteval --duration=100h

rteval

● Breaking it up
– rteval --onlyload --duration=100h
  ● Does not run cyclictest
– Run cyclictest separately

cyclictest
● cyclictest --numa -p95 -d0 -i100 -qm
– numa implies -a -t -n
  ● a - bind a task to each CPU
  ● t - thread per CPU
  ● n - use nanosleep() not signals

– p95 - set priority to 95

– d0 - all threads run same interval

– i100 - sleep for 100 us

– q - quiet - don't show status during test

– m - mlockall memory

cyclictest
● cyclictest --numa -p95 -d0 -i100 -qm -b 200
– b 200 - break after 200 us latency
– implies running function tracer
– Stops tracer on latency too
● Function tracing adds a lot of overhead!
● cyclictest --numa -p95 -d0 -i100 -qm -b 1000
– increase breakpoint by a lot!
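
To make the cyclictest options above concrete, here is a minimal sketch of the kind of measurement the tool takes: sleep until a fixed deadline, then record how late the wakeup was. It is an illustration only, not cyclictest's code; the function names `measure_wakeup_latency` and `first_over` are made up for this example, and interpreter overhead in Python is far larger than anything a real-time test would tolerate.

```python
import time

def measure_wakeup_latency(interval_us, loops):
    """Sleep to successive absolute deadlines; return how late each wakeup was (us)."""
    interval_ns = interval_us * 1000
    latencies_us = []
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Lateness relative to the deadline we asked to be woken at.
        latencies_us.append((time.monotonic_ns() - deadline) / 1000)
        deadline += interval_ns  # deadlines stay on a fixed grid, like -i100
    return latencies_us

def first_over(latencies_us, threshold_us):
    """Index of the first sample above threshold_us (the idea behind -b), else None."""
    for index, latency in enumerate(latencies_us):
        if latency > threshold_us:
            return index
    return None

samples = measure_wakeup_latency(interval_us=100, loops=50)
print("max latency (us):", max(samples), "first over 200us:", first_over(samples, 200))
```

The real tool runs one such loop per CPU at a real-time priority with memory locked, which is what -a, -t, -p95 and -m arrange.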

cyclictest
● Function tracing adds too much overhead
● cyclictest --numa -p95 -d0 -i100 -qm -b 200 -E

– E - uses event tracing instead of function

– Better with latencies

– Not as much info

cyclictest
● Limit function tracing with trace-cmd
– trace-cmd start -p function -n '*lock*'
– trace-cmd start -p function -l '*sched*'
● cyclictest --numa -p95 -d0 -i100 -qm -b 300
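
For readers who want to see what sits underneath the -p, -l and -n options, the sketch below writes the corresponding ftrace control files directly. The file names (`current_tracer`, `set_ftrace_filter`, `set_ftrace_notrace`) are the kernel's; the helper `configure_function_tracer` and its parameters are invented for illustration, and this is not how trace-cmd itself is implemented. The demo writes into a scratch directory, so it needs no root and touches no real tracer.

```python
import tempfile
from pathlib import Path

def configure_function_tracer(tracefs, tracer="function", only=(), exclude=()):
    """Write the ftrace control files that correspond to -p, -l and -n.

    tracefs : directory holding the control files (assumption: on a real
              system this is usually /sys/kernel/tracing and needs root)
    only    : globs for set_ftrace_filter (trace just these functions)
    exclude : globs for set_ftrace_notrace (never trace these functions)
    """
    root = Path(tracefs)
    # Truncating the file and writing nothing clears any previous filter.
    (root / "set_ftrace_filter").write_text("".join(g + "\n" for g in only))
    (root / "set_ftrace_notrace").write_text("".join(g + "\n" for g in exclude))
    (root / "current_tracer").write_text(tracer + "\n")

# Dry run against a scratch directory instead of the live tracefs.
with tempfile.TemporaryDirectory() as scratch:
    configure_function_tracer(scratch, "function_graph", only=["*raw_spin_lock*"])
    written = {p.name: p.read_text() for p in Path(scratch).iterdir()}
```

Setting the filters before selecting the tracer means the tracer never runs unfiltered, which matters when the overhead itself is what you are trying to avoid.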

Latency Tracers

● wakeup-rt
– Ignore wakeup tracer
● preemptirqsoff
– Just ignore the:
  ● irqsoff
  ● preemptoff

Wakeup-rt
● trace-cmd start -p wakeup-rt
● Records the latency of the highest-priority RT task
– From wake up to schedule
● Problems
– Defaults to running the function tracer
– trace-cmd start -p wakeup-rt -d
  ● disables function tracing
– Not much info without functions
– trace-cmd start -p wakeup-rt -d -e all
  ● enables all events

Wakeup-rt

● Didn't help :-(
– Not enough info with events
– Function tracing caused latencies
  ● Hard to determine if latency was real or a Heisenbug

preemptirqsoff

● trace-cmd start -p preemptirqsoff -e all -d
● Showed us issues with the scheduler
● Pointed to load balancing
– but that was a symptom, not the cause

Modified cyclictest

● Changed to use function graph instead of function
● trace-cmd start -p function -l load_balance

<idle>-0 [000] 60085.036305: function: load_balance
<idle>-0 [001] 60085.036305: function: load_balance
<idle>-0 [000] 60085.036306: function: load_balance

Modified cyclictest

● trace-cmd start -p function_graph -l load_balance
– Much more useful

<idle>-0 [002] 60305.482591: 0.795 us | load_balance();
<idle>-0 [003] 60305.482591: 1.035 us | load_balance();
<idle>-0 [002] 60305.482593: 0.978 us | load_balance();
<idle>-0 [003] 60305.482593: 0.456 us | load_balance();

Latency without Load Balance?

● Hit a latency, and load balance wasn't called?

● PREEMPT_RT converts spinlocks to mutexes
– except for raw_spin_locks!
● trace-cmd start -p function_graph -l '*raw_spin_lock*'

<idle>-0 24dN.10 111214.800190: funcgraph_entry: ! 235.991 us | _raw_spin_lock_irqsave();

Graph vs Function Tracing

● function_graph gives you the time spent in a function
● function tracing can give you a backtrace
– trace-cmd start -p function -l 'raw_spin*' --func-stack

trace-cmd-8725 [002] 148276.692827: function: _raw_spin_lock_irq
trace-cmd-8725 [002] 148276.692830: kernel_stack: <stack trace>
=> __schedule (ffffffff8146d08f)
=> schedule (ffffffff8146dd09)
=> do_nanosleep (ffffffff8146c7ec)
=> hrtimer_nanosleep (ffffffff8106eecb)
=> sys_nanosleep (ffffffff8106f00e)
=> system_call_fastpath (ffffffff81476692)

What to do?

● Keep function graph
● Add events
● Enabling all events added latencies of its own
– Limit the events to trace
● trace-cmd start -p function_graph -l '*raw_spin_lock*' -e sched -e timer -e irq

Long story short

● Found latency
● rq lock contention in pull_rt_task()
● 30 or more CPUs tried to take the same lock
● Between cache line bouncing and locking the bus, this caused a large HW latency
– but you can still blame SW
● Fixed by doing IPIs instead
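
A toy model helps show why 30 CPUs queuing on one lock hurts. The sketch below is not from the talk: it assumes strict first-come-first-served hand-off and a fixed hold time, and the 5 us hold time is a made-up number (only the count of 30 contenders comes from the slide). Real hardware is worse, because every hand-off also bounces the lock's cache line between CPUs.

```python
def fifo_lock_waits(n_cpus, hold_us):
    """Wait time (us) for each CPU when all request one lock at the same instant.

    Assumes strict FIFO hand-off and a constant hold time, which is a
    simplification of real spinlock behaviour.
    """
    return [position * hold_us for position in range(n_cpus)]

# Hypothetical inputs: 30 contenders (from the slide), 5 us per hold (invented).
waits = fifo_lock_waits(n_cpus=30, hold_us=5)
worst = max(waits)  # the last CPU in line waits for the 29 ahead of it
print("worst-case wait (us):", worst)
```

The wait grows linearly with the number of contenders, which is why a box with 40 CPUs exposed a problem that smaller machines never showed.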

Pull RT Tasks

[Diagram: CPU 0, CPU 1, CPU 2 and CPU 40, each running cyclictest at prio 90]

Pull RT Tasks

[Diagram: CPU 0, CPU 1, CPU 2 and CPU 40, each running cyclictest at prio 90; a watchdog thread at prio 99 is added]

Pull RT Tasks

[Diagram: CPU 0, CPU 1, CPU 2 and CPU 40, each running cyclictest at prio 90; an irq thread at prio 50 is added]

Pull RT Tasks

[Diagram: of CPU 0, CPU 1, CPU 2 and CPU 40, three are now <idle>; one still runs cyclictest at prio 90, with the irq thread at prio 50 waiting]

The Finding Nemo Seagull Effect!

Mine

Pull RT Tasks
The Finding Nemo Seagull Effect

[Diagram: of CPU 0, CPU 1, CPU 2 and CPU 40, three are <idle>; one runs cyclictest at prio 90, with the irq thread at prio 50 waiting]

Pull RT Tasks
The Finding Nemo Seagull Effect

[Diagram: of CPU 0, CPU 1, CPU 2 and CPU 40, three are <idle>; one runs cyclictest at prio 90, with the irq thread at prio 50 waiting; three "Mine" callouts]

Pull RT Tasks
IPI to push task

[Diagram: of CPU 0, CPU 1, CPU 2 and CPU 40, three are <idle>; one runs cyclictest at prio 90, with the irq thread at prio 50 waiting]

Pull RT Tasks
IPI to push task

[Diagram: of CPU 0, CPU 1, CPU 2 and CPU 40, three are <idle>; one runs cyclictest at prio 90, with the irq thread at prio 50 waiting; three "IPI" labels]

Pull RT Tasks
IPI to push task

[Diagram: of CPU 0, CPU 1, CPU 2 and CPU 40, two are <idle>; one runs cyclictest at prio 90 and one now runs the irq thread at prio 50]

The End?

● Looked like we found our bug!
● Started verification process
● Told everyone things will be verified shortly

Nope!

● Passed a 12 hour run
● Failed a 24 hour run
● Let's start again!

HW Lat Detector

● Hardware latency detector
● Runs periodic stop machine
– Define a period and run
– run must be shorter than period
  ● otherwise the system will lock up
● Spins looking for latency

HW Lat Issue

start = now = timestamp();
while (now - start < period) {
    tmp = timestamp();
    now = timestamp();
    diff = now - tmp;
    if (diff > thresh)
        record();
}

HW Lat Issue

[The loop above, annotated with 20% and 80%: only part of each pass lies between the two timestamp() reads, so a latency that lands outside that window is never measured]

HW Lat Issue

last = 0;
while (now - start < period) {
    tmp = timestamp();
    now = timestamp();
    if (last) {
        diff = tmp - last;
        if (diff > outer_thresh)
            record_outer();
    }
    last = now;
    diff = now - tmp;
    if (diff > thresh)
        record();
}
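
The effect of the extra "outer" check can be tried out on recorded numbers. The sketch below replays the loop above over a list of timestamp readings instead of a live clock; the function name `detect_latencies` and the sample readings are invented for illustration, and `None` stands in for the `last = 0` sentinel on the slide.

```python
def detect_latencies(readings, thresh, outer_thresh):
    """Replay the two-window check over consecutive (tmp, now) timestamp pairs.

    readings : flat list of timestamps, read in pairs tmp, now, tmp, now, ...
    Returns (inner, outer): gaps seen between the two reads of one pass,
    and gaps seen between the end of one pass and the start of the next.
    """
    inner, outer = [], []
    last = None
    for i in range(0, len(readings) - 1, 2):
        tmp, now = readings[i], readings[i + 1]
        if last is not None and tmp - last > outer_thresh:
            outer.append(tmp - last)
        last = now
        if now - tmp > thresh:
            inner.append(now - tmp)
    return inner, outer

# Hypothetical readings in microseconds: one 50 us stall inside a pass,
# one 300 us stall between passes.
readings = [0, 1, 2, 3, 4, 54, 55, 56, 356, 357]
inner, outer = detect_latencies(readings, thresh=10, outer_thresh=10)
print("inner:", inner, "outer:", outer)
```

With only the inner check, the 300 us stall in this made-up data would go unreported, which is the gap the second threshold closes.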

HW Lat Detector
Stop Machine
● Needs to run periodically
– Will lock up the system otherwise
– Has chance to miss latency again!
● Changed to a thread
– Thread takes up one of the CPUs
– Still needs to yield
  ● Locks up machine otherwise
– But yield is much smaller than periodic
  ● More likely to measure latency

HW Lat Detector
Worked!
● But not good enough
● Vendor did not trust this code ???
● Had to use their code
– Did somewhat the same thing
– In userspace
– Could easily miss latencies

trace-cmd

● trace-cmd start -p function_graph -l 'raw_spin*' -e all
● Modified cyclictest to use function graph
– Still limited to raw_spin* locks
– Won't disable the events started
● cyclictest will still stop the trace on latency

trace-cmd

ksoftirqd/33-216 [033] 55597.719935: timer_cancel: timer=0xffff88403f0ce520
ksoftirqd/33-216 [033] 55597.719935: timer_expire_entry: timer=0xffff88403f0ce520 function=delayed_work
ksoftirqd/33-216 [033] 55597.719935: funcgraph_entry: 0.069 us | _raw_spin_lock_irqsave();
ksoftirqd/33-216 [033] 55597.719936: funcgraph_entry: 0.047 us | _raw_spin_lock();
ksoftirqd/33-216 [033] 55597.719936: sched_stat_sleep: comm=kworker/33:1 pid=1222 delay=132870067
ksoftirqd/33-216 [033] 55597.719937: sched_wakeup: kworker/33:1:1222 [120] success=1 CPU:033
ksoftirqd/33-216 [033] 55597.719937: timer_expire_exit: timer=0xffff88403f0ce520
ksoftirqd/33-216 [033] 55597.719942: timer_cancel: timer=0xffff88403f0ce620
ksoftirqd/33-216 [033] 55597.719942: timer_expire_entry: timer=0xffff88403f0ce620 function=delayed_work
ksoftirqd/33-216 [033] 55597.719943: funcgraph_entry: 0.045 us | _raw_spin_lock_irqsave();
ksoftirqd/33-216 [033] 55597.719943: timer_expire_exit: timer=0xffff88403f0ce620
cyclictest-6110 [007] 55597.719955: funcgraph_entry: 0.194 us | _raw_spin_lock();
cyclictest-6110 [007] 55597.719956: funcgraph_entry: 0.175 us | _raw_spin_lock_irqsave();
cyclictest-6110 [007] 55597.719957: funcgraph_entry: 0.175 us | _raw_spin_lock_irqsave();
cyclictest-6113 [010] 55597.719957: funcgraph_entry: 2.436 us | _raw_spin_lock();
cyclictest-6110 [007] 55597.719957: funcgraph_entry: 0.203 us | _raw_spin_lock();
cyclictest-6110 [007] 55597.719958: sched_wakeup: cyclictest:6113 [4] success=1 CPU:010
cyclictest-6110 [007] 55597.719959: funcgraph_entry: 0.048 us | _raw_spin_lock_irqsave();
cyclictest-6113 [010] 55597.719960: funcgraph_entry: 0.170 us | _raw_spin_lock_irq();
cyclictest-6113 [010] 55597.719961: funcgraph_entry: 0.045 us | _raw_spin_lock_irqsave();
cyclictest-6113 [010] 55597.719962: funcgraph_entry: 0.043 us | _raw_spin_lock_irq();
cyclictest-6110 [007] 55597.719963: print: ffffffff810e5776 hit latency threshold (247 > 200)
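
Output like this is easier to sift with a small filter. The sketch below pulls out funcgraph_entry lines whose duration exceeds a chosen limit. It is an illustration, not part of trace-cmd: the regular expression is written against the line layout shown in this deck (task, [cpu], timestamp), the name `slow_calls` is made up, and lines in the latency-flags layout (such as `24dN.10`) are simply skipped.

```python
import re

# Matches lines such as:
#   cyclictest-6113 [010] 55597.719957: funcgraph_entry: 2.436 us | _raw_spin_lock();
# An optional marker such as "!" or "+" may precede the duration.
ENTRY = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+"
    r"funcgraph_entry:\s+(?:[!+#*@$]\s+)?(?P<us>\d+\.\d+)\s+us\s+\|\s+(?P<func>\w+)\(\);"
)

def slow_calls(report_lines, min_us):
    """Return (task, cpu, timestamp, duration_us, function) for slow entries."""
    hits = []
    for line in report_lines:
        m = ENTRY.match(line)
        if m and float(m["us"]) >= min_us:
            hits.append((m["task"], int(m["cpu"]), float(m["ts"]),
                         float(m["us"]), m["func"]))
    return hits

sample = [
    "cyclictest-6110 [007] 55597.719957: funcgraph_entry: 0.175 us | _raw_spin_lock_irqsave();",
    "cyclictest-6113 [010] 55597.719957: funcgraph_entry: 2.436 us | _raw_spin_lock();",
    "cyclictest-6110 [007] 55597.719958: sched_wakeup: cyclictest:6113 [4] success=1 CPU:010",
]
for hit in slow_calls(sample, min_us=1.0):
    print(hit)
```

A filter like this only narrows the search; seeing how events on different CPUs line up in time still calls for a graphical view.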

trace-cmd

● trace-cmd report
– Lots of information
– Detailed information
– Great to analyze
● TOO MUCH INFO!
– Cannot understand it all
– Hard to see the big picture
● KernelShark

kernelshark

Demo

Questions?

Yeah Right? Like we have time.