S4D keynote: System-Level Debug, Jakob Engblom, October 2011

System-Level Debug

Jakob Engblom, PhD
Technical Marketing Manager – Simics
Wind River, Stockholm, Sweden


What’s the Problem?


Debug has Always Been with Us


©2011 Wind River. All Rights Reserved.

Debug Context Increasing in Size

[Figure: system and debug complexity (vertical axis) plotted against design scale (horizontal axis). Stages shown: processor and memory; SoC devices; complete boards; devices, racks of boards, and backplanes; complete systems and networks.]


Multi-X is here to Stay

– Multiple threads
– Multiple processes
– Multiple operating systems
– Stacked operating systems (hypervisor)
– Multicore
– Multiple chips
– Multiple architectures
– Heterogeneous architectures

You want to debug a single system as a unit


Current Debugger Design Assumption

[Figure: one debugger attached to a single target program.]

… We Need to Think Big

[Figure, built up over three slides: first the debugger attached to one multithreaded target program on a target OS; then several target programs on that OS; finally several target OSes, each running multiple programs and threads.]


Background: Simics Evolution


System Debugging: The Beginning

When we started with Simics more than a decade ago, we included a simple debugger inspired by gdb

It had nice commands like:
– break (plant breakpoint)
– %r (read register value)
– sym (look up symbol)
– x (examine memory)
– set-pc (change PC)
– ptime (print current time)

Worked fine for debugging an OS booting up on a virtual platform

[Figure: a debugger attached to a target OS.]


Simics Debugging: System Strikes

Over time, targets moved to multiple processors, multiple machines, and OS awareness with multiple processes

So now: where does “break” apply?

This is the essential question for all system-level debug

Over several generations, a design pattern emerged:
– Namespaces
– And then hierarchical namespaces


Example of Namespacing

All operations need to specify what they operate on, typically by using command namespaces
– Example in full verbosity: tuna.vxworks.break (tuna.vxworkssymbols.pos myISR)
– tuna – name of machine
– vxworks – name of an execution context in the machine, corresponding to a VxWorks kernel
– vxworkssymbols – a symbol lookup engine for the binary corresponding to the VxWorks kernel
– pos myISR – command to find the position in the code of the function called myISR
– There can be more machines in the system, and more contexts

A plain “break” could have anything as its target

Using a “current processor” does not scale
– gdb uses a simple scheme like this for multithreaded debug
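A minimal sketch of how a hierarchical command namespace might resolve a dotted command such as tuna.vxworks.break to exactly one target. This is not the Simics implementation; the Namespace class, the second machine name "cod", and the address 0x1000 are assumptions made for illustration.

```python
class Namespace:
    """A node in a hierarchical command namespace (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.children = {}   # sub-namespaces, e.g. machine -> context
        self.commands = {}   # commands that operate on this node only

    def child(self, name):
        return self.children.setdefault(name, Namespace(name))

    def resolve(self, path):
        """Split 'tuna.vxworks.break' into the target node and its command."""
        *scope, command = path.split(".")
        node = self
        for part in scope:
            node = node.children[part]   # KeyError: unknown namespace
        return node, node.commands[command]


root = Namespace("system")
planted = []  # (machine, context, address) triples, for illustration

for machine in ("tuna", "cod"):          # "cod" is a made-up second machine
    ctx = root.child(machine).child("vxworks")
    # Each context gets its own 'break', bound to exactly that context.
    ctx.commands["break"] = (
        lambda addr, m=machine: planted.append((m, "vxworks", addr))
    )

node, brk = root.resolve("tuna.vxworks.break")
brk(0x1000)   # plants on tuna only; cod is untouched
```

In this sketch a plain "break" at the root fails to resolve at all, which forces every command to name its target instead of relying on a global "current processor".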


Applies to All Debugger Backends

Note that this is not just an issue for virtual platforms
– An ICE/hardware debugger can work with multiple cores in a single box, each with a separate OS
– A hypervisor debug agent has multiple guest OSes that it is simultaneously debugging
– A target agent can work with many programs at once
– A debugger can have multiple connections open to multiple backends in a single target system
  – For example, one JTAG connection per board in a rack, all controlled from the same host PC

Use the same debugger frontend across backends to provide a consistent debug experience
– Still exposing the strong points of each backend


System-Level Issues


Mixed Architectures

In a system setting, target machines can differ in architecture, word length, endianness

Debugger needs to be able to debug several target architectures in a single session

[Figure: one debugger with PPC, IA, and AVR support attached to three targets (32-bit PPC BE, 64-bit IA LE, 8-bit AVR) running target OSes and apps.]


Example Problems

Reading variables and stack frames from memory
– Get endianness and word length right

Remember to deal with execution modes
– 64-bit processor running a 32-bit OS: how big are registers?
– Common on both x86 and PPC to have a 32-bit OS on 64-bit HW
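To make the endianness and word-length point concrete, here is a small sketch that decodes the same raw target memory under three different target descriptions. The TargetInfo type and the sample bytes are assumptions for illustration; only int.from_bytes is standard Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetInfo:
    name: str
    byteorder: str   # "big" or "little", as accepted by int.from_bytes
    word_bytes: int  # word size in the current execution mode


def read_word(memory: bytes, offset: int, target: TargetInfo) -> int:
    """Decode one word from raw memory using the target's conventions."""
    raw = memory[offset:offset + target.word_bytes]
    if len(raw) != target.word_bytes:
        raise ValueError("read past end of memory image")
    return int.from_bytes(raw, target.byteorder)


raw = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])

ppc_32 = TargetInfo("32-bit PPC BE", "big", 4)
ia_64 = TargetInfo("64-bit IA LE", "little", 8)
ia_32_mode = TargetInfo("64-bit IA LE running 32-bit OS", "little", 4)

print(hex(read_word(raw, 0, ppc_32)))      # 0x12345678
print(hex(read_word(raw, 0, ia_64)))       # 0xf0debc9a78563412
print(hex(read_word(raw, 0, ia_32_mode)))  # 0x78563412
```

The same eight bytes give three different values, so the debugger has to pick the decoding per target and per execution mode, not per session.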


Keep it Local

When debugging multiple programs or systems at once, debug contexts have to be local to each target

– Symbol file associations, source code file paths, breakpoints, etc. have to be maintained per debugged entity

– Breakpoints and other actions have to apply to the smallest possible part of the target system
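One way to picture "local" debug contexts is a per-target record that owns its own symbols, source paths, and breakpoints. The DebugContext class, the context names, and the addresses are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DebugContext:
    """Per-target debugger state; nothing here is shared between contexts."""
    name: str
    symbols: dict = field(default_factory=dict)       # symbol -> address
    source_paths: list = field(default_factory=list)  # where to find sources
    breakpoints: set = field(default_factory=set)     # planted addresses

    def plant(self, symbol):
        """Plant a breakpoint using this context's own symbol table."""
        address = self.symbols[symbol]
        self.breakpoints.add(address)
        return address


# Two programs that both define 'main', at different addresses.
app1 = DebugContext("machine1/app1", symbols={"main": 0x1000})
app2 = DebugContext("machine2/app2", symbols={"main": 0x8000})

app1.plant("main")   # affects app1 only
```

Because the symbol lookup and the breakpoint set live inside the context, a breakpoint on main in one program cannot leak into another program that happens to use the same name.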

[Figure: separate debugger contexts attached to individual target programs and threads.]


Example: Eclipse Breakpoints

Eclipse saves breakpoints across sessions
– Same for expressions, watch expressions, …

When a name happens to match a current breakpoint context, it is replanted
– If several programs contain the same names, breakpoints will be planted in all matching locations
  – File name, variable name, etc.
– Very strange effects for a user


Step Here, Stop There

When following multiple programs, you will end up with some programs or threads hitting breakpoints while others are stepping
– Essentially, non-stop debugging is the only reasonable choice

[Figure: timeline of App 1, App 2, and App 3 showing running and inactive periods and debugger operations (step line, step in, step out, hit breakpoint, switch in, switch out, step complete), with these annotations:]
– Stop the system for the break in app 3, or wait until the step line in app 1 completes?
– If we stop due to app 1 step complete, we need to make the user understand that app 3 is still stepping…
– Step on an inactive task: we should stop once it activates
– The user now focuses on this application and decides to step out of the current function
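The bookkeeping behind these annotations can be sketched as per-task state in a non-stop session: a completed step stops only the task that was stepping, and the debugger can still report that another task has a step outstanding. The Task and NonStopSession classes are assumptions for illustration, not an existing debugger API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    active: bool = True      # switched in by the OS?
    stopped: bool = False    # held by the debugger?
    stepping: bool = False   # a step was requested and has not completed


class NonStopSession:
    def __init__(self, *names):
        self.tasks = {n: Task(n) for n in names}

    def step(self, name):
        task = self.tasks[name]
        task.stopped = False
        task.stepping = True   # if inactive, the step starts on switch-in

    def step_complete(self, name):
        task = self.tasks[name]
        if not task.active:
            raise RuntimeError("an inactive task cannot complete a step")
        task.stepping = False
        task.stopped = True    # only this task stops; the others keep going
        return self.status()

    def status(self):
        def describe(t):
            if t.stopped:
                return "stopped"
            if t.stepping:
                return "stepping" if t.active else "step pending"
            return "running" if t.active else "inactive"
        return {t.name: describe(t) for t in self.tasks.values()}


session = NonStopSession("app1", "app2", "app3")
session.tasks["app3"].active = False   # app 3 is switched out
session.step("app1")
session.step("app3")                   # step on an inactive task
report = session.step_complete("app1")
print(report)
# {'app1': 'stopped', 'app2': 'running', 'app3': 'step pending'}
```

The status report is what lets the user interface show that app 3 still has a step outstanding when app 1 stops.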


Single Connection

Ideally, use a single connection to the target system in its entirety
– Requires the connection to handle heterogeneity
  – Most existing debug protocols (such as gdb-serial) assume a homogeneous counterpart
  – Target Communications Framework (TCF) can handle heterogeneous systems
– A homogeneous connection implies several separate connections to the target
– Coordinating the run control across multiple connections can get painful
– Coordination hardware box close to target?

[Figure: a debugger attached to three heterogeneous targets (32-bit PPC BE, 64-bit IA LE, 8-bit AVR) running target OSes and apps.]

[Figure: single connection, example in action.]

Follow Multiple Programs

We want to see where several programs are executing
– At the same time, in the same debugger GUI

Requires a program-centric user interface

Problem also with multicore debug today
– Eclipse CDT's default source-centric model does not work well
– Many other debuggers have a good model for this already in place

[Figure: three machines, each with a target OS. App 1 and App 2 run on machine 1, App 3 runs on machine 2, and machine 3 is marked “don’t care”. The debugger shows App 1, App 2, and App 3.]


Source File != Program

Typically, debuggers map from a source code file to an executing binary to set breakpoints

But the “same” source file can be part of several programs executing in a system
– Same name (main.c) used in many programs: distinguish by compilation path
– Same file, in the same place in the file system, included in multiple programs with different compilation settings
  – Common code base of portable code
  – For example, a piece of middleware compiled for PPC-32-Linux, x86-64-Windows, and ARM-32-VxWorks, all running in the same target system


Source File != Program

Debugger needs to distinguish the different contexts of a shared source code file to correctly target breakpoints and resolve variable values

[Figure: three machines, each with a target OS running mymiddleware, all built from the shared source file common/mymiddleware.c. The debugger shows them as App 1, App 2, and App 3.]


Example: Same-Name Programs

[Figure: same-name programs on machine 1 (shark, ppc-32), machine 2 (mackerel, x86-64), and machine 3 (herring, x86-32).]


The Target is Already Running

In a classic debugger workflow, the debugger has the ability to launch the target program to debug

In a system-level debug setting, that does not work
– The target software is already there, and it is starting and stopping based on events inside and outside the target system

What does this require from a debugger?
– Ability to debug any existing process, including chasing it as it is switched in and out by the OS kernel
– Ability to hook into the start of a new process to attach to it as something starts it
– More smarts in the backend, fewer round-trips to frontend
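A sketch of the "hook into process start" requirement, assuming an OS-aware backend that is told about every process creation. The Backend class, its method names, and the process names are hypothetical; the point is that the match runs in the backend, so only interesting processes cause any traffic to the frontend.

```python
import fnmatch


class Backend:
    """Hypothetical OS-aware backend that sees process creation events."""

    def __init__(self):
        self.hooks = []      # (name pattern, callback) pairs
        self.attached = []   # (pid, name) of processes being debugged

    def attach_on_start(self, pattern, callback=None):
        """Ask to be attached to any future process whose name matches."""
        self.hooks.append((pattern, callback))

    def on_process_start(self, pid, name):
        """Called by the OS-awareness layer for each new process."""
        for pattern, callback in self.hooks:
            if fnmatch.fnmatchcase(name, pattern):
                self.attached.append((pid, name))
                if callback:
                    callback(pid, name)
                return True    # tell the caller to hold the process
        return False           # not interesting: let it run


backend = Backend()
backend.attach_on_start("media*")

events = [(101, "init"), (102, "mediaserver"), (103, "logd")]
results = [backend.on_process_start(pid, name) for pid, name in events]
```

With the hook registered ahead of time, the user does not have to attach by hand at exactly the right moment, which is the pain point described in the Android example that follows.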


Example

In fact the most cumbersome procedure with Android is not necessarily to setup the debugger, as this is a one-time step. What is tedious and time consuming is the ability to attach a debugger to the right process at the right point in time and under the right condition. Within Android, a single service is made up by the interaction of many software entities on all layers of the SW stack. As an Android integrator, you cannot be sure where the problem is rooted. This, you can only find out during debugging. Debugging however, requires the injection of a halt at the beginning of the thread/process as you do not launch those manually. It is practically not possible to inject those halts in all potential error prone processes, as you would need to attach a debugger to all of them and step over. For sure, it is not needed for all processes as some of them already wait. It is just very annoying and complicated to debug native code in Android practically [I have not seen anybody claiming something else so far]. At the end you fall back to printf/log based debugging and guessing rather conducting a systematic analysis using a debugger.

http://www.synopsysoc.org/viewfromtop/2011/09/vp-software-debugging-myths-and-facts/comment-page-1/#comment-1506


Search, Filter, Summary

We will need to use search and filter as a way to interact with a debugger

– Imagine 10 boards, each with 10 cores, each with 100 threads running – we quickly get to 10000+ nodes in the system

– Finding your way around requires tools more like desktop search

Such filtering is necessary to use bandwidth smartly
– Passing over the state of 10000 threads on each debugger stop is not practical
– Even for a virtual platform on the same host (“infinite bandwidth”), the data passing will take noticeable time

Smart summaries of system state, to let users focus in the right place, are also needed
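The numbers in the example above can be worked through in a short sketch: 10 boards times 10 cores times 100 threads gives 10,000 nodes, and a query evaluated on the backend side returns only the handful that match. The node layout, the query and summary functions, and the synthetic thread states are assumptions for illustration.

```python
from collections import Counter
from itertools import product

# 10 boards x 10 cores x 100 threads, as in the example above.
# The states are synthetic, assigned round-robin for illustration only.
STATES = ("running", "blocked", "stopped")
nodes = [
    {"board": b, "core": c, "thread": t, "state": STATES[(b + c + t) % 3]}
    for b, c, t in product(range(10), range(10), range(100))
]


def query(nodes, **criteria):
    """Backend-side filter: only matching nodes cross the connection."""
    return [n for n in nodes if all(n[k] == v for k, v in criteria.items())]


def summary(nodes):
    """Compact overview of system state: how many threads per state."""
    return Counter(n["state"] for n in nodes)


hits = query(nodes, board=3, core=7, state="stopped")
overview = summary(nodes)

print(len(nodes))   # 10000
print(len(hits))    # 33
```

Sending the three-entry overview plus the 33 matching nodes is a very different load on the connection than sending all 10,000 thread states on every stop.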


Actual Time

In classic debug, time is logical
– Lines of code, values of loop variables, progress of processing

In system-level debug, physical time becomes important
– Did this breakpoint hit “just after” something else, or “way later”?
– Does a step of a certain thread take microseconds or seconds?
– Is this before or after something happened in another place?
– As you jump between threads, programs, and targets, just where are you in the overall system execution?
– Physical time answers a lot of these questions

Debuggers need to present current time(s) and delta(s) in time
– There is more than one time in any moderately complex system
– (search the gdb mailing lists for a discussion we had on this)

Real time is obviously really important for debugging real-time systems (in a different way)
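A small sketch of presenting time deltas per clock, assuming each target reports its stop time as an integer number of nanoseconds on its own clock. The StopLog class, the clock names, and the timestamps are hypothetical.

```python
def format_delta(ns):
    """Render a nanosecond delta in a unit a human can judge at a glance."""
    for unit, scale in (("s", 10**9), ("ms", 10**6), ("us", 10**3)):
        if abs(ns) >= scale:
            return f"{ns / scale:.3f} {unit}"
    return f"{ns} ns"


class StopLog:
    """Remembers the previous stop time separately for every clock."""

    def __init__(self):
        self.last = {}   # clock name -> time of the previous stop, in ns

    def stop(self, clock, now_ns):
        previous = self.last.get(clock)
        self.last[clock] = now_ns
        return None if previous is None else format_delta(now_ns - previous)


log = StopLog()
first = log.stop("tuna", 1_000)        # no earlier stop on this clock
other = log.stop("cod", 5)             # a different machine, different clock
delta = log.stop("tuna", 1_501_000)    # 1.5 ms after the first tuna stop

print(delta)   # 1.500 ms
```

Keeping one history per clock avoids comparing timestamps from machines whose clocks are not related, which is one reading of "more than one time" in a system.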


Example: Actual Time


Trace to Analysis to Debug

Trace is the basis for modern debug and analysis

Trace processing finds an event
– (hiccup, delay, sudden spike in CPU usage, whatever)
– ... jump to the point in time, and the system, task, source file, and line of code where the suspicious event occurred

OS awareness needs to permeate debugging at all levels
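The trace-to-debug flow can be sketched as a scan over trace records that each carry enough context (time, machine, task, source position) to jump straight to the code. The TraceEvent fields, the spike rule, and all sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    time_us: int
    machine: str
    task: str
    source: str      # "file:line" recorded with the event
    cpu_load: float  # fraction of CPU in use at this sample


def find_spike(trace, threshold):
    """Return the first event whose CPU load jumps by more than threshold."""
    for previous, current in zip(trace, trace[1:]):
        if current.cpu_load - previous.cpu_load > threshold:
            return current
    return None


# Made-up trace: the load jumps at the third sample.
trace = [
    TraceEvent(100, "shark", "idle", "sched.c:40", 0.20),
    TraceEvent(200, "shark", "logger", "log.c:12", 0.22),
    TraceEvent(300, "shark", "netd", "net.c:88", 0.95),
    TraceEvent(400, "shark", "netd", "net.c:91", 0.96),
]

spike = find_spike(trace, threshold=0.5)
print(spike.time_us, spike.machine, spike.task, spike.source)
# 300 shark netd net.c:88
```

The task field is only available if the trace is OS-aware, which is why OS awareness has to reach down to the trace level.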

[Figure: target programs and threads.]


Script!

To solve complex problems, you need to be able to script the debugger

– A GUI is nice… but sometimes you need to program

It allows program-specific custom automatic debug
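As an illustration of program-specific automatic debug, here is a sketch of a scripted breakpoint that stops only when a sequence number skips. The ScriptedBreakpoint class and the "seq" field are hypothetical and not part of any particular debugger's scripting API.

```python
class ScriptedBreakpoint:
    """Breakpoint whose stop decision is made by a user-supplied script."""

    def __init__(self, condition):
        self.condition = condition
        self.hits = 0
        self.history = []   # program state captured at every hit

    def on_hit(self, state):
        """Record the hit and return True only if the script wants a stop."""
        self.hits += 1
        self.history.append(dict(state))
        return self.condition(state, self.history)


def lost_packet(state, history):
    """Stop only when the sequence number skips (made-up condition)."""
    return len(history) >= 2 and state["seq"] != history[-2]["seq"] + 1


bp = ScriptedBreakpoint(lost_packet)
decisions = [bp.on_hit({"seq": s}) for s in (1, 2, 3, 5)]
print(decisions)   # [False, False, False, True]
```

A condition like this depends on history across hits, which is awkward to express in a GUI dialog but takes a few lines in a script.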


Conclusions


System-Level Debug is a Challenge

Backends
– OS awareness, same frontend to many backends, more smarts

Communications to backend
– Heterogeneity, bandwidth, abstraction

User Interface
– Local actions, control over scopes, program-centric

Debugging concepts
– Time, trace-to-code, search-and-filter, scripting


Questions? Comments? Hate mail?


Backups


Multiple Connections

Another case is that several debuggers have to be used at once to debug a single system

– One debugger might not support all architectures
– Specialized debuggers for architectures like DSPs need to be used for a subset of processors

[Figure: several debuggers attached to a 32-bit PPC BE target and a 64-bit IA LE target, each running a target OS and apps.]


Multiple Connections: Who’s in Charge?

Some interesting problems
– Debugger cannot assume it knows why a target stopped
– Debugger needs to ask the target why and where it stopped

Debug protocols have to support state updates from the target to the debuggers
– Don’t use any blocking operations in the GUI; otherwise, deadlock
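A sketch of target-to-debugger state updates with a non-blocking frontend, using Python's standard queue module. The Frontend class, the broadcast helper, the event fields, and the debugger names are assumptions for illustration.

```python
import queue


class Frontend:
    """One of several debuggers; it learns about stops from target events."""

    def __init__(self, name):
        self.name = name
        self.events = queue.Queue()   # filled by the connection
        self.view = {}                # context -> (state, reason)

    def poll(self):
        """Drain pending events without ever blocking the GUI loop."""
        handled = 0
        while True:
            try:
                event = self.events.get_nowait()
            except queue.Empty:
                return handled
            self.view[event["context"]] = (event["state"], event["reason"])
            handled += 1


def broadcast(frontends, event):
    """The target pushes the same state update to every attached debugger."""
    for frontend in frontends:
        frontend.events.put(dict(event))


main_dbg = Frontend("main-debugger")
dsp_dbg = Frontend("dsp-debugger")

# The stop was caused by the other debugger; main_dbg did not request it.
broadcast([main_dbg, dsp_dbg], {
    "context": "tuna.vxworks",
    "state": "stopped",
    "reason": "breakpoint set by dsp-debugger",
})

handled = main_dbg.poll()
```

Because poll never waits, a frontend that has nothing to read simply returns to its GUI loop, and the stop reason comes from the target instead of being assumed.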
