RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre...

Post on 22-Aug-2020

1 views 0 download

Transcript of RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre...

RAS Round-up From the IBM Linux Technology Centre

IBM Linux Technology Centre

Richard J. Moore C.Eng, MIEE, MIEEE, BSc

richardj_moore@uk.ibm.com

14th June 2002 (v8)

UKUUG 2002Bristol

5th July 12:40

Topics1. Dynamic Probes2. Kernel Hooks (GKHI)3. Linux Event Logging for the Enterprise4. Flexible Dump5. System Trace6. Community Adoption7. Miscellaneous8. What's Next9. The Team - Contacts

Dynamic Probes (DProbes)

1.1 Dynamic Probes - What is it?

Low-level Universal DebuggerOperates in extreme conditionsKernel/User, Interrupt/Task, Code/DataFor Service/Support Engineers on Production SystemsMonitors Low-level System ResourcesDynamic & Fully AutomatedTrigger/Enabler for:KDB, LKCD,LTT,evlog, Core Dump, Syslog, etc

1.2 Dynamic Probes - Where Used?

1.2 Dynamic Probes - Where Used?

Service/Support Engineer's FacilityLive SystemsNon-recreatable ProblemsNo source modification requiredTiming Sensitive Problems

Developer's ToolAlternative to temporary printk/printfApplication, Driver, Kernel etc.Timing Sensitive Problems

Test ToolFault Injection

1.3 Dynamic Probes - Mechanics

1.3 Dynamic Probes - Mechanics

Global Breakpoint ProbesIn-core code modificationTrack by Inode-OffsetAvoid COW page privatization using physical addressUnlimited Concurrent Probes

Global Watchpoint ProbesUses Debug RegistersTrack by Virtual Address

Pre-probe Script Driven Probe HandlerRPN assembler language interpreterHLL C-like Compiler

2 Kernel Hooks (GKHI)

Kernel Hooks (GKHI)

112

2

3

4

567

89

1011

2.1 Kernel Hooks - What are they?

Code locations where added function may be inserted

Supplement or replace standard function - subclassing

Function may not be known at build or run time

Function may load later therefore simple call cannot be used

Kernel has a particular need to implement hooks

Used by DProbes

2.2 Why not Patch?

2.2 Why not Patch?

InconvenientMultiple patches may require manual rework

InflexibleCannot select additional functions at run-time

Code BloatAdditional functions always presentObscures the prime function

2.3 Basic Requirements

2.3 Basic Requirements

Multiple hooks to co-exist within a module

Shared use of a hook by multiple exits

Sole use of a hook by a specific exit

Minimal impact to performance when a hook is unused

Exit must be able to operate as if inserted:Have access to local variablesTerminate the function

Group of exits able to insert atomically

Need a Managed Interface2.4 The Managed Interface

2.4 The Managed Interface

For Hooked Code:A HOOK macro - indicate the hook locationhook_initialise - allows use of the hookhook_terminate - disallows use of the hook

For Hook Exits:hook_register - identifies exit routine and priorityhook_arm - activates group of exitshook_disarm - deactivate group of exitshook_deregister - removes exit from interface

3 Linux Event Logging

Linux Event Loggingfor the

Enterprise(evlog)

3.1 evlog - What is it?

Comprehensive Logging CapabilityComplies with draft POSIX SRASS standardPOSIX APIsStructured Event RecordsOptionally Captures Syslog and Klog messagesLogs Binary and Text MessagesUser and Kernel SpaceTask, Init & Interrupt TimeEvent Notification - Automation, System ManagementEvent FilteringLog ManagementAfter-the-fact Formatting

3.2 evlog - Where Used

112

2

3

4

567

89

1011

3.2 evlog - Where Used

Device Driver HardeningAutomated RecoveryOn-line Diagnostic ActionSystem Management

Instrumentation SchemesWrapper macrosEase of Implementation

4 Flexible Dump

Flexible Dump

4.1 Flexible Dump - What is it?

Goals for a Comprehensive System Dump Non-disruptive - Snapshot CapableSystem and (multiple) User Context VisibilityMinimal System DependenceStand-alone CapableCustomisable Dumping - Virtual & Physical Memory Ranges, Objects, Processor Resources etc.Multiple triggers: Exception Kernel/User, API, NMI/KBD InterruptAccess to Swapped DataDump Space/Repository ManagementProgrammable formatterSMP CapableSupport for Alternative Dump Devices (Telco)

4.2 Flexible Dump - Where is it?

4.2 Flexible Dump - Where is it?

Contributions to LKCDSnapshot Dump - DProbes interfaceNon-disruptiveCustom Dump ObjectsMinimal System DependenceSMP fixes + multiple CPU status saving

Working with LKCD Community

5 System Trace

System Trace

5.1 System Trace - What is it?

Generic Trace Recording Mechanism

Community contributions to:Linux Trace Toolkit (Opersys)Dynamic Trace - DProbes interfaceFormatting exit for RAW trace data

Supporting Similar efforts in:Linux Kernel State Trace (LKST) - Hitachi

5.2 Tracing Initiatives

112

2

3

4

567

89

1011

5.2 Tracing InitiativesBuffering:

Per CPUPer ComponentZero LockingVariable Length

ControlSuspend/ResumeGlobal Activate/Deactivate

InstrumentationKernelDriversUser-space SubsystemsFine-grained Dynamic Trace

Formatting

6 Community Adoption

112

2

3

4

567

89

1011

Community Adoption

6.1 Adoption Initiatives

Establishing a RAS Community - OLS RAS BoFs

Minimise Fragmentation - Maximise Contribution

Canvassing Distributors

POSIX

Instrumentation - standards, aids, implementation

Porting & Currency

Preparation for 2.5 Kernel

7 Miscellaneous

7 Miscellaneous

KDBComplex Breakpoints - DProbes InterfaceTwo Patches Accepted

KernelDebug Register Allocation Patch (Dprobes/KDB/gdb)

8 What's Next?

8 What's Next?

Log/Trace Instrumentation of Kernel and Device DriversWe need participation from the Community

DProbes ports for IA64 and zSeriesTurbo Linux release of RAS UtilitiesSampler Probe type for ProfilingDProbes HLL CompilerDump User ContextsKDB User ContextsMission Critical mcore Integration with LKCDOn-line Diagnostics FrameworkFirst Failure System TechnologyPerformance Co-PilotRAS Community BOF at OLS 9 The Team - Contacts

9 The Team - Contacts

End of Presentation

India:

Suparna BhattacharyaS. VamsikrishnaSubodh SoniBharata B. Rao

USA:

Larry KesslerJames KenistonHaren MyneniHien Q NguyenMike SullivanMichael MasonThomas ZanussiDaniel StekloffDavid OleszkiewiczTom Hanrahan (Manager)

Mailing List: systemras-developers@lists.sourceforge.netWeb Page: http://systemras.sourceforge.net/Richard J Moore: richardj_moore@uk.ibm.com

UK:

Richard J Moore (Architect)

Trademarks

Trademarks

IBM, zSeries and S/390 are trademarks of the International Business Machines Corporation in the United States and other countries.

IA32 and IA64 are abbreviations Pentium 32-bit and Itanium 64-bit architectures of the Intel Corporation.