X64 Workshop Linux Information Gathering
Transcript of X64 Workshop Linux Information Gathering
1. X64 Workshop: Linux Information Gathering
- Thorsten Kellermann
- Sun Microsystems
2. Agenda
- Linux Support Overview
  - Support model and structure
- Data Collection
  - Red Hat sysreport
  - SuSE siga / config.sh
  - Linux Explorer
- Data Analysis
3. Agenda (cont)
- Advanced Troubleshooting
  - System Core Dump Capturing
  - Linux SysRq
  - Hanging System
- Linux Analysis
  - CDA
4. Linux Support Overview
- Linux support from the System TSC organization:
  - EMEA: System TSC VSP
    - coverage from 9am - 5pm
  - AMER/APAC: System TSC OS
5. Linux Support Overview (cont)
- Supported Linux versions:
  - Red Hat Enterprise Linux (RHEL)
    - Versions 3 and 4
    - AS, WS, ES, Desktop
    - Existing contracts only; no new contracts after 30.09.2006.
  - Novell/SuSE Linux Enterprise Server (SLES)
    - Versions 8, 9 and 10
- Back-line support from the vendor is available
  - We have a path to escalate issues to Red Hat or Novell/SuSE.
6. Linux Support Overview (cont)
- What is covered by support?
  - Bugs within the OS or in core applications
- What is not covered?
  - Configuration of the system
  - How-to questions
  - 3rd-party applications
- Other limitations
  - Self-compiled kernels, tainted modules
  - Sun does not fix bugs within any distribution; that is up to the vendor.
7. Data Collection
- Entitlement information
  - We need the entitlement for the Linux distribution the customer has installed.
- General thoughts about data collection
  - The issue must be visible within the data.
  - Data must be current.
    - Has anything changed on the system? Collect new data!
  - And it must be understandable.
    - If not, try SGRT.
8. Data Collection (cont)
- Examples:
  - Customer has a working and a non-working system.
    - Collect data from both systems.
  - Customer has changed the configuration on the advice of Sun Support, but it doesn't work.
    - Collect all relevant data from the system again to see what was changed.
  - Customer applies online updates to the system, but the issue isn't fixed.
    - We need the data from the system again to see which updates were applied.
9. Data Collection (cont)
- Red Hat:
  - sysreport
    - Mandatory for escalating to Red Hat
    - File system hierarchy
    - Lacks some interesting information.
- SuSE:
  - siga
    - Insufficient messages etc.
  - config.sh (preferred)
    - Collects much more information than siga.
    - Encapsulates the siga report
10. Data Collection (cont)
- Others
  - Linux Explorer
    - Most complete data collection
    - Not a Sun tool
    - We are in discussion with SuSE about also accepting this data set instead of siga/config.sh
    - Not accepted by Red Hat for escalation
11. Data Analysis
- There is no automatic tool!
- This presentation is by no means complete.
- Determine the Linux version:
  - uname -a
  - /etc/*release*
- Looking up messages:
  - messages
  - dmesg
  - boot.log
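The version check above can be sketched as a small shell helper. This is only an illustration: the function name show_release and its optional root-directory argument (for pointing at an unpacked data collection instead of the live system) are assumptions, not part of the original slides.

```shell
# Hypothetical helper (not from the slides): print every release file
# found under <root>/etc. With no argument it inspects the live system.
show_release() {
    root="${1:-}"
    status=1                      # stays 1 if no release file is found
    for f in "$root"/etc/*release*; do
        [ -f "$f" ] || continue   # skip the literal pattern when nothing matches
        echo "== $f"
        cat "$f"
        status=0
    done
    return "$status"
}

# Example on a live system:
#   uname -a
#   show_release
#   dmesg | tail -n 50
```

The non-zero return value when no release file exists makes it easy to spot an incomplete data collection in a script.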
12. Data Analysis (cont)
- Which packages are installed? Which versions?
  - RPM is the package manager of RHEL and SLES.
    - rpm -qa
    - rpm -qaV (takes some time)
- SAR report
  - Looking at the sar data (package sysstat) shows the system load at the time an issue occurred.
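When data exists from both a working and a non-working system, comparing the two package lists is often a quick first step. The sketch below assumes the output of "rpm -qa" was saved to one text file per system; the function name pkg_diff and the file names in the example are hypothetical.

```shell
# Hypothetical helper: show packages that differ between two saved
# "rpm -qa" listings. Lines only in the first file start at column 1,
# lines only in the second file are indented by one tab.
pkg_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    # comm needs sorted input; use the same locale for sort and comm.
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    LC_ALL=C comm -3 "$a" "$b"    # -3 hides lines common to both files
    rm -f "$a" "$b"
}

# Example:
#   rpm -qa > good-system.txt     (on the working system)
#   rpm -qa > bad-system.txt      (on the failing system)
#   pkg_diff good-system.txt bad-system.txt
```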
13. Data Analysis (cont)
- Hardware/Firmware information
  - lspci [-v[v[v[v[v]]]]]
  - lsusb
  - dmidecode
    - not part of sysreport!
    - hardware.py
      - Python script wrapping dmidecode (RH only, may be included in sysreport)
  - dmesg or /proc related
    - e.g. firmware of SCSI disk in /proc/scsi/scsi
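Because dmidecode output is not part of sysreport, it can help to collect the hardware queries separately. The following is a minimal sketch; the function name collect_hw and the output directory layout are assumptions made for illustration.

```shell
# Hypothetical helper: run each hardware query that is available and
# save its output as one text file per tool under the given directory.
collect_hw() {
    out="${1:-hwinfo}"
    mkdir -p "$out" || return 1
    for cmd in lspci lsusb dmidecode; do
        if command -v "$cmd" >/dev/null 2>&1; then
            # dmidecode usually needs root; keep going if a tool fails.
            "$cmd" > "$out/$cmd.txt" 2>&1 || true
        else
            echo "$cmd: not installed" > "$out/$cmd.txt"
        fi
    done
    # SCSI device details (including firmware revision), if exposed.
    if [ -r /proc/scsi/scsi ]; then
        cat /proc/scsi/scsi > "$out/proc_scsi.txt"
    fi
    return 0
}

# Example:
#   collect_hw /tmp/hwinfo && tar czf hwinfo.tar.gz -C /tmp hwinfo
```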
14. Data Analysis (cont)
- Overview
15. System Core Dump Capturing
- No standard at the moment
  - Kdump has found its way into the mainline kernel.
- RHEL 3 / 4 use their own mechanisms
  - netdump (preferred)
  - diskdump
- RHEL 5 uses Kdump
  - Its own resident kernel with a small footprint
  - Highly flexible and reliable
16. System Core Dump Capturing (cont)
- SLES 8 / 9 use LKCD
  - Based on an IBM/SGI implementation.
- SLES 10 uses kdump
  - Its own resident kernel with a small footprint
  - Highly flexible and reliable
17. Setting up RHEL 3 & 4 Netdump
- Install Netdump server
  - install package netdump-server
  - normally no configuration needed
  - start service
- Install Netdump client
  - install package netdump-client
  - configure /etc/sysconfig/netdump
  - "service netdump propagate"
  - start service
18. Setting up RHEL 5 Kdump
- Installed by default
- Configuration dialog
  - enable / disable kdump
  - configure dump locations
    - local: file
    - net: nfs / ssh
    - partitions: ext2 / ext3 / raw
- Quite easy to set up with the GUI dialog
19. Setting up SLES 8/9 LKCD
- Install required package:
  - lkcdutils
- Edit /etc/sysconfig/dump
- Write configuration
  - # lkcd config
- Activate service
  - # insserv /etc/init.d/boot.lkcd
20. Setting up SLES 10 Kdump
- Install needed packages:
  - kexec-tools
  - kernel-kdump
  - kernel-*-debuginfo
- Edit /etc/sysconfig/kdump
- Enable kdump init service
  - via YaST runlevel editor
  - "chkconfig kdump on"
- Add boot option "crashkernel=64M@16M"
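After the reboot it is worth confirming that the crashkernel option actually reached the kernel. A minimal sketch, assuming the running kernel's command line is read from /proc/cmdline; the helper name boot_param is hypothetical.

```shell
# Hypothetical helper: print the value of a kernel boot parameter.
# $1 = parameter name, $2 = command line file (default /proc/cmdline).
# Returns non-zero if the parameter is absent.
boot_param() {
    value=$(tr ' ' '\n' < "${2:-/proc/cmdline}" | sed -n "s/^$1=//p" | head -n 1)
    [ -n "$value" ] && echo "$value"
}

# Example:
#   boot_param crashkernel || echo "crashkernel= is not set; kdump cannot load"
```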
21. Checking dump setup
- Check that everything fits together:
  - Enable the Magic SysRq feature temporarily
    - echo "1" > /proc/sys/kernel/sysrq
  - Force the system to dump (this crashes the running system)
    - echo "c" > /proc/sysrq-trigger
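Since the test crashes the machine on purpose, a wrapper with an explicit confirmation flag can prevent accidents. This is a sketch only: the function name force_dump, the confirmation flag, and the SYSRQ_CTL / SYSRQ_TRIGGER override variables (useful for dry runs against ordinary files) are all assumptions.

```shell
# Hypothetical wrapper around the two echo commands above.
# It refuses to act unless called with an explicit confirmation flag.
force_dump() {
    sysrq="${SYSRQ_CTL:-/proc/sys/kernel/sysrq}"
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    if [ "${1:-}" != "--yes-crash-this-system" ]; then
        echo "refusing: pass --yes-crash-this-system to crash the machine" >&2
        return 1
    fi
    echo 1 > "$sysrq" || return 1   # enable Magic SysRq until next reboot
    sync                            # flush file system buffers first
    echo c > "$trigger"             # trigger the crash and the dump
}

# Example (as root, on a test system only):
#   force_dump --yes-crash-this-system
```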
22. Linux SysRq Feature
- The Magic SysRq feature is somewhat similar to Stop-A on Solaris.
- It can force the kernel to print out or dump information about the system.
- Sometimes really helpful for troubleshooting
- May even work if the system seems to hang
23. Linux SysRq Feature (cont)
- Disabled by default; needs to be enabled
  - temporarily, until the next reboot
    - echo "1" > /proc/sys/kernel/sysrq
  - permanently
    - edit /etc/sysctl.conf to add the line: kernel.sysrq = 1
- Issue locally on the keyboard with Alt+SysRq+<key>
- Issue remotely with "echo <key> > /proc/sysrq-trigger"
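The permanent setting can be applied idempotently, so that running the step twice does not leave duplicate lines in the file. A minimal sketch; the function name enable_sysrq_persistent is hypothetical, and the optional argument exists only so the function can be tried on a copy of the file.

```shell
# Hypothetical helper: make sure the sysctl configuration contains
# exactly "kernel.sysrq = 1" for the kernel.sysrq key.
enable_sysrq_persistent() {
    conf="${1:-/etc/sysctl.conf}"
    pattern='^[[:space:]]*kernel\.sysrq[[:space:]]*='
    if grep -q "$pattern" "$conf" 2>/dev/null; then
        # Rewrite the existing entry; write back in place to keep
        # the file's ownership and mode.
        sed "s/$pattern.*/kernel.sysrq = 1/" "$conf" > "$conf.new" &&
            cat "$conf.new" > "$conf" && rm -f "$conf.new"
    else
        echo 'kernel.sysrq = 1' >> "$conf"
    fi
}

# Example (as root):
#   enable_sysrq_persistent && sysctl -p
```

Commented-out entries are deliberately left alone; in that case a new active line is appended.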
24. Linux SysRq Feature (cont)
- Some hotkeys:
  - k
    - Calls the Secure Attention Key (SAK) function. SAK terminates every process running on the current console to clean up the terminal.
  - s
    - Synchronizes all hard disks.
  - u
    - Remounts all hard disks read-only. This helps prevent data loss when the system is in an unstable state.
  - t
    - Shows the current task list.
25. Linux SysRq Feature (cont)
- Some hotkeys (cont):
  - b
    - Reboots the system immediately. You should synchronize the hard disks and remount them read-only before restarting the system.
  - p
    - Prints the current register contents.
  - m
    - Prints memory information.
- For a complete list, look up sysrq.txt in the kernel documentation.
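The advice to sync and remount read-only before rebooting can be scripted for remote use. The sketch below is an assumption-laden illustration: the function names, the SYSRQ_TRIGGER override (for dry runs against a plain file) and the SYSRQ_DELAY pause between steps are not from the slides.

```shell
# Hypothetical helper: send one SysRq command key to the trigger file.
sysrq_send() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    case "${1:-}" in
        [a-z0-9]) echo "$1" > "$trigger" ;;
        *) echo "usage: sysrq_send <single command key>" >&2; return 2 ;;
    esac
}

# Hypothetical sequence: sync, remount read-only, then reboot.
safe_reboot() {
    delay="${SYSRQ_DELAY:-2}"      # seconds to wait between steps
    sysrq_send s || return 1       # synchronize disks
    sleep "$delay"
    sysrq_send u || return 1       # remount read-only
    sleep "$delay"
    sysrq_send b                   # reboot immediately
}

# Example (as root, SysRq enabled):
#   sysrq_send t      # dump the task list to the kernel log
#   safe_reboot
```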
26. Crash Dump Analysis
- Crash utility
  - Supports various dump formats
    - Kdump, LKCD, Net/Disk dump
  - Integrated GDB
  - Can examine the kernel of a live system
- http://people.redhat.com/~anderson/
27. Crash Dump Analysis (cont)
- You need the debug information for the kernel.
- The crash package needs to be installed.
- Load the vmcore for analysis:
  - crash System.map vmlinux vmcore
    - dmesg
    - ps list
    - stack traces
    - etc.
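A small wrapper can catch the most common mistake, a missing or unreadable input file, before crash is started. This is a sketch; the function name run_crash is hypothetical. Inside the crash session, the commands log, ps and bt give the kernel message buffer, the process list and a stack trace respectively.

```shell
# Hypothetical wrapper: verify all input files before starting crash.
run_crash() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            return 1
        fi
    done
    if ! command -v crash >/dev/null 2>&1; then
        echo "crash is not installed" >&2
        return 127
    fi
    crash "$@"
}

# Example, using the file names from the slide:
#   run_crash System.map vmlinux vmcore
```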
28. Troubleshoot a Hanging System
- Hard to troubleshoot due to lack of information
- If the kernel is deadlocked, the NMI watchdog may help
  - Add the kernel boot option nmi_watchdog=1 to the GRUB configuration.
  - When a system lockup is detected, a kernel panic is initiated.
- There might be a chance to force a dump (SysRq) while the system is hanging
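One way to check whether the NMI watchdog is actually ticking is to look at the NMI line in /proc/interrupts: with a working watchdog the counters there are expected to grow over time. The helper below is a sketch under that assumption; the name nmi_count is hypothetical.

```shell
# Hypothetical helper: print the total NMI count over all CPUs.
# $1 = interrupts file (default /proc/interrupts).
nmi_count() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) sum += $i   # skip the text label
             print sum + 0
         }' "${1:-/proc/interrupts}"
}

# Example: run it twice a few seconds apart and compare the numbers.
#   nmi_count; sleep 10; nmi_count
```

If the count stays at zero after booting with nmi_watchdog=1, the watchdog is probably not active on that hardware.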
29. Links
- External sources:
  - Linux Explorer: http://www.unix-consultants.co.uk/examples/scripts/linux/linux-explorer/
  - LKCD setup on SLES: http://www.novell.com/coolsolutions/feature/15284.html
  - Crash utility: http://people.redhat.com/~anderson/
- Internal sources:
  - System TSC Linux pages: http://systems-tsc/twiki/bin/view/Teams/LinuxDataGathering
  - PTS Linux pages (outdated): http://barentz.germany.sun.com/ptsvs/Wiki.jsp?page=LinuxHowTos
30. Links
- Did you know http://www.google.com/linux?
31. X64 Workshop: Linux Information Gathering
- Thorsten Kellermann
  - [email_address]