Post on 29-Dec-2015
1
THE SCO GROUP 2007
© The SCO Group, Inc. All Rights Reserved
SCO Unix Diagnostics and Troubleshooting
Alexander Sack (alexs@sco.com), Senior Software Engineer
2
Agenda
Intro
Initial System Load (ISL)
Common Hardware and Driver Issues
System Tuning
Networking Tips
Reporting Problems
Q & A
3
ISL: Overview
Before installing…
  Has the system itself been certified by the OEM?
  Is the motherboard in the CHWP? (Intel whitebox)
  Is it compatible kinda sorta maybe?
  Do I need a third-party HBA diskette?
  Is the network card supported?
  Does X support my graphics chipset?
  Disk layout issues, multi-boot?
4
ISL: Debugging
“Alt-SysReq-H” or “Alt-Ctrl-H” to enter console mode
“Alt-SysReq-F1” or “Alt-Ctrl-F1” to go back to install screens
Access to resmgr and the ISL scripts (/isl/ui_modules); note any console messages during install
IVAR_DEBUG_ALL=1
  Dumps log files in /tmp/log
  Transfer logs to floppy via cpio, e.g.
    find /tmp/log/* | cpio -oc -O /dev/dsk/f03ht
    cpio -ic -I /dev/dsk/f03ht
5
ISL: Issues
Problem: Installation sees more processors than actually present
Reasons:
  Bad MPS tables
  Cores listed as physical CPUs in BIOS
  Limited ACPI support (OSR5 only)
Solution:
  Boot in single processor mode (ATUP) and apply latest MP/SMP pack
  ACPI=Y, USE_XAPIC=Y, ENABLE_JT=Y, MULTICORE=N
  Flash BIOS
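Before changing the parameters above, it can help to see which of them are already set. The following is a minimal sketch, assuming the parameters live in a file of NAME=value lines (the deck later mentions /stand/boot); the function name bootparam_report is hypothetical.

```shell
#!/bin/sh
# Report the current value of each named boot parameter in a NAME=value file.
# Assumption: one NAME=value pair per line, as in /stand/boot.
bootparam_report() {
    file=$1
    shift
    for name in "$@"; do
        # If a name is repeated, show its last occurrence in the file.
        value=$(sed -n "s/^${name}=//p" "$file" | tail -1)
        echo "${name}=${value:-<unset>}"
    done
}

# Example: bootparam_report /stand/boot ACPI USE_XAPIC ENABLE_JT MULTICORE
```

The function only reads the file; any change should still be made with the system's own tools and noted for support.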
6
ISL: Issues
Problem: Kernel hangs on boot-up
Reasons:
  Missing interrupts
  Mixed stepping processors
Solution:
  Boot in single processor mode (ATUP)
  Swap mixed-stepping processors so that the LOWER stepping processor is in slot 1
  Check BIOS settings, ACPI vs. MPS
  Move add-on PCI card to a different slot
  Set PnP to OFF in BIOS
7
ISL: Issues
Problem: Cannot load an HBA from USB floppy
Reasons:
  BIOS does not support legacy mode (OSR5 only)
  “Device enumeration timeout”
  USB is disabled in the BIOS
  ISL CD left in tray
Solution:
  Check USB BIOS settings
  Re-plug USB floppy device, verify sdiconfig output on console
  Follow TA article on renaming disk nodes
  Remove CD before load
  Make sure disk was created correctly, dd image to p0 not s0
  Try a different USB floppy device
8
ISL: Issues
Problem: Root HBA not found after the DCU runs
Reasons:
  Didn’t load the right third-party HBA
  Software based RAID issues
  Valid media kit USB floppy wasn’t really picked up (ISL will use CD1 for HBA drivers from an ATAPI drive)
Solution:
  Disconnect USB floppy after HBA loads
  Bind third-party resmgr entry to HBA driver manually via DCU
  Check resmgr entry BOARDID and verify that HBA really supports the card
  Download a later driver from IHV website
9
ISL: Issues
Problem: SATA or IDE hangs after loading or fails to recognize my devices
Reasons:
  Missed interrupts (polling messages)
  DMA incompatibility
  Driver in slave only configuration (OSR6/UW7)
  SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
Solution:
  Check cables and jumpers
  Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
  ATAPI_DMA_DISABLE=Y
  Avoid cable select (legacy PATA)
10
ISL: Issues
Problem: Red screen during mount of CD
Reasons:
  Missed interrupts (polling messages)
  DMA incompatibility
  Driver in slave only configuration (OSR6/UW7)
  SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
Solution:
  Check cables and jumpers
  Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
  ATAPI_DMA_DISABLE=Y
  Avoid cable select (legacy PATA)
11
ISL: Issues
Problem: NIC is not auto-detected
Reasons:
  Driver on ISL media is older than card
  Driver issues with card, driver loads but fails
Solution:
  Defer networking and pkgadd drivers after install
  After install, use SCOadmin Network to configure card
  Bind entry to particular NIC driver via DCU if card is within the same family
  Stick in another card!
12
ISL: Issues
Problem: vfs_mountroot() failure
Reasons:
  Driver on ISL media is older than card
  Driver issues with card, driver loads but fails
  “$static” not added to ROOT HBA sdevice file
Solution:
  Follow TA to mount disk from ISL
  Use the RECUT media
  Make sure you are using the latest HBA driver
13
ISL: Issues
Problem: Screen goes blank after logo appears
Reasons:
  VESA mode is not supported by card
  On-board chipset uses system memory for framebuffer
Solution:
  AGP GART is now supported, install latest maintenance pack
  USE_VESA_BIOS=Y
  Use a supported graphics chipset!
14
ISL: Issues
Problem: Filesystem is left dirty after ISL and every reboot
Reasons:
  Aggressive BIOS Power Management
  RAID battery failure
  Target issues: CHECK CONDITIONS
  Older driver and the write cache
Solution:
  Check RAID battery levels
  Check HBA and target firmware revision
  Update to latest driver
15
ISL: Issues
Problem: Installed one OS and another one won’t boot
Reasons:
  OSR5 8GB limit
  UW7/OSR6 128GB limit
  OSR5 on the first partition of a drive is recommended
  MBR rewritten
Solution:
  Use CD1 to boot up and execute fdisk to rewrite MBR from UW7/OSR6 fdisk
  Use a third-party boot loader like GRUB
16
ISL: Issues
Problem: Failing to create large logical volumes
Reasons:
  VXFS technical 2TB limit
  OSR6/UW7 1TB physical capacity limit
  HTFS has issues with filesystems greater than 1TB (slow)
  RAID utility issues
Solution:
  Use VXFS and ODM
  Split volumes into 1TB chunks
  Use RAID BIOS or OEM utility if possible to always set up volumes
17
ISL: Issues
Problem: ISL load time is very slow
Reasons:
  ATAPI DMA is disabled
  Write caching is disabled
  Media errors
  Faulty hardware
Solution:
  Check IDE/SATA settings
  Some OEMs disable write caching, which makes install slow (future boot parameter)
  Check hardware and BIOS settings
18
ISL: Issues
Problem: Kernel link failure at end of ISL
Reasons:
  IRQ conflicts in System driver file
  Driver configuration build error
Solution:
  Check BIOS settings
  Disable serial or legacy devices you don’t need
  Chroot into fresh install and check build files
  Update HBA drivers if available
19
ISL: Issues
Problem: Kernel panics on boot-up
Reasons:
  Full moon out
  You weren’t nice to the machine that day
  The customer is out to get you
Solution:
  Boot in single processor mode
  Disable USB via boot parameter or BIOS
  Take note if possible of the stack trace to discern error
  Cry to the OEM
  Cry to SCO support
20
Hardware and Driver Issues: Disk migration
Migrating OSR5 disk to OSR6
  Install wd supplement before migration!
  Administer the disk at the source system FIRST before migration
  OSR6 Divvy now works on OSR5 (wd) and OSR6 disks
Limitations:
  There is no conversion for UW VTOC disks to dual format OSR6
  OSR6 does not support extended VTOC slices
  Always back up your data before migration!
21
Hardware and Driver Issues: Multi-core
All Intel based processors are multi-core!
ACPI is required to fully support multi-core (OSR6/UW7)
OSR5 supports multi-core provided MPS tables are sane; it has some ACPI support (HT)
OEMs have stopped testing MPS tables!
SCO licenses per CPU package, not per core (industry standard)
Mixed-stepping headaches
22
Hardware and Driver Issues: HBAs
What driver to use?
  If in doubt, always use the driver diskette with the higher IHVVERSION in it!
  Supported cards can be found in the Drvmap files of the HBA driver/btld package
    http://pciids.sourceforge.net/
  Sometimes adding an OEM branded BOARDID will work, sometimes it will panic your system!
    “echo pcilong | ndcfg”
Management utilities are packaged with the driver if available
Recut media and maintenance packs include latest drivers
Read the README posted on the SCO download area!
23
System Tuning: General
Migrating from OSR5 to OSR6
  DO NOT BLINDLY import OSR5 tunables into OSR6
  E.g. buffer cache has different use on OSR6
Identify the performance problem you are trying to solve first! [ GOLDEN RULE ]
Take measurements
/etc/conf/bin/idtune
  SCOadmin has a wrapper for idtune
24
System Tuning: Performance
Performance Tuning
  Identify bottleneck
    rtpm, prfstat, sar, prof, lprof
  CPU performance
    sar -u
      00:00:00 %usr %sys %wio %idle %intr
      00:00:01   30   10   10    46     4
    high usr: investigate with truss, prof
    high sys, intr: investigate with prfstat
    high wio: storage throughput
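The triage rules above can be sketched as a small filter over sar -u output. The column order follows the sample on this slide; the thresholds and the function name classify_sar_u are illustrative assumptions, not SCO guidance.

```shell
#!/bin/sh
# Read `sar -u` lines on stdin and print a hint for each sample line.
# Columns (from the slide): time %usr %sys %wio %idle %intr.
# Thresholds are assumptions chosen for illustration; tune them for your workload.
classify_sar_u() {
    awk -v usr_max=50 -v sys_max=30 -v wio_max=20 '
        $2 ~ /^[0-9.]+$/ {              # skip the header line
            hint = ""
            if ($2 > usr_max)      hint = hint " high-usr:truss,prof"
            if ($3 + $6 > sys_max) hint = hint " high-sys/intr:prfstat"
            if ($4 > wio_max)      hint = hint " high-wio:storage"
            if (hint == "")        hint = " no-obvious-bottleneck"
            print $1 hint
        }'
}

# Example: sar -u 1 5 | classify_sar_u
```

With the sample line from the slide (30 10 10 46 4), none of the assumed thresholds is exceeded, so no bottleneck is flagged.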
25
System Tuning: Simple Example
28
System Tuning: Storage
Storage Performance
  Hardware configuration
    Device topology
      don’t connect slow devices and fast devices on the same bus
      e.g. put your slow tape drive on a separate controller
    Cabling
      ensure your cables are up to specifications
    Hardware RAID
      performance RAID 0 vs integrity RAID 1, RAID 5
  Filesystem tuning
    fsadm, block size, increase logsize (@ mkfs only)
    mount options; tmplog
  ODM
    dramatic performance boost for $99
29
System Tuning: Memory
Memory
  Avoid swapping
  DEDICATED_MEMORY, use if using shared memory
    mkdev dedicated
    Dedicated memory reserves physical
    Saves kernel virtual
    Reduces paging, uses large mappings (PSE)
  SEGKMEM_PSE_BYTES
  Add more memory!
30
System Tuning: Filesystem
Tuning for largefile support
  HDATLIM, SDATLIM, HVMMLIM, SVMMLIM, HFSZLIM, SFSZLIM set to 0x7fffffff (unlimited)
  /etc/conf/bin/idbuild -B && init 6
  fsadm /mountpoint or raw device
    fsadm -o largefiles /
  OSR6 defaults to largefiles, UW7 does not
Building large file aware applications
  -D_FILE_OFFSET_BITS=64
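The large-file tuning steps can be written out as a reviewable plan before anything is changed. This is a sketch that only prints the commands named on this slide; the function name largefile_plan is hypothetical, and the name-then-value argument order for idtune is assumed.

```shell
#!/bin/sh
# Print (do not run) the large-file tuning commands from the slide.
# Review the output, then run it as root on the target system.
largefile_plan() {
    for t in HDATLIM SDATLIM HVMMLIM SVMMLIM HFSZLIM SFSZLIM; do
        echo "/etc/conf/bin/idtune $t 0x7fffffff"
    done
    echo "/etc/conf/bin/idbuild -B && init 6"
    # The filesystem flag is set once the rebuilt kernel is running.
    echo "# after reboot: fsadm -o largefiles /"
}

# Example: largefile_plan > /tmp/largefile.plan
```

Printing rather than executing keeps the sketch safe to try on any machine, since a kernel rebuild and reboot should be a deliberate step.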
31
Networking Tips: Configuration
Network configuration
  netconfig
    drivers installed in /etc/inst/nd/
    bcfg files are parsed by ndcfg
    /etc/confnet.d/inet/interface is configured at boot
    /etc/tcp (cf. S69inet on UW) is run to link the driver into dlpi - initialize -U
    STREAMS based network stack
  ndcfg
    useful for displaying info about the system
    geared toward network device driver writers
32
Networking Tips: Tuning and Tools
Network monitoring & tuning tools
  netstat, ifconfig, inconfig, ndstat, ndcfg, traceroute, ping, tcpdump
dlpid logging
  dlpid -l <logfile> /etc/inst/nd/dlpidPIPE
  or edit /etc/default/dlpid
    LOG=<logfile>
NIC failover
  automatically and transparently switch to a backup NIC in the event of failure of the primary
  Chains of backup NICs supported
33
Networking Tips: Common Issues
Network is UP but can’t connect to other systems
  is DNS configured correctly?
  netstat -rna
    do you have a default route?
Network performance is poor
  check cabling
  ndstat -l
    collisions
  inconfig
  nfsstat
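The default-route question can be answered mechanically from the routing table text. This is a minimal sketch, assuming the default route shows "default" (or 0.0.0.0) in the first column, as in common netstat -rn layouts; adjust the pattern if your system prints it differently. The function name has_default_route is hypothetical.

```shell
#!/bin/sh
# Exit 0 if the routing table on stdin contains a default route, 1 otherwise.
# Assumption: the destination is the first column and reads "default" or "0.0.0.0".
has_default_route() {
    awk '$1 == "default" || $1 == "0.0.0.0" { found = 1 }
         END { exit !found }'
}

# Example: netstat -rn | has_default_route || echo "no default route"
```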
34
Networking Tips: Common Issues
Network responds to pings but can’t log in
  are the daemons running? licensed?
Multiple hosts with the same IP or MAC
  arp -an (-n disables name resolution)
    ? (132.147.103.1) at xx:xx:xx:xx:xx:xx (802.3)
    ? (132.147.103.9) at xx:xx:xx:xx:xx:xx (802.3)
Stopping and starting the interface
  ifconfig net0 down
  /etc/tcp stop: daemons stopped, NIC is UP
  /etc/tcp shutdown: everything down
  /etc/nd stop, /etc/nd start
35
Reporting Problems
crash
  Primarily used for panic analysis
    /var/spool/dump
    dumpmemory to generate a crash dump on a live system
    crash -a <dumpfile> will produce a listing suitable for SCO support
    provide dumpfile, /stand/unix, all of /etc/conf/mod.d, /usr/sbin/crash
  Useful crash commands
    ps, as, trace, u, eng, od, addstruct, help
    walk data structures using od
      od -f
    ksh style history buffer
lsof, can save hours of fun on a live system
36
Reporting Problems
When reporting problems to support:
  Establish a reproducible case (if possible)
  Save any crash related files
    Note stack trace, crash -a
  Save system log files
    /var/adm/
  Include hardware specs when filing a bug
    run sysinfo
  Be aware of changes made to /stand/boot
    bootparam
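The checklist above can be turned into a small collection step so that nothing is forgotten when a case is opened. This is a sketch only: the function name collect_report and the destination directory are hypothetical, and the paths passed in are the ones named on this slide.

```shell
#!/bin/sh
# Copy each existing path into a report directory and say what was found.
# Usage: collect_report DEST PATH...
collect_report() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for path in "$@"; do
        if [ -e "$path" ]; then
            cp -R "$path" "$dest"/ && echo "saved $path"
        else
            echo "missing $path"
        fi
    done
}

# Example: collect_report /tmp/report /var/adm /stand/boot
```

The output of sysinfo and crash -a can be redirected into the same directory by hand before sending it to support.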
37
Q & A