9 Troubleshooting

Page 1: 9 Troubleshooting

ESX Server System Management II
Module 9

Troubleshooting ESX Server
• Prevention
• Likely problems
• Responding to issues

Page 2: 9 Troubleshooting

For ESX Server 2.0.1, 2003-11-17

Copyright © 2003 VMware, Inc. All rights reserved.

ESX Server troubleshooting philosophy

• Most ESX Server problems are caused by
  • Hardware problems
  • Misconfigurations
  • Inadequate planning
• An ounce of prevention
  • Aggressively validate hardware
  • Plan and review deployment
  • Develop and apply good data-center policies
• A pound of cure
  • Learn common symptoms, faults, fixes

Page 3: 9 Troubleshooting

Avoiding problems before they occur

• Aggressively validate hardware
  • Run memtest86 for 72 hours before deployment
  • Install a dummy OS on hardware
  • Check installed items against supported hardware list
    http://www.vmware.com/pdf/esx2_IO_guide.pdf
    http://www.vmware.com/pdf/esx2_SAN_guide.pdf
• Plan the deployment
  • Allocate enough resources to Service Console

• Develop datacenter policies

Page 4: 9 Troubleshooting

Service Console resource problems

Resource: Problems caused by shortage

Service Console RAM
  • Poor Remote Console performance
  • Poor MUI performance

Service Console swap
  • Randomly killed processes (especially MUI)
  • Inability to start new VMs

Service Console disk
  • Underuse of template technique
  • Full file systems, causing…
    • Inability to launch MUI
    • Incompletely written logs

Page 5: 9 Troubleshooting

Datacenter policy problems

Policy issue: Problems caused by lack of policy

Root password too widely known
  • Inappropriately timed maintenance
  • VMs get created as root
  • Root privilege used casually, worsening impact of operator error
  • Audit trail is obscured
  • Difficult to change root password

Root should not own VMs
  • Encourages casual use of root login
  • Each VM should be owned by a named individual or group

Passwords should not be shared
  • Audit trail is obscured
  • Greater odds of individuals working at cross-purposes

Page 6: 9 Troubleshooting

Installation issues

• VMkernel can only see supported devices
  • Only devices with drivers in /usr/lib/vmware/vmkmod
• Installation OS is uniprocessor, does not see IOAPIC
  • If devices are not seen, activate the APIC system manually at the boot prompt:
    boot: esx apic

• Be sure to detach external storage when doing the ESX Server install

• Some hardware models require resetting PCI slots in BIOS when cards change

Page 7: 9 Troubleshooting

When to involve VMware support

• Always let support know when…
  • The VMkernel panics (the “Purple Screen of Death”)
  • A virtual machine crashes, leaving behind a monitor core dump in its home directory
• Whenever you contact support about a VM problem
  • Find that VM’s world number in its monitor log
  • Look in the VMkernel log for references to that world number
  • Run /usr/bin/vm-support script and include the resulting file
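The world-number lookup can be sketched in shell. This is only a sketch under assumptions: the world number has already been read by hand from the VM's monitor log (this deck does not show how that entry is worded), the VMkernel log is /var/log/vmkernel, and the function name world_refs is made up for illustration. It matches the bare number, so some unrelated hits (for example inside hex values) are to be expected.

```shell
#!/bin/sh
# world_refs WORLD LOGFILE
# Print, with line numbers, every line of LOGFILE that mentions WORLD as a
# whole number, so that world 96 does not also match 196 or 960.
world_refs() {
    world=$1
    logfile=$2
    grep -n -E "(^|[^0-9])${world}([^0-9]|\$)" "$logfile"
}

# Example (assumed path and world number):
#   world_refs 96 /var/log/vmkernel
```

The matching lines can be pasted into the support request alongside the vm-support bundle.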

Page 8: 9 Troubleshooting

The Purple Screen of Death

• Displayed on ESX Server’s video monitor in the event of VMkernel panic

VMware ESX Server [Release.1.5.1$Name: build-2173 $]
SPIN count exceeded - probable deadlock
gate=0x0 cr2=0x40017000 frame=0x801bc8 cr3=0x141c200 cr4=0x6f0
eax=0 ebx=0 ecx=0 edx=0
ebp=801d40 esi=0 edi=0
CPU 0 96 console: cpu 1 93 idle1: cpu 2 94 idle2: cpu 3 95 idle3:
[0x43457e]SP_WaitLockIRQ+0xd6(0xe9d58c, 0x0, 0xe5c248)
[0x46c827]pci_request_regions+0x1af(0xe7bdf0, 0x61, 0x8f40e0)
[0x4187bf]IDTDoInterrupt+0x1af(0x61, 0x801e01, 0x1)
[0x41898a]IDT_HandleInterrupt+0x4e(0x801e28, 0x801e58, 0xe5c248)
[0x416cca]HostHandleInterrupt+0x26(0x801e28, 0x439a0038, 0xffff0038)
[0x45d8a3]HostEntry+0x83(0xeac968, 0x801e88, 0x41bb9d)
[0x472eda]scsi_try_bus_reset+0x36(0xeac968, 0xc, 0x0)
[0x471ca5]scsi_build_commandblocks+0xc71(0xe9d4f0, 0xea3010, 0xea9220)
[0x438d85]SCSIResetCommandInt+0x65(0x105, 0xea9220, 0x801f6c)
[0x438b14]SCSIExecuteCommandInt+0x48(0x105, 0xea9220, 0x801f6c)
VMK uptime: 0:02:44:27.64
Dumping VMkernel core and log ... Done.
Waiting for debugger... (world 96)
Debugger is listening on serial port ...

Page 9: 9 Troubleshooting

Most frequent types of PSODs

• Machine check exception
  • A general hardware problem
  • VMware Support can help pinpoint the failing subsystem
• NMI ECC or Parity Error
  • Specifically memory failures
  • VMware Support can help pinpoint the failing bank

Page 10: 9 Troubleshooting

In the event of PSOD

• Copy down the screen display, screen-grab it, or take a photo

• If the machine had been running in a steady state, with running VMs
  • Check for environmental factors
    • Especially room temperature
  • Check for detached external devices
• If the machine had been recently rebooted
  • Check for hardware configuration changes

Page 11: 9 Troubleshooting

The vm-support script

• Gathers all support-relevant information, bundles it for delivery to VMware Support

• To run:
  # cd {writable directory with disk space}
  # vm-support

• Attach resulting esx-date.id.tgz file to support request
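Because vm-support writes its bundle into the current directory, it can help to confirm free space first. The following is a minimal sketch, not part of the VMware tooling: the function names free_kb and run_vm_support, the /tmp directory, and the 102400 KB threshold are all assumptions chosen for illustration.

```shell
#!/bin/sh
# free_kb DIR: print the free space, in KB, of the file system holding DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# run_vm_support DIR MIN_KB: run vm-support from DIR only if at least
# MIN_KB is free there. MIN_KB is an assumed threshold, not a VMware figure.
run_vm_support() {
    dir=$1
    min_kb=$2
    avail=$(free_kb "$dir")
    if [ "$avail" -lt "$min_kb" ]; then
        echo "only ${avail} KB free in $dir, need $min_kb" >&2
        return 1
    fi
    (cd "$dir" && vm-support)
}

# Example (hypothetical directory and threshold):
#   run_vm_support /tmp 102400
```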

Page 12: 9 Troubleshooting

Key ESX Server logs

/var/log

messages vmkernel vmkwarning

Service Console errors, boot failures

VMkernel actions, including world creations

informational messages, not likely to indicate problems

/home/user/vmware/vmname

vmware.log Monitor log for this VM only

Page 13: 9 Troubleshooting

Some possible ESX Server problems

• Can’t start a VM

• Can’t connect to MUI

• Can’t connect to Remote Console

• VM bluescreens or hangs

• Remote Console performance problems

• Application performance problems

Page 14: 9 Troubleshooting

Problem: Can’t start a VM

• Possible causes:
  • Wrong permissions on virtual disks or config file
  • Virtual disks are not in a VMFS
  • Physical addresses for virtual disks may no longer be valid
    e.g., vmhba0:1:2:0 is now vmhba1:1:2:0
    • Fix: Use VMFS names!
  • Not enough memory in the system
  • Not enough unreserved VMkernel swap
  • Service Console’s hostname has no associated IP address
  • Virtual disks are corrupt or in COW format
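One of the causes above, a Service Console hostname with no associated IP address, can be checked quickly. This sketch looks only at /etc/hosts; a name that resolves through DNS alone will not be found by it. The function name hosts_ip is hypothetical.

```shell
#!/bin/sh
# hosts_ip NAME [FILE]: print the first address that FILE (default /etc/hosts)
# maps to NAME; exit status is non-zero when no entry exists.
hosts_ip() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        { for (i = 2; i <= NF; i++)
              if ($i == n) { print $1; found = 1; exit } }
        END { exit !found }
    ' "$file"
}

# Example: does the Service Console's own name have an address?
#   hosts_ip "$(hostname)" || echo "no /etc/hosts entry for $(hostname)"
```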

Page 15: 9 Troubleshooting

Problem: Can’t connect to MUI

• Possible causes:
  • Loss of IP connectivity
  • Wrong DNS name or IP address for ESX Server
  • Service Console root file system may be full
    • Use df -k to check
  • Service Console may have run out of swap
    • Linux may kill processes if this happens
    • To check for presence of MUI server:
      ps -ef | grep httpd
    • To manually restart MUI server:
      /etc/rc.d/rc3.d/S91httpd.vmware start
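The two command-line checks above can be combined into a small sketch. The helper names use_percent and has_httpd and the 95 percent threshold are assumptions for illustration; the df, ps, and restart commands are the ones named on this slide.

```shell
#!/bin/sh
# use_percent: read `df -Pk` output on stdin and print the Capacity column
# of the first file system, without the percent sign.
use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# has_httpd: read `ps -ef` output on stdin; succeed if an httpd process is
# listed. The [h] bracket keeps a stray "grep httpd" line from matching.
has_httpd() {
    grep -q '[h]ttpd'
}

# Example (95 is an assumed threshold):
#   [ "$(df -Pk / | use_percent)" -ge 95 ] && echo "root file system nearly full"
#   ps -ef | has_httpd || /etc/rc.d/rc3.d/S91httpd.vmware start
```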

Page 16: 9 Troubleshooting

Problem: Can’t connect to Remote Console

• Possible causes:
  • Loss of IP connectivity
  • Wrong DNS name or IP address for ESX Server
  • NIC duplex or speed mismatch with Ethernet switch
  • Service Console root file system may be full
    • Use df -k to check
  • Remote Console may be running on a non-default port number
    • Check /etc/xinetd.d/vmware-authd for port number
    • Remember to specify port number in client if not 902
      esx.company.com 8092 /home/fred/vmware/a/a.cfg
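Reading the port out of the xinetd file can be scripted. This sketch assumes the file sets the port with xinetd's standard `port = N` attribute; the function name authd_port is hypothetical, and the actual contents of vmware-authd on a given host may differ.

```shell
#!/bin/sh
# authd_port [FILE]: print the value of the "port" attribute in an xinetd
# service file (default: /etc/xinetd.d/vmware-authd).
authd_port() {
    file=${1:-/etc/xinetd.d/vmware-authd}
    sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$file"
}

# Example:
#   port=$(authd_port)
#   [ "$port" = 902 ] || echo "Remote Console listens on port $port; give it to the client"
```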

Page 17: 9 Troubleshooting

Problem: VM bluescreens or hangs

• Troubleshoot the issue just as on physical hardware

• Possible causes:
  • Problems with applications running in the guest OS
  • Hardware problems in the ESX Server
  • Bugs found in the VMkernel
    • VMware technical support will post a patch

Page 18: 9 Troubleshooting

Remote Console performance problems

• Possible causes:
  • Service Console may be swapping
  • NIC duplex or speed mismatch with Ethernet switch
  • Bit depth of virtual machine may need to be lower for this network path
    • Try to operate Linux virtual machines without graphics
  • Windows 2000 media detection feature
    • Start the VM with the Virtual CD-ROM disconnected
  • User expectations
    • Remote Console is not a replacement for Windows Terminal Services or Citrix MetaFrame
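For the first cause, a rough indicator of Service Console swap pressure can be read from /proc/meminfo. This is a sketch under the assumption that the file carries the usual Linux SwapTotal and SwapFree lines; the function name swap_used_kb is hypothetical. Swap in use does not prove active swapping, so treat a large value only as a hint to look closer.

```shell
#!/bin/sh
# swap_used_kb [FILE]: print swap in use, in KB, computed as
# SwapTotal - SwapFree from a /proc/meminfo-style file.
swap_used_kb() {
    awk '$1 == "SwapTotal:" { total = $2 }
         $1 == "SwapFree:"  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# Example:
#   echo "Service Console swap in use: $(swap_used_kb) KB"
```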

Page 19: 9 Troubleshooting

Application performance problems

• Distinguish between user perception and actual performance issues
  • A machine can “feel slow” interactively while still delivering good transactions per second
• Possible causes
  • Name resolution issues
  • vmnic duplex or speed mismatch with Ethernet switch
  • The VM may need more of a limiting resource
    • CPU, memory, or disk bandwidth

Page 20: 9 Troubleshooting

Most frequent ESX Server support issues

• Diagnosing failures due to hardware
• Questions related to SANs and HBA failover
  • Supported configurations, how-to
• Configuring speed and duplex settings for NICs
• Assessing performance of virtual machines
• Allocating devices to Service Console and VMkernel
  • vmkpcidivy
• Linux questions and issues not specific to ESX Server

Page 21: 9 Troubleshooting

ESX Server System Management II
Module 9

Questions?