Carolina mini-cl-2014
Transcript of Carolina mini-cl-2014
Local Edition
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Public
Cisco UCS Troubleshooting and Best Practices
Jose Martinez, Technical Leader, Services
@jose_at_csco
Agenda
• Cisco UCS Troubleshooting
  ‒ Things all UCS admins should know and do
  ‒ Case Studies: Straight from the TAC queue
  ‒ DIY: What resources are available for you?
• Cisco UCS Best Practices
  ‒ The Basics
  ‒ Wait… You are upgrading this weekend?
  ‒ Day to Day Operations
• Miscellaneous
• Q&A
Cisco UCS Troubleshooting Basics
Cisco UCS Troubleshooting
• Most issues that affect Cisco UCS are investigated through the logs collected on the system
• Even investigations of performance or authentication issues start with the logs
• Logs are collected in every component
• Depending on the issue, multiple logs may be needed
• Collect UCSM and Chassis tech-support as soon as possible
• Current behavior is to overwrite the last log
  ‒ Changing with an upcoming release: CSCuj56943
The Basics
Cisco UCS Troubleshooting
• The size of the logs can be modified, as well as the level of logging
  ‒ Default log level is info
  ‒ Default size is 5242880 bytes (5 MB)
• Change these values under: scope monitoring → sysdebug → mgmt-logging
The Basics
Cisco UCS Troubleshooting UCSM Internal Overview
GUI CLI Standards (SNMP, IPMI)
XML API
Management Information Tree
Data Management Engine (DME)
Application Gateways (AG)
Managed Endpoints
Cisco UCS Troubleshooting Logs Example
Cisco UCS Troubleshooting
• Scope – allows the admin to enter the specified mode
  ‒ Examples: scope adapter, scope bios-settings, scope org
  ‒ Each mode has its own set of commands
  ‒ Commands allowed depend on the assigned role and locale (RBAC)
  ‒ Configuration changes are usually made in this mode (applied with commit-buffer)
• Connect – allows the admin to connect to a specific component
  ‒ Examples: connect adapter, connect iom, connect local-mgmt
  ‒ This allows better troubleshooting options
  ‒ This does not allow configuration changes
Navigating the CLI
Cisco UCS Troubleshooting
• Scope example :
Navigating the CLI
• Connect example :
Cisco UCS Troubleshooting
• connect iom only works for the directly attached IOM
• To connect to the IOM on the other side:
  ‒ connect local-mgmt on the other fabric interconnect
  ‒ connect iom <chassis #>
Navigating the CLI
[Diagram: admin session; FI-B (Primary); FI-A (Secondary)]
Cisco UCS Troubleshooting
• Best location for traffic troubleshooting
• Debug capability
• Display switch running config (non-server config)
• Access to ethanalyzer
• Typical datacenter switch commands:
  ‒ show interface brief
  ‒ show vlan
  ‒ show mac address-table vlan <vlan id>
  ‒ show port-channel summary
  ‒ show lacp neighbor
Navigating the CLI – connect nxos
Cisco UCS Troubleshooting
• connect adapter X/Y/Z
  ‒ X is chassis #
  ‒ Y is blade #
  ‒ Z is adapter #
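The X/Y/Z addressing can be captured in a small helper when building commands in scripts. This is a minimal Python sketch, not part of any Cisco tool: the `AdapterPath` class and its methods are hypothetical names, and the rule that all three numbers start at 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterPath:
    """Chassis/blade/adapter triple as used by `connect adapter X/Y/Z`."""

    chassis: int
    blade: int
    adapter: int

    @classmethod
    def parse(cls, text: str) -> "AdapterPath":
        # Expect exactly three integers separated by "/".
        parts = text.strip().split("/")
        if len(parts) != 3:
            raise ValueError(f"expected X/Y/Z, got {text!r}")
        chassis, blade, adapter = (int(p) for p in parts)
        # Assumption: chassis, blade and adapter numbering starts at 1.
        if min(chassis, blade, adapter) < 1:
            raise ValueError("chassis, blade and adapter numbers start at 1")
        return cls(chassis, blade, adapter)

    def command(self) -> str:
        # Build the CLI command string in the X/Y/Z form.
        return f"connect adapter {self.chassis}/{self.blade}/{self.adapter}"
```

For example, `AdapterPath.parse("1/3/1").command()` yields the string to type at the UCSM CLI for chassis 1, blade 3, adapter 1.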
Navigating the CLI – connect adapter
Cisco UCS Troubleshooting
• The vnic table provides logical interface (lif) IDs that can be used to collect more information via the lif and lifstats output
Navigating the CLI – connect adapter
Cisco UCS Troubleshooting
• Logs from the ASIC can be seen directly in the adapter using the show-log command
Navigating the CLI – connect adapter
Cisco UCS Troubleshooting Case Study #1
Cisco UCS Troubleshooting
• After a Cisco UCS upgrade, fibre channel performance was severely degraded
• Many ABTS (Abort Sequence) frames were observed in the VIC adapter logs:
Case Study #1
Cisco UCS Troubleshooting
• Why the ABTS?
Case Study #1
Cisco UCS Troubleshooting
• The errors seen indicate buffer starvation (no credits) and receipt of traffic when none was expected
• No drops or congestion upstream
• Let's look back at the VIC ASIC logs for more information
Case Study #1
Cisco UCS Troubleshooting
• FC traffic is affected because the Pause (PFC/PG) negotiation occurs in the wrong order. The resulting Pause configuration on the adapter is incorrect, so the adapter keeps sending traffic after the IOM tells it to stop (Pause). This extra traffic is dropped, which produces the aborts seen.
• Defect tracking: CSCuh61202
• Resolved in: 2.2(1b) and 2.1(3a)
Case Study #1 – Conclusion
Cisco UCS Troubleshooting Case Study #2
Cisco UCS Troubleshooting
• Customer has two almost identical domains
• Both domains allow access via the CLI
• On one UCS domain, running 2.1(1a) or 2.1(2a), users were unable to log in via the GUI
• Customer tried the following to resolve the issue:
  ‒ Create a new local user account
  ‒ Cluster lead failover
  ‒ Reboot of each FI, one at a time
  ‒ Upgrade
Case Study #2 – UCSM Login Issue
Cisco UCS Troubleshooting
• Confirm that authentication is working
• Debugs are available at the NX-OS level to confirm authentication
  ‒ debug aaa all
• The error says that the user is not known
Case Study #2
Cisco UCS Troubleshooting
• Confirm user via sam_techsupportinfo
• There is no tac.testadm
• In the many iterations of testing, a simple mistake was made
Case Study #2
Cisco UCS Troubleshooting
• A different approach is to perform a trace capture using ethanalyzer
• Captures all traffic in and out of the mgmt port (like a sniffer)
• Captures default to summary output
Case Study #2
Cisco UCS Troubleshooting
• Ethanalyzer can also be set to display detailed output
• Hypertext information is visible
Case Study #2
Cisco UCS Troubleshooting
• After the credentials were corrected, the problem was still present
• One of two options to see what is happening with Java is collecting information through the Java Console:
Case Study #2
Cisco UCS Troubleshooting
• Collecting the information from the client log
• C:\Users\<username>\AppData\LocalLow\Sun\Java\Deployment\log\.ucsm\
Case Study #2
Cisco UCS Troubleshooting
• Centrale (client) Logs
Case Study #2
Cisco UCS Troubleshooting
• One more tool… Visore!
• Review the information for the listed classId directly from the UCSM database
• http://<vip.address>/visore.html
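Visore browses objects by class, and the same kind of class lookup can be expressed as an XML API request. This is a minimal Python sketch that only builds the request body and sends nothing. It assumes the UCS Manager XML API method `configResolveClass` with its `cookie`, `classId` and `inHierarchical` attributes; the helper name `build_class_query`, the cookie value and the example class are placeholders.

```python
import xml.etree.ElementTree as ET


def build_class_query(cookie: str, class_id: str) -> str:
    """Return a configResolveClass request body for one classId."""
    request = ET.Element(
        "configResolveClass",
        {"cookie": cookie, "classId": class_id, "inHierarchical": "false"},
    )
    # Serialize to a text string; ElementTree escapes attribute values.
    return ET.tostring(request, encoding="unicode")


# Placeholder cookie; a real one is returned by a prior login request.
body = build_class_query("placeholder-cookie", "vmInstance")
```

Querying one class at a time this way mirrors how Visore narrows a problem down to a single classId.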
Case Study #2
Cisco UCS Troubleshooting
• Visore Output
Case Study #2
Cisco UCS Troubleshooting
• After looking at the different classIds requested via Visore, it was found that classId == vmInstance was causing the problem
• The UCS was configured with the VM-FEX feature
• Some VMs had special characters in their names, which prevented the client from parsing the XML properly; this led to users not being able to connect
• These characters were recognized as escape sequences in XML
• Moving the VMs from the dVS to the vSwitch allowed access to the GUI again
• Defect tracking: CSCui80882
• Workaround was to rename the offending VMs
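The failure mode can be reproduced in miniature with any XML parser. This Python sketch is illustrative only: the VM name is hypothetical, the single-attribute element is not the actual UCSM payload, and the characters involved in the real case are not stated in the source. It shows that a raw `&` or `<` inside an attribute makes the document unparseable, while an escaped value round-trips cleanly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

vm_name = "db&web<01>"  # hypothetical VM name with XML special characters

# Naive string formatting leaves "&" and "<" raw inside the attribute.
raw_xml = f'<vmInstance name="{vm_name}"/>'

# quoteattr escapes the special characters and adds the surrounding quotes.
safe_xml = f"<vmInstance name={quoteattr(vm_name)}/>"


def parses(document: str) -> bool:
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False
```

Here `parses(raw_xml)` is false and `parses(safe_xml)` is true, which is why renaming the VMs to drop the special characters works around the problem on the client side.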
Case Study #2 – Conclusion
Cisco UCS Troubleshooting Case Study #3
Cisco UCS Troubleshooting
• After an upgrade of Cisco UCS to 2.2(1b), IP communication from some blades in the domain to Cisco UCS Manager was not working
• In other cases, IP communication to Cisco UCS Manager was OK, but not to both Fabric Interconnects
• This caused problems for applications running on those blades that need to communicate with UCSM (for example, UCS Director or SNMP tools)
Case Study #3
Cisco UCS Troubleshooting
• A ping test was run to find the pattern (what the failing cases had in common)
• The test revealed the following:
  ‒ Blade traffic switched thru Fabric B, destination mgmt0 of FI-B (same VLAN as blade) -> FAIL
  ‒ Blade traffic switched thru Fabric B, destination mgmt0 of FI-A (same VLAN as blade) -> OK
  ‒ Blade traffic switched thru Fabric A, destination mgmt0 of FI-B (same VLAN as blade) -> OK
  ‒ Blade traffic switched thru Fabric A, destination mgmt0 of FI-A (same VLAN as blade) -> FAIL
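The pattern in these four results can be checked mechanically. This short Python sketch encodes the observations listed above and tests one hypothesis against them; the variable and function names are illustrative.

```python
# Observed results: (switching fabric, destination FI) -> ping succeeded?
results = {
    ("B", "B"): False,
    ("B", "A"): True,
    ("A", "B"): True,
    ("A", "A"): False,
}


def predicted_ok(fabric: str, destination: str) -> bool:
    # Hypothesis: the ping fails exactly when the blade's traffic is switched
    # by the same fabric interconnect that owns the destination mgmt0.
    return fabric != destination


# True only if the hypothesis matches every observation.
hypothesis_holds = all(
    predicted_ok(fabric, dest) == ok for (fabric, dest), ok in results.items()
)
```

The hypothesis matches all four observations, which points at something specific to traffic that stays on a single fabric interconnect.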
Case Study #3
Cisco UCS Troubleshooting
• Looking at the MAC addresses learned in the upstream switches, we found no errors
• An ethanalyzer capture on mgmt0 of the Fabric Interconnect showed the ARP request from the server reaching the Fabric Interconnect mgmt interface, and the FI sending an ARP reply
• Is the Fabric Interconnect dropping the frame?
Case Study #3
Cisco UCS Troubleshooting
• What is causing the drops?
• The RPF (Reverse Path Forwarding) counter is increasing!
Case Study #3
Cisco UCS Troubleshooting
• Why is that counter increasing?
• Is that mgmt MAC address seen/learned somewhere else?
Case Study #3
Cisco UCS Troubleshooting
• The use of the same MAC address by mgmt and the FCF (Fibre Channel Forwarder) results in the Fabric Interconnect not forwarding the frames to the vethernet
• This only happens when the blade traffic is switched by the same FI that it is trying to connect to
• Going thru an L3 device (router) changes the MAC address, avoiding this issue
• Defect tracking: CSCun19289
• Workaround: Configuring mgmt0 in a different VLAN than the blades forces traffic thru an L3 device
Case Study #3 – Conclusion
Cisco UCS Troubleshooting On Your Own
Cisco UCS Troubleshooting
• Standalone Offline Diagnostics for UCS Compute Blades
• Not a UCS Manager Solution
• Blade has to boot from Server Diagnostics ISO
• ucs-blade-server-diags.1.0.1a.iso released Oct 2013
• Available from Cisco.com: Cisco UCS B-series Blade Server Software
• Independent of any UCS Manager version
• ISO image can be booted from vMedia, USB or SD Card
DIY – Blade Diagnostics
Cisco UCS Troubleshooting
• Use Cases
  ‒ Sanity check after a hardware fix or replacement
  ‒ Burn-in before deployment in production
• GUI and CLI Interface Options
  ‒ GUI has same look and feel as SCU Diagnostics for C-Series
  ‒ memTest86+ integrated in the tool
• Server Inventory, Sensor Information and Logs available from tool
• Log files can be saved to an attached USB device
DIY – Blade Diagnostics
Cisco UCS Troubleshooting
• Diagnostic Tests
  ‒ Memory: options include memory size to test and number of loops
  ‒ Adapter
  ‒ CIMC: test communication to CIMC
  ‒ CPU: stress, stream, cache and register
  ‒ Storage: S.M.A.R.T. report and LSI MegaCLI controller test
  ‒ Video: GUI only
DIY – Blade Diagnostics
Cisco UCS Troubleshooting DIY – Blade Diagnostics
Cisco UCS Troubleshooting
• Cisco Communities
  ‒ Tech Talks
  ‒ Best Practices
  ‒ Platform Emulator
  ‒ Script Samples
DIY – Resources Available
Cisco UCS Troubleshooting
• Support Forums
  ‒ Technical discussions
  ‒ TAC and BU Participation
  ‒ Partners Participation
  ‒ Other Customers Like You
DIY – Resources Available
Cisco UCS Best Practices The Basics
Cisco UCS Best Practices
• Hardware & Software Support Matrices
  ‒ Support matrix and guidelines are established by the Data Center Group (Development & QA teams)
  ‒ TAC adheres to the releases listed in those documents/tools
  ‒ Most common “out of matrix” FW? ENIC and FNIC Drivers
  ‒ Most common question: Does TAC support X combination?
  ‒ Biggest concern: Does running X combination invalidate my support contract?
The Basics
Cisco UCS Best Practices The Basics
Cisco UCS Best Practices
• Release Notes
  ‒ Mixed version support matrix
  ‒ Minimum version for the different hardware and features
  ‒ Catalog PID updates
  ‒ List of new features
  ‒ List of resolved caveats (fixed bugs)
  ‒ List of open caveats (bugs in the wild)
  ‒ Lots of transparency (latest release has a total of 12 resolved and 5 open caveats)
• Release Bundle Content
  ‒ Started in 2.0(1) release
  ‒ All related firmware and BIOSes for all UCS components associated with the release
The Basics
Cisco UCS Best Practices
• Mixed Release Support Matrix
The Basics
Cisco UCS Best Practices Upgrades
Cisco UCS Best Practices
• TAC can assist with questions related to the upgrade procedure:
  ‒ Am I following proper procedure?
  ‒ Do I understand a caveat properly?
• TAC can review any faults currently present in the system
  ‒ Do not upgrade a system with Critical/Major/Minor Faults!
• TAC can confirm if a particular defect is fixed in the target version
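The "no Critical/Major/Minor faults" rule is easy to turn into a pre-upgrade gate. This is a minimal Python sketch under stated assumptions: the list of fault severities is taken to have been gathered some other way (for example, copied from the fault list), no UCSM call is made, and `safe_to_upgrade` is a hypothetical helper name.

```python
# Severities that should block an upgrade, per the guidance above.
BLOCKING_SEVERITIES = {"critical", "major", "minor"}


def safe_to_upgrade(fault_severities):
    """Return (ok, blockers) for a list of fault severity strings."""
    # Normalize case, then keep only the severities that block an upgrade.
    blockers = sorted(
        {s.lower() for s in fault_severities} & BLOCKING_SEVERITIES
    )
    return (not blockers, blockers)
```

A system reporting only warning or info faults passes the gate; any critical, major or minor fault is returned as a blocker to resolve first.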
Upgrades
Cisco UCS Best Practices
• Back up your systems!
  ‒ Many customers do not have a backup of their Cisco UCSM
• The system database residing in the Fabric Interconnect holds the configuration for the entire system (pools, service profiles, VLANs, VSANs, etc.)
• There are four types of backup in UCSM. “Logical Configuration” backups can be run on a regular basis to keep up with any changes in service profiles, VLANs, VSANs, pools or policies. The “System Configuration” backup should be run every time there are changes to usernames, roles, locales or the system IP address
• Store backups outside the Cisco UCS domain
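The guidance on which backup to run after which kind of change can be written down as a lookup table. This Python sketch covers only the two backup types and the change categories named above; the key strings and the `backups_needed` helper are illustrative names, not UCSM identifiers.

```python
# Change categories named above, mapped to the backup type that covers them.
BACKUP_FOR_CHANGE = {
    "service profile": "Logical Configuration",
    "vlan": "Logical Configuration",
    "vsan": "Logical Configuration",
    "pool": "Logical Configuration",
    "policy": "Logical Configuration",
    "username": "System Configuration",
    "role": "System Configuration",
    "locale": "System Configuration",
    "system ip address": "System Configuration",
}


def backups_needed(changes):
    """Return the sorted backup types that should be run for these changes."""
    needed = set()
    for change in changes:
        key = change.strip().lower()
        if key not in BACKUP_FOR_CHANGE:
            raise ValueError(f"unknown change category: {change!r}")
        needed.add(BACKUP_FOR_CHANGE[key])
    return sorted(needed)
```

A change window that touches both a VLAN and a role, for instance, calls for both backup types before and after the work.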
Upgrades
Cisco UCS Best Practices Day to Day Operations
Cisco UCS Best Practices
• Starting in 2.1(1) the UCSM offers the capability to schedule automated backups
Day to Day Operations
Full State
All Configuration
Cisco UCS Best Practices
• Enable Smart CallHome (SCH)
• Administrators should not share the “admin” userID. Instead they should use their own userID and take advantage of the RBAC feature
• Scripts should not use the “admin” userID to log in
• More than 1 domain? Take advantage of Cisco UCS Central
• SDN ready? Yes we are! Programmable Infrastructure thru XML APIs
• Collect tech-support as soon as a problem is reported
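A script that talks to the XML API can enforce the "no admin account" rule at the point where it builds its login request. This is a minimal Python sketch that constructs the request body only and sends nothing. It assumes the UCS Manager XML API login method `aaaLogin` with `inName` and `inPassword` attributes; the helper name `build_login` and the account name used with it are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_login(username: str, password: str) -> str:
    """Return an aaaLogin request body, refusing the shared admin account."""
    if username == "admin":
        raise ValueError("use a dedicated, role-limited account for scripts")
    request = ET.Element(
        "aaaLogin", {"inName": username, "inPassword": password}
    )
    # Serialize to a text string; ElementTree escapes attribute values.
    return ET.tostring(request, encoding="unicode")
```

Calling `build_login("svc-inventory", "example-password")` returns the body to post, while `build_login("admin", ...)` fails early, so each script's activity stays attributable to its own RBAC-scoped account.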
Day to Day Operations
Miscellaneous
Register for Cisco Live – San Francisco
Cisco Live - San Francisco May 18 – 22, 2014 www.ciscolive.com/us
Cisco Live San Francisco
• BRKCOM-3008 Unraveling UCS Manager Features, Policies and Mechanics
• BRKCOM-2006 Cisco UCS Administration and RBAC
• LTRVIR-2999 Deploying Nexus 1000v on ESXi, Hyper-V and OpenStack