Carolina mini-cl-2014

Local Edition


Page 1: Carolina mini-cl-2014


Page 2: Carolina mini-cl-2014

© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Public


Cisco UCS Troubleshooting and Best Practices
Jose Martinez, Technical Leader, Services
@jose_at_csco

Page 3: Carolina mini-cl-2014


Agenda

•  Cisco UCS Troubleshooting
   ‒ Things all UCS admins should know and do
   ‒ Case Studies: Straight from the TAC queue
   ‒ DIY: What resources are available for you?

•  Cisco UCS Best Practices
   ‒ The Basics
   ‒ Wait… You are upgrading this weekend?
   ‒ Day to Day Operations

•  Miscellaneous
•  Q&A


Page 4: Carolina mini-cl-2014

Cisco UCS Troubleshooting Basics

Page 5: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Most issues that affect Cisco UCS are investigated via the logs collected in the system

•  Even investigations of performance or authentication issues start with the logs

•  Logs are collected in every component
•  Depending on the issue, multiple logs may be needed
•  Collect UCSM and Chassis tech-support as soon as possible
•  Current behavior is to overwrite the last log
   ‒ Changing with an upcoming release: CSCuj56943

The Basics


Page 6: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  The size of the logs can be modified, as well as the level of logging
   ‒ Default log level is info
   ‒ Default size is 5232880

•  Change these values under: scope monitoring → sysdebug → mgmt-logging

The Basics


Page 7: Carolina mini-cl-2014


Cisco UCS Troubleshooting UCSM Internal Overview


Diagram, layers from top to bottom:
‒ GUI, CLI, Standards (SNMP, IPMI)
‒ XML API
‒ Management Information Tree
‒ Data Management Engine (DME)
‒ Application Gateways (AG)
‒ Managed Endpoints

Page 8: Carolina mini-cl-2014


Cisco UCS Troubleshooting Logs Example

Page 9: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Scope – allows the admin to enter the specified mode
   ‒ Examples: scope adapter, scope bios-settings, scope org
   ‒ Each mode has its own set of commands
   ‒ Commands allowed depend on the assigned role and locale (RBAC)
   ‒ Configuration changes are usually allowed in this mode (commit-buffer)

•  Connect – allows the admin to connect to a specific component
   ‒ Examples: connect adapter, connect iom, connect local-mgmt
   ‒ This allows better troubleshooting options
   ‒ This does not allow configuration changes

Navigating the CLI


Page 10: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Scope example :

Navigating the CLI

•  Connect example :


Page 11: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  connect iom only works for the directly attached IOM
•  To connect to the other side:
   ‒ connect local management other side
   ‒ connect iom <chassis #>

Navigating the CLI


Diagram labels: FI-B (Primary), admin, FI-A (Secondary)

Page 12: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Best location for traffic troubleshooting
•  Debug capability
•  Display the switch running config (non-server config)
•  Access to ethanalyzer
•  Typical datacenter switch commands:
   ‒ show interface brief
   ‒ show vlan
   ‒ show mac address-table vlan <vlan id>
   ‒ show port-channel summary
   ‒ show lacp neighbor

Navigating the CLI – connect nxos


Page 13: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  connect adapter X/Y/Z
   ‒ X is chassis #
   ‒ Y is blade #
   ‒ Z is adapter #
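
As a small illustration of the X/Y/Z addressing, the following Python sketch builds the command string from chassis, blade and adapter numbers. The helper name `connect_adapter_command` and the positive-integer check are assumptions made for this example; they are not part of UCS Manager.

```python
def connect_adapter_command(chassis: int, blade: int, adapter: int) -> str:
    """Return the 'connect adapter X/Y/Z' string for the given IDs.

    X is the chassis number, Y the blade number, Z the adapter number.
    The validation below is an illustrative assumption, not a UCSM rule.
    """
    for name, value in (("chassis", chassis), ("blade", blade), ("adapter", adapter)):
        if not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return f"connect adapter {chassis}/{blade}/{adapter}"


if __name__ == "__main__":
    # Chassis 1, blade 2, adapter 1
    print(connect_adapter_command(1, 2, 1))
```

The resulting string is what an admin would type at the UCSM CLI prompt; the sketch only formats it and does not open a session.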

Navigating the CLI – connect adapter


Page 14: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  The vnic table provides logical interface (lif) IDs that can be used to collect more information via the lif and lifstats output

Navigating the CLI – connect adapter


Page 15: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Logs from the ASIC can be seen directly in the adapter using the show-log command

Navigating the CLI – connect adapter


Page 16: Carolina mini-cl-2014

Cisco UCS Troubleshooting Case Study #1

Page 17: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  After a Cisco UCS upgrade, fibre channel performance was severely degraded

•  Many ABTS were observed in the VIC adapter logs:

Case Study #1


Page 18: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Why the ABTS?

Case Study #1


Page 19: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  The errors seen indicate a condition of buffer starvation (no credits) and Rx of traffic when not expected

•  No drops or congestion upstream
•  Let's look back at the VIC ASIC logs for more information

Case Study #1


Page 20: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  FC traffic is affected because the Pause PFC/PG negotiation occurs in the wrong order. As a result, the pause configuration on the adapter is incorrect, and the adapter keeps sending traffic when the IOM tells it to stop (Pause). This extra traffic is dropped, which results in the aborts seen.

•  Defect tracking: CSCuh61202
•  Resolved in: 2.2(1b) and 2.1(3a)

Case Study #1 – Conclusion


Page 21: Carolina mini-cl-2014

Cisco UCS Troubleshooting Case Study #2

Page 22: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Customer has two almost identical domains
•  Both domains allow access via the CLI
•  On one UCS domain, running 2.1(1a) or 2.1(2a), login via the GUI was not possible
•  Customer tried the following to resolve the issue:
   ‒ Create a new local user account
   ‒ Cluster lead failover
   ‒ Reboot of each FI, one at a time
   ‒ Upgrade

Case Study #2 – UCSM Login Issue

Page 23: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Confirm that authentication is working
•  Debugs are available at the NXOS level to confirm authentication
   ‒ debug aaa all

•  Error says that user is not known

Case Study #2

Page 24: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Confirm user via sam_techsupportinfo

•  There is no tac.testadm
•  In the many iterations of testing, a simple mistake was made

Case Study #2

Page 25: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  A different approach is to perform a trace capture using ethanalyzer
•  Collects all data in and out of the mgmt port (like a sniffer)
•  Captures default to summary output

Case Study #2

Page 26: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Ethanalyzer can also be set for detail output display
•  Hypertext information is visible

Case Study #2

Page 27: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  After the credentials were corrected, the problem was still present
•  There are two options to see what is happening with Java. The first is collecting information through the Java Console:

Case Study #2

Page 28: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Collecting the information from the client log
•  C:\Users\<username>\AppData\LocalLow\Sun\Java\Deployment\log\.ucsm\

Case Study #2

Page 29: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Centrale (client) Logs

Case Study #2

Page 30: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  One more tool… Visore!
•  Review the information for the listed classID directly from the UCSM database
•  http://<vip.address>/visore.html

Case Study #2

Page 31: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Visore Output

Case Study #2

Page 32: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  After looking at the different classIds requested via Visore, it was found that classId == vmInstance was causing the problem

•  The UCS was configured with the VM-FEX feature
•  Some VMs had special characters in their names, which prevented the client from parsing the XML properly; this led to the user not being able to connect

•  These characters were recognized as escape sequences in XML
•  Moving the VM from the dVS into the vSwitch allowed access to the GUI again
•  Defect tracking: CSCui80882
•  Workaround was to rename the offending VMs
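
The failure mode described above, special characters in a name breaking an XML parser, can be reproduced in a few lines of Python. This is only a sketch of the general mechanism: the `name` attribute, the sample VM name and the helper functions are made up for illustration and do not reproduce the actual UCSM response or the Java client's parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_vm_element(vm_name: str, escape: bool) -> str:
    """Build a one-element XML document carrying a VM name.

    With escape=False the name is pasted in raw, which is what goes wrong
    when a producer forgets to escape XML special characters.
    """
    if escape:
        attr = quoteattr(vm_name)      # adds quotes and escapes &, <, >
    else:
        attr = '"' + vm_name + '"'     # raw, unescaped value
    return f"<vmInstance name={attr} />"


def parses_ok(xml_text: str) -> bool:
    """Return True if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    name = "web&db<01>"  # hypothetical VM name with XML special characters
    print(parses_ok(build_vm_element(name, escape=False)))
    print(parses_ok(build_vm_element(name, escape=True)))
```

Renaming the VM, as in the workaround, removes the problem characters at the source; escaping on the producer side is the general fix.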

Case Study #2 – Conclusion

Page 33: Carolina mini-cl-2014

Cisco UCS Troubleshooting Case Study #3

Page 34: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  After an upgrade of Cisco UCS to 2.2(1b), IP communication from some blades in the domain to the Cisco UCS Manager was not working

•  In some other cases, IP communication to the Cisco UCS Manager was OK, but not to both Fabric Interconnects

•  This caused problems with applications running on those blades that need communication to the UCSM (for example, UCS Director or SNMP tools)

Case Study #3


Page 35: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  A ping test was executed to find the pattern (what the failing cases had in common)

•  The test revealed the following:
   ‒ Blade traffic switched thru Fabric B, destination mgmt0 of FI-B (same VLAN as blade) -> FAIL
   ‒ Blade traffic switched thru Fabric B, destination mgmt0 of FI-A (same VLAN as blade) -> OK
   ‒ Blade traffic switched thru Fabric A, destination mgmt0 of FI-B (same VLAN as blade) -> OK
   ‒ Blade traffic switched thru Fabric A, destination mgmt0 of FI-A (same VLAN as blade) -> FAIL
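
The pattern in these four results can be checked mechanically. The short Python sketch below encodes the results from the slide and tests the hypothesis that a ping fails exactly when the switching fabric and the destination FI are the same. The data layout and the function name are assumptions made for the example.

```python
# (fabric that switches the blade traffic, FI whose mgmt0 is pinged, result)
# Values transcribed from the ping test on the slide.
PING_RESULTS = [
    ("B", "B", "FAIL"),
    ("B", "A", "OK"),
    ("A", "B", "OK"),
    ("A", "A", "FAIL"),
]


def same_fabric_rule_holds(results) -> bool:
    """True if every FAIL, and only a FAIL, has switching fabric == destination FI."""
    return all(
        (fabric == dest) == (outcome == "FAIL")
        for fabric, dest, outcome in results
    )


if __name__ == "__main__":
    print(same_fabric_rule_holds(PING_RESULTS))
```

Writing the results down as a table like this makes the common factor visible before any packet capture is taken.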

Case Study #3


Page 36: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Looking at the MAC addresses learned in the upstream switches, we found no errors

•  An ethanalyzer capture on mgmt0 of the Fabric Interconnect showed the ARP request from the server reaching the Fabric Interconnect mgmt interface, and the FI sending an ARP reply

•  Is the Fabric Interconnect dropping the frame?

Case Study #3


Page 37: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  What is causing the drops?

•  The RPF (Reverse Path Forwarding) counter is increasing!

Case Study #3


Page 38: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Why is that counter increasing?
•  Is that mgmt MAC address seen/learned somewhere else?

Case Study #3


Page 39: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  The use of the same MAC address by the mgmt interface and the FCF (Fibre Channel Forwarder) results in the Fabric Interconnect not forwarding the frames to the vethernet

•  This only happens when the blade traffic is switched by the same FI that it is trying to connect to

•  Going thru an L3 device (router) changes the MAC address, avoiding this issue

•  Defect tracking: CSCun19289
•  Workaround: configuring mgmt0 in a different VLAN than the blades forces traffic thru an L3 device

Case Study #3 – Conclusion


Page 40: Carolina mini-cl-2014

Cisco UCS Troubleshooting On Your Own

Page 41: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Standalone Offline Diagnostics for UCS Compute Blades
•  Not a UCS Manager solution
•  Blade has to boot from the Server Diagnostics ISO
•  ucs-blade-server-diags.1.0.1a.iso released Oct 2013
•  Available from Cisco.com: Cisco UCS B-series Blade Server Software
•  Independent of any UCS Manager version
•  ISO image can be booted from vMedia, USB or SD Card

DIY – Blade Diagnostics

Page 42: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Use Cases
   ‒ Sanity check after a hardware fix or replacement
   ‒ Burn-in before deployment in production

•  GUI and CLI Interface Options
   ‒ GUI has the same look and feel as SCU Diagnostics for C-Series
   ‒ memTest86+ integrated in the tool

•  Server Inventory, Sensor Information and Logs available from the tool
•  Log files can be saved to an attached USB device

DIY – Blade Diagnostics


Page 43: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Diagnostic Tests
   ‒ Memory: options include memory size to test and number of loops
   ‒ Adapter
   ‒ CIMC: test communication to CIMC
   ‒ CPU: stress, stream, cache and register
   ‒ Storage: S.M.A.R.T. report and LSI megaCLI controller test
   ‒ Video: GUI only

DIY – Blade Diagnostics


Page 44: Carolina mini-cl-2014


Cisco UCS Troubleshooting DIY – Blade Diagnostics


Page 45: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Cisco Communities
   ‒ Tech Talks
   ‒ Best Practices
   ‒ Platform Emulator
   ‒ Script Samples

DIY – Resources Available

Page 46: Carolina mini-cl-2014


Cisco UCS Troubleshooting

•  Support Forums
   ‒ Technical discussions
   ‒ TAC and BU participation
   ‒ Partner participation
   ‒ Other customers like you

DIY – Resources Available

Page 47: Carolina mini-cl-2014

Cisco UCS Best Practices The Basics

Page 48: Carolina mini-cl-2014


Cisco UCS Best Practices

•  Hardware & Software Support Matrices
   ‒ Support matrix and guidelines are established by the Data Center Group (Development & QA teams)
   ‒ TAC adheres to the releases listed in those documents/tools
   ‒ Most common “out of matrix” FW? ENIC and FNIC drivers
   ‒ Most common question: Does TAC support X combination?
   ‒ Biggest concern: Does running X combination invalidate my support contract?

The Basics


Page 49: Carolina mini-cl-2014


Cisco UCS Best Practices The Basics

Page 50: Carolina mini-cl-2014


Cisco UCS Best Practices The Basics

Page 51: Carolina mini-cl-2014


Cisco UCS Best Practices

•  Release Notes
   ‒ Mixed version support matrix
   ‒ Minimum version for the different hardware and features
   ‒ Catalog PID updates
   ‒ List of new features
   ‒ List of resolved caveats (fixed bugs)
   ‒ List of open caveats (bugs in the wild)
   ‒ Lots of transparency (latest release has a total of 12 resolved and 5 open caveats)

•  Release Bundle Content
   ‒ Started in the 2.0(1) release
   ‒ All related firmware and BIOSes for all UCS components associated with the release

The Basics


Page 52: Carolina mini-cl-2014


Cisco UCS Best Practices

•  Mixed Release Support Matrix

The Basics


Page 53: Carolina mini-cl-2014

Cisco UCS Best Practices Upgrades

Page 54: Carolina mini-cl-2014


Cisco UCS Best Practices

•  TAC can assist with questions related to the upgrade procedure:
   ‒ Am I following the proper procedure?
   ‒ Do I understand a caveat properly?

•  TAC can review any faults currently present in the system
   ‒ Do not upgrade a system with Critical/Major/Minor Faults!

•  TAC can confirm if a particular defect is fixed in the target version
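
The "do not upgrade with Critical/Major/Minor faults" rule lends itself to a simple pre-upgrade gate. The Python sketch below assumes the faults have already been exported into a list of dictionaries with a `severity` key; that record layout, the sample fault descriptions and the function names are all hypothetical and only illustrate the check.

```python
# Severities that block an upgrade, per the guidance above.
BLOCKING_SEVERITIES = {"critical", "major", "minor"}


def blocking_faults(faults):
    """Return the faults whose severity blocks an upgrade."""
    return [f for f in faults if f["severity"].lower() in BLOCKING_SEVERITIES]


def safe_to_upgrade(faults) -> bool:
    """True only when no Critical/Major/Minor fault is present."""
    return not blocking_faults(faults)


if __name__ == "__main__":
    # Hypothetical fault export, for illustration only.
    current_faults = [
        {"severity": "warning", "description": "example warning"},
        {"severity": "Major", "description": "example major fault"},
    ]
    for fault in blocking_faults(current_faults):
        print("Resolve before upgrading:", fault["description"])
    print("Safe to upgrade:", safe_to_upgrade(current_faults))
```

A gate like this only flags what to look at; reviewing the flagged faults, with TAC if needed, is still a manual step.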

Upgrades


Page 55: Carolina mini-cl-2014


Cisco UCS Best Practices

•  Back up your systems!
   ‒ Many customers do not have a backup of their Cisco UCSM

•  The system database residing in the Fabric Interconnect has the configuration for the entire system (pools, service profiles, VLANs, VSANs, etc.)

•  There are four types of backup in UCSM. “Logical Configuration” backups can be executed on a regular basis to keep up with any changes in service profiles, VLANs, VSANs, pools or policies. The “System Configuration” backup should be executed every time there are changes to usernames, roles, locales or the system IP address

•  Store backups outside the Cisco UCS domain
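
The guidance on which backup to run after which kind of change can be captured as a lookup table. The Python sketch below covers only the two backup types and the change categories named above; the dictionary keys and the function name are assumptions made for the example.

```python
LOGICAL = "Logical Configuration"
SYSTEM = "System Configuration"

# Change category -> backup type that should be refreshed afterwards.
BACKUP_FOR_CHANGE = {
    "service profile": LOGICAL,
    "vlan": LOGICAL,
    "vsan": LOGICAL,
    "pool": LOGICAL,
    "policy": LOGICAL,
    "username": SYSTEM,
    "role": SYSTEM,
    "locale": SYSTEM,
    "system ip address": SYSTEM,
}


def backups_needed(changes):
    """Return the sorted backup types to run for a list of change categories.

    Raises KeyError for a category that is not in the table above.
    """
    return sorted({BACKUP_FOR_CHANGE[change.lower()] for change in changes})


if __name__ == "__main__":
    print(backups_needed(["VLAN", "role"]))
```

An unknown category raises an error on purpose, so that a new kind of change is reviewed rather than silently skipped.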

Upgrades


Page 56: Carolina mini-cl-2014

Cisco UCS Best Practices Day to Day Operations

Page 57: Carolina mini-cl-2014


Cisco UCS Best Practices

•  Starting in 2.1(1) the UCSM offers the capability to schedule automated backups

Day to Day Operations

Screenshot labels: Full State, All Configuration

Page 58: Carolina mini-cl-2014


Cisco UCS Best Practices

•  Enable Smart CallHome (SCH)
•  Administrators should not share the “admin” userID. Instead they should use their own userID and take advantage of the RBAC feature
•  Scripts should not use the “admin” userID to log in
•  More than 1 domain? Take advantage of Cisco UCS Central
•  SDN ready? Yes we are! Programmable Infrastructure thru XML APIs
•  Collect tech-support as soon as a problem is reported
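
As a sketch of what "programmable thru XML APIs" looks like in practice, the Python below builds a login request body for a script that uses its own account instead of "admin". It assumes the `aaaLogin` method with `inName` and `inPassword` attributes from the UCS Manager XML API; verify the exact request shape against the XML API guide for your release. The account name `svc-backup`, the password and the helper name are hypothetical, and nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def build_login_request(user: str, password: str) -> str:
    """Return an aaaLogin request body as a string.

    Rejects the shared "admin" account, following the guidance that
    scripts should log in with their own userID.
    """
    if user.lower() == "admin":
        raise ValueError('scripts should not use the "admin" userID')
    # Assumed request shape: <aaaLogin inName="..." inPassword="..." />
    element = ET.Element("aaaLogin", {"inName": user, "inPassword": password})
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical service account; never hard-code real credentials.
    print(build_login_request("svc-backup", "example-password"))
```

Using a dedicated account for each script also means RBAC can limit what the script may change, and the audit trail shows which automation made a change.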

Day to Day Operations


Page 59: Carolina mini-cl-2014

Miscellaneous

Page 60: Carolina mini-cl-2014


Register for Cisco Live – San Francisco

Cisco Live – San Francisco, May 18 – 22, 2014
www.ciscolive.com/us


Page 61: Carolina mini-cl-2014


Cisco Live San Francisco

•  BRKCOM-3008 Unraveling UCS Manager Features, Policies and Mechanics
•  BRKCOM-2006 Cisco UCS Administration and RBAC
•  LTRVIR-2999 Deploying Nexus 1000v on ESXi, Hyper-V and OpenStack


Page 62: Carolina mini-cl-2014